Your entry into local AI inference with NVIDIA RTX PRO 4000 Blackwell – perfect for chatbots, code assistance and models up to 20B parameters.
230 V • 292×185×372 mm • Mini-ITX
Enterprise hardware with 24 GB VRAM in compact Mini-ITX format
24 GB GDDR7 VRAM
Sufficient for models up to 20B parameters (quantized)
8,960 CUDA Cores
Fast real-time inference
Run AI assistants for customer service or internal knowledge bases – completely local and GDPR-compliant.
Use models like Qwen or DeepSeek for code completion, review and documentation – without sending your codebase to the cloud.
Llama 3.1 (8B), Gemma 3, Mistral 7B, Phi-4 and many other models.
Analyze documents, contracts and reports with AI – completely local and confidential.
Process knowledge bases with thousands to millions of documents.
Run multiple models in parallel – depending on hardware configuration (see the VRAM sizing sketch below).
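As a rough sizing rule, the weight footprint of a quantized model is parameters × bits per weight ÷ 8, plus headroom for the KV cache and runtime buffers. The sketch below (a simplified estimate, not a measurement – the model list and the flat 2 GB overhead are illustrative assumptions) shows why ~20B-parameter models at 4-bit quantization fit into 24 GB of VRAM, and why two small models can run side by side:

```python
# Rough VRAM estimate for quantized LLM inference.
# Rule of thumb only -- real usage depends on the runtime (llama.cpp,
# vLLM, ...), context length and quantization format. The flat overhead
# figure is an assumption for illustration.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 2.0) -> float:
    """Weights (params * bits / 8) plus a flat allowance for KV cache,
    activations and runtime buffers."""
    return params_billion * bits_per_weight / 8 + overhead_gb

# ~4.5 bits/weight approximates 4-bit formats incl. quantization metadata.
for name, params_b, bits in [
    ("Mistral 7B @ 4-bit",    7,  4.5),
    ("Llama 3.1 8B @ 4-bit",  8,  4.5),
    ("20B model @ 4-bit",    20,  4.5),
    ("70B model @ 4-bit",    70,  4.5),
]:
    gb = estimate_vram_gb(params_b, bits)
    verdict = "fits" if gb <= 24 else "exceeds"
    print(f"{name}: ~{gb:.1f} GB -> {verdict} 24 GB VRAM")
```

By this estimate, two 7-8B models at 4-bit total roughly 12 GB – the kind of headroom the parallel-models point above relies on.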
Real-world performance of AI Cube Basic with open-source models
~20 billion parameters
Batch Size 1
All values were measured with batch size 1 and represent inference speed for interactive use cases. Actual performance may vary depending on model configuration and prompt length. Higher batch sizes increase throughput for parallel requests.
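To reproduce batch-size-1 numbers on your own hardware, you can time a single streaming request against the Cube's local inference endpoint. A minimal sketch – the endpoint URL and model tag below are placeholders for whatever server you run (e.g. Ollama on port 11434 or vLLM on port 8000), and counting streamed chunks only approximates the token count:

```python
# Time one streaming request (batch size 1) against a local
# OpenAI-compatible endpoint. URL and model tag are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1",  # assumed local server
                api_key="unused")                      # local servers ignore the key

start, chunks = time.perf_counter(), 0
stream = client.chat.completions.create(
    model="mistral:7b",  # example model tag -- use whatever is installed
    messages=[{"role": "user", "content": "Explain GDPR in three sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # one streamed chunk is roughly one token
elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.1f} tokens/s at batch size 1")
```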
Use Open WebUI for a ChatGPT-like experience – completely local on your own hardware
On request, the AI Cube ships with Open WebUI – an intuitive, user-friendly interface that enables a local ChatGPT-like experience. No cloud dependency, no API keys, no token limits – just you and your AI models.
Familiar and intuitive user interface for natural conversations with your local AI models
All data and conversations stay on your hardware – no connection to external servers required
Switch seamlessly between different AI models within the same interface (see the sketch after this list)
Unlimited usage without pay-per-use fees or monthly API costs
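Because Open WebUI fronts a standard local inference server, the same models remain scriptable without any cloud account. A short sketch, assuming an OpenAI-compatible server on the Cube – the endpoint and model tags are examples, not fixed defaults:

```python
# Call two locally hosted models with the same client -- no cloud account,
# no real API key, no token budget. Endpoint and model tags are examples.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

for model in ("llama3.1:8b", "qwen2.5-coder:14b"):  # example local tags
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "One sentence: why local inference?"}],
    )
    print(f"[{model}] {reply.choices[0].message.content}")
```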
For our AI Cube Pro customers, we offer personal delivery and professional commissioning in Germany and the Netherlands. For Enterprise customers, this service is available Europe-wide.
Directly to your company premises or to your customers – personally
Professional installation and cabling on-site
Operating system, GPU drivers, container environment and security configuration (VPN, firewall, backup)
Performance test, stability check and GDPR compliance review before commissioning
For Enterprise & Pro Customers
Our on-site service ensures that your AI Cube runs optimally from the start – without you having to worry about installation or configuration.
Perfect for companies that value:
All models and data remain in your network. No cloud dependency, no data transmission to third parties.
One-time investment instead of monthly token fees. The acquisition pays for itself within a few months.
Minimal latency through local inference. No waiting times from cloud connections.
Root access, free model choice, no API limits. You decide what runs.
Compare the two AI Cube models
AI Cube Basic – from €4,299.90 excl. VAT
AI Cube Pro – from €13,599.90 excl. VAT
How a law firm uses AI Cube for confidential research
A medium-sized law firm needed an AI solution for internal document research. Sensitive client data could not go to the cloud.
80% faster research
Complete data control
ROI within 6 months
| Component | Specification |
| --- | --- |
| Graphics Card | NVIDIA RTX PRO 4000 Blackwell (24 GB GDDR7) |
| Network | 1 GbE (10 GbE optional) |
| Dimensions & Weight | 292×185×372 mm (H×W×D), approx. 8 kg |
| Certification | CE, RoHS, GDPR-compliant |
| Security | Secure Boot, TPM 2.0, WireGuard VPN |
Get a free consultation
24.11.2025
In August 2025, OpenAI released GPT-OSS 120B, its first open-weight model since GPT-2 – and it's impressive. The model achieves near o4-mini performance but...
09.11.2025
In times of rising cloud costs, data-sovereignty concerns and vendor lock-in, local AI inference is becoming increasingly important for companies. With...
08.11.2025
The use of Large Language Models (LLMs) such as GPT-4, Claude or Llama has evolved from experimental applications to mission-critical tools in recent years. However,...
Whether it's a specific IT challenge or just an idea – we look forward to hearing from you. In a brief conversation, we'll evaluate together whether and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT


