High-end AI inference with NVIDIA RTX PRO 6000 Blackwell – perfect for large LLMs, RAG systems with millions of documents, training & fine-tuning and models up to 120B+ parameters.
230 V • 292×185×372 mm • Mini-ITX
Enterprise hardware with 96 GB VRAM for maximum performance
96 GB GDDR7 VRAM
Sufficient for models up to 120B+ parameters
24,064 CUDA Cores
Maximum performance for large LLMs
Run AI assistants for customer service or internal knowledge bases – completely local and GDPR-compliant.
Use models like Qwen or DeepSeek for code completion, review and documentation – without sending your codebase to the cloud.
Llama 3.1 8B, Gemma 3, Mistral 7B, Phi-4 and many other models in the 7B to 13B class.
Run models like Llama 3.1 70B, DeepSeek-R1 or GPT-OSS 120B completely locally (see the sketch after this list).
Process knowledge bases with thousands to millions of documents.
Run multiple models in parallel – depending on hardware configuration.
Enterprise performance of AI Cube Pro with large open-source models
~20 billion parameters
Batch size 1
~120 billion parameters
Batch size 1
All values were measured with batch size 1 and represent inference speed for interactive use cases. Actual performance may vary depending on model configuration and prompt length. Higher batch sizes increase throughput for parallel requests.
Use Open WebUI for a ChatGPT-like experience – completely local on your own hardware
On request, the AI Cube is delivered with Open WebUI – an intuitive, user-friendly interface that enables a local, ChatGPT-like experience. No cloud dependency, no API keys, no token limits – just you and your AI models.
Familiar and intuitive user interface for natural conversations with your local AI models
All data and conversations stay on your hardware – no connection to external servers required
Switch seamlessly between different AI models within the same interface (see the sketch below)
Unlimited usage without pay-per-use fees or monthly API costs
For our AI Cube Pro customers, we offer personal delivery and professional commissioning in Germany and the Netherlands. For Enterprise customers, this service is available Europe-wide.
Directly to your company premises or to your customers – personally
Professional installation and cabling on-site
Operating system, GPU drivers, container environment and security configuration (VPN, firewall, backup)
Performance test, stability check and GDPR compliance review before commissioning
For Enterprise & Pro Customers
Our on-site service ensures that your AI Cube runs optimally from the start – without you having to worry about installation or configuration.
Perfect for companies that value:
125 TFLOPS and 96 GB VRAM – the most powerful Blackwell workstation GPU for local inference.
Even the largest models and extensive RAG systems remain completely in your network.
With 96 GB VRAM you're equipped for the coming years – even for future model generations.
Compare the two AI Cube models
AI Cube Pro: from €13,599.90 excl. VAT
AI Cube Basic: from €4,299.90 excl. VAT
View Basic Model

How a private clinic uses AI Cube Pro for medical knowledge bases
A network of private psychiatric clinics needed an AI solution for its central knowledge base of medical protocols, SOPs and training materials. Sensitive patient data was not permitted to be sent to the cloud.
Immediate access to relevant protocols
Cross-location knowledge consistency
Complete GDPR compliance
| Component | Specification |
| --- | --- |
| Graphics Card | NVIDIA RTX PRO 6000 Blackwell (96 GB GDDR7) |
| Network | 1 GbE (10 GbE optional) |
| Dimensions & Weight | 292×185×372 mm (H×W×D), approx. 8 kg |
| Certification | CE, RoHS, GDPR-compliant |
| Security | Secure Boot, TPM 2.0, WireGuard VPN |
Get a free consultation
24.11.2025
GPT-OSS 120B on AI Cube Pro: Run OpenAI's Open-Source Model Locally
With GPT-OSS 120B, OpenAI released its first open-weight model since GPT-2 in August 2025 – and it's impressive. The model achieves near o4-mini performance but...
09.11.2025
In times of rising cloud costs, data sovereignty challenges and vendor lock-in, the topic of local AI inference is becoming increasingly important for companies. With...
08.11.2025
The use of Large Language Models (LLMs) such as GPT-4, Claude or Llama has evolved from experimental applications to mission-critical tools in recent years. However,...
Whether you have a specific IT challenge or just an idea – we look forward to hearing from you. In a short conversation, we'll evaluate together whether and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT