Prevent data leakage through employees using ChatGPT & Co. – run your AI infrastructure locally, without the cloud and without huge server racks!
Ready to use with pre-installed software
100% data sovereignty in your network
One-time investment instead of monthly fees
Europe-wide personal delivery & commissioning
Cloud services offer convenience – but also dependency. With an AI Cube, you retain full control over your data, models, and systems. Whether chatbots, RAG systems, or internal AI automation: Your sensitive data stays within your company, while computing power is directly on-site.
The AI Cube is owned by your company – no monthly fees, no token limits, no vendor lock-in. You decide which software runs, which models are used, and how your AI infrastructure grows.
Your models and data never leave your corporate network. Complete control over sensitive information.
No API limits, no external updates, no restrictions. You decide every aspect of your AI infrastructure.
Minimal latency through local inference. No delays from cloud connections.
No token or pay-per-use fees. One-time investment instead of ongoing costs.
The AI Cube is completely yours. No monthly subscriptions, no vendor dependency.
If desired, we handle operation, maintenance, and updates – you focus on your projects.
At 500 tokens/s continuous load, the AI Cube Pro pays for itself in under 4 months – see the calculation sketch below.
Cloud API
Input: $0.25/1M • Output: $2.00/1M • 500 t/s output, 1,500 t/s input (3:1 ratio)
On-Premises
96 GB VRAM • 500+ t/s output • Unlimited usage
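The payback claim can be sanity-checked with a few lines of Python. A minimal sketch: the API rates and token throughput are taken from the comparison above, while the one-time system price is a placeholder assumption, not an actual quote (pricing on request).

```python
# Back-of-the-envelope payback check for the cloud-vs-on-prem comparison.
# API rates and throughput are taken from this page; the one-time
# system price is a PLACEHOLDER ASSUMPTION, not an actual quote.

SECONDS_PER_MONTH = 30 * 24 * 3600

input_tps = 1500         # tokens/s input (3:1 ratio to output)
output_tps = 500         # tokens/s output
price_in = 0.25 / 1e6    # $ per input token  ($0.25 / 1M)
price_out = 2.00 / 1e6   # $ per output token ($2.00 / 1M)

monthly_cloud = SECONDS_PER_MONTH * (
    input_tps * price_in + output_tps * price_out
)

cube_price = 14_000      # ASSUMPTION: placeholder price for the Pro system

print(f"Cloud API at sustained load: ${monthly_cloud:,.0f} per month")
print(f"Break-even after ~{cube_price / monthly_cloud:.1f} months")
```

At these rates the sustained cloud bill comes to roughly $3,560 per month, so with the assumed system price the break-even lands just under four months – consistent with the claim above.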
Use Open WebUI for a ChatGPT-like experience – completely local on your own hardware
On request, the AI Cube is delivered with Open WebUI – an intuitive, user-friendly interface that enables a local ChatGPT-like experience. No cloud dependency, no API keys, no token limits – just you and your AI models. Developers can also talk to the same models programmatically, as the sketch after the list below shows.
Familiar and intuitive user interface for natural conversations with your local AI models
All data and conversations stay on your hardware – no connection to external servers required
Switch seamlessly between different AI models within the same interface
Unlimited usage without pay-per-use fees or monthly API costs
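Because Ollama exposes an OpenAI-compatible API, existing client code can simply point at the Cube instead of the cloud. A minimal sketch, assuming Ollama is running on its default port 11434 and a model such as llama3.1 has been pulled:

```python
# Chat with a locally hosted model via Ollama's OpenAI-compatible API.
# Assumes Ollama runs on its default port and `llama3.1` has been pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local endpoint, no cloud involved
    api_key="ollama",                      # required by the client, value unused
)

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize our VPN onboarding steps."}],
)
print(response.choices[0].message.content)
```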
Our customers benefit from the locally operated AI solution – independent, secure, and efficient. Here are two representative use cases.
RAG-based Document Research
A medium-sized law firm with numerous mandates and a large file archive found that researching precedents, briefs, and internal records often took several hours per case. In addition, the files contain sensitive client data that must not be sent to external cloud systems. How the retrieval behind such a system works in principle is sketched after the results below.
Drastically Reduced Research Time
Lawyers can argue and decide faster
Strengthened Knowledge Base
New employees access proven documents much faster
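In principle, the retrieval step behind such a RAG setup is straightforward: documents are embedded once, and each query is matched against those vectors before the model answers. A minimal sketch against a local Ollama instance, assuming the nomic-embed-text embedding model is pulled; a production pipeline adds chunking, a vector database, and access control on top.

```python
# Minimal RAG retrieval sketch against a local Ollama instance.
# Assumes `nomic-embed-text` is pulled; the document texts are invented
# examples. Real pipelines add chunking, a vector store, access control.
import requests
import numpy as np

OLLAMA = "http://localhost:11434"

def embed(text: str) -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

documents = [
    "Brief: limitation periods in construction defect claims.",
    "Internal memo: data retention rules for closed mandates.",
]
doc_vectors = [embed(d) for d in documents]  # indexed once, stored locally

def retrieve(query: str) -> str:
    q = embed(query)
    scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
              for v in doc_vectors]
    return documents[int(np.argmax(scores))]

print(retrieve("How long can construction defects be claimed?"))
```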
Knowledge Database for Medical Protocols
A clinic network with multiple locations has to manage large volumes of medical protocols, SOPs, training materials, and internal reports. Its documentation was fragmented and difficult to access – especially for quick decision support and quality checks.
Drastically Reduced Access Time
Relevant documents are accessed immediately
Strengthened Quality & Compliance
Employees at different locations consistently access the same knowledge pool
As a reseller, you offer local AI solutions – we deliver the hardware and service
Want not only to use local AI solutions yourself, but also to resell them to your customers? As a reseller, you receive preferential terms, technical support, and fully pre-installed systems. For Enterprise and Pro customers, we deliver personally.
Direct margin advantages for resellers and integrators.
On request, we deliver the AI Cube completely neutral – ideal for system integrators who want to operate under their own brand.
Ollama, vLLM, Open WebUI – ready to use for your end customers.
Direct contact with us for questions about integration, RAG, models & hardware.
Custom models, RAG pipelines, GPU layouts, and network setups for specific customer requirements.
You can now offer your customers their own local AI solutions – without having to develop hardware yourself.
Contact us for a non-binding conversation about terms, technical details, and your individual requirements.
For our AI Cube Pro customers, we offer personal delivery and professional commissioning in Germany and the Netherlands. For Enterprise customers, this service is available Europe-wide.
Directly to your company premises or to your customers – personally
Professional installation and cabling on-site
Operating system, GPU drivers, container environment and security configuration (VPN, firewall, backup)
Performance test, stability check and GDPR compliance review before commissioning
For Enterprise & Pro Customers
Our on-site service ensures that your AI Cube runs optimally from the start – without you having to worry about installation or configuration.
Perfect for companies that value:
Our AI Cubes now use NVIDIA RTX PRO Blackwell GPUs – the latest generation with more VRAM, higher efficiency, and better performance. Benefit from the latest technology for your local AI infrastructure.
Proven configurations for every use case
Due to rising memory prices, we have had to adjust our prices in order to continue providing the level of support you are used to.
| GPU | VRAM | Performance | CUDA Cores | Recommended Use | Throughput (Batch Size 1) |
|---|---|---|---|---|---|
| NVIDIA RTX PRO 4000 Blackwell | 24 GB | 46.9 TFLOPS | 8,960 | Chatbots, Code Assistance, Text Inference | 50 tokens/s |
| NVIDIA RTX PRO 6000 Blackwell | 96 GB | 125 TFLOPS | 24,064 | Large LLM Models, Training | 200 tokens/s |
| Multi-GPU Setups (e.g. H200, RTX Blackwell) | Configurable | Configurable | Configurable | Multi-GPU Workloads, High-Performance Training | Configurable |
Test different token speeds and experience the difference
At 50 tokens/s, generation takes:
- Chat response (~50 tokens): 1.0 s
- ~150 tokens: 3.0 s
- Report (~2,000 tokens): 40.0 s
* Token rates vary depending on model size and query complexity
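The numbers above follow directly from time = tokens ÷ rate. A quick sketch comparing the 50 and 200 tokens/s configurations from the table above:

```python
# Generation time scales linearly with output length: time = tokens / rate.
for tokens in (50, 150, 2000):
    for rate in (50, 200):  # tokens/s of the Basic and Pro configurations
        print(f"{tokens:>4} tokens at {rate:>3} tok/s: {tokens / rate:6.1f} s")
```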
Are your requirements growing – for example larger models, more concurrent users, or more intensive AI workloads? With our trade-in program, you can easily exchange your existing AI Cube for a more powerful model – whether from Basic to Pro or from Pro to Custom.
Upgrade affordably
No complete new purchase — credit towards your new system
Planning security
Start small and upgrade as needed
Sustainable & secure
Secure data deletion and environmentally friendly recycling
Express interest
Contact us
Evaluation
We assess your device and determine a fair residual value
Receive credit
Discount on your new AI Cube Pro or Custom
With us you get not only powerful hardware, but also a competent partner for your entire AI infrastructure
From planning to implementation – we build your complete AI infrastructure and integrate the AI Cube seamlessly.
Tailored software solutions, RAG pipelines, APIs and integrations – perfectly matched to your requirements.
Together we develop new AI applications for your specific use cases – from idea to production readiness.
Continuous support, updates and optimizations – so your AI infrastructure always runs optimally.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT
A clinic network purchased the AI Cube Pro for local AI inference. We not only delivered the hardware, but also programmed a custom RAG pipeline that uses BookStack as a knowledge source and is integrated into Open WebUI. The result: employees can access medical protocols and SOPs in seconds – fully GDPR compliant and without cloud.
Let's realize your AI vision together
Ready to Use with Leading Open-Source Frameworks

Simple model management with one-command installation. Perfect for rapid prototyping and smaller projects.
$ ollama run llama3.1:70b
High-performance inference with PagedAttention for production workloads with high throughput.
$ vllm serve meta-llama/Llama-3.1-70B-Instruct
Real performance metrics of our AI Cubes with large open-source models – measured in tokens per second at batch size 1
| Model | AI Cube Basic RTX PRO 4000 (24 GB) | AI Cube Pro RTX PRO 6000 (96 GB) |
|---|---|---|
| GPT-OSS 20B (~20 billion parameters) | 50 tokens/s | 200 tokens/s |
| GPT-OSS 120B (~120 billion parameters) | Not enough VRAM | 150 tokens/s |
All values were measured with batch size 1 and represent inference speed for interactive use cases. Actual performance may vary depending on model configuration and prompt length. Higher batch sizes increase throughput for parallel requests.
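The batching effect can be exercised from the client side by issuing requests in parallel; vLLM's continuous batching then raises aggregate throughput well beyond the batch-size-1 figures. A minimal sketch, assuming a local vLLM server with its OpenAI-compatible endpoint on port 8000 – the endpoint and the model id are assumptions for illustration:

```python
# Fire parallel requests at a local vLLM server; vLLM batches them
# internally (continuous batching), raising aggregate throughput.
# Endpoint and model id below are ASSUMPTIONS for illustration.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

def ask(prompt: str) -> str:
    r = client.chat.completions.create(
        model="openai/gpt-oss-120b",  # assumed model id, as passed to `vllm serve`
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return r.choices[0].message.content

prompts = [f"Summarize ticket #{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(ask, prompts))
print(f"{len(results)} responses received")
```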
More technical details on request
| Component | AI Cube Basic | AI Cube Pro |
|---|---|---|
| Graphics Card | NVIDIA RTX PRO 4000 Blackwell (24 GB GDDR7) | NVIDIA RTX PRO 6000 Blackwell (96 GB GDDR7) |
| Network | 1 GbE (10 GbE optional) | 1 GbE (10 GbE optional) |
| Dimensions & Weight | 292×185×372 mm (H×W×D), approx. 8 kg | 292×185×372 mm (H×W×D), approx. 8 kg |
| Certification | CE, RoHS, GDPR-compliant | CE, RoHS, GDPR-compliant |
| Security | Secure Boot, TPM 2.0, WireGuard VPN | Secure Boot, TPM 2.0, WireGuard VPN |
Find the Right Model for Your Business
All benefits at a glance
Cloud-based LLM APIs like OpenAI, Anthropic, or Google Gemini are convenient – but expensive and risky. At high volumes, costs can quickly spiral out of control: 1 million tokens per day via cloud APIs can easily cost €15,000 per month or more. With an AI Cube, you pay once from €4,299.90 and run unlimited inferences – no token fees, no monthly bills.
Additionally, on-premise LLM hosting gives you full control over your data. Sensitive information – customer data, internal documents, proprietary content – never leaves your corporate network. You're independent of API downtimes, price increases, or sudden service changes.
We jointly evaluate your requirements and use cases. In a free consultation, we determine which hardware configuration is optimal for your models and use cases.
Based on model size and requirements, we select the appropriate GPU configuration. We fully configure the system and install Ollama, vLLM, Open WebUI, and other software according to your preferences.
The Cube is delivered pre-installed and tested. After plugging it in, it can be operational within minutes. We support you in integrating it into your network.
You operate the Cube independently with full root access – or leave operation, maintenance, and updates to us. We remain your contact for extensions, support, and new requirements.
For sensitive data that cannot go to the cloud. Run internal chatbots, document analysis, or code assistants completely locally and GDPR-compliant.
Test and develop AI applications locally without cloud dependency. Ideal for rapid prototyping, model fine-tuning, and experimental projects.
Integrate AI capabilities directly into your existing infrastructure. No internet connection required, complete control over your data.
Tailored AI solutions for specific requirements
GDPR-compliant document research, contract analysis and client communication. Attorney-client privilege maintained.
Local AI for patient data, protocol analysis and medical knowledge databases.
Compliance-conform AI for risk assessment, document analysis and advisory support.
Your industry not listed? We create custom solutions for your requirements.
With AI Cubes, you retain full decision-making freedom: you can install your own models, migrate existing setups, or integrate software solutions of your choice – without license binding, API constraints, or external control. All components are open-source based and documented.
Answers to the most important questions about your local AI solution
Topics
The AI Cube is a plug-and-play AI hardware for businesses — ideal for running LLMs, transcriptions, or data-intensive workloads locally in your own network, without cloud dependency and fully GDPR-compliant.
We offer standard setups (AI Cube Basic / Pro) as well as custom systems: multi-GPU, large VRAM cards, rack-mount servers, or clusters with NVLink — depending on model size, user count, and workload.
The AI Cube Basic requires approx. 150–250 W, the Pro approx. 350–450 W. Both run on standard 230 V and do not require a special power supply. Custom builds are assessed separately.
Yes — since you own the hardware, you can replace or expand RAM, storage (NVMe/SSD), or GPU yourself at any time. We're happy to assist if needed — but you have full control over your hardware.
Yes — the AI Cube runs entirely locally. There's no communication with external cloud servers, no data transfer outside your network. This ensures maximum data sovereignty and GDPR compliance.
The AI Cube stores data exclusively locally. With TPM 2.0, Secure Boot, and optionally encrypted SSD/NVMe, we ensure maximum protection. For sensitive data, we recommend encrypted filesystem and restrictive access control.
Yes — on request, we deliver the AI Cube as plug-and-play: with pre-installed software, GPU drivers, and basic configuration. After powering on, you can start working with AI models immediately — no complex setup required.
Yes — for AI Cube Pro, we offer personal delivery and professional on-site setup in Germany and the Netherlands. For enterprise customers, this service is available Europe-wide.
Our technician delivers the AI Cube, connects it to power and network, and configures VPN/firewall on request. This is followed by a functional test and optional onboarding. We also offer training and documentation.
The AI Cube supports common open-source frameworks and models — e.g., Llama, Mistral, Qwen, Gemma, DeepSeek, multimodal and transcription models. The pre-installed environment allows quick start.
Yes — depending on hardware configuration, multiple models can run in parallel. For intensive or parallel use, we recommend more powerful or customized hardware configurations.
Beyond chatbots and RAG systems: audio/video transcription, document indexing, data processing, code assistance, automation of internal processes — ideal for privacy-critical or compliance-relevant scenarios.
The entry configuration (AI Cube Basic) starts at approx. €4,299.90 (excl. VAT). Compared to cloud solutions, you save long-term — no ongoing token or API costs, no vendor lock-in.
When data privacy, control, consistent performance, and long-term planning are important — e.g., with sensitive data, compliance requirements, or frequent AI use.
Yes. We support migration: data and model transfer, re-setup on your on-prem system — without external dependency.
The AI Cube is owned by your company (one-time payment from €4,299.90 excl. VAT), while our AI servers are rented (from €499/month excl. VAT with managed service). The Cube is suitable for long-term planning, rented servers for flexible projects.
Our pre-configured models are designed to be low-maintenance. If needed, we offer managed service: regular security patches, monitoring, updates — keeping your infrastructure stable and secure.
Yes — the AI Cube is compatible with common corporate networks. On request, we configure VPN, firewall, and connectivity so the Cube integrates securely and seamlessly.
In addition to hardware, we optionally offer managed service, maintenance, updates, monitoring, and support — especially for enterprise customers. Hardware, software, and support from a single source.
On request, we provide a backup concept: regular snapshots, redundant or external storage options, remote backup — keeping you protected even in case of hardware failure.
We deliver Europe-wide — with special focus on Germany, the Ruhr area, and the Netherlands. This means short delivery times, regional service, and direct support.
Our AI Cubes are custom-built in our workshop in Dortmund. Each AI Cube is an individual configuration optimized for hardware and use case.
Yes — we offer a reseller program with attractive purchasing conditions, technical support, and optional white-label license. Ideal for system integrators and IT service providers.
More questions? We are happy to help!
Still have questions? Contact us!
Discover Our Other AI Services
24.11.2025
In August 2025, OpenAI released GPT-OSS 120B, their first open-weight model since GPT-2 – and it's impressive. The model achieves near o4-mini performance but...
09.11.2025
In times of rising cloud costs, data sovereignty challenges and vendor lock-in, the topic of local AI inference is becoming increasingly important for companies. With...
08.11.2025
The use of Large Language Models (LLMs) such as GPT-4, Claude or Llama has evolved from experimental applications to mission-critical tools in recent years. However,...
CTO, EVA Real Estate, UAE
"I recently worked with Timo and the WZ-IT team, and honestly, it turned out to be one of the best tech decisions I have made for my business. Right from the start, Timo took the time to walk me through every step in a simple and calm way. No matter how many questions I had, he never rushed me. The results speak for themselves. With WZ-IT, we reduced our monthly expenses from $1,300 down to $250. This was a huge win for us."
Data Manager, ARGE, Germany
"With Timo and Robin, you're not only on the safe side technically - you also get the best human support! Whether it's quick help in everyday life or complex IT solutions: the guys from WZ-IT think along with you, act quickly and speak a language you understand. The collaboration is uncomplicated, reliable and always on an equal footing. That makes IT fun - and above all: it works! Big thank you to the team! (translated) "
CEO, Aphy B.V., Netherlands
"WZ-IT manages our Proxmox cluster reliably and professionally. The team handles continuous monitoring and regular updates for us and responds very quickly to any issues or inquiries. They also configure new nodes, systems, and applications that we need to add to our cluster. With WZ-IT's proactive support, our cluster and the business-critical applications running on it remain stable, and high availability is consistently ensured. We value the professional collaboration and the noticeable relief it brings to our daily operations."
CEO, Odiseo Solutions, Spain
"Counting on WZ-IT team was crucial, their expertise and solutions gave us the pace to deploy in production our services, even suggesting and performing improvements over our configuration and setup. We expect to keep counting on them for continuous maintenance of our services and implementation of new solutions."
Timo and Robin from WZ-IT set up a RocketChat server for us - and I couldn't be more satisfied! From the initial consultation to the final implementation, everything was absolutely professional, efficient, and to my complete satisfaction. I particularly appreciate the clear communication, transparent pricing, and the comprehensive expertise that both bring to the table. Even after the setup, they take care of the maintenance, which frees up my time enormously and allows me to focus on other important areas of my business - with the good feeling that our IT is in the best hands. I can recommend WZ-IT without reservation and look forward to continuing our collaboration! (translated)
We have had very good experiences with Mr. Wevelsiep and WZ-IT. The consultation was professional, clearly understandable, and at fair prices. The team not only implemented our requirements but also thought along and proactively. Instead of just processing individual tasks, they provided us with well-founded explanations that strengthened our own understanding. WZ-IT took a lot of pressure off us with their structured approach - that was exactly what we needed and is the reason why we keep coming back. (translated)
Robin and Timo provided excellent support during our migration from AWS to Hetzner! We received truly competent advice and will gladly return to their services in the future. (translated)
WZ-IT set up our Jitsi Meet Server anew - professional, fast, and reliable. (translated)
Whether it's a specific IT challenge or just an idea – we look forward to hearing from you. In a brief conversation, we'll evaluate together whether and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT