The Managed AI Server Service lets you concentrate fully on developing and deploying your AI applications. We handle the complete management of your AI server infrastructure – from initial setup to continuous monitoring and technical support.
With our managed service, you get powerful NVIDIA RTX GPU servers in German data centers, managed by experienced DevOps engineers. No vendor lock-in, transparent pricing, and full control over your data and models.
Ideal for businesses and developers who want to run AI workloads in production without building their own hardware and infrastructure team. From training large models to deploying high-performance inference services.
We support both leading open-source frameworks for AI inference. Each has its strengths – we help you choose the right one for your use case.
Ollama – the user-friendly framework for easy deployment and management of Large Language Models. Best for prototypes, chatbots, internal tools, and RAG applications with moderate requirements.
vLLM – the high-performance framework for production-grade AI inference with maximum throughput optimization. Best for production APIs with high traffic, batch processing, multi-user applications, and performance-critical services.
| | Ollama | vLLM |
|---|---|---|
| Ease of Use | Very easy | Complex |
| Throughput | Good | Excellent (up to 24x) |
| Latency Under Load | Increases linearly | Stays low |
| Best For | Development, prototypes, moderate workloads | Production, high traffic, performance-critical |
Start with Ollama for fast development and prototyping. Once you need higher throughput, better scaling, or production-grade performance, migrate to vLLM. We fully support both frameworks and help with the migration.
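Because both frameworks can expose an OpenAI-compatible HTTP API, a migration often comes down to swapping a base URL and a model name. A minimal sketch, assuming a model is already being served locally (ports are the defaults; model names are examples):

```python
# Both Ollama and vLLM speak the OpenAI chat-completions protocol,
# so the same client code works against either backend.
from openai import OpenAI

# Ollama (default port 11434); assumes "llama3" was pulled beforehand.
ollama = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# vLLM (default port 8000); the model name matches the one passed to `vllm serve`.
vllm = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

for client, model in [
    (ollama, "llama3"),
    (vllm, "meta-llama/Meta-Llama-3-70B-Instruct"),
]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(reply.choices[0].message.content)
```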
We handle all operational tasks around your AI server infrastructure
Upon request: Complete setup of your AI servers including operating system, GPU drivers, CUDA, Docker, Kubernetes or your preferred orchestration. Installation and configuration of AI frameworks like PyTorch, TensorFlow, Ollama or vLLM according to your requirements.
24/7 monitoring of all critical system metrics: GPU utilization, temperature, memory, network, and application performance. Automated alerts for anomalies and proactive intervention before problems occur. Grafana dashboards with real-time insight into your infrastructure.
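As an illustration of the raw data behind those dashboards: GPU metrics like the ones we alert on can be read via NVIDIA's NVML bindings. A minimal sketch, assuming the nvidia-ml-py package is installed (our production agents are more elaborate):

```python
# Read basic health metrics from the first GPU via NVML,
# the same interface Prometheus exporters typically build on.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

util = pynvml.nvmlDeviceGetUtilizationRates(handle)
temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)

print(f"GPU utilization: {util.gpu} %")
print(f"Temperature:     {temp} °C")
print(f"VRAM used:       {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GiB")

pynvml.nvmlShutdown()
```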
Regular security updates for operating system, GPU drivers, and all installed components. Automated patch management processes with rollback capabilities. Firewall configuration, SSH hardening, and proactive vulnerability scans.
Automated backups of your configurations, models, and data are available as an option. Secure storage in geographically separated data centers. Tested recovery processes with defined Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
Direct access to experienced DevOps and AI infrastructure experts via email, phone, or ticket system. Fast response times according to agreed SLAs. Support with performance optimization, scaling, and troubleshooting of AI workloads.
Guaranteed availability from 99.5% (Basic) to 99.9% (Premium). Defined response and resolution times for different priority levels. Monthly SLA reports and transparent incident documentation.
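To make those percentages concrete, here is the simple arithmetic behind a monthly downtime budget (illustrative only, not contract language):

```python
# Translate availability guarantees into an allowed downtime budget per month
# (using a 30.44-day average month).
HOURS_PER_MONTH = 30.44 * 24

for tier, availability in [("Basic", 0.995), ("Premium", 0.999)]:
    downtime_min = HOURS_PER_MONTH * (1 - availability) * 60
    print(f"{tier}: {availability:.1%} -> at most {downtime_min:.0f} min downtime/month")
```

That is roughly 3.7 hours per month on Basic versus about 44 minutes on Premium.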
High-performance hardware in German data centers
We use professional NVIDIA RTX GPUs. The AI Server Basic with RTX 4000 SFF Ada (20GB VRAM) is ideal for inference and medium-sized models. The AI Server Pro with RTX 6000 Blackwell Max-Q (96GB GDDR7 VRAM) enables training and running large models such as Llama-3-70B or DeepSeek-R1-32B.
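A rough rule of thumb for what fits into VRAM: weight memory is approximately parameter count × bytes per parameter, before KV cache and runtime overhead. A back-of-the-envelope sketch (quantization levels are examples):

```python
# Estimate VRAM needed for model weights alone (excludes KV cache,
# activations, and framework overhead, which add further headroom needs).
def weight_vram_gib(params_billion: float, bits_per_param: int) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1024**3

print(f"70B @ 4-bit:  {weight_vram_gib(70, 4):.0f} GiB")   # ~33 GiB, fits in 96 GB
print(f"70B @ 16-bit: {weight_vram_gib(70, 16):.0f} GiB")  # ~130 GiB, needs multiple GPUs
print(f"32B @ 8-bit:  {weight_vram_gib(32, 8):.0f} GiB")   # ~30 GiB, ample headroom
```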
All servers are located in high-security German data centers with ISO 27001 certification. Full GDPR compliance and data sovereignty. Redundant power supply, cooling, and physical security measures according to the highest standards.
Direct connection to European internet backbones with low latency. 1 Gbit/s included, 10 Gbit/s optionally available. DDoS protection and redundant network paths for maximum reliability.
NVMe SSD storage for maximum I/O performance during model loading and data preprocessing. Optional connection to object storage (S3-compatible) for large datasets and model repositories. Automated backup systems with encrypted storage.
Clear pricing without hidden costs – cancel monthly
Fully managed AI server with NVIDIA RTX 4000 SFF Ada for inference and medium-sized models
excl. VAT
Fully managed AI server with NVIDIA RTX 6000 Blackwell Max-Q for training and large models
Our Managed AI Server Service starts from €499 per month for the AI Server Basic with full management service. This investment includes hardware, operations, monitoring, updates, and support – all from one source without additional personnel costs for system administration.
The managed service includes: NVIDIA RTX GPU server (hardware), data center costs, power, network traffic (up to 20TB/month), 24/7 monitoring, security updates, and system maintenance. Setup & installation are available as optional services.
You can cancel monthly, and a complete export of your data and configurations is possible at any time. You retain full control over your AI models and training data. If needed, we support you in migrating to other infrastructures.
As a Managed Service customer at WZ-IT, you have access to our exclusive portal: Monitor your infrastructure in real-time, schedule maintenance, request quotes, and get direct support – all in one central location.

All servers are located in German data centers with full GDPR compliance. Your AI models and training data remain in Germany. No data transfers to third countries, maximum data protection for your sensitive AI workloads.
Years of experience with open-source AI stacks: Ollama, vLLM, PyTorch, TensorFlow, CUDA optimization. We know the pitfalls of GPU drivers, model quantization, and performance tuning. Benefit from best practices from numerous successful AI projects.
No anonymous ticket support: you have direct contacts who know your infrastructure and your requirements. Fast decision-making, pragmatic solutions, and a true partnership instead of a call-center mentality. On-site meetings are possible if needed.
Full root access to your servers, export of all data possible at any time, monthly cancellation. We use standard technologies without proprietary dependencies. Your investment in code and configuration remains portable and future-proof.
Start with one server and grow as needed. Easy expansion with additional GPU nodes, storage, or network capacity. We advise you on optimal sizing strategies and support implementation of auto-scaling concepts.
Significantly cheaper than comparable cloud GPU instances for continuous operation. No unexpected costs from storage or traffic fees. Fixed monthly prices enable precise budget planning. Compared to self-operated hardware, the investment typically pays off within a few months.
| | Managed Service | Unmanaged Server |
|---|---|---|
| Setup & Configuration | Fully handled by us | Self-service |
| Monitoring | 24/7 proactive | Self-implementation required |
| Updates | Automated with testing | Manual |
| Support | Fast expert support | None |
| Time Investment | Focus on development | Time spent on admin tasks |
Which AI frameworks and tools do you support?
We support all common frameworks: PyTorch, TensorFlow, Ollama, vLLM, LangChain, Hugging Face Transformers, and many more. We install and configure the tools you need according to your specifications.
Do I get root access to my server?
Yes, you get full root access via SSH. You can install your own software or adjust configurations at any time. We take care of basic system maintenance while you retain full control over your applications.
How quickly is my server ready to use?
After contract signing, we can typically provision, configure, and hand over your Managed AI Server within 3-5 business days. Express setup within 24 hours is available for an additional fee.
What happens in case of a hardware defect?
We handle complete hardware management. In case of a defect, the data center performs a quick replacement and your data is restored from backups. You don't need to worry about anything – we keep you informed about the status.
Let's discuss your requirements and create a customized offer
Whether you have a specific IT challenge or just an idea – we look forward to the exchange. In a brief conversation, we'll evaluate together whether and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT