Training Server vs. Inference Server: Choosing the Right Infrastructure for Your AI Workload
When choosing the right server infrastructure for artificial intelligence, the distinction between training and inference is crucial.
While training AI models demands enormous computing resources over long periods, inference – the practical use of trained models – primarily requires fast response times and efficient throughput.
The right decision can save significant costs while optimizing the performance of your AI applications.
Training Server: Powerful Hardware for Model Development
A training server is designed for the computationally intensive task of machine learning training: neural networks are fed large amounts of data to recognize and learn patterns.
The training process can take days to weeks and requires maximum computing power to optimize the model parameters.
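To make the workload concrete, here is a minimal PyTorch-style sketch of a single training step; the model, optimizer, and dimensions are placeholders, not a recommended setup:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 10).to(device)            # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def training_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    outputs = model(inputs)                       # forward pass
    loss = loss_fn(outputs, targets)
    loss.backward()                               # backpropagation: compute gradients
    optimizer.step()                              # update the model parameters
    return loss.item()
```

Repeated millions of times over a large dataset, the backward pass plus the optimizer state held in memory is a large part of why training hardware needs so much VRAM and raw compute.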
- VRAM: 48 GB+ for large models and batch processing
- Compute: high TFLOPS and Tensor Cores for faster training runs
- System RAM: 128 GB+ for large datasets
- Storage: NVMe SSD for fast data access during training
Inference Server: Optimized for Fast Production Deployments
An inference server uses pre-trained models to deliver predictions and results in real time. The focus here is on speed and efficiency.
Inference requires significantly fewer resources than training, as only forward passes through the network are computed – without backpropagation or weight updates.
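For contrast, a minimal sketch of the inference path; with torch.inference_mode(), PyTorch skips all autograd bookkeeping, which is exactly the saving described above:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 10)                 # stand-in for a pre-trained model
model.eval()                                # disable dropout/batch-norm updates

@torch.inference_mode()                     # no gradients, no autograd graph
def predict(inputs: torch.Tensor) -> torch.Tensor:
    return model(inputs).argmax(dim=-1)     # forward pass only, no weight updates
```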
- VRAM: 20-24 GB is sufficient for most models
- Latency: fast response times for end users
- Throughput: many parallel requests processed simultaneously
- Optimization: quantization and pruning for efficiency (see the sketch below)
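To illustrate the quantization point from the list above: PyTorch's built-in dynamic quantization converts Linear layers to int8 weights at load time. This is a generic sketch, not tied to any specific model:

```python
import torch
import torch.nn as nn

model = nn.Sequential(                      # placeholder network
    nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)
)

# Store Linear weights as int8; activations remain in floating point.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```

Int8 weights roughly halve the footprint compared to FP16 (a quarter of FP32), which is how 20-24 GB cards can serve models that would otherwise not fit.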
The Most Important Differences at a Glance
| Aspect | Training Server | Inference Server |
|---|---|---|
| Main Purpose | Develop & train models | Deploy models in production |
| GPU Recommendation | RTX 6000 Blackwell Max-Q (96 GB) | RTX 4000 Ada (20 GB) |
| VRAM Requirements | 96 GB for large models | 20-24 GB sufficient |
| Computing Power | 1457 TFLOPS (maximum) | 307 TFLOPS (optimized) |
| Time Characteristics | Hours to weeks | Milliseconds to seconds |
| Monthly Cost | €1,549.90 | €499.90 |
| Scaling | Vertical (more power) | Horizontal (more instances) |
| Workload Type | Batch processing | Request/response |
| Optimization Goal | Training speed | Latency & throughput |
The Right Hardware for Every Use Case
- Basic server (RTX 4000 Ada, 20 GB): perfect for inference and production deployments
- Pro server (RTX 6000 Blackwell Max-Q, 96 GB): for training and large models
Combine training and inference servers for optimal workflows: train on the Pro server and deploy on cost-effective Basic servers for production.
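In practice, the hand-off between the two servers is just moving a weights file; a minimal sketch, with the architecture and filename as placeholders:

```python
import torch
import torch.nn as nn

def build_model() -> nn.Module:
    return nn.Linear(1024, 10)              # placeholder for your actual architecture

# --- on the training server (Pro) ---
trained = build_model()
torch.save(trained.state_dict(), "my_model.pt")          # hypothetical filename

# --- on the inference server (Basic) ---
serving = build_model()
serving.load_state_dict(torch.load("my_model.pt", map_location="cpu"))
serving.eval()
```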
Answer These Questions for the Right Choice
**Do you want to train your own models or run existing ones?**
- Training server: you need maximum computing power and plenty of VRAM for training new models or fine-tuning.
- Inference server: you use existing, pre-trained models for production applications and APIs.

**How large are the models you want to run?**
- 48 GB+ VRAM: models like Llama 3.1 70B or larger require this much, even for inference.
- 20-24 GB VRAM: most production models, such as Gemma 27B or DeepSeek 32B, run perfectly on 20 GB.
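A rough way to sanity-check these numbers: weights take parameters × bytes per parameter, plus headroom for the KV cache and activations. A back-of-the-envelope sketch; the 20% overhead factor is an assumption, not a measured value:

```python
def estimate_vram_gb(params_b: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Very rough estimate: weights plus ~20% for KV cache and activations."""
    return params_b * bytes_per_param * overhead

print(estimate_vram_gb(70, 2.0))   # Llama 3.1 70B at FP16   -> ~168 GB
print(estimate_vram_gb(70, 0.5))   # 70B, 4-bit quantized    -> ~42 GB
print(estimate_vram_gb(27, 0.5))   # Gemma 27B, 4-bit        -> ~16 GB, fits in 20 GB
```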
**Are you in development or in production?**
- Development: you need maximum flexibility and power for experiments.
- Production: cost efficiency with consistent performance matters.

**Do you serve real-time requests or run batch jobs?**
- For APIs, chatbots, and interactive applications, an optimized inference server is ideal.
- For non-time-critical analyses, you can leverage the full power of a training server.
- Getting started: begin with an inference server and existing models – fast time-to-market at low cost.
- Scaling up: scale horizontally with multiple inference servers for higher capacity and fault tolerance (see the sketch after this list).
- Full development cycle: combine a training server for development with multiple inference servers for production – the optimal price-performance ratio.
- Research & experimentation: a training server for model development and experiments, with optional inference servers for demos and testing.
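To illustrate the horizontal-scaling scenario above: the simplest form is a round-robin client spreading requests across identical inference servers. The URLs are hypothetical; a production setup would put a load balancer such as nginx or HAProxy in front instead:

```python
import itertools
import urllib.request

# hypothetical endpoints of identical inference server instances
SERVERS = itertools.cycle([
    "http://inference-1.internal:8000/v1/completions",
    "http://inference-2.internal:8000/v1/completions",
])

def send_request(payload: bytes) -> bytes:
    url = next(SERVERS)                     # round-robin across instances
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```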
Both server types give you full control over your data: the servers are located in Germany and are GDPR compliant.
Upon request, we take care of installation, configuration, and maintenance – for both training and inference servers.
Start with one server type and switch if needed. Models are portable.
Our team helps you select and optimize your server configuration.
Let's find the optimal server solution for your project together
Unsure which server fits your needs? Book a free consultation with our CTO and find the best solution for your AI requirements.
Or contact us directly