24.11.2025
GPT-OSS 120B on AI Cube Pro: Run OpenAI's Open-Source Model Locally
With GPT-OSS 120B, OpenAI released their first open-weight model since GPT-2 in August 2025 – and it's impressive. The model achieves near o4-mini performance but...

LiteLLM is a gateway between applications and LLM providers. Teams call an OpenAI-compatible API while routing between local models, European endpoints and cloud providers.

LiteLLM is a gateway between applications and LLM providers. Teams call an OpenAI-compatible API while routing between local models, European endpoints and cloud providers.
We operate LiteLLM as a controlled AI access layer: with virtual keys, budgets, provider fallbacks, logging, SSO integration, network segmentation and integration into Ollama, Langfuse and internal applications.
Without a gateway, API keys, provider dependencies and cost control quickly spread uncontrolled. LiteLLM centralizes routing, quotas and fallbacks.
LiteLLM has an MIT-licensed core and commercial enterprise features. We check upfront whether core features are sufficient or enterprise features such as advanced authentication are useful.
Route requests to Ollama, OpenAI-compatible endpoints, European providers or hyperscaler APIs through a unified interface.
Provider outages, rate limits or cost limits can be handled through fallback rules and routing policies.
Teams and applications receive their own keys, budgets and policies instead of direct provider access.
Token usage, models, providers and budgets become centrally visible and controllable.
Existing applications can often move to new models or providers without major rewrites.
We harden the gateway, separate networks, integrate authentication and monitor metrics, logs and availability.
LiteLLM decouples applications from individual model providers and makes provider changes, fallbacks and local models controllable.
Virtual keys, budgets and policies create a clean technical boundary between teams, applications and model providers.
We operate LiteLLM with TLS, network segmentation, monitoring and clear operating processes for production AI workloads.
Open source enterprise-ready for productive workloads - we run your applications with highest security standards and enterprise support
Open Source Software für geschäftskritische Prozesse erfordert professionelle Wartung, kontinuierliche Updates und enterprise-grade Support. Wir übernehmen Hosting und Betrieb von LiteLLM auf unserer DSGVO-konformen Infrastruktur in Deutschland (oder optional in Ihrer Cloud) – inklusive Backups, SLAs, Telefon-Support und persönlichem Ansprechpartner. Damit Sie sich auf Ihr Kerngeschäft konzentrieren können.
Wir bieten auch maßgeschneiderte Hosting- und Entwicklungs-Lösungen für Ihre speziellen Anforderungen rund um LiteLLM. Kontaktieren Sie uns für ein individuelles Angebot.
From fully managed GPU servers to compact AI Cubes - we provide the ideal infrastructure for your local LLM applications.
Powerful GPU servers with dedicated hardware for compute-intensive LLM workloads. Fully managed, scalable, and optimized for maximum performance.
Compact AI workstation for local LLM inference. Perfect for office environments, with top-tier performance and absolute data sovereignty.
Good choice - we'll help you get started or with operations.
As a Managed Service customer at WZ-IT, you have access to our exclusive portal: Monitor your infrastructure in real-time, schedule maintenance, request quotes, and get direct support - all in one central location.

24.11.2025
With GPT-OSS 120B, OpenAI released their first open-weight model since GPT-2 in August 2025 – and it's impressive. The model achieves near o4-mini performance but...
09.11.2025
In times of rising cloud costs, data sovereignty challenges and vendor lock-in, the topic of local AI inference is becoming increasingly important for companies. With...
09.11.2025
More and more companies are considering running Large Language Models (LLMs) on their own hardware rather than via cloud APIs. The reasons for this are...
These solutions are often used together with LiteLLM
These solutions offer similar functionalities and can be evaluated together
Proof for production deployments, architecture decisions and ongoing operations around modern software stacks.
Whether a specific IT challenge or just an idea - we look forward to the exchange. In a brief conversation, we'll evaluate together if and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
Managing Directors of WZ-IT

