We design, build and operate AI systems that run in production: internal assistants, RAG, agents, LLM gateways, GPU servers, AI Cubes and open-source AI stacks.
Run models where the data lives.
Stack without unnecessary platform lock-in.
Monitoring, updates and operations included.
Proof for production deployments, architecture decisions and ongoing operations around modern software stacks.
Production AI needs both an application layer and operations: model access, data context, permissions, monitoring, cost control and updates have to be designed together.
Assistants, agents, UX, permissions and integration into existing software.
Ollama, vLLM, GPU sizing and model operations.
Qdrant, embeddings, data preparation and retrieval quality.
LiteLLM, API keys, budgets, SSO and network boundaries.
Langfuse, traces, costs, evaluations and audits.
Monitoring, backups, updates, security and SLA.
From AI feature to operations. Not a demo stack, but software and infrastructure for real workloads.
Integrate AI features, internal assistants and agents directly into your portals, dashboards and business apps.
Data preparation, embeddings, retrieval, access control and answer quality.
Operate models, APIs, web UIs and gateways on controlled infrastructure.
Local AI inference in the office, without a rack and without external data transfer.
Dedicated hardware for inference, training, fine-tuning and larger model workloads.
Monitoring, updates, cost control and operations for production AI stacks.
Core components for local inference, RAG, gateway, observability and vector search.

Local inference engine
Run and manage models locally and expose them through APIs.
Chat and admin interface
User interface for local models, teams, tools and document chat.
Enterprise RAG app
RAG system for documents, workspaces, agents and knowledge bases.
Multi-LLM gateway
Central API layer for routing, budgets, keys and provider fallbacks.
LLM observability
Tracing, prompt management, evaluations, cost and quality control.
Vector database
Semantic search and retrieval layer for production RAG systems.
We operate infrastructure in practice, not just on slides. WZ-IT delivers cloud, open-source and software stacks; with merkaio we run IoT and remote-site systems for real sites such as ABCO Water and nextGYM. That hands-on experience feeds into our local AI infrastructure work.
The hub leads into the relevant services and technology pages. Depending on the situation, we start with hardware, model operations, RAG or observability.
Not every use case needs dedicated GPUs. The relevant factors are data protection, latency, cost, model size and operations responsibility. We size the stack to the risk and workload.
Send us the use case. We will respond with a pragmatic view on architecture, hardware and operations.
Whether you have a specific IT challenge or just an idea, we look forward to talking with you. In a brief conversation, we'll evaluate together whether and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
Managing Directors of WZ-IT

