WZ-IT Logo
WZ-IT AI Cube - Die kompakte und lokale KI-Lösung für Unternehmen
GDPR Compliant
NVIDIA GB10 / Blackwell
Support in German & English
MadeinGermany

The local plug-and-play AI solution for businesses

The WZ-IT AI Cube brings ChatGPT-like AI, local models and internal knowledge search into your company - ready to use, without cloud lock-in and with support by WZ-IT. Plug it in, open it in the browser and work with your own AI.

Ready to use with Open WebUI, vLLM/Ollama and local models

Local data processing inside your own network

Owned hardware instead of external API dependency

Remote commissioning & support in German and English

Leading companies trust WZ-IT

  • Keymate
  • SolidProof
  • Rekorder
  • Führerscheinmacher
  • ARGE
  • NextGym
  • Paritel
  • EVADXB
  • Boese VA
  • Maho Management
  • Aphy
  • Negosh
  • Millenium
  • Yonju
  • Mr. Clipart

Why Local AI Infrastructure?

Cloud services offer convenience – but also dependency. With an AI Cube, you retain full control over your data, models, and systems. Whether chatbots, RAG systems, or internal AI automation: Your sensitive data stays within your company, while computing power is directly on-site.

The AI Cube is hardware owned by your company - without external token limits and without vendor lock-in. You decide which models run, which data is connected and whether WZ-IT optionally operates the system for you.

Data Sovereignty

Your models and data never leave your corporate network. Complete control over sensitive information.

Full Control

No API limits, no external updates, no restrictions. You decide every aspect of your AI infrastructure.

Performance

Minimal latency through local inference. No delays from cloud connections.

Cost Efficiency

No external token or pay-per-use fees. Local operating costs stay predictable.

Ownership vs. Rental

The AI Cube is fully yours. No cloud subscription, no external API dependency.

Optional Managed Service

If desired, we handle operation, maintenance, and updates – you focus on your projects.

Cost & performance

AI Cube vs. external AI APIs: what really matters?

The entry price is clear. Whether local AI pays off depends on usage, privacy requirements, model size and operating model.

Externe KI-APIs

Pay-per-use / Cloud

Cost modelPay-per-use
Data flowexternal
Dependencyprovider & API

Good for a fast start. Critical with sensitive data, high volume or the need for own control.

WZ-IT AI Cube

On-Premises

Investment€5,990.90 excl. VAT
External tokens0
Token Limit

128 GB unified memory • Open WebUI • Ollama/vLLM • optional RAG

5.990,90 €
excl. VAT entry
0
external token costs
100%
local data control

GB10 benchmark framing for GPT-OSS

GPT-OSS-20B

Conservative interactive guidance: around 80-90 tok/s. Long context can reduce visible output to roughly 60-80 tok/s.

GPT-OSS-120B

Conservative interactive guidance: around 35-60 tok/s. Under concurrency aggregate throughput rises significantly, but each single response remains workload-dependent.

Based on publicly available DGX Spark / GB10 benchmarks with vLLM, SGLang and llama.cpp. Final values are validated with the customer target model, context length and RAG setup.

Local AI Usage

Local GPT with our AI Cube

Use Open WebUI for a ChatGPT-like experience – completely local on your own hardware

Open WebUI Screenshot - ChatGPT-like interface

The AI Cube can be delivered with Open WebUI based on customer requirements – an intuitive, user-friendly interface that enables a local ChatGPT-like experience. No cloud dependency, no API keys, no token limits – just you and your AI models.

ChatGPT-like Interface

Familiar and intuitive user interface for natural conversations with your local AI models

Completely Local

All data and conversations stay on your hardware – no connection to external servers required

Multi-Model Support

Switch seamlessly between different AI models within the same interface

No Token Fees

Unlimited usage without pay-per-use fees or monthly API costs

Open WebUI can be pre-installed and delivered ready to use upon request. Simply plug in, power on, and immediately interact with your local AI models – like ChatGPT, but completely under your control.

Vorinstalliert
Sofort einsatzbereit
100% lokal

Typical AI Cube use cases

Examples of requirements where locally operated AI makes sense: independent, secure and controllable.

Case Study: Law Firm

RAG-based Document Research

!Challenge

A medium-sized law firm with numerous mandates and a large file archive found that research for precedent cases, briefs, and internal evidence was often very time-consuming – several hours per case. Additionally, sensitive client data was present that should not go to external cloud systems.

Solution with AI Cube

  • RAG solution for knowledge database search: All briefs, judgments and internal documents in searchable knowledge database
  • Lawyers ask questions in natural language and immediately receive relevant document sections with source citations
  • Infrastructure remains completely in the firm's own network, operation and maintenance by the firm's IT service provider

Result

Drastically Reduced Research Time

Lawyers can argue and decide faster

Strengthened Knowledge Base

New employees access proven documents much faster

Use Case: Healthcare and Care Facilities

Knowledge Database for Medical Protocols

!Challenge

Healthcare and care facilities need to manage large volumes of protocols, SOPs, training materials and internal reports. Documentation is often distributed and hard to access - especially when teams need reliable information quickly for workflows, quality assurance or internal training.

Solution with AI Cube

  • Knowledge platform with BookStack as knowledge source (integration programmed by us), connected to RAG pipeline with Open WebUI + vLLM
  • Employees can ask questions directly with immediate citation of the source
  • AI Cube runs locally in the corporate network, operation and maintenance by us

Result

Drastically Reduced Access Time

Relevant documents are accessed immediately

Strengthened Quality & Compliance

Employees at different locations consistently access the same knowledge pool

Reseller Program

Your clients need AI hardware?

As a reseller, you offer local AI solutions – we deliver the hardware and service

Want to not only use local AI solutions yourself, but also resell them to your customers? As a reseller, you receive preferred terms, technical support and fully pre-installed systems with the WZ-IT Local AI Stack.

Attractive Purchase Terms

Direct margin advantages for resellers and integrators.

White-Label Option

On request, we deliver the AI Cube completely neutral – ideal for system integrators who want to operate under their own brand.

Pre-installed AI Software

Ollama, vLLM, Open WebUI – ready to use for your end customers.

Technical Priority Support

Direct contact with us for questions about integration, RAG, models & hardware.

Custom Configurations

Custom models, RAG pipelines, GPU layouts, and network setups for specific customer requirements.

Expand Your Service Portfolio

You can now offer your customers their own local AI solutions – without having to develop hardware yourself.

Become a Reseller Partner

Contact us for a non-binding conversation about terms, technical details, and your individual requirements.

Setup & service

Setup, integration and operations

The AI Cube is prepared by WZ-IT, installed with the Local AI Stack and commissioned remotely. On-site appointments, workshops or deep network integration are scoped per project.

Prepared system

Hardware, operating system, drivers and AI stack are prepared before handover.

Remote commissioning

We support the initial setup inside the customer network remotely and document the key steps.

Initial Setup

Operating system, GPU drivers, container environment and security configuration (VPN, firewall, backup)

Validation & Acceptance

Performance test, stability check and GDPR compliance review before commissioning

All-Inclusive Package

For standard and custom builds

Our on-site service ensures that your AI Cube runs optimally from the start – without you having to worry about installation or configuration.

Perfect for companies that value:

Highest quality standards
Compliance & Data Protection
Clean Integration
Remote setup included
On-site per project
Standard product

Local AI without your own hardware project

The AI Cube combines validated ASUS/NVIDIA hardware with our open Local AI Stack. For special requirements, we still deliver custom builds with larger GPUs, rackmount or multi-GPU.

From August 2026: EU AI Act high-risk requirements. Local AI infrastructure simplifies compliance.

Buy AI Cube

One standard product. Custom builds when needed.

The WZ-IT AI Cube is the fast entry into local business AI. AI Cube Custom remains available for larger or special requirements.

The WZ-IT AI Cube starts at €5,990.90 excl. VAT including hardware, pre-installed AI stack, initial model setup, remote commissioning and technical onboarding. Custom builds are quoted per project.

Standard product

WZ-IT AI Cube

ASUS/NVIDIA appliance base

VRAM

128 GB

Performance

up to 1 PFLOP FP4

CUDA Cores

NVIDIA GB10

Ideal for:

Internal AI assistants, document chat and local LLMs

Sizing & benchmarks

We validate target model, context length and concurrent users before the project starts.

  • Compact validated AI appliance
  • Open WebUI, Ollama and/or vLLM pre-installed
  • Initial model setup based on your use case
  • Remote commissioning and technical onboarding
  • Support by WZ-IT, managed service optional
€5,990.90 excl. VAT
Custom build

AI Cube Custom

RTX PRO / multi-GPU / rackmount / special hardware

VRAM

Configurable

Performance

Configurable

CUDA Cores

Configurable

Ideal for:

Large models, many concurrent users, special requirements

  • NVIDIA RTX PRO, H200 or comparable GPU options
  • Multi-GPU or NVLink setups if required
  • Extended storage, backup and network options
  • Rackmount, tower or custom chassis
On request

Included in Delivery

Pre-installed Software (Ollama, vLLM, Open WebUI) – plug in & infer
Operating System & GPU Drivers
Setup Documentation
Support in German & English

Note on pricing: Listed prices are non-binding reference prices and may change. The specific price depends on your individual configuration, term, and scope of services. For a binding quote, please contact us directly.

Manage Your Stack in the Customer Portal

As a Managed Service customer at WZ-IT, you have access to our exclusive portal: Monitor your infrastructure in real-time, schedule maintenance, request quotes, and get direct support - all in one central location.

  • Real-time infrastructure status
  • Reschedule maintenance windows yourself
  • View complete access logs
  • Direct support without detours
Explore Portal
WZ-IT Customer Portal Dashboard
Interactive Demo

How fast is the AI Cube?

Test different token speeds and see the difference

Token Speed Simulator

Interactive output for GB10-based AI Cube setups

45 tok/s
10 tok/s120 tok/s

At 45 tok/s, generating takes:

1.1s

Chat response

(~50 tokens)

3.3s

Email

(~150 tokens)

44.4s

Report

(~2000 tokens)

* Visible chat speed. Concurrent batch workloads can reach much higher aggregate throughput.

Upgrade Program

Upgrade & Trade-In – When Your AI Cube Needs to Grow

Your requirements are increasing — e.g. larger models, more concurrent users or more intensive AI workloads? With our trade-in program, you can easily exchange your existing AI Cube for a more powerful model — e.g. from Pro to Custom.

Upgrade affordably

No complete new purchase — credit towards your new system

Planning security

Start small and upgrade as needed

Sustainable & secure

Secure data deletion and environmentally friendly recycling

How it works

1

Express interest

Contact us

2

Evaluation

We assess your device and determine a fair residual value

3

Receive credit

Credit towards your new AI Cube or AI Cube Custom

More than just Hardware

Your AI Cube & WZ-IT
Possibilities are endless together

With us you get not only powerful hardware, but also a competent partner for your entire AI infrastructure

Infrastructure Setup

From planning to implementation – we build your complete AI infrastructure and integrate the AI Cube seamlessly.

Custom Development

Tailored software solutions, RAG pipelines, APIs and integrations – perfectly matched to your requirements.

Innovative Solutions

Together we develop new AI applications for your specific use cases – from idea to production readiness.

Support & Maintenance

Continuous support, updates and optimizations – so your AI infrastructure always runs optimally.

Timo Wevelsiep & Robin Zins - CEOs of WZ-IT

Timo Wevelsiep & Robin Zins

CEOs of WZ-IT

Example project: from local AI box to complete solution

A typical project starts with the AI Cube as local AI infrastructure and grows into a domain solution: RAG pipeline, knowledge source, Open WebUI and operations are adapted to the concrete use case. This turns the box into a productive AI platform without sending sensitive data to external cloud services.

Let's realize your AI vision together

Software Stack & Compatibility

Ready to Use with Leading Open-Source Frameworks

Pre-installed Software:

Ollama – for simple model management
vLLM – for high-performance inference
TensorRT-LLM / NIM – Höchster Durchsatz, 1.63× H100 bei NVFP4
Open WebUI – for visual interaction
Docker / Podman – for containerized deployments
REST API Access – for integration

Compatible with:

Llama, Mistral, Qwen, Gemma
DeepSeek, Phi, Mixtral und weitere Open-Source-Modelle
Embedding-Modelle für RAG und semantische Suche
Whisper / Speech-to-Text Workloads
Coding- und Assistenzmodelle
Custom Models
Ollama

Ollama

Simple model management with one-command installation. Perfect for rapid prototyping and smaller projects.

$ ollama run qwen3.5:122b
vLLM

vLLM

High-performance inference with PagedAttention for production workloads with high throughput.

$ vllm serve gpt-oss-120b --quantization nvfp4
Performance Benchmarks

The right performance depends on the use case

Model, quantization, context length, concurrent users and RAG setup determine which hardware makes sense. That is why we do not publish one-size-fits-all token/s promises as standard performance, but validate your target workload before quoting.

Standard AI Cube

For internal assistants, document chat, initial RAG systems and local LLM usage.

Custom build

For large models, many concurrent users, special network or rack requirements.

Benchmark on request

We benchmark relevant models with your target setup and document realistic performance.

Technical Specifications

More technical details on request

KomponenteWZ-IT AI Cube
Graphics CardASUS/NVIDIA appliance base with NVIDIA GB10 class hardware and 128 GB unified memory
NetworkStandard networking, extended connectivity per project
Dimensions & WeightCompact appliance form factor, depending on hardware configuration
CertificationCE, RoHS, GDPR-compliant
SecuritySecure Boot, TPM 2.0, WireGuard VPN

AI Cubes (Purchase) vs Managed AI Server (Rental)

Find the Right Model for Your Business

AI Cube - purchase

  • Complete hardware ownership
  • CapEx: from €5,990.90 excl. VAT for the standard product
  • Full data sovereignty – hardware stays with you
  • No recurring fees (except optional support)
  • Ideal for long-term projects

Managed AI Server – Rental

  • OpEx: Monthly quote depending on hardware and service level
  • Fast start without capital commitment
  • 24/7 monitoring & maintenance included
  • Scalable: upgrade or downgrade anytime
  • Ideal for flexible or experimental projects

Why AI Cube?

All benefits at a glance

On-Prem LLM Hosting vs. Cloud API: Costs & Risks

Cloud-based LLM APIs like OpenAI, Anthropic, or Google Gemini are convenient – but expensive and risky. At high volumes, costs can quickly spiral out of control. With an AI Cube, you run local inference in your own network – without external token dependency and without a monthly API bill per request.

Additionally, on-premise LLM hosting gives you full control over your data. Sensitive information – customer data, internal documents, proprietary content – never leaves your corporate network. You're independent of API downtimes, price increases, or sudden service changes.

How the WZ-IT AI Cube Works

1

Analysis & Consultation

We jointly evaluate your requirements and use cases. In a free consultation, we determine which hardware configuration is optimal for your models and use cases.

2

Hardware Selection & Configuration

Based on model size and requirements, we select the appropriate GPU configuration. We fully configure the system and install Ollama, vLLM, Open WebUI, and other software according to your preferences.

3

Delivery & Setup

The Cube is delivered pre-installed and tested. After plugging it in, it can be operational within minutes. We support you in integrating it into your network.

4

Operation & Support (Optional)

You operate the Cube independently with full root access – or leave operation, maintenance, and updates to us. We remain your contact for extensions, support, and new requirements.

Typical Use Cases

Enterprises & Government

For sensitive data that cannot go to the cloud. Run internal chatbots, document analysis, or code assistants completely locally and GDPR-compliant.

Development & Research

Test and develop AI applications locally without cloud dependency. Ideal for rapid prototyping, model fine-tuning, and experimental projects.

On-Premise Deployment

Integrate AI capabilities directly into your existing infrastructure. No internet connection required, complete control over your data.

Industry Solutions

AI Cube for Your Industry

Tailored AI solutions for specific requirements

For Law Firms

GDPR-compliant document research, contract analysis and client communication. Attorney-client privilege maintained.

Learn more

For Clinics & Practices

Local AI for patient data, protocol analysis and medical knowledge databases.

Coming soon

For Financial Services

Compliance-conform AI for risk assessment, document analysis and advisory support.

Coming soon

Your industry not listed? We create custom solutions for your requirements.

No Dependencies. No Vendor Lock-in.

With AI Cubes, you retain full decision-making freedom: you can install your own models, migrate existing setups, or integrate software solutions of your choice – without license binding, API constraints, or external control. All components are open-source based and documented.

100% Open Source Stack

Frequently Asked Questions about AI Cube

Answers to the most important questions about your local AI solution

Topics

Hardware & Technology

The AI Cube is a plug-and-play AI hardware for businesses — ideal for running LLMs, transcriptions, or data-intensive workloads locally in your own network, without cloud dependency and fully GDPR-compliant.

The WZ-IT AI Cube is our standard product based on validated ASUS/NVIDIA appliance hardware. For larger models, many concurrent users or special infrastructure requirements, we design AI Cube Custom builds with dedicated NVIDIA GPUs, multi-GPU, rackmount or custom networking.

Power consumption depends on the appliance or custom setup and the actual workload. The standard AI Cube is designed as a compact local AI box for office and enterprise environments; larger custom systems are assessed for power, cooling and site requirements in advance.

Yes — since you own the hardware, you can replace or expand RAM, storage (NVMe/SSD), or GPU yourself at any time. We're happy to assist if needed — but you have full control over your hardware.

Privacy & Compliance

Yes — the AI Cube runs entirely locally. There's no communication with external cloud servers, no data transfer outside your network. This ensures maximum data sovereignty and GDPR compliance.

The AI Cube stores data exclusively locally. With TPM 2.0, Secure Boot, and optionally encrypted SSD/NVMe, we ensure maximum protection. For sensitive data, we recommend encrypted filesystem and restrictive access control.

Delivery & Service

Yes — on request, we deliver the AI Cube as plug-and-play: with pre-installed software, GPU drivers, and basic configuration. After powering on, you can start working with AI models immediately — no complex setup required.

Yes. We prepare the AI Cube, install the local AI stack and handle remote commissioning. On-site appointments, training or integration workshops can be added project by project.

The standard scope includes technical preparation, the pre-installed AI stack, initial model setup, remote commissioning and technical onboarding. RAG, SSO, monitoring or managed service can be added optionally.

Software & Usage

The AI Cube is prepared with an open local AI stack: Open WebUI as the user interface, Ollama and/or vLLM for local inference, suitable open-source models based on your use case and optional RAG with Qdrant or pgvector. We validate the model choice against your workload.

Yes — depending on hardware configuration, multiple models can run in parallel. For intensive or parallel use, we recommend more powerful or customized hardware configurations.

Beyond chatbots and RAG systems: audio/video transcription, document indexing, data processing, code assistance, automation of internal processes — ideal for privacy-critical or compliance-relevant scenarios.

Costs & Economics

The AI Cube is quoted per project. Price, delivery scope and service level depend on GPU, memory, form factor, software stack and operating model. You receive a binding quote on request. The AI Cube becomes attractive especially for sensitive data, predictable workloads and long-term use: no external token dependency, full control over data and hardware.

When data privacy, control, consistent performance, and long-term planning are important — e.g., with sensitive data, compliance requirements, or frequent AI use.

Yes. We support migration: data and model transfer, re-setup on your on-prem system — without external dependency.

The AI Cube is purchased and owned by your company, while our AI servers are rented and run as a monthly managed service. The Cube is suitable for long-term planning, local control and fixed sites; rented servers are better for flexible projects or variable workloads.

Maintenance & Support

Our pre-configured models are designed to be low-maintenance. If needed, we offer managed service: regular security patches, monitoring, updates — keeping your infrastructure stable and secure.

Yes — the AI Cube is compatible with common corporate networks. On request, we configure VPN, firewall, and connectivity so the Cube integrates securely and seamlessly.

In addition to hardware, we optionally offer managed service, maintenance, updates, monitoring, and support — especially for enterprise customers. Hardware, software, and support from a single source.

On request, we provide a backup concept: regular snapshots, redundant or external storage options, remote backup — keeping you protected even in case of hardware failure.

Regions & Reseller

We provide remote commissioning and support in German and English. On-site appointments or partner delivery can be planned for suitable projects.

Our AI Cubes are custom-built in our workshop in Dortmund. Each AI Cube is an individual configuration optimized for hardware and use case.

Yes — we offer a reseller program with attractive purchasing conditions, technical support, and optional white-label license. Ideal for system integrators and IT service providers.

More questions? We are happy to help!

Still have questions? Contact us!

AI projects need software and operations maturity

Proof for production deployments, architecture decisions and ongoing operations around modern software stacks.

  • Odiseo Solutions
  • Golem.de
  • ARGE

What do our customers say?

Let's Talk About Your Idea

Whether a specific IT challenge or just an idea - we look forward to the exchange. In a brief conversation, we'll evaluate together if and how your project fits with WZ-IT.

E-Mail
[email protected]

Leading companies trust WZ-IT

  • Rekorder
  • Keymate
  • Führerscheinmacher
  • SolidProof
  • ARGE
  • Boese VA
  • NextGym
  • Maho Management
  • Golem.de
  • Millenium
  • Paritel
  • Yonju
  • EVADXB
  • Mr. Clipart
  • Aphy
  • Negosh
  • ABCO Water
Timo Wevelsiep & Robin Zins - CEOs of WZ-IT

Timo Wevelsiep & Robin Zins

Managing Directors of WZ-IT

1/3 – Topic Selection33%

What is your inquiry about?

Select one or more areas where we can support you.