WZ-IT AI Cube - Die kompakte und lokale KI-Lösung für Unternehmen

GDPR Compliant

NVIDIA GB10 / Blackwell

Support in German & English

MadeinGermany

The local plug-and-play AI solution for businesses

Name: WZ-IT AI Cube
Brand: WZ-IT
Price: 5990.90 EUR
Availability: PreOrder

The WZ-IT AI Cube brings ChatGPT-like AI, local models and internal knowledge search into your company - ready to use, without cloud lock-in and with support by WZ-IT. Plug it in, open it in the browser and work with your own AI.

Ready to use with Open WebUI, vLLM/Ollama and local models

Local data processing inside your own network

Owned hardware instead of external API dependency

Remote commissioning & support in German and English

Leading companies trust WZ-IT

Why Local AI Infrastructure?

Cloud services offer convenience – but also dependency. With an AI Cube, you retain full control over your data, models, and systems. Whether chatbots, RAG systems, or internal AI automation: Your sensitive data stays within your company, while computing power is directly on-site.

The AI Cube is hardware owned by your company - without external token limits and without vendor lock-in. You decide which models run, which data is connected and whether WZ-IT optionally operates the system for you.

Data Sovereignty

Your models and data never leave your corporate network. Complete control over sensitive information.

Full Control

No API limits, no external updates, no restrictions. You decide every aspect of your AI infrastructure.

Performance

Minimal latency through local inference. No delays from cloud connections.

Cost Efficiency

No external token or pay-per-use fees. Local operating costs stay predictable.

Ownership vs. Rental

The AI Cube is fully yours. No cloud subscription, no external API dependency.

Optional Managed Service

If desired, we handle operation, maintenance, and updates – you focus on your projects.

Cost & performance

AI Cube vs. external AI APIs: what really matters?

The entry price is clear. Whether local AI pays off depends on usage, privacy requirements, model size and operating model.

Externe KI-APIs

Pay-per-use / Cloud

Cost modelPay-per-use

Data flowexternal

Dependencyprovider & API

Good for a fast start. Critical with sensitive data, high volume or the need for own control.

WZ-IT AI Cube

On-Premises

Investment€5,990.90 excl. VAT

External tokens0

Token Limit∞

128 GB unified memory • Open WebUI • Ollama/vLLM • optional RAG

5.990,90 €

excl. VAT entry

external token costs

100%

local data control

GB10 benchmark framing for GPT-OSS

GPT-OSS-20B

Conservative interactive guidance: around 80-90 tok/s. Long context can reduce visible output to roughly 60-80 tok/s.

GPT-OSS-120B

Conservative interactive guidance: around 35-60 tok/s. Under concurrency aggregate throughput rises significantly, but each single response remains workload-dependent.

Based on publicly available DGX Spark / GB10 benchmarks with vLLM, SGLang and llama.cpp. Final values are validated with the customer target model, context length and RAG setup.

Local AI Usage

Local GPT with our AI Cube

Use Open WebUI for a ChatGPT-like experience – completely local on your own hardware

Open WebUI Screenshot - ChatGPT-like interface

The AI Cube can be delivered with Open WebUI based on customer requirements – an intuitive, user-friendly interface that enables a local ChatGPT-like experience. No cloud dependency, no API keys, no token limits – just you and your AI models.

ChatGPT-like Interface

Familiar and intuitive user interface for natural conversations with your local AI models

Completely Local

All data and conversations stay on your hardware – no connection to external servers required

Multi-Model Support

Switch seamlessly between different AI models within the same interface

No Token Fees

Unlimited usage without pay-per-use fees or monthly API costs

Open WebUI can be pre-installed and delivered ready to use upon request. Simply plug in, power on, and immediately interact with your local AI models – like ChatGPT, but completely under your control.

Vorinstalliert

Sofort einsatzbereit

100% lokal

Typical AI Cube use cases

Examples of requirements where locally operated AI makes sense: independent, secure and controllable.

Case Study: Law Firm

RAG-based Document Research

!Challenge

A medium-sized law firm with numerous mandates and a large file archive found that research for precedent cases, briefs, and internal evidence was often very time-consuming – several hours per case. Additionally, sensitive client data was present that should not go to external cloud systems.

✓Solution with AI Cube

RAG solution for knowledge database search: All briefs, judgments and internal documents in searchable knowledge database
Lawyers ask questions in natural language and immediately receive relevant document sections with source citations
Infrastructure remains completely in the firm's own network, operation and maintenance by the firm's IT service provider

→Result

Drastically Reduced Research Time

Lawyers can argue and decide faster

Strengthened Knowledge Base

New employees access proven documents much faster

Use Case: Healthcare and Care Facilities

Knowledge Database for Medical Protocols

!Challenge

Healthcare and care facilities need to manage large volumes of protocols, SOPs, training materials and internal reports. Documentation is often distributed and hard to access - especially when teams need reliable information quickly for workflows, quality assurance or internal training.

✓Solution with AI Cube

Knowledge platform with BookStack as knowledge source (integration programmed by us), connected to RAG pipeline with Open WebUI + vLLM
Employees can ask questions directly with immediate citation of the source
AI Cube runs locally in the corporate network, operation and maintenance by us

→Result

Drastically Reduced Access Time

Relevant documents are accessed immediately

Strengthened Quality & Compliance

Employees at different locations consistently access the same knowledge pool

Reseller Program

Your clients need AI hardware?

As a reseller, you offer local AI solutions – we deliver the hardware and service

Want to not only use local AI solutions yourself, but also resell them to your customers? As a reseller, you receive preferred terms, technical support and fully pre-installed systems with the WZ-IT Local AI Stack.

Attractive Purchase Terms

Direct margin advantages for resellers and integrators.

White-Label Option

On request, we deliver the AI Cube completely neutral – ideal for system integrators who want to operate under their own brand.

Pre-installed AI Software

Ollama, vLLM, Open WebUI – ready to use for your end customers.

Technical Priority Support

Direct contact with us for questions about integration, RAG, models & hardware.

Custom Configurations

Custom models, RAG pipelines, GPU layouts, and network setups for specific customer requirements.

Expand Your Service Portfolio

You can now offer your customers their own local AI solutions – without having to develop hardware yourself.

Become a Reseller Partner

Setup & service

Setup, integration and operations

The AI Cube is prepared by WZ-IT, installed with the Local AI Stack and commissioned remotely. On-site appointments, workshops or deep network integration are scoped per project.

Prepared system

Hardware, operating system, drivers and AI stack are prepared before handover.

Remote commissioning

We support the initial setup inside the customer network remotely and document the key steps.

Initial Setup

Operating system, GPU drivers, container environment and security configuration (VPN, firewall, backup)

Validation & Acceptance

Performance test, stability check and GDPR compliance review before commissioning

All-Inclusive Package

For standard and custom builds

Our on-site service ensures that your AI Cube runs optimally from the start – without you having to worry about installation or configuration.

Perfect for companies that value:

Highest quality standards

Compliance & Data Protection

Clean Integration

Remote setup included

On-site per project

Standard product

Local AI without your own hardware project

The AI Cube combines validated ASUS/NVIDIA hardware with our open Local AI Stack. For special requirements, we still deliver custom builds with larger GPUs, rackmount or multi-GPU.

From August 2026: EU AI Act high-risk requirements. Local AI infrastructure simplifies compliance.

Buy AI Cube

One standard product. Custom builds when needed.

The WZ-IT AI Cube is the fast entry into local business AI. AI Cube Custom remains available for larger or special requirements.

The WZ-IT AI Cube starts at €5,990.90 excl. VAT including hardware, pre-installed AI stack, initial model setup, remote commissioning and technical onboarding. Custom builds are quoted per project.

Standard product

WZ-IT AI Cube

ASUS/NVIDIA appliance base

VRAM

128 GB

Performance

up to 1 PFLOP FP4

CUDA Cores

NVIDIA GB10

Ideal for:

Internal AI assistants, document chat and local LLMs

Sizing & benchmarks

We validate target model, context length and concurrent users before the project starts.

Compact validated AI appliance
Open WebUI, Ollama and/or vLLM pre-installed
Initial model setup based on your use case
Remote commissioning and technical onboarding
Support by WZ-IT, managed service optional

€5,990.90 excl. VAT

Custom build

AI Cube Custom

RTX PRO / multi-GPU / rackmount / special hardware

VRAM

Configurable

Performance

Configurable

CUDA Cores

Configurable

Ideal for:

Large models, many concurrent users, special requirements

NVIDIA RTX PRO, H200 or comparable GPU options
Multi-GPU or NVLink setups if required
Extended storage, backup and network options
Rackmount, tower or custom chassis

On request

Included in Delivery

Pre-installed Software (Ollama, vLLM, Open WebUI) – plug in & infer

Operating System & GPU Drivers

Setup Documentation

Support in German & English

Note on pricing: Listed prices are non-binding reference prices and may change. The specific price depends on your individual configuration, term, and scope of services. For a binding quote, please contact us directly.

Manage Your Stack in the Customer Portal

As a Managed Service customer at WZ-IT, you have access to our exclusive portal: Monitor your infrastructure in real-time, schedule maintenance, request quotes, and get direct support - all in one central location.

Real-time infrastructure status
Reschedule maintenance windows yourself
View complete access logs
Direct support without detours

Explore Portal

Interactive Demo

How fast is the AI Cube?

Test different token speeds and see the difference

Token Speed Simulator

Interactive output for GB10-based AI Cube setups

Adjust speed45 tok/s

10 tok/s120 tok/s

At 45 tok/s, generating takes:

1.1s

Chat response

(~50 tokens)

3.3s

(~150 tokens)

44.4s

Report

(~2000 tokens)

* Visible chat speed. Concurrent batch workloads can reach much higher aggregate throughput.

Upgrade Program

Upgrade & Trade-In – When Your AI Cube Needs to Grow

Your requirements are increasing — e.g. larger models, more concurrent users or more intensive AI workloads? With our trade-in program, you can easily exchange your existing AI Cube for a more powerful model — e.g. from Pro to Custom.

Upgrade affordably

No complete new purchase — credit towards your new system

Planning security

Start small and upgrade as needed

Sustainable & secure

Secure data deletion and environmentally friendly recycling

How it works

Express interest

Evaluation

We assess your device and determine a fair residual value

Receive credit

Credit towards your new AI Cube or AI Cube Custom

More than just Hardware

Your AI Cube & WZ-IT
Possibilities are endless together

With us you get not only powerful hardware, but also a competent partner for your entire AI infrastructure

Infrastructure Setup

From planning to implementation – we build your complete AI infrastructure and integrate the AI Cube seamlessly.

Custom Development

Tailored software solutions, RAG pipelines, APIs and integrations – perfectly matched to your requirements.

Innovative Solutions

Together we develop new AI applications for your specific use cases – from idea to production readiness.

Support & Maintenance

Continuous support, updates and optimizations – so your AI infrastructure always runs optimally.

Timo Wevelsiep & Robin Zins

CEOs of WZ-IT

Example project: from local AI box to complete solution

A typical project starts with the AI Cube as local AI infrastructure and grows into a domain solution: RAG pipeline, knowledge source, Open WebUI and operations are adapted to the concrete use case. This turns the box into a productive AI platform without sending sensitive data to external cloud services.

Let's realize your AI vision together

Software Stack & Compatibility

Ready to Use with Leading Open-Source Frameworks

Pre-installed Software:

Ollama – for simple model management

vLLM – for high-performance inference

TensorRT-LLM / NIM – Höchster Durchsatz, 1.63× H100 bei NVFP4

Open WebUI – for visual interaction

Docker / Podman – for containerized deployments

REST API Access – for integration

Compatible with:

Llama, Mistral, Qwen, Gemma

DeepSeek, Phi, Mixtral und weitere Open-Source-Modelle

Embedding-Modelle für RAG und semantische Suche

Whisper / Speech-to-Text Workloads

Coding- und Assistenzmodelle

Custom Models

Ollama

Simple model management with one-command installation. Perfect for rapid prototyping and smaller projects.

$ ollama run qwen3.5:122b

vLLM

High-performance inference with PagedAttention for production workloads with high throughput.

$ vllm serve gpt-oss-120b --quantization nvfp4

Performance Benchmarks

The right performance depends on the use case

Model, quantization, context length, concurrent users and RAG setup determine which hardware makes sense. That is why we do not publish one-size-fits-all token/s promises as standard performance, but validate your target workload before quoting.

Standard AI Cube

For internal assistants, document chat, initial RAG systems and local LLM usage.

Custom build

For large models, many concurrent users, special network or rack requirements.

Benchmark on request

We benchmark relevant models with your target setup and document realistic performance.

Technical Specifications

More technical details on request

Komponente	WZ-IT AI Cube
Graphics Card	ASUS/NVIDIA appliance base with NVIDIA GB10 class hardware and 128 GB unified memory
Network	Standard networking, extended connectivity per project
Dimensions & Weight	Compact appliance form factor, depending on hardware configuration
Certification	CE, RoHS, GDPR-compliant
Security	Secure Boot, TPM 2.0, WireGuard VPN

AI Cubes (Purchase) vs Managed AI Server (Rental)

Find the Right Model for Your Business

AI Cube - purchase

Complete hardware ownership
CapEx: from €5,990.90 excl. VAT for the standard product
Full data sovereignty – hardware stays with you
No recurring fees (except optional support)
Ideal for long-term projects

Managed AI Server – Rental

OpEx: Monthly quote depending on hardware and service level
Fast start without capital commitment
24/7 monitoring & maintenance included
Scalable: upgrade or downgrade anytime
Ideal for flexible or experimental projects

View Managed AI Servers

Why AI Cube?

All benefits at a glance

On-Prem LLM Hosting vs. Cloud API: Costs & Risks

Cloud-based LLM APIs like OpenAI, Anthropic, or Google Gemini are convenient – but expensive and risky. At high volumes, costs can quickly spiral out of control. With an AI Cube, you run local inference in your own network – without external token dependency and without a monthly API bill per request.

Additionally, on-premise LLM hosting gives you full control over your data. Sensitive information – customer data, internal documents, proprietary content – never leaves your corporate network. You're independent of API downtimes, price increases, or sudden service changes.

How the WZ-IT AI Cube Works

Analysis & Consultation

We jointly evaluate your requirements and use cases. In a free consultation, we determine which hardware configuration is optimal for your models and use cases.

Hardware Selection & Configuration

Based on model size and requirements, we select the appropriate GPU configuration. We fully configure the system and install Ollama, vLLM, Open WebUI, and other software according to your preferences.

Delivery & Setup

The Cube is delivered pre-installed and tested. After plugging it in, it can be operational within minutes. We support you in integrating it into your network.

Operation & Support (Optional)

You operate the Cube independently with full root access – or leave operation, maintenance, and updates to us. We remain your contact for extensions, support, and new requirements.

Typical Use Cases

Enterprises & Government

For sensitive data that cannot go to the cloud. Run internal chatbots, document analysis, or code assistants completely locally and GDPR-compliant.

Development & Research

Test and develop AI applications locally without cloud dependency. Ideal for rapid prototyping, model fine-tuning, and experimental projects.

On-Premise Deployment

Integrate AI capabilities directly into your existing infrastructure. No internet connection required, complete control over your data.

Industry Solutions

AI Cube for Your Industry

Tailored AI solutions for specific requirements

For Law Firms

GDPR-compliant document research, contract analysis and client communication. Attorney-client privilege maintained.

Learn more

For Clinics & Practices

Local AI for patient data, protocol analysis and medical knowledge databases.

Coming soon

For Financial Services

Compliance-conform AI for risk assessment, document analysis and advisory support.

Coming soon

Your industry not listed? We create custom solutions for your requirements.

No Dependencies. No Vendor Lock-in.

With AI Cubes, you retain full decision-making freedom: you can install your own models, migrate existing setups, or integrate software solutions of your choice – without license binding, API constraints, or external control. All components are open-source based and documented.

100% Open Source Stack

Frequently Asked Questions about AI Cube

Answers to the most important questions about your local AI solution

Topics

Hardware & Technology

What is the AI Cube and what is it suitable for?

The AI Cube is a plug-and-play AI hardware for businesses — ideal for running LLMs, transcriptions, or data-intensive workloads locally in your own network, without cloud dependency and fully GDPR-compliant.

What hardware configurations are possible?

The WZ-IT AI Cube is our standard product based on validated ASUS/NVIDIA appliance hardware. For larger models, many concurrent users or special infrastructure requirements, we design AI Cube Custom builds with dedicated NVIDIA GPUs, multi-GPU, rackmount or custom networking.

How much power does the AI Cube consume?

Power consumption depends on the appliance or custom setup and the actual workload. The standard AI Cube is designed as a compact local AI box for office and enterprise environments; larger custom systems are assessed for power, cooling and site requirements in advance.

Can I expand or upgrade the AI Cube later?

Yes — since you own the hardware, you can replace or expand RAM, storage (NVMe/SSD), or GPU yourself at any time. We're happy to assist if needed — but you have full control over your hardware.

Privacy & Compliance

Does my data really stay in my network?

Yes — the AI Cube runs entirely locally. There's no communication with external cloud servers, no data transfer outside your network. This ensures maximum data sovereignty and GDPR compliance.

How do I ensure the AI Cube is GDPR-compliant?

The AI Cube stores data exclusively locally. With TPM 2.0, Secure Boot, and optionally encrypted SSD/NVMe, we ensure maximum protection. For sensitive data, we recommend encrypted filesystem and restrictive access control.

Delivery & Service

Is the AI Cube delivered pre-configured and ready to use?

Yes — on request, we deliver the AI Cube as plug-and-play: with pre-installed software, GPU drivers, and basic configuration. After powering on, you can start working with AI models immediately — no complex setup required.

Do you set up the AI Cube?

Yes. We prepare the AI Cube, install the local AI stack and handle remote commissioning. On-site appointments, training or integration workshops can be added project by project.

What is included in commissioning?

The standard scope includes technical preparation, the pre-installed AI stack, initial model setup, remote commissioning and technical onboarding. RAG, SSO, monitoring or managed service can be added optionally.

Software & Usage

What software and models can I use on the AI Cube?

The AI Cube is prepared with an open local AI stack: Open WebUI as the user interface, Ollama and/or vLLM for local inference, suitable open-source models based on your use case and optional RAG with Qdrant or pgvector. We validate the model choice against your workload.

Can I run multiple AI models simultaneously?

Yes — depending on hardware configuration, multiple models can run in parallel. For intensive or parallel use, we recommend more powerful or customized hardware configurations.

What practical use cases does the AI Cube support?

Beyond chatbots and RAG systems: audio/video transcription, document indexing, data processing, code assistance, automation of internal processes — ideal for privacy-critical or compliance-relevant scenarios.

Costs & Economics

What does the AI Cube cost and how does the investment pay off?

The AI Cube is quoted per project. Price, delivery scope and service level depend on GPU, memory, form factor, software stack and operating model. You receive a binding quote on request. The AI Cube becomes attractive especially for sensitive data, predictable workloads and long-term use: no external token dependency, full control over data and hardware.

When is an AI Cube more worthwhile than cloud offerings?

When data privacy, control, consistent performance, and long-term planning are important — e.g., with sensitive data, compliance requirements, or frequent AI use.

Can I switch from a cloud-based solution to the AI Cube?

Yes. We support migration: data and model transfer, re-setup on your on-prem system — without external dependency.

What's the difference from rented AI servers?

The AI Cube is purchased and owned by your company, while our AI servers are rented and run as a monthly managed service. The Cube is suitable for long-term planning, local control and fixed sites; rented servers are better for flexible projects or variable workloads.

Maintenance & Support

How much effort is maintenance?

Our pre-configured models are designed to be low-maintenance. If needed, we offer managed service: regular security patches, monitoring, updates — keeping your infrastructure stable and secure.

Can the AI Cube be integrated into existing networks?

Yes — the AI Cube is compatible with common corporate networks. On request, we configure VPN, firewall, and connectivity so the Cube integrates securely and seamlessly.

What service and support options do you offer?

In addition to hardware, we optionally offer managed service, maintenance, updates, monitoring, and support — especially for enterprise customers. Hardware, software, and support from a single source.

What happens if hardware fails?

On request, we provide a backup concept: regular snapshots, redundant or external storage options, remote backup — keeping you protected even in case of hardware failure.

Regions & Reseller

Where do you provide support?

We provide remote commissioning and support in German and English. On-site appointments or partner delivery can be planned for suitable projects.

Where are the AI Cubes manufactured?

Our AI Cubes are custom-built in our workshop in Dortmund. Each AI Cube is an individual configuration optimized for hardware and use case.

Can I offer the AI Cube as a reseller or white-label?

Yes — we offer a reseller program with attractive purchasing conditions, technical support, and optional white-label license. Ideal for system integrators and IT service providers.

AI projects need software and operations maturity

Proof for production deployments, architecture decisions and ongoing operations around modern software stacks.

What do our customers say?

Let's Talk About Your Idea

Whether a specific IT challenge or just an idea - we look forward to the exchange. In a brief conversation, we'll evaluate together if and how your project fits with WZ-IT.

E-Mail

[email protected]

Leading companies trust WZ-IT

Timo Wevelsiep & Robin Zins

Managing Directors of WZ-IT

1/3 – Topic Selection33%

What is your inquiry about?

Select one or more areas where we can support you.

The local plug-and-play AI solution for businesses

Why Local AI Infrastructure?

Data Sovereignty

Full Control

Performance

Cost Efficiency

Ownership vs. Rental

Optional Managed Service

AI Cube vs. external AI APIs: what really matters?

Externe KI-APIs

WZ-IT AI Cube

GB10 benchmark framing for GPT-OSS

Local GPT with our AI Cube

ChatGPT-like Interface

Completely Local

Multi-Model Support

No Token Fees

Open WebUI can be pre-installed and delivered ready to use upon request. Simply plug in, power on, and immediately interact with your local AI models – like ChatGPT, but completely under your control.

Typical AI Cube use cases

Case Study: Law Firm

!Challenge

✓Solution with AI Cube

→Result

Use Case: Healthcare and Care Facilities

!Challenge

✓Solution with AI Cube

→Result

Your clients need AI hardware?

Attractive Purchase Terms

White-Label Option

Pre-installed AI Software

Technical Priority Support

Custom Configurations

Expand Your Service Portfolio

Become a Reseller Partner

Setup, integration and operations

Prepared system

Remote commissioning

Initial Setup

Validation & Acceptance

All-Inclusive Package

Local AI without your own hardware project

One standard product. Custom builds when needed.

WZ-IT AI Cube

AI Cube Custom

Included in Delivery

Manage Your Stack in the Customer Portal

How fast is the AI Cube?

Token Speed Simulator

Upgrade & Trade-In – When Your AI Cube Needs to Grow

How it works

Your AI Cube & WZ-ITPossibilities are endless together

Infrastructure Setup

Custom Development

Innovative Solutions

Support & Maintenance

Example project: from local AI box to complete solution

Software Stack & Compatibility

Pre-installed Software:

Compatible with:

Ollama

vLLM

The right performance depends on the use case

Standard AI Cube

Custom build

Benchmark on request

Technical Specifications

AI Cubes (Purchase) vs Managed AI Server (Rental)

AI Cube - purchase

Managed AI Server – Rental

Why AI Cube?

On-Prem LLM Hosting vs. Cloud API: Costs & Risks

How the WZ-IT AI Cube Works

Analysis & Consultation

Hardware Selection & Configuration

Delivery & Setup

Operation & Support (Optional)

Typical Use Cases

Enterprises & Government

Development & Research

Your AI Cube & WZ-IT
Possibilities are endless together