Prevent data leakage through employees using ChatGPT & Co. – run your AI infrastructure locally, without the cloud and without huge server racks!
Ready to use with pre-installed software
100% data sovereignty in your network
One-time investment instead of monthly fees
Europe-wide personal delivery & commissioning
Cloud services offer convenience – but also dependency. With an AI Cube, you retain full control over your data, models, and systems. Whether chatbots, RAG systems, or internal AI automation: Your sensitive data stays within your company, while computing power is directly on-site.
The AI Cube is owned by your company – no monthly fees, no token limits, no vendor lock-in. You decide which software runs, which models are used, and how your AI infrastructure grows.
Your models and data never leave your corporate network. Complete control over sensitive information.
No API limits, no external updates, no restrictions. You decide every aspect of your AI infrastructure.
Minimal latency through local inference. No delays from cloud connections.
No token or pay-per-use fees. One-time investment instead of ongoing costs.
The AI Cube is completely yours. No monthly subscriptions, no vendor dependency.
If desired, we handle operation, maintenance, and updates – you focus on your projects.
At 500 tokens/s continuous load, the AI Cube Pro pays for itself in under 4 months – see the calculation sketch below.
Cloud API
Input: $0.25/1M • Output: $2.00/1M • 500 t/s output, 1,500 t/s input (3:1 ratio)
On-Premises
96 GB VRAM • 500+ t/s output • Unlimited usage
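The payback claim can be sanity-checked with a few lines of Python. A minimal sketch: the API rates and token throughput are taken from the comparison above, while the one-time system price is a placeholder assumption, not an actual quote (pricing on request).

```python
# Back-of-the-envelope payback check for the cloud-vs-on-prem comparison.
# API rates and throughput are taken from this page; the one-time
# system price is a PLACEHOLDER ASSUMPTION, not an actual quote.

SECONDS_PER_MONTH = 30 * 24 * 3600

input_tps = 1500         # tokens/s input (3:1 ratio to output)
output_tps = 500         # tokens/s output
price_in = 0.25 / 1e6    # $ per input token  ($0.25 / 1M)
price_out = 2.00 / 1e6   # $ per output token ($2.00 / 1M)

monthly_cloud = SECONDS_PER_MONTH * (
    input_tps * price_in + output_tps * price_out
)

cube_price = 14_000      # ASSUMPTION: placeholder price for the Pro system

print(f"Cloud API at sustained load: ${monthly_cloud:,.0f} per month")
print(f"Break-even after ~{cube_price / monthly_cloud:.1f} months")
```

At these rates the sustained cloud bill comes to roughly $3,560 per month, so with the assumed system price the break-even lands just under four months – consistent with the claim above.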
Use Open WebUI for a ChatGPT-like experience – completely local on your own hardware
On request, the AI Cube is delivered with Open WebUI – an intuitive, user-friendly interface that enables a local ChatGPT-like experience. No cloud dependency, no API keys, no token limits – just you and your AI models. Developers can also talk to the same models programmatically, as the sketch after the list below shows.
Familiar and intuitive user interface for natural conversations with your local AI models
All data and conversations stay on your hardware – no connection to external servers required
Switch seamlessly between different AI models within the same interface
Unlimited usage without pay-per-use fees or monthly API costs
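Because Ollama exposes an OpenAI-compatible API, existing client code can simply point at the Cube instead of the cloud. A minimal sketch, assuming Ollama is running on its default port 11434 and a model such as llama3.1 has been pulled:

```python
# Chat with a locally hosted model via Ollama's OpenAI-compatible API.
# Assumes Ollama runs on its default port and `llama3.1` has been pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local endpoint, no cloud involved
    api_key="ollama",                      # required by the client, value unused
)

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize our VPN onboarding steps."}],
)
print(response.choices[0].message.content)
```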
Our customers benefit from the locally operated AI solution – independent, secure, and efficient. Here are two representative use cases.
RAG-based Document Research
A medium-sized law firm with numerous mandates and a large file archive found that researching precedents, briefs, and internal records often took several hours per case. In addition, the files contain sensitive client data that must not be sent to external cloud systems. How the retrieval behind such a system works in principle is sketched after the results below.
Drastically Reduced Research Time
Lawyers can argue and decide faster
Strengthened Knowledge Base
New employees access proven documents much faster
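In principle, the retrieval step behind such a RAG setup is straightforward: documents are embedded once, and each query is matched against those vectors before the model answers. A minimal sketch against a local Ollama instance, assuming the nomic-embed-text embedding model is pulled; a production pipeline adds chunking, a vector database, and access control on top.

```python
# Minimal RAG retrieval sketch against a local Ollama instance.
# Assumes `nomic-embed-text` is pulled; the document texts are invented
# examples. Real pipelines add chunking, a vector store, access control.
import requests
import numpy as np

OLLAMA = "http://localhost:11434"

def embed(text: str) -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

documents = [
    "Brief: limitation periods in construction defect claims.",
    "Internal memo: data retention rules for closed mandates.",
]
doc_vectors = [embed(d) for d in documents]  # indexed once, stored locally

def retrieve(query: str) -> str:
    q = embed(query)
    scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
              for v in doc_vectors]
    return documents[int(np.argmax(scores))]

print(retrieve("How long can construction defects be claimed?"))
```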
Knowledge Database for Medical Protocols
A clinic network with multiple locations has to manage large volumes of medical protocols, SOPs, training materials, and internal reports. Its documentation was fragmented and difficult to access – especially for quick decision support and quality checks.
Drastically Reduced Access Time
Relevant documents are accessed immediately
Strengthened Quality & Compliance
Employees at different locations consistently access the same knowledge pool
As a reseller, you offer local AI solutions – we deliver the hardware and service
Want not only to use local AI solutions yourself, but also to resell them to your customers? As a reseller, you receive preferential terms, technical support, and fully pre-installed systems. For Enterprise and Pro customers, we deliver personally.
Direct margin advantages for resellers and integrators.
On request, we deliver the AI Cube completely neutral – ideal for system integrators who want to operate under their own brand.
Ollama, vLLM, Open WebUI – ready to use for your end customers.
Direct contact with us for questions about integration, RAG, models & hardware.
Custom models, RAG pipelines, GPU layouts, and network setups for specific customer requirements.
You can now offer your customers their own local AI solutions – without having to develop hardware yourself.
Contact us for a non-binding conversation about terms, technical details, and your individual requirements.
For our AI Cube Pro customers, we offer personal delivery and professional commissioning in Germany and the Netherlands. For Enterprise customers, this service is available Europe-wide.
Directly to your company premises or to your customers – personally
Professional installation and cabling on-site
Operating system, GPU drivers, container environment and security configuration (VPN, firewall, backup)
Performance test, stability check and GDPR compliance review before commissioning
For Enterprise & Pro Customers
Our on-site service ensures that your AI Cube runs optimally from the start – without you having to worry about installation or configuration.
Perfect for companies that value:
Our AI Cubes now use NVIDIA RTX PRO Blackwell GPUs – the latest generation with more VRAM, higher efficiency, and better performance. Benefit from the latest technology for your local AI infrastructure.
Proven configurations for every use case
Due to rising memory prices, we have had to adjust our prices in order to continue providing the level of support you are used to.
| GPU | VRAM | Performance | CUDA Cores | Recommended Use | Throughput (Batch Size 1) |
|---|---|---|---|---|---|
| NVIDIA RTX PRO 4000 Blackwell | 24 GB | 46.9 TFLOPS | 8,960 | Chatbots, Code Assistance, Text Inference | 50 tokens/s |
| NVIDIA RTX PRO 6000 Blackwell | 96 GB | 125 TFLOPS | 24,064 | Large LLM Models, Training | 200 tokens/s |
| Multi-GPU Setups (e.g. H200, RTX Blackwell) | Configurable | Configurable | Configurable | Multi-GPU Workloads, High-Performance Training | Configurable |
Test different token speeds and experience the difference
At 50 tokens/s, generation takes:
- Chat response (~50 tokens): 1.0 s
- ~150 tokens: 3.0 s
- Report (~2,000 tokens): 40.0 s
* Token rates vary depending on model size and query complexity
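The numbers above follow directly from time = tokens ÷ rate. A quick sketch comparing the 50 and 200 tokens/s configurations from the table above:

```python
# Generation time scales linearly with output length: time = tokens / rate.
for tokens in (50, 150, 2000):
    for rate in (50, 200):  # tokens/s of the Basic and Pro configurations
        print(f"{tokens:>4} tokens at {rate:>3} tok/s: {tokens / rate:6.1f} s")
```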
Are your requirements growing – for example larger models, more concurrent users, or more intensive AI workloads? With our trade-in program, you can easily exchange your existing AI Cube for a more powerful model – whether from Basic to Pro or from Pro to Custom.
Upgrade affordably
No complete new purchase — credit towards your new system
Planning security
Start small and upgrade as needed
Sustainable & secure
Secure data deletion and environmentally friendly recycling
Express interest
Contact us
Evaluation
We assess your device and determine a fair residual value
Receive credit
Discount on your new AI Cube Pro or Custom
With us you get not only powerful hardware, but also a competent partner for your entire AI infrastructure
From planning to implementation – we build your complete AI infrastructure and integrate the AI Cube seamlessly.
Tailored software solutions, RAG pipelines, APIs and integrations – perfectly matched to your requirements.
Together we develop new AI applications for your specific use cases – from idea to production readiness.
Continuous support, updates and optimizations – so your AI infrastructure always runs optimally.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT
A clinic network purchased the AI Cube Pro for local AI inference. We not only delivered the hardware, but also programmed a custom RAG pipeline that uses BookStack as a knowledge source and is integrated into Open WebUI. The result: employees can access medical protocols and SOPs in seconds – fully GDPR compliant and without cloud.
Let's realize your AI vision together
Ready to Use with Leading Open-Source Frameworks

Simple model management with one-command installation. Perfect for rapid prototyping and smaller projects.
$ ollama run llama3.1:70b
High-performance inference with PagedAttention for production workloads with high throughput.
$ vllm serve meta-llama/Llama-3.1-70B-Instruct
Real performance metrics of our AI Cubes with large open-source models – measured in tokens per second at batch size 1
| Model | AI Cube Basic RTX PRO 4000 (24 GB) | AI Cube Pro RTX PRO 6000 (96 GB) |
|---|---|---|
| GPT-OSS 20B (~20 billion parameters) | 50 tokens/s | 200 tokens/s |
| GPT-OSS 120B (~120 billion parameters) | Not enough VRAM | 150 tokens/s |
All values were measured with batch size 1 and represent inference speed for interactive use cases. Actual performance may vary depending on model configuration and prompt length. Higher batch sizes increase throughput for parallel requests.
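The batching effect can be exercised from the client side by issuing requests in parallel; vLLM's continuous batching then raises aggregate throughput well beyond the batch-size-1 figures. A minimal sketch, assuming a local vLLM server with its OpenAI-compatible endpoint on port 8000 – the endpoint and the model id are assumptions for illustration:

```python
# Fire parallel requests at a local vLLM server; vLLM batches them
# internally (continuous batching), raising aggregate throughput.
# Endpoint and model id below are ASSUMPTIONS for illustration.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

def ask(prompt: str) -> str:
    r = client.chat.completions.create(
        model="openai/gpt-oss-120b",  # assumed model id, as passed to `vllm serve`
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return r.choices[0].message.content

prompts = [f"Summarize ticket #{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(ask, prompts))
print(f"{len(results)} responses received")
```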
More technical details on request
| Component | AI Cube Basic | AI Cube Pro |
|---|---|---|
| Graphics Card | NVIDIA RTX PRO 4000 Blackwell (24 GB GDDR7) | NVIDIA RTX PRO 6000 Blackwell (96 GB GDDR7) |
| Network | 1 GbE (10 GbE optional) | 1 GbE (10 GbE optional) |
| Dimensions & Weight | 292×185×372 mm (H×W×D), approx. 8 kg | 292×185×372 mm (H×W×D), approx. 8 kg |
| Certification | CE, RoHS, GDPR-compliant | CE, RoHS, GDPR-compliant |
| Security | Secure Boot, TPM 2.0, WireGuard VPN | Secure Boot, TPM 2.0, WireGuard VPN |
Find the Right Model for Your Business
All benefits at a glance
Cloud-based LLM APIs like OpenAI, Anthropic, or Google Gemini are convenient – but expensive and risky. At high volumes, costs can quickly spiral out of control: 1 million tokens per day via cloud APIs can easily cost €15,000 per month or more. With an AI Cube, you pay once from €4,299.90 and run unlimited inferences – no token fees, no monthly bills.
Additionally, on-premise LLM hosting gives you full control over your data. Sensitive information – customer data, internal documents, proprietary content – never leaves your corporate network. You're independent of API downtimes, price increases, or sudden service changes.
We jointly evaluate your requirements and use cases. In a free consultation, we determine which hardware configuration is optimal for your models and use cases.
Based on model size and requirements, we select the appropriate GPU configuration. We fully configure the system and install Ollama, vLLM, Open WebUI, and other software according to your preferences.
The Cube is delivered pre-installed and tested. After plugging it in, it can be operational within minutes. We support you in integrating it into your network.
You operate the Cube independently with full root access – or leave operation, maintenance, and updates to us. We remain your contact for extensions, support, and new requirements.
For sensitive data that cannot go to the cloud. Run internal chatbots, document analysis, or code assistants completely locally and GDPR-compliant.
Test and develop AI applications locally without cloud dependency. Ideal for rapid prototyping, model fine-tuning, and experimental projects.
Integrate AI capabilities directly into your existing infrastructure. No internet connection required, complete control over your data.
Tailored AI solutions for specific requirements
GDPR-compliant document research, contract analysis and client communication. Attorney-client privilege maintained.
Local AI for patient data, protocol analysis and medical knowledge databases.
Compliance-conform AI for risk assessment, document analysis and advisory support.
Your industry not listed? We create custom solutions for your requirements.
With AI Cubes, you retain full decision-making freedom: you can install your own models, migrate existing setups, or integrate software solutions of your choice – without license binding, API constraints, or external control. All components are open-source based and documented.
Answers to the most important questions about your local AI solution
Topics
The AI Cube is a plug-and-play AI hardware for businesses — ideal for running LLMs, transcriptions, or data-intensive workloads locally in your own network, without cloud dependency and fully GDPR-compliant.
We offer standard setups (AI Cube Basic / Pro) as well as custom systems: multi-GPU, large VRAM cards, rack-mount servers, or clusters with NVLink — depending on model size, user count, and workload.
The AI Cube Basic requires approx. 150–250 W, the Pro approx. 350–450 W. Both run on standard 230 V and do not require a special power supply. Custom builds are assessed separately.
Yes — since you own the hardware, you can replace or expand RAM, storage (NVMe/SSD), or GPU yourself at any time. We're happy to assist if needed — but you have full control over your hardware.
Yes — the AI Cube runs entirely locally. There's no communication with external cloud servers, no data transfer outside your network. This ensures maximum data sovereignty and GDPR compliance.
The AI Cube stores data exclusively locally. With TPM 2.0, Secure Boot, and optionally encrypted SSD/NVMe, we ensure maximum protection. For sensitive data, we recommend encrypted filesystem and restrictive access control.
Yes — on request, we deliver the AI Cube as plug-and-play: with pre-installed software, GPU drivers, and basic configuration. After powering on, you can start working with AI models immediately — no complex setup required.
Yes — for AI Cube Pro, we offer personal delivery and professional on-site setup in Germany and the Netherlands. For enterprise customers, this service is available Europe-wide.
Our technician delivers the AI Cube, connects it to power and network, and configures VPN/firewall on request. This is followed by a functional test and optional onboarding. We also offer training and documentation.
The AI Cube supports common open-source frameworks and models — e.g., Llama, Mistral, Qwen, Gemma, DeepSeek, multimodal and transcription models. The pre-installed environment allows quick start.
Yes — depending on hardware configuration, multiple models can run in parallel. For intensive or parallel use, we recommend more powerful or customized hardware configurations.
Beyond chatbots and RAG systems: audio/video transcription, document indexing, data processing, code assistance, automation of internal processes — ideal for privacy-critical or compliance-relevant scenarios.
The entry configuration (AI Cube Basic) starts at approx. €4,299.90 (excl. VAT). Compared to cloud solutions, you save long-term — no ongoing token or API costs, no vendor lock-in.
When data privacy, control, consistent performance, and long-term planning are important — e.g., with sensitive data, compliance requirements, or frequent AI use.
Yes. We support migration: data and model transfer, re-setup on your on-prem system — without external dependency.
The AI Cube is owned by your company (one-time payment from €4,299.90 excl. VAT), while our AI servers are rented (from €499/month excl. VAT with managed service). The Cube is suitable for long-term planning, rented servers for flexible projects.
Our pre-configured models are designed to be low-maintenance. If needed, we offer managed service: regular security patches, monitoring, updates — keeping your infrastructure stable and secure.
Yes — the AI Cube is compatible with common corporate networks. On request, we configure VPN, firewall, and connectivity so the Cube integrates securely and seamlessly.
In addition to hardware, we optionally offer managed service, maintenance, updates, monitoring, and support — especially for enterprise customers. Hardware, software, and support from a single source.
On request, we provide a backup concept: regular snapshots, redundant or external storage options, remote backup — keeping you protected even in case of hardware failure.
We deliver Europe-wide — with special focus on Germany, the Ruhr area, and the Netherlands. This means short delivery times, regional service, and direct support.
Our AI Cubes are custom-built in our workshop in Dortmund. Each AI Cube is an individual configuration optimized for hardware and use case.
Yes — we offer a reseller program with attractive purchasing conditions, technical support, and optional white-label license. Ideal for system integrators and IT service providers.
More questions? We are happy to help!
Still have questions? Contact us!
Discover Our Other AI Services
24.11.2025
In August 2025, OpenAI released GPT-OSS 120B, their first open-weight model since GPT-2 – and it's impressive. The model achieves near o4-mini performance but...
09.11.2025
In times of rising cloud costs, data sovereignty challenges and vendor lock-in, the topic of local AI inference is becoming increasingly important for companies. With...
08.11.2025
The use of Large Language Models (LLMs) such as GPT-4, Claude or Llama has evolved from experimental applications to mission-critical tools in recent years. However,...
CTO, EVA Real Estate, UAE
"I recently worked with Timo and the WZ-IT team, and honestly, it turned out to be one of the best tech decisions I have made for my business. Right from the start, Timo took the time to walk me through every step in a simple and calm way. No matter how many questions I had, he never rushed me. The results speak for themselves. With WZ-IT, we reduced our monthly expenses from $1,300 down to $250. This was a huge win for us."
Data Manager, ARGE, Germany
"With Timo and Robin, you're not only on the safe side technically - you also get the best human support! Whether it's quick help in everyday life or complex IT solutions: the guys from WZ-IT think along with you, act quickly and speak a language you understand. The collaboration is uncomplicated, reliable and always on an equal footing. That makes IT fun - and above all: it works! Big thank you to the team! (translated) "
CEO, Aphy B.V., Netherlands
"WZ-IT manages our Proxmox cluster reliably and professionally. The team handles continuous monitoring and regular updates for us and responds very quickly to any issues or inquiries. They also configure new nodes, systems, and applications that we need to add to our cluster. With WZ-IT's proactive support, our cluster and the business-critical applications running on it remain stable, and high availability is consistently ensured. We value the professional collaboration and the noticeable relief it brings to our daily operations."
CEO, Odiseo Solutions, Spain
"Counting on WZ-IT team was crucial, their expertise and solutions gave us the pace to deploy in production our services, even suggesting and performing improvements over our configuration and setup. We expect to keep counting on them for continuous maintenance of our services and implementation of new solutions."
Timo and Robin from WZ-IT set up a RocketChat server for us - and I couldn't be more satisfied! From the initial consultation to the final implementation, everything was absolutely professional, efficient, and to my complete satisfaction. I particularly appreciate the clear communication, transparent pricing, and the comprehensive expertise that both bring to the table. Even after the setup, they take care of the maintenance, which frees up my time enormously and allows me to focus on other important areas of my business - with the good feeling that our IT is in the best hands. I can recommend WZ-IT without reservation and look forward to continuing our collaboration! (translated)
We have had very good experiences with Mr. Wevelsiep and WZ-IT. The consultation was professional, clearly understandable, and at fair prices. The team not only implemented our requirements but also thought along and proactively. Instead of just processing individual tasks, they provided us with well-founded explanations that strengthened our own understanding. WZ-IT took a lot of pressure off us with their structured approach - that was exactly what we needed and is the reason why we keep coming back. (translated)
Robin and Timo provided excellent support during our migration from AWS to Hetzner! We received truly competent advice and will gladly return to their services in the future. (translated)
WZ-IT set up our Jitsi Meet Server anew - professional, fast, and reliable. (translated)
Whether it's a specific IT challenge or just an idea – we look forward to hearing from you. In a brief conversation, we'll evaluate together whether and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT