Your entry into local AI inference with NVIDIA RTX PRO 4000 Blackwell – perfect for chatbots, code assistance and models up to 20B parameters.
230 V • 292×185×372 mm • Mini-ITX
Enterprise hardware with 24 GB VRAM in compact Mini-ITX format
24 GB GDDR7 VRAM
Sufficient for models up to 20B parameters (quantized)
8,960 CUDA Cores
Fast real-time inference
Run AI assistants for customer service or internal knowledge bases – completely local and GDPR-compliant.
Use models like Qwen or DeepSeek for code completion, review and documentation – without sending your codebase to the cloud.
Llama 3.1 (8B), Gemma 3, Mistral 7B, Phi-4 and many other models.
Analyze documents, contracts and reports with AI – completely local and confidential.
Process knowledge bases with thousands to millions of documents.
Run multiple models in parallel – depending on hardware configuration (see the VRAM sizing sketch below).
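As a rough sizing rule, the weight footprint of a quantized model is parameters × bits per weight ÷ 8, plus headroom for the KV cache and runtime buffers. The sketch below (a simplified estimate, not a measurement – the model list and the flat 2 GB overhead are illustrative assumptions) shows why ~20B-parameter models at 4-bit quantization fit into 24 GB of VRAM, and why two small models can run side by side:

```python
# Rough VRAM estimate for quantized LLM inference.
# Rule of thumb only -- real usage depends on the runtime (llama.cpp,
# vLLM, ...), context length and quantization format. The flat overhead
# figure is an assumption for illustration.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 2.0) -> float:
    """Weights (params * bits / 8) plus a flat allowance for KV cache,
    activations and runtime buffers."""
    return params_billion * bits_per_weight / 8 + overhead_gb

# ~4.5 bits/weight approximates 4-bit formats incl. quantization metadata.
for name, params_b, bits in [
    ("Mistral 7B @ 4-bit",    7,  4.5),
    ("Llama 3.1 8B @ 4-bit",  8,  4.5),
    ("20B model @ 4-bit",    20,  4.5),
    ("70B model @ 4-bit",    70,  4.5),
]:
    gb = estimate_vram_gb(params_b, bits)
    verdict = "fits" if gb <= 24 else "exceeds"
    print(f"{name}: ~{gb:.1f} GB -> {verdict} 24 GB VRAM")
```

By this estimate, two 7-8B models at 4-bit total roughly 12 GB – the kind of headroom the parallel-models point above relies on.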
Real-world performance of AI Cube Basic with open-source models
~20 billion parameters
Batch Size 1
All values were measured with batch size 1 and represent inference speed for interactive use cases. Actual performance may vary depending on model configuration and prompt length. Higher batch sizes increase throughput for parallel requests.
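To reproduce batch-size-1 numbers on your own hardware, you can time a single streaming request against the Cube's local inference endpoint. A minimal sketch – the endpoint URL and model tag below are placeholders for whatever server you run (e.g. Ollama on port 11434 or vLLM on port 8000), and counting streamed chunks only approximates the token count:

```python
# Time one streaming request (batch size 1) against a local
# OpenAI-compatible endpoint. URL and model tag are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1",  # assumed local server
                api_key="unused")                      # local servers ignore the key

start, chunks = time.perf_counter(), 0
stream = client.chat.completions.create(
    model="mistral:7b",  # example model tag -- use whatever is installed
    messages=[{"role": "user", "content": "Explain GDPR in three sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # one streamed chunk is roughly one token
elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.1f} tokens/s at batch size 1")
```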
Use Open WebUI for a ChatGPT-like experience – completely local on your own hardware
On request, the AI Cube ships with Open WebUI – an intuitive, user-friendly interface that enables a local ChatGPT-like experience. No cloud dependency, no API keys, no token limits – just you and your AI models.
Familiar and intuitive user interface for natural conversations with your local AI models
All data and conversations stay on your hardware – no connection to external servers required
Switch seamlessly between different AI models within the same interface (see the sketch after this list)
Unlimited usage without pay-per-use fees or monthly API costs
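Because Open WebUI fronts a standard local inference server, the same models remain scriptable without any cloud account. A short sketch, assuming an OpenAI-compatible server on the Cube – the endpoint and model tags are examples, not fixed defaults:

```python
# Call two locally hosted models with the same client -- no cloud account,
# no real API key, no token budget. Endpoint and model tags are examples.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

for model in ("llama3.1:8b", "qwen2.5-coder:14b"):  # example local tags
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "One sentence: why local inference?"}],
    )
    print(f"[{model}] {reply.choices[0].message.content}")
```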
For our AI Cube Pro customers, we offer personal delivery and professional commissioning in Germany and the Netherlands. For Enterprise customers, this service is available Europe-wide.
Directly to your company premises or to your customers – personally
Professional installation and cabling on-site
Operating system, GPU drivers, container environment and security configuration (VPN, firewall, backup)
Performance test, stability check and GDPR compliance review before commissioning
For Enterprise & Pro Customers
Our on-site service ensures that your AI Cube runs optimally from the start – without you having to worry about installation or configuration.
Perfect for companies that value:
All models and data remain in your network. No cloud dependency, no data transmission to third parties.
One-time investment instead of monthly token fees. The acquisition pays for itself within a few months.
Minimal latency through local inference. No waiting times from cloud connections.
Root access, free model choice, no API limits. You decide what runs.
Compare the two AI Cube models
AI Cube Basic – from €4,299.90 excl. VAT
AI Cube Pro – from €13,599.90 excl. VAT
How a law firm uses AI Cube for confidential research
A medium-sized law firm needed an AI solution for internal document research. Sensitive client data could not go to the cloud.
80% faster research
Complete data control
ROI within 6 months
| Component | Specification |
| --- | --- |
| Graphics Card | NVIDIA RTX PRO 4000 Blackwell (24 GB GDDR7) |
| Network | 1 GbE (10 GbE optional) |
| Dimensions & Weight | 292×185×372 mm (H×W×D), approx. 8 kg |
| Certification | CE, RoHS, GDPR-compliant |
| Security | Secure Boot, TPM 2.0, WireGuard VPN |
Get a free consultation
24.11.2025
In August 2025, OpenAI released GPT-OSS 120B, its first open-weight model since GPT-2 – and it's impressive. The model achieves near o4-mini performance but...
09.11.2025
In times of rising cloud costs, data-sovereignty concerns and vendor lock-in, local AI inference is becoming increasingly important for companies. With...
08.11.2025
The use of Large Language Models (LLMs) such as GPT-4, Claude or Llama has evolved from experimental applications to mission-critical tools in recent years. However,...
Whether it's a specific IT challenge or just an idea – we look forward to hearing from you. In a brief conversation, we'll evaluate together whether and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT


