25.06.2026
Local AI for Law Firms: §43e BRAO, §203 StGB and the RAG Path
Summarising client emails, drafting briefs, researching precedent - AI can save a lot of time in a law firm. The problem is not the AI...
The WZ-IT AI Cube brings ChatGPT-like AI, local models and internal knowledge search into your company - ready to use, without cloud lock-in and with support by WZ-IT. Plug it in, open it in the browser and work with your own AI.
Ready to use with Open WebUI, vLLM/Ollama and local models
Local data processing inside your own network
Owned hardware instead of external API dependency
Remote commissioning & support in German and English
Cloud services offer convenience - but also dependency. With an AI Cube, you retain full control over your data, models, and systems. Whether chatbots, RAG systems, or internal AI automation: Your sensitive data stays within your company, while computing power is directly on-site.
The AI Cube is hardware owned by your company - without external token limits and without vendor lock-in. You decide which models run, which data is connected and whether WZ-IT optionally operates the system for you.
Your models and data never leave your corporate network. Complete control over sensitive information.
No API limits, no external updates, no restrictions. You decide every aspect of your AI infrastructure.
Minimal latency through local inference. No delays from cloud connections.
No external token or pay-per-use fees. Local operating costs stay predictable.
The AI Cube is fully yours. No cloud subscription, no external API dependency.
If desired, we handle operation, maintenance, and updates - you focus on your projects.
The entry price is clear. Whether local AI pays off depends on usage, privacy requirements, model size and operating model.
Pay-per-use / Cloud
Good for a fast start. Critical with sensitive data, high volume or the need for own control.
On-Premises
128 GB unified memory • Open WebUI • Ollama/vLLM • optional RAG
Conservative interactive guidance: around 80-90 tok/s. Long context can reduce visible output to roughly 60-80 tok/s.
Conservative interactive guidance: around 35-60 tok/s. Under concurrency aggregate throughput rises significantly, but each single response remains workload-dependent.
Based on publicly available DGX Spark / GB10 benchmarks with vLLM, SGLang and llama.cpp. Final values are validated with the customer target model, context length and RAG setup.
Use Open WebUI for a ChatGPT-like experience - completely local on your own hardware
The AI Cube can be delivered with Open WebUI based on customer requirements - an intuitive, user-friendly interface that enables a local ChatGPT-like experience. No cloud dependency, no API keys, no token limits - just you and your AI models.
Familiar and intuitive user interface for natural conversations with your local AI models
All data and conversations stay on your hardware - no connection to external servers required
Switch seamlessly between different AI models within the same interface
Unlimited usage without pay-per-use fees or monthly API costs
Examples of requirements where locally operated AI makes sense: independent, secure and controllable.
RAG-based Document Research
A medium-sized law firm with numerous mandates and a large file archive found that research for precedent cases, briefs, and internal evidence was often very time-consuming - several hours per case. Additionally, sensitive client data was present that should not go to external cloud systems.
Drastically Reduced Research Time
Lawyers can argue and decide faster
Strengthened Knowledge Base
New employees access proven documents much faster
Knowledge Database for Medical Protocols
Healthcare and care facilities need to manage large volumes of protocols, SOPs, training materials and internal reports. Documentation is often distributed and hard to access - especially when teams need reliable information quickly for workflows, quality assurance or internal training.
Drastically Reduced Access Time
Relevant documents are accessed immediately
Strengthened Quality & Compliance
Employees at different locations consistently access the same knowledge pool
As a reseller, you offer local AI solutions - we deliver the hardware and service
Want to not only use local AI solutions yourself, but also resell them to your customers? As a reseller, you receive preferred terms, technical support and fully pre-installed systems with the WZ-IT Local AI Stack.
Direct margin advantages for resellers and integrators.
On request, we deliver the AI Cube completely neutral - ideal for system integrators who want to operate under their own brand.
Ollama, vLLM, Open WebUI - ready to use for your end customers.
Direct contact with us for questions about integration, RAG, models & hardware.
Custom models, RAG pipelines, GPU layouts, and network setups for specific customer requirements.
You can now offer your customers their own local AI solutions - without having to develop hardware yourself.
Contact us for a non-binding conversation about terms, technical details, and your individual requirements.
The AI Cube is prepared by WZ-IT, installed with the Local AI Stack and commissioned remotely. On-site appointments, workshops or deep network integration are scoped per project.
Hardware, operating system, drivers and AI stack are prepared before handover.
We support the initial setup inside the customer network remotely and document the key steps.
Operating system, GPU drivers, container environment and security configuration (VPN, firewall, backup)
Performance test, stability check and GDPR compliance review before commissioning
For standard and custom builds
Our on-site service ensures that your AI Cube runs optimally from the start - without you having to worry about installation or configuration.
Perfect for companies that value:
The AI Cube combines validated ASUS/NVIDIA hardware with our open Local AI Stack. For special requirements, we still deliver custom builds with larger GPUs, rackmount or multi-GPU.
From August 2026: EU AI Act high-risk requirements. Local AI infrastructure simplifies compliance.
The WZ-IT AI Cube is the fast entry into local business AI. AI Cube Custom remains available for larger or special requirements.
The WZ-IT AI Cube starts at €5,990.90 excl. VAT including hardware, pre-installed AI stack, initial model setup, remote commissioning and technical onboarding. Custom builds are quoted per project.
ASUS/NVIDIA appliance base
128 GB
Performance
up to 1 PFLOP FP4
CUDA Cores
NVIDIA GB10
Ideal for:
Internal AI assistants, document chat and local LLMs
We validate target model, context length and concurrent users before the project starts.
RTX PRO / multi-GPU / rackmount / special hardware
Configurable
Performance
Configurable
CUDA Cores
Configurable
Ideal for:
Large models, many concurrent users, special requirements
Note on pricing: Listed prices are non-binding reference prices and may change. The specific price depends on your individual configuration, term, and scope of services. For a binding quote, please contact us directly.
As a Managed Service customer at WZ-IT, you have access to our exclusive portal: Monitor your infrastructure in real-time, schedule maintenance, request quotes, and get direct support - all in one central location.

Test different token speeds and see the difference
Interactive output for GB10-based AI Cube setups
At 45 tok/s, generating takes:
1.1s
Chat response
(~50 tokens)
3.3s
(~150 tokens)
44.4s
Report
(~2000 tokens)
* Visible chat speed. Concurrent batch workloads can reach much higher aggregate throughput.
Your requirements are increasing - e.g. larger models, more concurrent users or more intensive AI workloads? With our trade-in program, you can easily exchange your existing AI Cube for a more powerful model - e.g. from Pro to Custom.
Upgrade affordably
No complete new purchase - credit towards your new system
Planning security
Start small and upgrade as needed
Sustainable & secure
Secure data deletion and environmentally friendly recycling
Express interest
Contact us
Evaluation
We assess your device and determine a fair residual value
Receive credit
Credit towards your new AI Cube or AI Cube Custom
With us you get not only powerful hardware, but also a competent partner for your entire AI infrastructure
From planning to implementation - we build your complete AI infrastructure and integrate the AI Cube seamlessly.
Tailored software solutions, RAG pipelines, APIs and integrations - perfectly matched to your requirements.
Together we develop new AI applications for your specific use cases - from idea to production readiness.
Continuous support, updates and optimizations - so your AI infrastructure always runs optimally.
Timo Wevelsiep & Robin Zins
CEOs of WZ-IT
A typical project starts with the AI Cube as local AI infrastructure and grows into a domain solution: RAG pipeline, knowledge source, Open WebUI and operations are adapted to the concrete use case. This turns the box into a productive AI platform without sending sensitive data to external cloud services.
Let's realize your AI vision together
Ready to Use with Leading Open-Source Frameworks

Simple model management with one-command installation. Perfect for rapid prototyping and smaller projects.
$ ollama run qwen3.5:122b
High-performance inference with PagedAttention for production workloads with high throughput.
$ vllm serve gpt-oss-120b --quantization nvfp4Model, quantization, context length, concurrent users and RAG setup determine which hardware makes sense. That is why we do not publish one-size-fits-all token/s promises as standard performance, but validate your target workload before quoting.
For internal assistants, document chat, initial RAG systems and local LLM usage.
For large models, many concurrent users, special network or rack requirements.
We benchmark relevant models with your target setup and document realistic performance.
More technical details on request
| Komponente | WZ-IT AI Cube |
|---|---|
| Graphics Card | ASUS/NVIDIA appliance base with NVIDIA GB10 class hardware and 128 GB unified memory |
| Network | Standard networking, extended connectivity per project |
| Dimensions & Weight | Compact appliance form factor, depending on hardware configuration |
| Certification | CE, RoHS, GDPR-compliant |
| Security | Secure Boot, TPM 2.0, WireGuard VPN |
Find the Right Model for Your Business
All benefits at a glance
Cloud-based LLM APIs like OpenAI, Anthropic, or Google Gemini are convenient - but expensive and risky. At high volumes, costs can quickly spiral out of control. With an AI Cube, you run local inference in your own network - without external token dependency and without a monthly API bill per request.
Additionally, on-premise LLM hosting gives you full control over your data. Sensitive information - customer data, internal documents, proprietary content - never leaves your corporate network. You're independent of API downtimes, price increases, or sudden service changes.
We jointly evaluate your requirements and use cases. In a free consultation, we determine which hardware configuration is optimal for your models and use cases.
Based on model size and requirements, we select the appropriate GPU configuration. We fully configure the system and install Ollama, vLLM, Open WebUI, and other software according to your preferences.
The Cube is delivered pre-installed and tested. After plugging it in, it can be operational within minutes. We support you in integrating it into your network.
You operate the Cube independently with full root access - or leave operation, maintenance, and updates to us. We remain your contact for extensions, support, and new requirements.
For sensitive data that cannot go to the cloud. Run internal chatbots, document analysis, or code assistants completely locally and GDPR-compliant.
Test and develop AI applications locally without cloud dependency. Ideal for rapid prototyping, model fine-tuning, and experimental projects.
Integrate AI capabilities directly into your existing infrastructure. No internet connection required, complete control over your data.
Tailored AI solutions for specific requirements
GDPR-compliant document research, contract analysis and client communication. Attorney-client privilege maintained.
Local AI for patient data, protocol analysis and medical knowledge databases.
Compliance-conform AI for risk assessment, document analysis and advisory support.
Your industry not listed? We create custom solutions for your requirements.
With AI Cubes, you retain full decision-making freedom: you can install your own models, migrate existing setups, or integrate software solutions of your choice - without license binding, API constraints, or external control. All components are open-source based and documented.
Answers to the most important questions about your local AI solution
Topics
The AI Cube is a plug-and-play AI hardware for businesses - ideal for running LLMs, transcriptions, or data-intensive workloads locally in your own network, without cloud dependency and fully GDPR-compliant.
The WZ-IT AI Cube is our standard product based on validated ASUS/NVIDIA appliance hardware. For larger models, many concurrent users or special infrastructure requirements, we design AI Cube Custom builds with dedicated NVIDIA GPUs, multi-GPU, rackmount or custom networking.
Power consumption depends on the appliance or custom setup and the actual workload. The standard AI Cube is designed as a compact local AI box for office and enterprise environments; larger custom systems are assessed for power, cooling and site requirements in advance.
Yes - since you own the hardware, you can replace or expand RAM, storage (NVMe/SSD), or GPU yourself at any time. We're happy to assist if needed - but you have full control over your hardware.
Yes - the AI Cube runs entirely locally. There's no communication with external cloud servers, no data transfer outside your network. This ensures maximum data sovereignty and GDPR compliance.
The AI Cube stores data exclusively locally. With TPM 2.0, Secure Boot, and optionally encrypted SSD/NVMe, we ensure maximum protection. For sensitive data, we recommend encrypted filesystem and restrictive access control.
Yes - on request, we deliver the AI Cube as plug-and-play: with pre-installed software, GPU drivers, and basic configuration. After powering on, you can start working with AI models immediately - no complex setup required.
Yes. We prepare the AI Cube, install the local AI stack and handle remote commissioning. On-site appointments, training or integration workshops can be added project by project.
The standard scope includes technical preparation, the pre-installed AI stack, initial model setup, remote commissioning and technical onboarding. RAG, SSO, monitoring or managed service can be added optionally.
The AI Cube is prepared with an open local AI stack: Open WebUI as the user interface, Ollama and/or vLLM for local inference, suitable open-source models based on your use case and optional RAG with Qdrant or pgvector. We validate the model choice against your workload.
Yes - depending on hardware configuration, multiple models can run in parallel. For intensive or parallel use, we recommend more powerful or customized hardware configurations.
Beyond chatbots and RAG systems: audio/video transcription, document indexing, data processing, code assistance, automation of internal processes - ideal for privacy-critical or compliance-relevant scenarios.
The AI Cube is quoted per project. Price, delivery scope and service level depend on GPU, memory, form factor, software stack and operating model. You receive a binding quote on request. The AI Cube becomes attractive especially for sensitive data, predictable workloads and long-term use: no external token dependency, full control over data and hardware.
When data privacy, control, consistent performance, and long-term planning are important - e.g., with sensitive data, compliance requirements, or frequent AI use.
Yes. We support migration: data and model transfer, re-setup on your on-prem system - without external dependency.
The AI Cube is purchased and owned by your company, while our AI servers are rented and run as a monthly managed service. The Cube is suitable for long-term planning, local control and fixed sites; rented servers are better for flexible projects or variable workloads.
Our pre-configured models are designed to be low-maintenance. If needed, we offer managed service: regular security patches, monitoring, updates - keeping your infrastructure stable and secure.
Yes - the AI Cube is compatible with common corporate networks. On request, we configure VPN, firewall, and connectivity so the Cube integrates securely and seamlessly.
In addition to hardware, we optionally offer managed service, maintenance, updates, monitoring, and support - especially for enterprise customers. Hardware, software, and support from a single source.
On request, we provide a backup concept: regular snapshots, redundant or external storage options, remote backup - keeping you protected even in case of hardware failure.
We provide remote commissioning and support in German and English. On-site appointments or partner delivery can be planned for suitable projects.
Our AI Cubes are custom-built in our workshop in Dortmund. Each AI Cube is an individual configuration optimized for hardware and use case.
Yes - we offer a reseller program with attractive purchasing conditions, technical support, and optional white-label license. Ideal for system integrators and IT service providers.
More questions? We are happy to help!
Still have questions? Contact us!Discover Our Other AI Services
§203-compliant local AI for law firms, doctors, notaries & more.
Rented GPU servers with complete managed service - price on request
Detailed information about our GPU servers and their applications
Managed hosting for Large Language Models - we operate the infrastructure, you use the models
25.06.2026
Summarising client emails, drafting briefs, researching precedent - AI can save a lot of time in a law firm. The problem is not the AI...
10.05.2026
Three unauthenticated API calls. No login, no exploit framework, no privilege escalation. Three POST requests to a default port, and the machine's memory is on...
24.11.2025
With GPT-OSS 120B, OpenAI released their first open-weight model since GPT-2 in August 2025 – and it's impressive. The model achieves near o4-mini performance but...
09.11.2025
In times of rising cloud costs, data sovereignty challenges and vendor lock-in, the topic of local AI inference is becoming increasingly important for companies. With...
08.11.2025
The use of Large Language Models (LLMs) such as GPT-4, Claude or Llama has evolved from experimental applications to mission-critical tools in recent years. However,...
Whether a specific IT challenge or just an idea - we look forward to the exchange. In a brief conversation, we'll evaluate together if and how your project fits with WZ-IT.
Timo Wevelsiep & Robin Zins
Managing Directors of WZ-IT




