Self-Hosted Langfuse: LLM Observability and AI Logging for the EU AI Act

Editorial note: The information in this article was compiled to the best of our knowledge at the time of publication. Technical details, prices, versions, licensing terms, and external content may change. Please verify the information provided independently, particularly before making business-critical or security-related decisions. This article does not replace individual professional, legal, or tax advice.

Anyone bringing AI into production business processes quickly faces an uncomfortable question: what is actually happening in there? Which prompt went to which model, why did this answer come about, what did the call cost, and where in a multi-step agent chain did something go wrong? Without observability, an LLM application is a black box. And a black box can neither be reliably improved nor cleanly documented.
Langfuse fills exactly this gap. It is currently the most widely used open-source platform for LLM observability: it captures every model call as a traceable record, enables systematic quality evaluation, and can be run entirely self-hosted. That makes it a central building block for companies that want not just to experiment with AI but to operate it responsibly and accountably, especially with the EU AI Act in mind.
This article clarifies what LLM observability delivers, what the EU AI Act actually requires regarding logging (and what it does not), and why self-hosted Langfuse is the obvious choice for sovereignty-minded companies.
Table of Contents
- What LLM Observability Is and Why AI Logging Is Different
- What the EU AI Act Actually Requires
- The Timeline Is Shifting: Digital Omnibus
- Langfuse at a Glance
- Self-Hosting: Architecture and Operations
- Langfuse vs. LangSmith, Helicone and Phoenix
- Observability as a Lever for Compliance and Cost
- Do Not Forget Security and Updates
- Our Approach at WZ-IT
- Further Reading
- Sources
What LLM Observability Is and Why AI Logging Is Different
Classic application monitoring asks: is the service running, what is the latency, how many errors are there? For AI applications that is not enough. An LLM response can be technically successful (HTTP 200) and still be factually wrong, misleading, or expensive. The real value lies in the content of the call, not just its status code.
LLM observability therefore captures the functional layer of every call as a trace:
- Which prompt was sent, with which system prompt and which context?
- Which model and version answered?
- How many tokens were consumed and what did the call cost?
- How long did processing take?
- For multi-step chains and agents: which step did what, and where did it break?
From these traces you can build evaluations (evals), for example to measure response quality over time, detect regressions after a model change, or attribute cost per team and feature. This turns the vague feeling that "the AI has somehow gotten worse" into a provable statement. This traceability is not only a quality matter but increasingly a regulatory one.
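To make the idea of a trace more tangible, here is a minimal sketch in plain Python. It is not the Langfuse data model or SDK; the class names `Span` and `Trace`, the field names, and all sample values are hypothetical and only mirror the questions listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One step of a chain or agent run (hypothetical structure)."""
    name: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_eur: float
    error: Optional[str] = None


@dataclass
class Trace:
    """All steps that belong to one request."""
    trace_id: str
    spans: list = field(default_factory=list)

    def total_cost(self) -> float:
        return sum(s.cost_eur for s in self.spans)

    def total_latency_ms(self) -> float:
        # Assumes the steps ran one after another, not in parallel.
        return sum(s.latency_ms for s in self.spans)

    def first_failure(self) -> Optional[Span]:
        """Return the first step that recorded an error, if any."""
        return next((s for s in self.spans if s.error), None)


# Made-up sample data for a three-step chain.
trace = Trace(
    trace_id="req-001",
    spans=[
        Span("retrieve", "embedding-model", 120, 0, 35.0, 0.0001),
        Span("answer", "chat-model", 900, 250, 1400.0, 0.0042),
        Span("summarize", "chat-model", 300, 0, 80.0, 0.0009, error="timeout"),
    ],
)

failed = trace.first_failure()
print(failed.name if failed else "no failure")  # prints "summarize"
```

Even this toy structure answers the "where did it break?" question for a multi-step chain. An observability platform adds persistence, search, and evaluation on top of records like these.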
What the EU AI Act Actually Requires
Precision matters here, because many oversimplified statements circulate around AI logging. Regulation (EU) 2024/1689, the EU AI Act, imposes concrete traceability obligations on high-risk AI systems under Annex III:
- Article 12 mandates the automatic recording of events (logs) over the entire lifecycle of the system. The goal is traceability appropriate to the intended purpose (Art. 12).
- Article 13 requires operation to be sufficiently transparent so that deployers can interpret the results and use them appropriately (Art. 13).
- Article 19 stipulates that providers retain the automatically generated logs for a period appropriate to the system's intended purpose, and for at least six months, unless other law provides otherwise (Art. 19).
The fines are substantial: up to 35 million euros or 7 percent of worldwide annual turnover for prohibited practices, and up to 15 million euros or 3 percent for breaches of the other obligations, which include the high-risk requirements (Art. 99).
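As a rough illustration of how these ceilings scale, the following sketch computes the upper bound for a given turnover. It assumes the "whichever is higher" rule that Article 99 applies to undertakings and does not model the special rules for SMEs. The function name and the sample turnover figures are made up for illustration, and the result is a ceiling, not a predicted fine.

```python
def fine_ceiling_eur(turnover_eur: int, fixed_cap_eur: int, turnover_percent: int) -> float:
    """Upper bound of a fine: the higher of the fixed cap and the turnover share."""
    return max(fixed_cap_eur, turnover_eur * turnover_percent / 100)


# (fixed cap in EUR, percent of worldwide annual turnover)
PROHIBITED_PRACTICES = (35_000_000, 7)
OTHER_OBLIGATIONS = (15_000_000, 3)

# Hypothetical company with 2 billion EUR turnover: the percentage dominates.
print(fine_ceiling_eur(2_000_000_000, *PROHIBITED_PRACTICES))  # 140000000.0

# Hypothetical company with 100 million EUR turnover: the fixed cap dominates.
print(fine_ceiling_eur(100_000_000, *PROHIBITED_PRACTICES))  # 35000000
```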
An honest framing is important: these obligations concern the provider's high-risk system, not automatically every single interaction with a language model. An LLM is not high-risk in itself. The logging obligations only apply once it is used in an Annex III use case, for example in recruitment or credit scoring. LLM observability is therefore a practical tool for implementing Articles 12 and 19, not a law prescribing a particular trace format per API call. Communicating this clearly keeps you credible.
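How the six-month minimum from Article 19 could look as a technical guard is sketched below. This is an assumption-laden illustration, not legal guidance: "six months" is approximated as 183 days, the function name `may_delete` is hypothetical, and the retention period that is actually appropriate for a given system has to be determined with legal counsel.

```python
from datetime import date, timedelta

# Assumption: "six months" approximated as 183 days.
MIN_RETENTION = timedelta(days=183)


def may_delete(log_date: date, today: date, retention: timedelta = MIN_RETENTION) -> bool:
    """Return True only if the log is older than the configured retention period."""
    if retention < MIN_RETENTION:
        # Refuse configurations below the legal minimum instead of silently accepting them.
        raise ValueError("retention period is shorter than the six-month minimum")
    return today - log_date > retention


print(may_delete(date(2026, 1, 1), date(2026, 5, 1)))  # False, only 120 days old
print(may_delete(date(2025, 1, 1), date(2026, 5, 1)))  # True, older than 183 days
```

Rejecting a retention setting that is too short at configuration time is a deliberate choice here: a misconfigured cleanup job is much harder to repair after the logs are gone.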
The Timeline Is Shifting: Digital Omnibus
The EU AI Act originally provided that the high-risk requirements for Annex III systems would become applicable on 2 August 2026. But that very deadline is in motion.
In November 2025 the European Commission presented the so-called Digital Omnibus, a simplification package that, among other things, proposes postponing the high-risk deadlines. On 7 May 2026, the Council and Parliament reached a provisional political agreement on it: applicability for Annex III standalone systems is to be postponed to 2 December 2027, and for high-risk AI embedded in products to 2 August 2028 (Council of the EU, 07.05.2026).
As of May 2026, however, this postponement is not yet final: it is a provisional agreement that still has to be formally adopted and published in the Official Journal. Formal adoption is expected before August 2026. What already applies: the obligations for general-purpose AI models have been in force since 2 August 2025, and the Commission's enforcement powers take effect from 2 August 2026.
For companies this does not mean "wait and see" but the opposite: whether the deadline is 2026 or 2027, the obligations themselves do not disappear. Anyone building the traceability of their AI systems now is prepared regardless of the final date. A detailed breakdown of the high-risk obligations is in our article on the EU AI Act from August 2026.
Langfuse at a Glance
Langfuse is a platform for observability, tracing, and evaluation of LLM applications. It integrates with common frameworks and LLM gateways and collects every call as a structured trace. The feature set covers tracing, prompt management, datasets, a playground, and systematic evaluations.
The decisive point for companies is the licensing model. The core of Langfuse is MIT-licensed and can be self-hosted without usage limits. Tracing, evals, prompt management, playground, and datasets are free to use, as is basic single sign-on via SAML or OIDC (Langfuse license). Commercially licensed, by contrast, are mainly SCIM user provisioning, project-level RBAC roles, audit logs, data retention policies, and server-side data masking. So if you want to run the platform and analyze traces, you do not need a commercial license.
One piece of news drew attention in the market: ClickHouse acquired Langfuse on 16 January 2026, as part of a Series D round (Langfuse blog). According to the official statement, nothing changes about licensing, pricing, or the self-hosting option. That is plausible because ClickHouse is the analytical database Langfuse builds its trace analysis on anyway. For self-hosters, the acquisition changes nothing for now; the MIT core remains.
One detail for sovereignty-minded companies: the hosted Langfuse Cloud does offer an EU region, but it is located in Ireland (AWS eu-west-1), not in Germany. Anyone who wants full data sovereignty cannot avoid self-hosting, which is the core of our recommendation anyway.
Self-Hosting: Architecture and Operations
Self-hosted Langfuse is not a single-container project. The current self-hostable version 3 consists of several services (self-hosting docs):
| Component | Role |
|---|---|
| PostgreSQL | transactional data (users, projects, configuration) |
| ClickHouse | analytical trace, observation, and score data |
| Redis / Valkey | queue and cache |
| S3-compatible object store | events, multimodal inputs, exports |
| Web and worker containers | the actual application |
This architecture is powerful, but it has to be operated carefully. ClickHouse is the main cost driver and needs a well-thought-out storage and backup design. An important note on version planning: a version 4 already exists, but as of May 2026 it is only available as a preview in Langfuse Cloud and is not yet self-hostable. Anyone who wants to self-host uses the stable 3.x line. Keeping this distinction in mind avoids false expectations when planning operations.
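Because four data services have to be wired up correctly, a small preflight check before deployment can help. The sketch below only verifies that a configuration names every component from the table above. The environment variable names are assumptions based on our reading of the self-hosting documentation and must be verified against the docs for your version; the helper `missing_components` is hypothetical.

```python
# Component -> environment variable that points to it.
# The variable names are assumptions; verify them against the self-hosting docs.
REQUIRED = {
    "PostgreSQL": "DATABASE_URL",
    "ClickHouse": "CLICKHOUSE_URL",
    "Redis / Valkey": "REDIS_CONNECTION_STRING",
    "S3-compatible object store": "LANGFUSE_S3_EVENT_UPLOAD_BUCKET",
}


def missing_components(env: dict) -> list:
    """Return the components whose variable is unset or empty."""
    return [name for name, var in REQUIRED.items() if not env.get(var)]


# Example with a deliberately incomplete, made-up configuration.
example_env = {
    "DATABASE_URL": "postgresql://langfuse:secret@db:5432/langfuse",
    "CLICKHOUSE_URL": "http://clickhouse:8123",
}
print(missing_components(example_env))
# ['Redis / Valkey', 'S3-compatible object store']
```

In practice you would pass `os.environ` instead of the example dictionary. The check says nothing about whether the services are reachable or healthy; that remains a job for monitoring.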
This is where it is decided whether self-hosting brings relief or turns into a never-ending maintenance project. Four data services plus backups, updates, and monitoring are not a side project that half an admin position can cover. We operate stacks like this as part of our managed AI services and on LLM hosting infrastructure in German data centers.
Langfuse vs. LangSmith, Helicone and Phoenix
LLM observability is a young but contested market. The most important distinction for companies with a sovereignty requirement is not the feature set but the license and self-hosting capability.
| Platform | License | Self-Hosting | Assessment |
|---|---|---|---|
| Langfuse | MIT (core) | full, no limits | self-hosting leader, broad integrations |
| LangSmith | proprietary | only in Enterprise plan | deepest LangChain integration, but closed source |
| Helicone | Apache 2.0 | yes | lean proxy approach, easy entry |
| Arize Phoenix | Elastic License v2 | restricted | not OSI-approved, managed-service clause |
Two points are decisive here. LangSmith by LangChain is proprietary; self-hosting is only available in the Enterprise plan. The LangChain framework itself is open source, but that is not to be confused with the observability platform LangSmith. And Arize Phoenix is under the Elastic License v2, which is source-available but not an OSI-recognized open-source license and restricts offering it as a hosted service. Anyone who wants real license freedom and full data control ends up with Langfuse (MIT) or Helicone (Apache 2.0).
Observability as a Lever for Compliance and Cost
The EU AI Act aspect is only one side. In practice, observability pays off twice.
Cost control. As soon as several teams work against several models, the monthly token bill quickly becomes opaque. Combined with an LLM gateway like LiteLLM, costs can be attributed precisely per team, per feature, and per model. That is the basis for budgets, rate limiting, and well-founded model decisions, for example when a smaller local model can replace an expensive cloud model.
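The attribution itself is simple arithmetic once every call carries the right metadata. The following sketch shows the principle on made-up data. The prices, model names, and record fields are hypothetical and do not reflect the export format of LiteLLM or Langfuse.

```python
from collections import defaultdict

# Hypothetical prices in EUR per 1,000 tokens: (input, output).
PRICES = {
    "cloud-large": (0.003, 0.015),
    "local-small": (0.0, 0.0),  # self-hosted model, no per-token fee assumed
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price_in, price_out = PRICES[model]
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out


def cost_by(calls: list, key: str) -> dict:
    """Sum call costs grouped by a metadata field such as 'team' or 'model'."""
    totals = defaultdict(float)
    for call in calls:
        totals[call[key]] += call_cost(
            call["model"], call["input_tokens"], call["output_tokens"]
        )
    return dict(totals)


# Made-up call records.
calls = [
    {"team": "support", "feature": "chat", "model": "cloud-large",
     "input_tokens": 2000, "output_tokens": 1000},
    {"team": "sales", "feature": "summary", "model": "cloud-large",
     "input_tokens": 1000, "output_tokens": 0},
    {"team": "support", "feature": "search", "model": "local-small",
     "input_tokens": 5000, "output_tokens": 500},
]

print(cost_by(calls, "team"))
print(cost_by(calls, "model"))
```

The same function groups by team, feature, or model, which is why consistent metadata on every call matters more than the aggregation logic.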
Quality and evidence. Evals make response quality measurable instead of leaving it to guesswork. And the traces themselves are the data basis that makes traceability under Article 12 practically implementable in the first place, without retention becoming a mere box-ticking exercise. Anyone aiming for GDPR-compliant AI use builds the technical foundation for it with tracing and logging.
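What "measurable" can mean in the simplest case is sketched below: comparing eval scores before and after a model change. This is a deliberately naive mean comparison with a hypothetical tolerance. It is not how Langfuse computes evaluations, and a real setup would use more samples and a proper statistical test.

```python
from statistics import mean


def detect_regression(baseline: list, candidate: list, tolerance: float = 0.05) -> bool:
    """Flag a regression if the mean score dropped by more than the tolerance."""
    drop = mean(baseline) - mean(candidate)
    return drop > tolerance


# Made-up eval scores between 0 and 1 for the same test prompts.
scores_old_model = [0.90, 0.80, 0.85]
scores_new_model = [0.70, 0.75, 0.72]

print(detect_regression(scores_old_model, scores_new_model))  # True
```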
The approach becomes particularly strong in combination: a local model via Ollama or vLLM, a LiteLLM gateway in front of it, and Langfuse alongside for observability. Add a RAG knowledge base and you get a fully sovereign AI stack on your own infrastructure where no data leaves the house.
Do Not Forget Security and Updates
Self-hosting also means responsibility. Langfuse is no exception: in spring 2026 the vulnerability CVE-2026-41487 (GHSA-2524-j966-gfgh) became known, in which a user with a restricted member role could, under certain conditions, read out a stored LLM provider API key. The flaw was rated low severity and is fixed in self-hostable versions from 3.167.0 onward.
That is not an argument against Langfuse but for disciplined update management. This is exactly where the difference between "quickly set up" and "cleanly operated" lies. Anyone running an LLM platform in production needs ongoing CVE monitoring and a clear patch process. The same pattern shows up across the self-hosted AI stack, as the Ollama vulnerability Bleeding Llama demonstrated.
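A patch process starts with knowing whether the running version is affected. The following sketch compares an installed version against the fixed version 3.167.0 mentioned above. It assumes plain numeric version strings without pre-release suffixes, and the helper names are hypothetical.

```python
FIXED_IN = (3, 167, 0)  # first self-hostable version with the fix, per the advisory


def parse_version(version: str) -> tuple:
    """Turn '3.167.0' or 'v3.167.0' into (3, 167, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_patched(installed: str) -> bool:
    # Compare numerically: as strings, "3.99.0" would wrongly sort after "3.167.0".
    return parse_version(installed) >= FIXED_IN


print(is_patched("3.166.2"))  # False
print(is_patched("3.167.0"))  # True
print(is_patched("3.99.0"))   # False
```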
Our Approach at WZ-IT
We treat observability not as an add-on but as a fixed part of responsible AI operations.
- Architecture instead of gut feeling. We first clarify whether a use case even falls under the high-risk logic of the EU AI Act and what traceability is needed both functionally and legally. From that we derive how much observability makes sense.
- Sovereign stack. Langfuse for observability, a LiteLLM gateway for cost control and model routing, local models as the backend. All self-hosted on infrastructure in German data centers, with no data flowing to third parties.
- Clean operations. Backups for PostgreSQL and ClickHouse, monitoring, update and patch management including CVE tracking. Exactly the points where self-hosted platforms otherwise turn into a never-ending maintenance burden.
- Documented for compliance. Logging and retention set up so that they technically support traceability under Article 12 and retention under Article 19, coordinated with your data protection and legal counsel.
Whether as pure LLM hosting infrastructure or as fully managed AI operations: the stack runs on European infrastructure, and operations stay in European hands.
Further Reading
- EU AI Act from August 2026: What companies with high-risk AI must do - the regulatory framework in detail
- CVE monitoring for self-hosted software - why patch management is mandatory
- Bleeding Llama: securing the Ollama vulnerability CVE-2026-7482 - the same pattern in the AI stack
- Ollama vs. vLLM: comparison for self-hosted LLMs - the model layer behind the gateway
- Langfuse expertise at WZ-IT - consulting, setup, and operations
- LLM hosting at WZ-IT - sovereign AI infrastructure
As of May 2026. The EU AI Act is an evolving regulation, and the deadlines mentioned in this article may change through ongoing legislative procedures. This article is not legal advice. For specific compliance questions, consult data protection and legal counsel.
Sources
- EU AI Act, Article 12 - record-keeping (logs)
- EU AI Act, Article 13 - transparency and provision of information
- EU AI Act, Article 19 - retention of automatically generated logs
- EU AI Act, Article 99 - penalties
- Council of the EU: agreement to simplify the AI rules (07.05.2026)
- European Commission: Digital Omnibus on AI
- Langfuse: open source and licensing
- Langfuse: self-hosting architecture
- Langfuse: Joining ClickHouse (16.01.2026)
- GitHub Security Advisory GHSA-2524-j966-gfgh (CVE-2026-41487)
Frequently Asked Questions
Answers to important questions about this topic
LLM observability makes visible what happens inside an AI application: which prompt went to which model, what response came back, how long it took, what the call cost, and where in a multi-step chain an error occurred. Platforms like Langfuse capture this data as traces, enable evaluations (evals), and thereby provide the basis for quality assurance, cost control, and the traceability of AI systems.
For high-risk AI systems under Annex III, yes. Article 12 requires the automatic recording of events (logs) over the lifecycle, and Article 19 requires retention of at least six months unless other law provides otherwise. Important: the obligation applies to the provider's high-risk system, not automatically to every single LLM call. LLM observability is a practical means of implementing this traceability, not a legally prescribed format per call.
The Langfuse core is MIT-licensed and self-hostable without usage limits: tracing, evals, prompt management, playground, and datasets are free to use. Basic SSO via SAML or OIDC is included too. Commercially licensed (Enterprise) features are mainly SCIM user provisioning, project-level RBAC roles, audit logs, data retention policies, and server-side data masking.
The self-hostable version 3 needs four data services: PostgreSQL for transactional data, ClickHouse for analytical trace data, Redis or Valkey as queue and cache, and an S3-compatible object store. On top of that come two application containers (web and worker). This runs entirely on your own infrastructure, for example on a Hetzner server in Germany.
LangSmith by LangChain is a proprietary SaaS solution; self-hosting is only available in the Enterprise plan. Langfuse has an MIT core, is open source, and is self-hostable without license costs. For companies with sovereignty and data protection requirements, the decisive difference is that Langfuse can run entirely in your own data center, without trace data flowing to an external provider.
ClickHouse acquired Langfuse on 16 January 2026. According to the official statement, Langfuse stays open source and self-hostable, with no planned changes to licensing, pricing, or the self-hosting option. ClickHouse is also the database that Langfuse's trace analysis is built on.
Yes. Anyone running Langfuse on their own infrastructure in the EU retains full control over all trace data, including the prompts and responses that may contain personal data. There is no third-country transfer and no data processing agreement with a US provider. This significantly simplifies the data protection assessment compared to a cloud solution. This is not legal advice - consult data protection counsel for specific questions.

Written by
Timo Wevelsiep
Co-Founder & CEO
Co-Founder of WZ-IT. Specialized in cloud infrastructure, open-source platforms and managed services for SMEs and enterprise clients worldwide.


Timo Wevelsiep & Robin Zins
Managing Directors of WZ-IT





