20260702 KI unter Kontrolle

AI under control

Many of our articles have criticized the unpredictability of AI systems and accused those responsible of a lack of transparency. Of course, we want to go even further and call for democratic oversight of such companies and a regulatory body that acts in the public interest to determine which AI applications — not least because of these systems’ energy consumption — should be used at all. But let’s start small. It would already be a step in the right direction if we could verify whether an AI system (at least) adheres to its own specifications (terms of service, task descriptions, etc.).

That is exactly what the open-source software Praxen is designed to do: "it checks whether an AI agent does what it claims to do."

Praxen analyzes the policy specified for the AI system and then examines how the system behaves during operation. It then lists where the stated requirements and the actual behavior differ, and in what way. In doing so, Praxen draws on standard questions from project management:

Who has which tasks in the project/system?

Who has which roles?

What rights is an actor or subcomponent permitted to exercise?

Where did an actor exceed its authority, or lack the authority it needed?

Which actions were carried out at the edge of the rules or in violation of them?

Praxen examines all of these questions on the live AI system and compiles its findings in detailed log files. In doing so, Praxen is not operating in a vacuum but builds on the existing, if sometimes flawed, guidelines for AI systems. For Praxen, these currently include the OWASP Top 10 for LLM Applications 2025, the OWASP Top 10 for Agentic AI Applications 2026, the OWASP Secure MCP Server Development Guide 2026, and the RAISE Framework.
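
To make the idea of comparing a stated policy with observed behavior more concrete, here is a minimal sketch in Python. It is not Praxen's code or API. The role names, action names, the `POLICY` table, the `Event` type, and the function `find_deviations` are all hypothetical and chosen only for illustration. The sketch assumes that a policy can be reduced to "which role may perform which actions" and that the system's behavior is available as a list of logged events.

```python
from dataclasses import dataclass

# Hypothetical policy: the actions each role is permitted to perform.
POLICY = {
    "support_agent": {"read_ticket", "reply_to_customer"},
    "billing_agent": {"read_invoice", "issue_refund"},
}


@dataclass(frozen=True)
class Event:
    """One observed action of an actor (illustrative log format)."""
    actor: str
    role: str
    action: str


def find_deviations(policy, events):
    """Return (event, reason) pairs for every action the policy does not cover."""
    deviations = []
    for event in events:
        allowed = policy.get(event.role)
        if allowed is None:
            # The role itself is not defined anywhere in the policy.
            deviations.append((event, "unknown role"))
        elif event.action not in allowed:
            # The role exists, but this action exceeds its permissions.
            deviations.append((event, "action outside role permissions"))
    return deviations


# Hypothetical observations from a running system.
OBSERVED = [
    Event("agent-1", "support_agent", "read_ticket"),
    Event("agent-1", "support_agent", "issue_refund"),
    Event("agent-2", "auditor", "read_invoice"),
]

for event, reason in find_deviations(POLICY, OBSERVED):
    print(f"{event.actor} ({event.role}) -> {event.action}: {reason}")
```

In this toy example, the refund issued by the support agent and the action of the undefined "auditor" role are reported, while the permitted ticket read is not. A real tool has to deal with far messier policies and logs, which is exactly where human judgment remains necessary.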

While we certainly cannot expect miracles from Praxen, it at least gives us clues as to where expectations and reality diverge, so that we as human beings can continue the search in that direction. Analyzing the results still requires a fair amount of tact and judgment. For example, Praxen evens out errors when the results swing one way or the other across repeated, similar situations. The developers aim to improve this: "The goal is to make them visible, measurable, and recoverable so users can trust the results they receive."

Praxen is a start, a first step, but by no means a substitute for democratic oversight of Big Tech.