PandaProbe is an open-source agent engineering platform that gives you deep observability into AI agent applications. Use it to trace, evaluate, monitor and debug your AI agents in development and production.
About PandaProbe on Product Hunt
“open source agent engineering platform”
PandaProbe launched on Product Hunt on May 3rd, 2026, earning 378 upvotes and 25 comments and finishing as #2 Product of the Day. PandaProbe is an open-source agent engineering platform that gives you deep observability into AI agent applications: use it to trace, evaluate, monitor, and debug your AI agents in development and production.
On the analytics side, PandaProbe competes within Open Source, Developer Tools, Artificial Intelligence, and GitHub — topics that collectively have 1.1M followers on Product Hunt. The dashboard above tracks how PandaProbe performed against the three products that launched closest to it on the same day.
Who hunted PandaProbe?
PandaProbe was hunted by Sina Tayebati. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community.
👋 Hey Product Hunt!
I’m Sina, founder of PandaProbe.
Building AI agents is getting easier, but understanding and trusting them in production is still hard.
Once agents start calling LLMs, tools, APIs, MCPs, and sub-agents, logs aren’t enough anymore. You need to see what happened, why it failed, whether quality regressed, and how reliable the system is across full sessions.
PandaProbe is my attempt to solve this: an open-source agent engineering platform for tracing, evaluation, monitoring, and debugging AI agent applications.
The goal is simple: help developers move from “it works on my laptop” to “I understand production behavior, can measure quality, and continuously improve it.”
What PandaProbe provides
🔎 Trace — capture full agent executions as sessions, traces, and spans across LLMs, tools, agents, and custom logic.
📊 Evaluate — score traces and sessions using mission-critical, agent-specific metrics.
⏱️ Monitor — schedule recurring evaluations to automatically validate new traces and sessions in production.
📈 Analytics — track performance, cost, latency, errors, and quality trends over time.
🛠️ Open source + cloud — use the open-source core on GitHub or run PandaProbe in the cloud.
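To make the Trace feature concrete, here is a minimal sketch of the session → trace → span hierarchy described above, plus the kind of rollup the Analytics feature computes. This is illustrative only — the class names, fields, and helper function are hypothetical and are not the actual PandaProbe SDK:

```python
# Hypothetical sketch of a session -> trace -> span data model.
# These names are illustrative; they are NOT the PandaProbe API.
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str                      # e.g. "llm.call", "tool.search"
    kind: str                      # "llm" | "tool" | "agent" | "custom"
    duration_ms: float
    children: list["Span"] = field(default_factory=list)

@dataclass
class Trace:
    """One end-to-end agent execution."""
    trace_id: str
    root: Span

@dataclass
class Session:
    """A full user session, grouping related traces."""
    session_id: str
    traces: list[Trace] = field(default_factory=list)

def total_llm_time(span: Span) -> float:
    """Sum LLM latency across a span tree (a simple latency rollup)."""
    own = span.duration_ms if span.kind == "llm" else 0.0
    return own + sum(total_llm_time(c) for c in span.children)

# Example: one trace where an agent calls a tool, then an LLM.
root = Span("agent.run", "agent", 1200.0, children=[
    Span("tool.search", "tool", 300.0),
    Span("llm.answer", "llm", 800.0),
])
session = Session("s-001", [Trace("t-001", root)])
print(total_llm_time(session.traces[0].root))  # 800.0
```

The point of nesting spans inside traces inside sessions is that questions like “why did this fail?” or “where did the latency go?” can be answered at any level, from a single tool call up to a whole conversation.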
Who it’s for
🧑‍💻 AI engineers — debug agent behavior across LLMs, tools, and workflows.
🏗️ Platform teams — monitor quality, regressions, and reliability in production.
🔬 Builders experimenting with agents — understand failures and iterate faster.
🚀 Startups — add observability and evaluation before things become unmanageable.
Quick links
GitHub: https://github.com/chirpz-ai/pandaprobe
Docs: https://docs.pandaprobe.com
Cloud: https://www.pandaprobe.com/
I’ll be here all day answering questions and collecting feedback.
If you’re building agents today, what’s the hardest part to debug or evaluate?
Thanks for checking it out 🙏
— Sina