
PandaProbe

open-source agent engineering platform

Open Source
Developer Tools
Artificial Intelligence
GitHub

Hunted by Sina Tayebati

PandaProbe is an open-source agent engineering platform that gives you deep observability into AI agent applications. Use it to trace, evaluate, monitor and debug your AI agents in development and production.

Top comment

👋 Hey Product Hunt!

I’m Sina, founder of PandaProbe.

Building AI agents is getting easier, but understanding and trusting them in production is still hard.

Once agents start calling LLMs, tools, APIs, MCPs, and sub-agents, logs aren’t enough anymore. You need to see what happened, why it failed, whether quality regressed, and how reliable the system is across full sessions.

PandaProbe is my attempt to solve this: an open-source agent engineering platform for tracing, evaluation, monitoring, and debugging AI agent applications.

The goal is simple: help developers move from “it works on my laptop” to “I understand production behavior, can measure quality, and continuously improve it.”

What PandaProbe provides

🔎 Trace — capture full agent executions as sessions, traces, and spans across LLMs, tools, agents, and custom logic.
📊 Evaluate — score traces and sessions using mission-critical, agent-specific metrics.
⏱️ Monitor — schedule recurring evaluations to automatically validate new traces and sessions in production.
📈 Analytics — track performance, cost, latency, errors, and quality trends over time.
🛠️ Open source + cloud — use the open-source core on GitHub or run PandaProbe in the cloud.
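
As an illustration of the data model the feature list describes, the session → trace → span hierarchy can be modeled as a small tree that analytics (cost, latency) roll up over. This is a generic sketch with invented names, not PandaProbe's actual SDK:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    """One unit of work: an LLM call, tool call, or sub-agent step."""
    name: str
    latency_ms: float
    cost_usd: float = 0.0
    children: List["Span"] = field(default_factory=list)

@dataclass
class Trace:
    """One full agent execution, rooted at a single span."""
    root: Span

@dataclass
class Session:
    """A user-facing session grouping several traces."""
    traces: List[Trace] = field(default_factory=list)

def total_cost(span: Span) -> float:
    """Sum cost over a span subtree, depth-first."""
    return span.cost_usd + sum(total_cost(c) for c in span.children)

# Example: an agent run with one LLM call and one tool call.
run = Trace(root=Span("agent", 1200, 0.0, [
    Span("llm:plan", 800, 0.002),
    Span("tool:search", 350, 0.0),
]))
session = Session(traces=[run])
print(total_cost(run.root))  # 0.002
```

The point of the tree shape is that session-level metrics fall out of a single recursive walk, which is what makes "quality trends over time" cheap to compute once traces are captured.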

Who it’s for

🧑‍💻 AI engineers — debug agent behavior across LLMs, tools, and workflows.
🏗️ Platform teams — monitor quality, regressions, and reliability in production.
🔬 Builders experimenting with agents — understand failures and iterate faster.
🚀 Startups — add observability and evaluation before things become unmanageable.

Quick links

GitHub: https://github.com/chirpz-ai/pandaprobe

Docs: https://docs.pandaprobe.com

Cloud: https://www.pandaprobe.com/

I’ll be here all day answering questions and collecting feedback.

If you’re building agents today, what’s the hardest part to debug or evaluate?

Thanks for checking it out 🙏
— Sina

Comment highlights

Honestly the open-source + self-hostable combo is what makes this worth a proper look. Most observability tools want you locked into their cloud and charging per seat by the time you actually need it. Been burned by that before with Datadog at a startup. One instrument() call to trace the whole run is a nice DX too; gonna try this on a side project this week.
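
The one-call `instrument()` pattern this commenter praises can be sketched as a decorator that records each call as a span. This is purely an illustration of the pattern, not PandaProbe's actual API:

```python
import functools
import time

SPANS = []  # hypothetical in-memory span sink

def instrument(fn):
    """Illustrative one-call tracing decorator (not PandaProbe's real
    SDK): records the function name and wall-clock duration of every
    call, even when the call raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            SPANS.append((fn.__name__, time.perf_counter() - start))
    return wrapper

@instrument
def plan(task):
    return f"steps for {task}"

plan("book a flight")
print(SPANS[0][0])  # plan
```

The appeal of this DX is that instrumentation stays out of the business logic: one decorator per entry point, and the span sink sees every run.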

We use LangGraph for these purposes. How is PandaProbe better and why should we switch to it?

We've been running Langfuse for our agent stack for about six months and the trace UI is decent, but session-level evals across multi-agent runs are still where things get messy. Curious how PandaProbe handles that. If a sub-agent fails three turns deep, do you surface root cause at the session level, or do I still have to walk the span tree manually? Also, what's the storage model look like for self-hosted? Postgres only, or something columnar for the trace volume? One more thing: any plans for OpenTelemetry-native ingestion so I don't have to swap out my existing tracing SDK across services?

Congrats on launching! How does PandaProbe handle sub-agent calls? Like if agent A spins up agent B, do both get traced under the same session tree?

Congratulations on the launch @sina_tayebati
BTW, how well does PandaProbe handle tracking regressions across different agent versions over time?

Quick q, how does PandaProbe’s tracing handle multi-step agent loops where the failure is caused by an earlier decision that only becomes obvious later?

Where does PandaProbe sit relative to LangSmith, Langfuse, and Helicone? They all claim "agent observability" but mean very different things underneath — some are basically prompt loggers, others actually trace tool-call DAGs. Curious which problem you decided was the real one.

Evaluation is the hardest part of this whole space and most platforms hand-wave it. The failure mode that actually bites in production isn't crashes or schema errors. It's slow drift in subjective quality (voice, classification accuracy, output style) that only shows up when a human reads 50 outputs in a row. How does PandaProbe handle that in practice? LLM-as-judge with custom rubrics, human-in-loop on a held-out set, embedding-distance from a golden corpus, or something else? And how do you stop eval cost from outpacing inference cost when you're re-judging every trace?
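
On the eval-cost question this comment raises, one common way to keep judge cost from outpacing inference cost (a generic sketch, not a claim about how PandaProbe implements it) is to judge only a deterministic random sample of traces rather than every one:

```python
import random

def sample_for_eval(trace_ids, budget_fraction=0.1, seed=0):
    """Pick a deterministic random subset of trace IDs to send to an
    LLM judge, bounding eval spend at roughly budget_fraction of
    traffic. A fixed seed makes the sample reproducible across runs."""
    rng = random.Random(seed)
    k = max(1, int(len(trace_ids) * budget_fraction))
    return sorted(rng.sample(list(trace_ids), k))

traces = list(range(1000))
picked = sample_for_eval(traces, budget_fraction=0.05)
print(len(picked))  # 50
```

Sampling trades coverage for cost: slow subjective drift of the kind the commenter describes still surfaces in aggregate scores, while per-trace judge calls stay at a fixed fraction of traffic.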

Handling state and debugging for long-running autonomous agents is usually a nightmare, so having an open-source platform to standardize that workflow is huge. I can definitely see myself using PandaProbe to self-host my agent evaluation pipeline to keep sensitive client data entirely local. I am really curious to hear if you currently support custom tracing for raw API calls instead of just the standard frameworks.

Really nice work. The gap between "it ran" and "I understand what happened" is enormous for agents and nobody's solved it cleanly yet. Rooting for you!

Congrats on another great product going live! Does it support MCP tool tracing natively, or do you have to instrument those calls manually?

About PandaProbe on Product Hunt

open-source agent engineering platform

PandaProbe launched on Product Hunt on May 3rd, 2026 and earned 378 upvotes and 25 comments, earning #2 Product of the Day. PandaProbe is an open-source agent engineering platform that gives you deep observability into AI agent applications. Use it to trace, evaluate, monitor and debug your AI agents in development and production.

PandaProbe was featured in Open Source (68.4k followers), Developer Tools (512.4k followers), Artificial Intelligence (468.5k followers) and GitHub (41.2k followers) on Product Hunt. Together, these topics include over 194.2k products, making this a competitive space to launch in.

Who hunted PandaProbe?

PandaProbe was hunted by Sina Tayebati. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.

Want to see how PandaProbe stacked up against nearby launches in real time? Check out the live launch dashboard for upvote speed charts, proximity comparisons, and more analytics.