APIEval-20 is a black-box benchmark for API testing agents. Each agent gets only a JSON schema and one sample payload, then generates a test suite. We run those tests against live reference APIs with planted bugs and score bug detection, API coverage, and efficiency. Unlike LLM-as-judge evals, scoring is fully objective: a bug is either caught or it isn’t. Tasks span auth, errors, pagination, schemas, and multi-step flows. Open on Hugging Face.
Hey Product Hunt,
I’m Abhishek, CEO of KushoAI.
We built APIEval-20 because many AI agents now claim to handle API testing, but there was no reliable way to verify those claims.
The evaluations we found usually had one of three gaps. They assumed source code access, depended on detailed documentation, or checked whether the output looked valid instead of measuring actual bugs found.
That felt far from how most teams test APIs in practice.
So we built a black-box benchmark.
Schema and payload in. Nothing else.
The agent generates a test suite. We run those tests against live reference APIs with planted bugs. The score comes from what the agent actually catches: bug detection, API coverage, and efficiency.
No LLM judges. No subjective calls. A bug is either caught or missed.
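To make the contract concrete, here's a minimal sketch of what a task input and the objective scoring could look like. The field names, schema, and scoring formulas are illustrative assumptions, not the published dataset format.

```python
# Illustrative sketch of an APIEval-20-style task and its scoring.
# Field names and structure are assumptions, not the actual dataset format.

# What the agent sees: one JSON schema and one sample payload. Nothing else.
task_input = {
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "email": {"type": "string", "format": "email"},
            "role": {"type": "string", "enum": ["admin", "member"]},
        },
        "required": ["id", "email"],
    },
    "sample_payload": {"id": 42, "email": "jane@example.com", "role": "member"},
}

def score(planted: set, caught: set, tests_run: int) -> dict:
    """Objective scoring: each planted bug is either caught or missed.
    (API coverage omitted here for brevity.)"""
    detected = planted & caught
    return {
        "bug_detection": len(detected) / len(planted),
        "efficiency": len(detected) / tests_run,  # bugs found per test executed
    }

# A run that catches 1 of 2 planted bugs with 25 generated tests:
print(score({"null-email", "pagination-off-by-one"}, {"null-email"}, tests_run=25))
# -> {'bug_detection': 0.5, 'efficiency': 0.04}
```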
The part I’m most proud of is the complexity taxonomy. Sending nulls to every field is easy. The real test is whether an agent can reason about field relationships, auth behavior, pagination, error handling, schema constraints, and multi-step flows. That is where stronger agents start to separate from weaker ones.
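For a sense of what the harder tiers demand, here is the kind of test an agent would need to generate. The endpoint, parameters, and response shape are hypothetical stand-ins; real suites run against the benchmark's reference APIs.

```python
# Sketch of a higher-tier test: pagination consistency, not just null-blasting.
# Endpoint, params, and response shape are hypothetical.
import requests

BASE_URL = "https://api.example.com"  # placeholder for a reference API

def test_pagination_no_duplicates_or_gaps():
    seen_ids, page = [], 1
    while True:
        resp = requests.get(f"{BASE_URL}/items", params={"page": page, "limit": 10})
        assert resp.status_code == 200, f"unexpected status on page {page}"
        items = resp.json()["items"]
        if not items:
            break
        seen_ids.extend(item["id"] for item in items)
        page += 1
    # A planted off-by-one bug surfaces as a duplicated or skipped record.
    assert len(seen_ids) == len(set(seen_ids)), "duplicate records across pages"
```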
APIEval-20 is open on Hugging Face. We are also putting together a separate research report with a leaderboard comparing major AI agents. If you run your agent on the benchmark before then, we would love to include your results.
Two questions for the community:
1. What domains or API patterns should we add next?
2. If you are building a testing tool or agent, would you want your results included in the leaderboard?
I’ll be here all day. Drop a comment or reach us at [email protected]
APIEval-20 was hunted by Abhishek Saikia. A “hunter” on Product Hunt is the community member who submits a product to the platform — uploading the images, the link, and tagging the makers behind it. Hunters typically write the first comment explaining why a product is worth attention, and their followers are notified the moment they post. Around 79% of featured launches on Product Hunt are self-hunted by their makers, but a well-known hunter still acts as a signal of quality to the rest of the community. See the full all-time top hunters leaderboard to discover who is shaping the Product Hunt ecosystem.