One Hundred POCs a Day


Picture this.

You wake up at 6:14 on a Tuesday. Not to an alarm. To a notification.

Your agent orchestration layer, the thing that coordinates everything while you are not looking, left you a message at 3 AM.

Three words at the top: Review these three.

Below that, three proofs of concept. Fully functioning. Deployed to your test environment. Each one with a summary, a synthetic user report, a confidence score, a link to the running instance, and a complete go-to-market plan.

You did not ask for them.

You did not write a brief. You did not file a ticket. You did not schedule a brainstorm with product and engineering to “ideate” on what to build next quarter.

The agents did it while you slept.

What Happened Overnight

Between midnight and 6 AM, the agents started with context. Not a prompt you wrote that night. The accumulated context of your entire system. Customer data. Usage patterns. Support tickets. Sales conversations. Competitor movements. Industry signals. Your strategy documents. The roadmap. The backlog. The stuff you explicitly deprioritized and why. External market data. Analyst reports. Social chatter. All of it. Not summarized into a deck. Ingested as operating context.
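If you want to see the shape of that ingestion step, here is a rough sketch in Python. Every name in it is invented; the point is that each source gets registered once and refreshed on its own cadence, instead of being pasted into a prompt.

```python
# Hypothetical sketch of registering heterogeneous context sources so agents
# start from accumulated operating context rather than a single prompt.
# ContextSource, load_support_tickets, and the file names are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ContextSource:
    name: str                   # e.g. "support_tickets", "usage_metrics"
    loader: Callable[[], str]   # returns text the agents can ingest
    refresh_hours: int          # how stale this source is allowed to get

def load_support_tickets() -> str:
    # Placeholder: in practice this queries your ticketing system's API.
    return "ticket summaries..."

OPERATING_CONTEXT = [
    ContextSource("support_tickets", load_support_tickets, refresh_hours=24),
    ContextSource("strategy_docs", lambda: open("strategy.md").read(), refresh_hours=168),
    # ...usage patterns, sales transcripts, competitor signals, analyst reports
]

def build_context_bundle() -> str:
    # Concatenate (or embed and index) every source into one operating context.
    return "\n\n".join(f"## {s.name}\n{s.loader()}" for s in OPERATING_CONTEXT)
```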

From that context, the agents identified the top one hundred features worth building. Not a brainstorm list. Not sticky notes on a virtual whiteboard. One hundred discrete, scoped, buildable features, each one tied to a strategic signal, a customer need, or a competitive gap.

Then they built them. In your codebase.

All one hundred. Each one branched from your actual repository. Written against your existing architecture. Using your component library, your design system, your API patterns, your data models. Not greenfield experiments bolted onto the side. Features that fit into the product your customers already use, following every convention your engineering team established.

The agents read your codebase the way a senior engineer would in their first week. They understood the folder structure. They followed the naming conventions. They used the shared components instead of reinventing them. They matched your UX patterns. The same navigation flows, the same form behaviors, the same error handling, the same responsive breakpoints, the same accessibility standards your team already ships. A customer using one of these POCs would not know it was built overnight. It looks and feels like the rest of your product because it was built from the same source, in the same style, by agents that understood the system they were extending.

All one hundred. Working code. Deployed to isolated test environments from feature branches off your main branch. Functioning software, integrated into your product, consistent with your UX, ready to click through and use.

Ninety minutes.
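The branch-per-POC mechanics behind that number are mundane. A minimal sketch, assuming plain git plus whatever preview-environment tooling you already run; the helper names and the sandbox URL scheme are invented.

```python
# Sketch of the branch-per-POC workflow: one feature branch off main per POC,
# pushed so your existing preview-environment tooling can build and deploy it.
import subprocess

def create_poc_branch(poc_id: str, base: str = "main") -> str:
    branch = f"poc/{poc_id}"
    subprocess.run(["git", "checkout", base], check=True)
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    return branch

def deploy_to_sandbox(branch: str) -> str:
    # Placeholder: push the branch and let your ephemeral-environment platform
    # (whatever you already use for review apps) pick it up. URL is illustrative.
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    return f"https://{branch.replace('/', '-')}.sandbox.example.com"
```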

Synthetic Customers

Once the hundred POCs were running, the agents tested them with people.

Synthetic people.

The system spun up personas built from your real customer archetypes. Drawn from actual usage data, support history, purchasing patterns, and behavioral signals. A mid-market ops director who hates onboarding friction. A developer at an enterprise client who needs the API to work a specific way. A CFO who will never click more than twice to find a number.
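A persona like that is just structured data. A minimal sketch, with invented field names, built from aggregated segment behavior rather than any individual customer record:

```python
# Illustrative persona schema, not a real library. Fields come from aggregated
# behavioral data (usage, support history, purchasing patterns), never from PII.
from dataclasses import dataclass

@dataclass
class SyntheticPersona:
    archetype: str            # e.g. "mid-market ops director"
    goals: list[str]          # what they are trying to accomplish
    frictions: list[str]      # what makes them abandon a flow
    patience_clicks: int      # how many clicks before they give up
    derived_from: str = "aggregated segment data"  # provenance note, never a customer record

cfo = SyntheticPersona(
    archetype="CFO",
    goals=["find a number fast", "export it for the board deck"],
    frictions=["any flow longer than two clicks"],
    patience_clicks=2,
)
```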

These synthetic customers used the features. They navigated them. They tried to break them. They gave structured feedback, the kind you would get from a well-run usability study, except it happened at 2 AM and it took eleven minutes.

Then the agents ran simulations. Load patterns. Edge cases. Failure modes. What happens when the synthetic CFO exports to Excel and the date format is wrong. What happens when the synthetic developer hits the rate limit on the third call. What happens when the synthetic ops director tries to onboard twelve users at once and three of them have the same email domain.

Real scenarios. Grounded in real data. Executed against real running software.
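A single scenario run can be expressed the same way. This is a sketch, not a framework; run_scenario stands in for whatever browser or API driver you point at the deployed POC, and the hard-coded finding echoes the CFO export case above.

```python
# Sketch of one scenario run: a persona executes a scripted flow against the
# deployed POC and returns structured findings. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Finding:
    persona: str
    scenario: str
    severity: str    # "blocker", "friction", or "cosmetic"
    detail: str

def run_scenario(archetype: str, scenario: str, poc_url: str) -> list[Finding]:
    # In practice this drives a browser or API client through the flow at poc_url,
    # watching status codes, exports, and permission checks along the way.
    findings: list[Finding] = []
    # Illustrative result, echoing the export case above:
    findings.append(Finding(archetype, scenario, "friction",
                            "Excel export uses the wrong date format"))
    return findings

report = run_scenario("CFO", "export quarterly numbers",
                      "https://poc-042.sandbox.example.com")
```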

These are not random personas a language model hallucinated. They are data-grounded simulations modeled on your actual customers. They miss the irrational stuff. The customer who calls support and yells for twenty minutes about something that is not broken. The enterprise buyer who makes decisions based on a golf conversation. But they catch the structural stuff. The onboarding friction. The confusing navigation. The API that returns a 200 when it should return a 201. The export that truncates at row ten thousand. The permission model that does not account for the contractor role.

Eighty percent of what a two-week usability study would catch. In eleven minutes. At 2 AM. Without scheduling a single meeting.

You still need real customers. You still need a human product leader with judgment and taste and the ability to read body language in a room. But you do not need real customers to decide whether a concept is worth pursuing. You need them to decide whether a validated concept is ready to ship.

Every POC Got a Go-to-Market Plan

The agents did not stop at software. Every single POC, all one hundred, came with a complete go-to-market plan. Not a bullet point. A plan you could hand to your head of marketing on Monday morning and start executing.

Positioning. Specific to your market, your competitors, and the customer segment the POC targets. Written against your brand voice. Referencing competitors by name. Identifying the gap this feature fills and why the timing is now.

Pricing. Built from your current pricing structure, your competitors’ published pricing, the willingness-to-pay signals buried in your sales call transcripts, and the usage patterns that indicate which customers would upgrade for this feature. Three price point scenarios. Attach rates by segment. Revenue impact in quarter one versus quarter four.

Launch sequence. Week one: internal enablement, what your sales team needs to know, what your CS team needs to say. Week two: beta cohort, which ten customers to approach first, why those ten, what you are measuring, exit criteria. Week three: controlled release with instrumentation. Week four: general availability with a campaign brief, social copy, email sequences, and a landing page wireframe.

Competitive response. If you ship this, what does your closest competitor do? Three scenarios: they ignore it, they copy it within a quarter, or they leapfrog it. A playbook for each. Grounded in their release cadence, their public roadmap, their engineering team size, and the signals from their recent hires.

Risk. Regulatory exposure, data privacy implications, infrastructure cost at scale, customer confusion if this changes an existing workflow, cannibalization against your own features. Not a generic risk matrix. A specific analysis tied to your business, your market, and your compliance obligations.

All one hundred POCs. All one hundred plans. Built between midnight and 4 AM.
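Filtering a hundred of these only works if the plans are structured data, not prose. One possible shape, with invented field names:

```python
# One way to represent the go-to-market plan attached to each POC so it can be
# compared and filtered programmatically. The schema is illustrative, not prescribed.
from dataclasses import dataclass

@dataclass
class GoToMarketPlan:
    positioning: str                      # gap filled, named competitors, why now
    price_points: dict[str, float]        # e.g. {"low": ..., "mid": ..., "high": ...}
    attach_rate_by_segment: dict[str, float]
    launch_sequence: list[str]            # weeks 1-4: enablement, beta, controlled release, GA
    competitive_response: dict[str, str]  # {"ignore": playbook, "copy": ..., "leapfrog": ...}
    risks: list[str]                      # regulatory, privacy, infra cost, cannibalization
```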

By the time the filtering started, the agents were evaluating businesses. Not features.

The Funnel

One hundred POCs entered.

The first cut came from a panel of product manager agents. One focused on market fit, one on technical feasibility at scale, one on strategic alignment, one on customer impact. They reviewed the synthetic feedback, the usage data, the go-to-market plans. They debated each other. An actual adversarial review where one agent argued for a feature and another argued against it based on conflicting priorities.
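The panel itself is straightforward to sketch. score_with_llm is a placeholder for whatever model call you make; the roles, the threshold, and the scoring scheme are assumptions, not a recipe.

```python
# Sketch of the adversarial panel: each reviewer role scores a POC, an explicit
# critic argues against it, and only POCs that survive the debate advance.
REVIEW_ROLES = ["market_fit", "feasibility_at_scale", "strategic_alignment", "customer_impact"]

def score_with_llm(role: str, poc: dict) -> float:
    # Placeholder for a model call: an agent acting in `role` scores the POC 0..1,
    # citing the synthetic feedback and the go-to-market plan as evidence.
    return 0.5  # fixed value so the sketch runs end to end

def panel_review(poc: dict, threshold: float = 0.25) -> bool:
    scores = [score_with_llm(role, poc) for role in REVIEW_ROLES]
    objection = score_with_llm("critic", poc)  # the critic's whole job is to argue against
    # Survive only if the averaged case, discounted by the objection, clears the bar.
    return (sum(scores) / len(scores)) * (1 - objection) >= threshold
```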

Fifty survived.

Those fifty went through another round. More simulation. Tighter personas. Harder edge cases. The agents iterated on the code. Tightened the UX. Fixed the failure modes the first round uncovered. Re-ran the synthetic customers against the improved versions. The go-to-market plans got sharper too. Pricing models adjusted when usage patterns diverged from the initial assumption. Launch sequences rewritten when the beta cohort analysis pointed to different customers.

Twenty-five survived.

Then twelve.

Then six.

Then three.

Those three were sitting in your test environment when you opened your laptop. Working software on a feature branch, validated user feedback, and a business plan ready for Monday.

The other ninety-seven were catalogued. Scored. Sitting in a queue with full context. Why they were deprioritized, what would need to change for them to move up, how much work each one needs to reach production.
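The funnel, reduced to its skeleton, is repeated rounds of re-testing and re-scoring, with everything that falls out catalogued rather than thrown away. A sketch, with invented field names:

```python
# Sketch of the filtering funnel: survivor counts shrink 100 -> 50 -> 25 -> 12 -> 6 -> 3,
# and every cut POC is kept in a catalogue with the rationale for deprioritizing it.
def run_funnel(pocs: list[dict], round_sizes=(50, 25, 12, 6, 3)):
    catalogue = []   # deprioritized POCs, kept with full context
    survivors = pocs
    for target in round_sizes:
        # Between rounds: re-test, iterate on code and plans, re-score (see panel_review above).
        ranked = sorted(survivors, key=lambda p: p.get("score", 0), reverse=True)
        survivors, cut = ranked[:target], ranked[target:]
        catalogue.extend({**p, "why_cut": "lower composite score this round"} for p in cut)
    return survivors, catalogue
```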

The POC Was Never the Problem. Your Process Was.

I wrote about this in The Customer Product Operating Model. The way most organizations “prove” a concept has nothing to do with proof and everything to do with engineering constraints that no longer exist.

Your product team gets an insight. A customer said something. A competitor shipped something. A pattern emerged in the data. In the old world, here is what happened next.

A PM wrote a spec. Twelve pages. Two weeks to draft, another week to review. Then a wireframe. Then a Figma prototype, high fidelity, interactive, pixel-perfect, that took a designer a week and a half. Then a stakeholder review where fourteen people clicked through the Figma and gave contradictory feedback. Then a revised Figma. Then a PRD update. Then a sprint planning discussion about when engineering could fit it in.

Eight weeks later, if you were lucky, an engineering team built something that approximated what the PM described on paper and the designer mocked in Figma. The customer saw it and said “that is not what I meant.”

Nobody even thought about go-to-market until the feature was half built. Pricing was a conversation that happened after launch. Competitive positioning was a slide someone threw together the week before the press release. The launch sequence was whatever marketing could scramble in the time engineering left them.

That was your POC process. Not a proof of concept. A proof of coordination. A two-month relay race from insight to artifact.

The constraint was engineering capacity. When building is expensive, you cannot afford to build the wrong thing. So you invested enormous effort in defining the right thing on paper. Wireframes. Figma mockups. Specs. PRDs. Acceptance criteria. Stakeholder alignment decks. All of it hoping that documents could substitute for working software.

Building is not expensive anymore.

A POC that used to take three engineers eight weeks takes agents ninety minutes. Not a Figma clickthrough. Not a wireframe with annotations. Not a twelve-page spec that nobody reads twice. Working software with a go-to-market plan attached. Real code, real data, real interactions, real business case, deployed to a test environment you can use.

When the cost drops that far, the math changes completely. You do not build two POCs a quarter and pray you picked the right ones. You build a hundred overnight and let the system tell you which three are worth your time.

Agents in Roles

Most people think about one agent doing one thing. A coding agent that writes code. A testing agent that runs tests. A summarization agent that writes reports.

That is not what is happening here.

This system has agents performing roles. Product manager agents evaluate market fit. Engineering agents build and iterate. QA agents stress-test with synthetic scenarios. Strategy agents score alignment against the company roadmap. Design agents evaluate usability against established heuristics. Marketing agents build positioning and campaign briefs. Pricing agents model revenue scenarios. Competitive intelligence agents track competitor behavior and predict responses.

They hold context. They form positions. They argue with each other. The product manager agent writes a brief explaining why this feature addresses a gap that three enterprise customers flagged in Q4, and why the current workaround costs them eleven hours a week. The marketing agent builds a launch sequence that accounts for your sales cycle length, your customer’s buying committee structure, and the trade show six weeks from now. The engineering agent makes architectural decisions based on the existing codebase and documents why it built what it built so the human reviewing the POC can challenge the reasoning, not just the code.

Multi-agent orchestration with role-based specialization. Every piece of this exists today. You can build this in March 2026. I know because I am building it.
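The role layer is less exotic than it sounds. A sketch of the registry, with made-up role names and context labels; any orchestration framework can route artifacts between roles defined like this:

```python
# Sketch of role-based specialization: each role is an objective plus the context
# slices it is allowed to see. Role names, labels, and dispatch are illustrative.
ROLES = {
    "product_manager": {"objective": "argue for/against market fit",  "context": ["tickets", "usage"]},
    "engineer":        {"objective": "build on existing architecture", "context": ["repo", "design_system"]},
    "qa":              {"objective": "break it with edge cases",       "context": ["personas", "failure_modes"]},
    "marketer":        {"objective": "positioning and launch plan",    "context": ["competitors", "brand_voice"]},
    "pricing":         {"objective": "model revenue scenarios",        "context": ["pricing", "sales_calls"]},
}

def dispatch(role: str, task: str, artifacts: dict) -> dict:
    # Stand-in for invoking an agent in `role` with only its permitted context slices.
    spec = ROLES[role]
    permitted = {k: v for k, v in artifacts.items() if k in spec["context"]}
    return {"role": role, "task": task, "inputs": list(permitted)}  # placeholder output
```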

Do This Safely

You are running autonomous agents against your real data, building software that touches your real systems, testing with synthetic customers modeled on your real users. That is powerful. That can also go wrong in ways that matter.

Isolate the test environments. No agent-built POC touches production data, production infrastructure, or production customers. Ever. Not until a human reviews it, approves it, and promotes it through your normal release process. The agents build in sandboxes. The sandboxes are disposable. Nothing leaks.

Ground your synthetic personas in real data, but anonymize them. The personas should reflect real behavioral patterns. They should never contain real PII. Build archetypes from aggregated data. Never from individual customer records.
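In code, that boundary is easy to make explicit. A sketch of archetype-building that only ever returns aggregates; the column names are invented:

```python
# Sketch: build archetypes from aggregated behavioral data only. Group by segment,
# keep statistics, and never carry identifiers or record-level rows forward.
import statistics
from collections import defaultdict

def build_archetypes(events: list[dict]) -> dict:
    # events: [{"segment": "mid-market", "onboarding_minutes": 42, ...}, ...]
    by_segment = defaultdict(list)
    for e in events:
        by_segment[e["segment"]].append(e["onboarding_minutes"])
    # Only aggregates leave this function; no user IDs, emails, or raw records.
    return {seg: {"median_onboarding_minutes": statistics.median(vals), "n": len(vals)}
            for seg, vals in by_segment.items()}
```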

Adversarial review is not optional. The filtering pipeline must include agents whose explicit job is to kill a POC. Not cheerleaders. Critics. If every POC survives, your filter is broken.

Human review is the gate to production. Agents propose. Humans dispose. The decision to invest real engineering time, real production infrastructure, and real customer exposure belongs to a human with judgment, context, and accountability.

Treat the go-to-market plans as drafts. The pricing model is a starting point. The competitive response analysis is informed speculation, not prophecy. The launch sequence is a first pass your team refines. The agents give you an eighty-percent head start. Your people close the last twenty.

Log everything. Every agent decision, every synthetic test result, every filtering rationale, every go-to-market assumption. When something goes wrong, and something will, you need to trace why the system made the choices it made.
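An append-only log is enough to start. A sketch, with an invented file name and field set:

```python
# Sketch of the audit trail: every agent decision, test result, and filtering
# rationale goes to an append-only JSONL file so choices can be reconstructed later.
import json, time

def log_decision(agent: str, action: str, rationale: str, artifacts: dict,
                 path: str = "agent_audit.jsonl") -> None:
    record = {"ts": time.time(), "agent": agent, "action": action,
              "rationale": rationale, "artifacts": artifacts}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```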

Start with one night. Not a hundred POCs. Start with ten. Review them in the morning. Evaluate the software and the business plans. Tune the personas. Adjust the filters. Build trust before you scale it. This is an operating discipline, not a parlor trick.


The Morning After

It is 6:47 AM. You have reviewed three proofs of concept. One of them is good. Genuinely good. It addresses a gap you had been thinking about for weeks, but you had not prioritized it because you assumed it would take a full sprint to validate and another month to build the business case.

Agents did it in three hours. The software works. The synthetic feedback is convincing. The go-to-market plan is specific enough that your head of marketing could start executing this week. The pricing model covers three segments. The competitive response analysis anticipated the objection your VP of Sales would have raised. The launch sequence names ten beta customers and explains why those ten.

You bring this to standup. Not as a pitch. Not as a mandate. As a candidate, with working software and a business plan attached. The kind of package that used to take three months to assemble.

Your engineers open the branch and see code that follows their conventions, uses their components, fits into the architecture they built. The diff looks like something a teammate wrote. The agents treated the codebase with the same discipline a good engineer would. They extended the system instead of bolting something onto the side.

Your marketing team refines the positioning. Your sales team validates the pricing against what they hear in the field. That is their job. That is the judgment that matters.

But everyone starts from a feature branch with working software and a business case. Not a blank page. Not a spec. Not a Figma prototype and a prayer.


This Is Not 2030

Everything I just described. The overnight builds. The synthetic customers. The adversarial filtering. The go-to-market plans. The multi-agent orchestration. The codebase-native feature branches. None of it requires technology that does not exist.

The models are here. The orchestration frameworks are here. The infrastructure patterns are well-understood. The synthetic persona techniques work. The multi-agent coordination is documented, tested, and running in production systems today.

You could have done this last month.

Not in theory. Not as a research project. Not as a slide in a strategy deck. You could have wired together the orchestration layer, pointed it at your data, spun up the test environments, and woken up to three validated POCs with go-to-market plans attached. In February 2026. With off-the-shelf models and open-source tooling.

While you were writing specs. While your designer was refining a Figma prototype. While your PM was in the third hour of a backlog grooming session. While your product team debated whether to build Feature A or Feature B this quarter. The answer was both, plus ninety-eight others, built in your codebase, tested by synthetic customers, filtered by adversarial review, and sitting in your test environment before breakfast.

A hundred POCs a day. Not two a quarter. Not one a sprint. A hundred a day. Each one with working software, synthetic validation, and a go-to-market plan your team can execute.

That is the bar now.

Why are you waiting?


Real experiences from operators building with AI in the SDLC, including what worked, what failed, and what changed. Please feel free to use this; just attribute it to agentdrivendevelopment.com.