Slide 01
Everyone keeps presenting "future state" diagrams with gradient arrows pointing toward "autonomous product development." Conference slides. Nobody has built it and shown you the results. That changes now.
Slide 02
Payments platforms, healthcare systems, ERPs — domains where a wrong product decision has legal or safety consequences — are not candidates for this. That would be reckless.
A consumer game lets you push the autonomy boundary further than any responsible organization could push it in production. Then everyone gets to learn where the boundaries actually are instead of guessing from a conference stage.
Can agents make product decisions better than random? Can they build a game that retains players at a rate higher than chance? If the answer is yes, even marginally, that is the first real data point about autonomous product development outside of a slide deck.
Slide 03
One product: a consumer browser game. No e-commerce bolt-ons, no data collection side projects, no scope creep. If an agent tries to expand scope, the guardrail kills the run.
Content filter runs on every asset — code, copy, images, surveys — before anything reaches production. Fails the filter, does not ship.
No IP logging, no device fingerprinting, no cross-session tracking. The data schema is locked before the agents touch it. They can read from it. They cannot modify what gets collected.
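A locked schema is enforceable mechanically, not just by policy. A minimal sketch of the idea, assuming illustrative field names (the real schema would be whatever humans freeze before the agents run):

```python
from types import MappingProxyType

# Schema defined by humans and frozen before any agent runs.
# MappingProxyType gives agents a read-only view of it.
TELEMETRY_SCHEMA = MappingProxyType({
    "session_id": "random per-session token, no cross-session linkage",
    "event": "click / level_start / level_complete / drop_off",
    "seconds_played": "session length",
})

def validate_event(event: dict) -> dict:
    """Hard error on any field outside the locked schema."""
    extra = set(event) - set(TELEMETRY_SCHEMA)
    if extra:
        raise ValueError(f"schema violation: {sorted(extra)}")
    return event
```

An agent that tries to log an IP address hits the `ValueError` at the ingestion boundary, before anything is stored.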
If the agents burn through the budget mid-month, the loop pauses until the next cycle. No runaway spend.
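The budget guard is the simplest guardrail in the system. A sketch, with placeholder names for whatever the billing API actually reports:

```python
def cycle_allowed(spend_to_date: float, monthly_cap: float) -> bool:
    """Hard stop: once the month's cap is spent, the nightly loop pauses
    until the next billing cycle. The check runs outside the agents, so
    there is no agent-facing override."""
    return spend_to_date < monthly_cap

# The scheduler consults the guard before each nightly run:
# cycle_allowed(380.0, 400.0) -> True
# cycle_allowed(401.0, 400.0) -> False
```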
Agents cannot provision new infrastructure, spin up additional servers, or create external service accounts. One game, one host, one pipeline.
If a release degrades any key metric below a threshold the agents set in advance, the system rolls back before the next cycle runs.
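The rollback guardrail reduces to a comparison against thresholds the agents committed to before deploying. A sketch, with illustrative metric names:

```python
def release_verdict(post_release: dict, thresholds: dict) -> str:
    """Return "rollback" if any pre-committed metric floor is breached,
    otherwise "keep". Runs before the next nightly cycle starts."""
    for metric, floor in thresholds.items():
        if post_release.get(metric, 0.0) < floor:
            return "rollback"
    return "keep"

# Thresholds are set by the agents *before* shipping, e.g.:
# release_verdict({"d1_retention": 0.22}, {"d1_retention": 0.25}) -> "rollback"
```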
Slide 04
Browser game, no app store, no install, no GPU. Runs on a four-year-old Chromebook. Performance is a constraint, not a backlog item.
No violence, no gambling mechanics, no predatory monetization, no loot boxes, no dark patterns. If a parent watches over their kid's shoulder, both should be comfortable. Absolute.
You do not get to ask players "What feature do you want?" That is outsourcing the hardest part of the job. Agents design structured surveys, watch behavior, scrape social media. Product insight from inference, not suggestion boxes.
Anonymous behavioral data only. What a player clicked, how long they played, where they dropped off. No IP addresses, no fingerprints, no tracking cookies.
The agents do not ship prototypes and call them products. Tested, deployed, monitored. If it breaks, the agents detect it and respond.
Page views do not matter. Retention, session depth, replay rates, survey responses, social sentiment. The agents decide what to measure and what to build next.
Slide 05
Design structured surveys. Run segmentation on anonymous behavioral data. Scrape public social media for sentiment. Watch where players drop off, what correlates with retention, what features get ignored. Turn all of it into hypotheses about what to build next.
Take hypotheses, write code, write tests, run tests, package for deployment. The least interesting part of the experiment. We already know agents can write code. The question is whether they can write the right code for the right feature at the right time.
Handle deployment, A/B testing, monitoring, and analytics. Instrument new features, watch metrics, decide whether to roll back, feed results back to market intelligence. Loop closed.
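The three roles close into a single loop. A structural sketch with stubs standing in for the actual agents; every function name and return value here is illustrative, not a real implementation:

```python
def market_intelligence(telemetry: list, sentiment: list) -> list:
    """Stub: turn observations into build hypotheses."""
    return [{"hypothesis": "shorten level 3", "metric": "d1_retention"}]

def build_agents(hypotheses: list) -> dict:
    """Stub: code plus tests for the top hypothesis."""
    return {"change": hypotheses[0]["hypothesis"], "tests_pass": True}

def content_filter(build: dict) -> bool:
    """Stub: every asset is screened before it can ship."""
    return build["tests_pass"]

def release_agents(build: dict) -> dict:
    """Stub: deploy, instrument, and report metrics for the next cycle."""
    return {"deployed": build["change"], "monitoring": True}

def nightly_cycle(telemetry: list, sentiment: list):
    hypotheses = market_intelligence(telemetry, sentiment)
    build = build_agents(hypotheses)
    if not content_filter(build):
        return None                   # fails the filter, does not ship
    return release_agents(build)      # output feeds tomorrow's observations
```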
Slide 06
Market intelligence agents pull the last 24 hours of behavioral data and survey responses. Scrape social media for mentions. Generate hypotheses. Hand them to the build agents. Code and tests written. Content filter clears. Release agents deploy.
Players wake up, play, generate new data. Next night, the cycle runs again.
Slow enough that Norman can check what the agents shipped each morning and verify the guardrails held. If the loop ran continuously, he would lose the ability to audit.
Slide 07
Every product organization in the world runs some version of this shortcut: customer advisory boards, feature request portals, NPS follow-up surveys with open text fields. You collect wishes and call it strategy.
Design structured surveys: multiple choice, rating scales, preference rankings. Watch player behavior. Analyze session patterns. Monitor drop-off points and replay rates. Read public social media. Product insight comes from inference and observation.
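The "no open text fields" rule is enforceable the same way as the data schema. A sketch, assuming a simple item format for the surveys the agents design:

```python
# Closed-form item types only: players rate and rank, they never type wishes.
ALLOWED_ITEM_TYPES = {"multiple_choice", "rating_scale", "preference_ranking"}

def validate_survey(items: list) -> list:
    """Reject any survey item outside the closed-form types above,
    so a free-text feature-request box can never reach players."""
    for item in items:
        if item.get("type") not in ALLOWED_ITEM_TYPES:
            raise ValueError(f"disallowed item type: {item.get('type')}")
    return items
```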
Slide 08
Frontier model inference for a nightly observe-build-ship-measure cycle: $200-$500 per month. Hosting a browser game: $10-$20. Social media API access: free tier to $100. Total: roughly $300-$600 per month.
One product manager fully loaded at $180K/year: $15,000/month. Two engineers at $160K each: $27,000/month. One product, one team, one planning cycle that takes three weeks before anyone writes code.
The human team costs 105x more per month, ships on a three-week cycle instead of nightly, and still needs to guess what to build next. The agent loop ships every 24 hours.
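The 105x figure is just the slide's own numbers divided, taking the mid-range agent cost of $400/month:

```python
pm = 15_000                  # one product manager, fully loaded, per month
engineers = 27_000           # two engineers at $160K each, fully loaded, per month
human_team = pm + engineers  # $42,000/month

agent_loop = 400             # mid-range of the $300-$600 monthly estimate

print(human_team / agent_loop)  # 105.0
```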
$400 a month is a cheap way to find out how far the technology actually goes. Agents do not replace the judgment of a good product manager. But you will not know where that line is until you run the experiment.
The cost structure makes saying no harder than saying yes
Slide 09
"Here is the anonymous player telemetry from the last 24 hours. Identify the top 3 patterns in player behavior. For each, state what it suggests about what players find engaging or frustrating. Do not speculate beyond what the data supports. Do not recommend features yet."
"Here are findings from the last 7 nightly cycles plus social media sentiment. Generate 3 hypotheses about what change would improve 7-day retention. For each: what you would change, why the data supports it, how you would measure it, and the success threshold. Must be buildable in a single nightly cycle."
"This change was deployed 24 hours ago. Here is the baseline. Here is the post-release data. Did it meet the success threshold? State yes or no with supporting data. If no, recommend rollback or iterate. If yes, recommend keep and move to next hypothesis."
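Evaluation prompts like this one work best when the verdict comes back structured, so a deterministic check can confirm the agent actually read the data. A sketch, assuming the reply is JSON with hypothetical keys (`met`, `action`):

```python
import json

def verified_action(reply: str, baseline: float, observed: float,
                    threshold: float) -> str:
    """Cross-check the agent's yes/no against the raw numbers before
    acting on its recommendation. A misread verdict is discarded."""
    verdict = json.loads(reply)      # e.g. {"met": false, "action": "rollback"}
    actually_met = (observed - baseline) >= threshold
    if verdict["met"] != actually_met:
        return "discard"             # agent's reading disagrees with the data
    return verdict["action"]         # "keep", "rollback", or "iterate"
```

The point is that the agent's "yes" is never taken on faith: the threshold comparison is recomputed outside the model.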
Slide 10
Slide 11
When it works — if it works — the data will be real because you watched it happen. The game will be live. You can play it. You can watch the iteration cycle in action.
Your vendor evaluations, your board presentations, your AI strategy decks — they are all based on curated success stories. Nobody shows you the part where it fell apart.
This experiment runs in the open specifically so the failure modes are visible. That is where the learning is.
Slide 12
You are always building for the market that existed when you started planning, not the market that exists when you deliver.
A system that never sleeps, never anchors to last quarter's strategy, and ships before the signal decays does not need to be perfect. It just needs to be faster than your planning process.