How Karpathy Killed His Own App On Stage — And Why We Rebuilt MenuGen Anyway
A year ago Andrej Karpathy showed up to a vibe-coding hackathon with basically zero web-dev experience and decided to build the app he'd been wanting: photograph a restaurant menu, see what every dish actually looks like. He called it MenuGen. Cursor and Claude wrote 100% of the code. He pushed it live on Vercel with Clerk for auth, Stripe for payments, and Replicate for image generation. Call it "vibe coding": you tell the model what you want, the model writes the code, and you barely glance at the diff.
The app worked. He even charged for it. But the blog post he wrote wasn't exactly a victory lap.
The hardest part, he said, was never the code. It was wrangling all the IKEA furniture of modern web development: Clerk OAuth dashboards, Stripe webhook secrets, Vercel environment variables, Google Cloud OAuth consent screens, half a dozen services that each want their own dashboard logins and secret tokens you have to copy-paste. He spent more time hunting through settings UIs than tweaking prompts. His exact words: "the plethora of services you have to assemble like IKEA furniture to make it real."
Six months later, at Sequoia AI Ascent 2026, Karpathy gave a talk on what he calls Software 3.0. And he killed his own app on stage. He showed that you can hand the same menu photo to Gemini, tell it to use Nano Banana to overlay food images directly into the pixels of the original menu, and skip the entire app. No OCR pipeline. No Replicate. No Stripe. No MenuGen. The model just does it.
So why are we publishing a new MenuGen today?
Because the demo Karpathy showed is amazing — and also nothing like what real people actually need. A single prompt to a single multimodal model is a magic trick. It's not a product. It doesn't speak Persian. It doesn't handle a four-page bound menu where the dishes you want are buried on page three. It doesn't know that finding a real photo of پیتزا پپرونی is going to beat a generated one nine times out of ten. And it definitely doesn't install onto your phone for next Friday when you're staring at a Turkish menu in Istanbul.
Our MenuGen is the version after the magic trick.
You upload one photo, or twelve. We read every page with Gemini 2.5 Flash, deduplicate dishes across pages, and translate each dish name into whichever of twelve languages you picked: English, Persian, Spanish, French, German, Arabic, Chinese, Japanese, Italian, Portuguese, Turkish, or Korean. The whole UI flips to RTL when you switch to Arabic or Persian.
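To make the deduplication and RTL steps concrete, here is a minimal sketch in Python. It is not MenuGen's actual code: the function names, the normalization rule (Unicode NFKC plus case folding and whitespace collapsing), and the two-letter language codes are all assumptions made for illustration.

```python
import unicodedata

# Assumed language codes for the two RTL languages named above (Arabic, Persian).
RTL_LANGUAGES = {"ar", "fa"}


def normalize_name(name: str) -> str:
    """Collapse case, character-width and spacing differences so near-identical names match."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


def dedupe_dishes(pages: list[list[str]]) -> list[str]:
    """Merge dish names from several pages, keeping the first spelling of each dish."""
    seen: set[str] = set()
    unique: list[str] = []
    for page in pages:
        for dish in page:
            key = normalize_name(dish)
            if key and key not in seen:
                seen.add(key)
                unique.append(dish.strip())
    return unique


def text_direction(lang_code: str) -> str:
    """Return the value a page could use for its `dir` attribute."""
    return "rtl" if lang_code.lower() in RTL_LANGUAGES else "ltr"
```

Calling `dedupe_dishes` on two pages that both list a pepperoni pizza, with different capitalization or spacing, keeps only the first spelling. A real pipeline would probably need fuzzier matching than exact normalized equality, for example when the same dish is spelled slightly differently on two pages.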
For each dish we do something smarter than just generating. We ask a grounded Gemini call to search the web for an actual photograph of that dish (from a food blog, Wikipedia, a real restaurant), and we fall back to generating with Flux Pro only if no real photo turns up. Real photos look like real food. Generated photos look like stylized food photography. We prefer the real ones when they exist.
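The real-photo-first rule boils down to a simple fallback. The sketch below shows the shape of it in Python under stated assumptions: `search_real_photo` and `generate_image` are hypothetical callables standing in for the grounded search call and the image generator, not real SDK functions, and the `"web"` / `"generated"` labels are invented for the example.

```python
from typing import Callable, Optional


def choose_image(
    dish: str,
    search_real_photo: Callable[[str], Optional[str]],  # hypothetical: returns a photo URL or None
    generate_image: Callable[[str], str],                # hypothetical: returns a generated image URL
) -> tuple[str, str]:
    """Prefer a real photo; generate one only when the search finds nothing."""
    url = search_real_photo(dish)
    if url:
        return ("web", url)
    # The generator is called only on this fallback path.
    return ("generated", generate_image(dish))
```

Passing the two steps in as functions keeps the fallback logic testable without network access, and returning the source label alongside the URL lets the UI tell users whether they are looking at a found photo or a generated one.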
And the IKEA assembly Karpathy complained about? That's just gone. There's no Clerk dashboard to register with. No Stripe webhook to wire. No Google Cloud project to spin up. The AI Pass SDK handles auth via a single button you drop into the page, payments and credits via the same widget, and model access via one OAuth-backed client ID that the server substitutes at publish time. The entire MenuGen frontend is one self-contained HTML file that you can install to your home screen as a PWA.
Try it: https://aipass.one/spaces/aipass/menugen
Tips for the best results: take the photo head-on, not at an angle, with the dish names clearly in focus. If a menu is multi-page, scan all the pages at once — we deduplicate, so a dish printed twice won't show up twice. And if a dish name is in a language Gemini doesn't recognize, the original script is preserved underneath the translated name so you can still match it to the menu.
The Karpathy lesson is real. Single multimodal models are eating thin app wrappers. The right response isn't to mourn the wrapper — it's to build the version of the app that the one-shot prompt can't replace. Multi-page, multi-language, web-grounded, installable, free to try. That's the version you can ship on a platform that already solved the IKEA furniture problem.