Andrej Karpathy released a 630-line Python script in March 2026 and changed the way people think about AI-driven optimization. The project is called autoresearch, and the idea is simple: let an AI agent modify code, run a quick experiment, measure the result, keep what works, revert what doesn't, and repeat — all night, without human input.
In two days, the agent ran roughly 700 experiments and found about 20 genuine improvements. It reduced training time by 11 percent. It even found a bug that Karpathy himself had missed for months.
Shopify CEO Tobi Lutke ran the same pattern overnight. After 37 experiments, his smaller model outperformed a model twice its size by 19 percent.
Fortune magazine named it The Karpathy Loop. The GitHub repo passed 40,000 stars in its first two weeks.
Here is the part that matters for Excel users: Karpathy said the pattern works for "any metric you care about that is reasonably efficient to evaluate." That includes spreadsheet formulas, VBA macros, and prompt templates.
The Three Ingredients
The autoresearch pattern needs exactly three things:
- An editable artifact: the thing you are optimizing. In Karpathy's case, it was a training script. In Excel, it could be a formula, a macro, or a prompt.
- A measurable metric: a single number that tells you whether the change helped. Karpathy used validation loss (bits per byte). For Excel, this could be execution time, formula length, accuracy against test data, or file size.
- A time-boxed cycle with version control: each experiment has a fixed time budget. You try a change, measure the result, and either keep it or revert. Then you move on.
That is the entire framework. Everything else is implementation detail.
What This Looks Like in Excel
The pattern maps to at least four real spreadsheet scenarios.
Formula Optimization
You have a formula that works but feels overbuilt. Maybe it is a five-level nested IF, or an INDEX/MATCH chain that nobody on the team can read without a whiteboard.
The autoresearch version:
- Artifact: the formula itself.
- Metric: formula length (number of characters), accuracy against 20 known test cases, or calculation time.
- Loop: paste the formula into ChatGPT or Claude, ask for three alternative approaches, test each variant, keep the winner, and repeat.
Here is a real example. Start with this:
=IF(A2<10,"XS",IF(A2<20,"S",IF(A2<30,"M",IF(A2<40,"L","XL"))))
After one round of the loop, AI suggests:
=IFS(A2<10,"XS",A2<20,"S",A2<30,"M",A2<40,"L",TRUE,"XL")
After a second round, targeting readability:
=SWITCH(TRUE,A2<10,"XS",A2<20,"S",A2<30,"M",A2<40,"L","XL")
After a third round, using a lookup table approach:
=INDEX({"XS";"S";"M";"L";"XL"},MATCH(A2,{0,10,20,30,40},1))
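Each version is shorter than the nested IF and easier to modify. All four agree for inputs of zero and above. The lookup version returns #N/A for negative values, where the others return "XS", so include a negative input in your test cases if your data can contain one. That is three experiments. The autoresearch pattern just runs more of them, faster.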
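Whether the variants really return the same results is easy to check outside Excel. The sketch below uses hand-written Python models of the formulas, which is an assumption: they mirror Excel's behavior for numeric inputs only, and the function names are made up for illustration. `MATCH` with match type 1 finds the largest array value less than or equal to the lookup value, which `bisect_right` reproduces.

```python
from bisect import bisect_right
from typing import Iterable, Optional


def nested_if(x: float) -> str:
    """Model of the nested IF, IFS, and SWITCH(TRUE, ...) variants (same branch order)."""
    if x < 10:
        return "XS"
    if x < 20:
        return "S"
    if x < 30:
        return "M"
    if x < 40:
        return "L"
    return "XL"


def index_match(x: float) -> str:
    """Model of INDEX({...}, MATCH(x, {0,10,20,30,40}, 1))."""
    bounds = [0, 10, 20, 30, 40]
    labels = ["XS", "S", "M", "L", "XL"]
    pos = bisect_right(bounds, x)  # how many bounds are <= x
    if pos == 0:
        return "#N/A"  # MATCH finds no value <= x
    return labels[pos - 1]


def first_disagreement(inputs: Iterable[float]) -> Optional[float]:
    """Return the first input where the two models differ, or None."""
    for x in inputs:
        if nested_if(x) != index_match(x):
            return x
    return None


print(first_disagreement([0, 5, 9, 10, 19, 20, 29, 30, 39, 40, 45]))  # None
print(first_disagreement([-1, 0, 5]))  # -1
```

The second call shows why test inputs matter: the variants agree on every non-negative input and split on the first negative one.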
VBA Macro Optimization
VBA macros accumulate slowness over time. A data cleanup macro that took five seconds in 2022 now takes 45 seconds because the dataset grew and nobody optimized the code.
- Artifact: the VBA subroutine.
- Metric: execution time (using the Timer function), lines of code.
- Loop: paste the macro into an AI tool, describe the metric you care about, test each variant against the same dataset, log the results.
Start with a typical slow macro:
Sub CleanData()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets("Data")
    Dim lastRow As Long
    ' Last used row in column A
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    Dim i As Long
    ' Slow: every .Value call is a separate read or write against the sheet
    For i = 2 To lastRow
        ws.Cells(i, 2).Value = Trim(ws.Cells(i, 2).Value)
        ws.Cells(i, 3).Value = UCase(ws.Cells(i, 3).Value)
        If ws.Cells(i, 4).Value < 0 Then
            ws.Cells(i, 4).Value = 0
        End If
    Next i
End Sub
After running the loop, the AI suggests reading into an array, processing in memory, and writing back in one operation:
Sub CleanDataFast()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets("Data")
    Dim lastRow As Long
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    If lastRow < 2 Then Exit Sub ' nothing below the header row
    Dim data As Variant
    ' One read: columns B:D become a 2-D array (row, column), indexed from 1
    data = ws.Range("B2:D" & lastRow).Value
    Dim i As Long
    For i = 1 To UBound(data, 1)
        data(i, 1) = Trim(data(i, 1))
        data(i, 2) = UCase(data(i, 2))
        If data(i, 3) < 0 Then data(i, 3) = 0
    Next i
    ' One write: push the whole array back to the sheet
    ws.Range("B2:D" & lastRow).Value = data
End Sub
On a 50,000-row dataset, the first version might take 12 seconds. The second takes under one second. Same logic, same results, better metric.
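"Same results" is worth verifying rather than assuming. One option is to export columns B:D before and after a run and compare them against a reference implementation. The sketch below is one hypothetical way to do that: it assumes columns B and C hold text and column D holds numbers, and the names `clean_rows` and `same_result` are invented for the example.

```python
from typing import List, Tuple, Union

Number = Union[int, float]
Row = Tuple[str, str, Number]  # columns B, C, D of one worksheet row


def clean_rows(rows: List[Row]) -> List[Row]:
    """Reference version of the macro's logic, applied to exported rows."""
    cleaned = []
    for b, c, d in rows:
        cleaned.append((
            b.strip(" "),       # VBA Trim removes leading and trailing spaces only
            c.upper(),          # UCase
            0 if d < 0 else d,  # clamp negatives to zero
        ))
    return cleaned


def same_result(macro_output: List[Row], original: List[Row]) -> bool:
    """True when a macro variant produced exactly what the reference expects."""
    return macro_output == clean_rows(original)


sample = [("  north ", "abc", -5), ("south", "Def", 12.5)]
print(clean_rows(sample))  # [('north', 'ABC', 0), ('south', 'DEF', 12.5)]
```

A variant that is faster but fails `same_result` should be discarded regardless of its timing.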
Dashboard and Report Optimization
- Artifact: a workbook template with multiple sheets, charts, and linked formulas.
- Metric: file size, recalculation time (check the status bar), number of volatile functions (NOW(), INDIRECT(), OFFSET()).
- Loop: describe the workbook structure to AI, ask for optimization suggestions, implement one at a time, measure before and after.
This is less about a single formula and more about the system. Replace OFFSET-based dynamic ranges with structured Tables. Swap INDIRECT references for direct sheet links. Remove unused conditional formatting rules. Each change is one experiment.
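The volatile-function count can be turned into a number with a few lines of code. This is a rough sketch that assumes you have already exported the workbook's formulas as text; the sample list is hypothetical. The pattern will also count a function name that appears inside a quoted string, so treat the result as an estimate.

```python
import re
from collections import Counter
from typing import Iterable

# Functions Excel recalculates on every change; extend the tuple as needed.
VOLATILE = ("NOW", "TODAY", "RAND", "RANDBETWEEN", "OFFSET", "INDIRECT")

# A listed name followed by "(", not preceded by a letter, digit, underscore,
# or dot, so a longer name that merely ends in NOW is not counted.
_CALL = re.compile(
    r"(?<![A-Za-z0-9_.])(" + "|".join(VOLATILE) + r")\s*\(",
    re.IGNORECASE,
)


def count_volatile(formulas: Iterable[str]) -> Counter:
    """Count calls to volatile functions across a collection of formula strings."""
    counts: Counter = Counter()
    for formula in formulas:
        for match in _CALL.finditer(formula):
            counts[match.group(1).upper()] += 1
    return counts


formulas = [
    '=SUM(OFFSET(A1,0,0,10,1))',
    '=NOW()-TODAY()',
    '=INDIRECT("Sheet2!A1")',
    '=SUM(A1:A10)',
]
counts = count_volatile(formulas)
print(sum(counts.values()), dict(counts))
```

Run it before and after each change and log the total as the metric for that experiment.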
Prompt Optimization
This is the meta-application. If you already use ChatGPT or Claude to help with Excel work, you have a prompt — and that prompt can be optimized with the same loop.
- Artifact: your prompt template for generating Excel formulas.
- Metric: correctness rate across 10 test problems.
- Loop: modify the prompt wording, run it against your test set, score the results, keep the version that scores higher.
For example, you might find that adding "return only the formula with no explanation" improves accuracy for simple tasks, while adding "explain your reasoning step by step before writing the formula" improves accuracy for multi-condition lookups. The loop tells you which prompt works for which problem type.
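The correctness rate for a prompt can be computed the same way for every variant. The sketch below is an assumption-heavy illustration: `ask_model` stands in for whatever AI tool you use, `fake_model` is a stub so the code runs offline, and the check compares formula text, which will reject a correct answer that is written differently.

```python
from typing import Callable, Dict, List

Problem = Dict[str, str]  # {"task": ..., "expected": ...}


def normalize(formula: str) -> str:
    """Ignore spaces and letter case when comparing formula text."""
    return formula.replace(" ", "").upper()


def correctness_rate(
    template: str,
    problems: List[Problem],
    ask_model: Callable[[str], str],
) -> float:
    """Share of problems where the model's answer matches the expected formula."""
    hits = 0
    for problem in problems:
        answer = ask_model(template.format(task=problem["task"]))
        if normalize(answer) == normalize(problem["expected"]):
            hits += 1
    return hits / len(problems)


def fake_model(prompt: str) -> str:
    """Stand-in for a real AI call; it only knows how to sum."""
    return "=SUM(A1:A10)" if "sum" in prompt.lower() else "=A1"


problems = [
    {"task": "Sum A1 to A10", "expected": "=SUM(A1:A10)"},
    {"task": "Average A1 to A10", "expected": "=AVERAGE(A1:A10)"},
]
rate = correctness_rate("Return only the formula. Task: {task}", problems, fake_model)
print(rate)  # 0.5
```

A stricter check would paste each returned formula into the test harness and score the results it produces rather than its text.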
If you want a deeper look at how different AI tools handle Excel prompts, start with Best AI for Excel Formulas: Copilot vs ChatGPT vs Claude.
Try It Yourself: A Step-by-Step Autoresearch Loop
Here is a practical walkthrough you can follow right now.
Step 1: Pick Your Artifact
Choose one formula or macro that bothers you. Something that works, but is slow, long, fragile, or hard to read. Start small — one formula is enough.
Step 2: Define Your Metric
Decide what "better" means. Pick one:
- Accuracy — does it return the correct result for all test cases?
- Length — how many characters is the formula?
- Speed — how long does recalculation take?
- Readability — can a colleague understand it in under 30 seconds?
For your first loop, accuracy plus length is a good combination.
Step 3: Create a Test Harness
Build a small worksheet with known inputs and expected outputs. This is your validation set — the equivalent of Karpathy's prepare.py. Ten to twenty test cases is usually enough.
| Input (A) | Expected Output (B) | Formula Result (C) | Match? (D) |
|---|---|---|---|
| 5 | XS | =YOUR_FORMULA(A2) | =B2=C2 |
| 15 | S | =YOUR_FORMULA(A3) | =B3=C3 |
| 25 | M | =YOUR_FORMULA(A4) | =B4=C4 |
| 35 | L | =YOUR_FORMULA(A5) | =B5=C5 |
| 45 | XL | =YOUR_FORMULA(A6) | =B6=C6 |
Add a summary cell: =COUNTIF(D2:D21,TRUE) — this is your scalar metric.
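Test inputs that sit on the thresholds tend to catch more mistakes than mid-range values. The sketch below generates such inputs for the size-label example. It assumes the nested IF baseline defines the intended behavior, including "XS" for negative values, and the names `BOUNDS`, `expected_label`, and `boundary_cases` are illustrative.

```python
from typing import List, Tuple

BOUNDS = [10, 20, 30, 40]
LABELS = ["XS", "S", "M", "L", "XL"]


def expected_label(x: float) -> str:
    """First label whose upper bound is above x, otherwise the last label."""
    for bound, label in zip(BOUNDS, LABELS):
        if x < bound:
            return label
    return LABELS[-1]


def boundary_cases() -> List[Tuple[int, str]]:
    """Inputs just below, at, and just above each threshold, plus 0 and -1."""
    inputs = {-1, 0}
    for bound in BOUNDS:
        inputs.update({bound - 1, bound, bound + 1})
    return [(x, expected_label(x)) for x in sorted(inputs)]


cases = boundary_cases()
for x, label in cases:
    print(f"{x}\t{label}")  # tab-separated, ready to paste into columns A and B
```

This produces 14 rows, which fits inside the D2:D21 range used by the summary cell.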
Step 4: Set Up a Results Log
Create a second sheet or table to track each experiment. This is your version of autoresearch's results.tsv:
| Experiment | Formula Variant | Accuracy | Length | Status |
|---|---|---|---|---|
| 1 | Nested IF (baseline) | 20/20 | 62 | keep |
| 2 | IFS function | 20/20 | 56 | keep |
| 3 | SWITCH with TRUE | 20/20 | 59 | discard |
| 4 | INDEX/MATCH with lookup array | 20/20 | 59 | discard |
| 5 | XLOOKUP with sorted range | 18/20 | 44 | discard |
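The keep or discard decision can be made mechanical. The sketch below builds a tab-separated log under one assumed rule: a variant is kept only if it passes every test and is shorter than the best kept variant so far. The function name `build_log` and the rule itself are choices made for this example, not part of autoresearch.

```python
import csv
import io
from typing import List, Tuple


def build_log(experiments: List[Tuple[str, str, int, int]]) -> str:
    """experiments: (name, formula, passed, total). Returns the log as TSV text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Experiment", "Formula Variant", "Accuracy", "Length", "Status"])
    best_length = None
    for number, (name, formula, passed, total) in enumerate(experiments, start=1):
        length = len(formula)  # includes the leading "=", like LEN(FORMULATEXT(cell))
        accurate = passed == total
        if accurate and (best_length is None or length < best_length):
            status, best_length = "keep", length  # accuracy first, then shorter wins
        else:
            status = "discard"
        writer.writerow([number, name, f"{passed}/{total}", length, status])
    return out.getvalue()


log_text = build_log([
    ("Nested IF (baseline)",
     '=IF(A2<10,"XS",IF(A2<20,"S",IF(A2<30,"M",IF(A2<40,"L","XL"))))', 20, 20),
    ("IFS function",
     '=IFS(A2<10,"XS",A2<20,"S",A2<30,"M",A2<40,"L",TRUE,"XL")', 20, 20),
    ("SWITCH with TRUE",
     '=SWITCH(TRUE,A2<10,"XS",A2<20,"S",A2<30,"M",A2<40,"L","XL")', 20, 20),
])
print(log_text)
```

Measuring length from the formula text itself avoids counting characters by hand.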
Step 5: Run the Loop
Open ChatGPT or Claude. Paste your current best formula, your metric, and your test cases. Use a prompt like:
Here is an Excel formula:
=IFS(A2<10,"XS",A2<20,"S",A2<30,"M",A2<40,"L",TRUE,"XL")
My goal is to minimize formula length while keeping 100% accuracy
on these test cases: [5→XS, 15→S, 25→M, 35→L, 45→XL, 0→XS, 39→L].
Propose 3 alternative formulas that achieve the same results
with fewer characters. For each, explain why it might be shorter.
Test each suggestion against your harness. Update the log. Feed the winner back as the new baseline. Repeat five to ten rounds.
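Building the prompt from the same test cases the harness uses keeps the two in sync from round to round. The helper below is a small sketch of that idea. `build_prompt` is a made-up name, and the wording simply mirrors the example prompt.

```python
from typing import List, Tuple


def build_prompt(
    formula: str,
    cases: List[Tuple[float, str]],
    n_alternatives: int = 3,
) -> str:
    """Fill the example prompt with the current best formula and the test cases."""
    case_text = ", ".join(f"{x}→{label}" for x, label in cases)
    return (
        "Here is an Excel formula:\n"
        f"{formula}\n"
        "My goal is to minimize formula length while keeping 100% accuracy\n"
        f"on these test cases: [{case_text}].\n"
        f"Propose {n_alternatives} alternative formulas that achieve the same results\n"
        "with fewer characters. For each, explain why it might be shorter."
    )


cases = [(5, "XS"), (15, "S"), (25, "M"), (35, "L"), (45, "XL"), (0, "XS"), (39, "L")]
prompt = build_prompt('=IFS(A2<10,"XS",A2<20,"S",A2<30,"M",A2<40,"L",TRUE,"XL")', cases)
print(prompt)
```

After each round, call it again with the new winner so the next prompt always starts from the current best.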
Step 6: Review What You Learned
After a few rounds, you will notice patterns. AI tends to suggest the same families of solutions — lookup-based, array-based, helper-column-based. The interesting discoveries come when you push past the obvious alternatives.
This is exactly what happened with autoresearch. The first 50 experiments produced incremental gains. The surprising discoveries — like finding a bug in the codebase — came from sheer volume.
What This Cannot Do Yet
The honest version of autoresearch for Excel has real limits.
You are the GPU. In Karpathy's setup, the AI agent runs experiments automatically while you sleep. In Excel, you are still the one pasting formulas, running tests, and logging results. No packaged tool runs this loop for spreadsheets automatically yet.
Metrics are harder. Machine learning has clean loss functions. Spreadsheet quality is a mix of accuracy, speed, readability, and maintainability. You have to choose what to optimize, and that choice matters.
AI does not see your data unless you share it. If the formula depends on real business data, you have privacy decisions to make before sharing it with an AI tool. See How to Upload an Excel File to ChatGPT and Analyze It Safely for a walkthrough of that.
Correct is not the same as good. A formula can pass every test case and still be wrong in production because the test cases did not cover edge cases. The harness is only as good as your test data.
Where This Is Heading
Copilot already suggests formula alternatives inside Excel. The next step is obvious: let Copilot run iterations autonomously, measure results against a user-defined metric, and surface the best version. That is autoresearch built into the spreadsheet.
Office Scripts could close the gap sooner. A script that calls an AI API, swaps formulas, runs validation, and logs results is technically possible today — it just has not been packaged into a clean workflow yet.
The pattern has already been applied to marketing automation, voice agents, and web performance. Spreadsheets are a natural next step.
For a broader view of how AI tools work with Excel today, start with AI in Excel: Practical Guide to Copilot, ChatGPT, Claude, and Gemini.