You may be talking out of context

Two Fridays ago, I had an idea: why not try to use Codex to write a PDF renderer, i.e., a tool that converts a PDF into an image? This certainly wouldn’t have been possible before coding agents, as PDF is a notoriously complex format, and a renderer can easily exceed 100K lines of code. But I had some unused quota, so why not?

The beginning was quite promising. I already had a JBIG2 decoder from pdfium. I quickly implemented a JPEG2000 decoder using openjp2 and cgo, and Codex created a libfreetype wrapper in no time. I asked it to use xpdf (the origin of Poppler, the other popular PDF library) as the reference implementation. In just two days, it could already render a PDF, albeit with some small issues. I thought maybe in two more days it would be done.

Then it got stuck for almost a week. I occasionally checked in, and every day it was jumping back and forth between the same two bugs. All the while, the lines of code kept increasing, and it kept telling me, “I already fixed it” (very much like the Claude boy vibe). Then I looked at its responses and saw mentions of “shortcuts” and “heuristics.” The PDF it was using for testing was the URLA form, which contains form inputs. For whatever reason, it insisted this required XFA support, and that was the rabbit hole it went down. When I saw “shortcuts,” my internal alarm went off. I asked, “Did you follow the reference implementation?” It said that for reasons A, B, C, and D, it had decided to use another approach - i.e., the “shortcuts.” So I corrected Codex, instructing it to follow the reference materials exactly and rework the solution. It calmly agreed and started working again.

But after two more days, the situation didn’t seem to be improving. So last weekend, I brought in Gemini CLI, which gave me a very promising plan - at least it looked promising. Codex had been “yelled at” a few times, but when I brought the Gemini plan over, Codex rejected it by pointing out issues. It felt like watching two experienced engineers arguing over a tech design in front of their product manager. Eventually, I trusted Codex and emphasized it must stick to xpdf’s approach. In its responses, it did keep telling me it had followed my instructions and how its approach was well-aligned with xpdf. Except it was still stuck on the same bug. The lines of code for that xfa_support.go file had grown to 3000+.

As the codebase grew, every turn with Codex burned more tokens and got slower. Sometimes it took almost an hour for a single turn and used half the context. Finally, I decided to take a closer look. There was a bunch of vestigial code in that file. And no wonder the magic word “heuristics” occasionally reappeared in its responses. This is a typical issue with LLMs: they can’t truly learn from mistakes in a conversation; instead, the mistakes get baked into the context and continue to impact subsequent responses.

In the pre-LLM era, this would be a very difficult situation: your engineers proposed a reasonable design, worked on it for a long time, and you just discovered it was not what you had thought and was going nowhere. But there was no other way. I had to start over, after almost 10 days. I asked Codex to rip out the XFA support completely - which made up probably a quarter of the total code it had produced - and start over. This time I gave it more specific instructions on how to organize the code. It progressed well.

But when I started testing with urla.pdf, the result seemed off again. I asked what was wrong with XFA support, and it told me there was no XFA usage in the PDF file, basically like “I worked my ass off on this feature, but I have no idea why you want it.” I was sure I had seen some XFA stuff in an earlier version of the URLA PDF when manually examining it, so I thought it was making a mistake again. (It’s worth noting that when I asked it to clean up, Codex also deleted the test files, so this became a mystery.) It was so sure of itself that I checked the file manually, and to my surprise, it indeed didn’t use XFA. I grabbed another version; it was the same. I was totally lost and for a moment wondered if I was just having déjà vu. I had been battling something nonexistent this whole time. Like, what the heck. So, it turned out to be another bug entirely, unrelated to XFA. And Codex figured it out by itself after I stopped mentioning XFA (which I had used to limit the scope of the context!).
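Checking a file for XFA by hand is simpler than it sounds, by the way: when XFA is present, it shows up as an /XFA entry in the document’s AcroForm dictionary. A rough sketch of the naive scan I mean (assumption: the dictionary is not hidden inside a compressed object stream, where a plain byte search would miss it, so a negative result is a strong hint rather than proof):

```go
package main

import (
	"bytes"
	"fmt"
)

// hasXFAKey does a naive byte scan for the /XFA name, which marks the
// XFA entry in a PDF's AcroForm dictionary. It will miss the key if
// the dictionary lives inside a compressed object stream.
func hasXFAKey(pdf []byte) bool {
	return bytes.Contains(pdf, []byte("/XFA"))
}

func main() {
	// Hypothetical AcroForm dictionary fragments for illustration.
	withXFA := []byte("<< /Fields [] /XFA 12 0 R >>")
	plain := []byte("<< /Fields [] >>")
	fmt.Println(hasXFAKey(withXFA), hasXFAKey(plain)) // true false
}
```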

I threw a few more test PDFs at it, it fixed a few more bugs, and now, after two weeks, it’s clear we’ve reached a solid foundation. The end result is about one-fifth of the originally estimated size, with the caveat of missing some less-used features. It is certainly not production-ready, so this is more of an experiment into Codex’s capability for larger software projects. The whole process wasn’t what I expected, but in retrospect, it isn’t surprising either. It has been both frustrating and enlightening.

As I’ve argued before, in the era of AI, communication and context are the critical points, not the work itself. Ultimately, Codex went off the rails when it started working on “shortcuts,” then got stuck with that large chunk of incorrect context/memory. In the end, we figured out it had probably been chasing the wrong problem from the beginning. In hindsight, this probably couldn’t have been avoided, as I didn’t spend much time on it, especially during weekdays - just casually checking in like a manager, without letting it occupy too much of my own brain, which is probably exactly the approach you want to avoid when using AI. The takeaway? Codex remains impressively capable, but be cautious of the context (pun intended) when you talk to an LLM about anything complex. You may be running out of the context window, and you may also be talking out of context without even realizing it.