AI Wars Ep. 3: The Return of the Developer
A Long Time Ago in an IT company Far, Far Away…
EPISODE III - THE RETURN OF THE DEVELOPER
It is a period of turmoil for software development. AI-powered tools reign over the industry. They are able to generate code faster than any developer ever could.
But now bugs and security issues are everywhere, and the system is on the brink of destruction.
Who can save the galaxy from this chaos?
Coding agents can generate code. We proved it in the previous episode of The AI Wars Trilogy (AI Strikes Back: Using an LLM to write COBOL). But we also saw that, by default, AI assistants generate code that is neither maintainable nor secure. So how can we use these tools without degrading the quality of their output? This is the subject of this article. Spoiler: you still need a developer to make good architectural decisions and review the code the AI generates.
You don’t need to know any COBOL to read this article. But be aware that you will see some.
COBOL-Admin: A Case Study
I already described COBOL-Admin, a COBOL program that generates admin interfaces by leveraging OpenAPI specifications, in the first episode (A new hope. Good bye React. Meet COBOL-Admin.). Now is the time to reveal the secrets of this adventure, and to explain how I programmed this beautiful admin interface even though, before this experiment, I had never written a line of COBOL in my life.
Not knowing COBOL didn’t stop me. I believed that my experience as a developer, and my knowledge of software architecture, would be enough to guide a coding agent (in this case, Claude Code) to generate code that is maintainable and secure, even in a language I do not know.
The result, in all its glory, is visible on GitHub: marmelab/COBOL-admin, and you can try the live demo at cobol-admin.fly.dev.
Obviously, I would have been way more efficient using a language I knew beforehand, but where is the fun in that?
We Need a Plan

First, I needed a plan, as I only had a vague idea of what I wanted to achieve.
I started by asking the coding agent what COBOL could do, and how I could use it to generate an admin interface. The agent explained to me that COBOL can do anything (the only limit is yourself). A web server? Easy:
HANDLE-REQUEST.
    CALL "cobol_cleanup_temp" END-CALL
    MOVE LOW-VALUE TO REQUEST-BUFFER
    MOVE LOW-VALUE TO RESPONSE-BUFFER
    MOVE 0 TO RESPONSE-LEN
    CALL "recv" USING BY VALUE CLIENT-SOCKET
        BY REFERENCE REQUEST-BUFFER
        BY VALUE 4096
        BY VALUE 0
        RETURNING BYTES-READ
    END-CALL
    IF BYTES-READ <= 0
        GOBACK
    END-IF
    *> Parse request
    CALL "HTTP-PARSE" USING REQUEST-BUFFER
        WS-REQUEST-METHOD WS-REQUEST-PATH WS-PATH-LEN
        WS-REQUEST-BODY WS-BODY-LEN
    END-CALL
    DISPLAY "Request #" WS-REQUEST-COUNT " "
        FUNCTION TRIM(WS-REQUEST-METHOD) " "
        FUNCTION TRIM(WS-REQUEST-PATH)
    *> Route the request
    CALL "ROUTER" USING WS-REQUEST-PATH WS-PATH-LEN
        WS-ROUTE-TYPE WS-ROUTE-RESOURCE WS-RESOURCE-TABLE
        WS-PAGE WS-PER-PAGE WS-ROUTE-ID WS-STATIC-PATH
    END-CALL

So I knew I could target an ambitious goal. That's how I came up with the plan to leverage the OpenAPI specification to generate an admin interface. The server fetches the OpenAPI documentation of the REST API passed as a parameter, deduces the resources it has to map, and generates HTML on the fly for the requested CRUD route.
I didn’t use Claude Code’s plan mode. Instead, I dove head first into COBOL.
Clumsy By Default
I started with the basics: generating and serving an HTML page saying “Hello World” in COBOL.

Claude Code managed to build a web server in COBOL in a few seconds. But already at this early stage, I saw several issues:
- The AI generates everything into a single file.
- The AI creates HTML by concatenating strings in COBOL code, which is very error-prone and hard to maintain (see the sketch after this list).
- The AI adds CSS in inline style attributes, which is also hard to maintain.
- The code requires COBOL to be installed on my machine. I told it to use Docker instead.
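To make the second and third issues concrete, here is a reconstructed sketch of HTML built by string concatenation with inline styles. This is illustrative code, not the actual generated output:

*> Reconstructed antipattern sketch (illustrative names, not the real code)
MOVE SPACES TO WS-HTML-BUFFER
STRING "<html><body style='font-family: sans-serif'>"
       "<h1>Hello World</h1>"
       "<p>served by COBOL</p>"
       "</body></html>"
       DELIMITED BY SIZE
       INTO WS-HTML-BUFFER
END-STRING

Every markup change means re-counting quotes and buffer sizes by hand, which is why this style does not scale.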
Maintainability matters even when the AI writes the code. The AI reads the code too. Messy code produces worse suggestions.
So I instructed Claude Code to fix these issues. It did. But as I forgot to put these instructions into Claude’s memory, it kept producing messy code, as if it were its nature.
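The lesson: project-wide conventions belong in the memory file, so that they survive across sessions. A hypothetical excerpt of what this project's CLAUDE.md could contain (illustrative, not the repository's actual file):

# CLAUDE.md (hypothetical excerpt)
- Split the code into one COBOL program per concern, each in its own file.
- Never build HTML by concatenating strings; use template files.
- Keep CSS in a static stylesheet, never in inline style attributes.
- Build and run everything through Docker; do not assume a local COBOL install.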
No Hands
Then I asked the AI to fetch the OpenAPI specification from a given URL, and to extract the list of endpoints from it.
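By the way, standard COBOL has no dynamically sized collections, so the extracted endpoints have to fit in a fixed-size table. A minimal sketch of the kind of working-storage layout that could hold them (assumed names and sizes, not the project's actual code):

*> Hypothetical fixed-size table holding the resources found in the spec
01 WS-RESOURCES.
   05 WS-RESOURCE-COUNT   PIC 9(2) COMP-5.
   05 WS-RESOURCE-ENTRY   OCCURS 20 TIMES.
      10 WS-RESOURCE-NAME PIC X(64).
      10 WS-RESOURCE-PATH PIC X(128).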
My early tests of this program proved tedious. Each time Claude finished a task, I found a small bug and I had to copy/paste the error to ask for a fix. It felt just like dealing with a robot with no hands.

As any normal developer would do, I ended up asking Claude to add automated tests for this code. That’s how it managed to finish the task without (ab)using me as a tester.
What do tests look like in COBOL? I’m glad you asked:
*> Tests for HTTP-PARSE module
identification division.
program-id. test-http-parse.

data division.
working-storage section.
01 ws-request-buffer pic x(4096).
01 ws-request-method pic x(10).
01 ws-request-path   pic x(512).
01 ws-path-len       pic 9(4) comp-5.
01 ws-request-body   pic x(4096).
01 ws-body-len       pic 9(4) comp-5.
01 ws-expected-len   pic 9(4) comp-5.

procedure division.
    perform test-simple-get.
    perform test-root-path.
    perform test-list-path.
    perform test-post-method.
    goback.

test-simple-get section.
    move spaces to ws-request-buffer
    move "GET /hello HTTP/1.1" to ws-request-buffer
    call "HTTP-PARSE" using ws-request-buffer ws-request-method
        ws-request-path ws-path-len ws-request-body ws-body-len
    end-call
    call "assert-equals" using "GET", ws-request-method(1:3).
    call "assert-equals" using "/hello", ws-request-path(1:6).
    move 6 to ws-expected-len
    call "assert-equals" using ws-expected-len, ws-path-len.

test-root-path section.
    move spaces to ws-request-buffer
    move "GET / HTTP/1.1" to ws-request-buffer
    call "HTTP-PARSE" using ws-request-buffer ws-request-method
        ws-request-path ws-path-len ws-request-body ws-body-len
    end-call
    call "assert-equals" using "/", ws-request-path(1:1).
    move 1 to ws-expected-len
    call "assert-equals" using ws-expected-len, ws-path-len.

test-list-path section.
    move spaces to ws-request-buffer
    move "GET /list/authors HTTP/1.1" to ws-request-buffer
    call "HTTP-PARSE" using ws-request-buffer ws-request-method
        ws-request-path ws-path-len ws-request-body ws-body-len
    end-call
    call "assert-equals" using "/list/authors", ws-request-path(1:13).
    move 13 to ws-expected-len
    call "assert-equals" using ws-expected-len, ws-path-len.

test-post-method section.
    move spaces to ws-request-buffer
    move "POST /edit/authors/1 HTTP/1.1" to ws-request-buffer
    call "HTTP-PARSE" using ws-request-buffer ws-request-method
        ws-request-path ws-path-len ws-request-body ws-body-len
    end-call
    call "assert-equals" using "POST", ws-request-method(1:4).
    call "assert-equals" using "/edit/authors/1", ws-request-path(1:15).

end program test-http-parse.

Lovely.
Automated tests are crucial: they allow coding assistants to check that the code works directly, and to iterate on their own in case of error. If only coding assistants knew they needed tests in the first place…
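These tests call an assert-equals helper, which is not part of standard COBOL. Here is a minimal sketch of what such a helper can look like in GnuCOBOL; the actual implementation in the repository may differ:

*> Minimal assert-equals sketch (illustrative, not the project's actual helper)
identification division.
program-id. assert-equals.
data division.
linkage section.
01 lk-expected pic x any length.
01 lk-actual   pic x any length.
procedure division using lk-expected lk-actual.
    if lk-expected = lk-actual
        display "PASS: '" lk-actual "'"
    else
        display "FAIL: expected '" lk-expected "', got '" lk-actual "'"
        move 1 to return-code
    end-if
    goback.
end program assert-equals.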
Sometimes Awesome
Then I asked Claude to generate a menu linking to the different resources, and to generate a placeholder page for each resource. Surprisingly, it did add the new code in separate files, and it even added some tests for the new code.

Then I asked it to generate a page listing the records of a resource, fetching the data from the API. Again, the AI generated code that works, in separate files, and with tests.
I thought that troubles were behind me. That’s one of the main problems with coding agents: they gain our trust by being really spectacular at times. So we lower our guard. Until, eventually, they fail us.
Split Personality
By then, “I” had already produced a lot of code. I decided to step back and look at the result. As I don't know COBOL, I naturally asked Claude to review what we had built so far and propose improvements. After all, it knows COBOL better than I do.
I expected Claude to find code duplication issues, to detect missing tests, or incomplete error handling. It did. But it also discovered a few critical security issues. I quote:
- Add path validation to serve-static.cbl (prevent path traversal like ../../etc/passwd)
- Add HTML escaping for all values injected into HTML (XSS risk in page-list, page-show, page-edit)
- Sanitize resource names and field values before interpolating into CALL "SYSTEM" shell commands (command injection)
These are big security issues. How come the AI introduced them in the first place? The reviewer is the exact same coding agent that wrote the code, and it is clearly capable of detecting security issues. That doesn't make sense. Are there two faces to this Claude?

I looked at the third vulnerability a bit longer. CALL SYSTEM? But why on earth would a web server need to CALL SYSTEM? As a developer, I know that this type of command is radioactive, and I avoid it at all costs. If I have to use it, I think twice about the security risks.
And then I saw two other issues that the AI considered medium:
- Add error handling for curl/jq failures — show user-friendly error when API is down or jq fails
- Clean up temp files after use — 9 hardcoded /tmp/ files; concurrent requests cause race conditions
And so I understood what was going on.

The AI used CALL "SYSTEM" to run curl and jq as shell commands. It fetched data from the API and parsed the JSON through the shell, with unsanitized input, on my machine. That's when I felt glad I had dockerized the entire thing.
The program also stored command output in /tmp/ files with hardcoded names. If two users visited the admin at the same time, which happens in any real environment, one user would get the other’s data.
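Reconstructed from these symptoms, the offending pattern probably looked something like this (a sketch, not the actual generated code):

*> Reconstructed shell-out antipattern (illustrative names and paths)
MOVE SPACES TO WS-COMMAND
STRING "curl -s '" FUNCTION TRIM(WS-API-URL)
       "/" FUNCTION TRIM(WS-RESOURCE-NAME) "'"
       " | jq -r '.[].id' > /tmp/cobol-admin-ids.txt"
       DELIMITED BY SIZE
       INTO WS-COMMAND
END-STRING
CALL "SYSTEM" USING WS-COMMAND END-CALL

Any quote or semicolon in the interpolated values escapes into the shell, and the hardcoded temp file is shared by every concurrent request.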
In my career, I have taken shortcuts many times, especially during hack days. But I always weighed the trade-offs. The AI just took the shortcut without any consideration for the risks, and without warning me about it.
How could I trust it after that?
Trust Only In The Force
I still wanted to believe, so I asked Claude to fix these issues. It managed to remove the first two vulnerabilities. But the fixes it proposed for the CALL "SYSTEM" issue were wrong:
- sanitize the input before passing it to SYSTEM
- clean up the temp files after use
These do not fix the root cause, which is CALL "SYSTEM" itself. Every other issue is a consequence of that. Claude wanted to treat the symptoms, not the cause. Finding the root cause is normally what every developer does when investigating a bug. But apparently, coding agents are satisfied with hiding the bug.
So I explicitly asked it to replace curl and jq with a proper COBOL library. Problem: there is no such library. There is no COBOL NPM. But there is hope, as you can call C libraries directly from COBOL, without going through the shell.
So I came up with a plan: hand the data fetching and JSON parsing off to a C library. Once I explained it to Claude, it proceeded with enthusiasm, and fixed both the CALL "SYSTEM" and the temporary file issues. It even admitted that this was a much better fix, one that addressed the root cause. Indeed. A shame it didn't figure it out by itself.
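For illustration, here is roughly what calling C directly looks like from COBOL. The http_fetch function is a hypothetical name used for this sketch, not the actual API of the project's C wrapper:

*> Hedged sketch: calling a C function directly, with no shell involved.
*> "http_fetch" is a hypothetical wrapper around an HTTP client library.
WORKING-STORAGE SECTION.
01 WS-URL          PIC X(512).
01 WS-RESPONSE     PIC X(65536).
01 WS-FETCH-RESULT PIC S9(8) COMP-5.

PROCEDURE DIVISION.
    CALL "http_fetch" USING
        BY REFERENCE WS-URL
        BY REFERENCE WS-RESPONSE
        BY VALUE 65536
        RETURNING WS-FETCH-RESULT
    END-CALL
    IF WS-FETCH-RESULT < 0
        DISPLAY "fetch failed for " FUNCTION TRIM(WS-URL)
    END-IF

The arguments arrive in C as plain pointers and an int, so no string ever reaches a shell, which removes the command injection risk by construction.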
I’ve Got a Bad Feeling About This
So I resumed development, more careful than before. I asked Claude to implement the show and edit pages for each resource. I asked it to add reference fields (fields that link to other resources) and make them clickable.
How do you infer from an OpenAPI specification that a field references another resource? There is no standard way.
Claude proposed a custom OpenAPI extension. That’s clever, except it forces users to modify their spec. That defeats the purpose of an admin that works over any OpenAPI API. Is this really a good idea?

This felt off. I knew there was no perfect solution, but this one seemed to offload a big burden onto other programs. There had to be a better way. I don't know how intuition works, but I'm sure that coding agents are devoid of it.
So I asked it to infer references from field names and endpoints instead (e.g., if a post has an author_id field and there is an authors resource, this probably means a many-to-one relationship between posts and authors). It is not perfect, as irregular pluralization breaks it. It assumes id as the primary key and an _id suffix for foreign keys.
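A sketch of that heuristic in COBOL, reusing the resource table sketched earlier, with a naive "add an s" pluralization (which is exactly why irregular plurals break it). All names are assumptions, not the project's actual code:

*> Assumed working-storage items for the sketch
01 WS-FIELD-NAME   PIC X(64).
01 WS-FIELD-LEN    PIC 9(4) COMP-5.
01 WS-CANDIDATE    PIC X(64).
01 WS-I            PIC 9(2) COMP-5.
01 WS-IS-REFERENCE PIC X VALUE "N".

DETECT-REFERENCE.
    *> "author_id" yields the candidate resource name "authors"
    IF WS-FIELD-LEN > 3
        IF WS-FIELD-NAME(WS-FIELD-LEN - 2:3) = "_id"
            MOVE SPACES TO WS-CANDIDATE
            STRING WS-FIELD-NAME(1:WS-FIELD-LEN - 3) "s"
                DELIMITED BY SIZE INTO WS-CANDIDATE
            END-STRING
            PERFORM VARYING WS-I FROM 1 BY 1
                    UNTIL WS-I > WS-RESOURCE-COUNT
                IF WS-CANDIDATE = WS-RESOURCE-NAME(WS-I)
                    MOVE "Y" TO WS-IS-REFERENCE
                END-IF
            END-PERFORM
        END-IF
    END-IF.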
I accepted that trade-off, as it works in most cases and could be improved later. Left to itself, Claude would have made a different trade-off. These are the important decisions that should be left to humans, as LLMs aren't yet smart enough to carefully weigh the pros and cons, especially given their limited context.
Bad Taste
After about a day of discussion with Claude, I managed to get a working COBOL admin. Except the interface looked like a classic (dare I say mainstream?) admin interface from the 2020s:

In fact, all the UIs I have seen Claude build look the same: neutral, bland, boring. This one did not feel at all like a proper COBOL application. A COBOL application is blue. It uses fixed-width fonts. It's retro.
Fortunately, someone else at Marmelab already explored this territory in the past: Building A Retro React-Admin Theme For Fun And Profit. So I told Claude to read it and asked it to apply the same theme.
It worked:

Much better. Claude did it on the first try, with only a minor issue on some input fields. Funnily enough, it took more time to fix this issue than to apply the theme in the first place.
Conclusion
AI tools are powerful, and they can generate code faster than any developer ever could. But they are not perfect, and they can introduce bugs and security issues in the code they generate. To make the most of these tools, you still need a developer to:
- make architectural decisions
- detect when the AI goes off the rails
- determine the good practices to use
- find and fix the issues created by AI
- be responsible for the result
This is the Force of the developers. We've gathered it through midi-chlorians (that is, years of training and practice). And no AI can wield it for us.
Authors
Full-stack web developer at marmelab, loves functional programming and JavaScript.