
Why Generic AI Tools Fail on Mainframe – and What Actually Works

AI vendors want you to believe code is just text. On a mainframe, code is a physical execution graph spanning three decades of IBM middleware.

Every AI vendor sells the same modernisation playbook for mainframe. Chunk your legacy code. Vectorise it. Drop it into a standard RAG pipeline. Ask questions. Get answers.

This works for a Node microservice built in 2022. It is a spectacular way to misunderstand a z/OS system that has been running since 1988.

The problem is a fundamental assumption that does not hold on mainframe: that application logic lives inside the source files.

Code is not just text on mainframe

On a modern web application, the source code is largely self-contained. You can read a Python function and understand what it does, what it calls, and what it produces. The execution context is relatively shallow.

On a mainframe, a COBOL program is a fragment. Its meaning comes from everything around it:

  • The JCL that invokes it – which datasets it receives as input, which it produces as output, what condition codes it is expected to return
  • The CICS transaction definition that routes to it – how it is triggered, what commarea it receives, where errors are sent
  • The VSAM file that a job sorted two hours earlier, passing its output forward as this program's input
  • The DB2 stored procedure it calls, the table it reads, the logging table that captures its errors
  • The job scheduler entry that controls when it runs, what it depends on, and what runs after it

A vector database reads a COBOL program and sees a reference to an input dataset. It has absolutely no idea where that dataset came from. Semantic search cannot resolve batch dependencies. An LLM cannot know that a JCL step that ran at 2:00 AM sorted a VSAM file and passed it forward.
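
To make the gap concrete: the COBOL source names only a DD; the binding to a real dataset, and the step that produced it, live in the job stream. Here is a minimal Python sketch of that resolution, with hypothetical step, program, and dataset names standing in for parsed JCL:

    # Minimal sketch, not a real JCL parser: assumes job steps have already
    # been parsed into structures like the hypothetical JclStep below. The
    # step, program, and dataset names are invented for illustration.
    from dataclasses import dataclass, field

    @dataclass
    class JclStep:
        name: str
        program: str                                      # EXEC PGM=...
        reads: list[str] = field(default_factory=list)    # input DD DSN=...
        writes: list[str] = field(default_factory=list)   # output DD DSN=...

    job = [
        JclStep("SORT01",  "SORT",  reads=["PROD.ACCT.RAW"],    writes=["PROD.ACCT.SORTED"]),
        JclStep("VALID01", "PROGA", reads=["PROD.ACCT.SORTED"], writes=["PROD.ACCT.VALID"]),
    ]

    def producer_of(dataset: str, steps: list[JclStep]) -> str | None:
        """Find the earlier step that wrote the dataset a later step consumes."""
        for step in steps:
            if dataset in step.writes:
                return f"{step.name} (PGM={step.program})"
        return None  # produced by a different job entirely: scheduler data needed

    # PROGA's source only names a DD; the binding to a dataset lives in the JCL.
    print(producer_of("PROD.ACCT.SORTED", job))   # -> SORT01 (PGM=SORT)

The answer to "where did this input come from" never appears in the program's source at all – and if the producing step belongs to another job, it only appears in scheduler data.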

When you ask a generic AI tool what happens when an account fails validation, it searches for COBOL error logic. It completely misses the CICS transaction definition that intercepts the abend and routes the error to a DB2 logging table.

"Vectorising source code without execution context is analysing an incomplete picture. The answers you get are confidently wrong in ways that are very hard to detect."

The execution graph is the real unit of analysis

The correct unit of analysis on mainframe is not a COBOL program. It is the execution graph – the full picture of what runs, in what order, with what inputs and outputs, under what conditions.

That graph spans:

  • JCL job streams and their dataset flows
  • CICS transaction routing tables and commarea definitions
  • Job scheduler dependencies, held in third-party workload schedulers rather than anywhere in the code
  • VSAM dataset and DB2 table lineage
  • Started tasks and their interactions with application programs
  • Error handling paths – WTO messages, abend exits, restart logic

This graph is not written down in any single place. It is distributed across dozens of different system components, some of which have not been touched in fifteen years and some of which change every release cycle.

Building this graph from static source analysis alone is not possible. The source files contain the logic. They do not contain the execution context that gives that logic meaning.
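
To make that concrete, here is a minimal sketch of the execution graph as a data structure. The node and edge kinds are illustrative assumptions, not a complete z/OS model; edges point in the "depends on" direction so that dependency questions become graph traversals:

    # Illustrative sketch of the execution graph as a typed graph. The node
    # and edge kinds are assumptions chosen for this example; a real model
    # would cover far more (started tasks, abend exits, restart logic...).
    from collections import defaultdict

    NODE_KINDS = {"job", "step", "program", "dataset", "transaction"}
    EDGE_KINDS = {"contains", "executes", "reads", "produced_by", "runs_after"}

    class ExecutionGraph:
        def __init__(self) -> None:
            self.kind: dict[str, str] = {}                       # node id -> kind
            self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

        def add_node(self, node_id: str, kind: str) -> None:
            assert kind in NODE_KINDS, kind
            self.kind[node_id] = kind

        def add_edge(self, src: str, edge: str, dst: str) -> None:
            assert edge in EDGE_KINDS, edge
            self.edges[src].append((edge, dst))

        def upstream_datasets(self, job_id: str) -> set[str]:
            """Every dataset a job depends on, directly or through prior jobs."""
            seen: set[str] = set()
            stack = [job_id]
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                stack.extend(dst for _, dst in self.edges[node])
            return {n for n in seen if self.kind.get(n) == "dataset"}

    # Hypothetical fragment: a validation job reads a file sorted by an earlier job.
    g = ExecutionGraph()
    g.add_node("NIGHTVAL", "job");              g.add_node("NIGHTSRT", "job")
    g.add_node("NIGHTVAL.STEP010", "step")
    g.add_node("PROD.ACCT.SORTED", "dataset");  g.add_node("PROD.ACCT.RAW", "dataset")
    g.add_edge("NIGHTVAL", "contains", "NIGHTVAL.STEP010")
    g.add_edge("NIGHTVAL.STEP010", "reads", "PROD.ACCT.SORTED")
    g.add_edge("PROD.ACCT.SORTED", "produced_by", "NIGHTSRT")
    g.add_edge("NIGHTSRT", "reads", "PROD.ACCT.RAW")
    print(g.upstream_datasets("NIGHTVAL"))
    # -> {'PROD.ACCT.SORTED', 'PROD.ACCT.RAW'}

Once the graph exists, "which datasets does this job depend on" stops being a semantic-search guess and becomes a traversal with a definite answer.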

Runtime evidence is the right starting point

What actually works is starting from runtime evidence rather than source code.

The mainframe generates an extraordinary amount of runtime data. SMF records capture every program execution, every dataset access, every transaction, every job step – with timing, resource consumption, and outcome. CICS journals record transaction flows. Job scheduler logs record dependencies and execution sequences. Abend records capture exactly what was running when something went wrong.

This runtime evidence tells you what actually happened, not what the code says should happen. It tells you which programs execute in production and how often. It tells you which code paths are live and which have not run in three years. It tells you which CICS transactions are business-critical and which are internal utilities.

Once you have that picture, AI analysis has something real to work with. You can ask meaningful questions: which programs are involved in this transaction flow, what typically precedes this abend, which datasets does this job depend on. The answers are grounded in actual execution history rather than static code inference.
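
As an illustration, here is a rough sketch of the "live versus dormant" question, assuming SMF step-termination records (type 30 and friends) have already been decoded into simple rows – the decoding itself is substantial work and is not shown:

    # Sketch of turning runtime evidence into a live/dormant inventory.
    # The decoded rows and the one-year dormancy threshold are illustrative
    # assumptions, not a prescription.
    from collections import Counter
    from datetime import datetime, timedelta

    # (program, job, step-end timestamp) -- hypothetical decoded SMF rows
    records = [
        ("PROGA", "NIGHTVAL", datetime(2024, 5, 2, 2, 14)),
        ("PROGA", "NIGHTVAL", datetime(2024, 5, 3, 2, 11)),
        ("PROGB", "MONTHEND", datetime(2021, 1, 31, 23, 58)),
    ]

    def classify(records, now, dormant_after=timedelta(days=365)):
        runs = Counter(program for program, _, _ in records)
        last = {}
        for program, _, ts in records:
            last[program] = max(ts, last.get(program, ts))
        return {
            p: ("live" if now - last[p] < dormant_after else "dormant", runs[p])
            for p in runs
        }

    print(classify(records, now=datetime(2024, 5, 4)))
    # -> {'PROGA': ('live', 2), 'PROGB': ('dormant', 1)}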

The COBOL source is the last thing you look at, not the first.

What this means in practice

If your AI tool cannot parse a CICS routing table, it has no business touching your core banking system.

More specifically: before evaluating any AI tool for mainframe diagnostics, modernisation, or analysis, ask these questions:

  • Does it ingest SMF data or only source code?
  • Does it understand JCL dataset flows, not just COBOL logic?
  • Can it parse CICS transaction definitions and commarea structures?
  • Does it model job dependencies from your scheduler, or only program-level dependencies?
  • Does it distinguish between code that executes in production and code that has not run in years?

A tool that cannot answer yes to most of these is applying a generic AI pattern to a domain-specific problem. It will produce answers that sound credible and are wrong in ways you will only discover in production.

The approach that works

The approach that works starts from the execution graph, not the source.

Build the runtime picture first – from SMF records, CICS journals, scheduler logs, and dataset lineage. Identify which programs are live, which are critical, and how they connect. Then bring AI analysis to bear on that grounded picture.
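
One way to picture that last step: rather than retrieving raw source chunks, render facts from the runtime model into the prompt context. The facts below are hand-written stand-ins for what the execution graph and SMF-derived inventory would supply:

    # Illustrative sketch: ground the model's context in runtime facts rather
    # than raw source chunks. Every value here is an invented example.
    facts = {
        "job": "NIGHTVAL",
        "executes": "PROGA (COBOL, live, ~30 runs/month)",
        "reads": "PROD.ACCT.SORTED, produced by job NIGHTSRT around 02:00",
        "on error": "CICS transaction routes abends to a DB2 logging table",
    }

    def grounded_context(facts: dict[str, str]) -> str:
        body = "\n".join(f"- {k}: {v}" for k, v in facts.items())
        return "Runtime evidence (SMF / CICS / scheduler):\n" + body

    prompt = (grounded_context(facts)
              + "\n\nQuestion: what happens when an account fails validation?")
    print(prompt)

The model now answers against what actually runs in production, and its claims can be checked back against the same evidence.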

This is more work than chunking source files and running a RAG pipeline. It requires mainframe-specific data ingestion that most AI vendors have not built. It requires understanding z/OS architecture at a level that most AI engineers do not have.

But it produces results that are actually trustworthy on systems where the cost of a confident wrong answer is a production outage at a bank.

Generic AI tools fail on mainframe because they are built for a world where code is text. On mainframe, code is a physical execution graph spanning thirty years of IBM middleware. The tools need to match the domain.

Also in this series: Why Mainframe is Different – The Execution Graph Problem · Runtime Evidence as the Right Starting Point · The Hidden Risk in Every COBOL Migration Project

Want AI diagnostics built for mainframe? IMUAI starts from runtime evidence – SMF data, CICS flows, job dependencies – not just source code.
Learn more
Working on Linux and mainframe? IM3270 is a modern 3270 terminal emulator for Linux – free 60-day trial, no credit card required.
Download Free