
AI and SOX compliance: the hidden risk of shadow preparation

Written by The Maxima Team

Right now, someone on the accounting team may be pasting a trial balance into ChatGPT to explain a variance. Someone else may be uploading a bank statement to Claude to help reconcile cash. A staff accountant may be asking Gemini to draft the memo behind a journal entry.

The output may be useful. It may even be correct. But when it is copied back into the accounting workflow, the system of record retains the conclusion without the source population, transformations, exceptions, reviewer changes, and approval behind it.

This is shadow AI, and in accounting it has a sharper name: shadow preparation. The output returns to the books, but the preparation does not return with it. Part of a financial reporting process is already operating in a chat window the control owner cannot reliably govern or test. Controllers are right to ask how segregation of duties, review evidence, and approval survive when software prepares the work.

Accounting has never depended solely on the ability to produce a number. It depends on the ability to explain where that number came from, who reviewed it, and why it was approved. We explored this idea more broadly in our earlier piece, "Auditability in the age of agentic AI," which examined what changes and what doesn’t when AI begins preparing accounting work. This article focuses on a related challenge: what happens when that preparation occurs outside the accounting workflow altogether.

The control objective stays the same

SOX Section 404 makes management responsible for establishing, maintaining, and assessing internal control over financial reporting. PCAOB AS 2201 distinguishes between whether a control is designed effectively and whether it operated effectively. Those questions do not change when an agent prepares the work.

Consider a vendor accrual. In a traditional workflow, an accountant gathers source data, applies accounting policy, prepares the calculation, attaches support, and routes the entry for review. The objective is straightforward: ensure the accrual is complete, accurate, supported, reviewed, and approved before posting.

Imagine an agent identifies a $1.24 million accrual population from open purchase orders at month-end. The reviewer is not focused solely on the final number. They want to understand what population was included, which items were excluded, what exceptions were identified, and why the approved accrual ultimately posted at the amount it did.
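The reviewer's question can be made concrete as a bridge from the source population to the posted amount. The sketch below is illustrative only: the $1.24 million population comes from the example above, while the exclusion and adjustment lines, their amounts, and the names `BridgeLine` and `posted_amount` are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BridgeLine:
    """One step between the source population and the posted amount."""
    label: str
    amount: Decimal  # signed: a negative amount reduces the accrual


def posted_amount(population_total: Decimal, lines: list[BridgeLine]) -> Decimal:
    """Roll the population total forward through every exclusion and adjustment."""
    return population_total + sum((line.amount for line in lines), Decimal("0"))


# The population figure is from the example; every line below is hypothetical.
population = Decimal("1240000.00")
bridge = [
    BridgeLine("Excluded: POs cancelled before month-end", Decimal("-85000.00")),
    BridgeLine("Excluded: goods not yet received", Decimal("-40000.00")),
    BridgeLine("Reviewer adjustment: under-accrued freight", Decimal("12500.00")),
]

final = posted_amount(population, bridge)
print(f"{'Source population':<45} {population:>14,.2f}")
for line in bridge:
    print(f"{line.label:<45} {line.amount:>14,.2f}")
print(f"{'Approved accrual':<45} {final:>14,.2f}")
```

With these assumed figures the approved accrual is 1,127,500.00, and each line between the population and that amount is something a reviewer can inspect.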

An accrual prepared by an agent still needs to satisfy the same standard. What changes is how the work is captured. Source data, preparation logic, exceptions, reviewer comments, approvals, and final outputs can all remain connected to the workflow itself rather than being reconstructed later from spreadsheets, inboxes, chat threads, and shared drives.

Some parts of accounting benefit from judgment. Others should produce the same answer every time. Population tie-outs, duplicate detection, posting permissions, and approval requirements belong in the second category. The closer those controls are tied to the workflow itself, the easier they become to operate and inspect.
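As a minimal sketch of what "the same answer every time" can mean, the checks below implement a population tie-out and duplicate detection as plain rules. The function names, the choice of (vendor, PO number) as the duplicate key, and the sample lines are assumptions for illustration, not a description of any particular product.

```python
from collections import Counter
from decimal import Decimal


def ties_out(line_amounts: list[Decimal], control_total: Decimal) -> bool:
    """True when the extracted lines sum exactly to the system-of-record total."""
    return sum(line_amounts, Decimal("0")) == control_total


def duplicate_keys(keys: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every (vendor, PO number) key that appears more than once."""
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical open purchase order lines: (vendor, PO number, amount).
lines = [
    ("ACME", "PO-1001", Decimal("400.00")),
    ("ACME", "PO-1002", Decimal("250.50")),
    ("GLOBEX", "PO-2001", Decimal("99.50")),
    ("ACME", "PO-1002", Decimal("250.50")),  # the same PO extracted twice
]

amounts = [amount for _, _, amount in lines]
keys = [(vendor, po) for vendor, po, _ in lines]

# The ledger total is assumed to be 750.00, so the duplicated line breaks the tie-out.
print(ties_out(amounts, Decimal("750.00")))
print(duplicate_keys(keys))
```

Amounts are held as `Decimal` rather than floating point so that the equality test is exact. Neither check involves a model or a probability: the same inputs always produce the same result.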

Preparation becomes easier to repeat. Review moves closer to judgment. The evidence stays attached to the work instead of being assembled afterward.

Claude cannot satisfy your auditors. Neither can ChatGPT or any other general-purpose AI model

This deserves to be stated plainly because the enterprise versions of these tools have improved enough that the distinction can feel blurry. They now offer managed access, retention policies, and administrative logs. But enterprise security is not the same as accounting controls.

A saved chat can show the prompt and response. It cannot show whether the source population was complete, whether approved accounting policy was applied, whether exceptions reached the right reviewer, or whether the reviewed version was the one that ultimately posted.

Work performed in a personal, or even an enterprise, AI workspace exists outside the accounting workflow. The intelligence may be available. The preparation is not connected to the control.

The challenge becomes more significant over time. When employees leave, the chat history, assumptions, and preparation logic embedded in those conversations may be difficult or impossible to recover. The accounting conclusion remains in the books. The reasoning behind it often does not.

Training employees not to use personal AI accounts is necessary, but policy alone is not a control over AI-prepared accounting work. A policy defines expected behavior. Access controls, governed workflows, approvals, logging, and monitoring provide evidence of what actually occurred.

The issue is less about which model is used and more about where the work happens. A general-purpose model can operate inside a governed workflow, and purpose-built software can still be implemented poorly. What matters is whether source data, preparation logic, exceptions, review, approval, and posting remain connected to one another.

Building that connection requires deliberate workflow design. It does not happen automatically when a company upgrades from personal to enterprise AI subscriptions. An audit trail can show that actions occurred. It cannot recreate preparation that was never captured in the first place.

This is also why COSO’s 2026 guidance on generative AI is important. It places AI governance within the existing internal control framework. Ownership, risk assessment, control activities, information flows, and monitoring continue to apply regardless of what technology is performing the work.

Zendesk shows what stronger controls could look like

This is not only a theoretical control model. In a webinar with Maxima, Zendesk described a reconciliation process that had historically operated as a manual SOX control. Accountants downloaded data from one system, uploaded it into another, performed the reconciliation, and documented the results for review.

By introducing direct system integrations and automated matching, the team reduced manual preparation while preserving the control itself. Human review remained in place before results synchronized to the ERP. Segregation of duties was maintained through role-based permissions. Teams retained visibility into how data was transformed, what exceptions were identified, and how those exceptions were resolved. Reconciliations that failed validation could be investigated before any downstream action occurred.
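The gate described here (validation, human review, and segregation of duties before anything reaches the ERP) can be sketched in a few lines. This is not Zendesk's or Maxima's implementation. The role table, the `Reconciliation` fields, and the `sync_blockers` function are all hypothetical, chosen only to show the shape of the rule.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical role assignments; a real system would read these from its access controls.
ROLES = {
    "dana": {"preparer"},
    "lee": {"approver"},
    "sam": {"preparer", "approver"},
}


@dataclass(frozen=True)
class Reconciliation:
    prepared_by: str
    approved_by: Optional[str]
    validation_passed: bool


def sync_blockers(rec: Reconciliation) -> list[str]:
    """List every reason the reconciliation must not sync to the ERP yet."""
    blockers = []
    if not rec.validation_passed:
        blockers.append("validation failed")
    if rec.approved_by is None:
        blockers.append("no human approval")
    else:
        if "approver" not in ROLES.get(rec.approved_by, set()):
            blockers.append("approver lacks the approver role")
        if rec.approved_by == rec.prepared_by:
            blockers.append("preparer cannot approve their own work")
    return blockers


print(sync_blockers(Reconciliation("dana", "lee", True)))   # no blockers
print(sync_blockers(Reconciliation("sam", "sam", True)))    # self-approval
print(sync_blockers(Reconciliation("dana", None, False)))   # failed and unapproved
```

Returning the list of reasons, rather than a bare yes or no, means the same check that blocks the sync also produces the evidence of why it was blocked.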

The control objective never changed. The reconciliation still needed to be complete, accurate, reviewed, and approved. What changed was the quality of the evidence supporting it and the amount of manual effort required to produce it.

What stays human

Agentic AI does not automatically strengthen controls. Strong controls come from workflows where preparation, review, approval, and evidence remain connected.

The boundaries are reasonably clear. Agents can gather data, apply approved rules, prepare entries, match transactions, identify anomalies, assemble support, and route exceptions. Policy decisions, materiality judgments, unusual transactions, control design, configuration changes, remediation decisions, and final sign-off remain human responsibilities.

The review changes as well. A controller who spends the close checking that formulas copied correctly is re-performing work that the system already validated. That time is better spent evaluating whether the conclusion is reasonable, whether exceptions were resolved appropriately, and whether the control operated at the precision the risk requires.

Agent-prepared is not agent-approved. Preparation becomes more systematic so judgment can become more deliberate.

How to assess your exposure now

Most accounting teams are already somewhere on this spectrum, often without having mapped it.

Identify where AI is already in the workflow. Ask the team directly. Some use of general-purpose AI tools in the close is likely, whether or not it has been sanctioned. Understanding where preparation is happening outside governed systems is a prerequisite to control design.
Map your SOX controls to their assumed preparer. For each key control, identify whether the design assumes a human gathered the source population and performed the calculation. If AI is substituting for that step, even informally, the control design may no longer match operating reality.
Separate what can be deterministic from what requires judgment. Population completeness, duplicate detection, tie-outs, and posting permissions should operate through rules that produce consistent results. Exception classification and accounting policy application require judgment that humans should own explicitly.
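Assess your change management controls. In an agent-prepared workflow, changes to accrual logic, matching thresholds, or exception routing are control-relevant changes. Confirm that they are requested, tested, approved, and logged like any other change to a key control.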
Test the audit trail against what your auditors will actually ask. An auditor reviewing an AI-prepared journal entry will want to know that the source population was complete and tied to the system of record, that approved policy was applied, that exceptions were resolved by the right person, and that the reviewed version was the version posted.

If your current workflow cannot demonstrate each of those steps, that is where the work begins.
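One of those steps, showing that the reviewed version was the version posted, lends itself to a simple mechanical test. The sketch below assumes the workflow stores a content fingerprint with each approval and recomputes it at posting time. The field names, the sample entry, and the functions `fingerprint` and `posted_matches_review` are hypothetical.

```python
import hashlib
import json


def fingerprint(entry: dict) -> str:
    """Stable SHA-256 over the entry's content, independent of key order."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def posted_matches_review(reviewed_fingerprint: str, posted_entry: dict) -> bool:
    """True only when the entry being posted has the same content as the one reviewed."""
    return fingerprint(posted_entry) == reviewed_fingerprint


# Hypothetical journal entry; amounts are strings so they serialize exactly.
reviewed = {
    "account": "2100-Accrued-Liabilities",
    "amount": "1127500.00",
    "period": "2025-06",
    "policy_version": "accruals-v3",
}
approved_fp = fingerprint(reviewed)  # stored alongside the approval record

# Simulate an edit made after the reviewer signed off.
edited_after_review = {**reviewed, "amount": "1130000.00"}

print(posted_matches_review(approved_fp, reviewed))             # unchanged entry
print(posted_matches_review(approved_fp, edited_after_review))  # changed after review
```

The same fingerprinting approach could be applied to rule configurations such as matching thresholds, so that a change to preparation logic is visible as a change rather than discovered later.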

The choice controllers face

The controller’s choice is no longer whether agentic AI enters the close. It already has, often through tools the accounting control was never designed to reach. The choice is whether that preparation stays beyond the control’s reach or runs inside a workflow the control can see.

This is the operating model Maxima is built to support. Agents prepare accounting work inside a governed system, deterministic checks validate what should not be left to probability, exceptions route to the right people, and human approval remains attached before the result is posted or relied on.

The strongest control is not a log written after the work is done. It is a workflow where the evidence is created alongside the work itself.
