Agentic AI
Auditability in the age of agentic AI: what changes and what doesn't
Written by The Maxima Team
The auditor asks a simple question: "This entry was prepared by your AI system. What did it use as input, what logic did it follow, and who reviewed it before it posted?" The controller knows the answer. The problem is proving it.
The source report is in the ERP. The calculation is in a spreadsheet on a shared drive. The approval is in an email chain. The review note is in Slack. The entry is already posted. Assembling the response will take most of an afternoon. The entry itself took five minutes.
This scene is not new. It has played out in every close cycle for decades. What makes it urgent now is that AI is beginning to prepare accounting work, and the gap between performing the work and proving how it was performed is getting harder to defend. That is not an AI problem by itself. It is a documentation architecture problem that AI inherits. If AI-prepared work runs on the same fragmented infrastructure that created the evidence gap in the first place, the result is the same auditability problem, only faster.
As Yogi Goel writes in CPA Practice Advisor, AI is becoming embedded in core accounting workflows during the end-of-month process, from transaction classification to journal entry generation to variance explanations. But the core questions for auditors remain questions of completeness, accuracy, and trust. Can the output be traced back to its source? Was it reviewed and approved by a human? Is the documentation sufficient to withstand scrutiny?
The trust standard does not change when AI prepares the work. Controllers still need evidence, tie-outs, review history, approvals, and audit-ready support. What changes is how that evidence gets created and maintained.
The model that holds up under audit scrutiny has a name: agent-prepared, human-reviewed. The agent does the preparation. The human owns the judgment and the approval. The rest of this piece is about what that model has to produce to be auditable, and why most current systems do not produce it.
The separation problem
Most accounting teams separate doing the work from documenting the work. Often the same accountant does both, in sequence. They post a journal entry, then assemble the support. They tie a reconciliation, then reconstruct the evidence for the reviewer. They write a variance explanation, then hunt for the invoice, vendor detail, or transaction listing that proves it.
This is not carelessness. It is how many close processes were built. The result is a separation problem: the accounting conclusion and the evidence supporting it do not live in the same place at the same time.
That separation creates three failure modes:
The first is context decay. At the moment the work is completed, the person who performed it holds the most context. Every hour that passes between performing the work and documenting it erodes that understanding. A recurring accrual may keep posting because it has always posted. A reconciling item may keep rolling forward because someone remembers why it was valid. A variance explanation may say "timing" because that is what the team said last quarter. By the time the auditor asks, the preparer may not remember the rationale clearly. The reviewer may not remember what was challenged. The support may exist, but the story behind the support has faded.
The second is evidence fragmentation. The journal entry lives in the ERP. The source data lives in a payroll provider, bank feed, billing system, or subledger. The calculation lives in Excel. The approval lives in a close checklist. The reviewer's question lives in Slack or email. No single system holds the complete chain from source to output to approval. When the PBC list arrives, the accounting team does not retrieve evidence. It reassembles evidence.
The third is reverse-engineered support. The team starts with the number that posted, then works backward to find documentation that justifies it. The support may be directionally right, but it was assembled to explain a conclusion after the fact, not captured while the conclusion was being prepared. A reconciliation that ties is not automatically audit-ready. The reviewer still needs to understand why it ties without needing the preparer in the room. That is where auditability starts to break down.
PCAOB AS 1215 defines audit documentation as the written record supporting the auditor's conclusions. Recent amendments to AS 1215 shorten the audit documentation assembly window from 45 days to 14 days after the report release date, with effective dates depending on firm category and audit period. That is an auditor documentation requirement, not a direct corporate accounting-team checklist. But it is a clear signal that documentation discipline is tightening.
The accounting team's practical challenge is upstream: if the evidence was not captured when the work was performed, everyone downstream has to work harder to prove what happened.
Tracking the work is not the same as proving it
This is the distinction many close processes miss, and it is worth pausing on because it changes how teams evaluate their own infrastructure. A system that logs when a task was completed, who checked the box, and what document was attached is producing a task audit trail. That matters. But it is not the same as proving the accounting work.
A task audit trail records that work happened. An accounting audit trail shows how the number was prepared.
These sound similar. They are fundamentally different. A task audit trail answers the question: was this done? An accounting audit trail answers the question: can this be re-performed? The first tells you that someone completed the cash reconciliation on June 3rd at 4:47 PM. The second tells you what data sourced the reconciliation, what matching logic was applied, which items were flagged as exceptions, how those exceptions were resolved, who reviewed the result, and whether the review happened before the balance was finalized.
Controllers live in the gap between these two standards every audit cycle. The close management system says every task is complete. The auditor asks how a specific number was derived. And the team spends the next three hours reassembling an answer from spreadsheets, email threads, and memory. When an auditor pulls a journal entry for testing, the question is not only whether the entry was marked complete. The auditor wants to know what data sourced it, what logic produced the amount, who reviewed the result, whether exceptions were resolved, and whether the review happened before posting.
A checklist timestamp does not answer those questions. A file attachment does not automatically establish lineage. The attachment might be the right report. It might be the wrong version. It might have been uploaded after the fact. The task log cannot always tell the difference because the preparation layer was never captured.
This gap widens when AI enters the workflow. If an agent drafts a journal entry and the only record is that a human clicked "approve," the trail is incomplete. It shows that approval happened. It does not show what the agent used as input, what rule or template it applied, what exceptions it surfaced, or what the reviewer evaluated before approving.
The evidence may look sufficient until someone tries to re-perform the work. Then the chain breaks. That is the real standard for audit trail accounting in the age of AI: not whether a task moved through a workflow, but whether the accounting conclusion can be traced from source to approval. The audit trail should capture the preparation itself, not just the administration around it.
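As a rough illustration of the difference, the sketch below checks two things a bare task log usually cannot: whether the support was attached before the entry posted, and whether the review happened before posting. The function name, the field names, and every timestamp are assumptions made up for this example, not part of any particular close system.

```python
from datetime import datetime


def timing_gaps(attached_at: datetime, reviewed_at: datetime, posted_at: datetime) -> list[str]:
    """Return timing problems that a completed-task checkbox would not reveal."""
    gaps = []
    if attached_at > posted_at:
        gaps.append("support attached after posting")
    if reviewed_at > posted_at:
        gaps.append("review happened after posting")
    return gaps


# Illustrative timestamps only (hypothetical year and times).
posted = datetime(2024, 6, 3, 16, 47)
print(timing_gaps(
    attached_at=datetime(2024, 6, 5, 9, 0),    # uploaded two days later
    reviewed_at=datetime(2024, 6, 3, 15, 30),  # reviewed before posting
    posted_at=posted,
))
```

A task log would show this item as complete and approved. The check only works if the preparation layer recorded these timestamps when the work was done.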
What evidence should actually look like
PCAOB AS 1105 says auditors must obtain sufficient appropriate audit evidence to provide a reasonable basis for their opinion. It also explains that audit evidence includes information that supports management's assertions and information that contradicts them.
In an AI-prepared close workflow, that standard translates into a simple evidence chain: source → transformation → output → exception handling → approval → retention.
The source is the upstream data used to prepare the work: a bank feed, payroll report, billing export, contract, invoice, subledger detail, or ERP transaction. The transformation is what happened to that data before it became an accounting output: matching logic, account mapping, threshold application, allocation, aggregation, accrual calculation, or variance driver analysis. The output is the accounting result: journal entry, reconciliation conclusion, variance explanation, close task, or reporting support.
Exception handling shows what did not follow the normal path: unmatched items, unusual movements, overrides, rejected suggestions, reviewer adjustments, or escalations. Approval shows that a qualified human reviewed the work, challenged it where needed, and approved it before the result was posted or relied on. Retention preserves the full chain so the team can produce it later without reconstructing it from memory, screenshots, and inbox searches.
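One way to picture the six links is as a single record that travels with the work. The sketch below is a minimal illustration, assuming each link is stored as a list of free-text references. The class name, field layout, and sample values are hypothetical and do not describe any specific product's schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EvidenceChain:
    """One record per accounting output, with one field per link in the chain."""
    source: list[str] = field(default_factory=list)
    transformation: list[str] = field(default_factory=list)
    output: list[str] = field(default_factory=list)
    exception_handling: list[str] = field(default_factory=list)
    approval: list[str] = field(default_factory=list)
    retention: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Names of the links that have nothing recorded against them."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical example: an entry that was prepared and approved,
# but with no exception record and no retention reference.
chain = EvidenceChain(
    source=["payroll report for the pay period"],
    transformation=["daily run rate x days to accrue"],
    output=["accrual journal entry"],
    approval=["controller approval before posting"],
)
print(chain.gaps())
```

Under this layout an empty exception list counts as a gap, so a clean run would need an explicit "no exceptions identified" entry. That is a design choice of the sketch: it keeps "nothing went wrong" distinguishable from "nothing was recorded".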
This evidence chain matters because AI output alone is not audit evidence.

A journal entry, a reconciliation, or a variance explanation produced by an agent is a draft until the chain behind it is intact: the source it drew on, the logic it applied, the exceptions it surfaced, the human who reviewed it, and the approval that let it post.
Auditors do not need the model's hidden reasoning. They need accounting evidence: source records, applied rules, calculations, assumptions, thresholds, exceptions, overrides, review notes, approvals, and final postings. The system should explain what it did in accounting terms. It should not ask the auditor to accept a black-box answer.
What this looks like in practice
Consider a month-end payroll accrual. A company's pay period ends on June 28, but the accounting period ends on June 30. Assume the company's policy accrues salaried payroll using a calendar-day run rate. The payroll report shows $2.94 million of biweekly salary expense over a 14-day period. The calculation is simple: $2.94 million divided by 14 days gives a daily run rate of $210,000, and two days at that rate produce an initial accrual of $420,000.
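The arithmetic above can be written out in a few lines. This is only a sketch of the example's calendar-day policy. The variable names are illustrative, and `Decimal` is used so the currency amounts stay exact.

```python
from decimal import Decimal

biweekly_salary = Decimal("2940000")  # from the payroll report in the example
days_in_pay_period = 14
days_to_accrue = 2                    # June 29 and June 30

daily_run_rate = biweekly_salary / days_in_pay_period
initial_accrual = daily_run_rate * days_to_accrue

print(daily_run_rate, initial_accrual)
```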
But the audit question is not only whether $420,000 was calculated correctly. The audit question is whether the number is supported. What population was included? Were terminated employees excluded? Were bonuses, commissions, benefits, and payroll taxes treated consistently with policy? Was the amount allocated to the right departments? Did the reviewer inspect the source data? Did the final entry tie to what was posted?
Suppose the reviewer identifies terminated employees who should not be part of the accrual population, excludes them, adjusts the accrual down by $6,500, and approves a final figure of $413,500. The journal entry posts to the ERP. Now there are two numbers in play. The calculation produced $420,000. The entry posted at $413,500. That difference is not an error and not a mystery, as long as the record shows it. The exclusion, the reason for it, and the reviewer who made the call all have to be captured at the moment the review happens. If they are not, then the entry simply reads $413,500 and nothing on file explains why it is not $420,000.
The final number is not the point. The trail is the point.
A defensible audit trail would show the source payroll report, the date range, the employee population, the daily run-rate calculation, the reviewer adjustment, the final entry, the approval, and the ERP posting reference. If the auditor selects the entry six months later, the team should not have to reconstruct the story. The story should already be attached to the work.
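A minimal sketch of such a trail for the payroll example follows. It ties the calculated figure to the posted figure through the recorded reviewer adjustment, so any difference that nobody documented shows up as a non-zero amount. The class names, field names, reviewer label, and reference strings are all hypothetical. Only the three dollar amounts come from the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ReviewerAdjustment:
    amount: Decimal  # negative reduces the accrual
    reason: str
    reviewer: str


@dataclass(frozen=True)
class AccrualTrail:
    source_report: str
    calculated: Decimal
    adjustments: tuple[ReviewerAdjustment, ...]
    posted: Decimal
    approved_by: str
    erp_reference: str

    def unexplained_difference(self) -> Decimal:
        """Posted amount minus (calculated amount plus recorded adjustments)."""
        explained = sum((a.amount for a in self.adjustments), Decimal("0"))
        return self.posted - (self.calculated + explained)


trail = AccrualTrail(
    source_report="payroll report, pay period ending June 28",
    calculated=Decimal("420000"),
    adjustments=(
        ReviewerAdjustment(
            amount=Decimal("-6500"),
            reason="terminated employees excluded from accrual population",
            reviewer="controller",  # placeholder name
        ),
    ),
    posted=Decimal("413500"),
    approved_by="controller",       # placeholder name
    erp_reference="JE-0001",        # hypothetical posting reference
)
print(trail.unexplained_difference())
```

If the adjustment had not been captured at review time, the same check would report an unexplained difference of -6,500, which is the gap an auditor would otherwise find six months later.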
This is what Maxima's approach to journal entry preparation is designed to produce. The agent prepares the entry from the source data and the configured rules. The controller reviews it, makes the adjustment, and approves. The difference is that the reviewer's change and the reason behind it are recorded as the review happens, so the trail is complete the moment the entry posts, not assembled weeks later when an auditor asks.
The same logic extends to every close workflow, though what the chain has to carry shifts with the work.
A cash reconciliation connects the GL balance to the bank balance, and the trail should hold the bank statement or feed, the GL detail, the matching logic, the deposits in transit, the outstanding payments, the fees, the stale items, and the reviewer's questions. A reconciliation that ties still has to show the reviewer what produced the match.
Revenue is harder, because recognition involves judgment. The chain connects reported revenue to billing, contract, usage, or schedule data, and carries the recognition logic, the subledger-to-GL tie-out, deferred revenue movement, and manual adjustments. An agent can prepare the tie-out and surface the unusual items. It should not be the one owning or approving a non-standard revenue judgment.
Intercompany adds a second party, so the chain has to show both sides of the transaction: counterparty mapping, entity-level balances, FX or timing differences, elimination support, and any unresolved breaks.
Accruals turn on assumptions, so the chain has to carry the methodology, the estimate basis, the completeness review, the reversal plan, and the subsequent clearing. A recurring accrual may be largely rule-based. A judgment-heavy one needs the reasoning written down. The system should make that difference visible, not flatten it.
A variance explanation has to tie to the trial balance, the threshold, the account movement, the transaction drivers, and the reviewer's challenge. "Increase due to timing" is not audit evidence. A quantified explanation with source-linked drivers is.
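The sketch below shows what a "quantified, source-linked" check could look like. It tests whether the stated drivers add up to the account movement and whether each driver points at a source record. The function name, the tuple layout, and every figure and reference in the example are invented for illustration.

```python
from decimal import Decimal


def explanation_gaps(movement, drivers, tolerance=Decimal("0")):
    """drivers is a list of (description, amount, source_ref) tuples."""
    gaps = []
    explained = sum((amount for _, amount, _ in drivers), Decimal("0"))
    if abs(movement - explained) > tolerance:
        gaps.append(f"unexplained amount: {movement - explained}")
    for description, _, source_ref in drivers:
        if not source_ref:
            gaps.append(f"driver without source link: {description}")
    return gaps


# Hypothetical account movement and drivers.
print(explanation_gaps(
    movement=Decimal("50000"),
    drivers=[
        ("new vendor contract", Decimal("30000"), "INV-1001"),
        ("timing", Decimal("15000"), ""),  # no source record linked
    ],
))
```

In practice the tolerance would be set from the team's variance threshold rather than left at zero.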
Auditors do not need more folders. They need a cleaner path from the final number back to the evidence that supports it.
The audit trail worth building
The question is not whether agentic AI can produce accounting work. It is whether the infrastructure around that work produces evidence that holds up when someone asks how a number was derived, who reviewed it, and whether the conclusion is defensible.
Agent-prepared does not mean agent-approved. AI can gather source data, apply rules, match transactions, draft entries, prepare reconciliations, surface exceptions, and assemble evidence. But judgment, accountability, and governance remain human-owned: policy judgment, materiality assessment, review of non-routine exceptions, approval of journal entries, evaluation of estimates, escalation of unusual items, final sign-off, and responsibility for the financial statements. The most defensible AI workflows make that boundary visible. The agent prepares. The human reviews, challenges, approves, or rejects.
This is the standard Maxima is built toward: not a system that tracks whether a task was completed or stores its support after the fact, but one that prepares the work so the evidence chain is already attached when the controller reviews it. Every number has lineage. Every exception has an owner. Every approval has context. Every reviewer can understand the work without reconstructing it.
This is not a prediction about where accounting might go. It is where the regulatory signals, the audit profession, and the economics of close already point. AI will prepare more of the work each year. The teams that come through audits well will be the ones that built the evidence into the work itself. The teams still treating documentation as a separate job will keep paying for that separation.
The goal is not to convince auditors that AI is trustworthy. It is to make trust unnecessary by making the work inspectable. The trust standard does not change. The evidence model does.