Automating Reconciliations with Claude Code: A Worked Example
Bank reconciliation is one of the most commonly automated workflows in accounting, and one of the easiest to build yourself. Here's a complete worked example using Claude Code, with the prompts that actually worked and the gotchas worth knowing.
If you do bank reconciliation manually, you've probably thought "this should automate itself" a hundred times. This walkthrough builds that automation in an afternoon: two input files, four prompts, and a verification pass after each one.
The setup
Two CSV files in a folder:
- bank_export.csv — date, description, amount, balance
- gl_export.csv — date, account, description, debit, credit, ref_number
Goal: produce matched.csv, bank_only.csv (items on the bank but not the GL), and gl_only.csv (items on the GL but not the bank).
Prompt 1 — The first pass
"I have two files in this folder, bank_export.csv and gl_export.csv. Write a Python script that matches transactions between them. A match is when amounts are equal (treat GL debit as positive bank, GL credit as negative bank) and dates are within 2 calendar days. Output three files: matched.csv, bank_only.csv, gl_only.csv. Print a summary of the row counts."
Claude Code writes a script using pandas, runs it, and reports something like: 142 matched, 8 bank-only, 5 GL-only.
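The generated script varies from run to run, but the core of a first pass looks something like this (a minimal sketch, not Claude's exact output; column names follow the CSVs above):

```python
import pandas as pd

def first_pass_match(bank: pd.DataFrame, gl: pd.DataFrame, date_window: int = 2):
    """Greedy 1:1 match on exact amount within a calendar-day window."""
    bank = bank.copy()
    gl = gl.copy()
    # GL debits map to positive bank amounts, credits to negative
    gl["amount"] = gl["debit"].fillna(0) - gl["credit"].fillna(0)
    gl["date"] = pd.to_datetime(gl["date"])
    bank["date"] = pd.to_datetime(bank["date"])

    pairs, used_gl = [], set()
    for bi, brow in bank.iterrows():
        for gi, grow in gl.iterrows():
            if gi in used_gl:
                continue
            same_amount = grow["amount"] == brow["amount"]
            close_date = abs((grow["date"] - brow["date"]).days) <= date_window
            if same_amount and close_date:
                pairs.append((bi, gi))
                used_gl.add(gi)
                break  # first acceptable GL line wins -- a weakness, see below
    bank_only = bank.drop(index=[b for b, _ in pairs])
    gl_only = gl.drop(index=[g for _, g in pairs])
    return pairs, bank_only, gl_only
```

The three return values map straight onto matched.csv, bank_only.csv, and gl_only.csv. Note the `break`: the first GL line at the right amount wins, which is exactly how the wrong-match problem in the next section arises.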
Verification — the part most automation skips
Open the three output files. Spot-check ten rows in matched.csv: are the matches actually correct? Look at bank_only.csv: do those items have a real reason to be unmatched (timing differences, bank fees not yet recorded), or did the matcher miss them?
This verification pass is where the issues always surface. In my test data:
- Two real matches were missed because the GL had the amount split into two entries (e.g. a $1,500 deposit posted as $1,000 + $500 in the GL but $1,500 on the bank).
- Three "matches" were wrong because two different transactions for the same amount happened on adjacent days.
Prompt 2 — Iterating on the gotchas
"Update the script to also try summing consecutive GL entries within a 2-day window when looking for a match — sometimes a single bank deposit is split into multiple GL entries. Also, when there are multiple potential matches for the same bank line, prefer the one with the closest date and flag the situation in the matched output as 'ambiguous_match'."
Claude Code rewrites and runs. New numbers: 144 matched (with 3 flagged ambiguous), 6 bank-only, 5 GL-only.
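The split-entry pass the prompt asks for can be sketched as a small combination search (the helper name is illustrative; the real script also does the ambiguity flagging):

```python
from itertools import combinations
import pandas as pd

def find_split_match(bank_amount: float, bank_date: pd.Timestamp,
                     gl_rows: pd.DataFrame, date_window: int = 2,
                     max_parts: int = 3):
    """Return GL row indices whose amounts sum to the bank amount, all dated
    within the window, or None. Capping max_parts keeps the search cheap."""
    nearby = [(i, row) for i, row in gl_rows.iterrows()
              if abs((row["date"] - bank_date).days) <= date_window]
    for k in range(2, max_parts + 1):
        for combo in combinations(nearby, k):
            total = sum(row["amount"] for _, row in combo)
            if abs(total - bank_amount) < 0.005:  # absorb float noise
                return [i for i, _ in combo]
    return None
```

On the example above, a $1,500 bank deposit posted as $1,000 + $500 in the GL comes back as one match instead of landing on the exception lists.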
Prompt 3 — Adding fuzzy description matching
"For the bank-only and GL-only lists, look for likely matches across the lists where the amount and date don't exactly fit but the description has high similarity (use rapidfuzz, threshold 80). Output a fourth file, likely_matches_review.csv, with these candidates, the similarity score, and what the differences are."
Claude Code installs rapidfuzz and runs. Result: 4 likely matches surfaced for human review.
Prompt 4 — Making it repeatable
"Wrap this in a single script I can run with python reconcile.py path/to/bank.csv path/to/gl.csv. Add error handling for malformed input files. Add a tolerance config at the top so I can adjust the date window and amount tolerance without editing the matching code."
I now have a reusable script. Next month, drop in two new files, run one command, get four output files. What used to be a half-day reconciliation is 15 minutes of review.
The workpaper
An automation that doesn't produce documentation isn't audit-friendly. So:
"Add a markdown summary file, reconciliation_summary.md, with: input file names and row counts, matching parameters used, output file row counts, a list of all ambiguous and likely matches with their detail, a timestamp, and a short methodology paragraph. This is the workpaper."
Now there's an audit trail.
What this isn't
- It isn't end-to-end automated. A human still reviews the exception lists, decides what each unmatched item means, and posts any reconciling entries.
- It isn't a replacement for a SOX control. If this is part of a key control test, your auditor needs to validate the matching logic and the agent's output.
- It isn't risk-free. The script could match wrong if the data is unusual. Spot-check forever, even after it's "working."
The compounding lesson
You just built a small Python application without writing Python. The same loop — describe, watch the agent build, verify, iterate — applies to dozens of accounting workflows. Pick the one that's annoying you most this week and try the same sequence. The first build is the steepest; everything after is faster.
Frequently asked questions
Can Claude Code do a real bank reconciliation end-to-end?
It can do the matching and exception-list generation reliably. The judgment-call portion (deciding what reconciling items mean and how to clear them) still needs a human. The 80% of work that's mechanical matching is exactly what an agent is good at.
What if my bank export and GL don't have a common transaction ID?
Most don't. The standard approach is fuzzy matching on amount + date (with a tolerance of 1-2 days for clearing differences) plus a similarity score on the description field. Claude Code can build this in 30 minutes; the harder part is tuning the thresholds for your specific accounts.