High-volume decisioning: what escalated retail claims taught me about scaling judgment
At high volume, consistent fast decisions come from frameworks, escalation thresholds, and feedback loops, not gut feel. Lessons from escalated retail claims.
The fantasy version of good judgment is a wise person looking at a hard case, thinking deeply, and arriving at the right answer. It is a lovely image. It also falls apart the moment you have to make that call two hundred times before lunch, while the next person in the queue is already waiting and the one after that is angry.
I spent a stretch of my working life as a senior resolution generalist at IKEA, with financial authority over escalated customer claims. "Escalated" is the operative word. The easy ones got resolved before they ever reached me. What landed on my desk was the residue: the disputed delivery that may or may not have arrived, the damaged item with a story that did not quite add up, the refund sitting just past the edge of policy, the customer already told no twice and not in the mood for a third. And I had the authority to spend the company's money to make it right, or not.
Here is the thing nobody warns you about. When you have that authority and that volume, your gut is not an asset. It is a liability, and it fails two ways. The first is decision fatigue: every call burns a little fuel, and run on pure deliberation you are not deciding by mid-afternoon, you are flinching. You approve things to make the discomfort go away, or deny things to feel in control, and neither has anything to do with the merits. The second is inconsistency: the version of me at 9am and the version at 4pm would not give the same customer the same answer. Each decision might be defensible; the pattern is indefensible. A refund policy that depends on who you got and what mood they were in is not a policy, it is a lottery, and people can smell a lottery.
So judgment at scale is not a personal quality. It is a system, one that produces consistent, defensible, fast decisions even when the human inside it is tired, busy, and on their fourth coffee. Once you see it that way, you can build it. Here is what is actually inside.
The mental model: most decisions are not decisions
The single most useful idea I carried out of that role is this. The overwhelming majority of cases that feel like judgment calls are not. They are pattern matches wearing a judgment costume.
Picture every incoming case dropping through a sieve with three layers.
- The clear-yes layer. The case fits a known pattern that resolves in the customer's favour. Damaged on arrival, photo attached, within the window, value under the threshold. Nothing to deliberate. You apply the rule and move on. Speed is the whole job here, and deliberating makes it worse, because you are spending scarce judgment fuel on a case that did not need any.
- The clear-no layer. The case fits a known pattern that does not qualify. Outside policy, no evidence, a pattern that looks like abuse. Again, no agonising. You decline, clearly and kindly, and move on.
- The genuinely-ambiguous layer. What is left after the first two catch everything they can. The novel situation, the sympathetic edge case, the one where the rule gives an answer that is technically correct and obviously wrong. This is where real judgment belongs, and now you have the fuel for it, because you did not waste it upstream.
The mistake almost everyone makes, especially conscientious people, is treating all three layers as the ambiguous layer. They bring full deliberation to the clear-yes cases out of diligence, run out of fuel, then bring sloppy half-attention to the genuinely hard ones that deserved the full beam. Exactly backwards. Be fast and almost mechanical on the obvious cases precisely so you can be slow and careful on the hard ones.
Getting good at this role was mostly getting good at sorting: recognising, in the first ten seconds, which layer a case belonged to. The sorting is the skill. The deciding is easy once the sorting is right.
Escalation thresholds: knowing where your authority ends
A decision framework is only half the system. The other half is knowing, precisely and in advance, where your own authority stops.
I had financial authority up to a point. Above it, a case had to go up, and the worst possible time to figure out where that line sits is mid-call with a customer staring at you. So the line has to be a number you already know cold, not a feeling you arrive at under pressure. A clean threshold has three explicit parts:
- The trigger. What pushes a case past you. Usually a value above your authority, but not only that. Anything legally loaded, anything touching safety, anything that would set a precedent bigger than the single case. A 40 euro goodwill credit and a claim that hints at injury are not the same kind of decision even if the euro figure is identical.
- The path. Who it goes to, with what attached. Escalation that means "I gave up, here is a mess" is useless and trains the next person up the chain to dread your name. Done right it is a clean handoff: here is the case, here is what I have already verified, here is my read, here is the specific thing I cannot sign off and why.
- The default while it is in flight. What the customer experiences during the wait, which is the part people forget. They do not care about your internal authority structure. They care that they are stuck. So the threshold comes with a holding move that keeps the person informed and not abandoned while the real answer is worked out.
Escalation is not failure. Early on it feels like admitting you could not handle it. It is the opposite. Someone who escalates the right cases and only the right cases shows better judgment than someone who white-knuckles every decision to prove they can. The skill is calibration: escalate too much and you are a bottleneck pushing your job onto someone else; too little and you are signing off on things that were never yours to sign. The threshold removes the agonising. The case is either over your line or it is not, and you already know where the line is.
Feedback loops: the part that makes it compound
A framework without a feedback loop slowly rots. This is the piece most operations get wrong, because the first two parts are visible and this one is not.
If you make a hundred decisions a day and never find out which were right, you are not building judgment. You are repeating your current judgment, blind spots and all, faster. Volume without feedback does not make you better. It makes you more confidently the same.
The loop that matters is the reversal. When something I approved turned out to be abuse, or something I declined got overturned on appeal because the customer was right and I had missed it, that was the gold. Uncomfortable, but every reversal was information about exactly where my pattern matching was miscalibrated: maybe I was systematically too harsh on one category of claim, maybe a pattern I had been treating as abuse was a legitimate situation I had never seen framed that way. You only learn that from cases coming back, and only if someone is closing the loop instead of letting reversals vanish silently into the system.
The same logic runs through onboarding, which I also spent real time on. A new person's calibration is wrong on day one, not because they are not smart, but because they have not seen enough cases to know which layer a thing belongs in. The fastest way to build their judgment is not a longer rulebook. It is a tight loop: they decide, someone reviews, they find out within a day or two, they adjust. Same-day correction on twenty real cases beats a hundred-page manual followed by a month of guessing alone.
This is what turns a static framework into something that compounds. The framework sorts the cases, the thresholds catch what is over your line, and the feedback loop quietly retunes the framework itself, so next month's sorting is sharper than this month's. Without the loop you have a system that is exactly as good as the day you wrote it and never any better.
Where it transfers, and where it bites back
I do not work in retail anymore. I build software now, systems that make decisions on their own. But this exact shape, framework, thresholds, feedback, is the spine of how I think about any operation that has to produce consistent judgment at scale, whether the decider is a tired human or a confident model.
The connection is almost too direct. When I design an AI agent that proposes accounting entries, I am building the same three layers I used to run in my head at a claims desk. Auto-handle the obvious cases fast. Block and escalate the loaded ones to a human. Route the genuinely ambiguous middle to where the real judgment lives. The model is just doing the first-pass sorting I used to do by eye, the escalation threshold is still a number decided in advance, and the feedback loop is still what keeps the whole arrangement honest. I did not learn this from building software. I learned it at a desk with the authority to spend real money on strangers, and the software is the same lesson in a different language.
The tradeoffs are real, though, and worth naming so you do not romanticise the framework.
- Frameworks calcify. A sorting rule that was right last year quietly becomes wrong as the world shifts, and because it is fast and automatic, nobody notices it is misfiring. The feedback loop is your only defence, which is exactly why it is the part that gets cut first when things get busy.
- Thresholds get gamed. The moment people learn that claims under a certain value sail through, some will conveniently land just under it. Any fixed line invites pattern-matching against the line itself, so it cannot be the only check.
- Sorting fast can become not-looking. The dark version of "most decisions are not decisions" is treating a genuinely ambiguous case as a clear-yes because you are tired and it is faster. This is the one I watch hardest, in myself and in the systems I build: getting mechanical on the easy cases is supposed to leave more care for the hard ones, and the failure mode is that habit of speed bleeding into the wrong layer.
Scale runs on systems, not goodwill. I believed that before I could have told you why. Standing at that desk, the goodwill version of me ran out around two in the afternoon, every day. The system version did not. It made the same fair call at 4pm that it made at 9am, spent its real attention on the cases that earned it, and got a little sharper every time a decision came back to teach it something. That is not a smaller kind of judgment than the wise-person fantasy. At any real volume, it is the only kind that holds.
Want to discuss this? Write directly.
jami@impactnode.fi