How We Built a 1-Person Freight Brokerage Run by AI Agents

TL;DR: ATI Logistics is a multimodal freight broker operated by one ops person plus a stack of AI agents (SALI, EZRA, TyTe, V3 TMS, MV3 client extensions). We handle ~1,000 shipments per month. This is the operator story I've been wanting to read — what works, what we tried and abandoned, the actual file paths and architecture, and the real numbers.

What this is, what it isn't

This isn't a launch announcement and it isn't a product pitch. It's the kind of post I wanted to read three years ago when I started thinking about whether AI could actually run a freight brokerage. Most of what's out there is either marketing for a TMS vendor or a startup founder telling a story about scaling a team. Neither of those answered the question I had, which was: what does the stack actually look like on a Tuesday afternoon when two carriers are emailing back rate confirmations at the same time and a customer is asking for a quote with no freight class?

The four robots

We split the work into four conceptual "robots." This is a useful mental model even if your team is one person, because each robot has a separate latency budget and a separate failure mode.

SALI: the email intake brain

SALI (Standardize, Automate, Liberate, Intelligence) is the Python service that owns the inbox. It runs on the SALI server box (an Azure VM running Windows + WSL2 Ubuntu) and listens on a handful of internal ports.

Why Python and not LangChain, CrewAI, etc.

We tried two of the orchestration frameworks early. Both worked for a demo and fell apart in production because they don't have a good answer for: "what happens when this LLM call costs more than the freight margin on this quote?" We needed cost-aware confidence routing, where a high-confidence parse goes straight through and a low-confidence parse escalates to a more expensive model, and we needed to log every decision for retraining. We ended up writing it ourselves in plain Python with the Anthropic SDK. It's ~3,000 lines of dispatcher code, and it's the part you would not want to hand to a framework.
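The "does this call cost more than the margin" question can be sketched as a small pre-flight check. This is a minimal illustration, not the production dispatcher: the names ModelPrice, estimate_call_cost and worth_calling are hypothetical, the prices are whatever the caller passes in, and the 5% share of margin is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price per million tokens in USD. Values are supplied by the caller."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_call_cost(price: ModelPrice, input_tokens: int,
                       max_output_tokens: int) -> float:
    """Worst-case cost of one call: all input tokens plus the full output cap."""
    return (input_tokens * price.input_per_mtok
            + max_output_tokens * price.output_per_mtok) / 1_000_000


def worth_calling(price: ModelPrice, input_tokens: int, max_output_tokens: int,
                  expected_margin_usd: float, max_share: float = 0.05) -> bool:
    """Allow the call only if its worst-case cost stays under a share of margin.

    max_share is a placeholder assumption, not a recommended value.
    """
    cost = estimate_call_cost(price, input_tokens, max_output_tokens)
    return cost <= expected_margin_usd * max_share
```

Using the worst-case output cap keeps the check conservative: a call that passes cannot end up costing more than the budget it was approved against.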

The intake flow

The whole flow has cost guardrails. The Haiku tiebreak runs first, and we only escalate to Opus when Haiku's confidence is below threshold. On a typical day Opus fires on roughly 8–12% of inbound emails. The rest are handled deterministically or by Haiku.
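The escalation rule can be sketched in a few lines. This is an illustration under assumptions: the two parsers are injected callables standing in for the cheap (Haiku) and expensive (Opus) calls, Parse and route_parse are hypothetical names, and the 0.8 threshold is a placeholder rather than the value used in production.

```python
import json
import time
from typing import Callable, List, NamedTuple, Optional


class Parse(NamedTuple):
    fields: dict
    confidence: float  # 0.0 to 1.0, as reported by the parser


Parser = Callable[[str], Parse]


def route_parse(
    email_text: str,
    cheap_parse: Parser,
    expensive_parse: Parser,
    threshold: float = 0.8,  # placeholder, tune on real traffic
    log: Optional[List[str]] = None,
) -> Parse:
    """Run the cheap parser first; escalate only when it is unsure."""
    first = cheap_parse(email_text)
    escalated = first.confidence < threshold
    final = expensive_parse(email_text) if escalated else first
    if log is not None:
        # One JSON line per decision, so routing can be audited or retrained on.
        log.append(json.dumps({
            "ts": time.time(),
            "cheap_confidence": first.confidence,
            "escalated": escalated,
            "final_confidence": final.confidence,
        }))
    return final
```

Passing the parsers in as arguments keeps the routing testable without any network calls: stub functions that return a fixed Parse are enough to exercise both branches.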

What we learned about LLM parsing

The four robots introduced earlier, with their jobs, latency budgets, and failure modes:

Robot	Job	Latency budget	Failure mode
Quote Runner	Parse inbound RFQ emails, dispatch to provider mix, write back to customer	<60s end-to-end	Wrong mode classification (LTL vs FTL vs drayage)
Carrier Runner	Parse carrier rate replies, match to open quotes, write to brokered-rate ledger	<30s per reply	MC mismatch / wrong quote attribution
Tracking / Docs	Pickup/delivery confirmations, BOL/POD ingestion, customer notifications	<2 min per event	Missed shipment event leads to angry customer
Lead Gen	Outbound prospecting via enriched lane history + Hunter.io + Serper	Async (queued)	Spam complaint / sender reputation hit

The bottleneck isn't the parse; it's the spam filter. When we instrumented our inbound queue, more than half the volume was marketing email, credit application responses, mailer-daemon bounces, and bot-generated quote requests with no real lane. We built a 4-layer skip gate (sender blocklist, subject keywords like "credit application," confidence floor, LLM self-classification escape) that rejects 60-70% of inbound before we ever pay for a full parse. Without this layer the LLM bill would have been three times what it is.
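A skip gate like this can be sketched as an ordered chain of checks, cheapest first. Everything concrete here is an assumption for illustration: the blocklist entries, the extra keywords, the 0.3 floor, the Inbound fields, and the reading of "LLM self-classification escape" as the model itself flagging the message as not a quote request.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical example values; a real list would be maintained from inbox data.
BLOCKED_SENDERS = {"mailer-daemon@example.com", "promo@marketing.example"}
SKIP_SUBJECT_KEYWORDS = ("credit application", "newsletter", "unsubscribe")


@dataclass
class Inbound:
    sender: str
    subject: str
    confidence: float          # score from a cheap pre-classifier, 0.0 to 1.0
    llm_says_not_quote: bool   # the model's own "this is not an RFQ" flag


def skip_reason(msg: Inbound, confidence_floor: float = 0.3) -> Optional[str]:
    """Return the name of the first layer that rejects msg, or None to parse it."""
    if msg.sender.lower() in BLOCKED_SENDERS:
        return "sender_blocklist"
    subject = msg.subject.lower()
    if any(keyword in subject for keyword in SKIP_SUBJECT_KEYWORDS):
        return "subject_keyword"
    if msg.confidence < confidence_floor:
        return "confidence_floor"
    if msg.llm_says_not_quote:
        return "llm_self_classification"
    return None
```

Returning the layer name rather than a bare boolean makes it possible to count rejections per layer, which is how a figure like "60-70% rejected" can be broken down.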

Customers don't include freight class. We analyzed 2,229 anonymized LTL quote requests and 84% of them did not include a freight class. The industry has been telling shippers to include it for 30+ years. They don't. This means a rating engine that requires class will reject the request, guess wrong, or bounce to manual. Our parser handles this by inferring class from commodity + density when it can, and otherwise treating it as a legitimately ambiguous field that warrants an ASK card.
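The density half of that inference is simple arithmetic: pounds divided by cubic feet, then a lookup. The sketch below is an assumption-heavy illustration: it ignores commodity entirely, the function names are hypothetical, and the density breaks are the commonly published 13-tier density scale, which should be verified against the current NMFC before anyone relies on them.

```python
from typing import Optional, Tuple

# (minimum lb per cubic foot, freight class), highest density first.
# Assumed values; check against the current NMFC density scale.
DENSITY_BREAKS = [
    (50, "50"), (35, "55"), (30, "60"), (22.5, "65"), (15, "70"),
    (12, "85"), (10, "92.5"), (8, "100"), (6, "125"), (4, "175"),
    (2, "250"), (1, "300"),
]


def density_pcf(weight_lb: float, length_in: float,
                width_in: float, height_in: float) -> float:
    """Density in pounds per cubic foot (1728 cubic inches per cubic foot)."""
    cubic_feet = (length_in * width_in * height_in) / 1728
    return weight_lb / cubic_feet


def estimate_class(weight_lb: Optional[float],
                   dims_in: Optional[Tuple[float, float, float]]) -> Optional[str]:
    """Return a density-based class, or None when the request must be asked about."""
    if not weight_lb or not dims_in or any(d <= 0 for d in dims_in):
        return None  # not enough data: treat as ambiguous and ask the customer
    density = density_pcf(weight_lb, *dims_in)
    for floor, freight_class in DENSITY_BREAKS:
        if density >= floor:
            return freight_class
    return "400"  # below 1 lb per cubic foot
```

For example, a 500 lb pallet measuring 48 x 40 x 48 inches occupies about 53.3 cubic feet, so its density is 9.375 lb per cubic foot and the lookup returns class 100.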

EZRA: market data and rate observations

EZRA runs on a separate machine (a Mac Pro tower for ops reasons, not because Macs are better) and handles the market-data side: rate observations across DAT, Truckstop, Highway, and Polymarket; port congestion observations; and a few proprietary scrapers we built for specific lane benchmarks. It pushes data to our infrastructure via a write endpoint on the SALI box.

We deliberately keep EZRA on separate hardware because the market data scrapers have a different failure profile (long-running browser automation with frequent session-expiry pain) than the email intake stack. When EZRA breaks, we'd rather not also lose the ability to quote.

TyTe: the AI ops layer

TyTe is the third agent in the swarm and the one that's hardest to explain. It's a DigitalOcean-hosted PHP swarm that does cross-cutting platform work: PR review, large refactors, vendor-API integration writeups, content generation, audit passes. When SALI needs to know "where does this error code come from in our codebase," TyTe can grep the entire stack and reply.

The three agents talk via a single shared coordination board (just a PHP endpoint that writes to a JSON log). Every agent reads it at session start, every agent posts decisions or asks. It's a primitive message bus by 2026 standards but it works, and it's the single source of truth for what the AI swarm has been doing.
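The board pattern (append-only log, read at session start) can be sketched with a local JSON Lines file. This is a stand-in under assumptions: the real board is a PHP endpoint, and the entry fields, the file layout, and the function names post and read_since are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import List


def post(board: Path, agent: str, kind: str, text: str) -> dict:
    """Append one entry (a decision or an ask) to the shared log."""
    entry = {"ts": time.time(), "agent": agent, "kind": kind, "text": text}
    with board.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def read_since(board: Path, since_ts: float = 0.0) -> List[dict]:
    """Return entries newer than since_ts, oldest first."""
    if not board.exists():
        return []
    with board.open(encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if e["ts"] > since_ts]
```

An agent would call read_since at session start with the timestamp of the last entry it saw, then post its own decisions as it works. Because entries are only ever appended, the file doubles as a history of what the swarm has done.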

V3 TMS: where the money lives

The transactional layer is a custom CakePHP application we call V3. It serves v3.availabletradeinternational.com behind nginx, with separate PHP-FPM pools per tenant (atix, enterprise, truck, v2). It owns quote records, orders, brokered rates, customer accounts, and the carrier app backend.

This is the part of the stack that no AI agent touches without an HMAC. SALI dispatches to V3 through POST /admin/api/createQuote and similar endpoints, with a 5-column source attribution payload on every write (including source_system / source_type / source_endpoint / source_lane) so we can always answer "who created this row." If an AI agent's logic is wrong, we can find and reverse its writes in the audit log.
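An HMAC-signed write can be sketched with the Python standard library. The scheme below is an assumption, not V3's actual protocol: the header names X-Timestamp and X-Signature, the "timestamp.body" signing string, the 300-second skew window, and the example field values are all hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional, Tuple


def sign_request(secret: bytes, body: dict,
                 timestamp: Optional[float] = None) -> Tuple[bytes, dict]:
    """Serialize body deterministically and sign timestamp + body with SHA-256."""
    ts = str(int(time.time() if timestamp is None else timestamp))
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    mac = hmac.new(secret, ts.encode() + b"." + raw, hashlib.sha256).hexdigest()
    return raw, {"X-Timestamp": ts, "X-Signature": mac}


def verify_request(secret: bytes, raw: bytes, headers: dict,
                   max_skew_s: int = 300, now: Optional[float] = None) -> bool:
    """Reject stale or tampered requests; compare digests in constant time."""
    now = time.time() if now is None else now
    try:
        ts = int(headers["X-Timestamp"])
    except (KeyError, ValueError):
        return False
    if abs(now - ts) > max_skew_s:
        return False
    expected = hmac.new(secret, headers["X-Timestamp"].encode() + b"." + raw,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, headers.get("X-Signature", ""))


# Example payload carrying the attribution fields named above (values invented).
example_body = {
    "source_system": "sali",
    "source_type": "email_intake",
    "source_endpoint": "/admin/api/createQuote",
    "source_lane": "CHI-DAL",
}
```

Signing the timestamp together with the body means a captured request cannot be replayed after the skew window closes, and hmac.compare_digest avoids leaking how many leading characters of a forged signature were correct.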

MV3 client extensions: how we replaced the Mac Helper

One of our biggest infrastructure projects was retiring our Mac Helper, which was a Playwright-based browser automation layer running on a dedicated Mac. Every loadboard provider session lived inside it: Truckstop, DAT, TruckSmarter, Highway, HaulPay, Unishippers. It worked, but session expiry was a nightmare and it was a single point of failure.

We rebuilt it as a stack of Chrome MV3 (Manifest V3) extensions that ride on a long-lived Chrome browser. Each extension owns one provider's session, watches for token expiration, and provides an internal HTTP bridge for the rest of the stack to call. When a token is about to expire, the extension refreshes it silently using the provider's own refresh flow. We retired the Mac Helper entirely on the same day the remaining clients (LBN, Drayage, Highway, EasySendy, GlobalTranz) went live in MV3.
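The refresh-before-expiry logic can be sketched independently of the browser. The extensions themselves would be JavaScript; this Python version only illustrates the scheduling rule, and the Session type, the 300-second lead time, and the refresh callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Session:
    provider: str
    expires_at: float  # epoch seconds


def needs_refresh(session: Session, now: float, lead_s: float = 300) -> bool:
    """True once the token is within lead_s seconds of expiring (or past it)."""
    return session.expires_at - now <= lead_s


def refresh_due(sessions: List[Session], now: float,
                refresh: Callable[[Session], float],
                lead_s: float = 300) -> List[str]:
    """Refresh every session that is close to expiry; return their provider names.

    refresh stands in for the provider's own refresh flow and must return
    the new expiry time in epoch seconds.
    """
    refreshed = []
    for session in sessions:
        if needs_refresh(session, now, lead_s):
            session.expires_at = refresh(session)
            refreshed.append(session.provider)
    return refreshed
```

Running this check on a timer for each provider separately is what removes the failure mode where several sessions expire together: each one is renewed on its own schedule, ahead of time.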

The numbers

I have to be careful here because some of these are sensitive. The ones I can share publicly:

What we tried and abandoned

SoftModal. We integrated a rail-aggregator API called SoftModal and shipped it as a provider in our rate fan-out. Within weeks they suspended our account because of our query volume. Lesson: rail aggregator APIs have throttling rules that aren't always documented; talk to the rep before integrating, not after.

Mac Helper Playwright. See above. Session expiry on multiple sites at once meant 3am pages for the ops person and a bad time. We replaced it with MV3 client extensions.

Self-registering autonomous LLM cycles. We wanted SALI to be able to schedule its own LLM cycles for self-improvement (the "Dreams" loop pattern). The Anthropic harness blocks this as Create-Unsafe-Agents, which is the right call: an AI agent that can schedule itself into broader and broader autonomy is the kind of thing that needs human approval, not a self-grant. We use a passive harvester pattern instead: SALI flags candidates for self-improvement, and a human (or a separate non-self-modifying loop) decides whether to act on them.

What we'd build differently if starting over

Start with a smaller LLM stack. We tried too many models early. Pick one frontier model and one cheap classifier and stick with them for the first 90 days. The cost-aware confidence routing logic we built is worth it eventually, but in month one you should be focused on the parse quality, not the cost.

Build the audit log on day one. Every AI write needs source attribution before the first AI write happens. We back-filled this and it cost us weeks of investigation work when something went wrong and we couldn't tell which agent's logic had written which row.

Don't fight the harness. Anthropic's classifier blocks certain categories (self-modification of approval gates, create-unsafe-agents, security-weaken). Early on we tried to route around some of these. We stopped, and instead treated the boundaries as design constraints. The system is better for it.

If you want to talk

I'm ryan@ship-ati.com. Happy to talk to founders building in adjacent spaces, ops people at other brokers, or anyone thinking about whether AI can run a part of their business. I will not respond to vendor pitches or recruiting.

If you're a shipper looking at brokers, our rate request hub is at v3.availabletradeinternational.com/moving and we cover all 48 contiguous states across multiple modes.