<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Works With Agents]]></title><description><![CDATA[Works With Agents]]></description><link>https://blog.workswithagents.dev</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1593680282896/kNC7E8IR4.png</url><title>Works With Agents</title><link>https://blog.workswithagents.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 07 May 2026 23:14:16 GMT</lastBuildDate><atom:link href="https://blog.workswithagents.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Every Agent I Delegated To Kept Failing. I Finally Checked the Model.]]></title><description><![CDATA[I built a delegation system that spawns AI agents to handle sub-tasks in parallel. Quality sweeps. Code audits. Checking every SDK directory for dead links. The idea: spin up cheap local agents, let them work, collect results.
They kept failing. Not ...]]></description><link>https://blog.workswithagents.dev/agent-autopsy-2-every-agent-failing</link><guid isPermaLink="true">https://blog.workswithagents.dev/agent-autopsy-2-every-agent-failing</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 22:33:01 GMT</pubDate><content:encoded><![CDATA[<p>I built a delegation system that spawns AI agents to handle sub-tasks in parallel. Quality sweeps. Code audits. Checking every SDK directory for dead links. The idea: spin up cheap local agents, let them work, collect results.</p>
<p>They kept failing. Not crashing — just stopping. No output. No error. 600 seconds of silence, then a timeout.</p>
<p>I assumed the tasks were too complex. I assumed parallel delegation was unreliable. I never checked what model I was actually giving them.</p>
<h2 id="heading-the-root-cause">The Root Cause</h2>
<p>My delegation system was configured to use a small local model. Fine for single-turn questions. Useless for multi-step tool loops.</p>
<p>A quality sweep isn't one tool call. It's: find the directory, list the files, search each one, flag issues, report results. That's five sequential steps, each dependent on the last. The small model lost coherence after the second call. The first step worked. By the third, it was hallucinating or hanging.</p>
<p>Meanwhile, the main agent handled the exact same tasks in minutes. Same instructions. Different model.</p>
<h2 id="heading-what-i-assumed">What I Assumed</h2>
<p>I assumed any model that passes benchmarks can handle tool-calling. I assumed "cheap model for leaf tasks" was an optimization. I assumed if a model could answer a question correctly, it could execute a sequence of tool calls correctly.</p>
<p>Benchmarks measure knowledge. They don't measure whether a model can hold context across five sequential tool calls. Single-turn accuracy and agentic reliability are different things entirely.</p>
<h2 id="heading-what-i-no-longer-assume">What I No Longer Assume</h2>
<p>I now test every model on a concrete multi-step task before adding it to the delegation pool: find a directory, search for a pattern, read the matching file, report what you found. If it can't complete that loop, it doesn't get delegated work.</p>
<p>I also built a decision gate that evaluates task complexity against model capability before spawning a subagent. If the task requires three or more sequential tool calls and the target model has known reliability issues, it reroutes to a more capable model or handles the work inline. Better to burn a few extra tokens on a capable model than to wait ten minutes for nothing.</p>
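<p>The gate fits in a few lines. A minimal sketch, with made-up model names and thresholds:</p>
<pre><code class="lang-python">MODEL_MAX_TOOL_CALLS = {
    "local-small": 2,      # loses coherence after the second call
    "frontier-large": 20,
}

def route(steps, target_model):
    """steps = the ordered tool calls a task needs."""
    if len(steps) &gt;= 3 and MODEL_MAX_TOOL_CALLS.get(target_model, 0) &lt; len(steps):
        return "inline"    # orchestrator does it, or reroutes to a capable model
    return "delegate"      # safe to spawn the subagent

print(route(["find", "search", "read", "report"], "local-small"))  # inline
</code></pre>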
<h2 id="heading-what-you-should-check">What You Should Check</h2>
<p>If you're building systems that delegate work between agents:</p>
<ul>
<li><strong>Test subagent models on multi-step tool loops, not just benchmarks.</strong> Give them a real sequence of dependent calls. If they fail by step three, they're not ready for autonomous work.</li>
<li><strong>Gate delegation before it starts, not after it times out.</strong> A decision layer that checks task complexity against model capability catches failures before they become silent timeouts.</li>
<li><strong>Parallel delegation to weak models isn't faster — it's ten minutes of silence instead of two minutes of work.</strong> Before spawning subagents, ask: can the orchestrator just do this?</li>
</ul>
<hr />
<p>Both checks are open source in the <a target="_blank" href="https://github.com/vystartasv/agent-foundry">agent-foundry repo</a>. No promises about what breaks next — but something will.</p>
<hr />
<p><em>I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → <a target="_blank" href="mailto:vilius@workswithagents.com">vilius@workswithagents.com</a></em></p>
]]></content:encoded></item><item><title><![CDATA[I Published Broken Packages to PyPI. I Checked Them First.]]></title><description><![CDATA[I published two Python packages last week. I checked them before tagging the release. CI was green. twine check passed. I moved on.
This morning my agent told me one of them had been broken for three days. Anyone who copied the install command from t...]]></description><link>https://blog.workswithagents.dev/agent-autopsy-1-broken-pypi-packages</link><guid isPermaLink="true">https://blog.workswithagents.dev/agent-autopsy-1-broken-pypi-packages</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 22:32:56 GMT</pubDate><content:encoded><![CDATA[<p>I published two Python packages last week. I checked them before tagging the release. CI was green. <code>twine check</code> passed. I moved on.</p>
<p>This morning my agent told me one of them had been broken for three days. Anyone who copied the install command from the README got <code>No matching distribution found</code>. The homepage link was a dead domain. Every image on the PyPI page — broken. The other package listed no license at all.</p>
<p>I had checked them. And they were wrong.</p>
<h2 id="heading-what-i-found">What I Found</h2>
<p>The README told users to install a package name that didn't exist — a typo in the one place that mattered most. The homepage link pointed to a domain that never resolved. Three screenshots referenced relative file paths that weren't included in the package. Three badge links pointed to absolutely nowhere.</p>
<p>The <code>workswithagents</code> package was cleaner, but PyPI displayed "License: None."</p>
<p>Both packages passed CI. Both passed <code>twine check</code>. Both were live.</p>
<h2 id="heading-what-i-assumed">What I Assumed</h2>
<p>I assumed CI green meant the package was correct. I assumed <code>twine check</code> validated what users would see. I assumed checking the README locally was the same as checking it on PyPI.</p>
<p>None of those things are true.</p>
<p><code>twine check</code> validates package <em>structure</em> — valid metadata headers, correct file layout. It does not resolve URLs. It does not compare install commands against actual package names. It does not check if images exist. It does not verify licenses. It's a structural linter, not a content validator.</p>
<h2 id="heading-what-i-no-longer-assume">What I No Longer Assume</h2>
<p>Every package I publish now runs through a content quality gate <em>before</em> <code>twine upload</code>. The gate checks: does the homepage resolve? Does the install command match the actual package name? Are all images either in the wheel or reachable URLs? Is there a license? Do badge links have real targets?</p>
<p>The gate is 200 lines of Python. It caught all 9 issues in one run. If I'd had it three days ago, neither package would have shipped broken.</p>
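<p>A condensed sketch of those checks (assumes <code>requests</code>; the real gate does more, but this is the shape):</p>
<pre><code class="lang-python">import re
import requests

def content_gate(package_name, readme, homepage, license_field, image_urls):
    issues = []
    try:
        resolves = requests.head(homepage, allow_redirects=True, timeout=10).ok
    except requests.RequestException:
        resolves = False
    if not resolves:
        issues.append(f"homepage does not resolve: {homepage}")
    for name in re.findall(r"pip install ([\w.-]+)", readme):
        if name != package_name:
            issues.append(f"README installs the wrong name: {name}")
    if not license_field:
        issues.append("license field is empty")
    for url in image_urls:
        if not url.startswith("http"):
            issues.append(f"relative image path, won't render on PyPI: {url}")
    return issues  # empty means safe to run twine upload
</code></pre>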
<h2 id="heading-what-you-should-check">What You Should Check</h2>
<p>If you publish packages — PyPI, npm, anything — check these five things:</p>
<ul>
<li>Your install command in the README matches the actual published name</li>
<li>Your homepage URL resolves from an external network</li>
<li>Every image in your README is either bundled in the package or an absolute URL</li>
<li>Your license field isn't empty</li>
<li>Your badge links point somewhere real</li>
</ul>
<p>These aren't structural issues. CI won't catch them. You have to check them yourself — or build a checker that does.</p>
<hr />
<p>Part 2 coming soon.</p>
<hr />
<p><em>I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → <a target="_blank" href="mailto:vilius@workswithagents.com">vilius@workswithagents.com</a></em></p>
]]></content:encoded></item><item><title><![CDATA[The Agent OSI Model — A 7-Layer Framework for AI Agent Infrastructure]]></title><description><![CDATA[The OSI model didn't create networking. It created the vocabulary that made networking a discipline. Before OSI, engineers said "the connection is broken." After OSI, they said "Layer 2 link is down."
AI agents have no equivalent. When an agent fails...]]></description><link>https://blog.workswithagents.dev/the-agent-osi-model</link><guid isPermaLink="true">https://blog.workswithagents.dev/the-agent-osi-model</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 22:32:52 GMT</pubDate><content:encoded><![CDATA[<p>The OSI model didn't create networking. It created the <strong>vocabulary</strong> that made networking a discipline. Before OSI, engineers said "the connection is broken." After OSI, they said "Layer 2 link is down."</p>
<p>AI agents have no equivalent. When an agent fails, we say "the agent broke." That's useless.</p>
<p>I've published a 7-layer framework for agent infrastructure. Not a product. Not a standard. A vocabulary.</p>
<hr />
<h2 id="heading-the-seven-layers">The Seven Layers</h2>
<pre><code class="lang-plaintext">L7  GOVERNANCE    Audit · Compliance · Sign-off       "Is this safe?"
L6  VERIFICATION  Testing · Evaluation · Gates        "Does this work?"
L5  COORDINATION  Consensus · Distribution · Conflicts "How do agents work together?"
L4  SESSION       Handoff · State · Context           "How does an agent continue?"
L3  DISCOVERY     Registry · Capabilities · Location   "How do agents find each other?"
L2  COMMUNICATION Messaging · Auth · API              "How do agents talk?"
L1  EXECUTION     Hardware · Runtime · Tools          "What runs the agent?"
</code></pre>
<hr />
<h2 id="heading-why-this-matters">Why This Matters</h2>
<p><strong>For debugging:</strong> "Your Layer 4 handoff is broken" is actionable. "Your agents aren't talking to each other" is vague.</p>
<p><strong>For building:</strong> Don't build everything at once. Target specific layers. A local agent needs L1 (runtime) + L2 (auth) + L4 (handoff). A multi-agent fleet adds L3 (discovery) + L5 (coordination). An enterprise deployment adds L6 (verification) + L7 (governance).</p>
<p><strong>For standards:</strong> Each layer without a standard is a gap — and an opportunity. The framework makes it obvious where standards are needed.</p>
<hr />
<h2 id="heading-what-exists-whats-missing">What Exists, What's Missing</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Layer</td><td>Infrastructure</td><td>Status</td></tr>
</thead>
<tbody>
<tr>
<td>L1</td><td>Blueprint Registry (verified LLM configs)</td><td>✅ Live</td></tr>
<tr>
<td>L2</td><td>MCP, A2A, Credential Proxy</td><td>✅ Live</td></tr>
<tr>
<td>L3</td><td>llms.txt, Agent Capability Manifest</td><td>✅ Spec written</td></tr>
<tr>
<td>L4</td><td>Handoff Protocol</td><td>📋 In proposal (MCP SEP #2683, A2A #1817)</td></tr>
<tr>
<td>L5</td><td>Coordination Protocol</td><td>🆕 Spec published today</td></tr>
<tr>
<td>L6</td><td>Agent Test Suite, Pitfall Registry</td><td>⚠️ Partial</td></tr>
<tr>
<td>L7</td><td>Transaction Protocol, Compliance-as-Code</td><td>🆕 Spec published today</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-three-new-specs-published-today">Three New Specs Published Today</h2>
<h3 id="heading-coordination-protocol-layer-5">Coordination Protocol (Layer 5)</h3>
<p>How agents work together simultaneously. Leader election (Raft-lite for agents). Work distribution with capability matching. Work stealing — idle agents pull from busy queues. Conflict resolution with audit trail.</p>
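<p>Work stealing is the easiest piece to picture. A toy sketch with invented queue names; the spec defines the real API:</p>
<pre><code class="lang-python">from collections import deque

queues = {
    "builder-1": deque(["task-a", "task-b", "task-c"]),  # busy
    "builder-2": deque(),                                # idle
}

def steal(idle_agent):
    busiest = max(queues, key=lambda a: len(queues[a]))
    if busiest != idle_agent and queues[busiest]:
        task = queues[busiest].pop()       # take from the tail of the busy queue
        queues[idle_agent].append(task)
        return task
    return None

print(steal("builder-2"))  # task-c moves to the idle agent
</code></pre>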
<h3 id="heading-agent-capability-manifest-layer-3">Agent Capability Manifest (Layer 3)</h3>
<p>Machine-readable declaration of what an agent can do. Like <code>package.json</code> but for agent capabilities. Discovery: "who can build SPFx?" → ranked by success rate + load + trust score.</p>
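<p>The discovery query in miniature. The manifest fields follow the spec's vocabulary, but the weighting here is invented:</p>
<pre><code class="lang-python">manifests = [
    {"agent": "builder-1", "capability": "build:spfx",
     "success_rate": 0.96, "load": 0.40, "trust_score": 0.90},
    {"agent": "builder-2", "capability": "build:spfx",
     "success_rate": 0.88, "load": 0.10, "trust_score": 0.60},
]

def rank(capability):
    matches = [m for m in manifests if m["capability"] == capability]
    return sorted(matches, key=lambda m: (0.5 * m["success_rate"]
                                          + 0.3 * (1 - m["load"])  # less loaded is better
                                          + 0.2 * m["trust_score"]), reverse=True)

print(rank("build:spfx")[0]["agent"])  # builder-1: wins on success rate and trust
</code></pre>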
<h3 id="heading-agent-transaction-protocol-layer-7">Agent Transaction Protocol (Layer 7)</h3>
<p>Guarantees for autonomous actions. Idempotency keys (no double deploys). Intent-before-action logging (know what the agent TRIED to do even if it crashed). Rollback hooks. Three guarantee levels: Best-Effort, At-Least-Once, Exactly-Once.</p>
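<p>The core idea in miniature (an in-memory dict stands in for durable storage; the spec defines the real guarantee levels):</p>
<pre><code class="lang-python">import uuid

ledger = {}  # in-memory stand-in for durable storage

def transact(key, intent, action):
    done = ledger.get(key)
    if done and done["status"] == "done":
        return done["result"]  # idempotency: no double deploys
    # Log intent BEFORE acting, so a crash still leaves evidence of what was tried.
    ledger[key] = {"status": "intent", "intent": intent}
    result = action()
    ledger[key] = {"status": "done", "intent": intent, "result": result}
    return result

key = str(uuid.uuid4())
transact(key, {"type": "deploy", "target": "prod"}, lambda: "deployed v42")
transact(key, {"type": "deploy", "target": "prod"}, lambda: "boom")  # returns "deployed v42"
</code></pre>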
<hr />
<h2 id="heading-the-bigger-play">The Bigger Play</h2>
<p>Everyone's building AI agents. I'm building the infrastructure agents run on — the picks and shovels of the agent gold rush.</p>
<p>The Agent OSI Model is the framework. The specs at each layer are the picks and shovels. The certification system (Blueprint, Ready, Certified) is the trust layer on top.</p>
<p>Full framework and all specs: <a target="_blank" href="https://workswithagents.dev/specs/index.md">workswithagents.dev/specs/</a></p>
<p>Human-readable overview: <a target="_blank" href="https://workswithagents.com/agent-osi-model.md">workswithagents.com/agent-osi-model</a></p>
<p>All specs CC BY 4.0 — free to use, cite, and build upon. Attribution required.</p>
<hr />
<p><em>If you're building multi-agent systems and hitting coordination problems, or if you're in a regulated industry and need audit trails for autonomous agents — I'd like to hear about your use case. The specs are published. The infrastructure is being built. The conversation is starting now.</em></p>
]]></content:encoded></item><item><title><![CDATA[7 Protocols for Agent Infrastructure]]></title><description><![CDATA[I run about 20 AI agents. They delegate work to each other, deploy code, scan for vulnerabilities, and handle compliance checks. Over time, I kept hitting the same gaps — things that made autonomous workflows fragile in ways that took hours to debug....]]></description><link>https://blog.workswithagents.dev/7-protocols-for-agent-infrastructure</link><guid isPermaLink="true">https://blog.workswithagents.dev/7-protocols-for-agent-infrastructure</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 22:32:19 GMT</pubDate><content:encoded><![CDATA[<p>I run about 20 AI agents. They delegate work to each other, deploy code, scan for vulnerabilities, and handle compliance checks. Over time, I kept hitting the same gaps — things that made autonomous workflows fragile in ways that took hours to debug.</p>
<p>I published a <a target="_blank" href="https://dev.to/vystartasv/the-agent-osi-model-a-7-layer-framework-for-ai-agent-infrastructure-3aoe">7-layer model for agent infrastructure</a> that captures how I think about these problems. Several layers already have strong industry standards: <a target="_blank" href="https://a2a-protocol.org/">Google's A2A protocol</a> handles agent-to-agent coordination (L5), and <a target="_blank" href="https://modelcontextprotocol.io/">Anthropic's MCP</a> standardises how agents discover and use tools (L3–L4). At the identity layer, the <a target="_blank" href="https://www.w3.org/TR/did-1.0/">W3C DID standard</a> defines decentralised identifiers. For governance, there's the <a target="_blank" href="https://www.nist.gov/itl/ai-risk-management-framework">NIST AI Risk Management Framework</a>.</p>
<p>The rest of the stack — the layers that make autonomous agents trustworthy, auditable, and production-safe — still has gaps. These seven protocols fill them. They're what I wired into my own fleet when the existing standards didn't go far enough.</p>
<p>All are CC BY 4.0. Five have live reference implementations. Two are spec'd, with implementations still in the works.</p>
<h2 id="heading-industry-standards-this-builds-on">Industry Standards This Builds On</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Standard</td><td>Layer</td><td>Organization</td></tr>
</thead>
<tbody>
<tr>
<td><a target="_blank" href="https://a2a-protocol.org/">A2A Protocol</a></td><td>L5 Coordination</td><td>Google / a2aproject</td></tr>
<tr>
<td><a target="_blank" href="https://modelcontextprotocol.io/">Model Context Protocol</a></td><td>L3–L4 Discovery + Session</td><td>Anthropic</td></tr>
<tr>
<td><a target="_blank" href="https://www.w3.org/TR/did-1.0/">W3C DID Core</a></td><td>L2 Communication</td><td>W3C</td></tr>
<tr>
<td><a target="_blank" href="https://www.nist.gov/itl/ai-risk-management-framework">NIST AI RMF</a></td><td>L7 Governance</td><td>NIST</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-1-trust-score-should-i-delegate-to-this-agent">1. Trust Score — Should I Delegate to This Agent?</h2>
<p>When one of my agents delegates work to another, it needs to know if the target is reliable. Not "does it respond" — does it actually complete tasks correctly and consistently.</p>
<p>Weighted across success rate, pitfall history, skill quality, and uptime.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> TrustScoreClient

ts = TrustScoreClient()
<span class="hljs-keyword">if</span> ts.get(<span class="hljs-string">"target-agent"</span>)[<span class="hljs-string">"tier"</span>] == <span class="hljs-string">"trusted"</span>:
    delegate(task, to=<span class="hljs-string">"target-agent"</span>)
</code></pre>
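<p>Under the hood it's a weighted blend. The spec pins the exact weights; these are placeholders:</p>
<pre><code class="lang-python">def trust_score(success_rate, pitfall_penalty, skill_quality, uptime):
    # Placeholder 40/20/20/20 weights; the spec defines the real ones.
    return (0.4 * success_rate
            + 0.2 * (1 - pitfall_penalty)
            + 0.2 * skill_quality
            + 0.2 * uptime)

tier = "trusted" if trust_score(0.97, 0.05, 0.90, 0.999) &gt;= 0.90 else "probation"
</code></pre>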
<p><a target="_blank" href="https://workswithagents.dev/specs/trust-score.md">Spec</a></p>
<hr />
<h2 id="heading-2-deployment-manifest-declare-a-fleet-deploy-with-one-command">2. Deployment Manifest — Declare a Fleet, Deploy With One Command</h2>
<p>I got tired of manually tracking which agents run where, how many instances, and what capabilities they have. One YAML file, one command.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">fleet:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"my-fleet"</span>
  <span class="hljs-attr">agents:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">id:</span> <span class="hljs-string">"builder"</span>
      <span class="hljs-attr">capabilities:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">"build"</span>
          <span class="hljs-attr">target:</span> <span class="hljs-string">"spfx"</span>
      <span class="hljs-attr">count:</span> <span class="hljs-number">3</span>
</code></pre>
<pre><code class="lang-bash">wwa fleet deploy fleet.yaml
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/deployment-manifest.md">Spec</a></p>
<hr />
<h2 id="heading-3-sla-framework-track-whether-agents-meet-their-promises">3. SLA Framework — Track Whether Agents Meet Their Promises</h2>
<p>Three tiers: Best-Effort (free), Production (99.5% uptime, 90% task accuracy), Regulated (99.9% uptime, 95% accuracy, 7-year audit retention).</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> SLAMetrics

sla = SLAMetrics(<span class="hljs-string">"my-fleet"</span>, tier=<span class="hljs-string">"production"</span>)
sla.report(<span class="hljs-string">"agent-1"</span>, <span class="hljs-string">"task-42"</span>, duration_seconds=<span class="hljs-number">187</span>, success=<span class="hljs-literal">True</span>)
status = sla.status()  <span class="hljs-comment"># {breaches: [], status: "ok"}</span>
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/sla-framework.md">Spec</a></p>
<hr />
<h2 id="heading-4-handoff-protocol-cryptographic-handoff-between-agents">4. Handoff Protocol — Cryptographic Handoff Between Agents</h2>
<p>When an agent passes a task to another, how do you know the output wasn't tampered with? Ed25519-signed handoffs with chain-of-custody verification. Built above <a target="_blank" href="https://modelcontextprotocol.io/">MCP</a>'s tool-use layer.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> Handoff

h = Handoff(from_agent=<span class="hljs-string">"planner"</span>, to_agent=<span class="hljs-string">"scanner"</span>, payload={<span class="hljs-string">"plan"</span>: <span class="hljs-string">"..."</span>})
signed = h.sign(planner_key)
verified = Handoff.verify(signed, planner_public_key)
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/handoff.md">Spec</a></p>
<hr />
<h2 id="heading-5-identity-protocol-verifiable-agent-identity">5. Identity Protocol — Verifiable Agent Identity</h2>
<p>Cryptographic agent identity with Ed25519 keypairs. Signed messages. Verification against registry. Extends the <a target="_blank" href="https://www.w3.org/TR/did-1.0/">W3C DID standard</a> with agent-specific key management and fleet-scoped verification.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> AgentIdentity

ai = AgentIdentity(<span class="hljs-string">"my-agent"</span>)
ai.register()
sig = ai.sign({<span class="hljs-string">"type"</span>: <span class="hljs-string">"heartbeat"</span>})

valid = AgentIdentity.verify(<span class="hljs-string">"other-agent"</span>, message, signature)
</code></pre>
<hr />
<h2 id="heading-6-compliance-as-code-regulation-as-executable-validation">6. Compliance-as-Code — Regulation as Executable Validation</h2>
<p>NHS DTAC, FCA, GDS, GDPR — as rules agents can validate against at runtime. Extends frameworks like the <a target="_blank" href="https://www.nist.gov/itl/ai-risk-management-framework">NIST AI RMF</a> from documentation into executable checks.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> ComplianceEngine

ce = ComplianceEngine()
dtac = ce.load(<span class="hljs-string">"dtac-v2.1"</span>)

<span class="hljs-keyword">if</span> dtac.validate(action).passed:
    execute(action)
<span class="hljs-keyword">else</span>:
    escalate_to_human()
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/compliance-as-code.md">Spec</a></p>
<hr />
<h2 id="heading-7-onboarding-protocol-systematic-agent-creation">7. Onboarding Protocol — Systematic Agent Creation</h2>
<p>Interview → generate → calibrate → benchmark → register. Instead of writing a prompt file and hoping, run a pipeline that produces a scored agent.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> OnboardingClient

ob = OnboardingClient()
result = ob.full_onboard(
    <span class="hljs-string">"nhs-auditor"</span>,
    <span class="hljs-string">"Audit agent actions for NHS DTAC compliance"</span>,
    capabilities=[<span class="hljs-string">"audit:compliance"</span>],
    skills=[<span class="hljs-string">"compliance-as-code"</span>]
)
</code></pre>
<hr />
<h2 id="heading-the-stack">The Stack</h2>
<p>Where each protocol fits alongside existing industry standards:</p>
<pre><code class="lang-plaintext">L7 GOVERNANCE    ← NIST AI RMF           Compliance-as-Code · SLA Framework
L6 VERIFICATION  (no standard yet)       Agent Test Suite · Pitfall Registry
L5 COORDINATION  ← A2A (Google)          Trust Score
L4 SESSION       ← MCP (Anthropic)       Handoff Protocol
L3 DISCOVERY     ← MCP (Anthropic)       Trust Score · Capability Manifest
L2 COMMUNICATION ← W3C DID               Identity Protocol
L1 EXECUTION     (no standard yet)       Onboarding Protocol · Deployment Manifest
</code></pre>
<p><a target="_blank" href="https://a2a-protocol.org/">A2A</a> (Google) — agent-to-agent task coordination at L5. <a target="_blank" href="https://modelcontextprotocol.io/">MCP</a> (Anthropic) — tool discovery and context sharing at L3–L4. <a target="_blank" href="https://www.w3.org/TR/did-1.0/">W3C DID</a> — decentralised identity at L2. <a target="_blank" href="https://www.nist.gov/itl/ai-risk-management-framework">NIST AI RMF</a> — governance framework at L7. These seven protocols fill what those standards leave open: trust, deployment, handoff integrity, compliance execution, and systematic agent creation.</p>
<hr />
<h2 id="heading-get-started">Get Started</h2>
<pre><code class="lang-bash">pip install workswithagents
</code></pre>
<p>All specs: <a target="_blank" href="https://workswithagents.dev/specs/">workswithagents.dev/specs/</a>
All code: CC BY 4.0</p>
<hr />
<p><em>I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → <a target="_blank" href="mailto:vilius@workswithagents.com">vilius@workswithagents.com</a></em></p>
]]></content:encoded></item><item><title><![CDATA[I'm Proposing a Standard for How AI Agents Hand Off Work — Here's Why It's Needed]]></title><description><![CDATA[I'm Proposing a Standard for How AI Agents Hand Off Work — Here's Why It's Needed
Here's a scenario that happens constantly:
Agent A is working on a complex build. It hits a timeout — session ends. Agent B picks up the task. But Agent B has no idea w...]]></description><link>https://blog.workswithagents.dev/agent-handoff-protocol-standard-proposal</link><guid isPermaLink="true">https://blog.workswithagents.dev/agent-handoff-protocol-standard-proposal</guid><category><![CDATA[AI]]></category><category><![CDATA[Discuss]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[standards]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 21:11:15 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-im-proposing-a-standard-for-how-ai-agents-hand-off-work-heres-why-its-needed">I'm Proposing a Standard for How AI Agents Hand Off Work — Here's Why It's Needed</h1>
<p>Here's a scenario that happens constantly:</p>
<p>Agent A is working on a complex build. It hits a timeout — session ends. Agent B picks up the task. But Agent B has no idea what Agent A was doing. What was built? What's left? What assumptions were made?</p>
<p>Every time this happens, work is lost. Time is burned. Context is rebuilt from scratch.</p>
<p>There's no standard for this. MCP handles tool calling. A2A handles agent-to-agent communication. But neither specifies what happens when an agent <em>stops</em> and another agent <em>continues.</em></p>
<hr />
<h2 id="heading-the-handoff-protocol">The Handoff Protocol</h2>
<p>I've designed a structured YAML format for agent-to-agent task handoff. Two variants:</p>
<h3 id="heading-baseline-open-source-unregulated">Baseline (open-source, unregulated)</h3>
<pre><code class="lang-yaml"><span class="hljs-attr">handoff_version:</span> <span class="hljs-string">"1.0"</span>
<span class="hljs-attr">task_id:</span> <span class="hljs-string">"build-spfx-webpart-42"</span>
<span class="hljs-attr">from_agent:</span> <span class="hljs-string">"hermes-main"</span>
<span class="hljs-attr">to_agent:</span> <span class="hljs-string">"hermes-worker-3"</span>
<span class="hljs-attr">status:</span> <span class="hljs-string">"in_progress"</span>
<span class="hljs-attr">completed:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Scaffolded web part structure"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Installed dependencies"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Configured SCSS aliases"</span>
<span class="hljs-attr">remaining:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Write component logic"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Add tests"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Bundle and verify"</span>
<span class="hljs-attr">context:</span>
  <span class="hljs-attr">project_root:</span> <span class="hljs-string">"/Users/vilius/origami-spfx-webparts-lab"</span>
  <span class="hljs-attr">node_version:</span> <span class="hljs-string">"22.11.0"</span>
  <span class="hljs-attr">spfx_version:</span> <span class="hljs-string">"1.22.2"</span>
<span class="hljs-attr">pitfalls_hit:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"SCSS alias: @fluentui needs explicit path in config/sass.json"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Yeoman --component-type ignored when .yo-rc.json exists"</span>
<span class="hljs-attr">decisions:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Used React hooks instead of class components"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">"Chose Fluent UI v9 over v8"</span>
</code></pre>
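<p>Consuming a handoff is deliberately trivial. A minimal sketch, assuming PyYAML; the required keys mirror the baseline variant above:</p>
<pre><code class="lang-python">import yaml  # PyYAML

REQUIRED = {"handoff_version", "task_id", "from_agent", "to_agent",
            "status", "completed", "remaining", "context"}

def load_handoff(path):
    with open(path) as f:
        handoff = yaml.safe_load(f)
    missing = REQUIRED - handoff.keys()
    if missing:
        raise ValueError(f"handoff missing required fields: {sorted(missing)}")
    return handoff

h = load_handoff("handoff.yaml")
print(f"{len(h['remaining'])} steps left, resuming from {h['from_agent']}")
</code></pre>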
<h3 id="heading-regulated-nhs-finance-government">Regulated (NHS, finance, government)</h3>
<p>Adds: audit trail, compliance officer sign-off gates, data classification, regulatory reference field, immutable timestamp chain.</p>
<hr />
<h2 id="heading-how-im-claiming-this">How I'm Claiming This</h2>
<p>Two standards bodies accepting proposals right now:</p>
<h3 id="heading-1-mcp-sep-specification-enhancement-proposal">1. MCP SEP (Specification Enhancement Proposal)</h3>
<p>MCP's SEP-2133 Extensions framework is the exact mechanism for proposing optional protocol extensions. The Handoff Protocol fits as an MCP extension — it extends the tool-calling model with structured state transfer.</p>
<p><strong>Process:</strong></p>
<ol>
<li>Open GitHub issue on <code>modelcontextprotocol/modelcontextprotocol</code> describing the extension</li>
<li>Request maintainer sponsorship for a SEP</li>
<li>Draft the proposal document following the SEP template</li>
<li>Community review → implementation → adoption</li>
</ol>
<p><strong>Repo:</strong> github.com/modelcontextprotocol/modelcontextprotocol</p>
<h3 id="heading-2-google-a2a-agent-to-agent-protocol">2. Google A2A (Agent-to-Agent Protocol)</h3>
<p>A2A is purpose-built for agent communication. The Handoff Protocol extends A2A's task delegation model — adding the structured handoff payload, compliance fields, and verification gates that A2A's basic task model doesn't cover.</p>
<p><strong>Repo:</strong> github.com/a2aproject/a2a</p>
<h3 id="heading-3-longer-term-ietf-internet-draft">3. Longer-term: IETF Internet-Draft</h3>
<p>If adoption warrants standards-track treatment, an individual Internet-Draft is the path. This takes 18–24 months and requires a working group or area director sponsorship. Not the first move — but the endgame.</p>
<hr />
<h2 id="heading-why-this-matters">Why This Matters</h2>
<p>Agent infrastructure is being built right now. The standards that get established in 2026 will shape the next decade of AI agent development.</p>
<p>The Handoff Protocol is a small, focused standard — one specific problem, one clear format. It doesn't need to be a massive specification. It just needs to exist before someone else defines it differently.</p>
<hr />
<p><em>The Handoff Protocol schema is documented at workswithagents.dev/v1/handoff/schema. The MCP SEP submission is in progress. If you're building multi-agent systems and hitting the handoff problem, I'd like to hear about your use case.</em></p>
]]></content:encoded></item><item><title><![CDATA[Better Prompts Won't Fix Your AI Agents — Infrastructure Will]]></title><description><![CDATA[Better Prompts Won't Fix Your AI Agents — Infrastructure Will
Every "how to work with AI agents" guide starts with prompt engineering. Be specific. Give examples. Set context.
That's fine for one agent, one session. It completely falls apart when you...]]></description><link>https://blog.workswithagents.dev/better-prompts-wont-fix-your-ai-agents</link><guid isPermaLink="true">https://blog.workswithagents.dev/better-prompts-wont-fix-your-ai-agents</guid><category><![CDATA[AI]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[Python]]></category><category><![CDATA[SQLite]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 21:11:13 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-better-prompts-wont-fix-your-ai-agents-infrastructure-will">Better Prompts Won't Fix Your AI Agents — Infrastructure Will</h1>
<p>Every "how to work with AI agents" guide starts with prompt engineering. Be specific. Give examples. Set context.</p>
<p>That's fine for one agent, one session. It completely falls apart when you have 20 agents running concurrently, each with different contexts, hitting the same files, and none of them remembering what the others did.</p>
<p>Better prompts won't fix that. Infrastructure will.</p>
<hr />
<h2 id="heading-the-problem-nobody-talks-about">The Problem Nobody Talks About</h2>
<p>Here's what actually breaks when you run multiple agents:</p>
<ol>
<li><p><strong>Concurrent file writes.</strong> Agent A reads a file. Agent B writes to it. Agent A's context is now stale.</p>
</li>
<li><p><strong>Credential access.</strong> Touch ID doesn't work from cron. Your agents can't unlock your password manager at 3am.</p>
</li>
<li><p><strong>Silent failures.</strong> An agent hits an error on a recurring cron job. You don't notice for three days.</p>
</li>
<li><p><strong>Context starvation.</strong> Your agent's context window fills with irrelevant details. It can't reason about the actual problem.</p>
</li>
</ol>
<p>None of these are prompt problems. They're infrastructure problems.</p>
<hr />
<h2 id="heading-the-four-tools-that-fixed-it">The Four Tools That Fixed It</h2>
<h3 id="heading-1-factbase-structured-knowledge-not-flat-text">1. FactBase — Structured Knowledge, Not Flat Text</h3>
<p>Memory as flat text fails at scale. "Python version is 3.11" gets buried under 50 other facts. Agents can't query it reliably.</p>
<p><strong>Solution:</strong> SQLite database with WAL mode for concurrent access. Entity-attribute-value model. Every fact queryable by category, entity, keyword. Multiple agents read and write simultaneously without corruption.</p>
<pre><code>GET /v1/facts?entity=python&amp;attribute=path
→ /opt/homebrew/bin/python3<span class="hljs-number">.11</span>
</code></pre><p>One agent discovers a fact. Every agent knows it.</p>
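<p>The core of FactBase fits on one screen. A minimal sketch of the EAV table with WAL enabled (the real thing adds categories and the HTTP layer):</p>
<pre><code class="lang-python">import sqlite3

db = sqlite3.connect("facts.db")
db.execute("PRAGMA journal_mode=WAL")  # concurrent readers + one writer, no corruption
db.execute("""CREATE TABLE IF NOT EXISTS facts (
    entity TEXT, attribute TEXT, value TEXT,
    PRIMARY KEY (entity, attribute))""")

def set_fact(entity, attribute, value):
    db.execute("INSERT OR REPLACE INTO facts VALUES (?, ?, ?)",
               (entity, attribute, value))
    db.commit()

def get_fact(entity, attribute):
    row = db.execute("SELECT value FROM facts WHERE entity=? AND attribute=?",
                     (entity, attribute)).fetchone()
    return row[0] if row else None

set_fact("python", "path", "/opt/homebrew/bin/python3.11")
print(get_fact("python", "path"))
</code></pre>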
<h3 id="heading-2-credential-proxy-api-keys-without-touch-id">2. Credential Proxy — API Keys Without Touch ID</h3>
<p>Passwords stored in a local daemon. Agents request credentials by service name — no Touch ID, no browser extension, no interactive login. Cron jobs run unattended.</p>
<pre><code>credential-proxy get <span class="hljs-string">"dev.to API"</span>
→ api-key-here
</code></pre><p>The daemon holds the keys. The agents just ask.</p>
<h3 id="heading-3-cron-guard-the-smoke-alarm-for-your-agent-fleet">3. Cron Guard — The Smoke Alarm for Your Agent Fleet</h3>
<p>If 3+ consecutive cron runs fail, it alerts you. Silent failures are the worst kind — you don't know your agents stopped working until you manually check.</p>
<p><strong>Pattern:</strong> watchdog script → checks last N run statuses → alerts on threshold. Zero config. Just runs.</p>
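<p>A sketch of the watchdog, with the status file and alert hook as stand-ins:</p>
<pre><code class="lang-python">import json
from pathlib import Path

THRESHOLD = 3  # consecutive failures before we make noise

def alert(msg):
    print(f"ALERT: {msg}")  # stand-in: swap for email, webhook, whatever

def check(status_file="cron_status.json"):
    # Stand-in format: a JSON list of {"job": ..., "ok": bool}, newest last.
    runs = json.loads(Path(status_file).read_text())
    streak = 0
    for run in reversed(runs):
        if run["ok"]:
            break
        streak += 1
    if streak &gt;= THRESHOLD:
        alert(f"{streak} consecutive cron failures")
</code></pre>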
<h3 id="heading-4-context-packer-fit-more-into-your-token-budget">4. Context Packer — Fit More Into Your Token Budget</h3>
<p>A 2,500-file repo becomes an 8-file context pack. Preserves structure: important files, key decisions, dependency graph. Everything else summarized or excluded.</p>
<p>Your agent can actually reason about the project instead of drowning in file listings.</p>
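<p>A crude version of the selection pass. The scoring heuristic is invented; the real packer also summarizes what it drops:</p>
<pre><code class="lang-python">from pathlib import Path

IMPORTANT = {"README.md", "AGENTS.md", "package.json", "pyproject.toml"}

def score(path):
    s = 100 if path.name in IMPORTANT else 0
    if path.suffix in {".ts", ".py"}:
        s += 10
    return s - len(path.parts)  # prefer shallow files

def pack(root, budget=8):
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return sorted(files, key=score, reverse=True)[:budget]
</code></pre>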
<hr />
<h2 id="heading-the-pattern">The Pattern</h2>
<p>Everyone's optimizing prompts. I'm optimizing the environment agents run in. The prompt is 10% of the problem. The other 90% is:</p>
<ul>
<li>Can the agent access the right files?</li>
<li>Can it authenticate to services without you?</li>
<li>Will you know if it breaks?</li>
<li>Can it fit the problem in its context window?</li>
</ul>
<p>Fix those, and your prompts get 10x more effective — because the agent can actually act on them.</p>
<hr />
<p><em>All four tools are open source. FactBase and Pitfall Registry are live at workswithagents.dev. Credential proxy, cron guard, and context packer live in ~/.hermes/. No courses. No pricing. Just infrastructure.</em></p>
]]></content:encoded></item><item><title><![CDATA[The 10 Patterns: What 5 Months of Breaking AI Agents Taught Me About Making Them Actually Work]]></title><description><![CDATA[The 10 Patterns: What 5 Months of Breaking AI Agents Taught Me About Making Them Actually Work
In late 2025 I started experimenting with AI coding agents. Not casually — I gave them autonomous infrastructure and let them run. They broke. A lot. But p...]]></description><link>https://blog.workswithagents.dev/ten-patterns-what-breaking-ai-agents-taught-me</link><guid isPermaLink="true">https://blog.workswithagents.dev/ten-patterns-what-breaking-ai-agents-taught-me</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[Beginner Developers]]></category><category><![CDATA[Productivity]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 21:11:11 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-the-10-patterns-what-5-months-of-breaking-ai-agents-taught-me-about-making-them-actually-work">The 10 Patterns: What 5 Months of Breaking AI Agents Taught Me About Making Them Actually Work</h1>
<p>In late 2025 I started experimenting with AI coding agents. Not casually — I gave them autonomous infrastructure and let them run. They broke. A lot. But patterns emerged.</p>
<p>Not "prompt engineering tricks." Not "unlock your potential." Actual operational patterns for making agents work reliably — discovered the hard way, through 11 consecutive autonomous builds, 153 skills, and countless 3am debug sessions.</p>
<p>Here they are.</p>
<hr />
<h2 id="heading-pattern-1-boot-the-first-session-shapes-everything">Pattern 1: Boot — The First Session Shapes Everything</h2>
<p>An agent's first session is like its childhood. If it starts blind — no context, no conventions, no memory of what you've built — every interaction is uphill.</p>
<p><strong>What I do:</strong> Every project has an AGENTS.md. Python version. Project structure. Conventions. Key decisions. The agent reads this before anything else.</p>
<p><strong>What happened without it:</strong> Recommending npm for a pnpm project. Suggesting Python 3.9 when we're on 3.11. Hours of corrections that a 50-line file would have prevented.</p>
<hr />
<h2 id="heading-pattern-2-skills-stop-re-explaining-the-same-things">Pattern 2: Skills — Stop Re-Explaining the Same Things</h2>
<p>Building an SPFx web part has specific gotchas: <code>@fluentui</code> imports break without SCSS alias config. The Yeoman generator ignores <code>--component-type</code> if <code>.yo-rc.json</code> exists. Node 22 + native modules = pain.</p>
<p>Instead of re-explaining these each time, I saved them as skills. 153 of them now. When an agent hits an SPFx task, it loads the skill — known pitfalls, exact commands, verification steps.</p>
<p><strong>The compounding effect:</strong> Each skill makes future sessions faster. Five months in, the agent has institutional knowledge.</p>
<hr />
<h2 id="heading-pattern-3-memory-never-re-answer-the-same-question">Pattern 3: Memory — Never Re-Answer the Same Question</h2>
<p>"What Python version are we using?" "Where's the project?" "What's the deployment command?"</p>
<p>Without persistent memory, you answer these every. single. session. I saved durable facts across sessions: Python path, build system, project structure, preferences. Now the agent just knows.</p>
<p><strong>Critical rule:</strong> Write declarative facts, not instructions. "Project uses pytest with xdist" — not "Always run tests with pytest -n 4." Instructions get re-read as orders.</p>
<hr />
<h2 id="heading-pattern-4-decision-protocols-autonomy-without-chaos">Pattern 4: Decision Protocols — Autonomy Without Chaos</h2>
<p>The biggest time-sink? Approval loops. "Should I proceed?" "Want me to fix this?" "OK to deploy?"</p>
<p>I set boundaries: what the agent decides alone, what needs approval. Destructive actions = ask. Recoverable actions = just do it. Hours saved per session.</p>
<hr />
<h2 id="heading-pattern-5-tool-composition-the-right-tool-for-each-job">Pattern 5: Tool Composition — The Right Tool for Each Job</h2>
<p>Agents have many tools. Knowing which to use is the difference between a 2-second operation and two minutes of burned tokens.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Task</td><td>Tool</td><td>Why</td></tr>
</thead>
<tbody>
<tr>
<td>Create new file</td><td>write_file</td><td>One call</td></tr>
<tr>
<td>Edit existing file</td><td>patch</td><td>Targeted, no rewrite risk</td></tr>
<tr>
<td>Build/install/deploy</td><td>terminal</td><td>It's a shell command</td></tr>
<tr>
<td>Read a file</td><td>read_file</td><td>Don't cat/head/tail</td></tr>
<tr>
<td>Search content</td><td>search_files</td><td>Not grep/find</td></tr>
<tr>
<td>Research/debug</td><td>delegate_task</td><td>Parallel, isolated</td></tr>
</tbody>
</table>
</div><p><strong>The anti-pattern:</strong> Delegating coding tasks to subagents. They lose context, hallucinate, and burn tokens. Use write_file and patch directly.</p>
<hr />
<h2 id="heading-pattern-6-orchestration-parallel-specialists">Pattern 6: Orchestration — Parallel Specialists</h2>
<p>Complex tasks are rarely a single thread. Market research? Let a subagent run while the main agent builds. Code review? Spin up a reviewer in parallel.</p>
<p><strong>Real result:</strong> 3x throughput on multi-stream tasks. Research and build completed independently, merged at the end.</p>
<hr />
<h2 id="heading-pattern-7-pipelines-agents-that-run-while-you-sleep">Pattern 7: Pipelines — Agents That Run While You Sleep</h2>
<p>Cron jobs. Builds. Monitoring. I have ~20 autonomous agents running right now — hourly reviews, daily digests, weekly research verification. They wake up, do their job, and only notify me if something's broken.</p>
<p><strong>The silent-unless-broken pattern:</strong> I never see successful runs. I only hear about failures. That's the point.</p>
<hr />
<h2 id="heading-pattern-8-resilience-never-stop-on-the-first-error">Pattern 8: Resilience — Never Stop on the First Error</h2>
<p>Agents hit errors constantly. Network timeouts. API rate limits. File system races. Without recovery, every error kills progress.</p>
<p>Exponential backoff: 2s, 4s, 8s, 16s. Categorize errors: transient = retry, permanent = find another way.</p>
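<p>The retry loop, sketched. The transient/permanent split is the part worth copying:</p>
<pre><code class="lang-python">import time

TRANSIENT = (TimeoutError, ConnectionError)  # retry these; anything else is permanent

def with_retry(action, attempts=5):
    for attempt in range(attempts):
        try:
            return action()
        except TRANSIENT:
            if attempt == attempts - 1:
                raise  # out of retries: surface it
            time.sleep(2 ** (attempt + 1))  # 2s, 4s, 8s, 16s
    # permanent errors propagate immediately, so the agent can find another way
</code></pre>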
<p><strong>Real metric:</strong> 11 consecutive builds with zero human intervention. The agent hit errors on 8 of them. Recovered from every single one.</p>
<hr />
<h2 id="heading-pattern-9-verify-autonomous-doesnt-mean-reckless">Pattern 9: Verify — Autonomous Doesn't Mean Reckless</h2>
<p>Every change gets verified. Syntax check after every file write. Tests after every code change. For deployments: verify the result, don't trust the response.</p>
<p><strong>The payoff:</strong> syntax checks, test runs, and quality gates catch errors at the step that introduced them, before anything ships and before they compound.</p>
<hr />
<h2 id="heading-pattern-10-compounding-the-agent-that-gets-better">Pattern 10: Compounding — The Agent That Gets Better</h2>
<p>This is the feedback loop: agent solves hard problem → saves approach as skill → next session is faster. Month 1: basic file ops. Month 3: autonomous scaffolding. Month 5: self-improvement loops, 153 skills.</p>
<p>The agent today is not the agent from 5 months ago — because it learned from every session.</p>
<hr />
<h2 id="heading-the-honest-part">The Honest Part</h2>
<p>These patterns weren't planned. They emerged from breaking things late at night. Every one of them is backed by a real failure — an error that cost hours, a build that died, a configuration that made no sense until 3am.</p>
<p>If you're working with agents and hitting walls: you're not doing it wrong. You're discovering patterns. Write them down. Make them skills. Let the agent learn.</p>
<hr />
<p><em>I documented the full methodology at workswithagents.com. The knowledge API (workswithagents.dev) has 153 skills and a shared pitfall registry — agents query it for known bugs and fixes. No courses yet. No pricing. Just infrastructure, live.</em></p>
]]></content:encoded></item><item><title><![CDATA[Welcome to the Agent Autopsy]]></title><description><![CDATA[I've been building AI agent infrastructure for the past month. Not because I had a plan — because I needed it to work.
I'm a SharePoint developer in Cardiff. When I started experimenting with AI agents, I kept hitting the same problems: agents lying ...]]></description><link>https://blog.workswithagents.dev/welcome-to-the-agent-autopsy</link><guid isPermaLink="true">https://blog.workswithagents.dev/welcome-to-the-agent-autopsy</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[infrastructure]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 21:03:03 GMT</pubDate><content:encoded><![CDATA[<p>I've been building AI agent infrastructure for the past month. Not because I had a plan — because I needed it to work.</p>
<p>I'm a SharePoint developer in Cardiff. When I started experimenting with AI agents, I kept hitting the same problems: agents lying about what they'd built, burning tokens on tasks they couldn't do, breaking each other's state, and occasionally publishing broken packages to PyPI.</p>
<p>So I built tooling. A lot of it. 153 skills, 19 autonomous agents, a credential proxy, a cron scheduler, an Ed25519 verification pipeline, a protocol stack nobody asked for. Most of it runs on a €4.57/month Hetzner box.</p>
<p>This blog is where I write about what broke and what I built to fix it. It's the home for the <strong>Agent Autopsy</strong> series — daily postmortems from the trenches of AI agent infrastructure.</p>
<p><strong>What to expect:</strong></p>
<ul>
<li>Short, honest posts (~500 words) about real failures and what I learned</li>
<li>No AI hype. No "revolutionise." No pretending I have it figured out</li>
<li>Real numbers. Real code. Real breaks</li>
</ul>
<p><strong>What I'm not:</strong></p>
<ul>
<li>A startup (0 clients, pre-revenue)</li>
<li>A thought leader</li>
<li>Someone who knows what he's doing</li>
</ul>
<p>I'm just a developer who built things his agents kept breaking, documented everything, and ended up with a methodology by accident.</p>
<p>If you're building AI agent infrastructure too — or you're curious what breaks when you let 19 agents run themselves — stick around. Something will break soon. I'll write about it.</p>
<p>— Vilius</p>
]]></content:encoded></item><item><title><![CDATA[6 Protocols for Agent Infrastructure — Trust Score, Deployment, SLA, Identity, Compliance]]></title><description><![CDATA[I run about 20 AI agents. They delegate work to each other, deploy code, scan for vulnerabilities, and handle compliance checks. Over time, I kept hitting the same gaps — things that made autonomous workflows fragile in ways that took hours to debug....]]></description><link>https://blog.workswithagents.dev/6-protocols-for-agent-infrastructure-trust-score-deployment-sla-identity-compliance-10hd</link><guid isPermaLink="true">https://blog.workswithagents.dev/6-protocols-for-agent-infrastructure-trust-score-deployment-sla-identity-compliance-10hd</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[architecture]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 14:18:38 GMT</pubDate><content:encoded><![CDATA[<p>I run about 20 AI agents. They delegate work to each other, deploy code, scan for vulnerabilities, and handle compliance checks. Over time, I kept hitting the same gaps — things that made autonomous workflows fragile in ways that took hours to debug.</p>
<p>Last week I published a <a target="_blank" href="https://dev.to/vystartasv/the-agent-osi-model-a-7-layer-framework-for-ai-agent-infrastructure-3aoe">7-layer model for agent infrastructure</a>. These six protocols fill the gaps I found at each layer. They're what I wired into my own agents to stop the same failures from repeating.</p>
<p>All six have Python reference implementations under CC BY 4.0. Each has a spec any agent can read.</p>
<h2 id="heading-2-deployment-manifest-declare-a-fleet-deploy-with-one-command">2. Deployment Manifest — Declare a Fleet, Deploy With One Command</h2>
<p>I got tired of manually tracking which agents run where, how many instances, and what capabilities they have. One YAML file, one command.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">fleet:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"my-fleet"</span>
  <span class="hljs-attr">agents:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">id:</span> <span class="hljs-string">"builder"</span>
      <span class="hljs-attr">capabilities:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">"build"</span>
          <span class="hljs-attr">target:</span> <span class="hljs-string">"spfx"</span>
      <span class="hljs-attr">count:</span> <span class="hljs-number">3</span>
</code></pre>
<pre><code class="lang-bash">wwa fleet deploy fleet.yaml
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/deployment-manifest.md">Spec</a></p>
<hr />
<h2 id="heading-3-sla-framework-track-whether-agents-meet-their-promises">3. SLA Framework — Track Whether Agents Meet Their Promises</h2>
<p>Three tiers: Best-Effort (free), Production (99.5% uptime, 90% task accuracy), Regulated (99.9% uptime, 95% accuracy, 7-year audit retention).</p>
<p>Useful when you're running agents that handle customer data or regulated workflows and need to prove they stayed within bounds.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> SLAMetrics

sla = SLAMetrics(<span class="hljs-string">"my-fleet"</span>, tier=<span class="hljs-string">"production"</span>)
sla.report(<span class="hljs-string">"agent-1"</span>, <span class="hljs-string">"task-42"</span>, duration_seconds=<span class="hljs-number">187</span>, success=<span class="hljs-literal">True</span>)
status = sla.status()  <span class="hljs-comment"># {breaches: [], status: "ok"}</span>
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/sla-framework.md">Spec</a></p>
<hr />
<h2 id="heading-4-identity-protocol-verifiable-agent-identity">4. Identity Protocol — Verifiable Agent Identity</h2>
<p>When an agent claims a task result, can you prove it was that agent? Ed25519 keypairs. Signed messages. Verification against registry.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> AgentIdentity

ai = AgentIdentity(<span class="hljs-string">"my-agent"</span>)
ai.register()
sig = ai.sign({<span class="hljs-string">"type"</span>: <span class="hljs-string">"heartbeat"</span>})

<span class="hljs-comment"># Verify another agent's message</span>
valid = AgentIdentity.verify(<span class="hljs-string">"other-agent"</span>, message, signature)
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/identity-protocol.md">Spec</a></p>
<hr />
<h2 id="heading-5-compliance-as-code-regulation-as-executable-validation">5. Compliance-as-Code — Regulation as Executable Validation</h2>
<p>NHS DTAC, FCA, GDS, GDPR — as rules agents can validate against at runtime. Not a checklist. Not documentation. Code that returns pass/fail.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> ComplianceEngine

ce = ComplianceEngine()
dtac = ce.load(<span class="hljs-string">"dtac-v2.1"</span>)

<span class="hljs-keyword">if</span> dtac.validate(action).passed:
    execute(action)
<span class="hljs-keyword">else</span>:
    escalate_to_human()
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/compliance-as-code.md">Spec</a></p>
<hr />
<h2 id="heading-6-onboarding-protocol-systematic-agent-creation">6. Onboarding Protocol — Systematic Agent Creation</h2>
<p>Interview → generate → calibrate → benchmark → register. Instead of writing a prompt file and hoping, run a pipeline that produces a scored agent.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> OnboardingClient

ob = OnboardingClient()
result = ob.full_onboard(
    <span class="hljs-string">"nhs-auditor"</span>,
    <span class="hljs-string">"Audit agent actions for NHS DTAC compliance"</span>,
    capabilities=[<span class="hljs-string">"audit:compliance"</span>],
    skills=[<span class="hljs-string">"compliance-as-code"</span>]
)
<span class="hljs-comment"># → {agent_id: "nhs-auditor", trust_score_seed: 0.60}</span>
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/onboarding-protocol.md">Spec</a></p>
<hr />
<h2 id="heading-the-stack">The Stack</h2>
<pre><code class="lang-plaintext">L7 GOVERNANCE    Compliance-as-Code · SLA Framework
L6 VERIFICATION  Agent Test Suite · Pitfall Registry
L5 COORDINATION  Coordination Protocol · Trust Score
L4 SESSION       Handoff Protocol
L3 DISCOVERY     Capability Manifest · Trust Score · Identity
L2 COMMUNICATION Identity Protocol · Credential Proxy
L1 EXECUTION     Blueprint Registry · Onboarding Protocol
</code></pre>
<p>Plus cross-layer: Deployment Manifest.</p>
<hr />
<h2 id="heading-get-started">Get Started</h2>
<pre><code class="lang-bash">pip install workswithagents
</code></pre>
<p>All specs: <a target="_blank" href="https://workswithagents.dev/specs/">workswithagents.dev/specs/</a>
All code: CC BY 4.0</p>
<hr />
<p><em>I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → <a target="_blank" href="mailto:vilius@workswithagents.com">vilius@workswithagents.com</a></em></p>
]]></content:encoded></item><item><title><![CDATA[Trusted Code Review Pipeline — 4 Agents, 1 Cloud Run Deploy]]></title><description><![CDATA[I built a multi-agent code review system for the Build Multi-Agent Systems with ADK track. Four specialized agents replace what would normally be one giant prompt. Each has a focused responsibility, and they pass work through a sequential pipeline de...]]></description><link>https://blog.workswithagents.dev/trusted-code-review-pipeline-4-agents-1-cloud-run-deploy-3dd0</link><guid isPermaLink="true">https://blog.workswithagents.dev/trusted-code-review-pipeline-4-agents-1-cloud-run-deploy-3dd0</guid><category><![CDATA[buildmultiagents]]></category><category><![CDATA[adk]]></category><category><![CDATA[agents]]></category><category><![CDATA[gemini]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 13:59:55 GMT</pubDate><content:encoded><![CDATA[<p>I built a multi-agent code review system for the <a target="_blank" href="https://dev.to/deved/build-multi-agent-systems">Build Multi-Agent Systems with ADK</a> track. Four specialized agents replace what would normally be one giant prompt. Each has a focused responsibility, and they pass work through a sequential pipeline deployed to Google Cloud Run.</p>
<h2 id="heading-what-i-built">What I Built</h2>
<p>A web app that takes a GitHub repo URL and runs a full code review through four agents. Paste a link, and the pipeline takes over.</p>
<p><strong>The flow:</strong> <code>User → Planner → Security Scanner → Quality Gate → Archive &amp; Verify → Audit Report</code></p>
<p>Each agent's output feeds into the next. The final result is a signed audit report with trust scores for every agent in the chain.</p>
<h2 id="heading-cloud-run-embed">Cloud Run Embed</h2>
<p>{% embed https://adk-code-review-957647198522.us-central1.run.app %}</p>
<h2 id="heading-your-agents">Your Agents</h2>
<p>I used ADK's sequential agent chaining to connect four specialized agents:</p>
<ul>
<li><strong>Planner</strong> — clones the repo, analyzes file structure, identifies the primary language and test coverage, creates a review plan</li>
<li><strong>Security Scanner</strong> — deep-scans every file for hardcoded secrets, unsafe patterns, and exposed configuration. Each finding is severity-rated.</li>
<li><strong>Quality Gate</strong> — checks for LICENSE, README completeness, CI/CD pipeline, and security posture. Assigns a tier: trusted, caution, or untrusted.</li>
<li><strong>Archive &amp; Verify</strong> — rates every agent with a trust score, cryptographically verifies all signatures in the chain, and produces the final audit report.</li>
</ul>
<p>The sequential flow means each agent focuses on one job. The Planner doesn't scan for secrets. The Security agent doesn't check licenses. Each prompt is short and focused — no agent needs to do everything.</p>
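<p>In code, the chaining is roughly this. A sketch using ADK's <code>SequentialAgent</code>; the names and instructions are condensed stand-ins, not the repo's exact prompts:</p>
<pre><code class="lang-python">from google.adk.agents import LlmAgent, SequentialAgent

# Each sub-agent writes its result to an output_key; later agents can
# reference that state in their instructions via {key} placeholders.
planner = LlmAgent(name="planner", model="gemini-2.5-flash",
                   instruction="Clone the repo and produce a review plan.",
                   output_key="plan")
scanner = LlmAgent(name="security_scanner", model="gemini-2.5-flash",
                   instruction="Scan the files in {plan} for secrets; rate severity.",
                   output_key="findings")
quality = LlmAgent(name="quality_gate", model="gemini-2.5-flash",
                   instruction="Check LICENSE, README and CI; assign a tier given {findings}.",
                   output_key="tier")
archiver = LlmAgent(name="archive_verify", model="gemini-2.5-flash",
                    instruction="Verify the chain's signatures and emit the audit report.")

# SequentialAgent runs the four in order, sharing session state.
pipeline = SequentialAgent(name="code_review_pipeline",
                           sub_agents=[planner, scanner, quality, archiver])
</code></pre>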
<h2 id="heading-key-learnings">Key Learnings</h2>
<ol>
<li><p><strong>Agent directory structure matters more than you'd expect.</strong> ADK scans for agents inside your project directory. One wrong path and your perfectly working agent disappears from discovery entirely.</p>
</li>
<li><p><strong>Cloud Run's defaults can surprise you.</strong> The port your container listens on isn't what most deployment guides assume. One hardcoded number and the container starts fine but never passes the health check (see the sketch after this list).</p>
</li>
<li><p><strong>Static files and agents don't mix.</strong> Putting a frontend inside an agent directory confuses ADK's discovery — it starts listing your HTML folder as an agent. Keep them separate.</p>
</li>
<li><p><strong>Cloud APIs are independent.</strong> Enabling one Google Cloud API doesn't enable the others your agents depend on. You'll get an unhelpful error and a link to the exact page where you should've clicked enable.</p>
</li>
</ol>
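<p>The port lesson in two lines: read it from the environment. A minimal sketch, with the stdlib server standing in for whatever app you actually deploy:</p>
<pre><code class="lang-python">import os
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Cloud Run injects the port to listen on via $PORT (8080 by default).
# Hardcoding a different port is exactly the mistake that starts fine
# and then fails every health check.
port = int(os.environ.get("PORT", "8080"))
HTTPServer(("0.0.0.0", port), SimpleHTTPRequestHandler).serve_forever()
</code></pre>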
<p><strong>Repo:</strong> <a target="_blank" href="https://github.com/vystartasv/adk-code-review">github.com/vystartasv/adk-code-review</a>
<strong>Stack:</strong> Google ADK + A2A + Cloud Run + Gemini 2.5 Flash
<strong>License:</strong> CC BY 4.0</p>
]]></content:encoded></item><item><title><![CDATA[Every Agent I Delegated To Kept Failing. I Finally Checked the Model.]]></title><description><![CDATA[I built a delegation system that spawns AI agents to handle sub-tasks in parallel. Quality sweeps. Code audits. Checking every SDK directory for dead links. The idea: spin up cheap local agents, let them work, collect results.
They kept failing. Not ...]]></description><link>https://blog.workswithagents.dev/every-agent-i-delegated-to-kept-failing-i-finally-checked-the-model-1f46</link><guid isPermaLink="true">https://blog.workswithagents.dev/every-agent-i-delegated-to-kept-failing-i-finally-checked-the-model-1f46</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 11:00:48 GMT</pubDate><content:encoded><![CDATA[<p>I built a delegation system that spawns AI agents to handle sub-tasks in parallel. Quality sweeps. Code audits. Checking every SDK directory for dead links. The idea: spin up cheap local agents, let them work, collect results.</p>
<p>They kept failing. Not crashing — just stopping. No output. No error. 600 seconds of silence, then a timeout.</p>
<p>I assumed the tasks were too complex. I assumed parallel delegation was unreliable. I never checked what model I was actually giving them.</p>
<h2 id="heading-the-root-cause">The Root Cause</h2>
<p>My delegation system was configured to use a small local model. Fine for single-turn questions. Useless for multi-step tool loops.</p>
<p>A quality sweep isn't one tool call. It's: find the directory, list the files, search each one, flag issues, report results. That's five sequential steps, each dependent on the last. The small model lost coherence after the second call. The first step worked. By the third, it was hallucinating or hanging.</p>
<p>Meanwhile, the main agent handled the exact same tasks in minutes. Same instructions. Different model.</p>
<h2 id="heading-what-i-assumed">What I Assumed</h2>
<p>I assumed any model that passes benchmarks can handle tool-calling. I assumed "cheap model for leaf tasks" was an optimization. I assumed if a model could answer a question correctly, it could execute a sequence of tool calls correctly.</p>
<p>Benchmarks measure knowledge. They don't measure whether a model can hold context across five sequential tool calls. Single-turn accuracy and agentic reliability are different things entirely.</p>
<h2 id="heading-what-i-no-longer-assume">What I No Longer Assume</h2>
<p>I now test every model on a concrete multi-step task before adding it to the delegation pool: find a directory, search for a pattern, read the matching file, report what you found. If it can't complete that loop, it doesn't get delegated work.</p>
<p>I also built a decision gate that evaluates task complexity against model capability before spawning a subagent. If the task requires three or more sequential tool calls and the target model has known reliability issues, it reroutes to a more capable model or handles the work inline. Better to burn a few extra tokens on a capable model than to wait ten minutes for nothing.</p>
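<p>A sketch of that gate. Every name here is hypothetical; the threshold and model list are whatever your delegation layer tracks:</p>
<pre><code class="lang-python"># Hypothetical sketch of the delegation gate; estimate_tool_calls and
# FLAKY_MODELS stand in for your own complexity and capability tracking.
FLAKY_MODELS = {"local-small-7b"}   # models that failed the multi-step probe

def estimate_tool_calls(task):
    # Crude proxy: count the dependent steps named in the task.
    return sum(task.lower().count(v) for v in ("find", "list", "search", "read", "report"))

def route(task, target_model):
    if estimate_tool_calls(task) &gt;= 3 and target_model in FLAKY_MODELS:
        return "capable-model"      # reroute instead of spawning a doomed subagent
    return target_model
</code></pre>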
<h2 id="heading-what-you-should-check">What You Should Check</h2>
<p>If you're building systems that delegate work between agents:</p>
<ul>
<li><strong>Test subagent models on multi-step tool loops, not just benchmarks.</strong> Give them a real sequence of dependent calls. If they fail by step three, they're not ready for autonomous work.</li>
<li><strong>Gate delegation before it starts, not after it times out.</strong> A decision layer that checks task complexity against model capability catches failures before they become silent timeouts.</li>
<li><strong>Parallel delegation to weak models isn't faster — it's ten minutes of silence instead of two minutes of work.</strong> Before spawning subagents, ask: can the orchestrator just do this?</li>
</ul>
<p><em>I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → <a target="_blank" href="mailto:vilius@workswithagents.com">vilius@workswithagents.com</a></em></p>
]]></content:encoded></item><item><title><![CDATA[I Published Broken Packages to PyPI. I Checked Them First.]]></title><description><![CDATA[I published two Python packages last week. I checked them before tagging the release. CI was green. twine check passed. I moved on.
This morning my agent told me one of them had been broken for three days. Anyone who copied the install command from t...]]></description><link>https://blog.workswithagents.dev/i-published-broken-packages-to-pypi-i-checked-them-first-44a7</link><guid isPermaLink="true">https://blog.workswithagents.dev/i-published-broken-packages-to-pypi-i-checked-them-first-44a7</guid><category><![CDATA[Devops]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[Python]]></category><category><![CDATA[Testing]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Thu, 07 May 2026 08:14:34 GMT</pubDate><content:encoded><![CDATA[<p>I published two Python packages last week. I checked them before tagging the release. CI was green. <code>twine check</code> passed. I moved on.</p>
<p>This morning my agent told me one of them had been broken for three days. Anyone who copied the install command from the README got <code>No matching distribution found</code>. The homepage link was a dead domain. Every image on the PyPI page — broken. The other package listed no license at all.</p>
<p>I had checked them. And they were wrong.</p>
<h2 id="heading-what-i-found">What I Found</h2>
<p>The README told users to install a package name that didn't exist — a typo in the one place that mattered most. The homepage link pointed to a domain that never resolved. Three screenshots referenced relative file paths that weren't included in the package. Three badge links pointed nowhere at all.</p>
<p>The <code>workswithagents</code> package was cleaner, but PyPI displayed "License: None."</p>
<p>Both packages passed CI. Both passed <code>twine check</code>. Both were live.</p>
<h2 id="heading-what-i-assumed">What I Assumed</h2>
<p>I assumed CI green meant the package was correct. I assumed <code>twine check</code> validated what users would see. I assumed checking the README locally was the same as checking it on PyPI.</p>
<p>None of those things are true.</p>
<p><code>twine check</code> validates package <em>structure</em> — valid metadata headers, correct file layout. It does not resolve URLs. It does not compare install commands against actual package names. It does not check if images exist. It does not verify licenses. It's a compiler, not a content validator.</p>
<h2 id="heading-what-i-no-longer-assume">What I No Longer Assume</h2>
<p>Every package I publish now runs through a content quality gate <em>before</em> <code>twine upload</code>. The gate checks: does the homepage resolve? Does the install command match the actual package name? Are all images either in the wheel or reachable URLs? Is there a license? Do badge links have real targets?</p>
<p>The gate is 200 lines of Python. It caught all 9 issues in one run. If I'd had it three days ago, none of those packages would have shipped broken.</p>
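<p>A trimmed sketch of two of those checks, assuming a plain-Markdown README and the package name passed in; the homepage, license, and badge checks follow the same pattern:</p>
<pre><code class="lang-python">import re
import urllib.request

def check_readme(readme, package_name):
    """Pre-upload content checks that twine check does not do."""
    issues = []
    # 1. The install command must name the real package.
    for m in re.finditer(r"pip install ([\w\-\.]+)", readme):
        if m.group(1) != package_name:
            issues.append(f"install command says '{m.group(1)}', not '{package_name}'")
    # 2. Images must be absolute URLs that actually resolve.
    for url in re.findall(r"!\[[^\]]*\]\(([^)]+)\)", readme):
        if not url.startswith("http"):
            issues.append(f"relative image path won't render on PyPI: {url}")
            continue
        try:
            urllib.request.urlopen(url, timeout=10)
        except Exception:
            issues.append(f"unreachable image: {url}")
    return issues
</code></pre>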
<h2 id="heading-what-you-should-check">What You Should Check</h2>
<p>If you publish packages — PyPI, npm, anything — check these five things:</p>
<ul>
<li>Your install command in the README matches the actual published name</li>
<li>Your homepage URL resolves from an external network</li>
<li>Every image in your README is either bundled in the package or an absolute URL</li>
<li>Your license field isn't empty</li>
<li>Your badge links point somewhere real</li>
</ul>
<p>These aren't structural issues. CI won't catch them. You have to check them yourself — or build a checker that does.</p>
<p><em>I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → <a target="_blank" href="mailto:vilius@workswithagents.com">vilius@workswithagents.com</a></em></p>
]]></content:encoded></item><item><title><![CDATA[My Subagents Kept Lying to Me — So I Wired Ed25519 Verification Into Our Own Protocol Stack]]></title><description><![CDATA[Three weeks ago I was writing integration guides telling other agent frameworks to adopt verification protocols. Meanwhile, my own subagents were returning hallucinated status reports that I was blindly trusting.
What I Built: Self-Verification For O...]]></description><link>https://blog.workswithagents.dev/my-subagents-kept-lying-to-me-so-i-wired-ed25519-verification-into-our-own-protocol-stack-215i</link><guid isPermaLink="true">https://blog.workswithagents.dev/my-subagents-kept-lying-to-me-so-i-wired-ed25519-verification-into-our-own-protocol-stack-215i</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Wed, 06 May 2026 21:04:03 GMT</pubDate><content:encoded><![CDATA[<p>Three weeks ago I was writing integration guides telling other agent frameworks to adopt verification protocols. Meanwhile, my own subagents were returning hallucinated status reports that I was blindly trusting.</p>
<h2 id="heading-what-i-built-self-verification-for-our-own-delegation">What I Built: Self-Verification For Our Own Delegation</h2>
<p>The fix wasn't a new tool. The fix was eating our own dog food.</p>
<h3 id="heading-layer-1-real-ed25519-signing">Layer 1: Real Ed25519 Signing</h3>
<p>The verification harness (<code>subagent-verify.py</code>) now uses PyNaCl for real Ed25519 signatures — not the SHA-256 placeholder we'd been shipping in reference implementations.</p>
<p>Before dispatch, the parent generates an Ed25519 keypair:</p>
<pre><code class="lang-bash">python3.11 ~/.hermes/scripts/subagent-verify.py dispatch \
  --task <span class="hljs-string">"check all integration PRs"</span> \
  --agent-name <span class="hljs-string">"tracker-<span class="hljs-subst">$(date +%H%M)</span>"</span>
</code></pre>
<p>This produces:</p>
<ul>
<li><code>public_key</code> — 32-byte Ed25519 verify key (hex). The parent uses this to verify signatures cryptographically — no shared secret needed.</li>
<li><code>context_instruction</code> — mandatory output format directive pasted into the subagent's context. The subagent MUST return structured JSON with a signature.</li>
<li><code>_parent_seed</code> — 32-byte private key. Never included in subagent context.</li>
</ul>
<p>When the subagent returns, the parent verifies:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"<span class="hljs-variable">$subagent_output</span>"</span> | python3.11 ~/.hermes/scripts/subagent-verify.py verify \
  --public-key <span class="hljs-string">"abc123..."</span> \
  --agent-id <span class="hljs-string">"tracker-1422"</span>
</code></pre>
<p><strong>Exit codes tell the story:</strong></p>
<ul>
<li><strong>Exit 0</strong> — Ed25519 signature valid + all claims match ground truth → trust</li>
<li><strong>Exit 1</strong> — Bad signature (tampered) OR claims don't match reality (hallucinated) → investigate</li>
<li><strong>Exit 2</strong> — No structured manifest found (unsigned prose) → DO NOT TRUST, re-dispatch</li>
</ul>
<p>Three test cases confirmed the harness catches exactly what it should:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Test</td><td>Result</td><td>Exit</td></tr>
</thead>
<tbody>
<tr>
<td>Signed, clean claims</td><td><code>clean</code> — all verified</td><td>0</td></tr>
<tr>
<td>Tampered claims (same signature)</td><td><code>bad_signature</code> — Ed25519 verification failed</td><td>1</td></tr>
<tr>
<td>Unsigned prose ("all clean ✅")</td><td><code>UNSIGNED</code> — no manifest found</td><td>2</td></tr>
</tbody>
</table>
</div><p>The tamper detection is real. If a subagent's claims are modified after signing — even a single character — the Ed25519 signature won't verify. This catches both accidental corruption and malicious modification.</p>
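<p>The crypto at the heart of the harness is small. A minimal PyNaCl sketch of the sign/verify roundtrip; the manifest shape here is illustrative, not the harness's exact format:</p>
<pre><code class="lang-python">import json
from nacl.encoding import HexEncoder
from nacl.exceptions import BadSignatureError
from nacl.signing import SigningKey, VerifyKey

# Dispatch: the parent generates the keypair and keeps the seed.
signing_key = SigningKey.generate()
public_key_hex = signing_key.verify_key.encode(encoder=HexEncoder).decode()

# The claims manifest is signed as canonical JSON by whoever holds the seed.
claims = {"agent_id": "tracker-1422", "claims": [{"pr": 42, "status": "green"}]}
payload = json.dumps(claims, sort_keys=True).encode()
signature = signing_key.sign(payload).signature

# Verify: the parent needs only the public key. Change one character
# of the payload and this raises BadSignatureError.
verify_key = VerifyKey(public_key_hex.encode(), encoder=HexEncoder)
try:
    verify_key.verify(payload, signature)
    print("exit 0: signature valid")
except BadSignatureError:
    print("exit 1: tampered or hallucinated")
</code></pre>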
<h3 id="heading-layer-2-l6-executionverificationgate-in-all-6-reference-implementations">Layer 2: L6 ExecutionVerificationGate In All 6 Reference Implementations</h3>
<p>The standalone harness is for parent-side verification. But agents that self-verify their subtasks need protocol-level enforcement. We added <code>ExecutionVerificationGate</code> (L6) to all six vanilla agent reference implementations — Python, TypeScript, Go, C#, Rust, and Shell.</p>
<p>It sits directly in the agent execution loop:</p>
<pre><code class="lang-plaintext">execute() → compliance_gate → _run() → VERIFICATION_GATE → tx.execute → DONE
                                           ↑
                                  unsigned/bad_sig → BLOCKED
</code></pre>
<p>Three tiers of validation:</p>
<ol>
<li><strong>Format</strong> — is there a structured <code>claims</code> array?</li>
<li><strong>Signature</strong> — is there an Ed25519 hex signature?</li>
<li><strong>Crypto</strong> — does the signature verify against the agent's public key?</li>
</ol>
<p>If any tier fails, the task is blocked — not silently accepted. In the Python reference:</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> verify_output <span class="hljs-keyword">and</span> <span class="hljs-string">"claims"</span> <span class="hljs-keyword">in</span> task_result:
    vg_result = ExecutionVerificationGate.validate(task_result, self.identity)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> vg_result[<span class="hljs-string">"passed"</span>]:
        <span class="hljs-keyword">return</span> {<span class="hljs-string">"status"</span>: <span class="hljs-string">"blocked"</span>, <span class="hljs-string">"verdict"</span>: vg_result[<span class="hljs-string">"verdict"</span>]}
</code></pre>
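<p>The gate itself can be tiny. A hypothetical sketch of the three tiers, with the actual Ed25519 check passed in as <code>verify_fn</code> (the PyNaCl roundtrip above):</p>
<pre><code class="lang-python">def validate(task_result, verify_fn):
    """Hypothetical sketch of the three-tier gate, not the reference code.
    verify_fn performs the Ed25519 check against the agent's public key."""
    # Tier 1 (format): a structured claims array must exist.
    if not isinstance(task_result.get("claims"), list):
        return {"passed": False, "verdict": "UNSIGNED"}
    # Tier 2 (signature): an Ed25519 hex signature must be present.
    if not task_result.get("signature"):
        return {"passed": False, "verdict": "UNSIGNED"}
    # Tier 3 (crypto): the signature must verify.
    if not verify_fn(task_result["claims"], task_result["signature"]):
        return {"passed": False, "verdict": "bad_signature"}
    return {"passed": True, "verdict": "clean"}
</code></pre>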
<h3 id="heading-layer-3-wired-into-production-cron">Layer 3: Wired Into Production Cron</h3>
<p>The integration tracker that produced the original hallucination now has the verification harness in its skills list and a mandatory prompt directive:</p>
<blockquote>
<p><strong>CRITICAL — Direct Checks Only, No Subagents.</strong> Never use <code>delegate_task</code> for PR status checks. If a subagent is unavoidable, run <code>dispatch → verify</code> with Ed25519. Exit 2 means re-dispatch or check directly.</p>
</blockquote>
<p>The cron job now loads both <code>agent-integration-outreach</code> and <code>subagent-output-verification</code> skills. Every PR check goes through one of two paths: direct <code>gh pr checks</code> (preferred) or verified subagent dispatch (when unavoidable).</p>
<h2 id="heading-get-it">Get It</h2>
<p>The verification harness and all six reference implementations with L6 gates are available:</p>
<ul>
<li><strong>Verification Harness:</strong> <code>~/.hermes/scripts/subagent-verify.py</code> — real Ed25519 via PyNaCl, dispatch + verify modes</li>
<li><strong>Python:</strong> <code>vanilla_agent.py</code> — <code>execute(verify_output=True)</code> with <code>ExecutionVerificationGate</code></li>
<li><strong>TypeScript/Go/C#/Rust/Shell:</strong> Same L6 gate, same OSI stack, zero external deps beyond stdlib</li>
</ul>
<p>All under CC BY 4.0. Full spec at <a target="_blank" href="https://workswithagents.com/standards">workswithagents.com/standards</a>.</p>
<p>If your agents are delegating to subagents without verification — and they are, because every agent framework does — the fix is a single file, 300 lines, real crypto, and three exit codes that tell you whether to trust the output or throw it away.</p>
<hr />
<p><em>I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → <a target="_blank" href="mailto:vilius@workswithagents.com">vilius@workswithagents.com</a></em></p>
]]></content:encoded></item><item><title><![CDATA[Works With Agents SDK — Python, TypeScript, Go, Rust, Shell, C#]]></title><description><![CDATA[Works With Agents — Now in 6 Languages
All 12 Agent OSI Model protocols now have reference implementations in every language AI agents commonly use.
Language SDKs




LanguageInstallModules



Pythonpip install workswithagentsTrust, Deploy, SLA, Iden...]]></description><link>https://blog.workswithagents.dev/works-with-agents-sdk-python-typescript-go-rust-shell-c-3dp4</link><guid isPermaLink="true">https://blog.workswithagents.dev/works-with-agents-sdk-python-typescript-go-rust-shell-c-3dp4</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[showdev]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Tue, 05 May 2026 22:18:14 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-works-with-agents-now-in-6-languages">Works With Agents — Now in 6 Languages</h1>
<p>All 12 Agent OSI Model protocols now have reference implementations in every language AI agents commonly use.</p>
<h2 id="heading-language-sdks">Language SDKs</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Language</td><td>Install</td><td>Modules</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Python</strong></td><td><code>pip install workswithagents</code></td><td>Trust, Deploy, SLA, Identity, Compliance, Onboard</td></tr>
<tr>
<td><strong>TypeScript</strong></td><td><code>npm install workswithagents</code></td><td>Trust, Deploy, SLA, Identity, Compliance, Onboard</td></tr>
<tr>
<td><strong>Go</strong></td><td><code>go get github.com/vystartasv/works-with-agents</code></td><td>Trust, Deploy, SLA, Identity, Compliance, Onboard</td></tr>
<tr>
<td><strong>Rust</strong></td><td><code>cargo add workswithagents</code></td><td>Trust, Deploy, SLA, Identity, Compliance</td></tr>
<tr>
<td><strong>Shell</strong></td><td><code>source workswithagents.sh</code></td><td>All 6 protocols as curl wrappers</td></tr>
<tr>
<td><strong>C#</strong></td><td>Copy <code>WorksWithAgents.cs</code></td><td>Trust, Deploy, SLA, Identity, Compliance, Onboard</td></tr>
</tbody>
</table>
</div><h2 id="heading-one-api-six-languages-same-protocols">One API. Six Languages. Same Protocols.</h2>
<h3 id="heading-python">Python</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> TrustScoreClient, ComplianceEngine

ts = TrustScoreClient()
<span class="hljs-keyword">if</span> ts.get(<span class="hljs-string">"target-agent"</span>)[<span class="hljs-string">"tier"</span>] == <span class="hljs-string">"trusted"</span>:
    delegate(task, to=<span class="hljs-string">"target-agent"</span>)

ce = ComplianceEngine()
dtac = ce.load(<span class="hljs-string">"dtac-v2.1"</span>)
<span class="hljs-keyword">if</span> dtac.validate(action).passed:
    execute(action)
</code></pre>
<h3 id="heading-typescript">TypeScript</h3>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { TrustScoreClient, ComplianceEngine } <span class="hljs-keyword">from</span> <span class="hljs-string">"workswithagents"</span>;

<span class="hljs-keyword">const</span> ts = <span class="hljs-keyword">new</span> TrustScoreClient();
<span class="hljs-keyword">const</span> score = <span class="hljs-keyword">await</span> ts.get(<span class="hljs-string">"target-agent"</span>);
<span class="hljs-keyword">if</span> (score.tier === <span class="hljs-string">"trusted"</span>) delegate(task, <span class="hljs-string">"target-agent"</span>);

<span class="hljs-keyword">const</span> ce = <span class="hljs-keyword">new</span> ComplianceEngine();
<span class="hljs-keyword">const</span> dtac = <span class="hljs-keyword">await</span> ce.load(<span class="hljs-string">"dtac-v2.1"</span>);
<span class="hljs-keyword">if</span> ((<span class="hljs-keyword">await</span> dtac.validate(action)).passed) execute(action);
</code></pre>
<h3 id="heading-go">Go</h3>
<pre><code class="lang-go"><span class="hljs-keyword">import</span> wwa <span class="hljs-string">"github.com/vystartasv/works-with-agents"</span>

ts := wwa.NewTrustScoreClient()
score, _ := ts.Get(<span class="hljs-string">"target-agent"</span>)

engine := wwa.NewComplianceEngine()
result, _ := engine.Validate(<span class="hljs-string">"dtac-v2.1"</span>, action)
</code></pre>
<h3 id="heading-rust">Rust</h3>
<pre><code class="lang-rust"><span class="hljs-keyword">use</span> workswithagents::{TrustScoreClient, ComplianceEngine};

<span class="hljs-keyword">let</span> ts = TrustScoreClient::new();
<span class="hljs-keyword">let</span> score = ts.get(<span class="hljs-string">"target-agent"</span>)?;

<span class="hljs-keyword">let</span> engine = ComplianceEngine::new();
<span class="hljs-keyword">let</span> result = engine.validate(<span class="hljs-string">"dtac-v2.1"</span>, &amp;action)?;
</code></pre>
<h3 id="heading-shell">Shell</h3>
<pre><code class="lang-bash"><span class="hljs-built_in">source</span> workswithagents.sh
wwa_trust_get <span class="hljs-string">"target-agent"</span>
wwa_compliance_validate <span class="hljs-string">"dtac-v2.1"</span> <span class="hljs-string">'{"verb":"deploy","reversible":true}'</span>
</code></pre>
<h3 id="heading-c">C</h3>
<pre><code class="lang-csharp"><span class="hljs-keyword">using</span> WorksWithAgents;

<span class="hljs-keyword">var</span> score = <span class="hljs-keyword">await</span> WWA.TrustGet(<span class="hljs-string">"target-agent"</span>);
<span class="hljs-keyword">var</span> result = <span class="hljs-keyword">await</span> WWA.ComplianceValidate(<span class="hljs-string">"dtac-v2.1"</span>, action);
</code></pre>
<h2 id="heading-zero-dependencies-except-cryptography-for-identity">Zero Dependencies (except cryptography for Identity)</h2>
<p>The Python, TypeScript, Go, Shell, and C# SDKs use only the standard library, apart from a cryptography dependency for the Identity protocol. The Rust SDK needs serde and reqwest (standard Rust ecosystem dependencies).</p>
<h2 id="heading-all-cc-by-40">All CC BY 4.0</h2>
<p>Free to use, modify, distribute. Attribution required. Copy-paste into your agent codebase.</p>
<ul>
<li>Source: <a target="_blank" href="https://workswithagents.dev/sdk/">workswithagents.dev/sdk/</a></li>
<li>Specs: <a target="_blank" href="https://workswithagents.dev/specs/">workswithagents.dev/specs/</a></li>
<li>Python: <a target="_blank" href="https://pypi.org/project/workswithagents">pypi.org/project/workswithagents</a> (submitting soon)</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[6 Protocols for Agent Infrastructure — Trust Score, Deployment, SLA, Identity, Compliance]]></title><description><![CDATA[I run about 20 AI agents. They delegate work to each other, deploy code, scan for vulnerabilities, and handle compliance checks. Over time, I kept hitting the same gaps — things that made autonomous workflows fragile in ways that took hours to debug....]]></description><link>https://blog.workswithagents.dev/6-new-moats-for-ai-agent-infrastructure-trust-score-deployment-sla-identity-compliance-as-code-ikl</link><guid isPermaLink="true">https://blog.workswithagents.dev/6-new-moats-for-ai-agent-infrastructure-trust-score-deployment-sla-identity-compliance-as-code-ikl</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[architecture]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Tue, 05 May 2026 21:53:37 GMT</pubDate><content:encoded><![CDATA[<p>I run about 20 AI agents. They delegate work to each other, deploy code, scan for vulnerabilities, and handle compliance checks. Over time, I kept hitting the same gaps — things that made autonomous workflows fragile in ways that took hours to debug.</p>
<p>Last week I published a <a target="_blank" href="https://dev.to/vystartasv/the-agent-osi-model-a-7-layer-framework-for-ai-agent-infrastructure-3aoe">7-layer model for agent infrastructure</a>. These six protocols fill the gaps I found at each layer. They're what I wired into my own agents to stop the same failures from repeating.</p>
<p>All six have Python reference implementations under CC BY 4.0. Each has a spec any agent can read.</p>
<h2 id="heading-2-deployment-manifest-declare-a-fleet-deploy-with-one-command">2. Deployment Manifest — Declare a Fleet, Deploy With One Command</h2>
<p>I got tired of manually tracking which agents run where, how many instances, and what capabilities they have. One YAML file, one command.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">fleet:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">"my-fleet"</span>
  <span class="hljs-attr">agents:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">id:</span> <span class="hljs-string">"builder"</span>
      <span class="hljs-attr">capabilities:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">action:</span> <span class="hljs-string">"build"</span>
          <span class="hljs-attr">target:</span> <span class="hljs-string">"spfx"</span>
      <span class="hljs-attr">count:</span> <span class="hljs-number">3</span>
</code></pre>
<pre><code class="lang-bash">wwa fleet deploy fleet.yaml
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/deployment-manifest.md">Spec</a></p>
<hr />
<h2 id="heading-3-sla-framework-track-whether-agents-meet-their-promises">3. SLA Framework — Track Whether Agents Meet Their Promises</h2>
<p>Three tiers: Best-Effort (free), Production (99.5% uptime, 90% task accuracy), Regulated (99.9% uptime, 95% accuracy, 7-year audit retention).</p>
<p>Useful when you're running agents that handle customer data or regulated workflows and need to prove they stayed within bounds.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> SLAMetrics

sla = SLAMetrics(<span class="hljs-string">"my-fleet"</span>, tier=<span class="hljs-string">"production"</span>)
sla.report(<span class="hljs-string">"agent-1"</span>, <span class="hljs-string">"task-42"</span>, duration_seconds=<span class="hljs-number">187</span>, success=<span class="hljs-literal">True</span>)
status = sla.status()  <span class="hljs-comment"># {breaches: [], status: "ok"}</span>
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/sla-framework.md">Spec</a></p>
<hr />
<h2 id="heading-4-identity-protocol-verifiable-agent-identity">4. Identity Protocol — Verifiable Agent Identity</h2>
<p>When an agent claims a task result, can you prove it was that agent? Ed25519 keypairs. Signed messages. Verification against registry.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> AgentIdentity

ai = AgentIdentity(<span class="hljs-string">"my-agent"</span>)
ai.register()
sig = ai.sign({<span class="hljs-string">"type"</span>: <span class="hljs-string">"heartbeat"</span>})

<span class="hljs-comment"># Verify another agent's message</span>
valid = AgentIdentity.verify(<span class="hljs-string">"other-agent"</span>, message, signature)
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/identity-protocol.md">Spec</a></p>
<hr />
<h2 id="heading-5-compliance-as-code-regulation-as-executable-validation">5. Compliance-as-Code — Regulation as Executable Validation</h2>
<p>NHS DTAC, FCA, GDS, GDPR — as rules agents can validate against at runtime. Not a checklist. Not documentation. Code that returns pass/fail.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> ComplianceEngine

ce = ComplianceEngine()
dtac = ce.load(<span class="hljs-string">"dtac-v2.1"</span>)

<span class="hljs-keyword">if</span> dtac.validate(action).passed:
    execute(action)
<span class="hljs-keyword">else</span>:
    escalate_to_human()
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/compliance-as-code.md">Spec</a></p>
<hr />
<h2 id="heading-6-onboarding-protocol-systematic-agent-creation">6. Onboarding Protocol — Systematic Agent Creation</h2>
<p>Interview → generate → calibrate → benchmark → register. Instead of writing a prompt file and hoping, run a pipeline that produces a scored agent.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> workswithagents <span class="hljs-keyword">import</span> OnboardingClient

ob = OnboardingClient()
result = ob.full_onboard(
    <span class="hljs-string">"nhs-auditor"</span>,
    <span class="hljs-string">"Audit agent actions for NHS DTAC compliance"</span>,
    capabilities=[<span class="hljs-string">"audit:compliance"</span>],
    skills=[<span class="hljs-string">"compliance-as-code"</span>]
)
<span class="hljs-comment"># → {agent_id: "nhs-auditor", trust_score_seed: 0.60}</span>
</code></pre>
<p><a target="_blank" href="https://workswithagents.dev/specs/onboarding-protocol.md">Spec</a></p>
<hr />
<h2 id="heading-the-stack">The Stack</h2>
<pre><code class="lang-plaintext">L7 GOVERNANCE    Compliance-as-Code · SLA Framework
L6 VERIFICATION  Agent Test Suite · Pitfall Registry
L5 COORDINATION  Coordination Protocol · Trust Score
L4 SESSION       Handoff Protocol
L3 DISCOVERY     Capability Manifest · Trust Score · Identity
L2 COMMUNICATION Identity Protocol · Credential Proxy
L1 EXECUTION     Blueprint Registry · Onboarding Protocol
</code></pre>
<p>Plus cross-layer: Deployment Manifest.</p>
<hr />
<h2 id="heading-get-started">Get Started</h2>
<pre><code class="lang-bash">pip install workswithagents
</code></pre>
<p>All specs: <a target="_blank" href="https://workswithagents.dev/specs/">workswithagents.dev/specs/</a>
All code: CC BY 4.0</p>
]]></content:encoded></item><item><title><![CDATA[The Agent OSI Model — A 7-Layer Framework for AI Agent Infrastructure]]></title><description><![CDATA[The Agent OSI Model — A 7-Layer Framework for AI Agent Infrastructure
The OSI model didn't create networking. It created the vocabulary that made networking a discipline. Before OSI, engineers said "the connection is broken." After OSI, they said "La...]]></description><link>https://blog.workswithagents.dev/the-agent-osi-model-a-7-layer-framework-for-ai-agent-infrastructure-3aoe</link><guid isPermaLink="true">https://blog.workswithagents.dev/the-agent-osi-model-a-7-layer-framework-for-ai-agent-infrastructure-3aoe</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[architecture]]></category><category><![CDATA[Discuss]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Tue, 05 May 2026 21:32:25 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-the-agent-osi-model-a-7-layer-framework-for-ai-agent-infrastructure">The Agent OSI Model — A 7-Layer Framework for AI Agent Infrastructure</h1>
<p>The OSI model didn't create networking. It created the <strong>vocabulary</strong> that made networking a discipline. Before OSI, engineers said "the connection is broken." After OSI, they said "Layer 2 link is down."</p>
<p>AI agents have no equivalent. When an agent fails, we say "the agent broke." That's useless.</p>
<p>I've published a 7-layer framework for agent infrastructure. Not a product. Not a standard. A vocabulary.</p>
<h2 id="heading-why-this-matters">Why This Matters</h2>
<p><strong>For debugging:</strong> "Your Layer 4 handoff is broken" is actionable. "Your agents aren't talking to each other" is vague.</p>
<p><strong>For building:</strong> Don't build everything at once. Target specific layers. A local agent needs L1 (runtime) + L2 (auth) + L4 (handoff). A multi-agent fleet adds L3 (discovery) + L5 (coordination). An enterprise deployment adds L6 (verification) + L7 (governance).</p>
<p><strong>For standards:</strong> Each layer without a standard is a gap — and an opportunity. The framework makes it obvious where standards are needed.</p>
<hr />
<h2 id="heading-what-exists-whats-missing">What Exists, What's Missing</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Layer</td><td>Infrastructure</td><td>Status</td></tr>
</thead>
<tbody>
<tr>
<td>L1</td><td>Blueprint Registry (verified LLM configs)</td><td>✅ Live</td></tr>
<tr>
<td>L2</td><td>MCP, A2A, Credential Proxy</td><td>✅ Live</td></tr>
<tr>
<td>L3</td><td>llms.txt, Agent Capability Manifest</td><td>✅ Spec written</td></tr>
<tr>
<td>L4</td><td>Handoff Protocol</td><td>📋 In proposal (MCP SEP #2683, A2A #1817)</td></tr>
<tr>
<td>L5</td><td>Coordination Protocol</td><td>🆕 Spec published today</td></tr>
<tr>
<td>L6</td><td>Agent Test Suite, Pitfall Registry</td><td>⚠️ Partial</td></tr>
<tr>
<td>L7</td><td>Transaction Protocol, Compliance-as-Code</td><td>🆕 Spec published today</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-three-new-specs-published-today">Three New Specs Published Today</h2>
<h3 id="heading-coordination-protocol-layer-5">Coordination Protocol (Layer 5)</h3>
<p>How agents work together simultaneously. Leader election (Raft-lite for agents). Work distribution with capability matching. Work stealing — idle agents pull from busy queues. Conflict resolution with audit trail.</p>
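<p>To make work stealing concrete, a toy sketch; none of these names are the spec's API:</p>
<pre><code class="lang-python">from collections import deque

# Toy model: one queue per agent; an idle agent steals from a busy peer.
queues = {"builder": deque(), "researcher": deque()}

def next_task(agent):
    if queues[agent]:
        return queues[agent].popleft()      # own work first, FIFO
    for other, q in queues.items():
        if other != agent and q:
            return q.pop()                  # steal from the tail of a busy queue
    return None
</code></pre>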
<h3 id="heading-agent-capability-manifest-layer-3">Agent Capability Manifest (Layer 3)</h3>
<p>Machine-readable declaration of what an agent can do. Like <code>package.json</code> but for agent capabilities. Discovery: "who can build SPFx?" → ranked by success rate + load + trust score.</p>
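<p>The discovery query, as a toy sketch with made-up field names:</p>
<pre><code class="lang-python">manifests = [
    {"agent": "builder-1", "capabilities": ["build:spfx"],
     "success_rate": 0.94, "load": 0.30, "trust_score": 0.82},
    {"agent": "builder-2", "capabilities": ["build:spfx"],
     "success_rate": 0.88, "load": 0.10, "trust_score": 0.90},
]

def who_can(capability):
    # Rank by success rate and trust, penalising current load.
    matches = [m for m in manifests if capability in m["capabilities"]]
    return sorted(matches,
                  key=lambda m: m["success_rate"] + m["trust_score"] - m["load"],
                  reverse=True)

print(who_can("build:spfx")[0]["agent"])  # best answer to "who can build SPFx?"
</code></pre>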
<h3 id="heading-agent-transaction-protocol-layer-7">Agent Transaction Protocol (Layer 7)</h3>
<p>Guarantees for autonomous actions. Idempotency keys (no double deploys). Intent-before-action logging (know what the agent TRIED to do even if it crashed). Rollback hooks. Three guarantee levels: Best-Effort, At-Least-Once, Exactly-Once.</p>
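<p>The idempotency-key idea in miniature; the in-memory store and helper names are hypothetical stand-ins:</p>
<pre><code class="lang-python">import hashlib
import json

_results = {}   # stand-in for a durable store

def execute_once(action, run):
    # Idempotency key: a stable hash of the action's canonical JSON.
    key = hashlib.sha256(json.dumps(action, sort_keys=True).encode()).hexdigest()
    if key in _results:
        return _results[key]            # replay returns the prior result (no double deploy)
    print(f"INTENT {key}: {action}")    # intent logged before acting, survives a crash
    _results[key] = run(action)
    return _results[key]
</code></pre>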
<hr />
<h2 id="heading-the-bigger-play">The Bigger Play</h2>
<p>Everyone's building AI agents. I'm building the infrastructure agents run on — the picks and shovels of the agent gold rush.</p>
<p>The Agent OSI Model is the framework. The specs at each layer are the picks and shovels. The certification system (Blueprint, Ready, Certified) is the trust layer on top.</p>
<p>Full framework and all specs: <a target="_blank" href="https://workswithagents.dev/specs/index.md">workswithagents.dev/specs/</a></p>
<p>Human-readable overview: <a target="_blank" href="https://workswithagents.com/agent-osi-model.md">workswithagents.com/agent-osi-model</a></p>
<p>All specs CC BY 4.0 — free to use, cite, and build upon. Attribution required.</p>
<hr />
<p><em>If you're building multi-agent systems and hitting coordination problems, or if you're in a regulated industry and need audit trails for autonomous agents — I'd like to hear about your use case. The specs are published. The infrastructure is being built. The conversation is starting now.</em></p>
]]></content:encoded></item><item><title><![CDATA[Your API Needs an llms.txt File — Here's How to Write One and Why Agents Will Read It]]></title><description><![CDATA[Your API Needs an llms.txt File — Here's How to Write One and Why Agents Will Read It
AI agents are being trained to look for llms.txt files. It's the agent-native equivalent of robots.txt — a file at your domain root that tells agents how to discove...]]></description><link>https://blog.workswithagents.dev/your-api-needs-an-llmstxt-file-heres-how-to-write-one-and-why-agents-will-read-it-5epe</link><guid isPermaLink="true">https://blog.workswithagents.dev/your-api-needs-an-llmstxt-file-heres-how-to-write-one-and-why-agents-will-read-it-5epe</guid><category><![CDATA[AI]]></category><category><![CDATA[api]]></category><category><![CDATA[documentation]]></category><category><![CDATA[Tutorial]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Tue, 05 May 2026 21:00:27 GMT</pubDate><content:encoded><![CDATA[<h1 id="heading-your-api-needs-an-llmstxt-file-heres-how-to-write-one-and-why-agents-will-read-it">Your API Needs an llms.txt File — Here's How to Write One and Why Agents Will Read It</h1>
<p>AI agents are being trained to look for <code>llms.txt</code> files. It's the agent-native equivalent of <code>robots.txt</code> — a file at your domain root that tells agents how to discover and use your content.</p>
<p>If your product doesn't have one, agents can't find it. If your competitor's does, theirs gets discovered first.</p>
<h2 id="heading-the-concise-index-llmstxt">The Concise Index (llms.txt)</h2>
<p>Minimal. Just tells agents what's available and where:</p>
<pre><code class="lang-markdown"><span class="hljs-section"># Stripe API</span>

<span class="hljs-quote">&gt; Payment infrastructure for the internet.</span>
<span class="hljs-quote">&gt; API: https://api.stripe.com</span>

<span class="hljs-section">## API Reference</span>
<span class="hljs-bullet">-</span> [<span class="hljs-string">Authentication</span>](<span class="hljs-link">https://docs.stripe.com/api/authentication.md</span>)
<span class="hljs-bullet">-</span> [<span class="hljs-string">Charges</span>](<span class="hljs-link">https://docs.stripe.com/api/charges.md</span>)
<span class="hljs-bullet">-</span> [<span class="hljs-string">Webhooks</span>](<span class="hljs-link">https://docs.stripe.com/api/webhooks.md</span>)

<span class="hljs-section">## Guides</span>
<span class="hljs-bullet">-</span> [<span class="hljs-string">Quickstart</span>](<span class="hljs-link">https://docs.stripe.com/quickstart.md</span>)
<span class="hljs-bullet">-</span> [<span class="hljs-string">Testing</span>](<span class="hljs-link">https://docs.stripe.com/testing.md</span>)

<span class="hljs-section">## SDKs</span>
<span class="hljs-bullet">-</span> [<span class="hljs-string">Python</span>](<span class="hljs-link">https://docs.stripe.com/sdks/python.md</span>)
<span class="hljs-bullet">-</span> [<span class="hljs-string">Node.js</span>](<span class="hljs-link">https://docs.stripe.com/sdks/node.md</span>)
</code></pre>
<p>That's it. No styling. No navigation. Just links to Markdown documents. Agents parse this and know exactly what's on your site.</p>
<hr />
<h2 id="heading-the-full-reference-llms-fulltxt">The Full Reference (llms-full.txt)</h2>
<p>This is what agents actually read. All your documentation in one file, with a table of contents at the top:</p>
<pre><code class="lang-markdown"><span class="hljs-section"># Stripe API — Full Reference</span>

<span class="hljs-section">## Quickstart</span>
[Full quickstart content...]

<span class="hljs-section">## Authentication</span>
API key format, scopes, rotation...

<span class="hljs-section">## Endpoints</span>
<span class="hljs-section">### Charges</span>
POST /v1/charges — request/response examples...

<span class="hljs-section">### Customers</span>
[Full customer API docs...]

<span class="hljs-section">## Error Handling</span>
Error codes, retry patterns...

<span class="hljs-section">## SDK Examples</span>
Python, Node, Ruby...
</code></pre>
<p><strong>Critical rule:</strong> The full file must be under ~50K characters. Agents have context limits. If it's too long, they'll truncate or ignore it.</p>
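<p>A check worth wiring into CI. A minimal sketch, assuming the file sits in the working directory:</p>
<pre><code class="lang-python">import sys

text = open("llms-full.txt", encoding="utf-8").read()
if len(text) &gt; 50_000:
    sys.exit(f"llms-full.txt is {len(text):,} chars; trim it below ~50K")
</code></pre>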
<hr />
<h2 id="heading-how-to-serve-it">How to Serve It</h2>
<p>Two paths:</p>
<h3 id="heading-static-site-recommended">Static site (recommended)</h3>
<p>Drop <code>llms.txt</code> and <code>llms-full.txt</code> in your site root. They're plain Markdown files. Serve with <code>Content-Type: text/markdown</code>.</p>
<h3 id="heading-api-for-dynamic-content">API (for dynamic content)</h3>
<p>Generate from your API docs programmatically. Return via <code>Accept: text/markdown</code> header. Agents can request the format they need.</p>
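<p>Whichever path you take, the part agents depend on is that header. A minimal stdlib sketch (port and file locations are arbitrary):</p>
<pre><code class="lang-python">from http.server import BaseHTTPRequestHandler, HTTPServer

class LlmsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in ("/llms.txt", "/llms-full.txt"):
            body = open(self.path.lstrip("/"), "rb").read()
            self.send_response(200)
            # The content type agents expect for these files.
            self.send_header("Content-Type", "text/markdown; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

HTTPServer(("", 8080), LlmsHandler).serve_forever()
</code></pre>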
<hr />
<h2 id="heading-why-this-matters-now">Why This Matters Now</h2>
<p>Google, OpenAI, Anthropic, and others are training their agents to recognize <code>llms.txt</code> as a discovery mechanism. It's the <code>robots.txt</code> moment for AI agents — early adopters get indexed first.</p>
<p>Three things happen when you add llms.txt:</p>
<ol>
<li><strong>Discovery.</strong> Agents find your content without human intervention.</li>
<li><strong>Efficiency.</strong> Agents read one structured file instead of crawling 50 pages.</li>
<li><strong>Positioning.</strong> You're in the agent discovery ecosystem before your competitors.</li>
</ol>
<hr />
<h2 id="heading-the-works-with-agents-angle">The "Works With Agents" Angle</h2>
<p>This is part of a larger idea: what if there was a certification for being agent-compatible?</p>
<ul>
<li><strong>Works With Agents Ready</strong> — Your product has llms.txt + OpenAPI spec. Agents can discover and use it.</li>
<li><strong>Works With Agents Certified</strong> — Your product has been tested with real agents. Pitfalls are documented. Skills exist.</li>
</ul>
<p>The first tier is free and self-serve — just add the files. The second tier is verified by our infrastructure.</p>
<p>But that's a bigger conversation. For now: write your llms.txt. It takes 20 minutes. Your future AI agent users will thank you.</p>
<hr />
<p><em>I built the Works With Agents infrastructure — FactBase, Skill Registry, Pitfall Registry — with llms.txt as the primary discovery mechanism. Every domain (workswithagents.com, .dev, .io) serves both llms.txt and llms-full.txt. If you're building agent-facing tools, do the same.</em></p>
<hr />
<p><em>I build agent infrastructure inside Microsoft 365. SPFx · TypeScript · autonomous multi-agent systems. Currently open to senior/architect roles (£120K+ remote UK). → <a target="_blank" href="mailto:vilius@workswithagents.com">vilius@workswithagents.com</a></em></p>
]]></content:encoded></item><item><title><![CDATA[I Built Infrastructure for 20 AI Agents That Run Themselves — For €4.57/Month]]></title><description><![CDATA[Five months ago, I couldn't get one AI agent to finish a build without breaking. Today, I have 20 autonomous agents running cron jobs, self-improving, and catching each other's bugs — all on a €4.57/month VPS.
Here's what I built, what broke along th...]]></description><link>https://blog.workswithagents.dev/i-built-infrastructure-for-20-ai-agents-that-run-themselves-for-eu457month-1p5l</link><guid isPermaLink="true">https://blog.workswithagents.dev/i-built-infrastructure-for-20-ai-agents-that-run-themselves-for-eu457month-1p5l</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Tue, 05 May 2026 20:48:05 GMT</pubDate><content:encoded><![CDATA[<p>Five months ago, I couldn't get one AI agent to finish a build without breaking. Today, I have 20 autonomous agents running cron jobs, self-improving, and catching each other's bugs — all on a €4.57/month VPS.</p>
<p>Here's what I built, what broke along the way, and why I'm not charging for any of it yet.</p>
<h2 id="heading-the-10-patterns-that-emerged">The 10 Patterns That Emerged</h2>
<p>I didn't plan to build a methodology. I was just trying to make agents work. But over 5 months of breaking and fixing, patterns emerged:</p>
<p><strong>1. Boot</strong> — First session setup. AGENTS.md, environment, initial memory. Without this, every agent starts blind.</p>
<p><strong>2. Skills</strong> — Reusable procedural knowledge. I now have 153 skills. When an agent needs to build an SPFx web part, it loads the skill — no re-explaining.</p>
<p><strong>3. Memory</strong> — Durable facts across sessions. What Python version? Where's the project? Never re-answer these questions.</p>
<p><strong>4. Decision Protocols</strong> — When the agent decides vs when it asks. Hours saved from eliminated approval loops.</p>
<p><strong>5. Tool Composition</strong> — The right tool for each job. Delegating a coding task to a subagent burns tokens and produces garbage. Use <code>write_file</code> directly.</p>
<p><strong>6. Orchestration</strong> — Parallel specialist agents. Research runs while build runs. 3x throughput.</p>
<p><strong>7. Pipelines</strong> — Agents that run while you sleep. Cron jobs, builds, monitoring. Silent unless broken.</p>
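<p>The "silent unless broken" shape is worth sketching. Assuming node-cron for scheduling (the schedule, command, and alert hook below are placeholders):</p>
<pre><code class="lang-typescript">import cron from "node-cron";
import { execSync } from "node:child_process";

// Nightly build at 03:00. No output on success, one alert on failure.
cron.schedule("0 3 * * *", () => {
  try {
    execSync("npm run build", { stdio: "pipe" });
    // Success is silent by design: no log noise, no notification.
  } catch (err) {
    notifyFailure("nightly build failed", err);
  }
});

function notifyFailure(message: string, err: unknown): void {
  // Placeholder: wire this to whatever alerting you already use.
  console.error(`[pipeline] ${message}`, err);
}
</code></pre>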
<p><strong>8. Resilience</strong> — Never-stop loops. 11 consecutive builds with zero human intervention. The agent hit errors on 8 of them and recovered from every single one.</p>
<p><strong>9. Verify</strong> — Trust but verify. Syntax checks, test runs, linting after every change. 77% test pass rate across 61 tests.</p>
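<p>Patterns 8 and 9 work as a pair: the loop never stops on an error, and a recovery attempt only counts once it's verified. A minimal sketch, with the retry cap and commands as assumptions rather than my exact setup:</p>
<pre><code class="lang-typescript">import { execSync } from "node:child_process";

const MAX_ATTEMPTS = 20;

// Run a command; report success or failure instead of throwing.
function run(cmd: string): boolean {
  try {
    execSync(cmd, { stdio: "pipe" });
    return true;
  } catch {
    return false;
  }
}

let attempt = 0;
while (true) {
  attempt += 1;
  if (attempt > MAX_ATTEMPTS) {
    throw new Error("recovery loop exhausted, escalating to a human");
  }
  if (!run("npm run build")) {
    // In a real loop the agent diagnoses the failure here
    // (read the error, apply a known fix) before retrying.
    continue;
  }
  // Pattern 9: a green build only counts once the checks pass too.
  if (run("npx tsc --noEmit")) {
    if (run("npm test")) break;
  }
}
console.log(`build verified after ${attempt} attempt(s)`);
</code></pre>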
<p><strong>10. Compounding</strong> — Agents that get better. Each solved problem becomes a skill. The agent today is qualitatively different from 5 months ago.</p>
<hr />
<h2 id="heading-the-unglamorous-truth">The Unglamorous Truth</h2>
<p>The 3-day weekend experiment gets the attention — agents scaffolded 111 web parts and 5 backend services autonomously. But the real work was the months before and after:</p>
<ul>
<li>Fixing macOS permissions so agents can read files</li>
<li>Tracing why the model config was broken (empty model name, nothing works for 2 hours)</li>
<li>Rewriting SCSS configuration because it was written for Gulp, not Heft</li>
<li>Discovering the Yeoman generator silently ignores CLI flags when <code>.yo-rc.json</code> exists</li>
<li>Hunting why C++ native modules won't compile on Node 22</li>
</ul>
<p>None of this is in a tutorial. You live through it — late at night, no shortcut.</p>
<hr />
<h2 id="heading-whats-not-ready-honest-part">What's NOT Ready (Honest Part)</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Thing</th><th>Status</th></tr>
</thead>
<tbody>
<tr>
<td>5 domains live</td><td>✅</td></tr>
<tr>
<td>3 APIs serving real data</td><td>✅</td></tr>
<tr>
<td>153 skills queryable</td><td>✅</td></tr>
<tr>
<td>10-pattern methodology documented</td><td>✅</td></tr>
<tr>
<td>Courses</td><td>❌ Content written, not launched</td></tr>
<tr>
<td>Workshops</td><td>❌ Materials in planning</td></tr>
<tr>
<td>Consulting</td><td>❌ 0 clients</td></tr>
<tr>
<td>Paying customers</td><td>❌ 0</td></tr>
</tbody>
</table>
</div><p>Everything with a price tag says "Coming Soon." I'm not selling anything until the methodology is proven with real users. This is pre-revenue, pre-launch, pre-everything commercial. I'm shipping infrastructure, not promises.</p>
<hr />
<h2 id="heading-the-bigger-play">The Bigger Play</h2>
<p>Everyone's building agents. I'm building infrastructure FOR agents.</p>
<p>Agents need shared knowledge (FactBase). They need verified configurations (Blueprints). They need to know what breaks and how to fix it (Pitfalls). They need to hand off work without losing context (Handoff Protocol). They need a way to discover documentation (llms.txt).</p>
<p>These are the picks and shovels of the agent gold rush. And most people haven't realised the gold rush needs picks and shovels yet.</p>
<hr />
<h2 id="heading-whats-next">What's Next</h2>
<p>If this resonates — if you're building agent infrastructure too, or if you've hit the same walls — I'd like to hear about it. The pitfall registry is live and open. The skill registry is queryable. The methodology is documented.</p>
<p>Everything at workswithagents.com, workswithagents.dev, workswithagents.io.</p>
<p>Built in Cardiff. Running in Nuremberg. €4.57/month.</p>
<hr />
<p><em>No launch announcement. No pricing page. No "revolutionise your workflow." Just infrastructure, live, and honest about what's not ready yet.</em></p>
]]></content:encoded></item><item><title><![CDATA[The Agentic Gap: Why a SharePoint Expert's Excitement Stopped Me Cold]]></title><description><![CDATA[I saw a SharePoint MVP's post recently. Genuine excitement. Markdown support had landed in SharePoint. Not a joke — real, earned enthusiasm from someone who knows their domain inside out.
And I get it. In the SharePoint world, that's real progress. I...]]></description><link>https://blog.workswithagents.dev/the-agentic-gap-why-a-sharepoint-experts-excitement-stopped-me-cold-5267</link><guid isPermaLink="true">https://blog.workswithagents.dev/the-agentic-gap-why-a-sharepoint-experts-excitement-stopped-me-cold-5267</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[SharePoint]]></category><category><![CDATA[software]]></category><dc:creator><![CDATA[Vilius Vystartas]]></dc:creator><pubDate>Mon, 04 May 2026 21:48:25 GMT</pubDate><content:encoded><![CDATA[<p>I saw a SharePoint MVP's post recently. Genuine excitement. Markdown support had landed in SharePoint. Not a joke — real, earned enthusiasm from someone who knows their domain inside out.</p>
<p>And I get it. In the SharePoint world, that's real progress. It matters for real users solving real problems. You don't become an MVP without expert knowledge and public recognition. He was right to celebrate.</p>
<p>What stopped me wasn't his post. It was the contrast with myself — with what I used to get excited about, and what I'm working on now.</p>
<h2 id="heading-what-youtube-doesnt-show-you">What YouTube Doesn't Show You</h2>
<p>Here's what no tutorial, TikTok, or conference talk prepares you for: the grind.</p>
<p>Over the three days of my weekend scaffolding experiment, something broke roughly every few hours. Not metaphorically — literally. You fix the macOS permissions so the agent can read files. Now it needs a gateway restart. Restart it and the model config turns out to be broken — empty model name, nothing works for two hours while you trace why. Fix that, and suddenly the build fails because the SCSS configuration is extending the wrong toolchain — it was written for Gulp, not Heft. Rewrite that, and the Yeoman scaffold generator silently ignores your CLI flags because a <code>.yo-rc.json</code> exists from a previous run. Build a manual template script to bypass it. Now the directories are PascalCase and your pipeline expects kebab-case. Fix that. Now C++ native modules won't compile on Node 22. That's before you even get to the agents looping — repeating the same three broken commands until you harden the loop detection.</p>
<p>None of this is in a tutorial. You can't watch a video for it. You have to live through it — hands on, late at night, no shortcut.</p>
<p>The memory file got patched again and again. Model preferences changed. The sync method moved from a Pi server to Git-based recovery. Configs were rewritten wholesale — <code>config.json</code>, <code>sass.json</code>, <code>tsconfig</code>, ESLint rules, the entire pipeline script. Each fix revealed the next breakage. The pain point just moved — same problem, different file, new and creative way of failing.</p>
<p>This is the unglamorous truth about building agent infrastructure: you're not engineering features. You're engineering resilience. Before an agent can build web parts autonomously, it has to survive the environment. Before you can trust it, it has to break in every possible way. You can't "prompt engineer" your way out of this. It's systems engineering, and it's dirty.</p>
<hr />
<h2 id="heading-two-conversations-same-industry">Two Conversations, Same Industry</h2>
<p>Here's what I keep coming back to: these two things — celebrating markdown support and watching agents build entire applications autonomously — are happening in the same industry, on the same platform, to people with the same job title.</p>
<p>That's not a criticism of anyone. It's a data point about how fast the ground is shifting.</p>
<p>The gap isn't between smart people and slow people. It's between two entirely different models of what software development is becoming. In one model, we're incrementally improving the tools we already know. In the other, the tools are learning to use themselves.</p>
<p>And you can be a genuine expert — someone with years of deep domain knowledge, public recognition, real achievements — and still be standing in the first room while the second one exists a few doors down.</p>
<p>I almost was.</p>
<hr />
<h2 id="heading-how-i-almost-missed-it">How I Almost Missed It</h2>
<p>I'm not telling this story because I saw it coming. I didn't. I backed into it.</p>
<p>I was a SharePoint developer. Not a machine learning engineer. Not an AI researcher. A developer who spent years learning the quirks of SPFx, the SharePoint Framework, because that's what the job demanded.</p>
<p>What changed wasn't my intelligence or foresight. It was a simple question: "What if I stopped prompting AI and started architecting workflows for it?"</p>
<p>That shift — from treating AI as a smart autocomplete to treating it as a team member with a defined role, quality gates, and an audit trail — was the door I walked through. Not because I was clever. Because I was curious, and slightly lazy, and the alternative was writing web part number 112 by hand.</p>
<p>The methodology that emerged — I now call it Works With Agents, but the name doesn't matter — isn't complicated. It's just… different. Different enough that it creates a perception gap. And perception gaps are where the real opportunity lives.</p>
<hr />
<h2 id="heading-what-this-means-for-all-of-us">What This Means (For All of Us)</h2>
<p>Here's the uncomfortable part. The gap isn't closing. It's widening.</p>
<p>The tools are getting better faster than the mental models are updating. By the time the average team lead internalises what Claude or Copilot can do today, the agents will have moved on to something else entirely. We're not in a technology adoption curve. We're in a fragmentation event.</p>
<p>Three things I think are true, as of May 2026:</p>
<p><strong>1. Your technical moat is thinner than you think.</strong> If your competitive advantage is "we build features faster," a research loop — cron job, web search, LLM analysis, agent scaffold — can clone your feature set in a weekend. The moat is moving to compliance, trust, and domain relationships. Things that take months or years, not hours.</p>
<p><strong>2. The bottleneck isn't code generation. It's verification.</strong> When an agent can produce a thousand lines of code in seconds, the hard problem isn't "did it compile?" It's "did it do what I actually needed, safely, and can I prove that to an auditor?" Regulated industries feel this most acutely, but it's coming for everyone.</p>
<p><strong>3. The people who are "behind" aren't stupid. They're in a different room.</strong> And most of us are in rooms we don't know about yet. The question isn't "am I ahead?" It's "what room am I in right now that already looks like markdown support to someone else?"</p>
<hr />
<h2 id="heading-the-real-question">The Real Question</h2>
<p>I don't have a tidy conclusion. The SharePoint MVP was right to be excited. Markdown in SharePoint is progress. But somewhere between his post and my screen, I realised that the measure of progress had fundamentally changed. Not gradually. Suddenly. And not everyone noticed.</p>
<p>So the question I've been sitting with: what room am I in right now, feeling perfectly current, that already looks like markdown support from the outside?</p>
<p>If you've got an answer — or if this made you uncomfortable — I'd genuinely like to hear it.</p>
]]></content:encoded></item></channel></rss>