DazzaGreenwood's Weblog

Existing on the New Web

Dazza Greenwood — Tue, 25 Nov 2025 05:18:28 GMT

Stephen Burns runs a motorcycle repair shop out of his garage in Redwood City. He’s meticulous about local SEO and has been for years. But recently, customers started showing up who hadn’t found him through Google. They’d asked ChatGPT where to get their motorcycle fixed, and it sent them to his garage.

That story captures something important happening across the web right now. Discovery is being restructured. The customer journey increasingly runs through AI systems, and those systems have their own requirements for who they can see and recommend.

Burns got lucky: his content made it into the training data, and the model knew he existed. But many businesses aren’t so fortunate. And the unlucky ones often don’t even know they’re invisible.

The Shift (Almost) Nobody Prepared For

For twenty years, the web security playbook has been straightforward: humans good, bots bad. Build walls. Deploy CAPTCHAs. Rate-limit aggressively. Block anything that doesn’t look like a person clicking around.

That made sense when “bot” meant scrapers, spammers, and credential stuffers. But the category has fractured. Today, automated traffic includes:

Training crawlers harvesting content for AI model development (Common Crawl, GPTBot, ClaudeBot). These are extractive and periodic with no user behind them, just dataset assembly.

Retrieval bots fetching real-time information to augment AI responses (Perplexity, ChatGPT with browsing). These surface your content in AI-synthesized answers.

Transaction agents acting on direct behalf of users to accomplish specific goals: book a flight, compare insurance quotes, place an order, schedule an appointment.

That third category is the one that should keep business leaders up at night, not because it’s dangerous, but because it’s valuable, and we’re systematically blocking it.

When a user tells their AI assistant “find me a hotel in Lisbon under €200 with good reviews and book it,” that agent is a customer. It has intent, a task, and (via the user) a credit card. If your site can’t accommodate it - or worse, actively blocks it - you’ve lost a sale to a competitor whose infrastructure was ready.

Consider Children’s Hospital Los Angeles (CHLA), one of the top pediatric cancer centers in the United States. It’s effectively invisible to AI assistants. When parents ask Gemini or ChatGPT where to take a child with leukemia in LA, CHLA doesn’t appear, not because the hospital opted out, but because its CDN’s default settings block AI crawlers. Families may be unable to find potentially life-saving care because of a configuration choice the hospital may not even know was made.

That’s the current state: valuable, legitimate discovery and transaction pathways being severed by infrastructure designed for a different threat model.

Three Properties Your Web Presence Now Needs

I’ve been working on identity and authorization infrastructure for AI agents with colleagues across the industry, including co-authoring a recent whitepaper on the topic. We keep returning to the same framework. For your web presence to function in an agent-mediated world, it needs three properties:

Accessible: The agent can actually reach your content and services. Not blocked by CDN defaults, overzealous bot detection, or blanket crawler bans.

Legible: The agent can understand what it finds. Structured data, semantic markup, machine-readable formats. Not just pretty HTML that requires a human eye to interpret.

Actionable: The agent can do something. Complete a transaction, submit an inquiry, access a service. Not just read, but also act.

If any layer is missing, whether accessibility, legibility, or actionability, your web presence is invisible or inert to the fastest-growing discovery and transaction channel emerging today. Even if your site is live, you may still be invisible if it is not properly indexed for agent retrieval or is omitted from the training corpus.

Most organizations have focused their AI strategy on the first category of automated traffic, training crawlers: being “in the model.” That matters. But it’s table stakes. The real opportunity (and the real risk of missing out) is in the third category: enabling legitimate agents to transact on behalf of real users.

The Verification Problem (And Why It’s Being Solved)

The obvious objection: “How do I tell a legitimate agent from a malicious bot? They look the same at the firewall.”

Fair point. Today, they often do look the same. User-agent strings are trivially spoofable. Traffic patterns can be mimicked. This is a real problem.

But it’s being actively solved. The IETF is developing Web Bot Auth, a protocol that allows agents to cryptographically prove their identity within HTTP requests, essentially a passport for responsible agents. Major players like Cloudflare and Vercel are involved in the effort. AWS Bedrock AgentCore already supports Web Bot Auth to reduce CAPTCHAs when its agents browse protected sites. This isn’t speculative; it’s shipping.

On the authorization side, OAuth 2.1 extensions are being developed to support explicit delegated authority, a formal “on-behalf-of” flow where the agent’s access token contains two distinct identifiers: the user who granted permission and the agent performing the action. This is critically different from impersonation. It creates a clear, auditable link: you can see both who authorized the action and what performed it.
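To make the two-identifier idea concrete, here is a minimal Python sketch that reads both identities out of a delegated access token. It assumes the token is a JWT and that the agent is recorded in an `act` (actor) claim, the convention defined in OAuth 2.0 Token Exchange (RFC 8693); the agent-specific extensions still in development may settle on different claims. The claim values and the `describe_delegation` helper are illustrative, and the sketch skips signature verification, which a real resource server must perform first.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # Restore the base64url padding that JWTs strip.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def describe_delegation(claims: dict) -> str:
    """Report who authorized the action and which agent performed it."""
    user = claims.get("sub")
    actor = claims.get("act", {}).get("sub")
    if actor is None:
        return f"{user} acted directly (no delegation recorded)"
    return f"{actor} acted on behalf of {user}"


def _b64url(data: dict) -> str:
    """Helper for the demo only: base64url-encode a JSON object."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Build an unsigned demo token so the example is self-contained (hypothetical values).
demo_claims = {
    "sub": "user:alice@example.com",          # the user who granted permission
    "act": {"sub": "agent:travel-assistant"},  # the agent performing the action
    "scope": "bookings:create",
}
demo_token = ".".join([_b64url({"alg": "none"}), _b64url(demo_claims), ""])

print(describe_delegation(decode_jwt_payload(demo_token)))
```

The point of keeping `sub` and `act` separate is the audit trail: a log entry built from these claims names both the authorizing user and the acting agent, which impersonation cannot do.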

The infrastructure is coming. The question is whether you’ll be ready for it, or scrambling to catch up while your competitors capture the agent-mediated market.

What “Agent Optimization” Actually Means

We’ve spent two decades optimizing for search engines. Keywords, backlinks, page speed, mobile responsiveness, the whole SEO apparatus. Now a new optimization target is emerging: AI agents.

Agent Optimization means:

Structured data that agents can parse: Schema.org markup, JSON-LD, clear semantic HTML. If an agent can’t extract your pricing, availability, or booking endpoint programmatically, you don’t exist to it.
APIs and action endpoints: Not just content to read, but services to invoke. Can an agent place an order? Submit an inquiry? Check inventory? If the only path is clicking through a JavaScript-heavy checkout flow, you’re invisible to agent-mediated commerce.
Authentication infrastructure that distinguishes agent types: Allow legitimate agents through while maintaining security. This requires moving beyond binary “human or bot” detection to nuanced policies based on verified identity and delegated scope.
Consent and governance frameworks: When an agent accesses your systems on behalf of a user, what are the terms? What data can it retrieve? What actions can it perform? Clear policies, machine-readable where possible.
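As a rough illustration of moving past binary “human or bot” detection, the following Python sketch makes an access decision from a request’s category, whether its identity was verified, and the scopes the user delegated. Everything here is hypothetical: the `AgentRequest` fields, the category names, and the three outcomes are assumptions chosen for illustration, not part of any standard or product.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AgentRequest:
    category: str   # "training", "retrieval", or "transaction"
    verified: bool  # identity proven, not merely claimed in a user-agent string
    scopes: Set[str] = field(default_factory=set)  # permissions the user delegated


def decide(request: AgentRequest, required_scope: Optional[str] = None,
           allow_training: bool = True) -> str:
    """Return "allow", "challenge", or "deny" for an automated request."""
    if not request.verified:
        return "challenge"  # unverified claims get a challenge, not blind trust
    if request.category == "training":
        return "allow" if allow_training else "deny"
    if request.category == "retrieval":
        return "allow"
    if request.category == "transaction":
        # Acting for a user requires the specific scope that user delegated.
        return "allow" if required_scope in request.scopes else "deny"
    return "deny"  # unknown categories are denied by default


booking_agent = AgentRequest("transaction", verified=True, scopes={"bookings:create"})
print(decide(booking_agent, required_scope="bookings:create"))
```

The design choice worth noting is that training access is a site-level switch while transaction access depends on what a specific user delegated, so the two can be governed independently.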

The organizations that build this infrastructure now will have a significant advantage as agent-mediated interaction becomes mainstream. Those that don’t will find themselves optimized out of an increasingly important channel.

The Stakes Are Higher Than You Think

Scenario 1: E-commerce. A user asks their AI assistant to “order more of that coffee I liked from last month.” The agent needs to access the user’s order history (with permission), find the product, check availability, and complete a purchase. If your site can’t support this flow, the agent will find a competitor who sells similar coffee and can support it. You didn’t lose a customer to a better product. You lost them to better infrastructure.

Scenario 2: Professional services. A business user tells their agent to “schedule a consultation with a commercial real estate attorney in Denver for next week.” The agent needs to find appropriate providers, check availability, and book an appointment. If your law firm’s website is a brochure with a “Contact Us” form and no structured data, the agent can’t engage. You don’t get the lead.

Scenario 3: B2B procurement. A procurement agent is tasked with “find three suppliers for industrial adhesives that meet our specs and request quotes.” The agent needs to query product databases, compare specifications, and initiate RFQ processes. If your supplier portal requires human navigation through nested menus, you’re not in the consideration set.

In each case, the failure isn’t about the quality of your product or service. It’s about the accessibility, legibility, and actionability of your web presence to AI agents acting as legitimate proxies for potential customers.

What You Should Do Now

1. Audit your current accessibility. Are AI crawlers being blocked by your CDN? Check your Cloudflare settings, your robots.txt, your rate-limiting rules. Tools like CanAISeeIt can analyze which known AI bots can access your site and how you’re showing up in AI-generated citations.

2. Assess your legibility. Can a machine parse your key information? Do you have structured data for products, services, pricing, availability, locations? Run your pages through schema validators. If an agent can’t extract the basics, you have work to do.

3. Evaluate your actionability. What can an agent actually do on your site? If the answer is “read content,” you’re only halfway there. Consider APIs, booking integrations, programmatic inquiry endpoints. What would it take for an agent to complete a transaction?

4. Develop agent access policies. Not all automated access is equal. Define what types of agents you want to support, under what conditions, with what verification. This is a policy decision, not just a technical one.

5. Watch the standards landscape. Web Bot Auth, OAuth for AI agents, MCP (Model Context Protocol), A2A (the Agent2Agent protocol), and the related Agent Payments Protocol (AP2) are all developing rapidly. You don’t need to implement everything today, but you should understand what’s coming. To get started, check out this webinar I hosted last week discussing the emerging AI Agents standards race, with senior representatives from Visa, Stripe, Skyfire, and Consumer Reports.

6. Reframe the conversation internally. If your security team’s mandate is “block bots,” you have a framing problem. The mandate should be “enable legitimate access while blocking malicious actors.” Those are different objectives with different implementations.

7. Think in two layers: live retrieval and foundational memory. Your site must both be live-index-ready and training-corpus-visible.

To be open for business to AI agents, your site needs to be discoverable and indexable now by whatever live web feeds support retrieval-augmented generation (RAG) and AI-agent search. That means ensuring your content is live, indexed, updated, structured, and accessible.

But there’s a second, equally strategic layer: ensuring your content is included in the training data of large language models. Being in the training corpus doesn’t guarantee retrieval, but being absent from it dramatically lowers your odds of ever being surfaced.

Treat properly identified AI crawlers (like Common Crawl’s CCBot) as strategic stakeholders, not threats. Allow appropriate access. Mark your content as machine-readable. Opt in rather than blocking by default.

The formula: live indexing + training corpus inclusion = dual-path visibility in the era of agent-mediated discovery.

Practical Standards: What’s Working Now

The strategic framework matters, but so does implementation. Here’s what’s emerging as practical infrastructure for agent-readiness.

For Accessibility

robots.txt is getting AI-specific extensions. The Robots Exclusion Protocol (now RFC 9309) remains the baseline, but an IETF draft proposes syntax to distinguish AI training from inference, letting you permit RAG-style answers while blocking training ingestion, or vice versa. AI crawlers like GPTBot, ClaudeBot, and Google-Extended already check robots.txt.
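A quick way to see what your current robots.txt tells these crawlers is to evaluate it with Python’s standard-library parser. The sketch below is a minimal example under stated assumptions: the sample rules and the example.com URL are made up, and the check covers only robots.txt, so it cannot detect blocking applied at the CDN or firewall layer.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this post; extend the list for your own audit.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]


def crawler_access(robots_txt: str, url: str) -> dict:
    """Map each AI crawler token to whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# Made-up sample: block GPTBot everywhere, allow everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

report = crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/pricing")
for bot, allowed in report.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To audit a live site, paste in the contents of your own robots.txt in place of the sample and test the URLs that matter most, such as pricing, booking, and contact pages.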

Cloudflare now blocks AI crawlers by default for new customers. If you’re on Cloudflare, check your settings. Their AI Crawl Control features let you make nuanced decisions. Be intentional about your access policy rather than accepting defaults that may be making you invisible.

For Legibility

llms.txt is the clearest practical step you can take today. It’s a simple Markdown file at /llms.txt that provides a curated map of your most important content for AI systems: key docs, FAQs, policies, pricing, with links to clean Markdown versions where possible.

Here’s what a basic llms.txt file looks like:

# YourCompany.com

> Brief description of what your company does and what this site offers.

## Key Pages
- [Product Overview](/docs/product-overview.md): What we offer and how it works
- [Pricing](/pricing.md): Current plans and pricing
- [API Documentation](/docs/api.md): Full API reference for developers

## Support & Policies
- [FAQ](/faq.md): Common questions answered
- [Terms of Service](/legal/terms.md)
- [Contact](/contact.md): How to reach us
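If you would rather generate this file than maintain it by hand, a small script is enough. The following Python sketch renders the layout shown above from a plain data structure; the `render_llms_txt` function and its input shape are my own illustration, not part of the llms.txt proposal.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an llms.txt document.

    `sections` maps a heading to a list of (name, url, note) tuples;
    pass an empty note to emit a bare link.
    """
    lines = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}"]
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
    return "\n".join(lines) + "\n"


llms_txt = render_llms_txt(
    "YourCompany.com",
    "Brief description of what your company does and what this site offers.",
    {
        "Key Pages": [
            ("Pricing", "/pricing.md", "Current plans and pricing"),
        ],
        "Support & Policies": [
            ("Terms of Service", "/legal/terms.md", ""),
        ],
    },
)
print(llms_txt)
```

Generating the file from the same source that drives your navigation or sitemap helps keep it current as pages change.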

Adoption is growing. Directories like llmstxt.site and directory.llmstxt.cloud track hundreds of implementations. GitBook has published tutorials. CMS platforms are building auto-generation features.

I’ve implemented llms.txt on several of my own sites, and I plan to expand this significantly, adding Markdown versions of key content and keeping the files current. It’s one of the most concrete things you can do right now.

Structured data (JSON-LD / Schema.org) remains non-negotiable. Whether for products, organizations, FAQs, events, or locations, schema markup gives agents a machine-readable knowledge graph of your key entities.
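For a sense of how little is needed to start, here is a minimal Python sketch that emits a Schema.org `LocalBusiness` description as a JSON-LD script tag. The business details are placeholders, and the helper names are hypothetical; a real page would use the most specific Schema.org type that fits and add properties such as opening hours and offers.

```python
import json


def local_business_jsonld(name: str, url: str, locality: str,
                          region: str, telephone: str) -> dict:
    """Build a minimal Schema.org LocalBusiness description."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "addressLocality": locality,
            "addressRegion": region,
        },
    }


def as_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the script tag that goes into the page's HTML."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values for illustration only.
markup = as_script_tag(local_business_jsonld(
    name="Example Motorcycle Repair",
    url="https://example.com",
    locality="Redwood City",
    region="CA",
    telephone="+1-555-0100",
))
print(markup)
```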

For Actionability

Expose your services as tools, not just pages. If you have APIs, document them with OpenAPI/Swagger specs. Agents can ingest these and treat your API as a callable tool, placing orders, checking inventory, submitting inquiries, rather than screen-scraping checkout flows.

Consider MCP (Model Context Protocol). If you want agents to act on your services, exposing an MCP-compatible endpoint is increasingly the path. Your booking system, inventory lookup, or quote generator can become a tool that agents call directly, with proper authentication and scoping.

The /ask endpoint pattern is emerging. A Microsoft-Cloudflare collaboration is pushing a model where sites expose conversational interfaces: /ask for human Q&A, /mcp for agent tool calls, both backed by the same retrieval infrastructure. Forward-looking, but being built now.

For Diagnostics

Check where you stand. CanAISeeIt scores sites on AI visibility, crawler accessibility, and protocol compliance. Your server logs show which AI user-agents are visiting. If you’re not seeing CCBot, GPTBot, or ClaudeBot, find out why.
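A first pass over your server logs can be as simple as counting lines that mention each crawler. The Python sketch below does exactly that; the log lines are fabricated samples in a common access-log style, and because user-agent strings are trivially spoofable, treat the counts as a signal rather than proof of who visited.

```python
AI_USER_AGENTS = ("CCBot", "GPTBot", "ClaudeBot")


def count_ai_crawler_hits(log_lines) -> dict:
    """Count access-log lines mentioning each AI crawler (case-insensitive)."""
    hits = {bot: 0 for bot in AI_USER_AGENTS}
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_USER_AGENTS:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits


# Fabricated sample lines; real logs would be read from your server's log file.
SAMPLE_LOG = [
    '203.0.113.7 - - [25/Nov/2025:05:18:28 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [25/Nov/2025:05:19:02 +0000] "GET / HTTP/1.1" 200 8841 "-" "CCBot/2.0"',
    '192.0.2.15 - - [25/Nov/2025:05:20:11 +0000] "GET /faq HTTP/1.1" 200 3012 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = count_ai_crawler_hits(SAMPLE_LOG)
missing = [bot for bot, count in hits.items() if count == 0]
print(hits)
print("Never seen:", missing)
```

A crawler that never appears in the list is the one to investigate first, starting with CDN and robots.txt settings.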

The Web Is Being Rebuilt. Quietly.

What I’m describing isn’t a distant future. It’s happening now, mostly invisibly. Every major AI lab is building agent capabilities. Every major identity vendor is developing agent-specific IAM. Standards bodies are actively drafting protocols for agent authentication, authorization, and payment.

The shift from search engine optimization to AI optimization is directionally right as a framing, but it undersells the magnitude. SEO was about being found. Agent optimization is about being found and being usable by non-human actors who represent real human intent.

The web was built for human browsers, then retrofitted for search engine crawlers. Now it’s being rebuilt again, this time for AI agents that act as legitimate proxies for human users.

The organizations that recognize this shift and prepare for it will capture a new channel of demand. Those that don’t will watch that demand flow to competitors who were paying attention.

Your next customer might arrive via an AI agent. The question is whether you’ll recognize them as a customer, or lock them out as a bot.

AI Agent ID

Dazza Greenwood — Wed, 05 Nov 2025 02:15:47 GMT

Why Identity Management for AI Agents Can’t Wait: Introducing Our New OpenID Foundation Whitepaper

If you’re investing in, building, or deploying AI agents, there’s a foundational problem you need to understand: identity, authentication, and authorization for autonomous agents is fundamentally different from traditional software, and many current implementations are getting it wrong.

Today, I’m excited to share a comprehensive whitepaper I co-authored for the OpenID Foundation: “Identity Management for Agentic AI: The New Frontier of Authorization, Authentication, and Security for an AI Agent World.”

Why This Matters Now

As AI agents rapidly move from proof-of-concept to pilot and now to production, they’re creating urgent security and accountability challenges:

User impersonation is masking accountability. Most agents today act indistinguishably from their users, creating dangerous gaps in audit trails and accountability when things go wrong.
Consent fatigue is inevitable. As agents proliferate, users will face thousands of authorization requests, leading to reflexive approval and security risks.
Recursive delegation is uncharted territory. When agents spawn sub-agents or delegate tasks across organizational boundaries, we lack clear mechanisms for scope attenuation and attributable transitive trust.
Cross-domain operations break current models. OAuth 2.1 works well within anchored trust domains, but agents operating more fluidly across organizational boundaries need something more robust.
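Scope attenuation in recursive delegation can be stated as a simple rule: a sub-agent may receive only a subset of the scopes its parent holds, and every hop is recorded. The Python sketch below illustrates that rule under my own assumptions; the `Delegation` type and scope names are hypothetical and are not taken from the whitepaper or any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    holder: str        # who currently holds the authority
    scopes: frozenset  # what the holder may do
    chain: tuple       # every principal from the user down to this holder


def delegate(parent: Delegation, child: str, requested) -> Delegation:
    """Grant `child` a subset of `parent`'s scopes and extend the chain."""
    requested = frozenset(requested)
    excess = requested - parent.scopes
    if excess:
        # Authority can shrink at each hop but never grow.
        raise PermissionError(f"cannot delegate scopes not held: {sorted(excess)}")
    return Delegation(child, requested, parent.chain + (child,))


user = Delegation(
    "user:alice",
    frozenset({"calendar:read", "calendar:write", "email:send"}),
    ("user:alice",),
)
planner = delegate(user, "agent:planner", {"calendar:read", "calendar:write"})
booker = delegate(planner, "agent:booker", {"calendar:read"})
print(" -> ".join(booker.chain), sorted(booker.scopes))
```

Because the chain travels with the grant, an action by the last sub-agent can still be attributed back through each intermediate agent to the user who started it.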

What’s Already Working (and What Isn’t)

The good news: we’re not starting from scratch. Current OAuth 2.1 frameworks, when properly implemented with protocols like MCP (Model Context Protocol), provide a starting point for enterprise agents accessing internal tools within a single trust domain.

The challenge: this only solves the simplest use cases. The moment agents need greater autonomy, asynchronous execution, or cross-domain delegation, existing patterns reveal significant gaps. In the whitepaper we identify several issues, options, and future opportunities that I hope will offer a sound approach for everyone seeking to span that gap!

A Huge Thanks to the Team

I want to especially thank Tobin South for his incredible, energetic leadership as the primary author and editor who wrangled this entire effort together. His vision and persistence made this comprehensive work possible. I’m also thrilled that the Stanford & Consumer Reports Loyal Agents Initiative (where both Tobin and I are active) was able to collaborate on this project. This cross-institutional collaboration reflects the urgency and importance of getting agent identity right, especially for ensuring AI agents are safe and effective for consumers to use and rely upon, particularly when conducting e-commerce transactions and making binding commitments on behalf of users.

What’s in the Paper

The whitepaper provides both immediate, practical guidance and a strategic roadmap:

Section 2 outlines current best practices using existing standards (OAuth 2.1, SCIM, SSO, CIBA) for today’s agent implementations
Section 3 tackles future challenges: delegated authority models, recursive delegation, scope attenuation, scalable consent mechanisms, and the economic layer (payments and financial transactions)
Real-world use cases demonstrating where traditional IAM fails and what’s needed for high-velocity, asynchronous, and cross-domain agent operations

What’s Coming Next

This whitepaper is just the beginning of a deeper exploration I’ll be sharing:

Agent Protocols: I’ve already started with my recent post on Agent Payments Protocol (AP2) last month, with more protocol deep-dives to follow.

Legal Dimensions: Building on my previous work on AI agents conducting transactions, UETA and LLM agents, and recent agent legal frameworks, I’ll be diving deeper into the legal infrastructure needed for increasingly autonomous agent transactions.

Evals for AI Agents: Following up on my initial exploration beyond AI benchmarks, I’ll be sharing frameworks for properly evaluating agent capabilities, safety, and reliability.

High-Value Use Cases: Identifying and unpacking the specific scenarios where proper identity capabilities unlock significant new value and reduce risk.

Agents Accelerating Research and Science: Exploring how properly governed agents can transform scientific discovery and research methodologies to spur innovation.

Looking Forward with Clear Eyes

I’m genuinely optimistic about the transformative potential of AI agents to augment human capabilities, empower consumers, and create new forms of value. The technical foundations exist, brilliant people across industry and academia are collaborating, and momentum is building toward interoperable standards.

But let’s be clear: many hard challenges remain. We need to move from impersonation to true delegation, build scalable governance mechanisms that respect user autonomy, create robust cross-domain trust fabrics, and ensure agents serve their users’ interests loyally. The work of building safe, trustworthy, and effective agent systems is just beginning.

For those investing in AI agents: ignoring these identity and authorization challenges doesn’t make them go away, it just means you’ll hit them unexpectedly in production. This whitepaper aims to be your starting point for understanding what’s required and building responsibly from the ground up.

Read the full paper: Identity Management for Agentic AI

Let’s build the future of autonomous agents together, securely, responsibly, accountably, and successfully!

Agent Payments Protocol (AP2)

Dazza Greenwood — Wed, 17 Sep 2025 14:59:20 GMT

Overview: AP2 as a Foundational Protocol for Trusted AI Commerce

Yesterday, Google announced the Agent Payments Protocol (AP2), a new, open standard designed to solve the fundamental question of trust in AI-driven payments in commerce. Today’s payment systems assume a human is clicking "buy." AP2 creates the framework for a world where autonomous AI agents can securely and verifiably transact on behalf of users and businesses.

It achieves this by introducing a system of Verifiable Credentials called "Mandates," which serve as cryptographically signed, auditable proof of authority and intent for every transaction. AP2 is not a new payment network; it is a data protocol that layers on top of the Agent2Agent (A2A) protocol, ensuring it can work with any payment method, from credit cards to real-time bank transfers. I previously wrote about A2A here in the "Agents Talking to Agents (A2A): Reshaping the Marketplace and Your Power" section.

Deep Dive: The Intent Mandate - The "Digital Power of Attorney"

The Intent Mandate is the most critical innovation for business and legal purposes. It is the core instrument of delegation for any transaction where the user is not present to give final approval (a "Human-Not-Present" scenario).

What it is: A legally and technically significant "delegation contract" that a user signs to grant an AI agent specific, constrained purchasing authority. It formally translates a user's goal (e.g., "Buy this item if the price drops below $100") into a set of enforceable rules.
Legal Significance: It serves as non-repudiable proof that the user authorized the agent's action, providing a powerful evidentiary anchor for assigning liability and resolving disputes. It answers the question: "Who told the agent to do that?"
Business Significance: It unlocks automated and conditional commerce. Businesses can empower agents to execute procurement strategies, manage subscriptions, or react to market opportunities autonomously, all while operating within pre-approved boundaries.

Deep Dive: The Other Mandates - The "Evidentiary Chain"

Two other mandates complete the transaction's auditable trail:

The Cart Mandate: This is the "notarized purchase order" for Human-Present transactions. The merchant generates it to lock in the final terms (items, price, shipping), and the user signs it on a trusted device surface. It provides definitive proof of what was agreed upon at the moment of purchase.
The Payment Mandate: This is the "transaction manifest" sent to the payment network (e.g., Visa, Mastercard). Its primary purpose is to signal that an AI agent was involved and whether a human was present. This allows issuers and networks to apply appropriate risk models and provides critical data for the financial ecosystem.

Examples and Use Cases for Consumers and Businesses

AP2 creates powerful new capabilities for both B2C and B2B commerce by providing a secure framework for delegation.

Consumer Use Cases: Convenience and Automation with Guardrails

Deal Hunting
A user wants to buy a specific gaming console but only if it drops below $400 before the holidays.
The user signs an Intent Mandate with the SKU, a price ceiling of $400, and an expiry date. The agent monitors prices and executes the purchase automatically when the condition is met.

Time-Sensitive Purchases
A user wants to buy tickets for a popular concert the moment they go on sale.
The user signs an Intent Mandate specifying the event, a seating preference (e.g., "front section"), and a maximum budget. The agent is pre-authorized to act instantly.

Complex Travel Planning
A user asks their agent: "Book me a round-trip flight and a 4-star hotel in London for the first week of December, total budget $1500."
The agent holds a signed Intent Mandate. It interacts with airline and hotel agents simultaneously. Once it finds a combination that fits the budget and criteria, it can execute both bookings.

Subscription Management
"Renew my streaming subscription, but only if the price doesn't increase by more than 10%."
An Intent Mandate governs the renewal. The agent verifies the price each cycle and either proceeds or pauses for user instruction if the price hike exceeds the limit.

On-the-Go Purchases
While driving, a user tells their voice assistant to order and pay for coffee from a nearby shop.
This is a Human-Present flow. The coffee shop's agent returns a Cart Mandate. The user provides a quick biometric approval on their phone or car's infotainment screen, signing the Cart Mandate to complete the payment.
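The Deal Hunting case above shows how a user's goal becomes a set of enforceable rules. The Python sketch below models only those rules (SKU, price ceiling, expiry); the `IntentMandateSketch` class and its fields are hypothetical and do not reflect the actual AP2 schema. A real mandate is a signed Verifiable Credential, and signature handling is omitted here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class IntentMandateSketch:
    sku: str                 # the exact item the user authorized
    price_ceiling: Decimal   # buy only if the price drops below this
    expires_at: datetime     # authority lapses at this moment

    def permits(self, sku: str, price: Decimal, now: datetime) -> bool:
        """True only if the offer satisfies every constraint the user signed."""
        return (
            sku == self.sku
            and price < self.price_ceiling
            and now < self.expires_at
        )


mandate = IntentMandateSketch(
    sku="CONSOLE-123",  # placeholder SKU
    price_ceiling=Decimal("400.00"),
    expires_at=datetime(2025, 12, 24, tzinfo=timezone.utc),
)
check_time = datetime(2025, 12, 1, tzinfo=timezone.utc)
print(mandate.permits("CONSOLE-123", Decimal("389.99"), check_time))
```

Using `Decimal` rather than floats avoids rounding surprises when a price sits right at the ceiling.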

Business Use Cases: Auditable Automation and Control

AP2 is transformative for B2B transactions, providing the auditable trail necessary for corporate governance and financial controls.

Automated Procurement
A procurement manager authorizes an agent to re-order lab supplies from approved vendors whenever inventory drops below a threshold, provided the price per unit has not increased by more than 5% since the last order.
The manager signs an Intent Mandate that is cryptographically linked to their corporate identity. The mandate specifies the SKUs, the approved vendor list, and the 5% price variance rule. Every purchase is auditable and tied back to this specific, standing authorization.

Contractor & Field Operations
A construction firm authorizes a site foreman's agent to purchase up to $5,000 in materials from Home Depot or Lowe's for a specific project.
The project manager issues a time-bound Intent Mandate linked to the foreman's identity and the project's budget code. The mandate limits the merchant category and total spend. The trail proves the expense was authorized for that project, streamlining reconciliation.

Dynamic Cloud Resource Scaling
An IT department authorizes an agent to scale cloud computing resources based on real-time demand, with a hard budget cap of $10,000/month.
The CIO signs an Intent Mandate allowing the agent to interact with the cloud provider's agent. The mandate contains the budget cap and service-level rules. This prevents runaway costs while enabling automation.

Travel & Expense Management
An employee uses their corporate travel agent to book a trip. The company's policy (e.g., "economy class only, hotel under $300/night") is encoded into the agent's instructions.
The employee's request generates an Intent Mandate that also reflects corporate policy constraints. The auditable trail shows the booking was compliant, simplifying expense reporting. The employee's identity is tied to the authorization.
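The Automated Procurement case adds two things the consumer examples lack: allow-lists and a relative price rule. As a hedged illustration, this Python sketch checks a proposed re-order against an approved vendor list, approved SKUs, and the 5% price variance rule; the function names, vendor names, and SKUs are made up, and the logic is a simplification rather than anything defined by AP2.

```python
from decimal import Decimal


def within_price_variance(last_price: Decimal, quoted_price: Decimal,
                          max_increase_pct: Decimal = Decimal("5")) -> bool:
    """True if the unit price has not risen by more than the allowed percentage."""
    increase_pct = (quoted_price - last_price) / last_price * 100
    return increase_pct <= max_increase_pct


def reorder_allowed(vendor: str, sku: str, last_price: Decimal,
                    quoted_price: Decimal, approved_vendors: set,
                    approved_skus: set) -> bool:
    """Apply every constraint in the standing authorization to one re-order."""
    return (
        vendor in approved_vendors
        and sku in approved_skus
        and within_price_variance(last_price, quoted_price)
    )


# Made-up values for illustration.
APPROVED_VENDORS = {"LabSupplyCo"}
APPROVED_SKUS = {"PIPETTE-TIPS-200"}

print(reorder_allowed("LabSupplyCo", "PIPETTE-TIPS-200",
                      Decimal("10.00"), Decimal("10.40"),
                      APPROVED_VENDORS, APPROVED_SKUS))
```

Each denied re-order is as informative as an approved one: logging which constraint failed gives the auditable trail the business cases depend on.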

Structuring the Corresponding Legal Framework: The Letter of Authorization

It stands to reason that the technical IntentMandate must be backed by a formal legal agreement, a Letter of Authorization (LoA) of some kind, between the User (or User Organization) and the AI Agent Provider, unless the user is operating the AI Agent infrastructure themselves. This agreement defines the legal rights and responsibilities of each party. Below are three potential models for structuring this relationship.

I am focused primarily on option 1 below as a conceptual approach to such authorization, and I am also actively developing other options given this early stage of implementation.

OPTION 1: The Principal-Agent Model (User as Authorizer, Provider as Enforcer)

This model establishes a classic principal-agent relationship where the user provides explicit instructions and the provider must execute them faithfully.

User Responsibilities: The user is the source of authority and is responsible for clearly articulating their intent. Their primary responsibilities include:
- Delegating Authority: The user initiates the entire process by appointing the provider to operate the agent on their behalf, often memorialized through a signed agreement (for example, one executed via DocuSign).
- Defining Authorization (The "What"): The user must specify exactly what the agent is allowed to do. This includes defining the scope (check_balance), the target resource (account GH-1234), and any constraints (data_minimization, purpose_binding).
- Defining Autonomy (The "How"): The user sets the rules for how the agent carries out its tasks, such as when it can act silently ("auto-ok") versus when it must get explicit, real-time confirmation ("always-ask").
- Assuming Consequences: The user is ultimately responsible for the consequences of the agent's properly authorized actions.
AI Agent Provider Responsibilities 🤖: The AI Agent Provider is responsible for the technical and operational infrastructure that brings the user's instructions to life safely and reliably. Their key responsibilities are:
- Operating Secure Infrastructure: The provider must maintain the underlying service, network, and security controls to run the agent reliably.
- Enforcing User Grants: The provider's core duty is to honor and strictly enforce the authorization and autonomy rules defined by the user. The agent must not exceed its granted authority.
- Managing Authentication & Credentials: The provider is responsible for presenting the correct credentials (e.g., short-lived, purpose-bound tokens) to third parties like the bank.
- Enforcing Revocation: When a user revokes permission, the provider must ensure that access is terminated promptly, meeting the stated Service-Level Objective (SLO) of ≤60 seconds.
- Providing Evidence: The provider must generate and deliver auditable proof of the agent's actions, such as signed receipts, to create a clear evidence trail for all parties.
- Upholding a Duty of Care: A central point of the exercise is to determine the nature of the provider's duty—whether they are simply a neutral "tool provider" or hold a higher, fiduciary-like "duty of loyalty" to act in the user's best interest and avoid conflicts.
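The provider's enforcement duties in Option 1 can be sketched in a few lines. The Python example below reuses the values mentioned above (`check_balance`, account GH-1234, "auto-ok", "always-ask", and the 60-second revocation SLO); the `Grant` type and the function names are hypothetical, and constraints such as data minimization and purpose binding are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

REVOCATION_SLO_SECONDS = 60  # revocation must take effect within this window


@dataclass
class Grant:
    scope: str                          # the "what", e.g. check_balance
    resource: str                       # the target, e.g. account GH-1234
    autonomy: str                       # the "how": "auto-ok" or "always-ask"
    revoked_at: Optional[float] = None  # timestamp (seconds) if revoked


def evaluate(grant: Grant, scope: str, resource: str,
             user_confirmed: bool = False) -> str:
    """Decide whether the agent may act, must ask, or must refuse."""
    if grant.revoked_at is not None:
        return "deny: revoked"
    if scope != grant.scope or resource != grant.resource:
        return "deny: outside grant"  # the agent must not exceed its authority
    if grant.autonomy == "always-ask" and not user_confirmed:
        return "ask user"
    return "allow"


def revocation_met_slo(revoked_at: float, access_ended_at: float) -> bool:
    """True if access actually ended within the SLO after revocation."""
    return access_ended_at - revoked_at <= REVOCATION_SLO_SECONDS


grant = Grant("check_balance", "account GH-1234", "always-ask")
print(evaluate(grant, "check_balance", "account GH-1234"))
print(evaluate(grant, "check_balance", "account GH-1234", user_confirmed=True))
```

Returning a distinct "ask user" outcome, rather than folding it into allow or deny, keeps the autonomy rule visible in the evidence trail.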

OPTION 2: The Managed Platform Model (Template-Based Delegation)

This model positions the AI Agent Provider as a platform offering pre-defined, vetted "skills" or "playbooks." The user's role is to configure and authorize these templates rather than defining instructions from scratch. This is analogous to using a marketplace of trusted apps with pre-set permissions.

User Responsibilities:
- Selecting and Configuring Templates: The user browses a library of pre-built "Mandate Templates" (e.g., "Auto-Book Travel," "Monitor and Buy Stock") and configures key parameters (e.g., budget, dates, vendors).
- Authorizing the Configured Template: The user signs the finalized template, which becomes the active Intent Mandate.
- Monitoring and Revoking: The user is responsible for monitoring the agent's actions against the template's goals and revoking authorization if needed.
AI Agent Provider Responsibilities 🤖:
- Curating a Safe and Secure Template Library: The provider is responsible for the safety, security, and clarity of the templates it offers. This includes vetting them for common exploits or ambiguous language.
- Strict Parameter Enforcement: The provider must ensure the agent operates strictly within the user-configured parameters of the chosen template.
- Transparency and Disclosure: The provider must clearly disclose the capabilities and limitations of each template.
- Liability for Template Flaws: The provider may assume a greater share of liability if a loss occurs due to a flaw or vulnerability in the template itself, rather than user error in configuration.
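
As a rough sketch of how template configuration and strict parameter enforcement could fit together, the example below validates a user's choices against limits baked into a template, then checks a proposed purchase against the resulting configuration. The names `MandateTemplate`, `configure`, and `within_parameters` are hypothetical and are not drawn from AP2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MandateTemplate:
    """A hypothetical provider-vetted template; not an AP2 data structure."""
    name: str
    budget_cap: float            # hard ceiling the provider allows for this template
    allowed_vendors: frozenset   # vendors the provider has vetted


def configure(template: MandateTemplate, budget: float, vendors: set) -> dict:
    """Validate user-chosen parameters against the template's limits."""
    if not 0 < budget <= template.budget_cap:
        raise ValueError(f"budget must be above 0 and at most {template.budget_cap}")
    unknown = vendors - template.allowed_vendors
    if unknown:
        raise ValueError(f"vendors not permitted by template: {sorted(unknown)}")
    return {"template": template.name, "budget": budget, "vendors": sorted(vendors)}


def within_parameters(config: dict, vendor: str, price: float) -> bool:
    """Check a proposed purchase against the user's configured parameters."""
    return vendor in config["vendors"] and price <= config["budget"]


travel = MandateTemplate("Auto-Book Travel", 2000.0, frozenset({"airline-a", "hotel-b"}))
config = configure(travel, budget=800.0, vendors={"airline-a"})
print(within_parameters(config, "airline-a", 750.0))  # configured vendor, under budget
print(within_parameters(config, "hotel-b", 100.0))    # vetted, but not selected by the user
```

Separating the two checks mirrors the split in liability described above: a failure in `configure` points to the template or its limits, while a failure in `within_parameters` points to the agent straying from what the user actually chose.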

OPTION 3: The Certified Fiduciary Model (Role-Based Trust & Duty of Care)

This model envisions an ecosystem where AI agents can be independently certified for specific, high-stakes roles (e.g., "Certified Corporate Procurement Agent," "Certified Financial Advisor Agent"). The legal framework is tied to the agent's certified capabilities and implies a higher standard of care.

User/User Organization Responsibilities:
- Due Diligence in Agent Selection: The user is responsible for selecting an agent with the appropriate certification for the task at hand. Using a non-certified agent for a high-stakes financial task would place more liability on the user.
- Providing Clear Objectives: The user must still provide the high-level goals and constraints for the Intent Mandate.
- Cooperation in Audits: The user must cooperate in providing information if a certified agent's actions are audited.
AI Agent Provider Responsibilities 🤖:
- Achieving and Maintaining Certification: The provider must meet the rigorous technical, security, and ethical standards required by a third-party certifying body.
- Upholding a Fiduciary Duty: For certified financial roles, the agent must legally and technically operate under a fiduciary duty, meaning it must act in the user's absolute best financial interest, avoiding conflicts of interest (e.g., it cannot favor a merchant who pays a higher commission).
- Proactive Risk Mitigation: A certified agent is expected to go beyond simple instruction-following and proactively identify and flag potential risks to the user (e.g., "Warning: This purchase is non-refundable and the merchant has a poor rating. Do you still wish to proceed?").
- Submitting to Audits: The provider must agree to be audited by the certifying body to ensure continued compliance.

I’m working on some other potential options as well, but nothing quite ready to share yet. And as always, if you have other ideas about how this could play out, I’m all ears!

Remaining Work and Strategic Next Steps

AP2 provides the technical foundation, but significant work remains to build the business and legal ecosystems around it.

For Businesses and Consumers (as Users):

- Develop Internal Governance and Delegation Policies: Businesses must create clear policies defining who can authorize agents, for what purposes, and under what financial limits. This includes establishing evaluations for adherence to adopted practices and policies.
- Integrate with Procurement and ERP Systems: The true power of B2B automation will be realized when agents can read from and write to existing systems of record, like SAP or Oracle, governed by AP2 mandates.
- User Education and Training: Both consumers and employees will need to be educated on how to safely and effectively delegate authority to AI agents, including how to craft clear, unambiguous intents.

For AI Agent Providers:

- Build User-Friendly Mandate Creation Tools: The process of creating and signing an Intent Mandate must be simple, transparent, and secure. This is a critical UX/UI challenge.
- Develop Legal Frameworks and LoAs: Providers must work with their legal teams to develop the "Letter of Authorization" agreements based on one of the models above, clearly defining responsibilities and liabilities.
- Engage with the Ecosystem on Certification: For the Fiduciary Model to work, providers should begin conversations with industry bodies and regulators to define what "certification" means for different agent roles. Evals and benchmarks developed by users could be a strategic basis for some such certifications or trust marks.

For the AP2 Standard and the Intent Mandate:

- Evolve the Intent Mandate Schema: The current v0.1 schema is designed for common commerce. Future versions will need to support more complex business logic, such as:
  - Conditional Logic: "Buy item A only if item B is also available."
  - Multi-Party Approvals: Requiring signatures from multiple individuals (e.g., a manager and finance) for high-value corporate purchases.
  - Richer Constraint Language: Moving beyond simple price ceilings to more complex rules (e.g., "quality benchmarks," "ratings and rankings," "total cost of ownership," "vendor performance scores," etc.).
- Formalize the Cryptographic Profile: As discussed, a formal specification for the signature and verification process is the top technical priority for moving from alpha to a production-ready standard.
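
To give a feel for what conditional logic and multi-party approvals might mean in practice, here is a speculative sketch. None of these fields exist in the v0.1 schema: `ExtendedMandate`, `requires_available`, `required_approvers`, and `may_purchase` are hypothetical names, and approvals are plain strings here where a real design would verify cryptographic signatures.

```python
from dataclasses import dataclass, field


@dataclass
class ExtendedMandate:
    """Hypothetical extension fields; the v0.1 schema does not define these."""
    item: str
    price_ceiling: float
    requires_available: list = field(default_factory=list)  # conditional logic
    required_approvers: set = field(default_factory=set)    # multi-party approval
    approvals: set = field(default_factory=set)

    def approve(self, approver: str) -> None:
        if approver not in self.required_approvers:
            raise ValueError(f"{approver} is not a designated approver")
        self.approvals.add(approver)


def may_purchase(mandate: ExtendedMandate, price: float, in_stock: set) -> bool:
    """All three conditions must hold before the agent may buy."""
    under_ceiling = price <= mandate.price_ceiling
    conditions_met = all(dep in in_stock for dep in mandate.requires_available)
    fully_approved = mandate.required_approvers <= mandate.approvals
    return under_ceiling and conditions_met and fully_approved


mandate = ExtendedMandate(
    item="item-A",
    price_ceiling=5000.0,
    requires_available=["item-B"],            # "buy A only if B is also available"
    required_approvers={"manager", "finance"},
)
mandate.approve("manager")
print(may_purchase(mandate, 4200.0, in_stock={"item-A", "item-B"}))  # finance has not signed yet
mandate.approve("finance")
print(may_purchase(mandate, 4200.0, in_stock={"item-A", "item-B"}))  # all conditions satisfied
```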

AP2 addresses a fundamental challenge that will only grow more pressing as AI agents become routine participants in commerce: establishing verifiable authority and accountability for autonomous transactions. While still in early stages, the protocol provides a practical framework for businesses and developers to begin experimenting with trusted agent delegation. The business, legal and technical foundations outlined here represent necessary infrastructure for scaling AI commerce effectively and responsibly. In future posts, I'll be sharing working examples and implementation patterns for those interested in testing these concepts in practice. For organizations considering how agent-mediated transactions might fit their operations, now is an appropriate time to begin exploring the possibilities.

Reach out to me directly here if you’d like to discuss working together on these and related opportunities.

Beyond AI Benchmarks

Dazza Greenwood — Fri, 05 Sep 2025 07:47:39 GMT

Every board meeting about AI eventually seems to arrive at the same uncomfortable moment. After the presentations about efficiency gains and innovation potential, after the breathless vendor demos and the carefully rehearsed use cases, someone asks the question that stops everything cold: “But how do we know it actually works for us? For our specific needs, our standards, our risks?”

The silence that follows is expensive. Benchmarks prove competence in the abstract; your risks live in the specifics. Edge cases, specialized terminology, and unique constraints that define your work rarely appear in anyone else’s test suite. The gap between benchmark scores and your reality isn’t a few percentage points, it’s damaged client trust, regulatory scrutiny, and sleepless nights for the executives who signed off on the deployment.

This gap between promise and performance isn’t a technical glitch. It’s a governance challenge. And it reveals something profound: we’ve been thinking about AI leadership entirely wrong.

The Blindspot in Every AI Playbook

Pick up any recent executive guide to AI transformation, whether from IBM, McKinsey, or the Big Four consultancies, and you’ll find sophisticated frameworks for governance, detailed roadmaps for implementation, and compelling visions of AI-powered futures. These books and reports get 90% of the story right. They correctly identify that leaders must move from being passive consumers of AI to active creators of AI value. They emphasize governance, skills development, and strategic alignment.

But they systematically omit the single most important mechanism for achieving these goals: how leaders translate their deep domain expertise, their understanding of what quality means in their specific context, into measurable, enforceable standards for AI systems.

This isn’t a minor oversight. It’s the difference between governance theater and actual control. Between hoping your AI behaves and knowing it will perform.

The authors of these guides aren’t ignorant. These guides tend to focus on high-level strategy and often treat evaluation as a technical implementation detail. But this reveals a fundamental misunderstanding of what evaluation actually is. It’s not quality assurance. It’s not testing. It’s the very act of encoding what your organization values into a form that can be measured, managed, and improved.

When a law firm defines what constitutes a properly researched legal memo, when an insurance company articulates what empathetic claim handling looks like, when a bank specifies acceptable risk thresholds, these aren’t technical specifications. They’re strategic decisions that define competitive advantage. And in the AI era, these decisions must be translated into what I call “evaluation-as-policy.”

The Non-Delegable Duty of Defining “Good”

Here’s what the playbooks miss: in an AI-transformed enterprise, defining what constitutes acceptable performance isn’t something leaders can delegate to their technical teams. It’s not something they can outsource to vendors. It’s a fundamental leadership responsibility as non-negotiable as setting strategy or managing risk.

Think about how you currently ensure quality in human work. You don’t just hire smart people and hope for the best. You provide clear expectations. You review work products. You give specific feedback. You know what good looks like because you’ve spent years developing that expertise.

The same expertise that allows you to recognize a well-crafted legal argument, a compelling marketing campaign, or a thorough risk assessment is exactly what’s needed to create meaningful AI evaluations. The only difference is that instead of reviewing work after the fact, you’re encoding your standards upfront in a form that can be systematically applied.

This is where the concept of “golden data” becomes critical. Golden data isn’t just training data or test data. It’s the carefully curated collection of examples that embody your organization’s definition of excellence. Each example is a concrete instantiation of your standards, your values, your risk tolerance.

Creating golden data isn’t a technical task, it’s a leadership function. When your general counsel reviews AI-generated legal summaries and annotates what’s acceptable and what’s not, she’s not doing QA. She’s encoding the firm’s legal standards into a strategic asset. When your head of customer service identifies model responses that perfectly capture your brand voice, he’s not just providing feedback. He’s building competitive advantage.

From Abstract Principles to Executable Standards

The challenge, of course, is that most leaders don’t know how to bridge the gap between their expertise and the technical requirements of AI evaluation. They can articulate what they want—“accurate legal citations,” “empathetic customer responses,” “comprehensive risk assessments”—but they don’t know how to make these concepts measurable and enforceable.

This is the murky void that exists in most organizations today. Everyone agrees that evaluation is important. Few understand how to actually do it. Even fewer realize that the solution doesn’t require technical expertise, it requires clear thinking about what matters to your business.

Let me make this concrete. Evaluation, at its core, follows a simple three-column pattern: input (what goes into the AI), output (what the AI produces), and expected output (what you wanted it to produce). This isn’t complicated. It’s exactly how you’d evaluate human work, just structured more systematically.

The power comes from how you assess the relationship between your system's actual output and the expected output. Sometimes you need exact matches—a legal citation must be precisely correct. Sometimes you need fuzzy matching—a customer service response should cover the right points even if the wording varies. And sometimes you need nuanced judgment—does this financial advice demonstrate appropriate fiduciary duty?
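
For readers who want to see the first two comparison styles in code, here is a minimal sketch using only the Python standard library. The row contents, the `scorer` field, and the 0.8 threshold are assumptions made up for illustration. Character-level similarity is also a crude stand-in for "covers the right points," so treat the fuzzy scorer as a placeholder for whatever measure fits your use case.

```python
from difflib import SequenceMatcher


def exact_match(output: str, expected: str) -> bool:
    """Strict comparison, ignoring only surrounding whitespace."""
    return output.strip() == expected.strip()


def fuzzy_score(output: str, expected: str) -> float:
    """Similarity ratio between 0 and 1 based on character-level matching."""
    return SequenceMatcher(None, output.lower(), expected.lower()).ratio()


def evaluate_row(row: dict, threshold: float = 0.8) -> bool:
    """Apply the scorer named in the row; the threshold is an arbitrary choice."""
    if row["scorer"] == "exact":
        return exact_match(row["output"], row["expected_output"])
    return fuzzy_score(row["output"], row["expected_output"]) >= threshold


# Made-up rows following the three-column pattern (the citation is fictional).
rows = [
    {"input": "Cite the controlling case.",
     "output": "Smith v. Jones, 123 F.3d 456",
     "expected_output": "Smith v. Jones, 123 F.3d 465",
     "scorer": "exact"},
    {"input": "When will my refund arrive?",
     "output": "Your refund will arrive in 5 business days.",
     "expected_output": "Your refund will arrive within 5 business days.",
     "scorer": "fuzzy"},
]

for row in rows:
    print(row["scorer"], "pass" if evaluate_row(row) else "fail")
```

The first row fails because two digits of the citation are transposed, which is exactly the kind of error an exact scorer exists to catch. The second passes because the wording differs only slightly from the expected answer.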

This is where the concept of LLM-as-a-Judge becomes transformative. Instead of trying to codify every possible variant of acceptable output, you can articulate your standards in natural language—the same way you’d instruct a human employee—and use a language model to assess whether outputs meet those standards.
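
A bare-bones version of the pattern might look like the sketch below. It is not how Lake Merritt or any particular tool implements judging: the prompt wording, `judge_row`, and the PASS/FAIL reply convention are all assumptions, and `fake_model` is a stand-in so the example runs without calling a real model.

```python
from typing import Callable

JUDGE_TEMPLATE = """You are reviewing an AI system's answer against a standard.

Standard (written by a domain expert):
{criteria}

Question: {input}
Answer under review: {output}
Reference answer: {expected_output}

Reply with PASS or FAIL on the first line, then one sentence of reasoning."""


def judge_row(row: dict, criteria: str, ask_model: Callable[[str], str]) -> dict:
    """Format the judge prompt, call the model, and parse its verdict."""
    prompt = JUDGE_TEMPLATE.format(criteria=criteria, **row)
    reply = ask_model(prompt).strip()
    first_line, _, rest = reply.partition("\n")
    return {"passed": first_line.strip().upper() == "PASS", "reasoning": rest.strip()}


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "PASS\nThe answer states the refund window and matches the reference."


row = {
    "input": "What is the refund window?",
    "output": "You can request a refund within 30 days.",
    "expected_output": "Refunds are available for 30 days after purchase.",
}
criteria = "The answer must state the correct refund window in a courteous tone."
print(judge_row(row, criteria, fake_model))
```

The part a domain expert owns is the `criteria` string. Swapping `fake_model` for a function that calls your model of choice is the only technical step.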

If you can write a memo explaining what makes a good quarterly report, you can create evaluation criteria for AI-generated reports. If you can train a junior attorney on proper legal research, you can define standards for AI legal research. The skill you need isn’t programming. It’s the ability to articulate what you already know.

The Strategic Asset Nobody’s Talking About

Here’s what should keep executives up at night: while you’re treating evaluation as a technical afterthought, your competitors might be building it as a strategic asset. Because your evaluation criteria and golden datasets aren’t just test files. They’re the usable codification of your organizational knowledge, competitive insights, and strategic priorities.

Consider what goes into a sophisticated evaluation suite for a law firm’s AI systems. It contains examples of how to spot obscure jurisdictional issues that only experienced partners would catch. It embodies the firm’s approach to risk assessment that differentiates it from competitors. It captures the nuanced judgment calls that define the firm’s reputation.

This isn’t a generic capability that any firm could replicate. It’s proprietary intellectual property as valuable as any other strategic asset. Some evaluations—basic accuracy, general fairness—can and should be shared across industries. But your core evaluations, the ones that capture what makes your organization unique, are trade secrets.

The organizations that recognize this are doing something radical: they’re treating evaluation development as a C-suite responsibility. They’re running cross-functional workshops where legal, risk, product, and customer service leaders collaborate to define golden datasets. They’re version-controlling these assets like critical code. They’re measuring and reporting on evaluation coverage like any other strategic metric.

Making It Real: From Theory to Practice

At this point, you might be thinking, “This sounds important but impossibly complex.” Let me show you how wrong that assumption is. You can start meaningfully evaluating your AI systems this week with just a spreadsheet and clear thinking.

To see this principle in action, you can try it yourself in under two minutes using our open-source platform, Lake Merritt. Follow the first exercise in the Quick Start guide, a “60-Second Sanity Check.” You’ll simply create a spreadsheet with three columns: the input (the question you ask the AI), the output (the AI's actual response), and the expected_output (your definition of a perfect answer). When you run the evaluation, you’ll see how an “LLM-as-a-Judge” programmatically assesses the quality of the actual output against your ideal expected_output. Fiddle with it: change the content in the expected_output column and see how the evaluation scores respond. This simple, hands-on exercise will give you the concrete intuition needed to apply this process to your own business context.

Begin with what I call a “10-row quick start.” Take ten representative cases from a real use case in your business. For each input, develop your own idea of what outputs you expect and why, and then have domain experts define their ideal outputs. Settle on an initial set of expected outputs. This is your initial golden dataset. Now run your AI system against these inputs and compare its outputs to your golden standard.
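
If your golden dataset lives in a spreadsheet exported as CSV, a first pass at running it can be very small. The sketch below is an assumption-laden illustration: the two sample rows are invented, `toy_system` is a placeholder for the AI system under test, and the comparison is a simple case-insensitive match that you would likely replace with a richer scorer.

```python
import csv
import io

# Two invented rows standing in for a ten-row golden dataset.
GOLDEN_CSV = """input,expected_output
What is our refund window?,30 days
Do we ship internationally?,Yes
"""


def load_golden(text: str) -> list:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def run_eval(rows: list, system) -> list:
    """Fill in the output column and mark each row as pass or fail."""
    results = []
    for row in rows:
        output = system(row["input"])
        passed = output.strip().lower() == row["expected_output"].strip().lower()
        results.append({**row, "output": output, "passed": passed})
    return results


def toy_system(question: str) -> str:
    """Placeholder for the AI system under test."""
    answers = {"What is our refund window?": "30 days"}
    return answers.get(question, "I don't know")


results = run_eval(load_golden(GOLDEN_CSV), toy_system)
failures = [r["input"] for r in results if not r["passed"]]
print(f"{len(results) - len(failures)}/{len(results)} passed; failures: {failures}")
```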

The results will be immediately illuminating. You’ll see patterns in where the AI struggles. You’ll identify edge cases you hadn’t considered. Most importantly, you’ll begin developing intuition for what kinds of standards are easy to meet and which require more sophistication.

As you develop confidence, you can scale this approach. The ten rows become a hundred, then a thousand. The simple comparisons evolve into sophisticated rubrics. The ad-hoc checks become systematic “evaluation packs”: version-controlled, repeatable test suites that can be run automatically before any AI system updates are deployed.

There’s an even more powerful approach that allows your leadership to encode their expertise more rapidly: learning from reality. This method allows your executives to shift from being authors to being editors, which is often a more efficient use of their time. Instead of trying to define perfect outputs upfront, have your key leaders and their most trusted senior experts (the same people who define your strategy) annotate actual AI outputs. They can mark what’s good, what’s problematic, and what’s unacceptable. These leadership-validated annotations then become core foundations for your evaluation system, ensuring it recognizes quality the same way you would.

To make this concrete: for a legal summary AI system, instead of asking your general counsel to write ten perfect legal summaries from scratch, you can present her with ten AI-generated summaries and have her annotate them, correcting a citation here, flagging a risk there. Those annotations, born from senior-level judgment, become the executable standards for your evaluation system. This creates a virtuous cycle where your top experts continually refine the AI's alignment with your organization's most critical standards.

The cycle works like this. Your AI systems generate outputs. Your experts review and annotate them. These annotations become evaluation criteria. The evaluations drive improvements. The improved systems generate better outputs. And the cycle continues, with each iteration encoding more of your organization’s expertise into measurable, manageable form.

The Agent Revolution Changes Everything

So far, I’ve focused on evaluating AI outputs, the text, analysis, or recommendations that AI systems produce. But the next generation of AI isn’t just generating content. It’s taking action. AI agents are making decisions, using tools, following processes, and interacting with other systems in complex workflows.

This fundamentally changes what evaluation means. It’s no longer sufficient to check if the final answer is correct. You need to evaluate the entire process. Did the agent use the right tools? Did it follow required procedures? Did it respect security boundaries? Did it escalate appropriately when uncertain?

Consider a legal research agent. The quality of its final memo matters, but so does its process. Did it search the right databases? Did it prioritize binding precedent appropriately? Did it verify that cited cases haven’t been overturned? These behavioral evaluations require a different approach, one that captures and analyzes the full trajectory of the agent’s actions.

This is where technical concepts like OpenTelemetry traces become essential. But don’t let the jargon intimidate you. A trace is simply a record of everything the agent did: every tool it called, every decision it made, every piece of data it accessed. Evaluating these traces means you can ensure not just that the agent reached the right conclusion, but that it got there the right way.
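
To show what evaluating a trace can mean, here is a deliberately simplified sketch. It does not use the OpenTelemetry SDK, and real spans carry far more detail. A trace here is just a list of dictionaries, and the tool names (`search_case_law`, `check_citation_status`, `draft_memo`, `web_scrape`) are hypothetical.

```python
def check_trace(trace: list, required_tools: list, forbidden_tools: set) -> list:
    """Return human-readable violations; an empty list means the trace passed."""
    called = [step["tool"] for step in trace if step.get("type") == "tool_call"]
    violations = []
    for tool in called:
        if tool in forbidden_tools:
            violations.append(f"forbidden tool used: {tool}")
    # Required tools must all appear, in the given relative order.
    position = 0
    for tool in required_tools:
        try:
            position = called.index(tool, position) + 1
        except ValueError:
            violations.append(f"required step missing or out of order: {tool}")
    return violations


# A made-up trace for a legal research agent that skips citation verification.
trace = [
    {"type": "tool_call", "tool": "search_case_law"},
    {"type": "reasoning", "note": "selected three candidate cases"},
    {"type": "tool_call", "tool": "draft_memo"},
]

problems = check_trace(
    trace,
    required_tools=["search_case_law", "check_citation_status", "draft_memo"],
    forbidden_tools={"web_scrape"},
)
print(problems or "trace passed")
```

Note that the final memo from this trace might read perfectly well. The check fails anyway because the agent never verified that its cited cases are still good law, which is the point of evaluating process and not just output.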

The implications are profound. In traditional software, you could separate business logic from implementation details. In agentic AI, the process IS the product. The way an agent conducts legal research, handles customer complaints, or analyzes risk isn’t just a means to an end—it’s a direct expression of your organizational values and standards.

Proof That This Works

These aren’t theoretical frameworks or academic exercises. Organizations are using these approaches today to solve real problems and prevent real failures.

Consider a challenge at the heart of AI governance: ensuring systems behave fairly and align with your company’s values. This isn't just a legal or regulatory checkbox; it's fundamental to brand safety, customer trust, and strategic alignment. A powerful example is the BBQ (Bias Benchmark for QA), a rigorous academic framework for detecting demographic bias. Using a tool like Lake Merritt, this top-tier public benchmark can be implemented as a reusable "evaluation pack" to systematically test your systems. To underscore its industry significance, BBQ was the sole fairness and bias benchmark OpenAI chose to use in its safety testing for GPT-5. This shows how you can move beyond theory to not just flag problems, but quantify them, track them over time, and ensure that fixes actually work.

This same approach of codifying standards applies to any area where deep, nuanced domain expertise is your competitive advantage. Rather than rely on generic public benchmarks like BBQ, however, the task is to develop your own measures that support and reflect your organization's priorities and imperatives. For instance, a financial services firm can move beyond generic compliance to evaluate its unique interpretation of "fiduciary duty." Such an evaluation might progress from basic, deterministic checks—like verifying required disclosures are present—to sophisticated, judgment-based assessments of whether advice truly serves a client’s best interests in a nuanced scenario.
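
The basic, deterministic end of that progression can be as simple as the sketch below. The disclosure phrases are invented examples, not any firm's actual compliance language, and `missing_disclosures` is a hypothetical helper name.

```python
# Invented example phrases; a real list would come from legal and compliance.
REQUIRED_DISCLOSURES = [
    "past performance does not guarantee future results",
    "this is not tax advice",
]


def missing_disclosures(text: str, required: list = REQUIRED_DISCLOSURES) -> list:
    """Return the required phrases that do not appear in the text."""
    normalized = " ".join(text.lower().split())  # ignore case and line wrapping
    return [phrase for phrase in required if phrase not in normalized]


advice = """Based on your goals, a diversified index fund may be suitable.
Past performance does not guarantee
future results."""

print(missing_disclosures(advice))
```

A check like this only confirms the words are present. Whether the advice actually serves the client's best interests is the judgment-based layer, which is where expert-written criteria and an LLM judge come in.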

Crucially, these evaluations work because they are built by the domain experts who own the outcome, not by technicians. In the financial services scenario, this means the legal team defines disclosure, compliance specifies risk scenarios, and customer advocates articulate what "client’s best interests" means in practice. But the principle is universal: for a marketing AI, the brand team would define what is "on-brand"; for a medical AI, clinicians would define a "safe diagnostic summary." The technical team's role is to simply implement these expert-defined standards into a systematic, repeatable process.

The Ecosystem of Evaluation

To demonstrate that these concepts aren’t just theory, I’ve built Lake Merritt, an open-source evaluation workbench that embodies these principles. I use Lake Merritt every day to evaluate my own AI apps and services, and have also utilized it effectively as part of Civics.com's professional consulting services, ensuring that my clients' AI products operate as expected. But let me be clear: Lake Merritt isn’t the point. The methodology is the point. Lake Merritt simply proves that the methodology works.

The platform does several things that matter. It provides a web interface simple enough that a lawyer or product manager can use it without training. It supports what I call the “Hold My Beer” workflow—where you can go from a vague idea about quality to a working evaluation in minutes. It treats evaluations as code, making them versionable, shareable, and systematic. It can evaluate not just outputs but entire agent workflows through OpenTelemetry trace analysis.

While I launched Lake Merritt this week because I think it’s valuable to have an easy-to-use evals tool that non-technical people can get started with, this software is just one option in a rich ecosystem of evaluation tools. Arize Phoenix provides powerful observability and monitoring capabilities. Galileo offers sophisticated analytics and agent debugging tools. Open-source projects like DeepEval and OpenAI Evals provide flexible frameworks for custom evaluations. LangWatch excels at specific use cases. Each serves different needs at different scales.

In the legal domain specifically, pioneers are emerging. Vals has published groundbreaking reports on legal AI evaluation. ScoreCard is working to standardize agent evaluations for legal use cases. Individuals like Ryan McDonough, a true global thought leader on AI and evals in law at KPMG, and newer voices like Anna Guo and her collaborators in Singapore are openly sharing their learnings and pushing the field forward. Many others are starting to make strides as well.

This diversity is healthy and necessary. No single tool or approach will serve every need. What matters is that organizations develop the capability—through whatever tools make sense for them—to systematically evaluate their AI systems against their specific standards.

We’re in the advanced planning stage now of bringing this community together at an evaluation summit jointly hosted by Stanford and MIT. The goal isn’t to crown winning tools or approaches. It’s to share learnings, establish best practices, and accelerate the entire field’s development. To stay informed about that event or if you have constructive and relevant work in the custom evaluations arena, please reach out here.

Your Path Forward

If you’ve read this far, you’re probably convinced that custom evaluation matters. The question is what to do about it. Let me give you a practical path forward that you can start this week.

First, identify your highest-risk AI use case. This is where evaluation matters most and where you’ll get immediate value from better oversight. Don’t try to boil the ocean. Pick one critical application and focus there.

Second, convene your domain experts. Bring together the people who truly understand what quality means for this use case. This isn’t a technical meeting, it’s a business meeting. The question on the table is simple: “What does good look like?”

Third, create your first golden dataset. Start small, even ten examples are enough to begin. For each example, capture the input and the ideal output. Have your experts explain why each output is ideal. These explanations become the seeds of your evaluation criteria.

Fourth, test your current AI system against this golden dataset. Don’t expect perfection. Expect illumination. You’ll immediately see patterns in where your system struggles and where it excels.

Fifth, iterate and expand. Add more examples. Refine your criteria. Develop more sophisticated evaluations. Move from manual checks to automated gates. Build evaluation into your deployment pipeline so that no AI update goes live without passing your standards.
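
An automated gate does not need to be elaborate. As a minimal sketch, assuming each evaluation result is a dictionary with a boolean `passed` field and that a 95% pass rate is your bar (both are assumptions for illustration), the logic might be:

```python
def gate(results: list, min_pass_rate: float = 0.95) -> bool:
    """Return True when the share of passing rows meets the threshold."""
    if not results:
        return False  # an empty evaluation run should never pass
    pass_rate = sum(1 for r in results if r["passed"]) / len(results)
    return pass_rate >= min_pass_rate


def exit_code(results: list, min_pass_rate: float = 0.95) -> int:
    """0 lets a pipeline continue; 1 blocks the deployment."""
    return 0 if gate(results, min_pass_rate) else 1


# Invented results: 9 of 10 golden rows passed.
results = [{"passed": True}] * 9 + [{"passed": False}]
print("deploy" if gate(results) else "blocked")
print("deploy" if gate(results, min_pass_rate=0.9) else "blocked")
```

A deployment script could hand the value from `exit_code` to the pipeline so that a failing evaluation run stops the release. Choosing the threshold is a business decision, not a technical one.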

This isn’t a technical project. It’s a governance initiative. It’s how you exercise real control over AI systems that are increasingly critical to your operations. It’s how you ensure that AI serves your strategic objectives rather than undermining them.

The Executive Imperative

We’re at an inflection point in how organizations create value with AI. The experimental phase is ending. The operational phase is beginning. And in this operational phase, the organizations that thrive won’t be those with the most sophisticated models or the largest datasets. They’ll be those that can most effectively translate their human expertise into AI capabilities.

This translation happens through evaluation. Not generic benchmarks or vendor-supplied metrics, but custom evaluations that embody your specific standards, values, and priorities. These evaluations aren’t a tax on innovation, they’re an accelerator for it. They allow you to move fast because you can move with confidence. They allow you to delegate to AI because you can verify performance. They allow you to differentiate because you can systematically improve what matters most to your business.

The choice facing every executive is stark. You can continue treating AI evaluation as a technical detail, hoping that your vendors and technical teams somehow divine what quality means for your organization. Or you can recognize that in the AI era, evaluation is the executive function, the mechanism through which leadership expertise shapes organizational outcomes.

Your AI strategy without custom evaluation isn’t a strategy. It’s expensive hope. And in a world where AI increasingly mediates critical business functions, hope is not a plan.

The boards that are asking “How do we know it works for us?” aren’t being paranoid. They’re being prescient. They understand that AI governance without custom evaluation is like financial governance without custom accounting standards: theoretically possible but practically meaningless.

The good news is that building evaluation capability doesn’t require massive investment or technical transformation. It requires clarity about what matters to your business and the discipline to measure it systematically. If you can articulate expectations to humans, you can create evaluations for AI. If you can recognize quality when you see it, you can encode that recognition into systematic assessment. Literally, that recognition just needs to be articulated in language in order to be usable as criteria in programmatic evals.

In the AI era, this isn’t optional. It’s existential. The organizations that master evaluation will shape AI to serve their purposes. Those that don’t will find themselves shaped by AI systems they don’t sufficiently control.

The question isn’t whether you’ll develop custom evaluation capabilities. It’s whether you’ll develop them before or after they become urgently necessary. Before or after your first AI crisis. Before or after your competitors use superior evaluation to deliver superior AI-powered services.

The time to start is now. Not because the technology demands it, but because leadership demands it. Because in a world where AI increasingly mediates how organizations create value, the ability to define and measure what “good” looks like isn’t just a technical capability.

It’s the executive function itself.

From Fine Print to Machine Code: How AI Agents are Rewriting the Rules of Engagement: Part 1 of 3

January 14, 2025

by Diana Stern and Dazza Greenwood, Codex Affiliate

Picture this: you’ve just developed a sleek new AI shopping assistant. It’s ready to scour the internet for the best deals, compare prices faster than you can say “discount,” and make purchases quicker than you can reach for your wallet. But wait, there’s a catch. How do you ensure this digital dealmaker doesn’t make mistakes that could bind you or your customer to a bad deal, create liability under privacy laws, or violate terms of service that it (and, let’s face it, probably you) never actually read?

This three-part series will identify U.S. legal issues raised by this type of AI agent and how to address them. In this post, we’ll start by level-setting on AI agent terminology. Next, we’ll dispel the misconception that liability can be pushed to the AI agents themselves and explain why the company offering services like this AI shopping assistant to customers could be left holding the bag o’ risks. Finally, we’ll touch on how software companies can helpfully leverage principal-agent law to manage this risk.

What is a Transactional Agent?

AI agents are an umbrella category of AI systems that execute tasks on behalf of users. In addition to your AI shopping bot that purchases goods online, think of virtual assistants that book flights or event tickets and meeting schedulers that reserve tables at restaurants. There are a variety of AI agents with diverse capabilities.

This series focuses on what we’ll call “Transactional Agents”: AI agent systems that conduct transactions involving monetary or contractual commitments. These systems leverage large language models (LLMs) to move beyond basic query-response interactions. What makes them special is their ability to perform dynamic, multi-step reasoning and take action without human review or approval. Imagine your shopping bot doesn’t just find products but compares prices across retailers, checks reviews, confirms availability and makes purchases – all while sticking to your customer’s specified budget and preferences. Transactional Agents achieve this through key capabilities like:

Tool use: Accessing external services like payment processors or APIs
Memory management: Retaining context and user preferences across interactions
Iterative refinement: Learning from past decisions to improve future outcomes
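
To make the three capabilities above concrete, here is a minimal Python sketch of a shopping agent that stays within a customer's budget. Everything in it is an assumption made for illustration: the `Offer` and `ShoppingAgent` names, the `fake_search` stub standing in for a real retailer API, and the simplification of "iterative refinement" to a purchase history. It is not how any particular product works.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Offer:
    retailer: str
    price: float
    in_stock: bool


@dataclass
class ShoppingAgent:
    # Tool use: a callable standing in for a retailer or price-comparison API.
    search_offers: Callable[[str], list[Offer]]
    # Memory management: preferences retained across requests.
    budget: float
    blocked_retailers: set[str] = field(default_factory=set)
    # Iterative refinement is reduced here to remembering past choices.
    history: list[Offer] = field(default_factory=list)

    def choose(self, product: str) -> Optional[Offer]:
        """Return the cheapest in-stock offer within budget, or None."""
        candidates = [
            offer
            for offer in self.search_offers(product)
            if offer.in_stock
            and offer.price <= self.budget
            and offer.retailer not in self.blocked_retailers
        ]
        if not candidates:
            return None  # Nothing qualifies, so the agent declines to buy.
        best = min(candidates, key=lambda offer: offer.price)
        self.history.append(best)
        return best


def fake_search(product: str) -> list[Offer]:
    # Hypothetical stub; a real agent would call external services here.
    return [
        Offer("ShopA", 120.0, True),
        Offer("ShopB", 95.0, True),
        Offer("ShopC", 80.0, False),
    ]


agent = ShoppingAgent(search_offers=fake_search, budget=100.0)
choice = agent.choose("self-heating mug")
```

In this sketch the agent picks ShopB: ShopA is over budget and ShopC is out of stock. The legally interesting part is the last step a real agent would add, which is actually paying.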

Their ability to make binding commitments, including payments, differentiates Transactional Agents from simple chatbots and other types of AI agents. These systems can spend real money or enter into contracts on one’s behalf. Let’s say your company provides an AI shopping bot consumer app powered by a third-party LLM. On the surface, this seems like it could be a straightforward SaaS offering, but it has hidden challenges and risks related to security, authorization, and trust. How do you ensure the app follows your customers’ requests? How do you prevent errors? Misuse? These are some of the challenges we’ll explore in this series.

Your Transactional Agent Is Not A Legal Agent, But You Might Be

Your Transactional Agent can neither be held liable nor enter into agreements itself because it’s not a legal entity – it’s software! So, how is it able to buy the perfect pair of Jimmy Choos for your customer right when they go on sale? Under the Uniform Electronic Transactions Act, which we will discuss further in a future post, it is well-settled that Transactional Agents can form contracts on behalf of their users, but principal-agent law may also be operating in the background.

If you’ve bought a house, a real estate agent may have acted on your behalf to buy the property, negotiate prices, and handle paperwork. Not all principal-agent relationships are made through an express agreement like in real estate. They can also be implied, like a whiskey bar manager who is in charge of curating the menu and decides to enter into agreements on the bar’s behalf to buy mocktail supplies in January. In addition, a principal-agent relationship can be based on “apparent authority,” which arises when a third party reasonably believes an agent has the authority to act on the principal’s behalf. For example, the bar manager might tell a non-alcoholic spirit distributor that she is authorized to enter into agreements for new products on the bar’s behalf.

Under state common law (law primarily developed through court cases), a common law agent has a fiduciary duty to the principal (legal nerds can see Restatement (Third) of Agency § 8.01). This is a big deal! A fiduciary duty is one of the highest standards of care imposed by law. It is a legal obligation to act in the best interests of the other party within the scope of the business relationship. The agent owes other duties as well, including avoiding conflicts of interest and acting in line with the agency agreement.

When a company offering a Transactional Agent to customers (“Transactional Agent Provider”) operates the Transactional Agent, a principal-agent relationship *may* exist. If the customer went to court, they could argue there was a principal-agent relationship between them and the Transactional Agent Provider in order to get the Transactional Agent Provider on the hook. The court would likely look at the customer’s actions in deploying and configuring the Transactional Agent as well as the terms they agreed to, among other factors.

Apparent authority may be a particularly relevant consideration for the court, since third parties interacting with the AI may not know the actual instructions given to the Transactional Agent by the user, but rather, are relying on what they see from the Transactional Agent. The court would consider how the Transactional Agent Provider’s authority was communicated to third parties, including representations, disclaimers, and industry standards.

Even if a Transactional Agent Provider exceeded its authority, a court might analyze whether the customer ratified the action, meaning the customer essentially gave the Transactional Agent Provider authority to do that action after the fact.

In short, when it comes to Transactional Agents, the customer could be the principal delegating authority to the Transactional Agent Provider as their agent. Et voilà, the Transactional Agent Provider would become legally liable under principal-agent laws.

Making Agency (or Alternatives) Work For You

Agency law is a familiar legal framework for courts and can potentially clarify liability issues, so, in some cases, it might be advantageous to state there is an agency relationship in Transactional Agent Provider terms of service. We have seen this already in our review of existing Transactional Agent Provider terms of service. At the same time, since the standard of care for an agent is so high, Transactional Agent Providers may wish to structure these relationships as independent contractor relationships if they can ensure that the terms and the way the customer interacts with the Transactional Agent align with this characterization. Likewise, there may be a competitive advantage in embracing some fiduciary duties as a Transactional Agent Provider to create and retain customer trust.

In addition, there’s a potential business opportunity here. Transactional Agent Providers may look to third parties to take on the responsibility of being the customer’s legal agent. This already happens in the payments industry where some companies act as the “merchant of record” and take on some liability for the actual provider or manufacturer of products and services sold.

In conclusion, as more Transactional Agents with increasingly advanced capabilities come online every day, customers should choose their Transactional Agent Providers wisely, and Transactional Agent Providers should be proactive in determining the principal-agent legal strategy appropriate for their business.

Diana Stern is Deputy General Counsel at Protocol Labs, Inc. and advises clients in her role as Special Counsel at DLx Law. Dazza Greenwood runs Civics.Com consultancy services, and he founded and leads law.MIT.edu and heads the Agentic GenAI Transaction Systems research project at Stanford’s CodeX.
Thanks to Sarah Conley Odenkirk, art attorney and founder of ArtConverge, and Jessy Kate Schingler, Law Clerk, Mill Law Center and Earth Law Center, for their valuable feedback on this post.

URL for the following original post: https://law.stanford.edu/2025/01/21/from-fine-print-to-machine-code-how-ai-agents-are-rewriting-the-rules-of-engagement-2/

From Fine Print to Machine Code: How AI Agents are Rewriting the Rules of Engagement: Part 2 of 3

January 21, 2025

by Diana Stern and Dazza Greenwood, Codex Affiliate

Your AI shopping assistant is humming along, finding deals and making purchases for your customers. Then one day, it happens: the bot buys 100 self-heating mugs instead of 1, maxes out a customer’s credit card on duplicate Xbox orders, or shares your customer’s shipping address with an unauthorized third party. As the company behind this digital dealmaker (the “Transactional Agent Provider”), what happens when your AI assistant makes mistakes?

As a refresher, in our prior post, we defined Transactional Agents and uncovered why Transactional Agent Providers should be thoughtful about whether they serve as a legal agent for their customers (fiduciary duties abound!). We also identified a new business opportunity for third parties to take on this role.

Mistakes and Errors – at AI Scale

At a practical level, given the myriad possible contract permutations, the Transactional Agent could easily overstep its intended authority by filling in the gaps where its specific direction is not programmed, resulting in unintended obligations for the user (like ponying up enough cash to keep 100 self-heating mugfuls of matcha tea going at once). Will these agreements be binding if the Transactional Agent makes a mistake or exceeds its intended scope of authorization?

The Uniform Electronic Transactions Act (UETA) is a broadly adopted commercial law in the United States with provisions that specifically address errors made during automated transactions conducted by Transactional Agents. For example, one UETA provision on errors permits the user to reverse a transaction if the Transactional Agent did not provide a means to prevent or correct the error. Transactional Agent Providers should understand this provision carefully so that their process flow and ultimate user interaction offer adequate means to prevent or correct these types of errors.

Likewise, under another provision of UETA, if the parties had an agreed security procedure in place and one party failed to abide by that procedure but would have caught the issue if they had, then the other party may be able to reverse the transaction. Even with this uniform law, the legal and practical implications of such changes and errors are complex and largely untested. Would these provisions mean that no transaction conducted by a Transactional Agent should be considered finalized until or unless its user has had an opportunity to review and determine no error requires correction? How long a period of time would be reasonable?

If a Transactional Agent Makes a Mistake, Who is on the Hook?

If a Transactional Agent doesn’t stick to customer instructions and makes a purchasing mistake, several different issues could come up in court. While tort law claims could fill their own textbook (we’ll leave those for our litigator friends), let’s zoom in on the contract law side of things.

In terms (heh) of contract formation, the mistake doctrine could apply. Under the Restatement (Second) of Contracts § 153, a mistake by one party could allow her to get out of the contract if:

The mistake was about a basic assumption on which she made the contract;
The mistake had a material effect on how the contract was carried out that negatively impacted her;
She does not bear the risk of the mistake; and
The other party knew or had reason to know of the mistake or the effect of the mistake would make the contract unconscionable (extremely one-sided or unjust) to enforce.

Whew, that was a mouthful.
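
For readers who think in code, the and/or structure of those four elements can be written as a single boolean expression. This is only a sketch of the logic as summarized above, with hypothetical field names. Each input is a judgment call that a court would make on the facts, which is exactly where the hard questions live.

```python
from dataclasses import dataclass


@dataclass
class MistakeFacts:
    basic_assumption: bool              # mistake concerned a basic assumption of the contract
    material_adverse_effect: bool       # material effect that hurt the mistaken party
    bears_risk: bool                    # mistaken party bears the risk of the mistake
    other_party_had_reason_to_know: bool
    enforcement_unconscionable: bool


def mistake_doctrine_may_apply(facts: MistakeFacts) -> bool:
    """Elements 1 to 3 must all hold, plus at least one branch of element 4."""
    return (
        facts.basic_assumption
        and facts.material_adverse_effect
        and not facts.bears_risk
        and (
            facts.other_party_had_reason_to_know
            or facts.enforcement_unconscionable
        )
    )


# Hypothetical facts: the customer does not bear the risk, and the other
# party had reason to know something was off.
example = MistakeFacts(
    basic_assumption=True,
    material_adverse_effect=True,
    bears_risk=False,
    other_party_had_reason_to_know=True,
    enforcement_unconscionable=False,
)
result = mistake_doctrine_may_apply(example)
```

Flip `bears_risk` to `True`, for instance because a disclaimer shifts the risk to the customer, and the function returns `False` no matter what the other party knew.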

Let’s bring this to life. Say you as the Transactional Agent Provider are acting as your customer’s legal agent, as explained in our last post. The actions your Transactional Agent takes within its scope of authority bind the customer. Let’s say your Transactional Agent books your customer on a trip to Paris, France instead of time-sensitive tickets to a conference in Paris, Texas. Your customer assumed the bot would book destinations accurately, and she would be adversely affected by having plans in France instead of Texas. Even assuming refundable bookings, she might miss her conference in Texas or have to pay higher room rates.

Does the risk of the Transactional Agent booking a trip to the wrong city fall on the customer (does she bear the risk)? What if the Transactional Agent Provider had disclaimers that the customer would bear the risk? Is that enough? Is the risk of Transactional Agents not following instructions so well known that customers bear the risk just by using them? Is that a desirable policy outcome?

And when is the Transactional Agent’s mistake so obvious that the other party should have known? What if the Transactional Agent left a reservation note to the French hotel that the customer was coming for the annual cryptocurrency conference in Paris, Texas? These answers will emerge as industry norms and expectations evolve.

Fortunately, there are ways for Transactional Agent Providers to mitigate some of these risks. As we discussed earlier, the Uniform Electronic Transactions Act (UETA) Section 10(2) offers a powerful tool in this regard. This provision allows customers to reverse transactions if the Transactional Agent did not provide a means to prevent or correct the error. By implementing a user interface and process flow that enables customers to review and correct transactions before they are finalized, providers not only comply with UETA but also establish a strong argument for ratification. If a customer has the opportunity to correct an error but chooses not to, they have arguably adopted the transaction as final. Moreover, this provision of UETA cannot be varied by contract, which means this rule allowing customers to reverse transactions will apply even if providers insert disclaimers or other contract terms insisting the customer holds all responsibility and liability for mistakes and errors committed by the Transactional Agent.

Given this is the law of the land in the U.S., with UETA enacted in 49 states, it is prudent to take these rules seriously. This design pattern – proactively building in error prevention and correction mechanisms – is therefore not just about legal compliance; it’s a fundamental aspect of responsible Transactional Agent development that helps define the point of finality and clarify the allocation of risk. But it’s also just good practice and a fair rule. By implementing these mechanisms, providers can significantly reduce their risk of liability. By embracing error avoidance and corrections protocols in the design and deployment of Transactional Agents, perhaps the most valuable benefit will not be avoiding liability for reversed transactions but legitimately earning Transactional Agent customers’ trust and reliance upon this new technology and way of doing business.
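
One way to picture this design pattern is a review step that sits between what the agent proposes and what actually gets placed. The Python sketch below is illustrative only: the `ProposedOrder` class, its states, and the `submit` gate are hypothetical names, and whether a given flow satisfies UETA is a legal question the code cannot answer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


@dataclass
class ProposedOrder:
    item: str
    quantity: int
    unit_price: float
    status: Status = Status.PENDING_REVIEW

    def summary(self) -> str:
        """Plain-language summary shown to the customer before finality."""
        total = self.quantity * self.unit_price
        return f"{self.quantity} x {self.item} for {total:.2f} total"

    def _require_pending(self) -> None:
        if self.status is not Status.PENDING_REVIEW:
            raise RuntimeError(f"order is already {self.status.value}")

    def correct_quantity(self, quantity: int) -> None:
        # The customer's means to correct an error made by the agent.
        self._require_pending()
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        self.quantity = quantity

    def confirm(self) -> None:
        self._require_pending()
        self.status = Status.CONFIRMED

    def cancel(self) -> None:
        self._require_pending()
        self.status = Status.CANCELLED


def submit(order: ProposedOrder, place_order: Callable[[ProposedOrder], str]) -> str:
    """Only place orders the customer has reviewed and confirmed."""
    if order.status is not Status.CONFIRMED:
        raise PermissionError("customer has not reviewed and confirmed this order")
    return place_order(order)
```

With this gate in place, the 100-mug order from earlier would sit in `PENDING_REVIEW` until the customer corrects the quantity to 1 and confirms. The confirmation also leaves a record that supports the ratification argument.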

Enter the Regulators

Depending on how frequently and how severely Transactional Agents’ mistakes harm customers, regulators like state attorneys general might investigate whether such conduct constitutes unfair or deceptive practices under consumer protection statutes.

Privacy issues add another layer of complexity. When Transactional Agents follow their open-loop model to complete tasks, they may use information in unexpected ways. Your friendly neighborhood shopping assistant might leverage information from your customer’s health-related queries to recommend products for purchase. This raises thorny questions about context integrity, consent, and compliance with privacy frameworks like GDPR, especially when these systems can make complex inferences about customers from seemingly innocuous data.

Designing Transactional Agents for compliance with existing laws is further complicated by certain regulators’ shift toward new, AI-specific laws. For example, last year, Regulation (EU) 2024/1689 (the “EU AI Act”) became the first AI-specific legal framework across the EU. While the EU AI Act makes a nod to existing EU privacy regulations, stating that they will not be modified by the Act, it may prove challenging for companies to comply with both if inconsistencies between the two bodies of law arise as more varied Transactional Agents are deployed. In the U.S., California’s Assembly Bill 2013 Generative Artificial Intelligence: Training Data Transparency will require builders to publish summaries of their training datasets, including whether aspects of the datasets meet certain privacy law definitions, increasing compliance overhead.

And this is just the tip of the agentic iceberg. The legal challenges posed by Transactional Agents bear some resemblance to those faced when open-source software first emerged. Just as the legal and developer communities grappled with novel issues surrounding open source licensing – such as who is liable for a bug in the code – we’re now confronting unprecedented questions about Transactional Agents and liability.

What About Missteps between the Transactional Agent Provider and LLM Provider?

Another persnickety contract-related risk lies in the terms of service between the Transactional Agent Provider and the LLM it uses. In our research, we observed that many LLM providers place a great deal of liability on the Transactional Agent Provider, leaving them with one-way indemnities and uncapped liability for certain claims. Others take a more even-handed approach. One commonality is that they leverage broad principles the Transactional Agent Provider must follow. LLM providers need to account for the innumerable edge cases that emerge when Transactional Agents are released in the wild. These principles range from restrictions against building competing services and circumventing safeguards to compliance with law. While useful for LLM-side lawyers drafting around a large set of risks posed by a rapidly developing technology, these principles become quite complicated when Transactional Agent Providers consider how to make them programmable. You would need to deal with thousands of areas of law in multiple jurisdictions around the world in the context of an open-loop interaction where you cannot predict outputs. Some of this uncertainty can be solved through thoughtful technical architecture that appropriately uses deterministic outputs to mitigate risk, but it’s not the only way.

Stay tuned for our third and final post, where we’ll share more solutions for managing Transactional Agent legal risks. We’ll explore everything from clear delegation frameworks to zero-knowledge proofs.

Diana Stern is Deputy General Counsel at Protocol Labs, Inc. and Special Counsel at DLx Law. Dazza Greenwood runs Civics.Com consultancy services, and he founded and leads law.MIT.edu and heads the Agentic GenAI Transaction Systems research project at Stanford’s CodeX.
Thanks to Sarah Conley Odenkirk, art attorney and founder of ArtConverge, and Jessy Kate Schingler, Law Clerk, Mill Law Center and Earth Law Center, for their valuable feedback on this post.

URL for the following original post: https://law.stanford.edu/2025/03/26/from-fine-print-to-machine-code-how-ai-agents-are-rewriting-the-rules-of-engagement-part-3-of-3/

From Fine Print to Machine Code: How AI Agents are Rewriting the Rules of Engagement: Part 3 of 3

March 26, 2025

by Dazza Greenwood, Codex Affiliate (1) and Diana Stern

In the first two parts of this series, we explored the emergence of AI agents in everyday transactions and the legal risks they pose, particularly concerning agency and liability. We then examined the potential for AI agent errors and the crucial role of user trust. Now, in this final installment, we turn our attention to proactive solutions and “legal hacks” – innovative strategies to embed legal safeguards directly into AI agent systems, minimizing risk and maximizing their transformative potential. (Here are parts one and two of this series.)

Starting Off on the Right Foot

A robust approach to managing AI agents begins with a clear delegation and consent framework, mirroring established protocols in banking where explicit authorization is required for specific transactions. Just as a bank requires explicit authorization for financial actions, users should grant AI agent providers clearly defined authority from the outset. This is not merely a matter of convenience; it’s a fundamental principle of agency law.

An emerging consideration for managing AI agent risks is the potential role of insurance products. Just as professional errors and omissions policies protect human professionals, specialized insurance could provide a valuable safety net for autonomous AI transactions. These products could offer protection for consumers and platforms when AI agents encounter unexpected scenarios or make unintended decisions.

A well-defined scope of authority is crucial because, under agency law, the principal (the user) is bound by the agent’s actions within that scope. This minimizes the risk of unintended legal consequences and establishes a clear audit trail if issues arise. We encourage companies to consider the tradeoffs of taking an agency or independent contractor approach, which we touched on in our first post. In addition, companies might try to take the position that users themselves are taking all of the actions, and the AI agent is only providing access and infrastructure.
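
As a rough illustration of a defined scope of authority plus an audit trail, consider the Python sketch below. The `Delegation` fields (categories, a per-transaction cap, an expiry) are our assumptions about what a grant might contain, not a standard, and a real system would also need identity verification and tamper-resistant logging.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Delegation:
    user_id: str
    allowed_categories: frozenset[str]
    max_per_transaction: float
    expires_at: datetime


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, **entry) -> None:
        self.entries.append(entry)


def authorize(
    grant: Delegation,
    category: str,
    amount: float,
    now: datetime,
    log: AuditLog,
) -> bool:
    """Check a planned transaction against the grant and log the outcome."""
    reasons = []
    if now >= grant.expires_at:
        reasons.append("delegation expired")
    if category not in grant.allowed_categories:
        reasons.append("category not delegated")
    if amount > grant.max_per_transaction:
        reasons.append("amount exceeds limit")
    allowed = not reasons
    # Every decision is recorded, whether it was allowed or refused.
    log.record(
        user_id=grant.user_id,
        category=category,
        amount=amount,
        at=now.isoformat(),
        allowed=allowed,
        reasons=reasons,
    )
    return allowed


log = AuditLog()
grant = Delegation(
    user_id="user-1",
    allowed_categories=frozenset({"travel"}),
    max_per_transaction=500.0,
    expires_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
travel_ok = authorize(grant, "travel", 200.0, now, log)
electronics_ok = authorize(grant, "electronics", 200.0, now, log)
```

Here the travel booking is authorized and the electronics purchase is refused, and both decisions land in the log with their reasons.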

The optimal time to address legal considerations is during the transaction itself – when the AI agent interacts with a seller or counterparty. This is when agreements are formed, terms are established, and responsibilities are defined. While future AI agents might autonomously negotiate aspects of these agreements, a more immediate and powerful solution is the development of standardized transactional terms, analogous to Creative Commons licenses. Imagine a shared library of legal terms, pre-approved and readily understandable by both humans and AI agents. These standardized terms could provide a common framework for AI-driven transactions, ensuring a shared understanding of rights and obligations between the agent, the user, and the counterparty, streamlining legal interactions at scale.

The Human in the Loop: A Well-Intentioned Speed Bump

Traditionally, the answer to risky AI behavior has been to keep a human “in the loop”. While this provides a critical safety net, it also introduces friction and delays. Moreover, many users barely skim, let alone fully comprehend, lengthy terms of service before clicking “I Agree.”

While human oversight remains a necessary precaution in the current stage of AI agent development, particularly for high-value or complex transactions, the ultimate goal is to create agents that can operate autonomously and reliably, with minimal human intervention. Consider a practical scenario: an AI travel booking agent that could autonomously negotiate flexible cancellation policies with service providers based on predefined user preferences. For instance, the agent might secure more lenient terms for a trip to Paris, adapting the booking conditions to match the user’s specific risk tolerance and travel plans. Users could set preferences once and have each new AI agent they use incorporate them.

The traditional approach of “human in the loop,” while providing a safety net, significantly reduces the efficiency and scalability that make AI agents so compelling. Furthermore, the effectiveness of human oversight is questionable, especially when users often accept complex terms of service without careful review. To move beyond these limitations and fully realize the potential of AI agents, we need to explore proactive strategies – “legal hacks” – to embed legal safeguards directly into their design and operation.

Legal Hacks for AI Agents: Addressing What Could Go Wrong

To move beyond the limitations of human oversight and address the inherent legal risks of AI agents, we now explore “legal hacks” – proactive strategies to embed legal safeguards directly into the design and operation of these systems. These “legal hacks” are not about circumventing the law, but rather about leveraging technology to make legal compliance more efficient, reliable, and scalable. Our aim is to create more predictable legal outcomes, reduce reliance on cumbersome human intervention, and potentially offer first-mover advantages to companies that adopt these innovative approaches.

Teaching AI to Read the Fine Print

One powerful “legal hack” is to integrate relevant contractual terms directly into the AI agent’s decision-making process. Instead of treating legal agreements as external constraints, we can make them an integral part of the agent’s operational logic. This could involve platforms providing terms of service in structured, machine-readable formats, potentially via APIs or standardized data formats. AI agents could then be designed to parse this structured legal data, proactively assess potential compliance issues before executing transactions, and ensure alignment with applicable terms. An innovative approach to managing evolving legal terms could involve a broadcast mechanism. When platform terms of service are updated, AI agents could receive immediate notifications, eliminating the need for constant manual checking. This would allow agents to stay aligned with the latest legal requirements without the overhead of repeatedly polling for changes.
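
Here is a minimal sketch of what structured terms plus a broadcast mechanism might look like. The JSON fields (`version`, `automated_purchases_allowed`, `max_quantity_per_order`) are invented for illustration, since no such schema exists yet, and the in-process `TermsFeed` class stands in for whatever notification channel a platform might actually offer.

```python
import json
from typing import Callable


class TermsFeed:
    """In-process stand-in for a platform's terms-update broadcast channel."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, terms_json: str) -> None:
        for callback in self._subscribers:
            callback(terms_json)


class AgentTermsCache:
    """Holds the newest terms the agent has been told about."""

    def __init__(self) -> None:
        self.terms: dict = {}

    def on_update(self, terms_json: str) -> None:
        incoming = json.loads(terms_json)
        # Ignore stale or replayed notifications.
        if incoming["version"] > self.terms.get("version", 0):
            self.terms = incoming

    def allows_order(self, quantity: int) -> bool:
        # With no terms on file, or no permission in them, the answer is no.
        if not self.terms.get("automated_purchases_allowed", False):
            return False
        return quantity <= self.terms.get("max_quantity_per_order", 0)


feed = TermsFeed()
cache = AgentTermsCache()
feed.subscribe(cache.on_update)

feed.publish(json.dumps({
    "version": 1,
    "automated_purchases_allowed": True,
    "max_quantity_per_order": 5,
}))
ok_under_v1 = cache.allows_order(3)

feed.publish(json.dumps({
    "version": 2,
    "automated_purchases_allowed": True,
    "max_quantity_per_order": 2,
}))
ok_under_v2 = cache.allows_order(3)
```

The same three-item order is acceptable under version 1 of the terms and refused under version 2, without the agent ever having to re-fetch and re-read the terms on its own schedule.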

Designing for Compliance: Checkpoints and Balances

This compliance-centric approach requires embedding checkpoints within the AI agent’s workflow. Before executing a transaction, the agent would cross-reference its planned actions against applicable legal terms, flagging potential non-compliance and, if necessary, prompting human review or adjusting its course of action. This creates a system of internal controls, ensuring that the agent operates within defined legal boundaries.
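
A checkpoint like this could be sketched as a function that runs a planned action past a list of rules and returns one of three outcomes: proceed, ask a human, or stop. The rules below (a no-resale clause and a 500-unit review threshold) are made-up examples, and the names are hypothetical. Translating real terms of service into rules like these is the hard part.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    PROCEED = "proceed"
    HUMAN_REVIEW = "human_review"
    BLOCK = "block"


@dataclass(frozen=True)
class PlannedAction:
    kind: str
    amount: float
    resale_intended: bool


@dataclass(frozen=True)
class Finding:
    rule: str
    decision: Decision


Rule = Callable[[PlannedAction], Optional[Finding]]


def no_resale(action: PlannedAction) -> Optional[Finding]:
    # Hypothetical rule: the platform's terms prohibit buying for resale.
    if action.resale_intended:
        return Finding("no-resale clause", Decision.BLOCK)
    return None


def high_value(action: PlannedAction) -> Optional[Finding]:
    # Hypothetical business rule: large purchases need a human's sign-off.
    if action.amount > 500.0:
        return Finding("high-value threshold", Decision.HUMAN_REVIEW)
    return None


def checkpoint(
    action: PlannedAction, rules: list[Rule]
) -> tuple[Decision, list[Finding]]:
    """Run every rule, then return the strictest outcome and all findings."""
    findings = []
    for rule in rules:
        finding = rule(action)
        if finding is not None:
            findings.append(finding)
    if any(f.decision is Decision.BLOCK for f in findings):
        return Decision.BLOCK, findings
    if findings:
        return Decision.HUMAN_REVIEW, findings
    return Decision.PROCEED, findings


rules = [no_resale, high_value]
small = checkpoint(PlannedAction("purchase", 40.0, False), rules)
large = checkpoint(PlannedAction("purchase", 900.0, False), rules)
resale = checkpoint(PlannedAction("purchase", 900.0, True), rules)
```

Returning the findings alongside the decision gives the provider a record of why the agent proceeded, paused, or stopped.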

The Devil in the Details: Challenges and Considerations

Implementing this approach is not without challenges. Terms of service are often lengthy, complex, and ambiguous. Teaching an AI to interpret and apply these terms requires sophisticated natural language processing and a deep understanding of legal principles. Furthermore, we must be mindful of the unauthorized practice of law (UPL). If an AI agent were to directly advise users about complex legal terms or offer legal interpretations, it could potentially be construed as UPL. One way to mitigate this risk is to design these compliance tools primarily for the benefit of the AI agent provider. By focusing on internal compliance checks and business rule enforcement, the tool helps the provider ensure the AI operates within legal boundaries, while the AI agent itself communicates only business restrictions or options to the user, rather than direct legal advice.

The Future: AI-Friendly Terms of Service

Looking ahead, we envision a future where terms of service are designed specifically for AI comprehension. Platforms could create computational versions of their terms, optimized for machine readability while maintaining legal validity. This could involve a standardized format, perhaps analogous to the ‘robots.txt’ file that web crawlers use to understand website rules. In fact, today, AI agent developers are already updating business websites to ensure they are easily readable by LLMs and AI agents by providing a plain text version of the information. The ‘llms.txt’ specification is the main way people are doing this. A website’s terms of service could be put into llms.txt format today, making this legal hack immediately and easily achievable. In the future, an llms.txt file could provide additional legal and compliance requirements for AI agents operating on a given platform, making legal expectations clear and accessible. Furthermore, attribution fields, similar to those that some AI APIs like Google Gemini use to cite sources, could be extended to include metadata identifying the responsible party for an AI agent’s actions, which would enhance transparency and accountability in AI-driven transactions. Taking it even further, in the future, these machine-readable terms of service could roll up into immediately understandable summaries for end users who might want to filter by, for example, AI agents that act as a legal agent (as opposed to those that take the alternative independent contractor or infrastructure approaches referenced above).
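
To show how little is needed to try this today, the Python sketch below renders a small llms.txt-style file that points agents at a site's terms. It follows the general shape of the llms.txt proposal as we understand it (a title, a short blockquote summary, then sections of annotated links). The "Legal" section, the site name, and the example.com URLs are our own hypothetical additions and are not defined by the proposal.

```python
def render_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a title, summary, and link sections as llms.txt-style Markdown."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


document = render_llms_txt(
    site_name="Example Shop",
    summary="Online store for kitchen appliances. AI agents are welcome.",
    sections={
        # Hypothetical section name; the proposal does not define legal fields.
        "Legal": [
            (
                "Terms of Service",
                "https://example.com/terms.md",
                "Full terms, including rules for automated purchases",
            ),
            (
                "Returns Policy",
                "https://example.com/returns.md",
                "How orders can be corrected or reversed",
            ),
        ],
    },
)
```

The output is ordinary text that a site could serve as-is. The open question is which section names and fields agents would come to rely on, which is where standardization would help.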

On the Horizon: Leveraging Zero Knowledge Proofs

Another groundbreaking “legal hack,” particularly relevant to addressing privacy concerns highlighted in our second post, lies in the realm of cryptography: zero-knowledge proofs. A zero-knowledge proof is a cryptographic method that allows one party (the prover) to convince another party (the verifier) that a statement is true, without revealing any information beyond the validity of the statement itself. Imagine you have a magic door that only opens if you know a secret password. You want to prove to someone that you know the password without actually telling them what it is. A zero-knowledge proof would allow you to do just that. You could interact with the door in a way that demonstrates you can open it, convincing the other person you know the secret without ever revealing the password itself.
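
The magic door has a classic real-world counterpart: the Schnorr identification protocol, in which the "password" is a secret number x and the prover shows it knows x without sending it. The Python sketch below runs one round of that protocol. It is a teaching toy under loud assumptions: the parameters are far too small to be secure, the protocol is interactive, and its zero-knowledge guarantee holds for a verifier who picks challenges honestly. It is not drawn from any AI agent product.

```python
import secrets

# Toy parameters, far too small to be secure: G generates a subgroup of
# prime order Q in the integers modulo the prime P (2**11 % 23 == 1).
P, Q, G = 23, 11, 2


def keygen() -> tuple[int, int]:
    """Return (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1  # secret in the range 1..Q-1
    return x, pow(G, x, P)


def commit() -> tuple[int, int]:
    """Prover picks a fresh random r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def challenge() -> int:
    """Verifier replies with a random challenge c."""
    return secrets.randbelow(Q)


def respond(x: int, r: int, c: int) -> int:
    """Prover answers with s = r + c*x mod Q; r masks the secret x."""
    return (r + c * x) % Q


def verify(y: int, t: int, c: int, s: int) -> bool:
    """Verifier checks G^s == t * y^c mod P without ever seeing x."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


# One round of the protocol.
x, y = keygen()
r, t = commit()
c = challenge()
s = respond(x, r, c)
accepted = verify(y, t, c, s)
```

The verifier sees only y, t, c, and s. Because r is random and used once, s reveals nothing about x on its own, yet a prover who does not know x will fail the check for any nonzero challenge.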

In the context of AI agents, zero-knowledge proofs could enable agents to process sensitive data – such as personal information required for a purchase – without actually revealing that data to the agent itself, the platform, or other parties. This significantly enhances user privacy and reduces the risk of data breaches, key considerations highlighted by privacy regulations. For AI agent providers, incorporating zero-knowledge proofs could minimize the amount of sensitive data they collect, simplifying compliance with privacy regulations.

Conclusion: Code as Law 2.0 – Architecting the Digital Future

Companies that pioneer these “legal hacks” – from AI-readable terms of service and standardized transactional terms to compliance checkpoints and zero-knowledge proofs – are not simply adapting to a changing legal landscape; they are actively shaping it. These innovations represent a fusion of law and code, creating a “Code as Law 2.0” paradigm that has the potential to revolutionize digital interactions. By embedding legal safeguards directly into AI agents, we can reduce compliance costs, mitigate legal risks, enhance user trust, and unlock new global markets. As AI agents become increasingly sophisticated and autonomous, embracing these proactive legal strategies will be essential for responsible innovation and building a more trustworthy, efficient, and equitable digital future. The question is not if the industry will adopt AI agents for transactions, but how quickly will you adapt to this emerging future and gain advantage over those who lag behind?

(1) Dazza Greenwood runs Civics.Com consultancy services, and he founded and leads law.MIT.edu and heads the Agentic GenAI Transaction Systems research project at Stanford’s CodeX. Diana Stern is Deputy General Counsel at Protocol Labs, Inc. and Special Counsel at DLx Law.
Thanks to Sarah Conley Odenkirk, art attorney and founder of ArtConverge, and Jessy Kate Schingler, Law Clerk, Mill Law Center and Earth Law Center, for their valuable feedback on this post.

URL for the following original post: https://innovation.consumerreports.org/defining-loyalty-for-ai-agents-insights-from-the-stanford-ai-agents-x-law-workshop/

May 5, 2025

Defining ‘Loyalty’ for AI Agents: Insights from the Stanford AI Agents x Law Workshop

By Dazza Greenwood

AI agents are rapidly moving from science fiction to daily reality. These sophisticated software systems promise to manage tasks, conduct transactions, and augment our capabilities in unprecedented ways. But as they become more integrated into our lives, critical questions arise: Whose interests will they serve? How can we ensure they act reliably and responsibly on our behalf?

These questions were at the heart of the AI Agents x Law Workshop, held on April 8th, 2025, at Stanford Law School. Part of an ongoing research initiative affiliated with Stanford CodeX and law.MIT.edu, the event brought together legal experts, technologists, founders, and consumer advocates in collaboration with the Stanford HAI Digital Economy Lab and the Consumer Reports (CR) Innovation Lab to map the complex legal and ethical terrain of emerging AI agent technologies. This event marked the beginning of a focused effort by these organizations to collaboratively define actionable standards and practices for consumer-centric AI.

The overarching goal, echoed throughout the day, was to foster an ecosystem where AI agents are built to be trustworthy, safe, and aligned with the best interests of the individuals they serve – agents that work for people, not on them. This first post in a series will provide a brief overview of the workshop and then dive deeper into one of the central themes discussed: What does it mean for an AI agent to be “loyal” to its user?

Setting the Stage: The Quest for Consumer-Centric Agents

The workshop kicked off with framing remarks emphasizing the high stakes. Professor Sandy Pentland (MIT/Stanford HAI) highlighted the intense industry interest driven not just by opportunity, but by liability concerns. Companies recognize the need for evidence-based best practices and standards to ensure agent systems don’t go off the rails, potentially leading to significant harm and legal challenges. The vision? To move towards agents that could potentially act as legal fiduciaries for their users.

Ben Moskowitz, VP of Innovation at CR, explained CR’s commitment to this vision. He spoke of “consumer-authorized agents” designed to empower users in the marketplace – tools that research, buy, and troubleshoot effectively, advocating tirelessly for consumer interests. He stressed that achieving this requires tackling normative questions, technical challenges, and defining clear expectations for agent behavior, underscoring CR’s dual role in both consumer protection advocacy and proactive product R&D to help build the desired future. Ben specifically called for consumer platforms like CR to help develop standardized testing methodologies to validate agent claims—echoing CR’s historical role in product reliability assessments.

What Does a “Loyal” AI Agent Mean for Consumers?

This fundamental question of loyalty was a recurring theme, explored in depth by myself (Dazza Greenwood, Stanford CodeX/law.MIT.edu) and Diana Stern (Deputy GC at Protocol Labs & Special Counsel at DLx Law), a leading Silicon Valley lawyer and collaborator on a pre-workshop blog series on this topic.

Imagine Bob, circa 2026, needing a new dishwasher. Instead of wading through endless online reviews and potentially misleading sponsored content, he asks his AI agent: “Find me the best dishwasher for my needs and budget.”

A “loyal” agent, operating under a duty of loyalty, would prioritize Bob’s stated interests. It would analyze objective information, compare features based on Bob’s criteria (price, efficiency, reliability ratings, specific features), and recommend the option that genuinely best serves Bob. Its internal logic and external actions would be aligned with maximizing Bob’s benefit.

An agent not bound by loyalty, however, might operate differently. Its recommendations could be skewed by hidden incentives. Perhaps it prioritizes dishwashers from manufacturers who pay the agent provider the highest commission or kickback. Maybe it highlights models from advertising partners, even if they aren’t the best fit for Bob. Bob might still get a dishwasher, but likely not the best one for him, potentially paying more or getting a less suitable product.

This “duty of loyalty” concept, central to traditional agency law (as seen in the “Iron Triangle” diagram), suggests a model where the agent provider is legally and ethically bound to put the user’s interests first within the scope of their relationship.

Beyond Promises: The Link Between Legal Frameworks & Technical Reality

The workshop discussion highlighted that merely claiming loyalty in a terms of service document isn’t sufficient. True loyalty must be reflected in the agent’s underlying architecture and behavior. As Ben Moskowitz prompted, what happens if an agent claims loyalty but acts otherwise, perhaps due to flawed design, negligence, or even intentional bias in its programming?

This necessitates observability and verifiability of agent decisions. We need ways to assess whether an agent is actually acting loyally. Can we technically test if its information processing and decision-making are free from undue influence from third-party interests or the provider’s own conflicting business models? Can we evaluate if it consistently prioritizes the user’s goals as instructed? This technical dimension is inseparable from the legal promise. Workshop attendees identified promising technical approaches, such as independent “agent audits” and sandboxed simulations—methods CR could lead or facilitate—to objectively measure an agent’s adherence to consumer-first standards.

Diana Stern’s work, which we discussed, further illuminates this by outlining different potential relationship models between agent providers and users:

Fiduciary: The highest standard, embedding a duty of loyalty (as discussed above)
Technology Provider: The opposite extreme, where the provider essentially says, “We just provide the tool; you bear all the risk,” disclaiming liability (as seen in some current terms)
Contractor: An intermediate model where duties and responsibilities are defined by a specific contract or scope of work, potentially mixing elements of service provision with limited obligations.

Choosing a model has profound implications for user trust and provider liability. While the “technology provider” stance might seem safest legally for the provider, the “fiduciary” approach, despite its higher bar, could become a significant competitive differentiator, attracting users seeking agents they can genuinely trust.

Looking Ahead

Establishing loyalty is foundational, but it’s just one piece of the puzzle. The AI Agents x Law workshop also explored critical mechanisms for handling agent errors (leveraging UETA Section 10(b)), the challenges of authorizing agents securely (authenticated delegation), the impact of agents on legal practice and labor, and the need for robust evaluation methods (“evals”) to ensure agent performance and alignment. Future posts will explore other crucial topics surfaced during the workshop, such as error handling and the implications of new protocols like Agent-to-Agent (A2A) communication. Stay tuned for more.

The transition to an agent-driven world requires careful thought, collaboration, and proactive design. By bringing together diverse perspectives, initiatives like this aim to develop the frameworks, standards, and technical solutions needed to ensure AI agents enhance, rather than undermine, consumer welfare and market fairness. To this end, CR is exploring prototype tests and interactive demos, aiming to make loyalty measurable and visible to everyday users.

Interested in how AI agents can better serve people? Want to help define that future? We’d love to hear from you. Reach out to us anytime at innovationlab@cr.consumer.org.

URL for the following original post: https://innovation.consumerreports.org/my-agent-messed-up-understanding-errors-and-recourse-in-ai-transactions/

May 19, 2025

My Agent Messed Up! Understanding Errors and Recourse in AI Transactions

By Dazza Greenwood

In my previous post, I shared highlights from Stanford CodeX’s AI Agents x Law Workshop exploring how we might foster an ecosystem where AI agents are built to be trustworthy, safe, and aligned with the best interests of the individuals they serve. In this post, I’ll dive into Section 10(b) of the Uniform Electronic Transactions Act (UETA), a previously obscure provision that has suddenly become critically relevant as AI-driven agents increasingly mediate commercial transactions.

Setting the Scene

Imagine asking your new AI shopping assistant to order a specific book, only to find 10 copies arriving at your door. Or perhaps it books a flight to Paris, France, instead of Paris, Texas, for that crucial conference. As AI agents move beyond providing information to actively conducting transactions on our behalf – buying goods, booking services, managing finances – the potential for costly errors increases. What happens then? Who is responsible, and what recourse do you have?

While the technology feels cutting-edge, part of the answer lies in a surprisingly relevant piece of legislation from the dawn of the internet age: the UETA. Enacted in 49 states and territories around 1999 to give legal validity to electronic signatures and records, UETA showed remarkable foresight by including provisions specifically addressing “electronic agents.” These rules, particularly Section 10(b) concerning errors, are once again pertinent with the rise of powerful LLM-driven agents.

UETA Section 10(b): The Right to Undo Agent Errors

UETA Section 10(b) provides a critical safeguard for individuals when an electronic agent introduces an error into a transaction. In plain terms:

If an electronic agent makes a mistake during a transaction (one you didn’t intend), and…
You, the user, were not provided with a reasonable “means to prevent or correct the error” by the agent’s provider…
Then, you generally have the legal right to “avoid the effect” of the erroneous transaction – essentially, to reverse or undo it.

This isn’t about agents giving bad advice – that might fall under different legal principles like negligence or deceptive practices. UETA Section 10(b) specifically targets situations where the agent itself, operating autonomously, messes up the action of the transaction.

Crucially, this right to reverse the transaction cannot simply be waived by fine print in the terms of service.
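
To make the shape of that rule concrete, here is a minimal Python sketch of the three-part test as summarized above. It is an illustration of this post's plain-terms reading, not legal advice and not statutory text; the `AgentTransaction` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentTransaction:
    """Hypothetical record of one agent-mediated transaction."""
    error_by_agent: bool           # the agent did something the user did not intend
    correction_offered: bool       # provider gave a reasonable means to prevent or correct
    terms_disclaim_reversal: bool  # fine print claims the user bears all errors


def user_may_avoid(tx: AgentTransaction) -> bool:
    """Rough reading of the rule as summarized in this post."""
    # The disclaimer flag is deliberately ignored: per the post, this right
    # cannot be waived by terms of service.
    return tx.error_by_agent and not tx.correction_offered
```

Note that `terms_disclaim_reversal` never changes the outcome, which is the point of the non-waiver rule.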

The Provider’s Role: Building the Escape Hatch

The key phrase here is the “means to prevent or correct the error.” This puts the onus squarely on the company providing the AI agent service. If they want to ensure the transactions conducted by their agents are considered final and legally binding, they must build mechanisms that give the user a fair chance to catch and fix mistakes before they become irreversible problems.

What does this look like in practice? At Stanford CodeX’s AI Agents x Law Workshop, Andor Kesselman presented a compelling open-source demo showcasing exactly this. Implementations might include:

Clear Confirmation Prompts: “You are about to purchase 10 widgets for $100. Confirm or Cancel?”
Review Steps: Allowing users to review order details before final submission
Spending Limits or Threshold Alerts: Flagging unusually large or atypical transactions for human verification
Accessible Error Reporting: Clear paths for users to report issues promptly
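
As a rough sketch of how the first three items might fit together, the Python below gates an order behind a confirmation prompt and flags totals above a spending limit. This is not the demo presented at the workshop; `Order`, `review_and_confirm`, the `ask_user` callback, and the default limit are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Order:
    item: str
    quantity: int
    unit_price: float

    @property
    def total(self) -> float:
        return self.quantity * self.unit_price


def review_and_confirm(
    order: Order,
    ask_user: Callable[[str], bool],
    spending_limit: float = 200.0,  # assumed threshold, set per user in practice
) -> str:
    """Show the order to the user and submit only on explicit confirmation."""
    prompt = (
        f"You are about to purchase {order.quantity} x {order.item} "
        f"for ${order.total:.2f}. Confirm?"
    )
    # Flag unusually large orders so the user looks twice before confirming.
    if order.total > spending_limit:
        prompt = f"[Above your ${spending_limit:.2f} limit] " + prompt
    return "submitted" if ask_user(prompt) else "cancelled"
```

In a real agent, `ask_user` would be whatever surface reaches the human (a chat message, a push notification, a web form). The design choice that matters is that nothing is submitted until that callback returns a yes.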

As Diana Stern and I noted in a recent Stanford CodeX article:

“By implementing a user interface and process flow that enables customers to review and correct transactions before they are finalized, providers not only comply with UETA but also establish a strong argument for ratification… This design pattern – proactively building in error prevention and correction mechanisms – is therefore not just about legal compliance; it’s a fundamental aspect of responsible Transactional Agent development that helps define the point of finality and clarify the allocation of risk. But it’s also just good practice and a fair rule.”

Why This Matters Now More Than Ever

While UETA is over two decades old, its provisions on automated transactions and error handling are stepping into the spotlight. The “electronic agents” envisioned then were largely deterministic; today’s LLM-powered agents are far more complex and unpredictable, making robust error handling even more vital.

Because of UETA Section 10(b), consumers have a powerful legal remedy if an agent transaction goes wrong and the consumer wasn’t given a chance to fix it. For businesses deploying AI agents, UETA Section 10(b) is a clear mandate: building effective, transparent error prevention and correction isn’t just good customer service – it’s a legal necessity for ensuring transaction finality, mitigating liability, and ultimately, earning user trust in this new era of automated commerce.

Looking Ahead

While we’ve explored the importance of loyalty in AI agents and the legal frameworks for handling their errors, it’s also crucial to recognize that agents are no longer acting alone—they’re starting to talk to each other. My final post in this series will dive into the emerging world of Agent-to-Agent (A2A) communication and what it means for consumers.


URL for the following original post: https://innovation.consumerreports.org/agents-talking-to-agents-a2a-reshaping-the-marketplace-and-your-power/

May 30, 2025

Agents Talking to Agents (A2A): Reshaping the Marketplace and Your Power

By Dazza Greenwood

In previous posts, we explored the importance of loyalty in AI agents and legal frameworks like the Uniform Electronic Transactions Act (UETA) for handling their errors. But the next evolution is already here: agents aren’t just acting solo; they’re starting to talk to each other. This Agent-to-Agent (A2A) communication, recently standardized by protocols like Google’s open-source A2A initiative, is poised to fundamentally reshape digital marketplaces and potentially shift significant power towards consumers.

While the technical details involve standardizing how different agents discover, communicate, and collaborate, the implications go far beyond mere plumbing. Think of it less like upgrading pipes and more like building the interconnected highways for an entirely new kind of commerce and interaction, operating at machine speed.

Market Disruption at Machine Speed

As discussed during Stanford CodeX’s AI Agents x Law Workshop, the widespread adoption of A2A protocols could trigger market shifts reminiscent of how High-Frequency Trading transformed finance, but on a much broader scale.

Hyper-Speed Transactions: Agents negotiating and executing deals directly with other agents bypass human bottlenecks, accelerating everything from price discovery to order fulfillment
New Intermediaries (and Disintermediation): Just as electronic trading created new market makers, A2A will likely spawn new kinds of digital intermediaries – agent “matchmakers,” reputation brokers, or specialized negotiation agents. Simultaneously, it could disintermediate existing players who rely on friction or information asymmetry. As highlighted in our workshop discussions, we might even see waves of “redisintermediation” as the ecosystem rapidly evolves.
Dynamic Competition: Standardized communication lowers the barrier for entry. Specialized agents focusing on specific tasks (like finding the absolute lowest price or negotiating the best warranty) can plug into the ecosystem, fostering intense competition based on capability and value.

Unlocking Consumer Power Through Interoperability

This is where A2A becomes particularly exciting from a consumer perspective. An open standard for agent communication directly enables:

Real Choice Among Agents: If agents can talk to each other via A2A, you’re not locked into a single provider’s ecosystem. You could choose a primary “concierge” agent from one company but employ a specialized “deal-hunting” agent known for its fierce loyalty from another, knowing they can collaborate effectively on your behalf. This interoperability is the bedrock for a competitive market where truly pro-consumer agents can thrive.
Agents as “Legal Hacks”: Remember the challenge of impenetrable terms and conditions? As explored by legal minds like Diana Stern during our workshop, AI agents, facilitated by A2A’s ability to interact with diverse services in a standardized way, could become powerful tools for navigating this complexity. Imagine instructing your agent: “Find me the retailer with the best price and the most consumer-friendly return policy according to these specific criteria.” A2A provides the rails for your agent to query, parse, and compare these terms across multiple sellers automatically.
Potential for Collective Action: The idea of a “union of agents” becomes more feasible. Platforms coordinating numerous consumer agents via A2A could potentially aggregate demand or negotiate terms collectively. Imagine thousands of agents simultaneously signaling preference for merchants who meet specific data privacy standards or offer extended warranties, creating collective bargaining power at an unprecedented scale and speed.
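
To illustrate the “best price and most consumer-friendly return policy” instruction above, here is a small Python sketch of the comparison step only. It assumes the agent has already gathered each seller's terms by some means; it does not use the A2A protocol itself, and the `Offer` fields and `best_offer` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """Hypothetical, already-parsed terms from one seller."""
    seller: str
    price: float
    return_window_days: int
    free_returns: bool


def best_offer(offers: List[Offer], min_return_days: int) -> Optional[Offer]:
    """Pick the cheapest offer that meets the user's return-policy criteria."""
    eligible = [
        o for o in offers
        if o.free_returns and o.return_window_days >= min_return_days
    ]
    if not eligible:
        return None
    # Lowest price wins; a longer return window breaks ties.
    return min(eligible, key=lambda o: (o.price, -o.return_window_days))
```

The hard part in practice is the step this sketch skips: getting sellers to expose their terms in a form an agent can query and parse, which is where a standard communication layer would help.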

The Road Ahead: Opportunity & Responsibility

The emergence of A2A protocols marks a pivotal moment. It offers the potential not only for vastly more efficient and dynamic markets, but also for new avenues of consumer empowerment, choice, and leverage. However, realizing this positive potential requires conscious effort.

Ensuring these protocols remain open, fostering genuine competition among agent providers, demanding transparency in how agents operate, and building robust mechanisms for accountability (like the UETA error handling discussed previously) are crucial next steps. Consumer Reports and collaborators at Stanford and MIT are actively researching and prototyping in this space, working to ensure that as agents learn to talk to each other, they do so in ways that ultimately benefit the consumers they serve.

The agent-to-agent future is rapidly approaching. By understanding the underlying technology and advocating for consumer-centric principles in its development, we can help shape a marketplace that is not only faster and smarter, but also fairer.


On AI Regulation "Third-Way"

Dazza Greenwood — Fri, 16 May 2025 05:57:04 GMT

Earlier today I appeared before the Wyoming Legislature’s Joint Select Committee on Blockchain, Financial Technology & Digital Innovation Technology to outline a practical path for governing artificial-intelligence systems without throttling innovation.

In my testimony, I presented California's SB 813 as a potential "third way" for AI regulation—a middle path between heavy-handed restrictions and complete absence of oversight. This approach creates voluntary certification through Multi-stakeholder Regulatory Organizations (MROs) that can verify AI systems meet safety and reliability standards. Certified systems gain a rebuttable presumption of "reasonable care" in tort cases—creating a powerful incentive for responsible innovation without mandating specific technical approaches.

The economic implications of AI agent systems formed a central focus of our discussion. These autonomous AI systems are already transforming software engineering, legal services, and commercial transactions. Companies like Perplexity and Amazon are deploying AI agents that can conduct transactions and make purchases on users' behalf, while Stripe now offers tools for businesses to authorize AI agents to make direct payments.

The economic boost could reach 3-5% of GDP by 2030, yet the same technology that scales productivity can displace jobs or amplify malicious actors. During questioning I discussed authenticated delegation protocols that tie every agent action to a verifiable human or legal entity, limiting liability drift and curbing fraud, and urged pairing flexible certification with robust up-skilling programs rather than blunt “human-in-the-loop” mandates that freeze scalability.
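
For readers wondering what “tying every agent action to a verifiable human or legal entity” could look like mechanically, here is a deliberately simplified Python sketch. It is an assumption-laden stand-in: it uses a shared-secret HMAC from the standard library, whereas real authenticated delegation proposals would more likely rely on public-key credentials and existing identity standards. The function names and claim fields are hypothetical.

```python
import hashlib
import hmac
import json


def issue_delegation(secret: bytes, principal: str, agent_id: str, scope: list) -> dict:
    """Sign a statement that `principal` authorizes `agent_id` for `scope`."""
    claims = {"principal": principal, "agent_id": agent_id, "scope": sorted(scope)}
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_action(secret: bytes, token: dict, action: str) -> bool:
    """Accept an action only if the token is intact and the action is in scope."""
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return False  # claims were altered after signing
    return action in token["claims"]["scope"]
```

The useful property is that a counterparty can check both who stands behind the agent and whether a given action was actually authorized, and any tampering with the claims invalidates the signature.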

What's particularly striking is how quickly these technologies are moving from research concepts to everyday deployment. When I first testified to this committee on generative AI, many of these capabilities seemed theoretical. Today, they're commercially available. This rapid evolution suggests we need frameworks that can adapt as quickly as the technology while providing necessary guardrails around high-risk applications.

The committee demonstrated a sophisticated understanding of the challenges, asking thoughtful questions about security implications of foreign AI models, intellectual property concerns with training data, and evolving approaches to human oversight requirements. As Senator Rothfuss noted, Wyoming has a tradition of "regulating to enable rather than restrict"—a philosophy perfectly suited to this moment of technological transformation.

I've been honored to work with the Wyoming legislature over several years as they've crafted blockchain legislation and other digital innovation frameworks. Their approach of careful listening, thoughtful questioning, and balanced policy-making continues to serve as a model for how states can navigate technological disruption. I look forward to continuing this important conversation at future hearings as we work toward frameworks that unlock AI's benefits while mitigating potential harms.

May 16, 2025 Update: Further Thoughts on AI Regulation, MROs & a Path to Interstate Co-operation

After I posted my Wyoming testimony on multistakeholder regulatory organizations (MROs), Nancy (Leyes) Myrland left an insightful LinkedIn comment that zeroed in on three issues:

Will state-level guardrails still matter if Washington eventually centralises AI oversight?
How often would a “trustworthy” badge have to be renewed when models evolve daily?
Are California-only guardrails enough, or must other states join for real protection?

I posted a brief reply to Nancy on LinkedIn, but the character limit is tight and her questions invite a richer look at both California’s SB 813 and an idea I sketched for the legislature: inter-state reciprocity. So let’s go deeper!

Nancy’s questions—answered

1 | Will a future federal regulator make state action moot?

Not at all. SB 813 obliges every MRO to spell out “an approach to interfacing effectively with federal and non-California state authorities”. In American law we repeatedly see innovations flow bottom-up: Blue-Sky securities rules, driver-licence compacts, the Uniform Commercial Code. States are nimble laboratories; Congress often scales what they prove. A running California MRO framework gives Washington a tested chassis to bolt onto.

2 | How often does “trustworthy” recertification happen?

Model-level triggers. Each MRO plan must define technical thresholds for updates requiring renewed certification. If a developer adds autonomous code-execution or a new multimodal dataset that crosses the line, the certificate pauses until a fresh audit clears it, much like the FDA’s 510(k)/PMA split for medical devices.
MRO-charter clock. An MRO’s own designation lasts three years and can be ripped up sooner if independence erodes, its methods become obsolete, or a certified model causes major harm. Oversight of the overseers updates at least as fast as the tech.

3 | Are one-state guardrails enough?

SB 813 already covers any AI deployed in California, so most national providers will seek certification. Still, a genuine safety net needs more than one state’s knots. Enter reciprocity.

Expanding the vision: a practical path to interstate AI reciprocity

While SB 813 gives California a robust foundation, legislators in Wyoming (and elsewhere) asked how to spread the benefit without fifty separate audits. The answer I proposed is an interstate reciprocity layer. It is not yet in SB 813; rather, it is a natural extension that lets developers certify once and be recognised in many jurisdictions, while each state keeps the power to yank recognition the minute another state’s protections slip.

4.1 A simple legislative starting-point

To switch reciprocity on, California (or any pioneering state) could add a single sentence to its safe-harbor section. Something like:

“A certificate issued under a substantially equivalent multistakeholder regulatory framework of another state shall confer the same rebuttable presumption, unless the Attorney General determines that framework no longer affords equivalent protections.”

That one clause empowers the AG to recognise outside frameworks and keep a live list of reciprocal states.

4.2 What “substantially equivalent” could mean

The phrase must have teeth. An outside framework would need to meet, at minimum, these pillars:

Comprehensive risk scope —covers CBRN, malign persuasion, autonomy, exfiltration.
Guaranteed independence —board composition and funding caps that block capture.
Transparency & accountability —public annual reports and decade-long record retention.
Robust enforcement —real-time power to revoke certificates when models drift.
Continuing governmental oversight —periodic review of each MRO by its home-state AG (or equivalent).
Collaborative data-sharing —MOUs so AG offices trade incident reports, best-practice memos and evolving threat intel in near-real time.

4.3 Making reciprocity work: procedural mechanics

Public registry & dynamic review. California’s AG would publish the recognised-states list; every listing sunsets (say) in three years, forcing re-inspection so standards evolve with the science.
Agile de-recognition. If State X’s MRO weakens or certifies a reckless model, California can strike that state overnight—integrity preserved, no legislative lag.
Interstate compact option. For deeper ties, two or more states could enshrine reciprocity in a compact, driver-licence-style. The Uniform Law Commission could draft model language so Wyoming and New Jersey start from the same page.
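
The registry mechanics above (listings that sunset, plus overnight de-recognition) are simple enough to sketch in a few lines of Python. This is only an illustration of the proposal: the `ReciprocityRegistry` class is hypothetical, and the three-year sunset is the “say” figure from the text, approximated here as 3 x 365 days.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ReciprocityRegistry:
    """Hypothetical list of states whose certificates are recognised."""
    sunset_years: int = 3
    _listings: dict = field(default_factory=dict)  # state -> expiry date

    def recognise(self, state: str, on: date) -> None:
        # Every listing carries its own expiry, forcing periodic re-inspection.
        self._listings[state] = on + timedelta(days=365 * self.sunset_years)

    def derecognise(self, state: str) -> None:
        # Agile de-recognition: takes effect immediately.
        self._listings.pop(state, None)

    def is_recognised(self, state: str, today: date) -> bool:
        expiry = self._listings.get(state)
        return expiry is not None and today < expiry
```

A certificate from another state would confer the presumption only while `is_recognised` returns true for that state on the relevant date.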

4.4 Why stake-holders win

Developers: one dossier, many states—lower friction, stronger incentive to certify.
States: pooled expertise and shared intel, yet full power to slam the door if another jurisdiction backslides.
Public: consistent guardrails and quicker access to vetted AI.
Nation: a bottom-up baseline forms while Congress deliberates—innovation and safety advance together.

4.5 Guardrails & challenges

Reciprocity must never spark a race to the bottom. That is why listings sunset and why de-recognition is swift. And remember: the safe harbor is narrow and rebuttable; it shields developers only against personal-injury and property-damage claims, not consumer-protection, privacy, or civil-rights suits. Participation is voluntary; immunity is limited.

Additional clarifications

Transparency. MRO plans are filed with the AG; future regulations should publish them (redacting trade secrets) to build public trust.
Built-in safeguards. Whistle-blower protections (§ 8898.2(a)(7)), mandatory incident reports (§ 8898.2(a)(3)) and auditing of post-deployment practices (§ 8898.2(a)(1)) are core plan elements.

Closing – laboratories at work

I’ve spent my career in state-powered innovation: drafting the Uniform Electronic Transactions Act, co-ordinating early digital-signature standards, steering multi-state mega-procurements that pooled demand for better pricing, building open-source repositories shared across agencies, and countless other projects where states proved nimbler and bolder than Washington. More recently we’ve seen states pioneer everything from digital identity and electronic notarization to friction-less sales-tax collection. SB 813 stands firmly in that tradition: nimble, incentive-driven, and ready for replication.

Could your state benefit from a “certify once, recognised many” approach? I’m eager to refine these ideas with lawmakers, technologists and advocates. Drop me a comment at Civics.Com/contact and let’s keep building trustworthy AI, the federalist way.

AI Agents x Law Initiative

Dazza Greenwood — Wed, 09 Apr 2025 18:42:04 GMT

I'm thrilled to have convened the inaugural event marking the launch of an exciting new research and development initiative at Stanford University, in close collaboration with industry leaders and experts focused on AI Agents.

This kickoff workshop, co-presented by Stanford CodeX, MIT Computational Law Report, Stanford HAI Digital Economy Lab, and Consumer Reports Innovation Lab, began a crucial conversation about the legal dimensions and innovative applications of AI Agents.

April 8th, 2025 Inaugural Workshop Program

Introductions

Speaker: Dazza Greenwood

Welcome Remarks

Speaker: Sandy Pentland

Setting the Context for AI Agents x Law

Speaker: Dazza Greenwood

Legal Issues and Options for AI Agents Conducting Transactions

Speaker: Diana Stern

Legal Practice and Innovating Law with AI Agents

Speaker: Damien Riehl

Open Source Demo Example of Legal Error Handling for AI Agent

Speaker: Andor Kesselman

Authenticated Delegation of Authority for AI Agents

Speaker: Tobin South

You can view the session recording here and embedded above, and learn more or share your insights via this feedback form at https://computationallaw.org

Unleashing Creativity with OpenAI’s New Agents SDK

Dazza Greenwood — Wed, 12 Mar 2025 00:25:53 GMT

I’m thrilled to dive into OpenAI’s new Agents SDK, publicly released earlier today. It’s a game-changer for AI orchestration and workflow automation. Early access let me transform imaginative ideas into reality with near-effortless speed.

Here’s a demo of the first version of my project working with the SDK from last week, presented to the OpenAI Agent team, thanks to early access with AgentOps!

Initial Pre-Release Demo at https://x.com/AlexReibman/status/1899533549893746925

My Journey from Straight Python to the OpenAI Agents SDK

Previously, I built autonomous AI agents using pure Python, a powerful but intricate process that I still found preferable to any of the available agent frameworks. Check out my original project here. It demanded meticulous orchestration and heavy coding to handle multi-agent workflows. The OpenAI Agents SDK slashed that complexity, letting me reimagine and rebuild my project into a streamlined, modular, and far more powerful system.

Introducing "Agento": A Modular AI Planning System

My new "Agento" project showcases how the OpenAI Agents SDK can be used to turn broad goals into structured, actionable plans with iterative polish. Literally, you can start this sucker off with ANY goal or idea you can think of and it will go to work on it for you. Here’s the breakdown:

Criteria Generation: Iteratively identifies and selects custom success metrics, grounded in full web search to ensure they are relevant and actionable.
Plan Generation: Crafts detailed goal-achievement strategies and a plan outline.
Plan Expansion and Evaluation: Expands and critiques each plan outline into a full draft.
Revision Identification: Spots needed improvements based on your original goal and, critically, on the success criteria.
Revision Implementation: Applies and tests revisions for a solid and well-aligned draft.
There is also a module to export your final plan as easy-to-read Markdown (with MS Word, PDF, and other formats, depending on the plan content, coming soon).

Each module is independent and interchangeable, linked by standard JSON interfaces for flexibility across agent frameworks. This means you can take any module and re-create it in whatever agent framework you prefer (LangGraph, Crew, AutoGen, etc, etc) and everything will still work. It’s just JSON in and JSON out. Dive into the details and grab the starter code here.
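
To show what “JSON in and JSON out” means in practice, here is a toy Python sketch of two modules chained through JSON strings. It is not the Agento code: the module functions are stand-ins with no LLM or web-search calls, and the field names (`goal`, `success_criteria`, `plan_outline`) are assumptions made for illustration rather than the project's actual schema.

```python
import json


def criteria_module(payload: str) -> str:
    """Stand-in for criteria generation: adds success criteria to the payload."""
    data = json.loads(payload)
    data["success_criteria"] = [f"Measurable progress toward: {data['goal']}"]
    return json.dumps(data)


def plan_module(payload: str) -> str:
    """Stand-in for plan generation: drafts one outline step per criterion."""
    data = json.loads(payload)
    data["plan_outline"] = [f"Step addressing '{c}'" for c in data["success_criteria"]]
    return json.dumps(data)


def run_pipeline(goal: str, modules) -> dict:
    """Pass a JSON string from module to module; each only sees JSON."""
    payload = json.dumps({"goal": goal})
    for module in modules:
        payload = module(payload)
    return json.loads(payload)
```

Because each module only reads and writes a JSON string, any one of them could be swapped for an implementation built in a different agent framework, as long as it honors the same fields.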

Making Your Life Easier with a Ready-to-Go Single File

To get you started fast, I’ve packed all of the OpenAI Agents SDK code and docs into one ready-to-use file. Just add or attach it to your LLM prompts for a seamless custom-agent-building experience. Grab the total Agents SDK in one file right here!

A Deeper Dive into OpenAI Agents SDK

The OpenAI Agents SDK, a versatile open-source tool, orchestrates complex multi-agent workflows with ease. It outshines earlier frameworks like Swarm, boosting productivity and simplicity. Key features:

Agent Configuration: Equip agents with built-in or custom tools effortlessly.
Smart Handoffs: Delegate tasks between agents seamlessly.
Guardrails: Enforce safety and other priorities with input/output validation.
Tracing & Observability: Debug and optimize with clear execution insights.

Dig into the details of the new SDK at these links:

OpenAI Announcement: https://openai.com/index/new-tools-for-building-agents/
Documentation: https://platform.openai.com/docs/guides/agents
SDK docs: https://openai.github.io/openai-agents-python/
GitHub repo: https://github.com/openai/openai-agents-python
SDK walkthrough: https://x.com/OpenAIDevs/status/1899531225468969240?t=617

Try it Out!

Whether you’re building a breakthrough or simplifying daily tasks, the OpenAI Agents SDK supercharges your work. Dive into the docs, try my "Agento" example, and see how it can lift your projects to new heights. Let’s innovate together: grab the code, start fast, and unlock endless possibilities with OpenAI’s latest gem!

UETA and LLM Agents: A Deep Dive into Legal Error Handling

Dazza Greenwood — Mon, 03 Feb 2025 07:17:37 GMT

Pre-Release Version

In previous explorations of UETA and LLM agents, we established that the law’s broad applicability extends to modern AI-powered transactions. In this deep dive, we focus on error handling—the critical yet often neglected factor that determines both user trust and system resilience.

Have you ever been stuck in a frustrating loop with an automated system, unable to fix a simple mistake? In AI-driven commerce, every transaction intermediated by an LLM agent is a moment of truth. Section 10 of the Uniform Electronic Transactions Act (UETA) provides a clear legal framework for error correction and prevention—yet it remains largely ignored in AI-powered transactions.

Without these safeguards, your transactions may not be final—leaving businesses exposed to transaction reversals, liability disputes, and operational uncertainty. But by building in error prevention, correction, and auditability, AI agent systems can establish true finality—where transactions are legally binding, disputes are minimized, and fairness is ensured for consumers.

It’s time to bring this critical legal requirement into the light—to protect businesses from liability, give consumers trustworthy digital transactions, and ensure AI-driven commerce operates with certainty and integrity.

To get into this topic, I’ll spotlight this passage from a recent post I co-authored with Diana Stern published by Stanford CodeX:

By implementing a user interface and process flow that enables customers to review and correct transactions before they are finalized, providers not only comply with UETA but also establish a strong argument for ratification. If a customer has the opportunity to correct an error but chooses not to, they have arguably adopted the transaction as final. Moreover, this provision of UETA cannot be varied by contract, which means this rule allowing customers to reverse transactions will apply even if providers insert disclaimers or other contract terms insisting the customer holds all responsibility and liability for mistakes and errors committed by the Transactional Agent.
Given this is the law of the land in the U.S., with UETA enacted in 49 states, it is prudent to take these rules seriously. This design pattern – proactively building in error prevention and correction mechanisms – is therefore not just about legal compliance; it’s a fundamental aspect of responsible Transactional Agent development that helps define the point of finality and clarify the allocation of risk. But it’s also just good practice and a fair rule. By implementing these mechanisms, providers can significantly reduce their risk of liability. By embracing error avoidance and corrections protocols in the design and deployment of Transactional Agents, perhaps the most valuable benefit will not be avoiding liability for reversed transactions but legitimately earning Transactional Agent customers’ trust and reliance upon this new technology and way of doing business.

With that context, let’s dive in!

Why Error Handling Matters Now More Than Ever

For business and technology leaders, error handling might seem like a technical detail best left to development teams. For legal and risk management professionals, it may appear as just another compliance checkbox. Both perspectives, however, overlook the larger strategic importance of robust error handling.

Every transaction your LLM agent handles is a moment of truth. When transactions proceed flawlessly, interactions feel seamless. But when errors occur, the system faces a critical choice:

- Leave users stranded: Failing to offer correction options can trap users in a rigid, automated process.

- Empower users: Providing clear, transparent paths for error correction builds trust and long-term loyalty.

This distinction not only affects user satisfaction but also lays the groundwork for sustainable, scalable automated commerce.

The Business Case for Robust Error Handling

Implementing strong error handling capabilities is an investment—not merely an added cost. Consider the following benefits:

Beyond these immediate advantages, robust error handling lays the foundation for the future of automated commerce.

UETA Section 10: A Framework for Fair Automation

UETA’s Section 10 provides a forward-thinking framework for error handling in electronic transactions. Its key principles include:

User Agency: Systems must offer meaningful opportunities for error prevention and correction.
Mutual Responsibility: Both parties should adhere to agreed-upon security procedures.
Clear Communication: Prompt notifications and clear procedures are essential when errors occur.
Fair Resolution: The system must ensure that users have a path to avoid being bound by erroneous transactions.

These principles serve not only as legal requirements but also as best practices that reinforce user trust and system reliability.

Implementation Requirements: Bridging Legal Theory and Technical Practice

For both business leaders and legal teams, meeting UETA compliance while optimizing user experience demands that error handling systems deliver on two fronts: legal integrity and technical robustness. Achieving this balance requires that your LLM-based system be designed around four core capabilities:


Error Prevention serves dual purposes: it reduces support costs and drives higher user satisfaction on the business side, while proactively mitigating risks from a legal perspective. This capability helps organizations stay ahead of potential issues before they materialize.
Error Detection capabilities enable quick identification and resolution of issues, supporting operational efficiency. From a legal standpoint, this capability ensures proper evidence preservation and enables ongoing compliance monitoring, providing organizations with real-time insights into their regulatory adherence.
Error Correction enhances the user experience and helps retain customers by smoothly resolving issues when they occur. Legally, it provides clear demonstration of UETA (Uniform Electronic Transactions Act) compliance, showing that the organization maintains appropriate error handling procedures.
Record Keeping delivers valuable business intelligence and supports process improvement initiatives by maintaining comprehensive transaction data. On the legal side, it ensures audit readiness and provides robust documentation for dispute resolution, helping organizations maintain defensible positions in potential conflicts.

Practical UETA Compliance Strategies for LLM Agents

To translate these capabilities into a compliant and user-friendly system, consider the following actionable strategies:

Establish Clear Security Procedures:
Design your system with automated prompts or multi-factor confirmations for high-value or unusual transactions. For example, if an order exceeds a certain threshold, trigger an additional verification step. Document these procedures in your terms of service as evidence of adherence to UETA §10(1).
Provide a Human-in-the-Loop or Escalation Path:
Even though LLM agents operate autonomously, allow for an optional human review on transactions deemed high-risk. This extra layer ensures users have the opportunity to detect and correct errors—fulfilling UETA §10(2).
Implement Transparent, Actionable Prompts:
For every critical step, display clear, unambiguous prompts. For example, before finalizing a high-value transaction, show:
“You are about to purchase 100 self-heating mugs. Confirm or Cancel?”
This confirms that users have a genuine opportunity to reconsider their actions.
Maintain Comprehensive Audit Trails:
Record all user interactions and system responses—including timestamps, unique identifiers, and the exact text of prompts. This not only supports attribution under UETA §9 but also provides critical evidence during dispute resolution.
Highlight Error-Correction Procedures in Your Terms:
While UETA does not allow for waivers of mandatory error correction rights, you can clearly outline the process for reporting and remedying errors. For example:
“If you notice an unintended transaction, please contact us at [Contact Info] within 48 hours. We will investigate and provide instructions for returning goods or funds.”
Stay Vigilant for Regulatory Changes:
Build a modular system that can adapt quickly to evolving legal and regulatory standards. This future-proofs your error handling architecture against potential AI-specific guidelines or enhanced transparency requirements.
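
To make a few of the strategies above concrete, here is a minimal sketch of a threshold-triggered confirmation prompt paired with an audit record. The threshold value, function names, and record fields are illustrative assumptions, not requirements drawn from UETA or from any particular agent framework.

```python
import uuid
from datetime import datetime, timezone

CONFIRM_THRESHOLD = 500.00  # assumed value; set per your own risk policy

def needs_confirmation(total: float) -> bool:
    """Orders at or above the threshold get an extra confirmation step."""
    return total >= CONFIRM_THRESHOLD

def build_prompt(quantity: int, item: str) -> str:
    """Produce an unambiguous prompt naming exactly what will be bought."""
    return f"You are about to purchase {quantity} {item}. Confirm or Cancel?"

def record_confirmation(prompt: str, user_reply: str) -> dict:
    """Capture the exact prompt text, the reply, a timestamp, and a unique ID."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_text": prompt,
        "user_reply": user_reply,
        "confirmed": user_reply.strip().lower() == "confirm",
    }

prompt = build_prompt(100, "self-heating mugs")
entry = record_confirmation(prompt, "Cancel")
```

Storing the literal prompt text, rather than a template ID, means the record shows exactly what the user saw at the moment they answered.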

Building Error Prevention into LLM Agent Systems

Error prevention is about striking the right balance—ensuring that safeguards are strong enough to prevent mistakes without impeding efficiency. A robust prevention strategy operates on three levels:

The Three Layers of Error Prevention

Pre-Transaction Validation

Pre-transaction validation is the first line of defense. This step ensures that the data input into the system is accurate and that the transaction parameters are valid. Key capabilities include:

Input validation with clear user feedback
Identity and authorization verification
Parameter consistency checks
Contextual consistency assessments

UETA Compliance Note:
UETA Section 10(2) requires that electronic agents offer a genuine opportunity to prevent or correct errors. Robust pre-transaction validation is your first opportunity to satisfy this requirement.
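
A minimal sketch of pre-transaction validation might look like the following. The `OrderRequest` fields and the specific checks are hypothetical; a real system would validate whatever parameters its transactions actually carry.

```python
from dataclasses import dataclass

@dataclass
class OrderRequest:
    user_id: str
    item: str
    quantity: int
    unit_price: float
    stated_total: float

def validate_order(order: OrderRequest, authorized_users: set[str]) -> list[str]:
    """Return plain-language problems; an empty list means the order passed."""
    problems = []
    # Identity and authorization verification
    if order.user_id not in authorized_users:
        problems.append("You are not authorized to place this order.")
    # Input validation
    if order.quantity <= 0:
        problems.append("Quantity must be at least 1.")
    if order.unit_price < 0:
        problems.append("Unit price cannot be negative.")
    # Parameter consistency check: the stated total should match the math
    expected = round(order.quantity * order.unit_price, 2)
    if abs(expected - order.stated_total) > 0.005:
        problems.append(
            f"Total {order.stated_total:.2f} does not match "
            f"quantity x price ({expected:.2f})."
        )
    return problems
```

Returning every problem at once, in user-readable language, gives the user a real chance to fix the request before anything is committed.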

Contextual Analysis

Contextual analysis involves verifying the transaction’s context to ensure it reflects the user’s true intent. For example, consider factors such as:
- Transaction timing and sequence
- User history and behavioral patterns
- Environmental or situational factors (e.g., a purchase attempt at an unusual time)
- Cross-transaction dependencies

Example:
If a user typically makes purchases during business hours, a transaction attempted at 3 a.m. might be flagged as unusual. This not only protects the user from unintended transactions but also reinforces that the system is capturing the true intent—an essential element in meeting UETA requirements.
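
The 3 a.m. example can be sketched as a simple check against the hours at which the user has purchased before. This is a deliberately naive heuristic with assumed names and an assumed minimum-history setting. A production system would likely use a richer behavioral model.

```python
from datetime import datetime

def is_unusual_hour(past: list[datetime], attempt: datetime,
                    min_history: int = 5) -> bool:
    """Flag an attempt made at an hour of day the user has never purchased in."""
    if len(past) < min_history:
        return False  # too little history to judge, so do not flag
    usual_hours = {p.hour for p in past}
    return attempt.hour not in usual_hours

# Five daytime purchases, then attempts at 3 a.m. and at 10 a.m.
history = [datetime(2025, 1, day, hour)
           for day, hour in [(6, 9), (7, 10), (8, 14), (9, 16), (10, 11)]]
night_attempt = is_unusual_hour(history, datetime(2025, 1, 11, 3))
day_attempt = is_unusual_hour(history, datetime(2025, 1, 11, 10))
```

A flag here should route the transaction to an extra confirmation step rather than reject it outright, since an unusual time is a reason to ask, not proof of a mistake.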

Progressive Confirmation

As transaction complexity increases, so does the need for confirmation. The system should adjust its verification process based on the transaction’s risk level:

This tiered approach ensures that:
- Low-risk transactions proceed efficiently.
- Higher-risk transactions receive additional scrutiny.
- A comprehensive audit trail is maintained for all confirmations.
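
One way to sketch a tiered confirmation policy is shown here. The tier names, dollar thresholds, and the `flagged_unusual` input are illustrative assumptions. Every decision is appended to an audit log, whatever tier it lands in.

```python
from enum import Enum

class Tier(Enum):
    LOW = "proceed with a simple confirmation"
    MEDIUM = "show a detailed summary and require explicit confirmation"
    HIGH = "require additional verification or human review"

def confirmation_tier(amount: float, flagged_unusual: bool = False) -> Tier:
    """Map a transaction to a tier; thresholds here are assumed values."""
    if amount >= 1000 or (flagged_unusual and amount >= 100):
        return Tier.HIGH
    if amount >= 100 or flagged_unusual:
        return Tier.MEDIUM
    return Tier.LOW

audit_log: list[dict] = []

def confirm(tx_id: str, amount: float, flagged_unusual: bool = False) -> Tier:
    """Pick the tier and log the decision so every confirmation is traceable."""
    tier = confirmation_tier(amount, flagged_unusual)
    audit_log.append({"tx_id": tx_id, "amount": amount, "tier": tier.name})
    return tier
```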

Error Detection: When Prevention Isn’t Enough

Despite robust prevention measures, errors may still occur. Rapid and accurate detection is essential for mitigating negative impacts.

Detection Mechanisms

Your system should incorporate multiple detection methods to catch errors as soon as they occur:

Rule-Based Detection: Utilizes predefined rules to catch common error patterns.
Anomaly Detection: Uses statistical models or machine learning to identify deviations from typical transaction behavior.
User Feedback: Enables users to quickly report errors when they notice discrepancies.
LLM Validation: Involves cross-checking responses for internal consistency and alignment with the user’s initial intent.
Example: If the agent’s response contradicts earlier confirmations, the system can flag this for review.
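
As a rough sketch, the first two mechanisms can be combined in a few lines. The rules, field names, and z-score limit below are hypothetical examples. The anomaly check is a basic statistical test, standing in for whatever model a real system would use.

```python
from statistics import mean, stdev

def rule_based_flags(tx: dict) -> list[str]:
    """Predefined rules that catch common error patterns."""
    flags = []
    if tx["quantity"] <= 0:
        flags.append("non-positive quantity")
    if tx["shipping_country"] != tx["billing_country"]:
        flags.append("shipping and billing country differ")
    return flags

def is_amount_anomalous(history: list[float], amount: float,
                        z_limit: float = 3.0) -> bool:
    """Flag amounts far outside the user's typical spending (z-score test)."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    spread = stdev(history)
    if spread == 0:
        return amount != history[0]
    return abs(amount - mean(history)) / spread > z_limit
```

Rules are cheap and explainable, while the statistical check catches cases nobody thought to write a rule for. Running both gives the system two independent chances to notice a problem.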

Measuring Detection Effectiveness

To ensure your error detection methods are working as intended, monitor these key metrics:

For example, “Detection Speed” can be measured by tracking the time elapsed from when an error occurs to when it is detected.
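
Measuring detection speed this way takes only a pair of timestamps per error. The helper names below are assumptions, shown as a minimal sketch.

```python
from datetime import datetime, timedelta

def detection_speed(occurred_at: datetime, detected_at: datetime) -> timedelta:
    """Time elapsed between an error occurring and the system detecting it."""
    if detected_at < occurred_at:
        raise ValueError("detected_at precedes occurred_at")
    return detected_at - occurred_at

def mean_detection_seconds(events: list[tuple[datetime, datetime]]) -> float:
    """Average detection speed, in seconds, over (occurred, detected) pairs."""
    total = sum(detection_speed(o, d).total_seconds() for o, d in events)
    return total / len(events)

events = [
    (datetime(2025, 2, 3, 10, 0, 0), datetime(2025, 2, 3, 10, 0, 30)),
    (datetime(2025, 2, 3, 10, 5, 0), datetime(2025, 2, 3, 10, 6, 30)),
]
average = mean_detection_seconds(events)
```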

Designing Effective Error Correction Interfaces for LLM Agents

When errors occur in transactions managed by LLM agents, the correction interface becomes the system’s moment of truth. It must balance ease of use with rigorous compliance. An effective error correction interface should enable users to quickly understand the error, explore correction options, and confirm that the intended changes have been made—all while maintaining detailed records for audit purposes.

The Anatomy of Effective Error Correction

Effective error correction requires a multi-layered approach:

Error Communication: Use plain language to explain what went wrong. For example, rather than showing a cryptic error code, the system might state, “It appears that there was a typo in your credit card number. Please review and correct the digits.”
Correction Options: Offer users clear, actionable choices. For instance, a simple data error (such as an incorrect shipping address) can be corrected via a direct form, while more complex process errors (such as insufficient funds) might trigger a guided workflow.
Verification Steps: Confirm that the corrected information is accurate. This could involve a two-step process or multi-factor verification for high-value transactions.
Resolution Recording: Automatically log the correction process to create an audit trail that demonstrates compliance with UETA’s requirements and ensures transaction finality.

Three Levels of Error Correction

Different types of errors require tailored approaches:

This tiered approach ensures that:

- Simple Data Errors are quickly resolved, keeping the user experience smooth.

- Process Errors are handled with sufficient oversight through guided workflows.

- Complex Errors involving system integration benefit from human intervention, ensuring full documentation and resolution.
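
The three levels can be sketched as a small routing table. The handler names and record fields are hypothetical, and each routing decision is logged so the correction trail starts the moment an error is classified.

```python
from datetime import datetime, timezone
from enum import Enum

class ErrorKind(Enum):
    DATA = "simple data error"       # e.g., an incorrect shipping address
    PROCESS = "process error"        # e.g., insufficient funds
    COMPLEX = "complex error"        # e.g., a system integration failure

# Assumed handler names, one per level
ROUTES = {
    ErrorKind.DATA: "direct_form",
    ErrorKind.PROCESS: "guided_workflow",
    ErrorKind.COMPLEX: "human_review",
}

correction_trail: list[dict] = []

def route_correction(tx_id: str, kind: ErrorKind, detail: str) -> str:
    """Choose the correction path for an error and record the decision."""
    handler = ROUTES[kind]
    correction_trail.append({
        "tx_id": tx_id,
        "kind": kind.name,
        "detail": detail,
        "handler": handler,
        "routed_at": datetime.now(timezone.utc).isoformat(),
    })
    return handler
```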

LLM-Enhanced Error Correction

LLM agents can improve the error correction process by:

- Generating plain-language explanations to help users understand the error.

- Suggesting likely corrections based on the transaction context.

- Guiding users through multi-step correction workflows.

- Maintaining contextual continuity so that corrections are appropriately applied.

For example, rather than simply alerting the user to an error, the agent might say, “We noticed a potential mismatch in your order details. Would you like to review your shipping address or update your payment method?” Such tailored prompts help ensure that the user can effectively resolve issues while the system logs every step for compliance purposes.

Measuring Correction Effectiveness

To ensure the correction interface works as intended, monitor these key performance metrics:

For example, tracking the “Time to Resolution” metric can help determine whether the correction process is efficient enough to maintain user confidence while providing timely compliance evidence.

Record Keeping: The Foundation of Trust and Compliance

Robust record keeping is critical—not only does it support business process improvements, but it is also essential for meeting legal requirements under UETA. In LLM agent systems, where transactions can be highly dynamic, comprehensive records serve as the backbone for transparency and accountability.

Essential Record Types

Different types of records are necessary to cover all aspects of a transaction:

Each record type provides a unique layer of insight:

- Transaction Records document the details of every interaction.

- Error Logs capture any discrepancies or issues that occur.

- Correction Trails offer a step-by-step account of how errors were resolved.

- System States track the performance and contextual environment at the time of the transaction.

Record Keeping Architecture

A robust record keeping system should incorporate:

Data Integrity:
1. Immutable storage (e.g., any write-once-read-many database will do, or blockchain if you really feel that need)
2. Version control and change tracking
3. Strict access controls
Accessibility:
1. Quick retrieval and searchable archives
2. Support for data export in standardized formats
3. Consistent format preservation to maintain context
Context Preservation:
1. Detailed logs of transaction states, user decisions, and system configurations
2. Mechanisms for preserving the intent behind changes or corrections
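
For the data integrity point, one lightweight option is a hash-chained, append-only log, sketched below with hypothetical class and field names. Note that this makes tampering detectable rather than impossible, so it complements write-once storage and access controls instead of replacing them.

```python
import hashlib
import json

class AppendOnlyLog:
    """Each entry stores the hash of the previous one, forming a chain."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, record: dict) -> str:
        """Serialize the record, chain it to the prior entry, return its hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev_hash, "body": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["body"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Appending the confirmation prompt, the user's reply, and any later correction as separate entries preserves both the sequence of events and the context around each decision.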

Future-Proofing Your Records

As LLM agent systems evolve, record keeping systems must adapt to emerging challenges:

To address these challenges, consider the following best practices:

Record Organization:
Develop clear classification systems, retention policies, and disposal procedures. Regular audits can help ensure that records remain accurate and accessible.
Context Management:
Track decisions, preserve user intent, and document all system changes to create an effective historical record that supports dispute resolution.
Access Control:
Implement role-based permissions, audit trails, and robust security protocols to protect sensitive data and ensure that records can be retrieved efficiently in the event of an audit or legal dispute.

Best Practices for LLM Agent Systems: Beyond Basic Compliance

While UETA provides the legal framework for error handling, truly effective LLM agent systems go well beyond minimal compliance. A robust system not only satisfies legal requirements but also drives business value through superior user experience and operational excellence.

System Design Principles

Adopt these design principles to ensure your LLM agent system remains resilient and adaptable:

Transparency: Ensure that all system processes are visible to users, including error handling and confirmation steps. This not only builds trust but also simplifies regulatory audits.
Predictability: Design processes that behave consistently under similar conditions, reducing unexpected errors.
Adaptability: Build modular architectures that can incorporate new technologies or comply with updated legal standards as they emerge.
Accountability: Maintain thorough records and audit trails to support both internal review and external regulatory scrutiny.

Measuring Success in LLM Agent Systems

Quantitative metrics are essential for evaluating system performance over time:

For instance, a high adoption rate coupled with low dispute frequency suggests that the system is both efficient and legally robust.

Advanced Use Cases and Future Considerations

As LLM agent systems continue to evolve, new challenges and opportunities will emerge. Understanding these future trends is key to staying ahead in the rapidly evolving landscape of automated commerce.

Agent-to-Agent Interactions

The future of automated commerce increasingly involves interactions between autonomous agents. This introduces new technical and legal complexities:

Protocol Standards: Establish clear, standardized protocols for agent-to-agent interactions to ensure smooth operations.
Error Propagation: Implement safeguards that prevent errors from cascading between systems.
Intent Preservation: Use contextual analysis to track and maintain the original intent behind transactions.
Conflict Resolution: Develop frameworks for resolving disputes between agents, thereby minimizing business interruptions.

Evolution of User Intent

Over time, user preferences and behaviors may evolve as use of and reliance upon AI agent systems deepen and become more complex and integrated. An effective system must adapt without compromising compliance or operational efficiency:

Basic example: An LLM agent that tracks previous purchase behaviors might proactively suggest complementary products. However, it must also ensure that any changes in user intent are clearly documented to avoid misinterpretation of transactions.

Emerging Standards and Future Readiness

To prepare for the evolving landscape of automated transactions, it is essential to monitor emerging standards and align your system accordingly:

Preparing for the Future:
- Design for Evolution: Adopt modular architectures and extensible protocols that can quickly adapt to new standards.
- Plan for Complexity: Incorporate advanced analytics and comprehensive logging to manage increasing transaction volumes.
- Maintain Transparency: Keep detailed, traceable records to support compliance with evolving regulations.

If your organization has the resources and talent to participate actively in relevant standards development, doing so can both keep you aware of and ready for what is coming and give you the opportunity to help shape future standards.

The Future of Transaction Finality in Agent Systems

A critical challenge for LLM agent systems is ensuring true transaction finality—where errors are not only prevented or corrected but also the final state of a transaction is clearly established and legally binding.

Transaction Finality: The Path Through Error Handling

The challenge of establishing transaction finality in AI agent systems reveals a critical business reality: without proper error handling, there can be no true finality. This isn’t just about good practice—it’s about legal certainty under UETA.

Key Relationships and Roles

Note: In some arrangements, the Third Party may also serve as the Agent Provider, offering an agent for users to interact with their own services.

The Legal Framework for Finality

UETA Section 10(2) provides a crucial right: users can “avoid the effect” of electronic records (essentially reverse transactions) if they weren’t given proper opportunity to prevent or correct errors. This means:

Without robust error handling, there is no true transaction finality
Users retain a statutory right to reverse transactions if proper error prevention/correction wasn’t available
This right cannot be waived by contract or agreement

Practical Implications

For businesses deploying AI agents, this creates a clear imperative. Organizations must first implement strong error prevention mechanisms throughout their transaction flows. They need to provide and document clear error correction pathways that users can easily access and understand. Importantly, they must maintain records of when and how these capabilities were made available to users during each transaction. Only after meeting these requirements can a business confidently establish transaction finality. These aren’t optional best practices—they’re essential steps for achieving legally defensible completion of transactions.
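
The requirements in this paragraph can be expressed as a simple internal readiness check. The record fields below are hypothetical, and the sketch is an engineering checklist under stated assumptions, not a legal test of finality.

```python
from dataclasses import dataclass

@dataclass
class TransactionRecord:
    tx_id: str
    prevention_offered: bool    # error prevention ran during the flow
    correction_offered: bool    # user had an accessible correction path
    availability_logged: bool   # records show when and how both were offered
    user_confirmed: bool        # user confirmed after the opportunity to correct

def missing_for_finality(rec: TransactionRecord) -> list[str]:
    """List which requirements are still unmet for this transaction."""
    checks = {
        "error prevention was not applied": rec.prevention_offered,
        "no correction pathway was offered": rec.correction_offered,
        "availability of safeguards was not recorded": rec.availability_logged,
        "user confirmation is missing": rec.user_confirmed,
    }
    return [problem for problem, passed in checks.items() if not passed]

def can_treat_as_final(rec: TransactionRecord) -> bool:
    """Treat the transaction as final only when nothing is missing."""
    return not missing_for_finality(rec)
```

Running a check like this before booking revenue or starting fulfillment turns the paragraph's sequence (prevent, offer correction, record, then finalize) into a gate the system cannot skip.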

Two Implementation Models

Three-Party Arrangement:
1. User engages with Third Party merchant through Agent Provider’s system
2. Agent Provider implements error handling for both parties
3. Clear documentation of error prevention/correction opportunities
Two-Party Arrangement:
1. Merchant provides agent for users to interact with their own services
2. Merchant directly responsible for error handling
3. Simplified implementation but same legal requirements

The Business Value of True Finality

Implementing proper error handling delivers concrete business value beyond mere legal compliance. When organizations build robust error prevention and correction capabilities into their agent systems, they establish legally defensible transaction finality that protects all parties. This approach significantly reduces the risk of statutory transaction reversals, providing the certainty needed for efficient business operations. It creates clear, documented completion points that support reliable accounting and fulfillment processes. Perhaps most importantly, this framework builds genuine user confidence in automated transactions, paving the way for broader adoption of AI agent systems in commerce.

Understanding Practical Finality

While we speak of achieving “transaction finality” through proper error handling, it’s worth noting that finality in digital transactions is more of a practical business construct than an absolute state. As Patrick McKenzie expertly explains in his analysis of payment systems, true finality is more of a “probability distribution” influenced by technical infrastructure, relationships between parties, and governing laws rather than an absolute condition. For the purposes of AI agent transactions, we’re focused on reaching a clear point where all parties can confidently treat the transaction as complete for practical business purposes—whether that’s booking revenue, initiating fulfillment, or closing the accounting period. This framework of error prevention and correction helps establish that practical finality, even if philosophical arguments about absolute finality remain.

For a fascinating deeper dive into the broader concept of finality in payment systems, see McKenzie’s “Finality does not exist in payments.” I thank Alex Reibman of AgentOps for his feedback on this larger point. While absolute finality in transactions is philosophically complex, for business and legal purposes, the goal is to establish practical finality where transactions are recognized as complete and legally binding. Achieving that practical goal, and adding deeper context on the road ahead, is the purpose of this piece.

A Trust Protocol Stack

This notional “Trust Protocol Stack” is a way of approaching transaction finality by integrating multiple layers of assurance:

This layered approach not only enhances confidence in the system but also opens new business models around premium, verified transaction services.

Protocol Standards for the Future

Developing and implementing standardized protocols is essential for future-proofing automated transactions:

Implementation Challenge: In the agent-to-agent transactional context, it will be critical to reach consensus, or at least workable agreed practices, among stakeholders so that different agent platforms, frameworks, and services interoperate at both the business and technical levels.

Bringing It All Together: A Call to Action

The evolution of LLM agent systems demands that businesses and legal professionals alike view error handling as a strategic investment rather than a regulatory checkbox. The following steps provide a roadmap for organizations looking to lead in this new era of automated commerce:

Key Takeaways

For Business Leaders:
Strategic Investment: Robust error handling drives user trust and creates competitive differentiation.
Innovative Opportunities: Premium verification and advanced correction capabilities open new revenue streams.
Market Leadership: Early adoption of best practices positions your organization at the forefront of automated commerce.
For Legal/Risk Professionals:
Defensible Processes: UETA compliance is a baseline that can be enhanced through transparent, robust error handling.
Clear Documentation: Detailed audit trails and correction records provide strong evidence in dispute resolution.
Regulatory Readiness: A future-proof system is essential for adapting to evolving legal and technological landscapes.

Strategic Implementation Path

Action Steps:
- Assess Your Current State: Conduct a thorough review of your existing error handling capabilities.
- Plan Your Evolution: Identify key enhancement opportunities and set a timeline for implementation (e.g., assess within 30 days, plan within 90 days).
- Implement Changes: Roll out modular improvements, starting with high-risk areas.
- Lead the Change: Engage with industry bodies to help shape future protocol standards.

The Opportunity Ahead

The future of automated commerce hinges on our ability to build transparent, trustworthy systems. By integrating robust error prevention, detection, correction, and record keeping, you not only comply with UETA’s mandatory requirements but also drive user confidence and operational excellence. The time to act is now—embrace these practices and lead the way in a new era of automated transactions.

From Ideas to Reality: A First Look at Autonomous Innovation

Dazza Greenwood — Fri, 17 Jan 2025 10:11:53 GMT

Hey everyone,

I’m excited to share a sneak peek of a project I’ve been deeply involved in – a multi-agent system designed to unlock autonomous innovation. I’ll be demonstrating this system at Davos later this month, and I’m thrilled to give you, my cherished subscribers, an early look!

Before we dive in, I want to extend a huge thank you to everyone who responded to my call on LinkedIn for feedback on this demo. Your insights were invaluable in refining the presentation.

Live Demo from Earlier Today:

What Are Agento and GenSpring?

This system, which I’m calling Agento, is all about harnessing the power of AI agents not just to chat about ideas, but to actually bring them to life. It’s built on three core innovations:

Modular Architecture: This allows for rapid experimentation and seamless integration of new technologies. Think of it like building with LEGOs – you can easily swap out parts and add new ones without rebuilding the entire structure.
Deep Agent Collaboration: The agents in this system are designed to work together in a way that mirrors successful human collaboration. They can reason deeply about complex problems, provide constructive criticism, and iterate towards a high-quality solution.
GenSpring - The Idea Engine: This is where things get really exciting. The most innovative module is called GenSpring, which is designed to continuously generate new ideas, evaluate them, and feed the most promising ones into the development pipeline through all the other modules. It’s like having a perpetual brainstorming machine!

Why This Matters

I believe this technology has the potential to revolutionize how we innovate across a wide range of fields. Imagine a future where:

AI agents can autonomously generate and develop solutions to complex problems, like disaster response or affordable housing.
New products and services can be brought to market faster than ever before, thanks to the accelerated innovation cycles enabled by this system.
Organizations can become more agile and adaptable, thanks to the modular architecture and the ability to integrate new technologies seamlessly.

Demo in Action

To complement the conceptual design above, here are key moments from the actual system demonstration. In this practice run, you’ll see:

A slide deck and key talking points outlining this approach to using AI agents.
A demo of the system taking a user-defined goal and breaking it down into an actionable plan.
Multiple agents, powered by models like GPT-4o, Claude 3.5 and Gemini 1.5, working together to refine the plan through a process of revision requests and evaluations.
The importance of clear communication and well-defined evaluation criteria for successful agent collaboration.
The final output in both JSON and Markdown formats, demonstrating the system’s ability to produce structured, machine-readable, and human-readable results.
GenSpring’s role as a kind of initialization module that can be swapped in for the user-defined goal input, enabling a fully autonomous, general-purpose innovation pipeline prototype.
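
To illustrate the dual-format output mentioned in the list above, here is a minimal sketch that renders one structured plan as both JSON and Markdown. The plan fields (`goal`, `steps`) and the function name are hypothetical and do not reflect Agento's actual schema.

```python
import json

def plan_to_markdown(plan: dict) -> str:
    """Render a plan as a Markdown heading followed by a numbered list."""
    lines = [f"# {plan['goal']}", ""]
    for number, step in enumerate(plan["steps"], start=1):
        lines.append(f"{number}. {step}")
    return "\n".join(lines)

plan = {
    "goal": "Reduce onboarding time",
    "steps": ["Interview users", "Prototype checklist"],
}

as_json = json.dumps(plan, indent=2)   # machine-readable form
as_markdown = plan_to_markdown(plan)   # human-readable form
```

Keeping the JSON as the source of truth and generating the Markdown from it means the two views cannot drift apart.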

Presentation Deck & Key Moments

These slides and talking points form the foundation of how I am currently communicating the system’s capabilities, novel design, and broader potential:

“What innovative challenges can we solve together?”

"What if AI agents could continuously generate breakthrough ideas AND autonomously develop them into real solutions? I've created a system that does exactly that through three key innovations: First, a modular architecture that enables rapid experimentation and seamless integration of new technologies without disrupting the whole system. Second, a sophisticated approach to AI agent collaboration that enables deep reasoning and effective handling of complex challenges. And third - perhaps most exciting - GenSpring, an 'idea engine' that constantly generates and evaluates new opportunities, feeding promising innovations directly into development."

"This modular architecture isn't just about flexibility - it's about enabling a new paradigm for AI innovation. Each module accepts structured inputs and produces structured outputs, creating clear interfaces where teams can plug in their preferred approaches. This means you can rapidly experiment with different technologies, frameworks, or entirely new approaches without rebuilding the whole system.
But here's where it gets interesting: this modularity ALSO opens the door to something bigger. Any team that believes they have superior agent technology can prove it by taking standard inputs from one module and showing they can produce better outputs. It's an open invitation to demonstrate real capabilities rather than just assert the superiority of a given implementation or approach."

"What makes this system unique is how it orchestrates AI conversations in a way that mirrors successful human-AI collaboration patterns. The agents can guide each other back to relevant topics, seek revision of outputs that don't meet quality benchmarks, and engage in deeper reasoning about complex challenges. By finding that crucial balance between steering and enabling, these agent conversations can adapt to tackle virtually any problem. Think of it as creating the conditions for AI creativity to flourish while ensuring the results remain practical and focused.
The power isn't in controlling every interaction, but in establishing the right design patterns for productive collaboration."

"GenSpring is where this system truly breaks new ground. Imagine having access to a perpetual wellspring of innovative ideas - not just random concepts, but carefully validated opportunities that are novel, useful, and crucially, achievable. This isn't just an idea generator - it's a complete pipeline that continuously identifies promising innovations and filters them through sophisticated analysis to ensure real-world value creation.
What makes GenSpring transformative is its seamless integration with our modular architecture. Each idea is structured precisely to flow into subsequent modules - from detailed planning to implementation, testing, and eventual deployment. As the system runs, successful innovations feed back into the process, creating an ever-evolving fountain of refined, practical solutions."

"The implications of this modular, agent-driven approach extend far beyond any single organization or industry. By establishing clear interfaces for AI systems to exchange value - whether that's ideas, services, or solutions - we're laying the groundwork for an entirely new kind of innovation economy.
Imagine a future where AI-driven companies can seamlessly exchange specialized capabilities, where breakthrough ideas can flow freely between organizations, and where innovation isn't limited by organizational boundaries. This isn't just about accelerating R&D or reducing costs - it's about creating the fundamental infrastructure for a new era of open, collaborative innovation. Just as standardized shipping containers revolutionized global trade, standardized AI interfaces could transform how we create and exchange value in the digital age."

“What innovative challenges can we solve together?”

I’d Love Your Feedback!

This is just a first glimpse, and I’m eager to hear your thoughts. What aspects of the system are most exciting to you? What questions do you have? What potential applications do you see? Let me know in the comments.

Want a One-on-One Demo and Chat?

If you’re a paid subscriber and would like a personalized demo and a chance to discuss this technology further, I’d love to connect! I’d be happy to answer any questions and talk through how these approaches to AI agents could be useful in your contexts. Please reach out to me using this form, https://forms.gle/8LnVNGEs6u9n5UGT6, and be sure to use the same email and name you use for your Substack subscription so I know it’s you.

Looking Ahead

I believe that multi-agent systems like Agento and components like GenSpring have the potential to transform the way we approach innovation. By combining the creative power of AI agents with the structure and rigor of modular design, we can unlock new levels of productivity, problem-solving, and even new value creation. I’m excited to continue developing this technology and exploring its possibilities with you.

Thanks for being a part of this journey!

Best,

Dazza Greenwood

P.S. For a deeper dive into the legal aspects of LLM-powered agents, check out my Stanford CodeX project site: https://law.stanford.edu/codex-the-stanford-center-for-legal-informatics/projects/agentic-genai-transaction-systems/ and the first of three blog posts as part of that research on issues and opportunities for transactional AI Agents is now live at: https://law.stanford.edu/2025/01/14/from-fine-print-to-machine-code-how-ai-agents-are-rewriting-the-rules-of-engagement.

And for insights on empowering consumers with personal AI agents, see these posts I wrote with Consumer Reports Innovation Lab: https://innovation.consumerreports.org/empowering-consumers-with-personal-ai-agents-legal-foundations-and-design-considerations/ and https://innovation.consumerreports.org/engineering-loyalty-by-design-in-agentic-systems/.

Also, earlier today, some MIT colleagues and I published a pre-print of a new research paper on a potential way to use and extend OAuth 2 and OpenID Connect technical specifications to enable “Authenticated Delegation and Authorized AI Agents”. You can learn about that here: https://arxiv.org/abs/2501.09674

When AI Agents Conduct Transactions

Dazza Greenwood — Sat, 23 Nov 2024 00:26:37 GMT

From a business, legal, and technical perspective, there’s no more important LLM agent activity than conducting transactions. As someone deeply involved in crafting the Uniform Electronic Transactions Act (UETA) and a long-time advocate for responsible AI development, I’m struck by how much the world has changed since those UETA drafting meetings. We were grappling with e-commerce back then, but little did we know our work would be so remarkably prescient for today’s LLM agents 25 years later.

Key Terms and Definitions

Before diving into the infrastructure and frameworks that enable AI agent transactions, it’s essential to understand a few key terms and concepts:

Core Concepts

AI Agent: The technology program that autonomously performs tasks and interacts with third parties, in this context, including use of Large Language Models (LLMs)
AI Agent System: The AI agent technology plus the technology provider who operates the agent and acts as an intermediary, forming a legal agency relationship with the user
Agent (Legal): A person or entity authorized to act on behalf of another (the principal)
Principal (Legal): The person or entity for whom an agent acts and who exercises principal authority, some of which can be delegated to the agent
Third Party (Legal): Any person who is a counter-party in a transaction with the agent who is acting on behalf of the principal
Contract: A legally binding agreement between two or more parties
Electronic Contract (UETA): A contract formed through electronic means
Human: A natural person
Organization: A legal entity, such as a corporation, business, or government agency (also known legally as an “artificial person”)

Legal Definitions from UETA

Transaction: “‘Transaction’ means an action or set of actions occurring between two or more persons relating to the conduct of business, commercial, or governmental affairs.” (UETA § 2(16))
Person: “‘Person’ means an individual, corporation, business trust, estate, trust, partnership, limited liability company, association, joint venture, governmental agency, public corporation, or any other legal or commercial entity.” (UETA § 2(12))
Electronic Signature: “‘Electronic signature’ means an electronic sound, symbol, or process attached to or logically associated with a record and executed or adopted by a person with the intent to sign the record.” (UETA § 2(8))
Automated Transaction: (Defined in detail in Legal Framework section below)
Electronic Agent: (Defined in detail in Legal Framework section below)

Digital Identity Concepts

Digital Identity (Wyoming): The intangible digital representation of, by and for a person, over which they have principal authority and through which they intentionally communicate or act. Can be:
- Personal Digital Identity: For individuals
- Organizational Digital Identity: For legal entities (See WY Stat. § 8-1-102(a)(xviii-xix) (2022))
Attribution: The process of establishing that an action or communication originated from a specific person or entity
Impersonation: The act of falsely representing oneself as another person or entity, especially in a digital context. Doing so to commit a crime or fraud carries specific penalties.

Building the Legal Infrastructure: A Bridge to the Future

While use of AI agents is undeniably a novel situation for almost all people at this moment in history, there is an all-but-forgotten existing legal framework that nicely supports and reflects use of this technology, including for transactions.

Back in the late 1990s, I spent nearly two years in drafting meetings for the Uniform Electronic Transactions Act (UETA), attending every session but one. During this time, we were grappling with how to create a legal framework that could adapt to the rapid evolution of technology and support the rise of e-commerce. I also co-chaired the American Bar Association group that advised on electronic agents provisions and later testified before Congress on related federal legislation (the E-SIGN Act).

The legal infrastructure we built—UETA and the federal Electronic Signatures in Global and National Commerce Act (E-SIGN)—is like a massive, invisible 50-lane highway bridge supporting today’s digital economy. We designed it with the future in mind, anticipating “lanes” for autonomous agents long before the technology existed. Those seemingly excessive “lanes” are now proving essential.

Well, we suddenly need that bridge to carry a slightly different type of traffic. Now that we finally have tons of autonomous agents and many people want to deploy them, UETA is like that bridge with perfectly suited lanes for autonomous traffic. Those wide shoulder lanes that have been gathering dust for 25 years are exactly what we need for LLM agents conducting transactions for people and organizations. They just didn’t know it!

The Legal Framework: UETA and Electronic Agents

UETA provides explicit provisions for electronic agents to conduct transactions autonomously. The law defines several key concepts that are remarkably relevant to today’s AI landscape:

Core Definitions

Electronic Agent: “‘Electronic agent’ means a computer program or an electronic or other automated means used independently to initiate an action or respond to electronic records or performances in whole or in part, without review or action by an individual.” (UETA § 2(6))
Automated Transaction: “‘Automated transaction’ means a transaction conducted or performed, in whole or in part, by electronic means or electronic records, in which the acts or records of one or both parties are not reviewed by an individual in the ordinary course in forming a contract, performing under an existing contract, or fulfilling an obligation required by the transaction.” (UETA § 2(2))

Attribution and Legal Effect

The most important concept from these frameworks is attribution. Automated systems that ensure clear attribution to responsible legal persons help avoid an accountability gap for potential harm and damage these systems could cause. The federal E-SIGN Act states that electronic agent actions are legally valid “so long as the action of any such electronic agent is legally attributable to the person to be bound.” UETA offers further guidance:

“An electronic record or electronic signature is attributable to a person if it was the act of the person. The act of the person may be shown in any manner, including a showing of the efficacy of any security procedure applied to determine the person to which the electronic record or electronic signature was attributable.” (UETA § 9)

Just as vehicles are required to have clearly visible license plates when they enter upon public roads, we need appropriate measures for attribution of the acts of automated and autonomous systems back to responsible parties.
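One way to picture the “security procedure” route to attribution is a keyed signature over each agent action, so that a record can later be shown to have come from a system holding the principal’s key. The sketch below is illustrative only: the record format, the `sign_action` and `is_attributable` helpers, and the choice of HMAC-SHA256 are my own assumptions, not anything UETA or E-SIGN prescribes.

```python
import hashlib
import hmac
import json


def sign_action(principal_id: str, action: dict, key: bytes) -> dict:
    """Build a record tying an agent action to a principal (hypothetical format)."""
    payload = json.dumps({"principal": principal_id, "action": action}, sort_keys=True)
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def is_attributable(record: dict, key: bytes) -> bool:
    """Check that the record was produced with the principal's key and not altered."""
    expected = hmac.new(key, record["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


key = b"demo-key-held-for-the-principal"  # placeholder secret, for illustration only
record = sign_action("principal-123", {"type": "purchase", "amount_cents": 4200}, key)
```

A record that verifies under the principal’s key is one kind of evidence for attribution. A record that fails verification, for example because the amount was altered after signing, is not.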

The Iron Triangle: Principal, Agent, and Third Party

The relationships between users and their AI agents and external parties form what I call the “iron triangle” of roles:

The Principal (the user/consumer/employee)
The Agent (the intermediary providing the AI agent tech for the Principal/user)
Third Parties (companies or other entities the AI agent interacts with)

The term “agent” itself can cause confusion, holding different meanings in the realms of software development and law. In software, it broadly refers to systems that perform tasks on behalf of users. However, the legal definition is much more specific, encompassing obligations that AI systems alone cannot fulfill. According to the Restatement (Second) of Agency § 1(1) (1958), agency is defined as “the fiduciary relation which results from the manifestation of consent by one person to another that the other shall act on his behalf and subject to his control, and consent by the other so to act.”

That definition might leave you scratching your head! Let’s break it down. In simpler terms, ‘agency’ means one person agrees to act for another, like a personal assistant handling tasks for their boss. It’s about a relationship built on trust, where the ‘agent’ is loyal to the ‘principal’ and follows their instructions. The three fundamental roles, legally, are the principal, the agent, and third parties, with whom the agent interacts on behalf of the principal to get tasks done. You can think of these three roles as a kind of iron triangle. Fiduciary duties owed by agents to principals, like the duty of loyalty, ensure the agent is legally obligated to act in the principal’s best interests. I want to emphasize that both individuals (like in our role as consumers) as well as organizations (operating through employees) using AI agent systems would be wise to prioritize working with fiduciary providers and operators of AI Agent Systems.

Now, consider this legal concept in the context of today’s rapidly evolving AI landscape. AI agents, particularly those powered by large language models (LLMs), are quickly becoming more sophisticated and widely deployed. They’re handling increasingly complex tasks for their users, including making purchases, managing finances, and even making significant decisions with real-world consequences. However, the current models governing these AI-powered interactions are often murky and lack clarity regarding the roles, responsibilities, and legal relationships between all the players involved. This lack of clarity creates uncertainty and potential risks for both consumers and businesses, hindering the widespread adoption and beneficial potential of these powerful tools.

When you rely upon an AI Agent to conduct transactions for you that involve your duty to pay and that form other legal obligations, you should confirm that you are in fact the principal, and that the provider of the technology has not arrogated the role of principal to itself, leaving you as a mere user of their system operating under their principal authority. Arguably, the entire framework of hundreds of years of agency law and practice exists to support and advance precisely such relationships of trust and reliance. It is not only reasonable, but recommended, that these frameworks be applied to AI agent intermediated transactions as well, in order to ensure alignment with the user’s interests and expected legal and business relationships and results.

To address this challenge, we can apply the robust legal framework of agency to structure the unique context of AI Agent Systems. By clarifying the roles and relationships of each party involved (the consumer or employee as Principal, and the intermediary that provides the AI as a tool as Agent), we can create a model that fosters trust, predictability, and accountability. The role of the intermediary combined with the AI Agent can be called an “AI Agent System.” This allows us to build on the iron triangle of agency, leveraging hundreds of years of well-understood precedent. This approach not only provides principals with greater certainty but also empowers third parties to engage in AI-powered interactions with greater confidence and clarity, unlocking the tremendous benefits of this technology for all.

This structure should be supported by five critical levels of system design:

Governance: Rules and bylaws ensuring transparency and accountability
Data Stewardship: Protection and ethical use of consumer data
Instructions & Tooling: Mechanisms to control and direct agent actions
Agent-to-Agent Communication: Secure interaction protocols (mostly coming soon)
Identity & Payments: Secure verification and transaction processing

Key Considerations for Agent Transactions

Confidentiality and Data Protection

Within the fiduciary model, robust data protection is paramount. The AI Agent System provider has a high duty of care and loyalty to the user, which includes maintaining strict confidentiality of their private information and commercial transactions. This reinforces the trust essential for users to reasonably rely upon AI agents to manage sensitive tasks.

Security and Error Prevention

LLM agents may make unexpected errors when conducting automated transactions. UETA provides a framework for addressing these very issues through specific mechanisms for error prevention and correction. For example:

Security procedures can establish spending limits
Error detection mechanisms can trigger alerts
Failed security procedures may provide grounds for transaction reversal
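To make the first two items concrete, here is a minimal sketch of a spending-limit check that records an alert when an agent tries to exceed it. The `SpendingLimitProcedure` class and its fields are hypothetical names chosen for illustration. UETA does not specify any particular mechanism, and a real deployment would agree on the procedure with the counterparty.

```python
from dataclasses import dataclass, field


@dataclass
class SpendingLimitProcedure:
    """Hypothetical agreed-upon security procedure: a per-transaction cap."""
    limit_cents: int
    alerts: list = field(default_factory=list)

    def check(self, amount_cents: int) -> bool:
        """Return True if the purchase may proceed; otherwise record an alert."""
        if amount_cents <= self.limit_cents:
            return True
        self.alerts.append(
            f"Blocked: {amount_cents} cents exceeds limit of {self.limit_cents} cents"
        )
        return False


procedure = SpendingLimitProcedure(limit_cents=20_000)
ok = procedure.check(15_000)       # within the limit
blocked = procedure.check(25_000)  # over the limit, so an alert is recorded
```

Keeping the check separate from the agent’s own reasoning matters here: the limit holds even when the LLM makes an unexpected error.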

Fiduciary Duty and Trust

The most compelling use case for AI Agent Systems is their ability to act as fiduciaries, prioritizing user interests above all else. The party providing the AI Agent technology to users, in this context, also forms a legal principal-agent relationship with that user. These agents can be bound by a “duty of loyalty” to their users, creating a trustworthy foundation for autonomous transactions. This fiduciary approach is especially important in the context of transactions, where financial and legal ramifications can be significant.

Parallel Tracks: Individuals and Organizations

These principles apply equally to individuals and organizations using LLM agents. The Wyoming Digital Identity Act provides a framework for recognizing and managing digital identities, further strengthening the legal foundation for AI agent transactions. The Act recognizes this duality:

Personal Digital Identity: “the intangible digital representation of, by and for a natural person…over which he has principal authority” (WY Stat. § 8-1-102(a)(xviii) (2022))
Organizational Digital Identity: “the intangible digital representation of, by and for a corporation, business trust…or any other legal or commercial entity…over which it has principal authority” (WY Stat. § 8-1-102(a)(xix) (2022))

The Act provides strong protections against impersonation, including injunctive relief and the potential for triple damages:

“Any person with a personal or organizational digital identity may proceed by suit to enjoin the use of any impersonations…and may require the defendants to pay to such person all profits derived from or all damages suffered by reason of such wrongful use…the court, in its discretion, may enter judgment for an amount not to exceed three (3) times any profits or damages and reasonable attorneys’ fees…” (WY Stat. § 40-30-103 (2022))

Wyoming statute provides crisp clarity on these specific points, but every state of the US has legal frameworks that can be used in combination to achieve the same results. While the legal foundations are in place, the field of AI agent transactions is rapidly evolving. Recent developments highlight the growing momentum and practical applications of this technology.

Recent Developments in Agent Transactions: The Stripe Agent Toolkit

The landscape of agent transactions has shifted dramatically with the recent release of Stripe’s Agent Toolkit. This development, from the dominant player in online payments, is poised to accelerate the adoption of AI agents for real-world commerce. This isn’t a future prediction; it’s happening right now. Stripe’s massive reach means this technology will quickly become embedded within the core transactional fabric of the digital economy.

The Stripe Agent Toolkit enables developers to integrate Stripe’s powerful financial services directly into agentic workflows, empowering agents to not just facilitate transactions but to actively participate in them through secure, controlled mechanisms built on Stripe’s robust financial infrastructure.

Key Capabilities

Creating and Managing Stripe Objects
Agents can now programmatically create payment links, manage products and prices, generate invoices, and handle other essential Stripe objects. This streamlines payment workflows and automates key business processes.
Use Cases:
- Generating dynamic payment links for e-commerce purchases
- Creating and managing invoices for freelancers
- Automating product catalog management
- Streamlining customer support workflows
Metered Billing (Usage-Based Billing)
Businesses can easily implement usage-based pricing for their agent services, tracking and charging customers based on metrics like token counts or execution time. This opens up new possibilities for monetizing AI agent platforms.
Use Cases:
- Billing for chatbot usage (messages or tokens)
- Charging for API calls
- Tracking and billing agent execution time
- Usage-based pricing for AI services
Online Purchasing with Stripe Issuing
Perhaps the most transformative capability, agents can now generate single-use virtual cards to make purchases online. This eliminates the need for consumers to share their primary card details with multiple merchants, significantly enhancing security while streamlining procurement processes.
Use Cases:
- Automating travel booking with controlled spending limits
- Managing company expenses through virtual cards
- Dynamically managing ad campaign budgets
- Secure online purchasing with transaction-specific cards
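The metered billing capability described above comes down to aggregating usage events per customer and pricing the total. The sketch below shows only that arithmetic. It does not call Stripe, and the price, the event format, and the `bill_usage` helper are assumptions made for illustration.

```python
from collections import defaultdict

PRICE_PER_1K_TOKENS_CENTS = 2  # assumed price, for illustration only


def bill_usage(events):
    """Sum token usage per customer and price each started block of 1,000 tokens."""
    totals = defaultdict(int)
    for customer_id, tokens in events:
        totals[customer_id] += tokens
    # Round up to whole 1,000-token blocks using integer arithmetic.
    return {
        customer: -(-tokens // 1000) * PRICE_PER_1K_TOKENS_CENTS
        for customer, tokens in totals.items()
    }


events = [("cust_a", 1200), ("cust_b", 300), ("cust_a", 900)]
invoice_cents = bill_usage(events)
```

In a real integration the usage events would be reported to the billing provider, which would then handle pricing and invoicing.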

Technical Implementation

The toolkit is designed for broad compatibility and ease of integration:

Framework Support: Native support for popular agent frameworks including LangChain, CrewAI, and Vercel’s AI SDK
Language Options: Available in both Python and TypeScript
LLM Compatibility: Works with any LLM provider that supports function calling
Security Controls: Fine-grained access control through configurable actions
Error Prevention: Built-in safeguards and monitoring capabilities

Stripe is known for its excellent developer documentation and support, making the integration process even smoother. For detailed implementation guidance, the Stripe documentation provides comprehensive examples and best practices.
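The “configurable actions” idea can be pictured as a deny-by-default allowlist that is consulted before any tool call runs. The following is a generic sketch of that pattern, not the toolkit’s actual API: the `ALLOWED_ACTIONS` shape and the `is_allowed` and `call_tool` helpers are hypothetical, and the Stripe documentation should be consulted for the real configuration format.

```python
# Hypothetical allowlist: which operations the agent may perform on which resources.
ALLOWED_ACTIONS = {
    "payment_links": {"create": True},
    "invoices": {"create": True, "update": False},
}


def is_allowed(config: dict, resource: str, operation: str) -> bool:
    """Deny by default: only explicitly enabled operations are permitted."""
    return config.get(resource, {}).get(operation, False) is True


def call_tool(config: dict, resource: str, operation: str) -> str:
    """Gate a (stubbed) tool call on the allowlist before doing anything."""
    if not is_allowed(config, resource, operation):
        raise PermissionError(f"{resource}.{operation} is not enabled for this agent")
    return f"executed {resource}.{operation}"  # stand-in for a real API call
```

Because anything not listed is refused, adding a new capability to the agent is an explicit decision, not a side effect.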

Integration Examples

Here are two practical examples of how the Stripe Agent Toolkit enables sophisticated transaction scenarios:

Consumer Purchase via Intermediary Service

A consumer uses a shopping agent service to find and purchase products. The agent searches for the best deals, and upon consumer approval, completes the purchase using a virtual card issued by Stripe through the intermediary service.

Key Components:

Consumer-facing interface (app/website)
AI shopping agent powered by LLMs
Stripe Agent Toolkit integration
Virtual card issuance for secure purchases
Order tracking and fulfillment

Employee Procurement System

An employee uses a company-provided procurement tool (powered by an LLM agent) to purchase office supplies. The agent identifies approved vendors and products, and after employee confirmation, completes the purchase using a virtual card issued by Stripe.

Key Components:

Company intranet/procurement portal
AI procurement agent with policy enforcement
Stripe Agent Toolkit integration
Automated budget tracking and reporting
Integration with accounting systems

These developments represent a significant step forward in making agent transactions practical and secure for both consumers and businesses. The Stripe Agent Toolkit provides the crucial infrastructure needed to bridge the gap between AI agents and real-world financial transactions.

Perplexity’s Direct-to-Consumer Shopping Agent

Just days after Stripe’s announcement, Perplexity introduced a new AI-powered ecommerce feature called “Buy with Pro,” marking another significant milestone in agent transactions. While Stripe enables developers to build agent-powered commerce solutions, Perplexity is taking a direct-to-consumer approach, offering U.S. Pro users the ability to purchase items through their AI agent without visiting retailer websites.

Key Features of “Buy with Pro”

One-Click Checkout: Users can store their billing and shipping information securely within Perplexity, enabling them to complete purchases with a single click. This streamlined process includes automatic tax calculations based on the user’s address. Unlike the Stripe Agent Toolkit, this new Perplexity shopping agent is provided direct-to-consumer by Perplexity itself, and it will conduct the transaction on behalf of the user, including handling payment.

Here are the key business components of this new agent transaction service:

Free Shipping: Pro subscribers benefit from free shipping on all purchases made through the “Buy with Pro” feature.
Visual Product Cards: For shopping-related queries, Perplexity displays visual cards that provide detailed product information, including pricing, seller details, and pros and cons. These cards are designed to offer unbiased recommendations without sponsored content.
Snap to Shop: This visual search tool allows users to upload a photo of a product they are interested in. Perplexity then identifies and displays similar items available for purchase, enhancing the shopping experience even when users lack specific product names or descriptions.
Integration with Shopify: By integrating Shopify’s API, Perplexity gains access to a wide range of products and merchants, allowing it to provide comprehensive shopping options directly within its platform.

This new feature positions Perplexity as a competitor to major ecommerce platforms like Amazon and Google Shopping by offering a seamless shopping experience directly through its AI search engine. The company is currently focusing on growing its search query volume rather than monetizing this feature immediately, with advertising business remaining the primary revenue stream focus.

The Inflection Point

With the Stripe Agent Toolkit and Perplexity’s shopping agent both launching within the last week (as of November 20, 2024), it is clear that transactional AI agents are no longer a future possibility but have reached broad-scale availability. These complementary approaches - Stripe’s developer toolkit and Perplexity’s direct-to-consumer service - demonstrate how quickly this technology is being commercialized and made available at population scale.

The Future of Agent Transactions

As transactional AI agent technology matures, two key areas (among others) that will shape its evolution are:

The development of common protocols for agent-to-agent communication, enabling seamless and efficient automated transactions
Sophisticated mechanisms for managing the delegation of authority from the principal user to the AI agent, balancing automation with user control

This will ensure that agents act within clearly defined boundaries while maximizing their utility. The foundation we laid with UETA has proven remarkably prescient, providing crucial guardrails for responsible innovation while protecting user interests. The challenge now is to build upon this foundation, creating systems that maintain trust while unleashing the transformative potential of autonomous agents.
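As a rough picture of what “clearly defined boundaries” could look like in software, the sketch below models a delegation from principal to agent as a scoped, capped, time-limited grant. The `Delegation` class, its fields, and the example values are hypothetical. Real systems would more likely express this through an authorization protocol, which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Delegation:
    """Hypothetical grant of authority from a principal to an agent."""
    principal_id: str
    agent_id: str
    allowed_actions: frozenset
    max_amount_cents: int
    expires_at: datetime

    def permits(self, action: str, amount_cents: int, now: datetime) -> bool:
        """True only if the action is in scope, under the cap, and not expired."""
        return (
            action in self.allowed_actions
            and amount_cents <= self.max_amount_cents
            and now < self.expires_at
        )


now = datetime(2024, 11, 20, tzinfo=timezone.utc)
grant = Delegation(
    principal_id="user-1",
    agent_id="agent-1",
    allowed_actions=frozenset({"book_hotel"}),
    max_amount_cents=20_000,
    expires_at=now + timedelta(hours=1),
)
```

The grant is immutable (`frozen=True`), so widening the agent’s authority requires the principal to issue a new delegation; the existing one cannot be edited.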

Note: This is a beta version preview of materials that will be released shortly on my new site, OnAgents.org, so check there for the most up-to-date versions of this and other AI Agent topics.

Also: For more detailed discussion, including the role of zero-knowledge proofs and other emerging legal considerations for transactional AI agents, stand by for an upcoming white paper I’m co-authoring with the ever-awesome Diana Stern, titled: “From Fine Print to Machine Code: How AI Agents are Rewriting the Rules of Engagement”. I’ll post a link to it here in the OnAgents section of DazzaGreenwood.com.

Empowering Consumers with Personal AI

Dazza Greenwood — Sat, 19 Oct 2024 23:09:06 GMT

I wrote this post for the Consumer Reports Innovation Lab, where it was published on October 18, 2024.

The marketplace for AI-powered personal agents is rapidly evolving. Companies like Amazon and Salesforce are already offering services that help consumers navigate online shopping, manage subscriptions, and automate routine tasks. These developments signal a shift in how we interact with digital services and make purchasing decisions.

Consumer Reports is exploring the potential for developing pro-consumer AI agents that prioritize user interests above all else. This approach comes with unique legal and design challenges that set it apart from purely commercial offerings.

The idea of using such agents has the potential to fundamentally reshape how consumers use their data, navigate complex services, and make decisions. If developed thoughtfully, these agents could safeguard privacy and act as trusted intermediaries. While there are many interesting questions of law and practice to enable and safeguard this, thankfully, existing legal frameworks have already begun to anticipate and support such innovations.

In this post, I’ll examine three key areas:

The existing legal framework that supports the use of AI agents for transactions;
Design paths for creating truly user-centric AI agents; and
The potential impact of these agents on consumer empowerment in the digital marketplace.

By understanding these foundations, we can work towards AI agents that genuinely serve consumers’ best interests.

The Forgotten Framework: UETA and the Rise of LLM Agents

For decades, the law has envisioned a world where electronic agents can represent us and act on our behalf. In 1999, the Uniform Electronic Transactions Act (UETA) laid out a framework for e-commerce. UETA was created to address the legal uncertainties surrounding electronic transactions and to provide a consistent framework across states.

This law is the very reason we can confidently use electronic signatures and contracts in our daily digital interactions. It is a cornerstone of the information age, providing legal certainty for online commerce and other electronic transactions. More to the point, UETA provides explicit provisions for electronic agents to conduct transactions autonomously.

This uniform law has been adopted across the United States, statutorily enacted in 52 states and territories, and truly is the law of the land. This legal foundation provides clear definitions and applicable rules for concepts like electronic signatures, automated transactions, and attribution for the acts of autonomous agents. Fast forward to today, and this vision can finally come to life in support of new software services, such as advanced AI assistants and LLM-based agentic software applications, including those for individuals.

This existing legal foundation provides clear definitions and rules for key concepts:

Electronic Agent: “A computer program or an electronic or other automated means used independently to initiate an action or respond to electronic records or performances in whole or in part, without review or action by an individual.” This definition perfectly describes the capabilities of LLM-powered AI agents.
Automated Transaction: “A transaction conducted or performed, in whole or in part, by electronic means or electronic records, in which the acts or records of one or both parties are not reviewed by an individual in the ordinary course in forming a contract, performing under an existing contract, or fulfilling an obligation required by the transaction.” This clarifies the legal validity of agent-led transactions.
Attribution: UETA also establishes how to determine on whose behalf an electronic agent is operating, ensuring accountability. Essentially, under UETA, an electronic record or signature is attributable to a person if it was the act of that person, which can be shown in any manner, including the efficacy of any security procedures applied.

LLM agents may make unexpected errors when conducting automated transactions, which is a significant concern. UETA establishes a framework for error prevention and correction, particularly emphasizing agreed-upon security procedures. For instance, a consumer and an online retailer could establish a spending limit for the consumer’s AI agent. If the agent attempts to exceed this limit, the security procedure would trigger an alert, preventing the error.

Importantly, if a merchant fails to implement an agreed-upon security procedure and an error occurs, UETA provides the consumer with the right to reverse the transaction. For example, if the retailer fails to verify the purchase amount with the consumer before finalizing the transaction, as agreed, and the agent makes a purchase beyond the agreed-upon limit, UETA could provide the consumer with legal grounds to reverse the transaction and recoup the excess funds.
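To make the spending-limit example concrete, here is a minimal sketch of how a retailer might apply such an agreed-upon security procedure. The names SecurityProcedure, PurchaseRequest, and review_purchase are hypothetical and chosen only for illustration; nothing here is prescribed by UETA, which leaves the choice of procedure to the parties.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SecurityProcedure:
    """Hypothetical agreed-upon procedure: a per-transaction spending limit."""
    spending_limit: Decimal


@dataclass(frozen=True)
class PurchaseRequest:
    """A purchase an AI agent is attempting on the consumer's behalf."""
    agent_id: str
    amount: Decimal


def review_purchase(procedure: SecurityProcedure, request: PurchaseRequest) -> str:
    """Apply the agreed procedure before finalizing a purchase.

    Returns "approve" when the amount is within the limit, otherwise
    "confirm_with_user" so a person can review the purchase first.
    """
    if request.amount <= procedure.spending_limit:
        return "approve"
    return "confirm_with_user"


# Example: the consumer and retailer agreed on a 200.00 limit.
procedure = SecurityProcedure(spending_limit=Decimal("200.00"))
within_limit = review_purchase(procedure, PurchaseRequest("agent-1", Decimal("150.00")))
over_limit = review_purchase(procedure, PurchaseRequest("agent-1", Decimal("450.00")))
```

The point of the sketch is that the check runs on the retailer's side before the transaction is finalized. If the retailer skipped this step, the scenario described above, where the consumer may be able to reverse the transaction, is the one that would apply.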

With the emergence of AI agents, we now have the technology capable of meaningfully fulfilling UETA’s vision. These agents can communicate in natural language, negotiate, retrieve information, and even execute decisions—but critically, they can also be built to operate on behalf of the consumer, avoiding conflicting interests. Rather than invent new legal frameworks, we can leverage and extend existing ones like UETA to achieve predictable legal outcomes and accelerate the responsible development of personal AI.

For example, imagine a personal AI agent negotiating a better price for a subscription service on your behalf. Under UETA, this automated transaction would be legally binding, just as if you had negotiated it yourself.

Beyond price negotiation, such agents could automatically handle your insurance claims, gather quotes for home repairs, or even help you manage your investments according to your risk tolerance. Imagine receiving proactive alerts from your AI agent about better deals on services you frequently use or having it automatically adjust your utility plans based on your actual consumption patterns to save you money. These examples illustrate the potential of personal AI agents to simplify our lives and give us more control over our interactions with complex systems.

Such capabilities make LLM agents uniquely suited to leverage the legal framework established by UETA and extend it to new domains of personal empowerment.

Being Loyal: Building Agents That Work for You

While UETA provides the legal foundation for AI agents, the next step is to ensure these agents can operate securely, reliably, and in alignment with the user’s interests. The most compelling use case for personal AI agents is their ability to advocate on behalf of consumers without bias or conflicting interests. Unlike AI systems embedded within purely profit-seeking enterprises, or to advance a commercial objective or to fulfill a narrow “customer service” framework, personal AI agents could be entrusted with a “duty of loyalty” that binds the service provider to operate the agent in the best interests of the user. These agents could manage tasks like travel bookings or e-commerce purchases with the same trustworthiness as a high-end fiduciary representative, advocating only your interests.

Robust encryption, privacy standards, and transparent data stewardship practices could bolster this trustworthiness. These types of measures begin with terms of service and governance-based assurance, and should also be encoded into the design of the system. To achieve this level of trust, AI agents must also implement clear attribution mechanisms. This means that any action the agent takes can be reliably traced back to the user, establishing accountability and legal responsibility.
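One way to picture an attribution mechanism is a tamper-evident record that ties each agent action to the user who authorized it. The sketch below is an assumption-laden illustration, not a recommended design: it uses a shared-secret HMAC from the Python standard library, and the function names sign_action and verify_action are hypothetical. A production system would more likely use asymmetric signatures, since a shared secret cannot by itself show which key holder created a record.

```python
import hashlib
import hmac
import json


def sign_action(user_key: bytes, user_id: str, action: dict) -> dict:
    """Create a record binding an agent action to a user, with an integrity tag."""
    # sort_keys gives a stable serialization so the tag is reproducible.
    payload = json.dumps({"user_id": user_id, "action": action}, sort_keys=True)
    tag = hmac.new(user_key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_action(user_key: bytes, record: dict) -> bool:
    """Check that the record was produced with the user's key and not altered."""
    expected = hmac.new(
        user_key, record["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, record["tag"])


# Example usage with a demo key (a real key would be generated and stored securely).
demo_key = b"demo-key-for-illustration-only"
record = sign_action(demo_key, "user-123", {"type": "purchase", "amount": "19.99"})
is_valid = verify_action(demo_key, record)
```

A record like this speaks to the "security procedures" language in UETA's attribution rule only in a loose, illustrative sense; whether any particular mechanism suffices is a legal question the code cannot answer.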

Looking ahead, it’s crucial to consider how AI agents will interact not just with traditional online systems, but also with each other, negotiating and transacting on our behalf. This could lead to a more efficient and potentially fairer marketplace. For example, your personal AI agent could automatically negotiate the best price for a product or service by interacting with the AI agents of multiple vendors, comparing offers, and securing the most favorable terms, all while adhering to your pre-defined preferences and limits. Furthermore, exploring concepts like delegation of authority in multi-agent systems can pave the way for even more powerful consumer empowerment tools. While LLM agents can interact with natural language and web-based systems surprisingly well, eventual high-velocity agent-to-agent transactions would require the development of common protocols and standards for inter-agent communication and negotiation. However, such standards are not needed to use LLM agents with existing online services and platforms. UETA envisions and supports transactions with one electronic agent, two electronic agents, or large numbers of electronic agents. There is room to grow under the existing law.

With this robust legal foundation in place, the next challenge is to design AI agents that not only comply with these laws but also operate effectively on behalf of consumers.

Design Paths for Consumer AI Agents

Designing consumer AI agents requires a thoughtful approach to balancing security, user experience, and legal or regulatory considerations. Consider the following three potential models, each with its own advantages and challenges:

Full Authentication Model, whereby the agent uses the same authentication and authorization credentials as its user;
Intermediary Model, whereby the agent is operated by another party who uses the agent to act on the user’s behalf; and the
Decentralized Identity Model, whereby the agent leverages decentralized identifiers and verifiable credentials to interact with third parties, giving users direct control over their digital identity.

Each model presents unique trade-offs and aligns differently with user trust, system complexity, and risk frameworks. Let’s examine each of these design paths in more detail, considering their strengths, weaknesses, and potential applications in the context of consumer AI agents.

Full Authentication

The Full Authentication path positions the AI agent as a direct extension of the user. By acting with the user’s authorization and utilizing their credentials and permissions, the agent can access online platforms, add items to shopping carts, compare prices, and even complete purchases autonomously according to pre-defined rules. The main strength of this approach is its simplicity. It uses current technology to enable the AI agent to perform various tasks without requiring companies to develop new infrastructure or protocols. This seamless interaction is achieved through existing standards, making it easy to deploy and integrate.

To effectively execute this, the agent would need capabilities to interact using the same interfaces that would be made available to an authenticated user.

However, this approach also carries significant security risks, as it requires granting the agent extensive access to user accounts and credentials. There are also substantial risks in terms of data stewardship and liability (and it’s for this reason regulators have discouraged the use of screen scraping and are encouraging the development of more secure interfaces for third parties to authenticate on a user’s behalf).

For instance, if the agent misuses the data or performs unintended actions, it may be unclear who should be held responsible—the user, the agent provider, or the third party with whom the agent transacted. Moreover, compliance with data protection and privacy regulations like GDPR or CCPA is more challenging to implement because the agent’s full access could potentially implicate user data rights. This ambiguity can hinder adoption, as users and companies may be reluctant to grant the required level of permissions.

Intermediary Path

In contrast, the Intermediary path positions the AI agent as part of a distinct entity that acts as a negotiator or advocate on behalf of the user. Instead of using the user’s credentials, the agent operates under its own identity and permissions, creating a clear separation between the user and the agent service provider. Here, the agent is provided to the user by another party, such as a consumer group or other service provider, and is designed to operate on behalf of the user with third parties like online vendors.

In this setup, the agent operates under a set of rules that define its role, allowing it to handle transactions, share specific data points, and communicate the user’s preferences in a controlled manner. This granular control may empower users with greater agency over their data and privacy. To enable this, the Intermediary path would require new protocols or handshake mechanisms to establish the agent’s legitimacy and scope of authority with third parties like online merchants and other organizations the user seeks to transact with.

To function effectively as an intermediary, the agent could leverage existing standards like OAuth 2 and OpenID Connect, but in a different way. In effect, the intermediary acts as an authorized application of the consumer, with permissions explicitly granted by the user to take specific actions on their behalf. This means the intermediary holds tokens or authorizations that permit it to execute tasks as the user’s representative without the agent ever directly holding the user’s core credentials. This model maintains a clear distinction, allowing the intermediary to act independently while still adhering to permissions that have been transparently defined and authorized by the user.
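The delegated-permission idea can be sketched in a few lines. This is a simplified model of a scoped, expiring grant, not an implementation of OAuth 2 or OpenID Connect; the DelegatedGrant type, the is_authorized function, and the scope names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class DelegatedGrant:
    """Hypothetical stand-in for a token the user issued to the intermediary."""
    user_id: str
    scopes: FrozenSet[str]   # the specific actions the user has permitted
    expires_at: datetime     # grants are time-limited


def is_authorized(grant: DelegatedGrant, required_scope: str, now: datetime) -> bool:
    """Allow an action only if the grant is unexpired and covers the scope."""
    return now < grant.expires_at and required_scope in grant.scopes


# Example: the user lets the intermediary manage a cart and place orders for one hour.
issued_at = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = DelegatedGrant(
    user_id="user-123",
    scopes=frozenset({"cart:write", "orders:create"}),
    expires_at=issued_at + timedelta(hours=1),
)
can_order = is_authorized(grant, "orders:create", issued_at + timedelta(minutes=5))
can_read_cards = is_authorized(grant, "payment_methods:read", issued_at + timedelta(minutes=5))
```

Note that the intermediary holds only the grant, never the user's password or other core credentials, which is the distinction from the Full Authentication path.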

This approach offers a range of advantages, primarily stemming from the clear delineation of roles and responsibilities, which helps simplify accountability and can foster greater trust. In a technical legal sense, the party providing the agent service would be the legal “agent” of the user in this case, greatly clarifying the roles and relationships with the user (who would be the “principal,” legally) and third parties with whom transactions are conducted. In simpler terms, this means the organization providing the AI agent service could be legally responsible for the agent’s actions, because that organization is the legal agent and may owe the user a duty of care to act reasonably and competently.

This separation clarifies liability and simplifies compliance with data protection and other laws. Furthermore, the agent can engage in advanced activities such as dynamic pricing negotiations or crafting customized agreements with service providers, offering enhanced value to the user. However, a significant challenge in adopting the Intermediary path lies in the need for standardization. Creating the necessary infrastructure and achieving industry-wide consensus on configuring existing protocols in new ways (e.g., to support an authorized agent role with standards like OAuth 2 and OpenID Connect) and filling the remaining gap with new protocols involves substantial coordination and time, making this a more complex and long-term solution.

Decentralized Identity: A Glimpse into the Future?

Looking further ahead, decentralized identity systems offer an intriguing possibility. Decentralized identity approaches enable users to control and selectively share their data with service providers and other third parties through verifiable credentials, theoretically eliminating the need for centralized authentication. This approach aligns well with the goals of personal AI agents, empowering users with granular control and principal authority over their digital identities and interactions. While still in its early stages, decentralized identity technology holds some potential for shaping the future of consumer AI agents. However, the novel technologies and consequent switching costs for all the parties involved—especially online merchants and other organizations the consumer wishes to interact with—would be considerable. Therefore, while promising, this remains a more speculative and longer-term potential path that calls for continued innovation and collaboration.

Ultimately, these three design paths offer varying levels and pathways of control, security, and functionality. The decision on which path to adopt will depend heavily on the use case, user expectations, and industry acceptance. The Full Authentication path is practical for quick adoption and basic tasks, the Intermediary path offers a higher level of security and compliance at the cost of complexity, and Decentralized Identity remains, for the moment, even more complex and speculative. Continued research and development are crucial to address the inherent challenges of each path and unlock the full potential of consumer AI agents.

The Future of Personal AI Agents: Reimagining Consumer Empowerment

The implications of LLM agents for consumer empowerment are profound. If built with the right legal and technical safeguards, they could shift the balance of power, allowing individuals to navigate complex systems—whether financial, commercial, legal, or social—with an AI working solely in their interests. These agents could help consumers make informed choices, protect their privacy, and advocate for their needs in previously impossible ways.

The existing legal framework, starting with UETA, provides a solid foundation on which to build. By leveraging this legal basis and focusing on designing AI agents that align with consumer interests, we can create technologies that empower consumers, giving them tools to engage in the digital world with confidence and consumer-directed autonomy. Understanding this legal foundation allows us to explore how AI agents can be designed to prioritize the consumer’s interests.

Personal AI agents represent a significant shift in how consumers can interact with digital services. By leveraging existing legal frameworks like UETA and focusing on consumer-centric design, we can create AI systems that truly empower individuals. As we move forward, collaboration among technologists, legal experts, policymakers, and consumer advocates will be crucial to ensure these agents are developed securely, reliably, and responsibly. The potential for personal AI agents to level the playing field for consumers in the digital landscape is immense, making this an exciting and important area for continued innovation and development.

In the next blog post, I will delve deeper into how fiduciary duties, especially the duty of loyalty, could serve as a powerful model for AI agents acting and transacting in the interest of consumers, distinct from the interests of merchants and other counterparties to transactions.

Leaping the Uncanny Valley

Dazza Greenwood — Tue, 01 Oct 2024 04:46:39 GMT

The world of AI is moving at an incredible pace, and it can feel overwhelming to keep up with the constant stream of new developments. But every once in a while, a technology comes along that genuinely captures my attention, not just for its novelty, but for its potential to fundamentally change how we work and think. NotebookLM is one of those technologies.

It's not just about boosting productivity, though it certainly does that. For me, NotebookLM unlocks new levels of creativity and insight that were simply impossible before. As someone who constantly grapples with massive amounts of complex information—legal documents, research papers, data sets—I'm always searching for tools that can help me synthesize, analyze, and ultimately understand that information on a deeper level. NotebookLM is a game-changer for serious thinkers and doers. It's like having a super-powered research assistant working alongside you, helping you to dig through data, analyze arguments, and ultimately, think faster and better. Here’s a complete (as of the date of this post) collection of NotebookLM documentation you can scan to get a quick look at what it does and how to use it.

But what truly blew my mind is NotebookLM's AI-generated podcast feature. Initially, I dismissed it as a cool party trick, but after experiencing the quality firsthand, I can confidently say it's astounding. The two-host audio conversations are not just "good for AI," they're genuinely good – surpassing the vast majority of human-produced podcasts. They've completely transcended the uncanny valley – that eerie feeling you get when encountering AI that is almost human but not quite, leaving you with a sense of unease – delivering a listening experience that's both engaging and enjoyable. Most importantly, the underlying intelligence does a great job of surfacing and synthesizing the important points, perspectives, and even questions posed by the source materials you feed it. So the podcast ends up being astonishingly on-point.

This is particularly remarkable because the AI doesn't just mimic human speech, it goes through a process of drafting, revising, and refining its content, just like a human writer. It even throws in those little pauses and "ums" that make a conversation sound natural. The result is a clear, concise audio summary that feels like you're listening to a conversation between two colleagues who have a deep understanding of the topic at hand.

The applications for this technology are endless. Imagine students getting custom audio explainers tailored to their learning styles, professionals getting up to speed on a new topic during their commute, or even families having deeper, more meaningful conversations guided by evidence and diverse viewpoints. This is the kind of future that NotebookLM is making possible.

I've been experimenting with the podcast feature in some creative ways, by adding custom instructions to steer the content and make specific points. It's incredible to see how responsive the AI is to these prompts. It's like having a personalized audio production team at your fingertips. I just made a podcast about NotebookLM (embedded at the top of this post) as an example.

It's not about replacing human potential, it's about amplifying it. As I often say to the professionals I train, "If you're not using AI to enhance your productivity and creativity, you're falling behind." NotebookLM is a must-have tool for anyone who wants to stay ahead of the curve.

Legislative Hearing on LLM Agents

Dazza Greenwood — Tue, 17 Sep 2024 07:00:10 GMT

Earlier today I was thrilled to organize an expert panel to brief the Wyoming legislature on the state of LLM Agents. The presentations and discussion provide an up-to-date overview of this important technology and raise some of the legal, policy, and governance challenges and opportunities arising from this innovation.

My own presentation begins at 1:08:41 but I commend the entire hearing panel for your review and consideration.

Panelists:

Dazza Greenwood, https://www.linkedin.com/in/dazzagreenwood/
Alex Reibman, https://www.linkedin.com/in/alex-reibman-67951589/
Campbell Hutcheson, https://www.linkedin.com/in/campbell-hutcheson-80409a83/
Anh Mac, https://www.linkedin.com/in/anh-mac/
Nam Nguyen, https://www.linkedin.com/in/hoangnamm21/

Co-Chairs:

Chris Rothfuss, Senate Co-Chair, https://en.wikipedia.org/wiki/Chris_Rothfuss
Cyrus Western, House Co-Chair, https://en.wikipedia.org/wiki/Cyrus_Western

Self-Designed AI: Introducing Automated Agent Creation

Dazza Greenwood — Mon, 19 Aug 2024 05:50:03 GMT

We’re living in the age of incredibly powerful Large Language Models (LLMs), but even the most sophisticated LLMs need structure and guidance to reliably solve complex problems. That’s where agentic systems come in. Think of them as frameworks built around LLMs, incorporating things like planning, tool use, and self-reflection to take the right actions and achieve your goal.

Up until now, building these agentic systems has been a painstaking, manual process. Researchers and engineers have had to meticulously hand-craft each component, experiment with different combinations, and rigorously configure for specific tasks. It’s a time-consuming bottleneck in the development of truly powerful LLM-based agents.

But what if we could automate this design process? What if we could let AI design the AI? That’s the audacious goal of a new research area called Automated Design of Agentic Systems (ADAS).

How ADAS Works: AI Coding AI

The key insight behind ADAS is to use code as the design language for agentic systems. This leverages a few powerful ideas:

Turing Completeness: Programming languages are “Turing Complete,” meaning they can theoretically represent any computational process – including the intricate designs of agentic systems.
LLM Coding Proficiency: Modern LLMs are becoming increasingly adept at writing and understanding code, making them ideal candidates for automating agent design.

Imagine a “meta agent” – an automated LLM-based process specifically designed to identify and create new agents. It iteratively creates agents in code, tests them on specific tasks, learns from the results, and stores successful designs in an “archive” for future inspiration. This process, called Meta Agent Search, mimics the way human researchers iterate and build upon previous discoveries. Check out their GitHub repo and see how it works for yourself.
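The loop described above (propose an agent in code, test it, record the result in an archive) can be sketched as a toy program. This is not the ADAS authors' implementation: the propose_agent function below is a deterministic stand-in for the LLM-based meta agent, the "agents" are trivial arithmetic functions, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Agent = Callable[[int], int]
ArchiveEntry = Tuple[str, float]  # (agent name, score on the task set)


def evaluate(agent: Agent, tasks: List[Tuple[int, int]]) -> float:
    """Score an agent as the fraction of tasks it answers correctly."""
    correct = sum(1 for x, expected in tasks if agent(x) == expected)
    return correct / len(tasks)


def propose_agent(archive: List[ArchiveEntry], step: int) -> Tuple[str, Agent]:
    """Stand-in for the meta agent.

    A real meta agent would prompt an LLM with the archive and ask it to
    write new agent code; this toy version just cycles through fixed candidates
    and ignores the archive.
    """
    candidates: List[Tuple[str, Agent]] = [
        ("add_one", lambda x: x + 1),
        ("double", lambda x: x * 2),
        ("double_plus_one", lambda x: x * 2 + 1),
    ]
    return candidates[step % len(candidates)]


def meta_agent_search(tasks: List[Tuple[int, int]], iterations: int) -> List[ArchiveEntry]:
    """Iteratively propose, evaluate, and archive agents; return best first."""
    archive: List[ArchiveEntry] = []
    for step in range(iterations):
        name, agent = propose_agent(archive, step)
        archive.append((name, evaluate(agent, tasks)))
    return sorted(archive, key=lambda entry: entry[1], reverse=True)


# Toy task set: the hidden rule is f(x) = 2x + 1.
tasks = [(1, 3), (2, 5), (3, 7)]
ranked = meta_agent_search(tasks, iterations=3)
best_name, best_score = ranked[0]
```

In this toy run only the double_plus_one candidate solves every task, so it ranks first. The interesting part of the real method, which this sketch leaves out, is that the proposal step learns from the archive rather than ignoring it.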

The Surprising Results: Learned Agents Outshine Hand-Designed Ones

The early results of ADAS are remarkable. In experiments across various domains, including logic puzzles, reading comprehension, math, and even multi-task problem solving, learned agents consistently outperform state-of-the-art hand-designed agents.

Even more surprisingly, these learned agents show a remarkable ability to generalize. One striking example is an agent initially designed for solving complex math problems that transferred to reading comprehension tasks while maintaining competitive performance. This cross-domain generalization highlights the robustness of the agent designs uncovered by ADAS and suggests that ADAS is uncovering fundamental design patterns that transcend individual domains.

Implications and The Future

The research into ADAS is just beginning, but it holds the promise of turbo-charging how we create and deploy LLM-based agents. It’s a powerful example of how AI can not only solve problems but also design the solutions to those problems – a glimpse into a future where AI systems become increasingly self-sufficient and capable of shaping their own evolution.

ABA’s Landmark Opinion on Generative AI

Dazza Greenwood — Tue, 30 Jul 2024 20:38:26 GMT

Yesterday, the American Bar Association (ABA) took a significant step forward in addressing the role of artificial intelligence in the legal profession. On July 29, 2024, the ABA released Formal Opinion 512, providing thoughtful and comprehensive ethics guidance on the use of “Generative Artificial Intelligence Tools” in legal practice. This important opinion represents a pivotal moment in the U.S. legal landscape, signaling a growing recognition of generative AI as a valuable and beneficial technology for the practice of law.

A Shift in Perspective

The ABA’s new guidance marks an important shift in how the legal profession views generative AI. While not explicitly mandating its use, the opinion certainly suggests that understanding and potentially utilizing generative AI tools is becoming increasingly important for competent legal practice. This perspective aligns with the evolving nature of legal technology competence, drawing parallels to how use of email, computerized legal research, and eDiscovery have become standard skills in the lawyer’s toolkit.

Recognizing the Benefits

Formal Opinion 512 acknowledges the potential of generative AI to enhance both the efficiency and quality of legal services. By highlighting these benefits, the ABA is effectively encouraging lawyers to explore and consider how these tools might improve their practice and better serve their clients. This recognition is a clear indication that the legal profession is moving towards embracing innovative technologies rather than viewing them primarily with skepticism.

Balancing Innovation and Ethics

While the opinion is forward-thinking in its approach to generative AI, it appropriately emphasizes the importance of responsible use. The guidance carefully outlines how existing ethical rules apply to this new technology, ensuring that the core values of the legal profession are maintained even as new tools are adopted. This balanced approach demonstrates the ABA’s commitment to fostering innovation while upholding the highest standards of professional conduct.

Ethical Considerations

The ABA Formal Opinion 512 outlines several crucial ethical considerations for lawyers using generative artificial intelligence (GAI) tools in legal practice. These considerations include maintaining competence, ensuring confidentiality, proper communication with clients, and upholding supervisory responsibilities. Below are the key points and recommendations for alignment with the opinion:

Competence:

Lawyers must have a reasonable understanding of the capabilities and limitations of the GAI tools they use. This includes understanding the potential for inaccurate outputs, such as hallucinations or biased content, due to the underlying data or algorithms. Lawyers must independently verify and review the accuracy of GAI outputs and should not rely solely on these tools without applying their professional judgment. Continuous learning and staying updated with advancements in GAI technology are necessary to maintain competence.

Confidentiality:

Protecting client information is paramount when using GAI tools. Lawyers must evaluate the risks of unauthorized disclosure or access, particularly when using self-learning GAI tools. These tools can potentially expose client information in unintended ways, necessitating informed consent before inputting sensitive data. The opinion emphasizes that informed consent must be specific and clear, detailing the risks and benefits of using such tools. General boilerplate provisions in engagement letters are insufficient for this purpose.

Communication:

Lawyers are required to inform clients about the use of GAI tools when it impacts the representation. This includes situations where the use of GAI affects fees, decision-making processes, or significantly influences case outcomes. Disclosure is also necessary if clients inquire about the use of these tools. Lawyers must provide adequate explanations to enable clients to make informed decisions, adhering to Model Rule 1.4.

Supervisory Responsibilities:

Supervisory lawyers must implement policies and training programs to ensure the ethical use of GAI tools within their firms. This includes overseeing both lawyers and non-lawyers to ensure compliance with professional standards. Training should cover the ethical and practical aspects of using GAI tools, including data security, privacy, and the limitations of these technologies. Supervisors must ensure that any use of GAI tools by non-lawyers aligns with ethical guidelines and does not compromise client confidentiality or the quality of legal services.

Fees:

Lawyers must charge reasonable fees for the use of GAI tools, clearly communicating the basis for these charges to clients. They cannot bill clients for time spent learning to use GAI tools unless specifically agreed upon. If a GAI tool is used to expedite tasks, the fees must reflect the actual time spent and the efficiency gained. Disbursements related to GAI tools must be reasonable and transparently communicated, avoiding any additional profit beyond the actual cost incurred.

These ethical considerations underscore the importance of responsible and transparent use of GAI tools in legal practice. The ABA’s guidance helps ensure that the adoption of these technologies enhances legal services while maintaining the profession’s highest ethical standards.

It is especially encouraging to see the explicit recognition of the need for continuous vigilance given the dynamic evolution of this technology. The opinion holds that lawyers must stay updated with technological advancements and ethical standards to provide competent legal services and, critically, that further guidance is anticipated as GAI tools and their applications evolve.

Building on State-Level and MIT Initiatives

It’s noteworthy that the ABA’s guidance specifically cites the exemplary work done by state bar associations that have released rules and ethics opinions on the topic. This acknowledgment reflects a growing consensus across the legal community about the importance of addressing generative AI in legal practice. Moreover, it’s encouraging to see that ideas and approaches originating from initiatives like the MIT Task Force on Responsible Use of Generative AI for Law are now being more fully integrated into mainstream legal thinking.

The ABA’s Formal Opinion 512 represents a significant milestone in the legal profession’s journey towards embracing generative AI. By providing clear, thoughtful guidance on how to apply existing ethical rules to this new technology, the ABA is not standing in the way of lawyers responsibly harnessing the power of AI to enhance their practice and better serve their clients. As the legal landscape continues to evolve, this opinion will undoubtedly serve as a crucial reference point for lawyers navigating the exciting intersection of law and artificial intelligence.

Testimony on Agentic AI Systems and Automated Decision Making

Dazza Greenwood — Tue, 02 Jul 2024 18:20:09 GMT

Yesterday I testified again to a Select Committee of the Wyoming legislature, led by Co-Chairs Senator Rothfuss and Representative Western, on the topic of automated decision making technology in the context of generative AI. We delved into the use of large language models as “agents” that can operate and even conduct transactions on behalf of individuals and organizations. I’m delighted to say this topic will continue to be explored through one of the Select Committee’s informal drafting groups, resulting in a deeper discussion and perhaps draft legislation at their next hearing this coming autumn.

The testimony can be found here, and is embedded below.

The full hearing (which covered several interesting topics) can be found here.

Redefining 'Ordinary Meaning': GenAI and Legal Language

Dazza Greenwood — Wed, 29 May 2024 22:11:43 GMT

Can generative artificial intelligence help us understand what words mean in legal disputes? Published yesterday, a federal judge's unusual concurring opinion suggested it might be possible—and sparked a fascinating conversation about the future of legal interpretation.

A "Landscaping" Conundrum

The case itself involved an insurance dispute hinging on whether installing an in-ground trampoline qualified as "landscaping." Sounds simple, right? But as Judge Kevin Newsom of the Eleventh Circuit Court of Appeals discovered, pinpointing the "ordinary meaning" of even seemingly straightforward terms can be trickier than it seems.

Judge Newsom, a self-described "plain-language guy," dutifully consulted his dictionaries. Yet, the definitions he found felt incomplete, failing to fully capture the essence of how the word "landscaping" is used in everyday life. He even examined photos of the trampoline installation, his intuition telling him it didn't quite fit the bill. But he struggled to articulate *why*. As he noted, “Nothing in them really struck me as particularly 'landscaping'-y.”

From Dictionaries to Doubt...and an AI Assist

Enter ChatGPT. Out of frustration, Judge Newsom decided, almost as a joke, to ask the AI chatbot for its take on the meaning of "landscaping." As he recounts, “Perhaps in a fit of frustration, and most definitely on what can only be described as a lark, I said to one of my clerks, ‘I wonder what ChatGPT thinks about all this.’” To his surprise, ChatGPT delivered a reasoned definition that resonated with his own developing understanding of the term.

Intrigued, Judge Newsom ventured further. What did ChatGPT think about the trampoline? Could *that* be considered "landscaping"? The AI's answer—a confident "yes," backed by logical explanations—pushed Judge Newsom to consider the potential of this technology in a whole new light. ChatGPT responded: "Yes, installing an in-ground trampoline can be considered a part of landscaping... It’s a deliberate change to the outdoor environment, often aimed at enhancing the overall landscape and usability of the area."

The Promise (and Perils) of AI-Powered Interpretation

Judge Newsom suggests that the appeal lies in AI's ability to tap into something fundamental about legal interpretation: the importance of understanding how words are *actually* used by ordinary people. Large language models (LLMs), trained on vast datasets of online text, offer a potential window into this everyday usage. As he puts it, “The ordinary-meaning rule... has always emphasized 'common language', 'common speech', and 'common parlance'—in short, as I’ve explained it elsewhere, 'how people talk.'”

Moreover, unlike static dictionary definitions, AI can analyze language in context, recognizing nuances and shades of meaning that traditional methods might miss. This "contextual intelligence" is key to unlocking the intended meaning of legal texts. Judge Newsom explains, "The combination of the massive datasets used for training and this cutting-edge ‘mathematization’ of language enables LLMs to absorb and assess the use of terminology in context.”

However, Judge Newsom doesn't shy away from the potential pitfalls. He acknowledges the risk of AI "hallucinations"—generating inaccurate or misleading information—underscoring the need for careful scrutiny of their outputs. “First, the elephant in the room: What about LLMs’ now-infamous 'hallucinations'?" He also raises concerns about potential biases within training data, emphasizing the importance of transparency and inclusivity: “The absence of offline usages from the training pool—and in particular, the implications for underrepresented populations—strikes me as a sufficiently serious concern.”

Kudos to Judge Newsom

Judge Newsom deserves kudos for his courage and foresight. Including the prompts and outputs from ChatGPT as a published appendix in his opinion is not only transparent but also very useful. Moreover, his willingness to share this view, which challenges conventional wisdom, is commendable and reflects an openness to innovation that is crucial for the legal field. As Judge Newsom rightly notes, the experiment with ChatGPT "no longer strikes me as ridiculous" and indeed "might have something useful to say about the common, everyday meaning of the words and phrases used in legal texts."

Proceed with Caution...and Curiosity

So, can ChatGPT replace judges and lawyers? Not quite. But Judge Newsom's experiment offers a compelling glimpse into a future where AI could play a valuable role in legal interpretation. As he aptly puts it, LLMs should be viewed as one tool among many, offering additional insights that can help us better understand the law and ensure its fair and just application. “It seems to me scarcely debatable that the LLMs’ training data are at the very least relevant to the ordinary-meaning analysis.”

The conversation is just beginning, and Judge Newsom is among the first to highlight the true potential of AI in the legal field. Many in legal circles have yet to fully grasp how well this technology fits various legal tasks and processes, and his concurring opinion is an important first step. This technology is unusually well suited for a wide range of legal interpretation, analysis, and other key functions. As AI continues to develop, the intersection of artificial intelligence and the law promises to be a transformative space to watch.

USPTO Recognizes Prompting as Sufficient Contribution for Patentable Invention

Dazza Greenwood — Tue, 13 Feb 2024 04:02:42 GMT

I'm delighted to share that the USPTO has just announced guidance that AI-assisted inventions are not categorically unpatentable and that patent protection may be sought if a human provides a significant contribution to the invention (i.e., under the Pannu factors) and - YES - being an awesomely inventive prompter can do the trick!

To determine inventorship of an AI-assisted invention, the guidance instructs examiners to apply the existing "significant contribution" test and determine if the human named on the patent made a significant enough contribution to qualify as an inventor.

In a blog post today, Kathi Vidal, Under Secretary of Commerce for Intellectual Property and Director of the USPTO, discusses the guidance and provides an example of how it would apply, stating "...if an individual made a significant contribution through the construction of a prompt, that could be sufficient"!

The full official guidance by the USPTO looks set to be released tomorrow at https://www.federalregister.gov/public-inspection/2024-02623/guidance-inventorship-guidance-on-ai-assisted-inventions but you can get a pre-publication advance look at it right here: https://public-inspection.federalregister.gov/2024-02623.pdf

We owe much to Jerry Ma and his team for this timely and enlightened policy measure. So, kudos, Jerry!

My Take:

This sounds like the right legal result and, more importantly, it is the best outcome for leveraging the powerful new capabilities of generative AI to incentivize another sustained burst of innovation. I'm tracking (and sometimes contributing to) a number of forward-looking policy measures designed to unleash the potential of this technology in other domains as well, and hope to be making more such posts soon.

There is a perhaps natural and expected tendency among many policy-oriented people and lawyers to put on the brakes in the face of something truly novel. But the potent new vehicles of generative AI also, perhaps mostly, need active use of the steering wheel and the accelerator, because to move forward you sometimes need to pump the gas and steer to where you want to be, not just hit the brakes. Successful navigation is a combination of these controls, and you go nowhere with a primary obsession on applying the brakes to remain parked in the past as new events pass you by. I hear a lot of braking from too many people who seem rather checked out from the other, frequently more important, controls at our disposal, such as propulsion, steering, and a destination that's better than where we came from.

Let's drive forward to the goal.

ChatGPT Year in Review + GenAI Look Ahead to 2024

Dazza Greenwood — Thu, 30 Nov 2023 23:57:52 GMT

One year ago today, ChatGPT was released to the world, and what a year it has been!

Happy First Birthday, ChatGPT!

I’d been digging deep into GPT-3 throughout the previous year, working with Megan Ma and others to clarify how well or poorly that earlier version of the technology could understand and correctly apply the fiduciary duty of loyalty and other related legal frameworks and rules. I was impressed by what GPT-3 could do, and well aware of its limitations. But when I got my hands on GPT-3.5, which is the model that powered ChatGPT, I was - and continue to be - astonished at its human-like natural language capabilities. Within a couple of weeks, I determined the ChatGPT model was performing well on the same fiduciary duties tests and evaluations that had stymied GPT-3 and was able to go well beyond anything I had previously devised to test it. I had to invent new tests just to begin to map the contours of the new capabilities. Here is a snapshot of some of those early tests, from Dec 17th of 2022: https://www.civics.com/pub/chatgpt-session-2022-12-17/ Later, when GPT-4 was finally released to the public this year, it blew away even those boundaries.

I’d first seen GPT-4 a few months before ChatGPT was released, when Pablo Arredondo showed me a private demo of his pre-release version of CoCounsel. So I was aware there were even more powerful models in the pipeline, but being aware and having open, easy access over the web are two very different things. When OpenAI, to their credit, made GPT-4 available to everyone through their premium access and the API, it was a revelation for those of us who apply this technology to legal use cases and more broadly to solve business challenges or realize creative new ideas in industry and the professions.

Meanwhile, I hasten to add there are still serious limits and flaws with this technology. It is prone to so-called hallucination and to providing factually incorrect information, for example, and there are myriad conundrums about the role of intellectual property and personal or confidential data, to name a few issues. I have found that when the technology is anchored to authoritative data, such as by adding that information to the prompt (e.g., via the context window) or through processes like Retrieval Augmented Generation, the hallucinations go down and the factual grounding goes way up, and the reasoning-like process improves as well.
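
The grounding idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a production Retrieval Augmented Generation pipeline: the function names (`retrieve`, `build_grounded_prompt`), the word-overlap scoring, and the sample passages are all hypothetical and chosen for brevity. A real system would typically use embeddings and a vector index for retrieval, and would send the resulting prompt to an actual model.

```python
import re
from collections import Counter

# Hypothetical placeholder passages standing in for an authoritative source;
# they are illustrative text only, not legal authority.
CORPUS = [
    "A fiduciary owes a duty of loyalty to act in the interest of the principal.",
    "The office is closed on public holidays.",
    "An agent must disclose conflicts of interest to the principal.",
]
QUESTION = "What duty of loyalty does an agent owe the principal?"


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Toy relevance score: number of word occurrences shared by query and passage."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(q[word], p[word]) for word in q)


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return up to k passages with the highest non-zero overlap score."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:k] if score(query, p) > 0]


def build_grounded_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Place the retrieved passages in the prompt so the model answers from them."""
    sources = retrieve(question, passages, k)
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, 1))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_grounded_prompt(QUESTION, CORPUS))
```

The design choice worth noting is that the model is never asked the bare question. It is asked the question together with the retrieved text and an instruction to stay within it, which is the anchoring step that tends to reduce hallucination.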

In light of these shortcomings, and to begin providing guidance on the responsible use of GenAI for law, I started a Task Force at MIT on the Responsible Use of Generative AI for Law and Legal Processes, which convened a group of super-stars who collaborated to develop some solid, if preliminary, principles and guidelines (published here, toward the bottom of the page: law.MIT.edu/ai). I’m delighted to say that I also served as an advisory member of the State Bar of California working group that started with the MIT draft and formulated it into a very solid (not perfect, but very good) set of more formal guidance for attorneys to observe when using this technology as part of their practice (available here: https://board.calbar.ca.gov/docs/agendaItem/Public/agendaitem1000031702.pdf).

The primary need among attorneys at this point in time is to learn about this technology and to acquire skills in knowing how, when, and for what to use the technology. Much of this can happen by simply using the technology with an eye toward experimentation and exploration. To that end, over the past year I’ve personally shared quite a few resources for lawyers, as well as other professionals (e.g., in tax, audit, consulting, etc.) on the emerging skill known as prompt engineering, both through open and free resources at law.MIT.edu/ai and more so through private consulting and workshops. Next year, I’m on track to release several more resources and to provide a few new services to make these skills ever more accessible. More on that in the weeks and months to come.

The past year has seen many people and groups raise a lot of fears and objections and outright resistance to this technology, for a range of reasons and from a range of perspectives and priorities. I think this is natural and I know it is to be expected, and yet, in my view, the temptation - especially among lawyers and institutions of the law - to react with prohibitions and overly restrictive regulations is a mistake. Appropriate regulation and policy should balance the enormous utility of this technology against the largely speculative risks, and in general, the main thrust of investment and policy at this point should be toward the beneficial adoption, adaptation, and leveraging of this very useful new technology and the capabilities it affords. Nonetheless, there are demonstrable limits and flaws with the technology and there remains a distance to go before these and other issues are fully addressed.

In the arena of law, in 2024 we’ll see more helpful guidance along the lines of the pioneering guidelines by the California Bar late this year to help lawyers use and integrate GenAI into their practices in a responsible and ethical manner.

I expect we will also see more focus on the second-order implications of this technology in broader institutional contexts, such as updated training of law students and new hire lawyers, better judicial processes (and perhaps more importantly, non-judicial alternative dispute resolution systems) to make this technology available in effective ways for unrepresented and under-represented litigants and criminal defendants, and some potential reforms in the rules of evidence to better deal with the coming wave of deep fakes, among many other ripples in law and legal processes.

More broadly, I foresee a new era of off-line, on-premises, and even on-device LLMs taking hold in the coming year. For example, this week many tech enthusiasts are talking about llamafile (https://hacks.mozilla.org/2023/11/introducing-llamafile/), a groundbreaking multi-gigabyte file that bundles both the model weights and the code needed to run a large language model, the same kind of technology that powers ChatGPT, on your own device. That’s right - you can now easily run a functional LLM on your desktop and even on your laptop! This innovation marks a significant leap in making advanced AI more accessible, bridging the gap between professional AI applications and everyday tech enthusiasts.

A key implication of this on-device approach to running GenAI is that users can now integrate their own confidential, proprietary, and otherwise sensitive data in a completely air-gapped system. I’ve already been exploring the usefulness and security benefits of this approach with enterprise clients of my consulting company, CIVICS.com, but for this blog post, I want to connect this capability to something even more important, namely, YOU! What I mean is that we as individuals will soon have the tools needed to easily run our own powerful GenAI systems, and we’ll be able to connect the rich treasure trove of our personal data to anchor the technology to our contexts, our knowledge, our relationships, and our unique goals and priorities. My esteemed friend Doc Searls has recently begun speculating about the advent of so-called “Personal AI” (e.g., https://projectvrm.org/2023/11/11/individual-empowerment-and-agency-on-a-scale-weve-never-seen-before/) and I predict this will be among the true killer apps (or more accurately, sets of connected apps) to drive adoption and beneficial use of GenAI in the coming year and beyond.

I also see the emergence of automated or quasi-autonomous personal agents, and the increasing integration of Generative AI with a broad set of existing, widely used apps and platforms, not only as major trends in their own right, but also as capabilities that will be super-charged by on-device models with private access to personal and sensitive data.

In my law.MIT.edu capacity, I’ll be working with our team to kick-start 2024 with some major GenAI initiatives, including the annual MIT IAP Computational Law Workshop happening this January with a remarkable set of speakers, topics, and learning activities, and an associated GenAI Online Legal Hackathon. We’ll be announcing those shortly, and to get on the list you can use our pre-registration form here: https://forms.gle/92WwhEWwpGdLyfE5A The MIT Computational Law Report is also about to announce a special collection on GenAI for Law, which will be featuring written works as well as open source applications and Jupyter or Colab Notebooks representing a range of legal use cases that can be achieved with GenAI. And that’s just January!

In my private capacity, through the CIVICS.com consultancy, I’ll be leaning into projects with companies, law firms, and some open source initiatives that are making innovative use of GenAI both to make their current work faster, less expensive, and better, and to create totally new types of products, services, and even novel lines of business. I have also totally revised my standard lunch-talk and private workshop offerings to provide more accessible and flexible opportunities for companies and legal teams to bring me in for learning sessions or to focus on emerging projects. If you’d like to set up a consultation, a talk, or workshop in 2024, reach out through CIVICS.com here: https://www.civics.com/contact

In my public capacity, I plan to keep contributing to standards efforts and professional association efforts, such as through my membership on the ABA Task Force on AI and contributing to open and free community building and skill sharing efforts, such as through meetups, hackathons, and my favorite group Legal Hackers. In a couple of weeks, for example, anybody who wants to meet up with a group of like-minded creative types to share generative AI prompts, ideas and solutions is invited to join the Bay Area Legal Hackers Happy Hour in Oakland. You can learn more and register for this event here.

Looking ahead, I predict two overarching GenAI trends in 2024. First, there will be a lot of catch-up by companies, teams, and individuals who are currently aware of this new technology but only have a superficial understanding of it and few skills in using it. We are still at the early part of the adoption curve, and that is normal. We will climb that curve in the coming year. Second, I foresee a number of major changes and breakthroughs in the technology itself, both in the form of integrations of the technology in current common products as well as totally new capabilities and deployment models. I mentioned on-device and secure personal data design patterns as one example of a new deployment model, and there are many others in the pipeline and some that have not even been conceived of yet.

In the face of all this emerging change, the task today is to get educated. Now.

DazzaGreenwood's Weblog

Existing on the New Web

The Shift (Almost) Nobody Prepared For

Three Properties Your Web Presence Now Needs

The Verification Problem (And Why It’s Being Solved)

What “Agent Optimization” Actually Means

The Stakes Are Higher Than You Think

What You Should Do Now

Practical Standards: What’s Working Now

For Accessibility

For Legibility

For Actionability

For Diagnostics

The Web Is Being Rebuilt. Quietly.

AI Agent ID

Agent Payments Protocol (AP2)

Overview: AP2 as a Foundational Protocol for Trusted AI Commerce

Deep Dive: The Intent Mandate - The "Digital Power of Attorney"

Deep Dive: The Other Mandates - The "Evidentiary Chain"

Examples and Use Cases for Consumers and Businesses

Consumer Use Cases: Convenience and Automation with Guardrails

Business Use Cases: Auditable Automation and Control

Structuring the Corresponding Legal Framework: The Letter of Authorization

OPTION 1: The Principal-Agent Model (User as Authorizer, Provider as Enforcer)

OPTION 2: The Managed Platform Model (Template-Based Delegation)

OPTION 3: The Certified Fiduciary Model (Role-Based Trust & Duty of Care)

Remaining Work and Strategic Next Steps

For Businesses and Consumers (as Users):

For AI Agent Providers:

For the AP2 Standard and the Intent Mandate:

Beyond AI Benchmarks

The Blindspot in Every AI Playbook

The Non-Delegable Duty of Defining “Good”

From Abstract Principles to Executable Standards

The Strategic Asset Nobody’s Talking About

Making It Real: From Theory to Practice

The Agent Revolution Changes Everything

Proof That This Works

The Ecosystem of Evaluation

Your Path Forward

The Executive Imperative

Recent Posts on AI Agents

From Fine Print to Machine Code: How AI Agents are Rewriting the Rules of Engagement: Part 1 of 3

Part 1 of 3

What is a Transactional Agent?

Your Transactional Agent Is Not A Legal Agent, But You Might Be

Making Agency (or Alternatives) Work For You

From Fine Print to Machine Code: How AI Agents are Rewriting the Rules of Engagement: Part 2 of 3

Part 2 of 3

Mistakes and Errors – at AI Scale

If a Transactional Agent Makes a Mistake, Who is on the Hook?

Enter the Regulators

What About Missteps between the Transactional Agent Provider and LLM Provider?

From Fine Print to Machine Code: How AI Agents are Rewriting the Rules of Engagement: Part 3 of 3

Defining ‘Loyalty’ for AI Agents: Insights from the Stanford AI Agents x Law Workshop

Setting the Stage: The Quest for Consumer-Centric Agents

What Does a “Loyal” AI Agent Mean for Consumers?

Beyond Promises: The Link Between Legal Frameworks & Technical Reality

Looking Ahead

My Agent Messed Up! Understanding Errors and Recourse in AI Transactions

Setting the Scene

UETA Section 10(b): The Right to Undo Agent Errors

The Provider’s Role: Building the Escape Hatch

Why This Matters Now More Than Ever

Looking Ahead

Agents Talking to Agents (A2A): Reshaping the Marketplace and Your Power

Market Disruption at Machine Speed

Unlocking Consumer Power Through Interoperability

The Road Ahead: Opportunity & Responsibility

Get In Touch

On AI Regulation "Third-Way"

May 16, 2025 Update: Further Thoughts on AI Regulation, MROs & a Path to Interstate Co-operation

Nancy’s questions—answered

1 | Will a future federal regulator make state action moot?

2 | How often does “trustworthy” recertification happen?

3 | Are one-state guardrails enough?

Expanding the vision: a practical path to interstate AI reciprocity

4.1 A simple legislative starting-point

4.2 What “substantially equivalent” could mean

4.3 Making reciprocity work: procedural mechanics