
Agentic Task Execution

name: agentic-task-execution

description: Patterns for Claude tool use and function calling to execute real-world tasks (sending emails, booking appointments, purchasing items) on behalf of users, with safety guardrails, confirmation flows, and rollback capabilities. Covers tool definition schemas, multi-step execution plans, human-in-the-loop confirmation, side-effect classification, and audit logging. Use when building AI agents that perform actions with real-world consequences, implementing tool use safety patterns, or designing human-in-the-loop confirmation flows.


Instructions

Side-Effect Classification

Classify every tool/action by its reversibility and consequence severity:

Tier 1: Read-Only (No Confirmation Required)

  • Fetching data (calendar events, emails, weather, search results)
  • Reading user preferences or settings
  • Generating text, summaries, or recommendations

Tier 2: Reversible Actions (Soft Confirmation)

  • Drafting an email (saved to drafts, not sent)
  • Adding a calendar event (can be deleted)
  • Creating a to-do item or reminder
  • Saving a note or document
  • Adding items to a shopping list

Tier 3: Consequential Actions (Hard Confirmation Required)

  • Sending an email or message
  • Booking an appointment or reservation
  • Making a purchase or payment
  • Modifying a shared calendar event
  • Posting to social media
  • Canceling an existing booking

Tier 4: Irreversible/High-Stakes (Explicit Multi-Step Confirmation)

  • Financial transactions above a threshold
  • Deleting data permanently
  • Sending messages to groups or mailing lists
  • Actions with contractual implications
  • Anything involving PII disclosure to third parties
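
One way to encode the four tiers is a small lookup table, sketched below. The tool names in the catalog are hypothetical, and defaulting unknown tools to the strictest tier is an assumption of this sketch rather than something the classification above prescribes.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Side-effect tiers, ordered from least to most consequential."""
    READ_ONLY = 1
    REVERSIBLE = 2
    CONSEQUENTIAL = 3
    HIGH_STAKES = 4


# Confirmation style per tier, following the tier headings above.
CONFIRMATION_BY_TIER = {
    Tier.READ_ONLY: "none",
    Tier.REVERSIBLE: "soft",
    Tier.CONSEQUENTIAL: "hard",
    Tier.HIGH_STAKES: "multi_step",
}

# Hypothetical catalog; the tool names are illustrative only.
TOOL_TIERS = {
    "search_restaurants": Tier.READ_ONLY,
    "create_draft_email": Tier.REVERSIBLE,
    "send_email": Tier.CONSEQUENTIAL,
    "delete_account_data": Tier.HIGH_STAKES,
}


def confirmation_for(tool_name: str) -> str:
    """Return the confirmation style for a tool.

    Assumption: an unclassified tool is treated as Tier 4, so it can
    never run without the strictest confirmation.
    """
    tier = TOOL_TIERS.get(tool_name, Tier.HIGH_STAKES)
    return CONFIRMATION_BY_TIER[tier]
```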

Tool Definition Patterns

  1. Structure every tool with these fields:

   {
     "name": "send_email",
     "description": "Send an email via the user's configured email account",
     "input_schema": {
       "type": "object",
       "properties": {
         "to": { "type": "array", "items": { "type": "string" } },
         "subject": { "type": "string" },
         "body": { "type": "string" },
         "cc": { "type": "array", "items": { "type": "string" } },
         "attachments": { "type": "array", "items": { "type": "string" } }
       },
       "required": ["to", "subject", "body"]
     },
     "side_effect_tier": 3,
     "confirmation_required": true,
     "rollback_available": false
   }
  2. Tool metadata extensions (beyond the standard MCP/Claude tool schema):
  • side_effect_tier: 1–4 classification
  • confirmation_required: boolean, derived from tier
  • rollback_available: boolean, whether the action can be undone
  • cost_estimate: for tools involving purchases
  • rate_limit: maximum invocations per time window
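
Because the extension fields are not part of the standard tool schema, it may be safer to keep them in the application's own catalog and pass only name, description, and input_schema to the model. The sketch below assumes that split. The ToolSpec class, the api_definition method, and the rule that confirmation_required means Tier 3 or higher are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolSpec:
    """A tool definition plus local safety metadata (hypothetical shape)."""
    name: str
    description: str
    input_schema: dict[str, Any]
    side_effect_tier: int                 # 1-4 classification
    rollback_available: bool = False
    cost_estimate: Optional[str] = None   # for tools involving purchases
    rate_limit: Optional[str] = None      # e.g. "10/hour"; format is illustrative

    @property
    def confirmation_required(self) -> bool:
        # Derived from the tier. Assumption: "required" means a blocking
        # confirmation, so Tier 2 (soft confirmation) returns False.
        return self.side_effect_tier >= 3

    def api_definition(self) -> dict[str, Any]:
        """Return only the standard fields, without local metadata."""
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.input_schema,
        }


send_email = ToolSpec(
    name="send_email",
    description="Send an email via the user's configured email account",
    input_schema={
        "type": "object",
        "properties": {
            "to": {"type": "array", "items": {"type": "string"}},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
    side_effect_tier=3,
)
```

Deriving confirmation_required from the tier, instead of storing it separately, keeps the two values from drifting apart.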

Multi-Step Execution Plans

For complex tasks requiring multiple tool calls:

  1. Plan before executing:

   User: "Book dinner for 4 at a nice Italian restaurant Saturday night and send the details to Sarah"

   Plan:
   1. [Tier 1] Search for Italian restaurants with availability Saturday 7pm, party of 4
   2. [Tier 1] Present top 3 options to user for selection
   3. [Tier 3] Book reservation at selected restaurant → CONFIRMATION REQUIRED
   4. [Tier 3] Send email to Sarah with restaurant details → CONFIRMATION REQUIRED
  2. Present the plan to the user before execution begins
  3. Execute sequentially, pausing at each confirmation point
  4. Batch confirmations only when the actions are closely related: “I’ll book the restaurant and email Sarah. Here are the details. Shall I proceed with both?”
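
The sequential execution with pauses could look roughly like the following. This is a minimal sketch: the Step class and the execute and confirm callbacks are hypothetical stand-ins for the real tool dispatcher and the user-facing confirmation dialog.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    tool: str
    tier: int
    summary: str
    status: str = "pending"


def run_plan(
    steps: list[Step],
    execute: Callable[[Step], None],
    confirm: Callable[[Step], bool],
) -> list[Step]:
    """Run steps in order, pausing for confirmation before Tier 3+ steps.

    `confirm` must return True only on an affirmative answer. If the
    user declines, the step is cancelled and the plan stops, since later
    steps may depend on it.
    """
    for step in steps:
        if step.tier >= 3 and not confirm(step):
            step.status = "cancelled"
            break
        execute(step)
        step.status = "done"
    return steps


plan = [
    Step("search_restaurants", 1, "Find Italian restaurants, Saturday 7pm, party of 4"),
    Step("book_reservation", 3, "Book the selected restaurant"),
    Step("send_email", 3, "Email the details to Sarah"),
]
```

Steps that were never reached keep the status "pending", which makes it easy to show the user what remains.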

Human-in-the-Loop Confirmation Flow

  1. Soft confirmation (Tier 2):
  • Show what will happen: “I’ll add ‘Dentist appointment’ to your calendar for Tuesday at 2pm”
  • Proceed unless the user objects in their reply
  • User can undo afterward
  2. Hard confirmation (Tier 3):
  • Show complete details of the action
  • Explicitly ask: “Shall I send this email?” / “Shall I book this reservation?”
  • Do NOT proceed without affirmative response
  • Show a preview of the exact content (email body, reservation details)
  3. Multi-step confirmation (Tier 4):
  • Show full action details
  • State the consequences explicitly: “This will charge $250 to your card ending in 4521”
  • Require an explicit confirmation phrase
  • Implement a cooling-off period for very high-stakes actions
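
The three confirmation levels can be reduced to one gate function, sketched here. The word lists and the is_confirmed name are hypothetical, and a production system would likely need more robust intent detection than exact string matching. The cooling-off period is not modelled.

```python
# Hypothetical vocabularies; real systems need broader intent handling.
AFFIRMATIVES = {"yes", "y", "confirm", "go ahead", "send it", "book it"}
OBJECTIONS = {"no", "n", "stop", "cancel", "wait", "don't"}


def is_confirmed(tier: int, user_reply: str, required_phrase: str = "") -> bool:
    """Decide whether an action may proceed, given the user's reply.

    Tier 1: always proceeds.
    Tier 2: proceeds unless the user objects (silence counts as consent).
    Tier 3: proceeds only on an affirmative reply.
    Tier 4: proceeds only if the reply matches the required phrase.
    """
    reply = user_reply.strip().lower()
    if tier <= 1:
        return True
    if tier == 2:
        return reply not in OBJECTIONS
    if tier == 3:
        return reply in AFFIRMATIVES
    # Tier 4: a missing phrase never confirms anything.
    return bool(required_phrase) and reply == required_phrase.strip().lower()
```

Note that an empty or ambiguous reply passes only at Tier 2 and below; Tier 3 and Tier 4 fail closed.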

Safety Guardrails

  1. Rate limiting:
  • Maximum emails per hour: 10
  • Maximum purchases per session: 1 (require new session for additional)
  • Maximum calendar modifications per minute: 5
  2. Amount limits:
  • Purchases under $50: Tier 3 (single confirmation)
  • Purchases $50–$500: Tier 4 (explicit amount confirmation)
  • Purchases over $500: Tier 4 + require re-authentication
  3. Recipient validation:
  • Verify email addresses against user’s contacts before sending
  • Flag unknown recipients with a warning
  • Prevent sending to large groups (>10 recipients) without Tier 4 confirmation
  4. Content safety:
  • Never include sensitive data (SSN, full credit card numbers, passwords) in tool outputs
  • Mask PII in confirmation previews where possible
  • Log tool invocations without logging sensitive content
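
Three of these guardrails are mechanical enough to sketch directly: a sliding-window rate limiter, the purchase amount tiers, and the recipient check. The class and function names are hypothetical, the limits are the example values listed above, and amounts are assumed to be in US dollars.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window_seconds` span."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True


def purchase_policy(amount_usd: float) -> dict:
    """Map a purchase amount to its confirmation tier."""
    if amount_usd < 50:
        return {"tier": 3, "reauthenticate": False}
    if amount_usd <= 500:
        return {"tier": 4, "reauthenticate": False}
    return {"tier": 4, "reauthenticate": True}


def check_recipients(recipients: list[str], contacts: list[str]) -> dict:
    """Flag unknown recipients and large recipient lists."""
    known = {c.lower() for c in contacts}
    unknown = [r for r in recipients if r.lower() not in known]
    return {"unknown": unknown, "needs_tier_4": len(recipients) > 10}


email_limiter = SlidingWindowLimiter(max_calls=10, window_seconds=3600)
```

The optional `now` argument exists only so the limiter can be exercised with fixed timestamps in tests.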

Audit Logging

  1. Log every tool invocation:

   {
     "timestamp": "ISO-8601",
     "tool_name": "send_email",
     "tier": 3,
     "user_id": "string",
     "session_id": "string",
     "parameters_hash": "sha256 of parameters (not the content)",
     "confirmation_shown": true,
     "user_confirmed": true,
     "execution_result": "success|failure|cancelled",
     "rollback_available": false,
     "error_message": null
   }
  2. Retention: keep audit logs for 90 days minimum
  3. User access: users should be able to view their action history
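
A record in the shape shown above can be built without ever storing the parameters themselves. The sketch below hashes a canonical JSON encoding of the parameters; the function names and the choice of canonical form (sorted keys, compact separators) are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def parameters_hash(parameters: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(parameters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(
    tool_name: str,
    tier: int,
    user_id: str,
    session_id: str,
    parameters: dict,
    confirmation_shown: bool,
    user_confirmed: bool,
    execution_result: str,  # "success", "failure", or "cancelled"
    rollback_available: bool = False,
    error_message: Optional[str] = None,
) -> dict:
    """Build one audit log entry; only the hash of the parameters is kept."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_name": tool_name,
        "tier": tier,
        "user_id": user_id,
        "session_id": session_id,
        "parameters_hash": parameters_hash(parameters),
        "confirmation_shown": confirmation_shown,
        "user_confirmed": user_confirmed,
        "execution_result": execution_result,
        "rollback_available": rollback_available,
        "error_message": error_message,
    }
```

A plain hash of short, predictable parameters can be guessed by brute force, so a keyed hash (for example HMAC with a server-side secret) may be worth considering where that matters.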

Error Handling and Rollback

  1. On tool execution failure:
  • Report the failure clearly to the user
  • Do not retry Tier 3/4 actions automatically
  • Offer to retry Tier 1/2 actions after a brief delay
  2. Partial execution (multi-step plans):
  • If step N fails, report which steps completed and which failed
  • Offer to roll back completed steps if rollback is available
  • Never continue to dependent steps after a failure
  3. Timeout handling:
  • Set reasonable timeouts per tool (API calls: 30s, booking systems: 60s)
  • On timeout, treat as failure; do not assume success
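
The failure rules above can be sketched as a plan runner that stops at the first error and reports what happened. The step dictionaries, the run callback, and the report keys are hypothetical; enforcing the per-tool timeout is assumed to happen inside run, which raises TimeoutError when the limit is exceeded.

```python
from typing import Callable


def execute_plan(steps: list[dict], run: Callable[[dict], None]) -> dict:
    """Run steps in order and stop at the first failure.

    Each step is a dict with "tool", "tier", and optionally
    "rollback_available". `run(step)` performs the call and raises on
    failure; a timeout is treated like any other failure.
    """
    completed: list[dict] = []
    for index, step in enumerate(steps):
        try:
            run(step)
        except Exception as exc:
            return {
                "completed": [s["tool"] for s in completed],
                "failed": step["tool"],
                "error": str(exc) or type(exc).__name__,
                # Dependent steps are never attempted after a failure.
                "skipped": [s["tool"] for s in steps[index + 1:]],
                "rollback_candidates": [
                    s["tool"] for s in completed if s.get("rollback_available")
                ],
                # Only Tier 1/2 failures come with an offer to retry.
                "retry_offered": step["tier"] <= 2,
            }
        completed.append(step)
    return {
        "completed": [s["tool"] for s in completed],
        "failed": None,
        "error": None,
        "skipped": [],
        "rollback_candidates": [],
        "retry_offered": False,
    }
```

The report gives the caller everything needed to tell the user which steps completed, which failed, and which completed steps can be undone.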

Inputs Required

  • Tool catalog with side-effect classifications
  • User authentication context (which services are connected)
  • User preference for confirmation level (more cautious / default / streamlined)
  • Rate limit configuration
  • Amount thresholds for purchase confirmation tiers

Output Format

Execution Plan


{
  "plan_id": "uuid",
  "task_description": "Book dinner and notify Sarah",
  "steps": [
    {
      "step": 1,
      "tool": "search_restaurants",
      "tier": 1,
      "confirmation": "none",
      "parameters": { "cuisine": "Italian", "party_size": 4, "date": "2026-04-18", "time": "19:00" },
      "status": "pending"
    },
    {
      "step": 2,
      "tool": "book_reservation",
      "tier": 3,
      "confirmation": "hard",
      "parameters": { "restaurant_id": "pending_selection" },
      "status": "blocked_by_step_1"
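    },
    {
      "step": 3,
      "tool": "send_email",
      "tier": 3,
      "confirmation": "hard",
      "parameters": { "to": ["pending_contact_lookup"], "body": "pending_reservation_details" },
      "status": "blocked_by_step_2"
    }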
  ],
  "total_confirmations_needed": 2,
  "estimated_cost": "$0 (reservation is free)"
}

Confirmation Prompt


I'd like to book a reservation:

  Restaurant: Osteria del Angolo
  Date: Saturday, April 18 at 7:00 PM
  Party size: 4
  Name: Peter Westerman

Shall I proceed with this booking?
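
A prompt in this layout can be produced from structured details, so that the preview always shows the exact values that will be submitted. The render_confirmation function below is a hypothetical helper, not part of any library.

```python
def render_confirmation(action: str, details: dict[str, str], question: str) -> str:
    """Render a hard-confirmation prompt with every detail on its own line."""
    lines = [f"I'd like to {action}:", ""]
    lines += [f"  {label}: {value}" for label, value in details.items()]
    lines += ["", question]
    return "\n".join(lines)


prompt = render_confirmation(
    "book a reservation",
    {
        "Restaurant": "Osteria del Angolo",
        "Date": "Saturday, April 18 at 7:00 PM",
        "Party size": "4",
        "Name": "Peter Westerman",
    },
    "Shall I proceed with this booking?",
)
```

Building the prompt from the same dictionary that is later passed to the tool helps keep the preview and the executed action identical.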

Anti-Patterns

  • Executing Tier 3/4 actions without confirmation — the single most dangerous pattern; always confirm before consequential actions
  • Confirming with vague descriptions — “I’ll take care of it” is not a confirmation preview; show exact details
  • Auto-retrying failed consequential actions — retrying a failed payment or email send can cause duplicates
  • Logging sensitive content in audit trails — log the action metadata, not the email body or payment details
  • Batching unrelated confirmations — “I’ll send 3 emails and book a restaurant” forces an all-or-nothing decision; separate unrelated actions
  • Skipping the execution plan — for multi-step tasks, users need to see the full plan before any step executes
  • Treating all actions the same — reading data and sending money require fundamentally different safety levels
  • Not providing undo options — if an action is reversible, always offer the undo path after execution