Back to Articles
Temporal Workflow Engine — The Complete Guide

Temporal Workflow Engine — The Complete Guide

Temporal is a durable execution platform that guarantees your code runs to completion, surviving crashes, deployments, network failures, and server restarts. It turns any function into a reliable, fault-tolerant workflow.


Table of Contents

  1. Architecture & Internals
  2. Core Concepts
  3. Workflow Execution Model
  4. Workers — Deep Dive
  5. Task Queues
  6. Timeouts — Complete Reference
  7. Retry Policies
  8. Signals, Queries & Updates
  9. Child Workflows
  10. Continue-As-New
  11. Saga Pattern & Compensation
  12. Versioning & Patching
  13. Schedules & Cron Workflows
  14. Search Attributes & Visibility
  15. Data Conversion & Encryption
  16. Namespaces
  17. Testing
  18. Observability — Metrics, Logging, Tracing
  19. Security
  20. Production Deployment
  21. Temporal Cloud vs Self-Hosted
  22. CLI Cheatsheet
  23. SDK Code Examples
  24. Advanced Patterns
  25. Use Cases
  26. Anti-Patterns & Pitfalls
  27. Quick Reference Cheatsheet

1. Architecture & Internals

High-Level Architecture

                        ┌──────────────────────────────────┐
                        │         Temporal Server           │
                        │  (Cluster of 4 internal services) │
                        │                                    │
  Client ──gRPC──────►  │  ┌──────────┐  ┌──────────────┐  │
  (SDK)                  │  │ Frontend │  │   Matching    │  │
                        │  │ Service  │  │   Service     │  │
                        │  └────┬─────┘  └──────┬───────┘  │
                        │       │               │           │
  Worker ──gRPC──────►  │  ┌────┴─────┐  ┌──────┴───────┐  │
  (your code)            │  │ History  │  │   Worker      │  │
                        │  │ Service  │  │   Service     │  │
                        │  └────┬─────┘  └──────────────┘  │
                        │       │                           │
                        │  ┌────┴───────────────────────┐  │
                        │  │    Persistence Layer        │  │
                        │  │  (Postgres/MySQL/Cassandra) │  │
                        │  └────────────┬───────────────┘  │
                        │               │                   │
                        │  ┌────────────┴───────────────┐  │
                        │  │    Visibility Store         │  │
                        │  │  (Elasticsearch/PostgreSQL) │  │
                        │  └────────────────────────────┘  │
                        └──────────────────────────────────┘

The 4 Internal Services

ServiceResponsibility
FrontendRate limiting, routing, authorization, request validation. Entry point for all gRPC calls. Ports: gRPC 7233, Membership 6933
HistoryMaintains workflow event histories, executes state machine transitions, handles timers. The brain of Temporal. Each workflow is owned by exactly one history shard. Ports: gRPC 7234, Membership 6934. Scale: 1 process per 500 shards.
MatchingManages task queues. Dispatches workflow tasks and activity tasks to polling workers. Handles sync vs async matching. Ports: gRPC 7235, Membership 6935
Worker (internal)Runs system-level workflows: archival, replication, batch operations. Not your application workers. Port: Membership 6939

History Shards

  • The History service is sharded — default 512 shards for self-hosted, 8192 for Temporal Cloud.
  • Each workflow is assigned to a shard via hash(namespaceID + workflowID) % numShards.
  • Shards are distributed across History service instances via Ringpop (consistent hashing).
  • More shards = more parallelism = higher throughput. But more DB connections.
  • CRITICAL: Shard count is IMMUTABLE after initial setup. Cannot be changed without re-creating the database.
  • Each shard maintains 4 internal queues: Transfer (Matching dispatch), Timer (durable timers), Replicator (multi-cluster), Visibility (indexing).
  • Scale: Range from 1 to 128K shards. Recommendation: 1 History process per 500 shards.

Shard Sizing Guide:

ScaleRecommended Shards
Dev/test1-4
Small prod16-64
Medium prod128-512
Large prod512-4096
Extreme scale4096-128K
History Shard Assignment:
  Workflow "order-123" → hash → Shard 47 → History Node 3
  Workflow "order-456" → hash → Shard 191 → History Node 1

Persistence Layer

┌─────────────────────────────────────────────────────┐
│                  Persistence Layer                    │
│                                                       │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │  Executions   │  │   History    │  │  Visibility │ │
│  │  (workflow     │  │  (event      │  │  (search    │ │
│  │   metadata)   │  │   histories) │  │   index)    │ │
│  └──────────────┘  └──────────────┘  └────────────┘ │
│                                                       │
│  Supported Databases (production-tested versions):     │
│  • PostgreSQL 13-16 (recommended for most)            │
│  • MySQL 5.7, 8.0 (8.0.19+ required for some features│
│  • Cassandra 3.11, 4.0 (extreme scale, 100K+ wf/s)  │
│  • SQLite 3.x (dev only, via temporal server start-dev│
│                                                       │
│  Four data categories stored:                         │
│  1. Tasks (queued for dispatch)                       │
│  2. Workflow Execution State (mutable + event history)│
│  3. Namespace Metadata                                │
│  4. Visibility Data (search/filter index)             │
└─────────────────────────────────────────────────────-─┘

Ringpop (Membership Protocol)

  • Temporal server instances discover each other via Ringpop (SWIM-based gossip protocol).
  • Consistent hashing ring distributes shards across nodes.
  • When a node joins/leaves, shards rebalance automatically.
  • Requires seed nodes for bootstrapping (configured in temporal-server.yaml).

2. Core Concepts

Workflow

A deterministic function that orchestrates your business logic. Temporal persists its execution state so it survives any failure.

Properties:

  • Must be deterministic — same input always produces same execution path
  • Can run for seconds to years
  • Has a unique Workflow ID (user-defined or auto-generated)
  • Has a Run ID (system-generated UUID per execution attempt)
  • Maintains full event history in the server

Deterministic constraints — you CANNOT do these inside a workflow:

ForbiddenUse Instead
Math.random() / rand()workflow.random() / workflow.newRandom()
Date.now() / time.Now()workflow.now() / workflow.GetInfo().CurrentTime
setTimeout / time.Sleepworkflow.sleep() / workflow.Sleep()
Network I/O, DB callsActivities
File system accessActivities
Global mutable stateWorkflow-scoped variables
Non-deterministic librariesWrap in activities
Goroutines / threadsworkflow.Go() / workflow.executeChild()

Activity

A normal function that performs side-effects (API calls, DB writes, file I/O). Activities run on workers and their results are recorded in workflow history.

Properties:

  • Can be non-deterministic — free to do anything
  • Automatically retried on failure (configurable)
  • Support heartbeating for long-running work
  • Have configurable timeouts
  • Results are cached — on workflow replay, cached result is returned

Activity Types:

TypeDescription
Regular ActivityDispatched via task queue, runs on any worker polling that queue
Local ActivityRuns on the same worker as the workflow, bypasses task queue. Lower latency but no independent retry/timeout tracking.

Worker

A process you run that executes workflows and activities. It polls the Temporal server for tasks.

Task Queue

A named queue that connects workflows/activities to workers. Workers poll specific task queues. Multiple workers can poll the same queue for load balancing.

Namespace

A logical isolation boundary. Each namespace has its own workflow IDs, task queues, search attributes, and retention policies. Workflows in different namespaces cannot interact directly.


3. Workflow Execution Model

Event Sourcing & Replay

Temporal uses event sourcing — every step of a workflow is recorded as an event.

Event History for "order-123":
┌─────┬────────────────────────────────┬────────────────────┐
│  #  │ Event Type                      │ Details             │
├─────┼────────────────────────────────┼────────────────────┤
│  1  │ WorkflowExecutionStarted       │ input: {orderId}    │
│  2  │ WorkflowTaskScheduled          │                     │
│  3  │ WorkflowTaskStarted            │ worker: host-1      │
│  4  │ WorkflowTaskCompleted          │                     │
│  5  │ ActivityTaskScheduled          │ chargeCard          │
│  6  │ ActivityTaskStarted            │ worker: host-2      │
│  7  │ ActivityTaskCompleted          │ result: {txnId}     │
│  8  │ TimerStarted                   │ 10 minutes          │
│  9  │ TimerFired                     │                     │
│ 10  │ ActivityTaskScheduled          │ shipOrder           │
│ 11  │ ActivityTaskStarted            │ worker: host-1      │
│ 12  │ ActivityTaskCompleted          │ result: {trackId}   │
│ 13  │ WorkflowExecutionCompleted     │ result: "done"      │
└─────┴────────────────────────────────┴────────────────────┘

Replay Mechanism

When a worker picks up a workflow task:

1. Load event history from server
2. Execute workflow code from the beginning
3. For each command (activity, timer, child workflow):
   a. If matching event exists in history → return cached result (NO re-execution)
   b. If no matching event → this is NEW work, execute it
4. Return new commands to server
5. If worker crashes before completing → server re-assigns to another worker

Key insight: Activities are NOT re-executed on replay. Only the workflow function re-runs; activity results come from history.

Sticky Execution

By default, Temporal uses sticky task queues to route workflow tasks back to the same worker that previously ran them. This avoids re-downloading the full event history.

Normal flow (sticky):
  Worker A caches workflow state → next task routes to Worker A → resumes from cache

Sticky cache miss:
  Worker A dies → task goes to normal queue → Worker B downloads full history → replays

Sticky queue config:

StickyScheduleToStartTimeout: 5s  (how long to wait for sticky worker before falling back)
WorkflowCacheSize: 2000           (how many workflows to cache per worker)

Workflow Task vs Activity Task

┌─────────────────────┐     ┌────────────────────────┐
│   Workflow Task      │     │   Activity Task         │
├─────────────────────┤     ├────────────────────────┤
│ • Runs workflow code │     │ • Runs activity code    │
│ • Very fast (~1-5ms) │     │ • Can take seconds/mins │
│ • Deterministic      │     │ • Non-deterministic     │
│ • Produces commands  │     │ • Produces results      │
│ • Lightweight        │     │ • Can be CPU/IO heavy   │
└─────────────────────┘     └────────────────────────┘

4. Workers — Deep Dive

Worker Architecture

┌──────────────────────────────────────────────────────────┐
│                    Worker Process                          │
│                                                            │
│  ┌─────────────── Task Queue: "payments" ──────────────┐  │
│  │                                                       │  │
│  │  Workflow Task Poller(s)    Activity Task Poller(s)   │  │
│  │  [Thread 1] [Thread 2]     [Thread 1] [Thread 2]     │  │
│  │       │                          │                    │  │
│  │       ▼                          ▼                    │  │
│  │  ┌──────────────┐      ┌──────────────────┐          │  │
│  │  │ WF Executor  │      │ Activity Executor │          │  │
│  │  │ Slots: 100   │      │ Slots: 100        │          │  │
│  │  │              │      │                    │          │  │
│  │  │ [wf-1][wf-2] │      │ [act-1][act-2]    │          │  │
│  │  │ [wf-3][...]  │      │ [act-3][...]       │          │  │
│  │  └──────────────┘      └──────────────────┘          │  │
│  │                                                       │  │
│  │  Sticky Cache: LRU(2000 workflows)                   │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                            │
│  Registered Workflows: [OrderWorkflow, PaymentWorkflow]    │
│  Registered Activities: [chargeCard, refund, sendEmail]    │
└──────────────────────────────────────────────────────────-─┘

All Worker Configuration Options

OptionDefaultDescription
MaxConcurrentWorkflowTaskExecutions100Max workflow tasks executing simultaneously
MaxConcurrentActivityTaskExecutions100Max activity tasks executing simultaneously
MaxConcurrentWorkflowTaskPollers2Threads polling for workflow tasks
MaxConcurrentActivityTaskPollers2Threads polling for activity tasks
MaxConcurrentLocalActivityExecutions100Max local activities executing simultaneously
WorkerActivitiesPerSecondunlimitedRate limit: activities/sec this worker can process
TaskQueueActivitiesPerSecondunlimitedRate limit: activities/sec across ALL workers on this queue
WorkflowPanicPolicyBlockWorkflowWhat happens on non-determinism: BlockWorkflow or FailWorkflow
StickyScheduleToStartTimeout5sTime to wait for sticky worker before fallback
WorkflowCacheSize2000Number of cached workflow states (sticky cache)
EnableSessionWorkerfalseEnable session-based activity routing
MaxConcurrentSessionExecutionSize1000Max concurrent sessions
DeadlockDetectionTimeout1s (Go)Detects blocked workflow goroutines
Identityhostname+pidWorker identity shown in UI
EnableLoggingInReplayfalseWhether to emit logs during replay

Worker Scaling Guidelines

Scaling Decision Matrix:
─────────────────────────────────────────────────────────
Symptom                          → Action
─────────────────────────────────────────────────────────
schedule_to_start_latency HIGH   → Add more worker instances
  + slots available              → Increase pollers
  + slots full                   → Add instances or increase MaxConcurrent*

Activity execution slow          → Optimize activity code, or increase instances

Workflow task latency HIGH       → Increase WorkflowCacheSize (reduce replays)
  + sticky cache hit rate LOW    → ^^ same

Memory high on workers           → Reduce WorkflowCacheSize
                                 → Reduce MaxConcurrent*

CPU high on workers              → Add worker instances
                                 → Reduce MaxConcurrent*
─────────────────────────────────────────────────────────

Multiple Task Queues per Worker

A single worker can poll multiple task queues:

// Go — one worker per task queue
w1 := worker.New(c, "payments", worker.Options{})
w2 := worker.New(c, "notifications", worker.Options{})
// Both run in the same process
// TypeScript — each Worker instance handles one task queue
const w1 = await Worker.create({ taskQueue: 'payments', ... });
const w2 = await Worker.create({ taskQueue: 'notifications', ... });
// Run both
await Promise.all([w1.run(), w2.run()]);

5. Task Queues

How Task Queues Work

                    Temporal Server
                    ┌──────────────────────────┐
Client starts       │  Task Queue: "payments"  │
workflow ──────────►│  ┌────┬────┬────┬────┐  │
                    │  │ T1 │ T2 │ T3 │ T4 │  │◄── Tasks waiting
                    │  └────┴────┴────┴────┘  │
                    └──────────┬───────────────┘
                               │ Long-poll
                    ┌──────────┼──────────────┐
                    │          │              │
                Worker A   Worker B      Worker C
                (picks T1) (picks T2)   (picks T3)

Task queues are:

  • Virtual — created on first use, no pre-configuration needed
  • Per-namespace — same name in different namespaces = different queues
  • Server-side — tasks live on the Temporal server, not on workers
  • Load-balanced — round-robin (with sync matching optimization)

Sync vs Async Matching

ModeHow It WorksWhen
Sync MatchTask is directly handed to a waiting poller (no DB write)A worker is already polling when task arrives
Async MatchTask is written to DB, worker picks it up laterNo worker is polling at that moment

Sync matching is faster (sub-millisecond). Temporal optimizes for it.

Task Queue Partitions

For high-throughput queues, Temporal splits them into partitions (default: 4 per queue in Temporal Cloud).

Task Queue "high-volume"
  ├── Partition 0
  ├── Partition 1
  ├── Partition 2
  └── Partition 3

More partitions = higher dispatch throughput, but workers must poll each partition.


6. Timeouts — Complete Reference

Visual Overview

                    Workflow Execution Timeout
◄─────────────────────────────────────────────────────────►
                    Workflow Run Timeout
◄───────────────────────────────────►  (per run, resets on ContinueAsNew)

  ┌─ Activity ScheduleToClose Timeout ──────────────────┐
  │                                                       │
  │  ScheduleToStart     StartToClose                    │
  │  ◄──────────►  ◄──────────────────────────►          │
  │  (queue wait)  (execution time)                       │
  │                                                       │
  │                ◄─ Heartbeat ─► ◄─ Heartbeat ─►       │
  │                (periodic check-in)                    │
  └───────────────────────────────────────────────────────┘

All Timeout Types

TimeoutApplies ToDefaultDescription
WorkflowExecutionTimeoutWorkflowUnlimitedMax time for the ENTIRE workflow, including all retries and ContinueAsNew runs
WorkflowRunTimeoutWorkflowSame as ExecutionTimeoutMax time for a SINGLE workflow run (resets on ContinueAsNew)
WorkflowTaskTimeoutWorkflow Task10sMax time for a worker to process a single workflow task (code step). If exceeded, task is re-assigned to another worker. Increase for very large histories.
ScheduleToCloseTimeoutActivityUnlimitedMax time from scheduling to completion (includes queue wait + execution + retries)
StartToCloseTimeoutActivityScheduleToCloseMax time from when a worker picks up an activity to when it must complete. This is the one you almost always set.
ScheduleToStartTimeoutActivityScheduleToCloseMax time an activity can wait in the queue. Detects when no workers are available.
HeartbeatTimeoutActivityNoneMax time between heartbeats. If exceeded, activity is considered failed. Use for long-running activities.

Timeout Best Practices

# For a typical API call activity:
StartToCloseTimeout: 30s
RetryPolicy: { maximumAttempts: 3 }

# For a long-running data processing activity:
StartToCloseTimeout: 1h
HeartbeatTimeout: 60s     ← ensures we detect stuck activities
RetryPolicy: { maximumAttempts: 5 }

# For human-approval workflow:
No activity timeout (it's a signal wait)
WorkflowExecutionTimeout: 30d

# To detect "no workers" situations:
ScheduleToStartTimeout: 30s  ← fails fast if no worker picks it up

7. Retry Policies

Default Retry Behavior

ComponentDefault RetryNotes
ActivityInfinite retriesRetries until ScheduleToCloseTimeout or explicit max
WorkflowNo retriesWorkflows don’t retry by default (set in WorkflowOptions)
Workflow TaskInfinite retriesInternal retries on non-determinism errors, panics, etc.

Retry Policy Configuration

RetryPolicy {
  InitialInterval:    1s      // First retry delay
  BackoffCoefficient: 2.0     // Multiplier for each retry
  MaximumInterval:    100s    // Cap on retry delay
  MaximumAttempts:    0       // 0 = unlimited (until timeout)
  NonRetryableErrorTypes: [   // Error types that skip retries
    "InvalidInputError",
    "BusinessRuleViolation"
  ]
}

Retry Timeline Example

Attempt 1: fails at T=0
  wait 1s (InitialInterval)
Attempt 2: fails at T=1s
  wait 2s (1s × 2.0 backoff)
Attempt 3: fails at T=3s
  wait 4s
Attempt 4: fails at T=7s
  wait 8s
Attempt 5: fails at T=15s
  wait 16s
Attempt 6: fails at T=31s
  wait 32s
Attempt 7: fails at T=63s
  wait 64s
Attempt 8: fails at T=127s
  wait 100s (capped by MaximumInterval)
Attempt 9: at T=227s
  ...continues until MaximumAttempts or ScheduleToCloseTimeout

Non-Retryable Errors

Mark errors as non-retryable to skip retry:

// Go
temporal.NewNonRetryableApplicationError("invalid input", "INVALID", nil)
// TypeScript
throw ApplicationFailure.nonRetryable("invalid input", "INVALID");
# Python
raise ApplicationError("invalid input", non_retryable=True)
// Java
throw ApplicationFailure.newNonRetryableFailure("invalid input", "INVALID");

8. Signals, Queries & Updates

Signals

Asynchronous messages sent to a running workflow. The workflow handles them when it’s ready.

Client ──signal──► Temporal Server ──delivers──► Workflow (handles in handler)
  • Fire-and-forget (client doesn’t wait for result)
  • Buffered — delivered even if workflow is currently busy
  • Persisted in event history
  • Can carry data payload

Queries

Synchronous, read-only requests to inspect workflow state. Returns immediately.

Client ──query──► Temporal Server ──routes──► Worker (executes handler) ──result──► Client
  • Read-only — cannot modify workflow state
  • Not recorded in event history
  • Must have a running workflow to query
  • Routed to the worker with cached state (sticky) if possible

Updates (Temporal 1.21+)

Synchronous, read-write messages. Client sends data, workflow processes it, and returns a result.

Client ──update──► Temporal Server ──delivers──► Workflow (processes) ──result──► Client
  • Can modify workflow state AND return a result
  • Client waits for the result (unlike signals)
  • Recorded in event history
  • Supports validators (reject invalid updates before applying)

Comparison

FeatureSignalQueryUpdate
DirectionClient → WorkflowClient ↔ WorkflowClient ↔ Workflow
Modifies StateYesNoYes
Returns ResultNoYesYes
In Event HistoryYesNoYes
Sync/AsyncAsyncSyncSync
Requires Running WFNo (buffered)YesYes

9. Child Workflows

A workflow started by another workflow. The parent can wait for the result or fire-and-forget.

When to Use Child Workflows

Use CaseWhy
Separate failure domainsChild can fail/retry independently
Different timeoutsChild has its own execution timeout
Break up large historiesEach child has its own history (avoids 50K event limit)
Reuse workflow logicSame workflow can be started as child or standalone
Different task queuesRoute child to different workers/services
Different retry policiesEach child retries independently

Parent Close Policy

What happens to children when parent completes/fails:

PolicyBehavior
TERMINATEChild is terminated (default)
ABANDONChild continues running independently
REQUEST_CANCELCancellation request sent to child

10. Continue-As-New

Resets workflow history by starting a new run with fresh state. Essential for long-running or unbounded workflows.

Why?

Event history has a hard limit (~50,000 events or 50MB). The service warns at 10,240 events and errors at the limit. ContinueAsNew resets the counter.

Event counting guide:

  • Minimal workflow (start + complete): ~5 events
  • Single activity execution: ~11 events (schedule, start, complete + workflow task events)
  • Activity in a loop: 5 + 6 * iterations
  • A workflow with 20 sequential activities: ~125 events
  • Use workflow.GetInfo(ctx).GetContinueAsNewSuggested() (Go) or workflowInfo().historyLength (TS) to check.
Run 1: Events 1-10,000 → ContinueAsNew(state)
Run 2: Events 1-10,000 → ContinueAsNew(state)
Run 3: Events 1-5,000 → Complete

Pattern

// Go
func ProcessorWorkflow(ctx workflow.Context, state State) error {
    for i := 0; i < 1000; i++ {  // process batch
        // ... do work
    }
    // Reset with accumulated state
    return workflow.NewContinueAsNewError(ctx, ProcessorWorkflow, state)
}
// TypeScript
export async function processorWorkflow(state: State): Promise<void> {
  for (let i = 0; i < 1000; i++) {
    // ... do work
  }
  await continueAsNew<typeof processorWorkflow>(state);
}

11. Saga Pattern & Compensation

Implementing Distributed Transactions

Forward actions:                    Compensating actions:
  1. Reserve Hotel      ───────►    1'. Cancel Hotel
  2. Book Flight        ───────►    2'. Cancel Flight
  3. Charge Payment     ───────►    3'. Refund Payment

If step 3 fails → run compensations in reverse: 2', 1'

Go Implementation

func BookTripWorkflow(ctx workflow.Context, trip TripRequest) error {
    var compensations []func(ctx workflow.Context) error

    // Step 1: Reserve hotel
    hotelRes, err := activities.ReserveHotel(ctx, trip.Hotel)
    if err != nil {
        return err
    }
    compensations = append(compensations, func(ctx workflow.Context) error {
        return activities.CancelHotel(ctx, hotelRes.ReservationID)
    })

    // Step 2: Book flight
    flightRes, err := activities.BookFlight(ctx, trip.Flight)
    if err != nil {
        return compensate(ctx, compensations) // run compensations
    }
    compensations = append(compensations, func(ctx workflow.Context) error {
        return activities.CancelFlight(ctx, flightRes.BookingID)
    })

    // Step 3: Charge payment
    err = activities.ChargePayment(ctx, trip.Payment)
    if err != nil {
        return compensate(ctx, compensations)
    }

    return nil
}

func compensate(ctx workflow.Context, compensations []func(workflow.Context) error) error {
    // Run compensations in reverse order
    for i := len(compensations) - 1; i >= 0; i-- {
        if err := compensations[i](ctx); err != nil {
            // Log but continue — compensations should be best-effort
            workflow.GetLogger(ctx).Error("Compensation failed", "error", err)
        }
    }
    return fmt.Errorf("saga failed, compensations executed")
}

TypeScript Implementation

export async function bookTripWorkflow(trip: TripRequest): Promise<void> {
  const compensations: Array<() => Promise<void>> = [];

  try {
    const hotelRes = await reserveHotel(trip.hotel);
    compensations.push(() => cancelHotel(hotelRes.reservationId));

    const flightRes = await bookFlight(trip.flight);
    compensations.push(() => cancelFlight(flightRes.bookingId));

    await chargePayment(trip.payment);
  } catch (err) {
    // Execute compensations in reverse
    for (const comp of compensations.reverse()) {
      try {
        await comp();
      } catch (compErr) {
        log.error('Compensation failed', { error: compErr });
      }
    }
    throw err;
  }
}

12. Versioning & Patching

Problem

You need to update workflow code, but existing workflows are still running. Replay must produce the same sequence of events.

// Go — workflow.GetVersion
func MyWorkflow(ctx workflow.Context) error {
    v := workflow.GetVersion(ctx, "change-id-1", workflow.DefaultVersion, 1)
    if v == workflow.DefaultVersion {
        // Old code path (for existing workflows)
        err = activities.OldActivity(ctx)
    } else {
        // New code path (v == 1, for new workflows)
        err = activities.NewActivity(ctx)
    }
    return err
}
// TypeScript — patched()
export async function myWorkflow(): Promise<void> {
  if (patched('change-id-1')) {
    // New code path
    await newActivity();
  } else {
    // Old code path
    await oldActivity();
  }
}

Strategy 2: Worker Versioning (Build ID-based)

Temporal 1.21+ supports Worker Versioning — route workflows to compatible workers based on build IDs.

# Register build ID
temporal task-queue versioning insert-assignment-rule \
  --task-queue payments \
  --build-id "v2.0" \
  --percentage 100

# Worker declares its build ID
w := worker.New(c, "payments", worker.Options{
    BuildID:                 "v2.0",
    UseBuildIDForVersioning: true,
})

Strategy 3: Task Queue per Version

Route new workflows to a new task queue:

v1 workflows → task-queue: "payments-v1" → old workers
v2 workflows → task-queue: "payments-v2" → new workers

Simple but requires managing multiple queues.


13. Schedules & Cron Workflows

# Create a schedule via CLI
temporal schedule create \
  --schedule-id "daily-report" \
  --cron "0 9 * * *" \
  --workflow-type "GenerateReportWorkflow" \
  --task-queue "reports" \
  --input '{"type": "daily"}'

# List schedules
temporal schedule list

# Pause/unpause
temporal schedule toggle --schedule-id daily-report --pause
temporal schedule toggle --schedule-id daily-report --unpause

# Trigger immediately
temporal schedule trigger --schedule-id daily-report

# Delete
temporal schedule delete --schedule-id daily-report

Schedules via SDK (TypeScript)

import { Client, ScheduleOverlapPolicy } from '@temporalio/client';

const client = new Client();
const schedule = await client.schedule.create({
  scheduleId: 'daily-report-schedule',
  action: {
    type: 'startWorkflow',
    workflowType: 'generateDailyReport',
    args: [{ reportType: 'sales' }],
    taskQueue: 'reports',
  },
  spec: {
    // Option 1: Intervals
    intervals: [{ every: '1h' }],
    // Option 2: Calendar
    calendars: [{
      hour: { start: 9 },
      minute: { start: 30 },
      dayOfWeek: [{ start: 'MONDAY', end: 'FRIDAY' }],
    }],
    // Option 3: Cron expressions
    cronExpressions: ['30 9 * * MON-FRI'],
  },
  policies: {
    catchupWindow: '1 day',
    overlap: ScheduleOverlapPolicy.SKIP,
    pauseOnFailure: true,
  },
});

// Manage the schedule
const handle = client.schedule.getHandle('daily-report-schedule');
await handle.pause('Pausing for maintenance');
await handle.unpause('Maintenance complete');
await handle.trigger();  // Run immediately
await handle.backfill({
  start: new Date('2026-03-01T00:00:00Z'),
  end: new Date('2026-03-15T00:00:00Z'),
  overlap: ScheduleOverlapPolicy.ALLOW_ALL,
});
await handle.delete();

Schedule Policies

PolicyOptionsDescription
OverlapSkip, BufferOne, BufferAll, CancelOther, TerminateOther, AllowAllWhat to do if previous run is still running
Catchupnumber (default 0)How many missed runs to catch up on
Pause on Failuretrue/falseAuto-pause after workflow failure

Cron Workflows (Legacy)

// Set cron schedule at workflow start
opts := client.StartWorkflowOptions{
    ID:           "daily-report",
    TaskQueue:    "reports",
    CronSchedule: "0 9 * * *", // 9 AM daily
}

Cron vs Schedules:

FeatureCron WorkflowSchedule
Overlap policyNone (waits)Configurable
Pause/resumeMust terminate & restartBuilt-in
BackfillNot supportedSupported
Multiple specsNot supportedSupported
CLI managementLimitedFull CRUD

14. Search Attributes & Visibility

What Are Search Attributes?

Key-value pairs attached to workflows that enable custom filtering in the visibility API and Web UI.

Built-in Search Attributes

AttributeTypeDescription
WorkflowTypeKeywordWorkflow type name
WorkflowIdKeywordWorkflow ID
RunIdKeywordRun ID
StartTimeDatetimeWhen workflow started
CloseTimeDatetimeWhen workflow closed
ExecutionStatusKeywordRunning, Completed, Failed, etc.
TaskQueueKeywordTask queue name
ExecutionDurationIntDuration in nanoseconds

Custom Search Attributes

# Add custom search attributes (Temporal Server)
temporal operator search-attribute create \
  --name CustomerId --type Keyword
temporal operator search-attribute create \
  --name OrderAmount --type Double
temporal operator search-attribute create \
  --name IsVIP --type Bool

Using in Code

// Go — set at start
opts := client.StartWorkflowOptions{
    SearchAttributes: map[string]interface{}{
        "CustomerId":  "cust-123",
        "OrderAmount": 99.99,
        "IsVIP":       true,
    },
}

// Go — update during workflow
workflow.UpsertSearchAttributes(ctx, map[string]interface{}{
    "OrderAmount": 149.99,
})
// TypeScript — set at start
await client.workflow.start(orderWorkflow, {
  searchAttributes: {
    CustomerId: ['cust-123'],
    OrderAmount: [99.99],
    IsVIP: [true],
  },
});

// TypeScript — update during workflow
upsertSearchAttributes({
  OrderAmount: [149.99],
});

Querying Workflows

# List workflows with filter
temporal workflow list --query 'WorkflowType="OrderWorkflow" AND CustomerId="cust-123"'
temporal workflow list --query 'ExecutionStatus="Running" AND OrderAmount > 100'
temporal workflow list --query 'StartTime > "2024-01-01T00:00:00Z" AND IsVIP=true'

Search Attribute Types

TypeSQL Filter Operators
Keyword=, !=, IN
Text=, !=, LIKE (full-text search)
Int=, !=, >, <, >=, <=
Double=, !=, >, <, >=, <=
Bool=, !=
Datetime=, !=, >, <, >=, <=
KeywordList=, !=, IN

15. Data Conversion & Encryption

Data Converter Pipeline

Application Data


┌──────────────────┐
│ Payload Converter │  Serializes data (JSON, Protobuf, etc.)
└────────┬─────────┘


┌──────────────────┐
│  Payload Codec   │  Optional: encrypts, compresses
└────────┬─────────┘


   Temporal Server
   (stores encoded payloads)

Default Converters (in order)

  1. Nilnull/nil values
  2. Bytes — raw []byte
  3. Protobuf JSON — protobuf messages as JSON
  4. Protobuf — protobuf binary
  5. JSON — everything else (default for most data)

Custom Encryption Codec

// Go — encryption codec
type EncryptionCodec struct {
    key []byte
}

func (c *EncryptionCodec) Encode(payloads []*commonpb.Payload) ([]*commonpb.Payload, error) {
    result := make([]*commonpb.Payload, len(payloads))
    for i, p := range payloads {
        encrypted, err := encrypt(c.key, p.SerializeToBytes())
        result[i] = &commonpb.Payload{
            Metadata: map[string][]byte{"encoding": []byte("encrypted/aes256")},
            Data:     encrypted,
        }
    }
    return result, nil
}

func (c *EncryptionCodec) Decode(payloads []*commonpb.Payload) ([]*commonpb.Payload, error) {
    // Reverse of Encode
}

Codec Server

A standalone HTTP server that decodes payloads for the Web UI. Without it, encrypted payloads show as opaque blobs in the UI.

Web UI ──► Codec Server (decrypts) ──► Readable data in browser

16. Namespaces

What Namespaces Provide

FeatureIsolation?
Workflow IDsYes — same ID can exist in different namespaces
Task QueuesYes — same name, different namespaces = separate queues
Search AttributesYes — defined per namespace
Retention PeriodYes — configurable per namespace
ArchivalYes — configurable per namespace
Security/AuthYes — different permissions per namespace

Managing Namespaces

# Create namespace
temporal operator namespace create \
  --namespace staging \
  --retention 7d \
  --description "Staging environment"

# List namespaces
temporal operator namespace list

# Update retention
temporal operator namespace update \
  --namespace staging \
  --retention 30d

# Describe
temporal operator namespace describe --namespace staging

Common Namespace Strategy

production    → 30d retention, strict access
staging       → 7d retention, team access
development   → 3d retention, open access
team-payments → 14d retention, payments team only

17. Testing

Testing Approaches

ApproachSpeedFidelityUse For
Unit test (mock activities)FastLowLogic testing
Integration test (test server)MediumHighEnd-to-end flows
Time-skipping test serverFastHighWorkflows with timers/sleeps
Replay testFastMediumBackward compatibility

Go Testing

func TestOrderWorkflow(t *testing.T) {
    testSuite := &testsuite.WorkflowTestSuite{}
    env := testSuite.NewTestWorkflowEnvironment()

    // Mock activities
    env.OnActivity(activities.ChargeCard, mock.Anything, mock.Anything).
        Return(&ChargeResult{TxnID: "txn-123"}, nil)
    env.OnActivity(activities.ShipOrder, mock.Anything, mock.Anything).
        Return(nil)

    // Execute workflow
    env.ExecuteWorkflow(OrderWorkflow, OrderInput{OrderID: "order-1"})

    // Assert
    require.True(t, env.IsWorkflowCompleted())
    require.NoError(t, env.GetWorkflowError())

    var result string
    require.NoError(t, env.GetWorkflowResult(&result))
    assert.Equal(t, "completed", result)
}

// Test with timer
func TestReminderWorkflow(t *testing.T) {
    env := testSuite.NewTestWorkflowEnvironment()
    env.RegisterDelayedCallback(func() {
        env.SignalWorkflow("approve", ApprovalSignal{Approved: true})
    }, time.Hour*24) // Signal after 24h (instant in test)

    env.ExecuteWorkflow(ApprovalWorkflow, input)
    require.True(t, env.IsWorkflowCompleted())
}

TypeScript Testing

import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker } from '@temporalio/worker';

describe('OrderWorkflow', () => {
  let testEnv: TestWorkflowEnvironment;

  beforeAll(async () => {
    testEnv = await TestWorkflowEnvironment.createTimeSkipping();
  });

  afterAll(async () => {
    await testEnv.teardown();
  });

  it('completes order successfully', async () => {
    const { client, nativeConnection } = testEnv;

    const worker = await Worker.create({
      connection: nativeConnection,
      taskQueue: 'test',
      workflowsPath: require.resolve('./workflows'),
      activities: {
        chargeCard: async () => ({ txnId: 'txn-123' }),
        shipOrder: async () => {},
      },
    });

    const result = await worker.runUntil(
      client.workflow.execute(orderWorkflow, {
        workflowId: 'test-order-1',
        taskQueue: 'test',
        args: [{ orderId: 'order-1' }],
      })
    );

    expect(result).toBe('completed');
  });
});

Python Testing

import pytest
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker

@pytest.fixture
async def env():
    async with await WorkflowEnvironment.start_time_skipping() as env:
        yield env

async def test_order_workflow(env: WorkflowEnvironment):
    async with Worker(
        env.client,
        task_queue="test",
        workflows=[OrderWorkflow],
        activities=[charge_card_mock, ship_order_mock],
    ):
        result = await env.client.execute_workflow(
            OrderWorkflow.run,
            OrderInput(order_id="order-1"),
            id="test-order-1",
            task_queue="test",
        )
        assert result == "completed"

Replay Testing (Backward Compatibility)

// Go — test that current code can replay old histories
func TestReplayFromJSON(t *testing.T) {
    replayer := worker.NewWorkflowReplayer()
    replayer.RegisterWorkflow(OrderWorkflow)

    err := replayer.ReplayWorkflowHistoryFromJSONFile(nil, "testdata/order_history.json")
    require.NoError(t, err) // Fails if code produces different commands
}
# Export workflow history for replay tests
temporal workflow show --workflow-id order-123 --output json > testdata/order_history.json

18. Observability — Metrics, Logging, Tracing

Prometheus Metrics

Server-side metrics (temporal server exposes on :9090):

MetricWhat It Shows
temporal_persistence_latencyDB operation latency
temporal_service_requestsRequest count by operation
temporal_service_errorsError count by type
temporal_persistence_requestsDB request count
temporal_history_sizeWorkflow history sizes
temporal_timer_active_countActive timers

SDK/Worker-side metrics (your workers expose):

MetricWhat It ShowsAlert When
temporal_workflow_task_schedule_to_start_latencyQueue wait time> 1s
temporal_activity_schedule_to_start_latencyActivity queue wait> 5s
temporal_activity_execution_latencyActivity durationExceeds SLA
temporal_sticky_cache_hitCache efficiency< 80%
temporal_worker_task_slots_availableFree executor slots= 0
temporal_workflow_completedCompleted workflowsSudden drop
temporal_workflow_failedFailed workflowsSudden spike
temporal_workflow_canceledCanceled workflows
temporal_activity_execution_failedFailed activities

Enabling Metrics

// Go — Prometheus metrics
clientOptions := client.Options{
    MetricsHandler: sdktally.NewMetricsHandler(newPrometheusScope(
        prometheus.Configuration{
            ListenAddress: "0.0.0.0:9464",
            TimerType:     "histogram",
        },
    )),
}
// TypeScript — Prometheus
import { Runtime } from '@temporalio/worker';
Runtime.install({
  telemetryOptions: {
    metrics: {
      prometheus: { bindAddress: '0.0.0.0:9464' },
    },
  },
});
# Python — Prometheus
from temporalio.runtime import Runtime, TelemetryConfig, PrometheusConfig
runtime = Runtime(telemetry=TelemetryConfig(
    metrics=PrometheusConfig(bind_address="0.0.0.0:9464"),
))
client = await Client.connect("localhost:7233", runtime=runtime)

OpenTelemetry Tracing

// Go — OpenTelemetry interceptor
import "go.temporal.io/sdk/contrib/opentelemetry"

clientOptions := client.Options{
    Interceptors: []interceptor.ClientInterceptor{
        opentelemetry.NewTracingInterceptor(opentelemetry.TracerOptions{
            Tracer: otel.Tracer("temporal"),
        }),
    },
}

Logging

// Go — structured logging
clientOptions := client.Options{
    Logger: temporal.NewStructuredLogger(slog.Default()),
}

// Inside workflow — use workflow logger (replay-safe)
workflow.GetLogger(ctx).Info("Processing order", "orderId", orderId)

// Inside activity — use activity logger
activity.GetLogger(ctx).Info("Charging card", "amount", amount)

Important: Never use fmt.Println or direct loggers in workflows — they’ll log on every replay. Use workflow.GetLogger() which auto-suppresses during replay.


19. Security

mTLS (Mutual TLS)

# temporal-server.yaml
tls:
  frontend:
    server:
      certFile: /certs/server.crt
      keyFile: /certs/server.key
      requireClientAuth: true
      clientCaFiles:
        - /certs/ca.crt
    client:
      serverName: temporal.example.com
// Go client with mTLS
cert, _ := tls.LoadX509KeyPair("client.crt", "client.key")
clientOptions := client.Options{
    HostPort: "temporal.example.com:7233",
    ConnectionOptions: client.ConnectionOptions{
        TLS: &tls.Config{
            Certificates: []tls.Certificate{cert},
            RootCAs:      caCertPool,
        },
    },
}

Authorization

Temporal supports pluggable Authorizer and ClaimMapper:

Request → ClaimMapper (extracts claims from cert/token) → Authorizer (permits/denies)

Built-in authorizer: Checks namespace-level permissions (read, write, admin).

Data Encryption

  • Use Payload Codecs for encryption-at-rest (see section 15)
  • All data (inputs, outputs, errors) can be encrypted before reaching the server
  • Server never sees plaintext — only encrypted payloads

Namespace Isolation

  • Each namespace has independent access controls
  • Cross-namespace communication not supported (use separate clients or Temporal Nexus)
  • Different teams can have different namespaces with different permissions

SSO Configuration (Web UI)

temporal-ui:
  environment:
    - TEMPORAL_AUTH_ENABLED=true
    - TEMPORAL_AUTH_PROVIDER_URL=https://your-idp.example.com
    - TEMPORAL_AUTH_CLIENT_ID=your-client-id
    - TEMPORAL_AUTH_CLIENT_SECRET=your-client-secret
    - TEMPORAL_AUTH_CALLBACK_URL=https://temporal-ui.example.com/auth/sso/callback
    - TEMPORAL_AUTH_SCOPES=openid profile email

Failure Types Reference

TypeDescription
Application FailureUser-thrown errors with type, message, non_retryable, details, nextRetryDelay
Timeout FailureActivity/Workflow timeout; attaches last heartbeat details
Cancelled FailureSuccessful cancellation of Workflow/Activity
Terminated FailureForceful Workflow termination (no cleanup runs)
Server FailureErrors originating in the Temporal Service
Activity FailureWraps the actual failure cause when an Activity fails
Child Workflow FailureWraps the actual failure cause when a Child Workflow fails

Key design principle: An Activity Failure never directly causes a Workflow Failure. Workflows only fail if coded to do so. Workflow Task Failures (e.g., non-determinism) are retried automatically with exponential backoff (max 10min interval).

Activity Heartbeating Details

// Record progress (saved; available on retry)
activity.RecordHeartbeat(ctx, progressData)

// Resume from last progress on retry
if activity.HasHeartbeatDetails(ctx) {
    var lastProgress MyProgress
    activity.GetHeartbeatDetails(ctx, &lastProgress)
    // Resume from lastProgress
}
  • Heartbeat throttling: min(heartbeatTimeout * 0.8, defaultThrottleInterval). Default throttle: 30s, max: 60s.
  • Activities without heartbeats cannot receive Cancellation signals.
  • Final heartbeat on failure is never throttled.

Multi-Cluster Replication

For disaster recovery and multi-region deployments using Global Namespaces:

# Cluster A config
clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 100
  masterClusterName: "cluster-us-east"
  currentClusterName: "cluster-us-east"
  clusterInformation:
    cluster-us-east:
      enabled: true
      initialFailoverVersion: 1
      rpcAddress: "10.0.1.100:7233"
# Cluster B config
clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 100
  masterClusterName: "cluster-us-west"
  currentClusterName: "cluster-us-west"
  clusterInformation:
    cluster-us-west:
      enabled: true
      initialFailoverVersion: 2
      rpcAddress: "10.0.2.100:7233"
# Register clusters with each other
temporal operator cluster upsert --frontend-address="10.0.2.100:7233"  # on Cluster A
temporal operator cluster upsert --frontend-address="10.0.1.100:7233"  # on Cluster B

# Failover
temporal operator namespace update --namespace my-global-ns --active-cluster cluster-us-west

Requirements: enableGlobalNamespace=true on all clusters, identical failoverVersionIncrement, unique initialFailoverVersion per cluster. Workers must poll on all clusters.


20. Production Deployment

Docker Compose (Quick Start)

version: "3.5"
services:
  postgresql:
    image: postgres:15
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: temporal
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  temporal:
    image: temporalio/auto-setup:latest
    depends_on:
      - postgresql
    environment:
      - DB=postgresql
      - DB_PORT=5432
      - POSTGRES_USER=temporal
      - POSTGRES_PWD=temporal
      - POSTGRES_SEEDS=postgresql
      - DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/production.yaml
    ports:
      - "7233:7233"  # gRPC
    volumes:
      - ./dynamicconfig:/etc/temporal/config/dynamicconfig

  temporal-ui:
    image: temporalio/ui:latest
    depends_on:
      - temporal
    environment:
      - TEMPORAL_ADDRESS=temporal:7233
    ports:
      - "8080:8080"  # Web UI

  temporal-admin-tools:
    image: temporalio/admin-tools:latest
    depends_on:
      - temporal
    environment:
      - TEMPORAL_ADDRESS=temporal:7233

volumes:
  postgres-data:

Kubernetes (Helm)

# Add Temporal Helm repo
helm repo add temporal https://temporalio.github.io/helm-charts
helm repo update

# Install with PostgreSQL
helm install temporal temporal/temporal \
  --set server.replicaCount=3 \
  --set cassandra.enabled=false \
  --set mysql.enabled=false \
  --set postgresql.enabled=true \
  --set prometheus.enabled=true \
  --set grafana.enabled=true \
  --set elasticsearch.enabled=true \
  --namespace temporal \
  --create-namespace

# Custom values.yaml
helm install temporal temporal/temporal -f values.yaml

Production values.yaml Example

server:
  replicaCount: 3
  config:
    persistence:
      default:
        driver: sql
        sql:
          driver: postgres
          host: your-rds-endpoint.amazonaws.com
          port: 5432
          database: temporal
          user: temporal
          password: ${DB_PASSWORD}
          maxConns: 20
          maxIdleConns: 20
      visibility:
        driver: sql
        sql:
          driver: postgres
          host: your-rds-endpoint.amazonaws.com
          port: 5432
          database: temporal_visibility
          user: temporal
          password: ${DB_PASSWORD}

  frontend:
    replicaCount: 3
    resources:
      requests: { cpu: "1", memory: "2Gi" }
      limits:   { cpu: "2", memory: "4Gi" }

  history:
    replicaCount: 3
    resources:
      requests: { cpu: "2", memory: "4Gi" }
      limits:   { cpu: "4", memory: "8Gi" }

  matching:
    replicaCount: 3
    resources:
      requests: { cpu: "1", memory: "2Gi" }
      limits:   { cpu: "2", memory: "4Gi" }

  worker:
    replicaCount: 2
    resources:
      requests: { cpu: "0.5", memory: "1Gi" }
      limits:   { cpu: "1", memory: "2Gi" }

elasticsearch:
  enabled: true
  replicas: 2

web:
  replicaCount: 2

Dynamic Configuration

Temporal server supports runtime config changes without restart:

# dynamicconfig/production.yaml
# History settings
history.maximumBufferedEvents:
  - value: 100
    constraints: {}

# Retention
system.forceSearchAttributesCacheRefreshOnRead:
  - value: true
    constraints: {}

# Rate limits
frontend.rps:
  - value: 2400
    constraints: {}

frontend.namespaceRPS:
  - value: 400
    constraints: {}

# History size limit
limit.historySize.warn:
  - value: 10485760  # 10MB
    constraints: {}

limit.historySize.error:
  - value: 52428800  # 50MB
    constraints: {}

limit.historyCount.warn:
  - value: 10240
    constraints: {}

limit.historyCount.error:
  - value: 51200
    constraints: {}

Production Sizing Guide

ScaleWorkflows/secFrontendHistoryMatchingWorkerDB
Small (<100 wf/s)10-1002× 1CPU/2GB2× 2CPU/4GB2× 1CPU/2GB1× 0.5CPU/1GB2CPU/4GB PG
Medium (100-1K wf/s)100-10003× 2CPU/4GB3× 4CPU/8GB3× 2CPU/4GB2× 1CPU/2GB4CPU/16GB PG
Large (1K-10K wf/s)1000-100005× 4CPU/8GB5× 8CPU/16GB5× 4CPU/8GB3× 2CPU/4GB8CPU/32GB PG
XL (>10K wf/s)10000+Consider Cassandra + dedicated ES cluster

Database Tuning

-- PostgreSQL recommendations for Temporal
ALTER SYSTEM SET max_connections = 200;
ALTER SYSTEM SET shared_buffers = '4GB';       -- 25% of RAM
ALTER SYSTEM SET effective_cache_size = '12GB'; -- 75% of RAM
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET random_page_cost = 1.1;       -- SSD
ALTER SYSTEM SET effective_io_concurrency = 200; -- SSD
ALTER SYSTEM SET wal_buffers = '64MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET max_wal_size = '2GB';

21. Temporal Cloud vs Self-Hosted

FeatureSelf-HostedTemporal Cloud
Server managementYou manageManaged by Temporal
DatabaseYou manageManaged
UpgradesYou do themAutomatic, zero-downtime
Multi-regionComplex to set upBuilt-in
SLADepends on you99.99% uptime SLA
mTLSYou configureBuilt-in
EncryptionYou implementAES-256-GCM at rest, TLS in transit
ComplianceDIYSOC 2 Type II, HIPAA eligible
PricingInfrastructure costPer-action pricing
SupportCommunity / OSSDedicated support
NamespacesUnlimitedBased on plan
History shards512 (configurable)8192
ScaleBased on infraEffectively unlimited

Temporal Cloud Pricing (2025)

PlanMonthly MinimumIncluded ActionsActive StorageRetained Storage
Essentials$100 / 5% usage1M1GB40GB
Business$500 / 10% usage2.5M2.5GB100GB
EnterpriseAnnual contract10M10GB400GB
Mission CriticalAnnual contract10M10GB400GB

Overage: $50/M (first 5M) down to $25/M (100-200M). Storage: $0.042/GBh active, $0.00105/GBh retained.

What counts as an action:
  - Activity/workflow task started
  - Signal sent/received
  - Timer started
  - Query handled
  - Child workflow started

What does NOT count:
  - Worker polling
  - Workflow/activity completion
  - History replay events

22. CLI Cheatsheet

Modern CLI (temporal)

# ============== SERVER ==============
temporal server start-dev                          # Start dev server (SQLite, port 7233, UI on 8233)
temporal server start-dev --db-filename temporal.db # Persistent dev DB
temporal server start-dev --port 7233 --ui-port 8233
temporal server start-dev --namespace my-ns         # Dev server with custom namespace

# ============== WORKFLOWS ==============
# Start a workflow
temporal workflow start \
  --type OrderWorkflow \
  --task-queue payments \
  --workflow-id order-123 \
  --input '{"orderId": "abc"}'

# Execute (start + wait for result)
temporal workflow execute \
  --type OrderWorkflow \
  --task-queue payments \
  --workflow-id order-123 \
  --input '{"orderId": "abc"}'

# List workflows
temporal workflow list
temporal workflow list --query 'WorkflowType="OrderWorkflow" AND ExecutionStatus="Running"'

# Describe a workflow
temporal workflow describe --workflow-id order-123

# Show event history
temporal workflow show --workflow-id order-123
temporal workflow show --workflow-id order-123 --output json > history.json

# Signal a workflow
temporal workflow signal \
  --workflow-id order-123 \
  --name approve \
  --input '{"approved": true}'

# Query a workflow
temporal workflow query \
  --workflow-id order-123 \
  --name getStatus

# Update a workflow
temporal workflow update \
  --workflow-id order-123 \
  --name updatePrice \
  --input '{"newPrice": 29.99}'

# Cancel a workflow (graceful)
temporal workflow cancel --workflow-id order-123

# Terminate a workflow (immediate, skip cleanup)
temporal workflow terminate --workflow-id order-123 --reason "manual cleanup"

# Reset a workflow to a specific event
temporal workflow reset \
  --workflow-id order-123 \
  --event-id 10 \
  --reason "replay from event 10"

# Count workflows
temporal workflow count --query 'ExecutionStatus="Running"'

# Delete a workflow (removes from visibility)
temporal workflow delete --workflow-id order-123

# View workflow call stack (debugging blocked workflows)
temporal workflow stack --workflow-id order-123

# Trace workflow execution tree (children, activities)
temporal workflow trace --workflow-id order-123 --depth 3

# Follow workflow events in real-time
temporal workflow show --workflow-id order-123 --follow

# Reset by type (reset to last ContinueAsNew point)
temporal workflow reset --workflow-id order-123 --type LastContinuedAsNew

# ============== TASK QUEUES ==============
temporal task-queue describe --task-queue payments
temporal task-queue list-partition --task-queue payments

# ============== SCHEDULES ==============
temporal schedule create --schedule-id daily-cleanup --cron "0 2 * * *" \
  --workflow-type CleanupWorkflow --task-queue maintenance

temporal schedule list
temporal schedule describe --schedule-id daily-cleanup
temporal schedule toggle --schedule-id daily-cleanup --pause --reason "maintenance"
temporal schedule toggle --schedule-id daily-cleanup --unpause
temporal schedule trigger --schedule-id daily-cleanup
temporal schedule update --schedule-id daily-cleanup --cron "0 3 * * *"
temporal schedule delete --schedule-id daily-cleanup

# ============== NAMESPACES ==============
temporal operator namespace create --namespace staging --retention 7d
temporal operator namespace list
temporal operator namespace describe --namespace staging
temporal operator namespace update --namespace staging --retention 30d
temporal operator namespace delete --namespace staging

# ============== SEARCH ATTRIBUTES ==============
temporal operator search-attribute create --name CustomerId --type Keyword
temporal operator search-attribute list

# ============== CLUSTER ==============
temporal operator cluster describe
temporal operator cluster health

# ============== BATCH OPERATIONS ==============
# Terminate all running workflows of a type
temporal workflow terminate \
  --query 'WorkflowType="BrokenWorkflow" AND ExecutionStatus="Running"' \
  --reason "batch cleanup"

# Signal multiple workflows
temporal workflow signal \
  --query 'WorkflowType="OrderWorkflow" AND ExecutionStatus="Running"' \
  --name shutdown \
  --input '{"graceful": true}'

# ============== ENV / CONFIG ==============
temporal env set local.address localhost:7233
temporal env set local.namespace default
temporal env get local
temporal env list

# Use specific environment
temporal --env local workflow list
temporal --env production workflow list

Legacy CLI (tctl) — Deprecated but Still Used

tctl workflow start --workflow_type OrderWorkflow --taskqueue payments --workflow_id order-123 --input '{"orderId":"abc"}'
tctl workflow observe --workflow_id order-123
tctl workflow list --open
tctl workflow signal --workflow_id order-123 --name approve --input '{"approved":true}'
tctl workflow query --workflow_id order-123 --query_type getStatus
tctl workflow cancel --workflow_id order-123
tctl workflow terminate --workflow_id order-123
tctl workflow showid order-123
tctl namespace register --namespace staging --retention 7
tctl namespace describe --namespace staging
tctl admin cluster describe

23. SDK Code Examples

Go — Complete Example

package main

import (
    "context"
    "fmt"
    "time"

    "go.temporal.io/sdk/activity"
    "go.temporal.io/sdk/client"
    "go.temporal.io/sdk/temporal"
    "go.temporal.io/sdk/worker"
    "go.temporal.io/sdk/workflow"
)

// ─── Workflow ───────────────────────────────────────────

type OrderInput struct {
    OrderID    string
    CustomerID string
    Amount     float64
}

type OrderResult struct {
    Status      string
    TrackingID  string
}

func OrderWorkflow(ctx workflow.Context, input OrderInput) (*OrderResult, error) {
    logger := workflow.GetLogger(ctx)
    logger.Info("Starting order workflow", "orderId", input.OrderID)

    // Activity options
    actOpts := workflow.ActivityOptions{
        StartToCloseTimeout: 30 * time.Second,
        RetryPolicy: &temporal.RetryPolicy{
            InitialInterval:    time.Second,
            BackoffCoefficient: 2.0,
            MaximumInterval:    30 * time.Second,
            MaximumAttempts:    3,
        },
    }
    ctx = workflow.WithActivityOptions(ctx, actOpts)

    // Step 1: Validate order
    var validated bool
    err := workflow.ExecuteActivity(ctx, ValidateOrder, input).Get(ctx, &validated)
    if err != nil {
        return nil, fmt.Errorf("validation failed: %w", err)
    }

    // Step 2: Charge payment
    var paymentResult PaymentResult
    err = workflow.ExecuteActivity(ctx, ChargePayment, input.OrderID, input.Amount).
        Get(ctx, &paymentResult)
    if err != nil {
        return nil, fmt.Errorf("payment failed: %w", err)
    }

    // Step 3: Ship order (longer timeout)
    shipCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
        StartToCloseTimeout: 5 * time.Minute,
        HeartbeatTimeout:    30 * time.Second,
    })
    var trackingID string
    err = workflow.ExecuteActivity(shipCtx, ShipOrder, input.OrderID).Get(ctx, &trackingID)
    if err != nil {
        // Compensate: refund payment
        _ = workflow.ExecuteActivity(ctx, RefundPayment, paymentResult.TxnID).Get(ctx, nil)
        return nil, fmt.Errorf("shipping failed: %w", err)
    }

    // Step 4: Send confirmation
    _ = workflow.ExecuteActivity(ctx, SendConfirmation, input.CustomerID, trackingID).
        Get(ctx, nil)

    return &OrderResult{Status: "completed", TrackingID: trackingID}, nil
}

// ─── Signal Handler ─────────────────────────────────────

func OrderWithApprovalWorkflow(ctx workflow.Context, input OrderInput) (*OrderResult, error) {
    var approved bool
    approvalCh := workflow.GetSignalChannel(ctx, "approval")

    // Wait for approval signal (with timeout)
    timerCtx, cancelTimer := workflow.WithCancel(ctx)
    timerFuture := workflow.NewTimer(timerCtx, 24*time.Hour)

    selector := workflow.NewSelector(ctx)
    selector.AddReceive(approvalCh, func(ch workflow.ReceiveChannel, more bool) {
        var signal ApprovalSignal
        ch.Receive(ctx, &signal)
        approved = signal.Approved
        cancelTimer()
    })
    selector.AddFuture(timerFuture, func(f workflow.Future) {
        // Timeout — not approved
        approved = false
    })
    selector.Select(ctx)

    if !approved {
        return &OrderResult{Status: "rejected"}, nil
    }

    // Continue with order processing...
    return &OrderResult{Status: "approved"}, nil
}

// ─── Query Handler ──────────────────────────────────────

func OrderWithQueryWorkflow(ctx workflow.Context, input OrderInput) (*OrderResult, error) {
    status := "initialized"

    // Register query handler
    err := workflow.SetQueryHandler(ctx, "getStatus", func() (string, error) {
        return status, nil
    })
    if err != nil {
        return nil, err
    }

    status = "processing"
    // ... do work ...
    status = "completed"

    return &OrderResult{Status: status}, nil
}

// ─── Activities ─────────────────────────────────────────

type PaymentResult struct {
    TxnID string
}

type ApprovalSignal struct {
    Approved bool
    Reason   string
}

func ValidateOrder(ctx context.Context, input OrderInput) (bool, error) {
    logger := activity.GetLogger(ctx)
    logger.Info("Validating order", "orderId", input.OrderID)
    // ... validation logic
    return true, nil
}

func ChargePayment(ctx context.Context, orderID string, amount float64) (*PaymentResult, error) {
    // ... charge payment gateway
    return &PaymentResult{TxnID: "txn-" + orderID}, nil
}

func ShipOrder(ctx context.Context, orderID string) (string, error) {
    // Long-running activity with heartbeating
    for i := 0; i < 10; i++ {
        // Check for cancellation
        activity.RecordHeartbeat(ctx, i)
        // ... process shipping step
        time.Sleep(time.Second) // simulate work
    }
    return "TRACK-" + orderID, nil
}

func RefundPayment(ctx context.Context, txnID string) error {
    // ... refund logic
    return nil
}

func SendConfirmation(ctx context.Context, customerID, trackingID string) error {
    // ... send email/notification
    return nil
}

// ─── Worker ─────────────────────────────────────────────

func runWorker() error {
    c, err := client.Dial(client.Options{
        HostPort:  "localhost:7233",
        Namespace: "default",
    })
    if err != nil {
        return err
    }
    defer c.Close()

    w := worker.New(c, "order-processing", worker.Options{
        MaxConcurrentWorkflowTaskExecutions: 100,
        MaxConcurrentActivityTaskExecutions: 50,
    })

    // Register workflows
    w.RegisterWorkflow(OrderWorkflow)
    w.RegisterWorkflow(OrderWithApprovalWorkflow)
    w.RegisterWorkflow(OrderWithQueryWorkflow)

    // Register activities
    w.RegisterActivity(ValidateOrder)
    w.RegisterActivity(ChargePayment)
    w.RegisterActivity(ShipOrder)
    w.RegisterActivity(RefundPayment)
    w.RegisterActivity(SendConfirmation)

    return w.Run(worker.InterruptCh())
}

// ─── Client Usage ───────────────────────────────────────

func startWorkflow() error {
    c, err := client.Dial(client.Options{})
    if err != nil {
        return err
    }
    defer c.Close()

    // Start workflow
    we, err := c.ExecuteWorkflow(context.Background(), client.StartWorkflowOptions{
        ID:        "order-123",
        TaskQueue: "order-processing",
    }, OrderWorkflow, OrderInput{
        OrderID:    "order-123",
        CustomerID: "cust-456",
        Amount:     99.99,
    })
    if err != nil {
        return err
    }

    // Wait for result
    var result OrderResult
    err = we.Get(context.Background(), &result)
    fmt.Printf("Result: %+v\n", result)

    // Signal a workflow
    err = c.SignalWorkflow(context.Background(), "order-123", "", "approval",
        ApprovalSignal{Approved: true})

    // Query a workflow
    resp, err := c.QueryWorkflow(context.Background(), "order-123", "", "getStatus")
    var status string
    resp.Get(&status)

    return nil
}

TypeScript — Complete Example

// ─── workflows.ts ───────────────────────────────────────

import {
  proxyActivities,
  defineSignal,
  defineQuery,
  defineUpdate,
  setHandler,
  condition,
  sleep,
  continueAsNew,
  patched,
  ApplicationFailure,
  CancellationScope,
  log,
} from '@temporalio/workflow';
import type * as activities from './activities';

const { validateOrder, chargePayment, shipOrder, refundPayment, sendConfirmation } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '30s',
    retry: {
      initialInterval: '1s',
      backoffCoefficient: 2,
      maximumInterval: '30s',
      maximumAttempts: 3,
    },
  });

// Signals, Queries, Updates
export const approvalSignal = defineSignal<[{ approved: boolean; reason?: string }]>('approval');
export const getStatusQuery = defineQuery<string>('getStatus');
export const updatePriceUpdate = defineUpdate<
  { newPrice: number },  // result
  [{ price: number }]     // args
>('updatePrice');

// ─── Main Workflow ──────────────────────────────────────

export interface OrderInput {
  orderId: string;
  customerId: string;
  amount: number;
}

export async function orderWorkflow(input: OrderInput): Promise<{ status: string; trackingId?: string }> {
  log.info('Starting order workflow', { orderId: input.orderId });

  let status = 'initialized';
  let approved = false;

  // Query handler
  setHandler(getStatusQuery, () => status);

  // Signal handler
  setHandler(approvalSignal, (signal) => {
    approved = signal.approved;
  });

  // Update handler (with validator)
  setHandler(updatePriceUpdate, (args) => {
    input.amount = args.price;
    return { newPrice: args.price };
  }, {
    validator: (args) => {
      if (args.price <= 0) throw ApplicationFailure.nonRetryable('Price must be positive');
    },
  });

  // Wait for approval (max 24 hours)
  status = 'waiting_approval';
  const gotApproval = await condition(() => approved, '24h');
  if (!gotApproval) {
    return { status: 'timeout' };
  }

  status = 'processing';

  // Validate
  await validateOrder(input);

  // Charge
  const paymentResult = await chargePayment(input.orderId, input.amount);

  // Ship (with compensation on failure)
  try {
    const trackingId = await shipOrder(input.orderId);
    status = 'shipped';
    await sendConfirmation(input.customerId, trackingId);
    status = 'completed';
    return { status: 'completed', trackingId };
  } catch (err) {
    status = 'compensating';
    await refundPayment(paymentResult.txnId);
    throw err;
  }
}

// ─── Long-running workflow with ContinueAsNew ───────────

export async function eventProcessorWorkflow(state: {
  processedCount: number;
  lastEventId: string;
}): Promise<void> {
  const { fetchEvents, processEvent } = proxyActivities<typeof activities>({
    startToCloseTimeout: '10s',
  });

  for (let i = 0; i < 500; i++) {
    const events = await fetchEvents(state.lastEventId);
    if (events.length === 0) {
      await sleep('5s');
      continue;
    }
    for (const event of events) {
      await processEvent(event);
      state.processedCount++;
      state.lastEventId = event.id;
    }
  }

  // Reset history — continue with accumulated state
  await continueAsNew<typeof eventProcessorWorkflow>(state);
}

// ─── activities.ts ──────────────────────────────────────

import { Context, heartbeat, log } from '@temporalio/activity';

export async function validateOrder(input: { orderId: string }): Promise<boolean> {
  log.info('Validating order', { orderId: input.orderId });
  return true;
}

export async function chargePayment(orderId: string, amount: number): Promise<{ txnId: string }> {
  // ... call payment gateway
  return { txnId: `txn-${orderId}` };
}

export async function shipOrder(orderId: string): Promise<string> {
  // Long-running with heartbeat
  for (let step = 0; step < 10; step++) {
    Context.current().heartbeat(step);
    // ... shipping logic per step
    await new Promise((r) => setTimeout(r, 1000));
  }
  return `TRACK-${orderId}`;
}

export async function refundPayment(txnId: string): Promise<void> {
  // ... refund logic
}

export async function sendConfirmation(customerId: string, trackingId: string): Promise<void> {
  // ... send email
}

// ─── worker.ts ──────────────────────────────────────────

import { Worker, NativeConnection, Runtime } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  // Enable metrics
  Runtime.install({
    telemetryOptions: {
      metrics: { prometheus: { bindAddress: '0.0.0.0:9464' } },
    },
  });

  const connection = await NativeConnection.connect({
    address: 'localhost:7233',
  });

  const worker = await Worker.create({
    connection,
    namespace: 'default',
    taskQueue: 'order-processing',
    workflowsPath: require.resolve('./workflows'),
    activities,
    maxConcurrentWorkflowTaskExecutions: 100,
    maxConcurrentActivityTaskExecutions: 50,
  });

  await worker.run();
}

run().catch(console.error);

// ─── client.ts ──────────────────────────────────────────

import { Client, Connection } from '@temporalio/client';
import { orderWorkflow, approvalSignal, getStatusQuery } from './workflows';

async function main() {
  const connection = await Connection.connect({ address: 'localhost:7233' });
  const client = new Client({ connection });

  // Start workflow
  const handle = await client.workflow.start(orderWorkflow, {
    workflowId: 'order-123',
    taskQueue: 'order-processing',
    args: [{ orderId: 'order-123', customerId: 'cust-456', amount: 99.99 }],
    searchAttributes: { CustomerId: ['cust-456'] },
  });

  // Signal
  await handle.signal(approvalSignal, { approved: true });

  // Query
  const status = await handle.query(getStatusQuery);
  console.log('Status:', status);

  // Wait for result
  const result = await handle.result();
  console.log('Result:', result);

  // Cancel
  await handle.cancel();

  // Terminate
  await handle.terminate('manual cleanup');
}

Python — Complete Example

# ─── workflows.py ────────────────────────────────────────

import asyncio
from datetime import timedelta
from dataclasses import dataclass
from temporalio import workflow
from temporalio.common import RetryPolicy

with workflow.unsafe.imports_passed_through():
    from activities import (
        validate_order, charge_payment, ship_order,
        refund_payment, send_confirmation,
        OrderInput, PaymentResult,
    )

@dataclass
class ApprovalSignal:
    approved: bool
    reason: str = ""

@workflow.defn
class OrderWorkflow:
    def __init__(self):
        self._status = "initialized"
        self._approved = False

    @workflow.run
    async def run(self, input: OrderInput) -> dict:
        workflow.logger.info(f"Starting order workflow: {input.order_id}")

        # Wait for approval
        self._status = "waiting_approval"
        try:
            await workflow.wait_condition(
                lambda: self._approved,
                timeout=timedelta(hours=24),
            )
        except asyncio.TimeoutError:
            return {"status": "timeout"}

        self._status = "processing"

        # Validate
        await workflow.execute_activity(
            validate_order,
            input,
            start_to_close_timeout=timedelta(seconds=30),
            retry_policy=RetryPolicy(maximum_attempts=3),
        )

        # Charge
        payment_result = await workflow.execute_activity(
            charge_payment,
            args=[input.order_id, input.amount],
            start_to_close_timeout=timedelta(seconds=30),
        )

        # Ship (with compensation)
        try:
            tracking_id = await workflow.execute_activity(
                ship_order,
                input.order_id,
                start_to_close_timeout=timedelta(minutes=5),
                heartbeat_timeout=timedelta(seconds=30),
            )
        except Exception:
            self._status = "compensating"
            await workflow.execute_activity(
                refund_payment,
                payment_result.txn_id,
                start_to_close_timeout=timedelta(seconds=30),
            )
            raise

        self._status = "completed"
        await workflow.execute_activity(
            send_confirmation,
            args=[input.customer_id, tracking_id],
            start_to_close_timeout=timedelta(seconds=30),
        )

        return {"status": "completed", "tracking_id": tracking_id}

    @workflow.signal
    async def approval(self, signal: ApprovalSignal):
        self._approved = signal.approved

    @workflow.query
    def get_status(self) -> str:
        return self._status

    @workflow.update
    async def update_price(self, new_price: float) -> float:
        # Validate and update
        if new_price <= 0:
            raise ValueError("Price must be positive")
        return new_price

# ─── activities.py ───────────────────────────────────────

from dataclasses import dataclass
from temporalio import activity
import asyncio

@dataclass
class OrderInput:
    order_id: str
    customer_id: str
    amount: float

@dataclass
class PaymentResult:
    txn_id: str

@activity.defn
async def validate_order(input: OrderInput) -> bool:
    activity.logger.info(f"Validating order: {input.order_id}")
    return True

@activity.defn
async def charge_payment(order_id: str, amount: float) -> PaymentResult:
    return PaymentResult(txn_id=f"txn-{order_id}")

@activity.defn
async def ship_order(order_id: str) -> str:
    for step in range(10):
        activity.heartbeat(step)
        await asyncio.sleep(1)
    return f"TRACK-{order_id}"

@activity.defn
async def refund_payment(txn_id: str) -> None:
    pass

@activity.defn
async def send_confirmation(customer_id: str, tracking_id: str) -> None:
    pass

# ─── worker.py ───────────────────────────────────────────

import asyncio
from temporalio.client import Client
from temporalio.worker import Worker
from workflows import OrderWorkflow
from activities import (
    validate_order, charge_payment, ship_order,
    refund_payment, send_confirmation,
)

async def main():
    client = await Client.connect("localhost:7233")

    worker = Worker(
        client,
        task_queue="order-processing",
        workflows=[OrderWorkflow],
        activities=[
            validate_order, charge_payment, ship_order,
            refund_payment, send_confirmation,
        ],
        max_concurrent_workflow_tasks=100,
        max_concurrent_activities=50,
    )

    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())

# ─── client_usage.py ─────────────────────────────────────

import asyncio
from temporalio.client import Client
from workflows import OrderWorkflow, ApprovalSignal
from activities import OrderInput

async def main():
    client = await Client.connect("localhost:7233")

    # Start workflow
    handle = await client.start_workflow(
        OrderWorkflow.run,
        OrderInput(order_id="order-123", customer_id="cust-456", amount=99.99),
        id="order-123",
        task_queue="order-processing",
    )

    # Signal
    await handle.signal(OrderWorkflow.approval, ApprovalSignal(approved=True))

    # Query
    status = await handle.query(OrderWorkflow.get_status)
    print(f"Status: {status}")

    # Wait for result
    result = await handle.result()
    print(f"Result: {result}")

if __name__ == "__main__":
    asyncio.run(main())

Java — Complete Example

// ─── OrderWorkflow.java (Interface) ────────────────────

@WorkflowInterface
public interface OrderWorkflow {
    @WorkflowMethod
    OrderResult run(OrderInput input);

    @SignalMethod
    void approval(ApprovalSignal signal);

    @QueryMethod
    String getStatus();

    @UpdateMethod
    double updatePrice(double newPrice);

    @UpdateValidatorMethod(updateName = "updatePrice")
    void validateUpdatePrice(double newPrice);
}

// ─── OrderWorkflowImpl.java ────────────────────────────

public class OrderWorkflowImpl implements OrderWorkflow {
    private static final Logger logger = Workflow.getLogger(OrderWorkflowImpl.class);
    private String status = "initialized";
    private boolean approved = false;

    private final OrderActivities activities = Workflow.newActivityStub(
        OrderActivities.class,
        ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofSeconds(30))
            .setRetryOptions(RetryOptions.newBuilder()
                .setInitialInterval(Duration.ofSeconds(1))
                .setBackoffCoefficient(2.0)
                .setMaximumInterval(Duration.ofSeconds(30))
                .setMaximumAttempts(3)
                .build())
            .build()
    );

    @Override
    public OrderResult run(OrderInput input) {
        status = "waiting_approval";
        Workflow.await(Duration.ofHours(24), () -> approved);
        if (!approved) return new OrderResult("timeout", null);

        status = "processing";
        activities.validateOrder(input);

        PaymentResult payment = activities.chargePayment(input.getOrderId(), input.getAmount());

        try {
            String trackingId = activities.shipOrder(input.getOrderId());
            status = "completed";
            activities.sendConfirmation(input.getCustomerId(), trackingId);
            return new OrderResult("completed", trackingId);
        } catch (ActivityFailure e) {
            status = "compensating";
            activities.refundPayment(payment.getTxnId());
            throw e;
        }
    }

    @Override
    public void approval(ApprovalSignal signal) {
        this.approved = signal.isApproved();
    }

    @Override
    public String getStatus() {
        return status;
    }

    @Override
    public double updatePrice(double newPrice) {
        return newPrice;
    }

    @Override
    public void validateUpdatePrice(double newPrice) {
        if (newPrice <= 0) throw new IllegalArgumentException("Price must be positive");
    }
}

// ─── Worker ────────────────────────────────────────────

WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
WorkflowClient client = WorkflowClient.newInstance(service);

WorkerFactory factory = WorkerFactory.newInstance(client);
Worker worker = factory.newWorker("order-processing", WorkerOptions.newBuilder()
    .setMaxConcurrentWorkflowTaskExecutionSize(100)
    .setMaxConcurrentActivityExecutionSize(50)
    .build());

worker.registerWorkflowImplementationTypes(OrderWorkflowImpl.class);
worker.registerActivitiesImplementations(new OrderActivitiesImpl());

factory.start();

24. Advanced Patterns

Entity Workflow (Long-Lived Stateful Entity)

Model a persistent entity (user account, shopping cart, device) as a workflow:

export async function shoppingCartWorkflow(userId: string): Promise<void> {
  const items: Map<string, CartItem> = new Map();
  let checkedOut = false;

  // Signal handlers for mutations
  setHandler(addItemSignal, (item: CartItem) => {
    items.set(item.productId, item);
  });

  setHandler(removeItemSignal, (productId: string) => {
    items.delete(productId);
  });

  setHandler(checkoutSignal, () => {
    checkedOut = true;
  });

  // Query handler for reads
  setHandler(getCartQuery, () => Array.from(items.values()));

  // Keep alive until checkout or 30-day timeout
  const didCheckout = await condition(() => checkedOut, '30d');

  if (didCheckout && items.size > 0) {
    await processCheckout(userId, Array.from(items.values()));
  }

  // Cart expires — optionally continueAsNew for truly infinite entities
}

Async Activity Completion

Activity started in one process, completed from another:

// Start activity — return ErrResultPending
func ExternalApproval(ctx context.Context, request ApprovalRequest) (string, error) {
    info := activity.GetInfo(ctx)
    taskToken := info.TaskToken

    // Send task token to external system (webhook, email link, etc.)
    sendApprovalEmail(request.ApproverEmail, taskToken)

    return "", activity.ErrResultPending // Activity stays open
}

// Later — complete from external system (webhook handler)
func handleWebhookApproval(c client.Client, taskToken []byte, result string) {
    c.CompleteActivity(context.Background(), taskToken, result, nil)
    // or: c.CompleteActivityByID(...)
}

Local Activities

Run an activity on the same worker as the workflow — no task queue dispatch:

localOpts := workflow.LocalActivityOptions{
    StartToCloseTimeout: 5 * time.Second,
}
localCtx := workflow.WithLocalActivityOptions(ctx, localOpts)
err := workflow.ExecuteLocalActivity(localCtx, LookupCache, key).Get(ctx, &result)

Use when:

  • Very short execution time (<1s)
  • No need for independent retries
  • Want to avoid task queue overhead
  • Activity runs on same machine as workflow

Side Effects

Run non-deterministic code inside a workflow (recorded once, replayed from history):

// Go
encodedRandom := workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
    return rand.Intn(100)
})
var random int
encodedRandom.Get(&random)
// TypeScript — uuid
import { uuid4 } from '@temporalio/workflow';
const id = uuid4(); // deterministic UUID

Workflow Interceptors (Middleware)

// Go — logging interceptor
type LoggingInterceptor struct {
    interceptor.WorkflowInboundInterceptorBase
}

func (i *LoggingInterceptor) ExecuteWorkflow(ctx workflow.Context, in *interceptor.ExecuteWorkflowInput) (interface{}, error) {
    log.Printf("Workflow started: %s", workflow.GetInfo(ctx).WorkflowType.Name)
    result, err := i.Next.ExecuteWorkflow(ctx, in)
    log.Printf("Workflow completed: %v, error: %v", result, err)
    return result, err
}

Mutex / Semaphore (Cross-Workflow Locking)

Use a “mutex workflow” to serialize access to a shared resource:

func MutexWorkflow(ctx workflow.Context, resourceID string) error {
    lockCh := workflow.GetSignalChannel(ctx, "lock")
    unlockCh := workflow.GetSignalChannel(ctx, "unlock")

    for {
        // Wait for lock request
        var requester string
        lockCh.Receive(ctx, &requester)

        // Grant lock (signal back to requester)
        workflow.SignalExternalWorkflow(ctx, requester, "", "lock-acquired", nil)

        // Wait for unlock
        unlockCh.Receive(ctx, nil)
    }
}

25. Use Cases

1. Order Processing & E-Commerce

Customer places order → Validate → Reserve inventory → Charge payment
→ Fulfill/Ship → Send confirmation → Handle returns (days later)

Why Temporal: Multi-step, spans multiple services, needs compensation on failure.

2. Payment Processing

Initiate payment → Fraud check → Charge → Settlement → Reconciliation

Why Temporal: Financial accuracy — every step must complete or compensate.

3. Microservice Orchestration (Saga)

Service A → Service B → Service C → Done
              ↓ fail
           Compensate B → Compensate A

Why Temporal: Native saga support, guaranteed compensation.

4. Human-in-the-Loop Approvals

Submit request → Wait for manager approval (days) → Process → Notify

Why Temporal: Workflows can wait indefinitely for signals.

5. Subscription & Billing

Trial (14 days) → First charge → Monthly recurring → Renewal/Cancellation

Why Temporal: Long-running with timers and state transitions.

6. User Onboarding

Sign up → Welcome email → Wait 1 day → Tutorial email → Wait 3 days
→ Feature highlight → Check engagement → Branch based on behavior

Why Temporal: Complex scheduling with conditional branching.

7. Data Pipelines / ETL

Extract (paginated) → Transform (batch) → Load → Validate → Report

Why Temporal: Automatic retry, checkpointing via heartbeat, progress tracking.

8. Infrastructure Provisioning

Create VPC → Create subnet → Launch instances → Configure LB → DNS setup

Why Temporal: Long-running (minutes), needs compensation on partial failure.

9. CI/CD Pipelines

Build → Test → Deploy to staging → Integration tests → Approval gate
→ Deploy to production → Smoke tests → Rollback on failure

Why Temporal: Multi-stage with human gates and automatic rollback.

10. Document Processing

Upload → OCR/Extract → Classify → Validate → Route for review → Archive

Why Temporal: Mix of automated and human steps.

11. IoT Device Management

Device registers → Provision certificates → Configure → Monitor heartbeats
→ OTA update (with rollback) → Decommission

Why Temporal: Entity workflow pattern for device lifecycle.

12. Customer Support Ticket

Created → Auto-classify → Route to team → SLA timer starts → Escalate if no response
→ Resolution → Customer satisfaction survey (3 days later)

Why Temporal: SLA tracking with timers and escalation.

13. Batch Processing with Rate Limiting

Read 1M records → Process in batches of 1000 → Rate limit to 100/sec
→ Retry failures → Generate completion report

Why Temporal: ContinueAsNew for unbounded work, activity rate limiting.

14. Multi-Region / Multi-Cluster Replication

Write to primary → Replicate to region A → Replicate to region B → Confirm

Why Temporal: Durability guarantees across regions.

15. Machine Learning Pipelines

Fetch training data → Preprocess → Train model → Evaluate → A/B test
→ Deploy (if metrics pass) → Monitor in production

Why Temporal: Long-running training with checkpoints and branching.


26. Anti-Patterns & Pitfalls

DON’T: Non-Deterministic Workflow Code

// BAD — different on replay
if time.Now().Hour() > 12 { ... }
if rand.Intn(10) > 5 { ... }

// GOOD — use Temporal APIs
if workflow.Now(ctx).Hour() > 12 { ... }
encoded := workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
    return rand.Intn(10)
})

DON’T: Huge Payloads in Workflow History

// BAD — 10MB stored in history every time
result := workflow.ExecuteActivity(ctx, FetchLargeDataset).Get(ctx, &data)

// GOOD — store reference, not data
result := workflow.ExecuteActivity(ctx, FetchAndStoreInS3).Get(ctx, &s3Key)

DON’T: Unbounded Loops Without ContinueAsNew

// BAD — history grows forever
for {
    workflow.Sleep(ctx, time.Minute)
    workflow.ExecuteActivity(ctx, Poll).Get(ctx, nil)
}

// GOOD — reset after N iterations
for i := 0; i < 1000; i++ {
    workflow.Sleep(ctx, time.Minute)
    workflow.ExecuteActivity(ctx, Poll).Get(ctx, nil)
}
return workflow.NewContinueAsNewError(ctx, MyWorkflow, state)

DON’T: Use Workflow ID as a Counter

// BAD — no uniqueness, races possible
we := c.ExecuteWorkflow(ctx, opts, MyWorkflow)
// Two clients starting same workflow ID simultaneously

// GOOD — use idempotency keys
opts.ID = "order-" + orderID  // natural business key
opts.WorkflowIDReusePolicy = enums.WORKFLOW_ID_REUSE_POLICY_REJECT_DUPLICATE

DON’T: Tight-Loop Polling in Workflows

// BAD — burns through history events
for !done {
    workflow.Sleep(ctx, time.Second) // creates TimerStarted + TimerFired events
    done = workflow.ExecuteActivity(ctx, CheckStatus).Get(ctx, nil)
}

// GOOD — use signals or longer intervals
signalCh := workflow.GetSignalChannel(ctx, "status-update")
signalCh.Receive(ctx, &status)

DON’T: Ignore Workflow Task Timeout

// BAD — large history causes replay > 10s, task times out, infinite retry loop
// Fix: increase WorkflowTaskTimeout for complex workflows
opts.WorkflowTaskTimeout = 30 * time.Second

DON’T: Block Workflow Thread

// BAD — blocks the workflow goroutine (Go)
time.Sleep(5 * time.Second)
http.Get("https://api.example.com")

// GOOD — use Temporal APIs
workflow.Sleep(ctx, 5*time.Second)
workflow.ExecuteActivity(ctx, CallAPI).Get(ctx, nil)

27. Quick Reference Cheatsheet

Workflow Lifecycle States

         ┌──────────┐
         │  Running  │
         └────┬─────┘

    ┌─────────┼──────────┬────────────┐
    ▼         ▼          ▼            ▼
Completed   Failed   Cancelled   Terminated
    │         │          │            │
    └─────────┼──────────┴────────────┘

    ContinueAsNew (new Run ID, same Workflow ID)
    TimedOut (execution or run timeout exceeded)

Timeout Decision Tree

"How long should an activity take?"
├── Normal case: 1-30 seconds
│   └── StartToCloseTimeout: 30s, RetryPolicy: 3 attempts
├── External API with variable latency
│   └── StartToCloseTimeout: 2m, ScheduleToCloseTimeout: 10m
├── Long-running processing (minutes-hours)
│   └── StartToCloseTimeout: 2h, HeartbeatTimeout: 60s
└── Human task (external completion)
    └── ScheduleToCloseTimeout: 7d, no heartbeat

Common Workflow Patterns

Sequential:     A → B → C → Done
Parallel:       A + B + C → Wait All → Done
Fan-out/in:     Spawn N children → Collect results
Saga:           A → B → C (fail) → Compensate B → Compensate A
Entity:         Create → Handle signals forever → ContinueAsNew
Pipeline:       Process batch → ContinueAsNew(remaining)
State Machine:  Signal transitions: Created → Active → Suspended → Closed
Polling:        Loop { Activity + Sleep } → ContinueAsNew
Timer:          Sleep(duration) → Execute → Done
Human-in-loop:  Request → Wait for Signal → Process

SDK Quick Reference

OperationGoTypeScriptPythonJava
Sleepworkflow.Sleep(ctx, d)await sleep(d)await asyncio.sleep(d)Workflow.sleep(d)
Execute Activityworkflow.ExecuteActivity(ctx, fn, args).Get(ctx, &res)await fn(args) (proxied)await workflow.execute_activity(fn, args, ...)activities.fn(args) (stub)
Signal Channelworkflow.GetSignalChannel(ctx, name)setHandler(signal, fn)@workflow.signal@SignalMethod
Queryworkflow.SetQueryHandler(ctx, name, fn)setHandler(query, fn)@workflow.query@QueryMethod
Child Workflowworkflow.ExecuteChildWorkflow(ctx, fn, args)executeChild(fn, {args})await workflow.execute_child_workflow(fn, args)Workflow.newChildWorkflowStub(I.class)
Timerworkflow.NewTimer(ctx, d)sleep(d)await asyncio.sleep(d)Workflow.newTimer(d)
ContinueAsNewworkflow.NewContinueAsNewError(ctx, fn, args)continueAsNew(args)workflow.continue_as_new(args)Workflow.continueAsNew(args)
Side Effectworkflow.SideEffect(ctx, fn)N/A (use uuid4())N/A (use activities)Workflow.sideEffect(cls, fn)
Nowworkflow.Now(ctx)Date.now() (intercepted)workflow.now()Workflow.currentTimeMillis()
Randomworkflow.NewRandom(ctx)Math.random() (intercepted)N/A (use side effect)Workflow.newRandom()
UUIDworkflow.SideEffectuuid4()workflow.uuid4()Workflow.randomUUID()

Environment Variables

# Client connection
TEMPORAL_ADDRESS=localhost:7233
TEMPORAL_NAMESPACE=default
TEMPORAL_TLS_CERT=/path/to/client.crt
TEMPORAL_TLS_KEY=/path/to/client.key
TEMPORAL_TLS_CA=/path/to/ca.crt

# Temporal Cloud
TEMPORAL_ADDRESS=your-ns.your-account.tmprl.cloud:7233
TEMPORAL_NAMESPACE=your-ns.your-account

Docker Quick Start

# Fastest way to start (SQLite, ephemeral)
temporal server start-dev

# With persistent storage
temporal server start-dev --db-filename /tmp/temporal.db

# With custom namespace
temporal server start-dev --namespace my-app

# Full stack via Docker
docker compose up -d  # using the compose file from section 20

Health Checks

# Server health
temporal operator cluster health
# Expected: SERVING

# Task queue has workers?
temporal task-queue describe --task-queue my-queue
# Check Pollers list — should not be empty

# Workflow stuck?
temporal workflow describe --workflow-id my-wf
# Check PendingActivities, PendingChildren

# History size growing?
temporal workflow show --workflow-id my-wf | wc -l
# > 10,000 events = consider ContinueAsNew

Dependency Versions (as of 2025)

ComponentRecommended Version
Temporal Server1.24+
Go SDK1.29+
TypeScript SDK1.11+
Python SDK1.7+
Java SDK1.25+
.NET SDK1.2+
PostgreSQL14+
MySQL8.0+
Elasticsearch7.10+ / OpenSearch 2.x

Appendix: Glossary

TermDefinition
WorkflowDeterministic, durable function orchestrating business logic
ActivityNon-deterministic function for side-effects
WorkerYour process that executes workflows and activities
Task QueueNamed queue connecting clients/workflows to workers
NamespaceIsolation boundary for workflows, queues, and config
SignalAsync message to a running workflow (fire-and-forget)
QuerySync, read-only inspection of workflow state
UpdateSync, read-write mutation of workflow state with result
Event HistoryAppend-only log of all workflow execution events
ReplayRe-executing workflow code using cached event history
Sticky ExecutionRouting workflow tasks back to the same cached worker
Continue-As-NewRestarting a workflow with fresh history
Workflow TaskInternal task to advance workflow state (~milliseconds)
Activity TaskTask to execute an activity function
ScheduleServer-managed cron-like recurring workflow execution
CodecEncoder/decoder for payload encryption or compression
Search AttributeIndexed key-value metadata for workflow filtering
Workflow IDUser-defined unique identifier for a workflow
Run IDSystem-generated UUID for a specific workflow run
HeartbeatPeriodic progress report from a long-running activity
ShardPartition of the History service for parallelism
RingpopSWIM-based gossip protocol for server membership

Last updated: 2026-03-27