Temporal is a durable execution platform that guarantees your code runs to completion, surviving crashes, deployments, network failures, and server restarts. It turns any function into a reliable, fault-tolerant workflow.
Table of Contents
- Architecture & Internals
- Core Concepts
- Workflow Execution Model
- Workers — Deep Dive
- Task Queues
- Timeouts — Complete Reference
- Retry Policies
- Signals, Queries & Updates
- Child Workflows
- Continue-As-New
- Saga Pattern & Compensation
- Versioning & Patching
- Schedules & Cron Workflows
- Search Attributes & Visibility
- Data Conversion & Encryption
- Namespaces
- Testing
- Observability — Metrics, Logging, Tracing
- Security
- Production Deployment
- Temporal Cloud vs Self-Hosted
- CLI Cheatsheet
- SDK Code Examples
- Advanced Patterns
- Use Cases
- Anti-Patterns & Pitfalls
- Quick Reference Cheatsheet
1. Architecture & Internals
High-Level Architecture
┌──────────────────────────────────┐
│ Temporal Server │
│ (Cluster of 4 internal services) │
│ │
Client ──gRPC──────► │ ┌──────────┐ ┌──────────────┐ │
(SDK) │ │ Frontend │ │ Matching │ │
│ │ Service │ │ Service │ │
│ └────┬─────┘ └──────┬───────┘ │
│ │ │ │
Worker ──gRPC──────► │ ┌────┴─────┐ ┌──────┴───────┐ │
(your code) │ │ History │ │ Worker │ │
│ │ Service │ │ Service │ │
│ └────┬─────┘ └──────────────┘ │
│ │ │
│ ┌────┴───────────────────────┐ │
│ │ Persistence Layer │ │
│ │ (Postgres/MySQL/Cassandra) │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌────────────┴───────────────┐ │
│ │ Visibility Store │ │
│ │ (Elasticsearch/PostgreSQL) │ │
│ └────────────────────────────┘ │
└──────────────────────────────────┘
The 4 Internal Services
| Service | Responsibility |
|---|---|
| Frontend | Rate limiting, routing, authorization, request validation. Entry point for all gRPC calls. Ports: gRPC 7233, Membership 6933 |
| History | Maintains workflow event histories, executes state machine transitions, handles timers. The brain of Temporal. Each workflow is owned by exactly one history shard. Ports: gRPC 7234, Membership 6934. Scale: 1 process per 500 shards. |
| Matching | Manages task queues. Dispatches workflow tasks and activity tasks to polling workers. Handles sync vs async matching. Ports: gRPC 7235, Membership 6935 |
| Worker (internal) | Runs system-level workflows: archival, replication, batch operations. Not your application workers. Port: Membership 6939 |
History Shards
- The History service is sharded — default 512 shards for self-hosted, 8192 for Temporal Cloud.
- Each workflow is assigned to a shard via
hash(namespaceID + workflowID) % numShards. - Shards are distributed across History service instances via Ringpop (consistent hashing).
- More shards = more parallelism = higher throughput. But more DB connections.
- CRITICAL: Shard count is IMMUTABLE after initial setup. Cannot be changed without re-creating the database.
- Each shard maintains 4 internal queues: Transfer (Matching dispatch), Timer (durable timers), Replicator (multi-cluster), Visibility (indexing).
- Scale: Range from 1 to 128K shards. Recommendation: 1 History process per 500 shards.
Shard Sizing Guide:
| Scale | Recommended Shards |
|---|---|
| Dev/test | 1-4 |
| Small prod | 16-64 |
| Medium prod | 128-512 |
| Large prod | 512-4096 |
| Extreme scale | 4096-128K |
History Shard Assignment:
Workflow "order-123" → hash → Shard 47 → History Node 3
Workflow "order-456" → hash → Shard 191 → History Node 1
Persistence Layer
┌─────────────────────────────────────────────────────┐
│ Persistence Layer │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Executions │ │ History │ │ Visibility │ │
│ │ (workflow │ │ (event │ │ (search │ │
│ │ metadata) │ │ histories) │ │ index) │ │
│ └──────────────┘ └──────────────┘ └────────────┘ │
│ │
│ Supported Databases (production-tested versions): │
│ • PostgreSQL 13-16 (recommended for most) │
│ • MySQL 5.7, 8.0 (8.0.19+ required for some features│
│ • Cassandra 3.11, 4.0 (extreme scale, 100K+ wf/s) │
│ • SQLite 3.x (dev only, via temporal server start-dev│
│ │
│ Four data categories stored: │
│ 1. Tasks (queued for dispatch) │
│ 2. Workflow Execution State (mutable + event history)│
│ 3. Namespace Metadata │
│ 4. Visibility Data (search/filter index) │
└─────────────────────────────────────────────────────-─┘
Ringpop (Membership Protocol)
- Temporal server instances discover each other via Ringpop (SWIM-based gossip protocol).
- Consistent hashing ring distributes shards across nodes.
- When a node joins/leaves, shards rebalance automatically.
- Requires seed nodes for bootstrapping (configured in
temporal-server.yaml).
2. Core Concepts
Workflow
A deterministic function that orchestrates your business logic. Temporal persists its execution state so it survives any failure.
Properties:
- Must be deterministic — same input always produces same execution path
- Can run for seconds to years
- Has a unique Workflow ID (user-defined or auto-generated)
- Has a Run ID (system-generated UUID per execution attempt)
- Maintains full event history in the server
Deterministic constraints — you CANNOT do these inside a workflow:
| Forbidden | Use Instead |
|---|---|
Math.random() / rand() | workflow.random() / workflow.newRandom() |
Date.now() / time.Now() | workflow.now() / workflow.GetInfo().CurrentTime |
setTimeout / time.Sleep | workflow.sleep() / workflow.Sleep() |
| Network I/O, DB calls | Activities |
| File system access | Activities |
| Global mutable state | Workflow-scoped variables |
| Non-deterministic libraries | Wrap in activities |
| Goroutines / threads | workflow.Go() / workflow.executeChild() |
Activity
A normal function that performs side-effects (API calls, DB writes, file I/O). Activities run on workers and their results are recorded in workflow history.
Properties:
- Can be non-deterministic — free to do anything
- Automatically retried on failure (configurable)
- Support heartbeating for long-running work
- Have configurable timeouts
- Results are cached — on workflow replay, cached result is returned
Activity Types:
| Type | Description |
|---|---|
| Regular Activity | Dispatched via task queue, runs on any worker polling that queue |
| Local Activity | Runs on the same worker as the workflow, bypasses task queue. Lower latency but no independent retry/timeout tracking. |
Worker
A process you run that executes workflows and activities. It polls the Temporal server for tasks.
Task Queue
A named queue that connects workflows/activities to workers. Workers poll specific task queues. Multiple workers can poll the same queue for load balancing.
Namespace
A logical isolation boundary. Each namespace has its own workflow IDs, task queues, search attributes, and retention policies. Workflows in different namespaces cannot interact directly.
3. Workflow Execution Model
Event Sourcing & Replay
Temporal uses event sourcing — every step of a workflow is recorded as an event.
Event History for "order-123":
┌─────┬────────────────────────────────┬────────────────────┐
│ # │ Event Type │ Details │
├─────┼────────────────────────────────┼────────────────────┤
│ 1 │ WorkflowExecutionStarted │ input: {orderId} │
│ 2 │ WorkflowTaskScheduled │ │
│ 3 │ WorkflowTaskStarted │ worker: host-1 │
│ 4 │ WorkflowTaskCompleted │ │
│ 5 │ ActivityTaskScheduled │ chargeCard │
│ 6 │ ActivityTaskStarted │ worker: host-2 │
│ 7 │ ActivityTaskCompleted │ result: {txnId} │
│ 8 │ TimerStarted │ 10 minutes │
│ 9 │ TimerFired │ │
│ 10 │ ActivityTaskScheduled │ shipOrder │
│ 11 │ ActivityTaskStarted │ worker: host-1 │
│ 12 │ ActivityTaskCompleted │ result: {trackId} │
│ 13 │ WorkflowExecutionCompleted │ result: "done" │
└─────┴────────────────────────────────┴────────────────────┘
Replay Mechanism
When a worker picks up a workflow task:
1. Load event history from server
2. Execute workflow code from the beginning
3. For each command (activity, timer, child workflow):
a. If matching event exists in history → return cached result (NO re-execution)
b. If no matching event → this is NEW work, execute it
4. Return new commands to server
5. If worker crashes before completing → server re-assigns to another worker
Key insight: Activities are NOT re-executed on replay. Only the workflow function re-runs; activity results come from history.
Sticky Execution
By default, Temporal uses sticky task queues to route workflow tasks back to the same worker that previously ran them. This avoids re-downloading the full event history.
Normal flow (sticky):
Worker A caches workflow state → next task routes to Worker A → resumes from cache
Sticky cache miss:
Worker A dies → task goes to normal queue → Worker B downloads full history → replays
Sticky queue config:
StickyScheduleToStartTimeout: 5s (how long to wait for sticky worker before falling back)
WorkflowCacheSize: 2000 (how many workflows to cache per worker)
Workflow Task vs Activity Task
┌─────────────────────┐ ┌────────────────────────┐
│ Workflow Task │ │ Activity Task │
├─────────────────────┤ ├────────────────────────┤
│ • Runs workflow code │ │ • Runs activity code │
│ • Very fast (~1-5ms) │ │ • Can take seconds/mins │
│ • Deterministic │ │ • Non-deterministic │
│ • Produces commands │ │ • Produces results │
│ • Lightweight │ │ • Can be CPU/IO heavy │
└─────────────────────┘ └────────────────────────┘
4. Workers — Deep Dive
Worker Architecture
┌──────────────────────────────────────────────────────────┐
│ Worker Process │
│ │
│ ┌─────────────── Task Queue: "payments" ──────────────┐ │
│ │ │ │
│ │ Workflow Task Poller(s) Activity Task Poller(s) │ │
│ │ [Thread 1] [Thread 2] [Thread 1] [Thread 2] │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌──────────────┐ ┌──────────────────┐ │ │
│ │ │ WF Executor │ │ Activity Executor │ │ │
│ │ │ Slots: 100 │ │ Slots: 100 │ │ │
│ │ │ │ │ │ │ │
│ │ │ [wf-1][wf-2] │ │ [act-1][act-2] │ │ │
│ │ │ [wf-3][...] │ │ [act-3][...] │ │ │
│ │ └──────────────┘ └──────────────────┘ │ │
│ │ │ │
│ │ Sticky Cache: LRU(2000 workflows) │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ Registered Workflows: [OrderWorkflow, PaymentWorkflow] │
│ Registered Activities: [chargeCard, refund, sendEmail] │
└──────────────────────────────────────────────────────────-─┘
All Worker Configuration Options
| Option | Default | Description |
|---|---|---|
MaxConcurrentWorkflowTaskExecutions | 100 | Max workflow tasks executing simultaneously |
MaxConcurrentActivityTaskExecutions | 100 | Max activity tasks executing simultaneously |
MaxConcurrentWorkflowTaskPollers | 2 | Threads polling for workflow tasks |
MaxConcurrentActivityTaskPollers | 2 | Threads polling for activity tasks |
MaxConcurrentLocalActivityExecutions | 100 | Max local activities executing simultaneously |
WorkerActivitiesPerSecond | unlimited | Rate limit: activities/sec this worker can process |
TaskQueueActivitiesPerSecond | unlimited | Rate limit: activities/sec across ALL workers on this queue |
WorkflowPanicPolicy | BlockWorkflow | What happens on non-determinism: BlockWorkflow or FailWorkflow |
StickyScheduleToStartTimeout | 5s | Time to wait for sticky worker before fallback |
WorkflowCacheSize | 2000 | Number of cached workflow states (sticky cache) |
EnableSessionWorker | false | Enable session-based activity routing |
MaxConcurrentSessionExecutionSize | 1000 | Max concurrent sessions |
DeadlockDetectionTimeout | 1s (Go) | Detects blocked workflow goroutines |
Identity | hostname+pid | Worker identity shown in UI |
EnableLoggingInReplay | false | Whether to emit logs during replay |
Worker Scaling Guidelines
Scaling Decision Matrix:
─────────────────────────────────────────────────────────
Symptom → Action
─────────────────────────────────────────────────────────
schedule_to_start_latency HIGH → Add more worker instances
+ slots available → Increase pollers
+ slots full → Add instances or increase MaxConcurrent*
Activity execution slow → Optimize activity code, or increase instances
Workflow task latency HIGH → Increase WorkflowCacheSize (reduce replays)
+ sticky cache hit rate LOW → ^^ same
Memory high on workers → Reduce WorkflowCacheSize
→ Reduce MaxConcurrent*
CPU high on workers → Add worker instances
→ Reduce MaxConcurrent*
─────────────────────────────────────────────────────────
Multiple Task Queues per Worker
A single worker can poll multiple task queues:
// Go — one worker per task queue
w1 := worker.New(c, "payments", worker.Options{})
w2 := worker.New(c, "notifications", worker.Options{})
// Both run in the same process
// TypeScript — each Worker instance handles one task queue
const w1 = await Worker.create({ taskQueue: 'payments', ... });
const w2 = await Worker.create({ taskQueue: 'notifications', ... });
// Run both
await Promise.all([w1.run(), w2.run()]);
5. Task Queues
How Task Queues Work
Temporal Server
┌──────────────────────────┐
Client starts │ Task Queue: "payments" │
workflow ──────────►│ ┌────┬────┬────┬────┐ │
│ │ T1 │ T2 │ T3 │ T4 │ │◄── Tasks waiting
│ └────┴────┴────┴────┘ │
└──────────┬───────────────┘
│ Long-poll
┌──────────┼──────────────┐
│ │ │
Worker A Worker B Worker C
(picks T1) (picks T2) (picks T3)
Task queues are:
- Virtual — created on first use, no pre-configuration needed
- Per-namespace — same name in different namespaces = different queues
- Server-side — tasks live on the Temporal server, not on workers
- Load-balanced — round-robin (with sync matching optimization)
Sync vs Async Matching
| Mode | How It Works | When |
|---|---|---|
| Sync Match | Task is directly handed to a waiting poller (no DB write) | A worker is already polling when task arrives |
| Async Match | Task is written to DB, worker picks it up later | No worker is polling at that moment |
Sync matching is faster (sub-millisecond). Temporal optimizes for it.
Task Queue Partitions
For high-throughput queues, Temporal splits them into partitions (default: 4 per queue in Temporal Cloud).
Task Queue "high-volume"
├── Partition 0
├── Partition 1
├── Partition 2
└── Partition 3
More partitions = higher dispatch throughput, but workers must poll each partition.
6. Timeouts — Complete Reference
Visual Overview
Workflow Execution Timeout
◄─────────────────────────────────────────────────────────►
Workflow Run Timeout
◄───────────────────────────────────► (per run, resets on ContinueAsNew)
┌─ Activity ScheduleToClose Timeout ──────────────────┐
│ │
│ ScheduleToStart StartToClose │
│ ◄──────────► ◄──────────────────────────► │
│ (queue wait) (execution time) │
│ │
│ ◄─ Heartbeat ─► ◄─ Heartbeat ─► │
│ (periodic check-in) │
└───────────────────────────────────────────────────────┘
All Timeout Types
| Timeout | Applies To | Default | Description |
|---|---|---|---|
| WorkflowExecutionTimeout | Workflow | Unlimited | Max time for the ENTIRE workflow, including all retries and ContinueAsNew runs |
| WorkflowRunTimeout | Workflow | Same as ExecutionTimeout | Max time for a SINGLE workflow run (resets on ContinueAsNew) |
| WorkflowTaskTimeout | Workflow Task | 10s | Max time for a worker to process a single workflow task (code step). If exceeded, task is re-assigned to another worker. Increase for very large histories. |
| ScheduleToCloseTimeout | Activity | Unlimited | Max time from scheduling to completion (includes queue wait + execution + retries) |
| StartToCloseTimeout | Activity | ScheduleToClose | Max time from when a worker picks up an activity to when it must complete. This is the one you almost always set. |
| ScheduleToStartTimeout | Activity | ScheduleToClose | Max time an activity can wait in the queue. Detects when no workers are available. |
| HeartbeatTimeout | Activity | None | Max time between heartbeats. If exceeded, activity is considered failed. Use for long-running activities. |
Timeout Best Practices
# For a typical API call activity:
StartToCloseTimeout: 30s
RetryPolicy: { maximumAttempts: 3 }
# For a long-running data processing activity:
StartToCloseTimeout: 1h
HeartbeatTimeout: 60s ← ensures we detect stuck activities
RetryPolicy: { maximumAttempts: 5 }
# For human-approval workflow:
No activity timeout (it's a signal wait)
WorkflowExecutionTimeout: 30d
# To detect "no workers" situations:
ScheduleToStartTimeout: 30s ← fails fast if no worker picks it up
7. Retry Policies
Default Retry Behavior
| Component | Default Retry | Notes |
|---|---|---|
| Activity | Infinite retries | Retries until ScheduleToCloseTimeout or explicit max |
| Workflow | No retries | Workflows don’t retry by default (set in WorkflowOptions) |
| Workflow Task | Infinite retries | Internal retries on non-determinism errors, panics, etc. |
Retry Policy Configuration
RetryPolicy {
InitialInterval: 1s // First retry delay
BackoffCoefficient: 2.0 // Multiplier for each retry
MaximumInterval: 100s // Cap on retry delay
MaximumAttempts: 0 // 0 = unlimited (until timeout)
NonRetryableErrorTypes: [ // Error types that skip retries
"InvalidInputError",
"BusinessRuleViolation"
]
}
Retry Timeline Example
Attempt 1: fails at T=0
wait 1s (InitialInterval)
Attempt 2: fails at T=1s
wait 2s (1s × 2.0 backoff)
Attempt 3: fails at T=3s
wait 4s
Attempt 4: fails at T=7s
wait 8s
Attempt 5: fails at T=15s
wait 16s
Attempt 6: fails at T=31s
wait 32s
Attempt 7: fails at T=63s
wait 64s
Attempt 8: fails at T=127s
wait 100s (capped by MaximumInterval)
Attempt 9: at T=227s
...continues until MaximumAttempts or ScheduleToCloseTimeout
Non-Retryable Errors
Mark errors as non-retryable to skip retry:
// Go
temporal.NewNonRetryableApplicationError("invalid input", "INVALID", nil)
// TypeScript
throw ApplicationFailure.nonRetryable("invalid input", "INVALID");
# Python
raise ApplicationError("invalid input", non_retryable=True)
// Java
throw ApplicationFailure.newNonRetryableFailure("invalid input", "INVALID");
8. Signals, Queries & Updates
Signals
Asynchronous messages sent to a running workflow. The workflow handles them when it’s ready.
Client ──signal──► Temporal Server ──delivers──► Workflow (handles in handler)
- Fire-and-forget (client doesn’t wait for result)
- Buffered — delivered even if workflow is currently busy
- Persisted in event history
- Can carry data payload
Queries
Synchronous, read-only requests to inspect workflow state. Returns immediately.
Client ──query──► Temporal Server ──routes──► Worker (executes handler) ──result──► Client
- Read-only — cannot modify workflow state
- Not recorded in event history
- Must have a running workflow to query
- Routed to the worker with cached state (sticky) if possible
Updates (Temporal 1.21+)
Synchronous, read-write messages. Client sends data, workflow processes it, and returns a result.
Client ──update──► Temporal Server ──delivers──► Workflow (processes) ──result──► Client
- Can modify workflow state AND return a result
- Client waits for the result (unlike signals)
- Recorded in event history
- Supports validators (reject invalid updates before applying)
Comparison
| Feature | Signal | Query | Update |
|---|---|---|---|
| Direction | Client → Workflow | Client ↔ Workflow | Client ↔ Workflow |
| Modifies State | Yes | No | Yes |
| Returns Result | No | Yes | Yes |
| In Event History | Yes | No | Yes |
| Sync/Async | Async | Sync | Sync |
| Requires Running WF | No (buffered) | Yes | Yes |
9. Child Workflows
A workflow started by another workflow. The parent can wait for the result or fire-and-forget.
When to Use Child Workflows
| Use Case | Why |
|---|---|
| Separate failure domains | Child can fail/retry independently |
| Different timeouts | Child has its own execution timeout |
| Break up large histories | Each child has its own history (avoids 50K event limit) |
| Reuse workflow logic | Same workflow can be started as child or standalone |
| Different task queues | Route child to different workers/services |
| Different retry policies | Each child retries independently |
Parent Close Policy
What happens to children when parent completes/fails:
| Policy | Behavior |
|---|---|
TERMINATE | Child is terminated (default) |
ABANDON | Child continues running independently |
REQUEST_CANCEL | Cancellation request sent to child |
10. Continue-As-New
Resets workflow history by starting a new run with fresh state. Essential for long-running or unbounded workflows.
Why?
Event history has a hard limit (~50,000 events or 50MB). The service warns at 10,240 events and errors at the limit. ContinueAsNew resets the counter.
Event counting guide:
- Minimal workflow (start + complete): ~5 events
- Single activity execution: ~11 events (schedule, start, complete + workflow task events)
- Activity in a loop: 5 + 6 * iterations
- A workflow with 20 sequential activities: ~125 events
- Use
workflow.GetInfo(ctx).GetContinueAsNewSuggested()(Go) orworkflowInfo().historyLength(TS) to check.
Run 1: Events 1-10,000 → ContinueAsNew(state)
Run 2: Events 1-10,000 → ContinueAsNew(state)
Run 3: Events 1-5,000 → Complete
Pattern
// Go
func ProcessorWorkflow(ctx workflow.Context, state State) error {
for i := 0; i < 1000; i++ { // process batch
// ... do work
}
// Reset with accumulated state
return workflow.NewContinueAsNewError(ctx, ProcessorWorkflow, state)
}
// TypeScript
export async function processorWorkflow(state: State): Promise<void> {
for (let i = 0; i < 1000; i++) {
// ... do work
}
await continueAsNew<typeof processorWorkflow>(state);
}
11. Saga Pattern & Compensation
Implementing Distributed Transactions
Forward actions: Compensating actions:
1. Reserve Hotel ───────► 1'. Cancel Hotel
2. Book Flight ───────► 2'. Cancel Flight
3. Charge Payment ───────► 3'. Refund Payment
If step 3 fails → run compensations in reverse: 2', 1'
Go Implementation
func BookTripWorkflow(ctx workflow.Context, trip TripRequest) error {
var compensations []func(ctx workflow.Context) error
// Step 1: Reserve hotel
hotelRes, err := activities.ReserveHotel(ctx, trip.Hotel)
if err != nil {
return err
}
compensations = append(compensations, func(ctx workflow.Context) error {
return activities.CancelHotel(ctx, hotelRes.ReservationID)
})
// Step 2: Book flight
flightRes, err := activities.BookFlight(ctx, trip.Flight)
if err != nil {
return compensate(ctx, compensations) // run compensations
}
compensations = append(compensations, func(ctx workflow.Context) error {
return activities.CancelFlight(ctx, flightRes.BookingID)
})
// Step 3: Charge payment
err = activities.ChargePayment(ctx, trip.Payment)
if err != nil {
return compensate(ctx, compensations)
}
return nil
}
func compensate(ctx workflow.Context, compensations []func(workflow.Context) error) error {
// Run compensations in reverse order
for i := len(compensations) - 1; i >= 0; i-- {
if err := compensations[i](ctx); err != nil {
// Log but continue — compensations should be best-effort
workflow.GetLogger(ctx).Error("Compensation failed", "error", err)
}
}
return fmt.Errorf("saga failed, compensations executed")
}
TypeScript Implementation
export async function bookTripWorkflow(trip: TripRequest): Promise<void> {
const compensations: Array<() => Promise<void>> = [];
try {
const hotelRes = await reserveHotel(trip.hotel);
compensations.push(() => cancelHotel(hotelRes.reservationId));
const flightRes = await bookFlight(trip.flight);
compensations.push(() => cancelFlight(flightRes.bookingId));
await chargePayment(trip.payment);
} catch (err) {
// Execute compensations in reverse
for (const comp of compensations.reverse()) {
try {
await comp();
} catch (compErr) {
log.error('Compensation failed', { error: compErr });
}
}
throw err;
}
}
12. Versioning & Patching
Problem
You need to update workflow code, but existing workflows are still running. Replay must produce the same sequence of events.
Strategy 1: Patching (Recommended)
// Go — workflow.GetVersion
func MyWorkflow(ctx workflow.Context) error {
v := workflow.GetVersion(ctx, "change-id-1", workflow.DefaultVersion, 1)
if v == workflow.DefaultVersion {
// Old code path (for existing workflows)
err = activities.OldActivity(ctx)
} else {
// New code path (v == 1, for new workflows)
err = activities.NewActivity(ctx)
}
return err
}
// TypeScript — patched()
export async function myWorkflow(): Promise<void> {
if (patched('change-id-1')) {
// New code path
await newActivity();
} else {
// Old code path
await oldActivity();
}
}
Strategy 2: Worker Versioning (Build ID-based)
Temporal 1.21+ supports Worker Versioning — route workflows to compatible workers based on build IDs.
# Register build ID
temporal task-queue versioning insert-assignment-rule \
--task-queue payments \
--build-id "v2.0" \
--percentage 100
# Worker declares its build ID
w := worker.New(c, "payments", worker.Options{
BuildID: "v2.0",
UseBuildIDForVersioning: true,
})
Strategy 3: Task Queue per Version
Route new workflows to a new task queue:
v1 workflows → task-queue: "payments-v1" → old workers
v2 workflows → task-queue: "payments-v2" → new workers
Simple but requires managing multiple queues.
13. Schedules & Cron Workflows
Schedules (Recommended — Temporal 1.20+)
# Create a schedule via CLI
temporal schedule create \
--schedule-id "daily-report" \
--cron "0 9 * * *" \
--workflow-type "GenerateReportWorkflow" \
--task-queue "reports" \
--input '{"type": "daily"}'
# List schedules
temporal schedule list
# Pause/unpause
temporal schedule toggle --schedule-id daily-report --pause
temporal schedule toggle --schedule-id daily-report --unpause
# Trigger immediately
temporal schedule trigger --schedule-id daily-report
# Delete
temporal schedule delete --schedule-id daily-report
Schedules via SDK (TypeScript)
import { Client, ScheduleOverlapPolicy } from '@temporalio/client';
const client = new Client();
const schedule = await client.schedule.create({
scheduleId: 'daily-report-schedule',
action: {
type: 'startWorkflow',
workflowType: 'generateDailyReport',
args: [{ reportType: 'sales' }],
taskQueue: 'reports',
},
spec: {
// Option 1: Intervals
intervals: [{ every: '1h' }],
// Option 2: Calendar
calendars: [{
hour: { start: 9 },
minute: { start: 30 },
dayOfWeek: [{ start: 'MONDAY', end: 'FRIDAY' }],
}],
// Option 3: Cron expressions
cronExpressions: ['30 9 * * MON-FRI'],
},
policies: {
catchupWindow: '1 day',
overlap: ScheduleOverlapPolicy.SKIP,
pauseOnFailure: true,
},
});
// Manage the schedule
const handle = client.schedule.getHandle('daily-report-schedule');
await handle.pause('Pausing for maintenance');
await handle.unpause('Maintenance complete');
await handle.trigger(); // Run immediately
await handle.backfill({
start: new Date('2026-03-01T00:00:00Z'),
end: new Date('2026-03-15T00:00:00Z'),
overlap: ScheduleOverlapPolicy.ALLOW_ALL,
});
await handle.delete();
Schedule Policies
| Policy | Options | Description |
|---|---|---|
| Overlap | Skip, BufferOne, BufferAll, CancelOther, TerminateOther, AllowAll | What to do if previous run is still running |
| Catchup | number (default 0) | How many missed runs to catch up on |
| Pause on Failure | true/false | Auto-pause after workflow failure |
Cron Workflows (Legacy)
// Set cron schedule at workflow start
opts := client.StartWorkflowOptions{
ID: "daily-report",
TaskQueue: "reports",
CronSchedule: "0 9 * * *", // 9 AM daily
}
Cron vs Schedules:
| Feature | Cron Workflow | Schedule |
|---|---|---|
| Overlap policy | None (waits) | Configurable |
| Pause/resume | Must terminate & restart | Built-in |
| Backfill | Not supported | Supported |
| Multiple specs | Not supported | Supported |
| CLI management | Limited | Full CRUD |
14. Search Attributes & Visibility
What Are Search Attributes?
Key-value pairs attached to workflows that enable custom filtering in the visibility API and Web UI.
Built-in Search Attributes
| Attribute | Type | Description |
|---|---|---|
WorkflowType | Keyword | Workflow type name |
WorkflowId | Keyword | Workflow ID |
RunId | Keyword | Run ID |
StartTime | Datetime | When workflow started |
CloseTime | Datetime | When workflow closed |
ExecutionStatus | Keyword | Running, Completed, Failed, etc. |
TaskQueue | Keyword | Task queue name |
ExecutionDuration | Int | Duration in nanoseconds |
Custom Search Attributes
# Add custom search attributes (Temporal Server)
temporal operator search-attribute create \
--name CustomerId --type Keyword
temporal operator search-attribute create \
--name OrderAmount --type Double
temporal operator search-attribute create \
--name IsVIP --type Bool
Using in Code
// Go — set at start
opts := client.StartWorkflowOptions{
SearchAttributes: map[string]interface{}{
"CustomerId": "cust-123",
"OrderAmount": 99.99,
"IsVIP": true,
},
}
// Go — update during workflow
workflow.UpsertSearchAttributes(ctx, map[string]interface{}{
"OrderAmount": 149.99,
})
// TypeScript — set at start
await client.workflow.start(orderWorkflow, {
searchAttributes: {
CustomerId: ['cust-123'],
OrderAmount: [99.99],
IsVIP: [true],
},
});
// TypeScript — update during workflow
upsertSearchAttributes({
OrderAmount: [149.99],
});
Querying Workflows
# List workflows with filter
temporal workflow list --query 'WorkflowType="OrderWorkflow" AND CustomerId="cust-123"'
temporal workflow list --query 'ExecutionStatus="Running" AND OrderAmount > 100'
temporal workflow list --query 'StartTime > "2024-01-01T00:00:00Z" AND IsVIP=true'
Search Attribute Types
| Type | SQL Filter Operators |
|---|---|
Keyword | =, !=, IN |
Text | =, !=, LIKE (full-text search) |
Int | =, !=, >, <, >=, <= |
Double | =, !=, >, <, >=, <= |
Bool | =, != |
Datetime | =, !=, >, <, >=, <= |
KeywordList | =, !=, IN |
15. Data Conversion & Encryption
Data Converter Pipeline
Application Data
│
▼
┌──────────────────┐
│ Payload Converter │ Serializes data (JSON, Protobuf, etc.)
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Payload Codec │ Optional: encrypts, compresses
└────────┬─────────┘
│
▼
Temporal Server
(stores encoded payloads)
Default Converters (in order)
- Nil —
null/nilvalues - Bytes — raw
[]byte - Protobuf JSON — protobuf messages as JSON
- Protobuf — protobuf binary
- JSON — everything else (default for most data)
Custom Encryption Codec
// Go — encryption codec
type EncryptionCodec struct {
key []byte
}
func (c *EncryptionCodec) Encode(payloads []*commonpb.Payload) ([]*commonpb.Payload, error) {
result := make([]*commonpb.Payload, len(payloads))
for i, p := range payloads {
encrypted, err := encrypt(c.key, p.SerializeToBytes())
result[i] = &commonpb.Payload{
Metadata: map[string][]byte{"encoding": []byte("encrypted/aes256")},
Data: encrypted,
}
}
return result, nil
}
func (c *EncryptionCodec) Decode(payloads []*commonpb.Payload) ([]*commonpb.Payload, error) {
// Reverse of Encode
}
Codec Server
A standalone HTTP server that decodes payloads for the Web UI. Without it, encrypted payloads show as opaque blobs in the UI.
Web UI ──► Codec Server (decrypts) ──► Readable data in browser
16. Namespaces
What Namespaces Provide
| Feature | Isolation? |
|---|---|
| Workflow IDs | Yes — same ID can exist in different namespaces |
| Task Queues | Yes — same name, different namespaces = separate queues |
| Search Attributes | Yes — defined per namespace |
| Retention Period | Yes — configurable per namespace |
| Archival | Yes — configurable per namespace |
| Security/Auth | Yes — different permissions per namespace |
Managing Namespaces
# Create namespace
temporal operator namespace create \
--namespace staging \
--retention 7d \
--description "Staging environment"
# List namespaces
temporal operator namespace list
# Update retention
temporal operator namespace update \
--namespace staging \
--retention 30d
# Describe
temporal operator namespace describe --namespace staging
Common Namespace Strategy
production → 30d retention, strict access
staging → 7d retention, team access
development → 3d retention, open access
team-payments → 14d retention, payments team only
17. Testing
Testing Approaches
| Approach | Speed | Fidelity | Use For |
|---|---|---|---|
| Unit test (mock activities) | Fast | Low | Logic testing |
| Integration test (test server) | Medium | High | End-to-end flows |
| Time-skipping test server | Fast | High | Workflows with timers/sleeps |
| Replay test | Fast | Medium | Backward compatibility |
Go Testing
func TestOrderWorkflow(t *testing.T) {
testSuite := &testsuite.WorkflowTestSuite{}
env := testSuite.NewTestWorkflowEnvironment()
// Mock activities
env.OnActivity(activities.ChargeCard, mock.Anything, mock.Anything).
Return(&ChargeResult{TxnID: "txn-123"}, nil)
env.OnActivity(activities.ShipOrder, mock.Anything, mock.Anything).
Return(nil)
// Execute workflow
env.ExecuteWorkflow(OrderWorkflow, OrderInput{OrderID: "order-1"})
// Assert
require.True(t, env.IsWorkflowCompleted())
require.NoError(t, env.GetWorkflowError())
var result string
require.NoError(t, env.GetWorkflowResult(&result))
assert.Equal(t, "completed", result)
}
// Test with timer
func TestReminderWorkflow(t *testing.T) {
env := testSuite.NewTestWorkflowEnvironment()
env.RegisterDelayedCallback(func() {
env.SignalWorkflow("approve", ApprovalSignal{Approved: true})
}, time.Hour*24) // Signal after 24h (instant in test)
env.ExecuteWorkflow(ApprovalWorkflow, input)
require.True(t, env.IsWorkflowCompleted())
}
TypeScript Testing
import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker } from '@temporalio/worker';
describe('OrderWorkflow', () => {
let testEnv: TestWorkflowEnvironment;
beforeAll(async () => {
testEnv = await TestWorkflowEnvironment.createTimeSkipping();
});
afterAll(async () => {
await testEnv.teardown();
});
it('completes order successfully', async () => {
const { client, nativeConnection } = testEnv;
const worker = await Worker.create({
connection: nativeConnection,
taskQueue: 'test',
workflowsPath: require.resolve('./workflows'),
activities: {
chargeCard: async () => ({ txnId: 'txn-123' }),
shipOrder: async () => {},
},
});
const result = await worker.runUntil(
client.workflow.execute(orderWorkflow, {
workflowId: 'test-order-1',
taskQueue: 'test',
args: [{ orderId: 'order-1' }],
})
);
expect(result).toBe('completed');
});
});
Python Testing
import pytest
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker
@pytest.fixture
async def env():
async with await WorkflowEnvironment.start_time_skipping() as env:
yield env
async def test_order_workflow(env: WorkflowEnvironment):
async with Worker(
env.client,
task_queue="test",
workflows=[OrderWorkflow],
activities=[charge_card_mock, ship_order_mock],
):
result = await env.client.execute_workflow(
OrderWorkflow.run,
OrderInput(order_id="order-1"),
id="test-order-1",
task_queue="test",
)
assert result == "completed"
Replay Testing (Backward Compatibility)
// Go — test that current code can replay old histories
func TestReplayFromJSON(t *testing.T) {
replayer := worker.NewWorkflowReplayer()
replayer.RegisterWorkflow(OrderWorkflow)
err := replayer.ReplayWorkflowHistoryFromJSONFile(nil, "testdata/order_history.json")
require.NoError(t, err) // Fails if code produces different commands
}
# Export workflow history for replay tests
temporal workflow show --workflow-id order-123 --output json > testdata/order_history.json
18. Observability — Metrics, Logging, Tracing
Prometheus Metrics
Server-side metrics (temporal server exposes on :9090):
| Metric | What It Shows |
|---|---|
temporal_persistence_latency | DB operation latency |
temporal_service_requests | Request count by operation |
temporal_service_errors | Error count by type |
temporal_persistence_requests | DB request count |
temporal_history_size | Workflow history sizes |
temporal_timer_active_count | Active timers |
SDK/Worker-side metrics (your workers expose):
| Metric | What It Shows | Alert When |
|---|---|---|
temporal_workflow_task_schedule_to_start_latency | Queue wait time | > 1s |
temporal_activity_schedule_to_start_latency | Activity queue wait | > 5s |
temporal_activity_execution_latency | Activity duration | Exceeds SLA |
temporal_sticky_cache_hit | Cache efficiency | < 80% |
temporal_worker_task_slots_available | Free executor slots | = 0 |
temporal_workflow_completed | Completed workflows | Sudden drop |
temporal_workflow_failed | Failed workflows | Sudden spike |
temporal_workflow_canceled | Canceled workflows | |
temporal_activity_execution_failed | Failed activities |
Enabling Metrics
// Go — Prometheus metrics
clientOptions := client.Options{
MetricsHandler: sdktally.NewMetricsHandler(newPrometheusScope(
prometheus.Configuration{
ListenAddress: "0.0.0.0:9464",
TimerType: "histogram",
},
)),
}
// TypeScript — Prometheus
import { Runtime } from '@temporalio/worker';
Runtime.install({
telemetryOptions: {
metrics: {
prometheus: { bindAddress: '0.0.0.0:9464' },
},
},
});
# Python — Prometheus
from temporalio.runtime import Runtime, TelemetryConfig, PrometheusConfig
runtime = Runtime(telemetry=TelemetryConfig(
metrics=PrometheusConfig(bind_address="0.0.0.0:9464"),
))
client = await Client.connect("localhost:7233", runtime=runtime)
OpenTelemetry Tracing
// Go — OpenTelemetry interceptor
import "go.temporal.io/sdk/contrib/opentelemetry"
clientOptions := client.Options{
Interceptors: []interceptor.ClientInterceptor{
opentelemetry.NewTracingInterceptor(opentelemetry.TracerOptions{
Tracer: otel.Tracer("temporal"),
}),
},
}
Logging
// Go — structured logging
clientOptions := client.Options{
Logger: temporal.NewStructuredLogger(slog.Default()),
}
// Inside workflow — use workflow logger (replay-safe)
workflow.GetLogger(ctx).Info("Processing order", "orderId", orderId)
// Inside activity — use activity logger
activity.GetLogger(ctx).Info("Charging card", "amount", amount)
Important: Never use fmt.Println or direct loggers in workflows — they’ll log on every replay. Use workflow.GetLogger() which auto-suppresses during replay.
19. Security
mTLS (Mutual TLS)
# temporal-server.yaml
tls:
frontend:
server:
certFile: /certs/server.crt
keyFile: /certs/server.key
requireClientAuth: true
clientCaFiles:
- /certs/ca.crt
client:
serverName: temporal.example.com
// Go client with mTLS
cert, _ := tls.LoadX509KeyPair("client.crt", "client.key")
clientOptions := client.Options{
HostPort: "temporal.example.com:7233",
ConnectionOptions: client.ConnectionOptions{
TLS: &tls.Config{
Certificates: []tls.Certificate{cert},
RootCAs: caCertPool,
},
},
}
Authorization
Temporal supports pluggable Authorizer and ClaimMapper:
Request → ClaimMapper (extracts claims from cert/token) → Authorizer (permits/denies)
Built-in authorizer: Checks namespace-level permissions (read, write, admin).
Data Encryption
- Use Payload Codecs for encryption-at-rest (see section 15)
- All data (inputs, outputs, errors) can be encrypted before reaching the server
- Server never sees plaintext — only encrypted payloads
Namespace Isolation
- Each namespace has independent access controls
- Cross-namespace communication not supported (use separate clients or Temporal Nexus)
- Different teams can have different namespaces with different permissions
SSO Configuration (Web UI)
temporal-ui:
environment:
- TEMPORAL_AUTH_ENABLED=true
- TEMPORAL_AUTH_PROVIDER_URL=https://your-idp.example.com
- TEMPORAL_AUTH_CLIENT_ID=your-client-id
- TEMPORAL_AUTH_CLIENT_SECRET=your-client-secret
- TEMPORAL_AUTH_CALLBACK_URL=https://temporal-ui.example.com/auth/sso/callback
- TEMPORAL_AUTH_SCOPES=openid profile email
Failure Types Reference
| Type | Description |
|---|---|
| Application Failure | User-thrown errors with type, message, non_retryable, details, nextRetryDelay |
| Timeout Failure | Activity/Workflow timeout; attaches last heartbeat details |
| Cancelled Failure | Successful cancellation of Workflow/Activity |
| Terminated Failure | Forceful Workflow termination (no cleanup runs) |
| Server Failure | Errors originating in the Temporal Service |
| Activity Failure | Wraps the actual failure cause when an Activity fails |
| Child Workflow Failure | Wraps the actual failure cause when a Child Workflow fails |
Key design principle: An Activity Failure never directly causes a Workflow Failure. Workflows only fail if coded to do so. Workflow Task Failures (e.g., non-determinism) are retried automatically with exponential backoff (max 10min interval).
Activity Heartbeating Details
// Record progress (saved; available on retry)
activity.RecordHeartbeat(ctx, progressData)
// Resume from last progress on retry
if activity.HasHeartbeatDetails(ctx) {
var lastProgress MyProgress
activity.GetHeartbeatDetails(ctx, &lastProgress)
// Resume from lastProgress
}
- Heartbeat throttling:
min(heartbeatTimeout * 0.8, defaultThrottleInterval). Default throttle: 30s, max: 60s. - Activities without heartbeats cannot receive Cancellation signals.
- Final heartbeat on failure is never throttled.
Multi-Cluster Replication
For disaster recovery and multi-region deployments using Global Namespaces:
# Cluster A config
clusterMetadata:
enableGlobalNamespace: true
failoverVersionIncrement: 100
masterClusterName: "cluster-us-east"
currentClusterName: "cluster-us-east"
clusterInformation:
cluster-us-east:
enabled: true
initialFailoverVersion: 1
rpcAddress: "10.0.1.100:7233"
# Cluster B config
clusterMetadata:
enableGlobalNamespace: true
failoverVersionIncrement: 100
masterClusterName: "cluster-us-west"
currentClusterName: "cluster-us-west"
clusterInformation:
cluster-us-west:
enabled: true
initialFailoverVersion: 2
rpcAddress: "10.0.2.100:7233"
# Register clusters with each other
temporal operator cluster upsert --frontend-address="10.0.2.100:7233" # on Cluster A
temporal operator cluster upsert --frontend-address="10.0.1.100:7233" # on Cluster B
# Failover
temporal operator namespace update --namespace my-global-ns --active-cluster cluster-us-west
Requirements: enableGlobalNamespace=true on all clusters, identical failoverVersionIncrement, unique initialFailoverVersion per cluster. Workers must poll on all clusters.
20. Production Deployment
Docker Compose (Quick Start)
version: "3.5"
services:
postgresql:
image: postgres:15
environment:
POSTGRES_USER: temporal
POSTGRES_PASSWORD: temporal
volumes:
- postgres-data:/var/lib/postgresql/data
ports:
- "5432:5432"
temporal:
image: temporalio/auto-setup:latest
depends_on:
- postgresql
environment:
- DB=postgresql
- DB_PORT=5432
- POSTGRES_USER=temporal
- POSTGRES_PWD=temporal
- POSTGRES_SEEDS=postgresql
- DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/production.yaml
ports:
- "7233:7233" # gRPC
volumes:
- ./dynamicconfig:/etc/temporal/config/dynamicconfig
temporal-ui:
image: temporalio/ui:latest
depends_on:
- temporal
environment:
- TEMPORAL_ADDRESS=temporal:7233
ports:
- "8080:8080" # Web UI
temporal-admin-tools:
image: temporalio/admin-tools:latest
depends_on:
- temporal
environment:
- TEMPORAL_ADDRESS=temporal:7233
volumes:
postgres-data:
Kubernetes (Helm)
# Add Temporal Helm repo
helm repo add temporal https://temporalio.github.io/helm-charts
helm repo update
# Install with PostgreSQL
helm install temporal temporal/temporal \
--set server.replicaCount=3 \
--set cassandra.enabled=false \
--set mysql.enabled=false \
--set postgresql.enabled=true \
--set prometheus.enabled=true \
--set grafana.enabled=true \
--set elasticsearch.enabled=true \
--namespace temporal \
--create-namespace
# Custom values.yaml
helm install temporal temporal/temporal -f values.yaml
Production values.yaml Example
server:
replicaCount: 3
config:
persistence:
default:
driver: sql
sql:
driver: postgres
host: your-rds-endpoint.amazonaws.com
port: 5432
database: temporal
user: temporal
password: ${DB_PASSWORD}
maxConns: 20
maxIdleConns: 20
visibility:
driver: sql
sql:
driver: postgres
host: your-rds-endpoint.amazonaws.com
port: 5432
database: temporal_visibility
user: temporal
password: ${DB_PASSWORD}
frontend:
replicaCount: 3
resources:
requests: { cpu: "1", memory: "2Gi" }
limits: { cpu: "2", memory: "4Gi" }
history:
replicaCount: 3
resources:
requests: { cpu: "2", memory: "4Gi" }
limits: { cpu: "4", memory: "8Gi" }
matching:
replicaCount: 3
resources:
requests: { cpu: "1", memory: "2Gi" }
limits: { cpu: "2", memory: "4Gi" }
worker:
replicaCount: 2
resources:
requests: { cpu: "0.5", memory: "1Gi" }
limits: { cpu: "1", memory: "2Gi" }
elasticsearch:
enabled: true
replicas: 2
web:
replicaCount: 2
Dynamic Configuration
Temporal server supports runtime config changes without restart:
# dynamicconfig/production.yaml
# History settings
history.maximumBufferedEvents:
- value: 100
constraints: {}
# Retention
system.forceSearchAttributesCacheRefreshOnRead:
- value: true
constraints: {}
# Rate limits
frontend.rps:
- value: 2400
constraints: {}
frontend.namespaceRPS:
- value: 400
constraints: {}
# History size limit
limit.historySize.warn:
- value: 10485760 # 10MB
constraints: {}
limit.historySize.error:
- value: 52428800 # 50MB
constraints: {}
limit.historyCount.warn:
- value: 10240
constraints: {}
limit.historyCount.error:
- value: 51200
constraints: {}
Production Sizing Guide
| Scale | Workflows/sec | Frontend | History | Matching | Worker | DB |
|---|---|---|---|---|---|---|
| Small (<100 wf/s) | 10-100 | 2× 1CPU/2GB | 2× 2CPU/4GB | 2× 1CPU/2GB | 1× 0.5CPU/1GB | 2CPU/4GB PG |
| Medium (100-1K wf/s) | 100-1000 | 3× 2CPU/4GB | 3× 4CPU/8GB | 3× 2CPU/4GB | 2× 1CPU/2GB | 4CPU/16GB PG |
| Large (1K-10K wf/s) | 1000-10000 | 5× 4CPU/8GB | 5× 8CPU/16GB | 5× 4CPU/8GB | 3× 2CPU/4GB | 8CPU/32GB PG |
| XL (>10K wf/s) | 10000+ | Consider Cassandra + dedicated ES cluster |
Database Tuning
-- PostgreSQL recommendations for Temporal
ALTER SYSTEM SET max_connections = 200;
ALTER SYSTEM SET shared_buffers = '4GB'; -- 25% of RAM
ALTER SYSTEM SET effective_cache_size = '12GB'; -- 75% of RAM
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET random_page_cost = 1.1; -- SSD
ALTER SYSTEM SET effective_io_concurrency = 200; -- SSD
ALTER SYSTEM SET wal_buffers = '64MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET max_wal_size = '2GB';
21. Temporal Cloud vs Self-Hosted
| Feature | Self-Hosted | Temporal Cloud |
|---|---|---|
| Server management | You manage | Managed by Temporal |
| Database | You manage | Managed |
| Upgrades | You do them | Automatic, zero-downtime |
| Multi-region | Complex to set up | Built-in |
| SLA | Depends on you | 99.99% uptime SLA |
| mTLS | You configure | Built-in |
| Encryption | You implement | AES-256-GCM at rest, TLS in transit |
| Compliance | DIY | SOC 2 Type II, HIPAA eligible |
| Pricing | Infrastructure cost | Per-action pricing |
| Support | Community / OSS | Dedicated support |
| Namespaces | Unlimited | Based on plan |
| History shards | 512 (configurable) | 8192 |
| Scale | Based on infra | Effectively unlimited |
Temporal Cloud Pricing (2025)
| Plan | Monthly Minimum | Included Actions | Active Storage | Retained Storage |
|---|---|---|---|---|
| Essentials | $100 / 5% usage | 1M | 1GB | 40GB |
| Business | $500 / 10% usage | 2.5M | 2.5GB | 100GB |
| Enterprise | Annual contract | 10M | 10GB | 400GB |
| Mission Critical | Annual contract | 10M | 10GB | 400GB |
Overage: $50/M (first 5M) down to $25/M (100-200M). Storage: $0.042/GBh active, $0.00105/GBh retained.
What counts as an action:
- Activity/workflow task started
- Signal sent/received
- Timer started
- Query handled
- Child workflow started
What does NOT count:
- Worker polling
- Workflow/activity completion
- History replay events
22. CLI Cheatsheet
Modern CLI (temporal)
# ============== SERVER ==============
temporal server start-dev # Start dev server (SQLite, port 7233, UI on 8233)
temporal server start-dev --db-filename temporal.db # Persistent dev DB
temporal server start-dev --port 7233 --ui-port 8233
temporal server start-dev --namespace my-ns # Dev server with custom namespace
# ============== WORKFLOWS ==============
# Start a workflow
temporal workflow start \
--type OrderWorkflow \
--task-queue payments \
--workflow-id order-123 \
--input '{"orderId": "abc"}'
# Execute (start + wait for result)
temporal workflow execute \
--type OrderWorkflow \
--task-queue payments \
--workflow-id order-123 \
--input '{"orderId": "abc"}'
# List workflows
temporal workflow list
temporal workflow list --query 'WorkflowType="OrderWorkflow" AND ExecutionStatus="Running"'
# Describe a workflow
temporal workflow describe --workflow-id order-123
# Show event history
temporal workflow show --workflow-id order-123
temporal workflow show --workflow-id order-123 --output json > history.json
# Signal a workflow
temporal workflow signal \
--workflow-id order-123 \
--name approve \
--input '{"approved": true}'
# Query a workflow
temporal workflow query \
--workflow-id order-123 \
--name getStatus
# Update a workflow
temporal workflow update \
--workflow-id order-123 \
--name updatePrice \
--input '{"newPrice": 29.99}'
# Cancel a workflow (graceful)
temporal workflow cancel --workflow-id order-123
# Terminate a workflow (immediate, skip cleanup)
temporal workflow terminate --workflow-id order-123 --reason "manual cleanup"
# Reset a workflow to a specific event
temporal workflow reset \
--workflow-id order-123 \
--event-id 10 \
--reason "replay from event 10"
# Count workflows
temporal workflow count --query 'ExecutionStatus="Running"'
# Delete a workflow (removes from visibility)
temporal workflow delete --workflow-id order-123
# View workflow call stack (debugging blocked workflows)
temporal workflow stack --workflow-id order-123
# Trace workflow execution tree (children, activities)
temporal workflow trace --workflow-id order-123 --depth 3
# Follow workflow events in real-time
temporal workflow show --workflow-id order-123 --follow
# Reset by type (reset to last ContinueAsNew point)
temporal workflow reset --workflow-id order-123 --type LastContinuedAsNew
# ============== TASK QUEUES ==============
temporal task-queue describe --task-queue payments
temporal task-queue list-partition --task-queue payments
# ============== SCHEDULES ==============
temporal schedule create --schedule-id daily-cleanup --cron "0 2 * * *" \
--workflow-type CleanupWorkflow --task-queue maintenance
temporal schedule list
temporal schedule describe --schedule-id daily-cleanup
temporal schedule toggle --schedule-id daily-cleanup --pause --reason "maintenance"
temporal schedule toggle --schedule-id daily-cleanup --unpause
temporal schedule trigger --schedule-id daily-cleanup
temporal schedule update --schedule-id daily-cleanup --cron "0 3 * * *"
temporal schedule delete --schedule-id daily-cleanup
# ============== NAMESPACES ==============
temporal operator namespace create --namespace staging --retention 7d
temporal operator namespace list
temporal operator namespace describe --namespace staging
temporal operator namespace update --namespace staging --retention 30d
temporal operator namespace delete --namespace staging
# ============== SEARCH ATTRIBUTES ==============
temporal operator search-attribute create --name CustomerId --type Keyword
temporal operator search-attribute list
# ============== CLUSTER ==============
temporal operator cluster describe
temporal operator cluster health
# ============== BATCH OPERATIONS ==============
# Terminate all running workflows of a type
temporal workflow terminate \
--query 'WorkflowType="BrokenWorkflow" AND ExecutionStatus="Running"' \
--reason "batch cleanup"
# Signal multiple workflows
temporal workflow signal \
--query 'WorkflowType="OrderWorkflow" AND ExecutionStatus="Running"' \
--name shutdown \
--input '{"graceful": true}'
# ============== ENV / CONFIG ==============
temporal env set local.address localhost:7233
temporal env set local.namespace default
temporal env get local
temporal env list
# Use specific environment
temporal --env local workflow list
temporal --env production workflow list
Legacy CLI (tctl) — Deprecated but Still Used
tctl workflow start --workflow_type OrderWorkflow --taskqueue payments --workflow_id order-123 --input '{"orderId":"abc"}'
tctl workflow observe --workflow_id order-123
tctl workflow list --open
tctl workflow signal --workflow_id order-123 --name approve --input '{"approved":true}'
tctl workflow query --workflow_id order-123 --query_type getStatus
tctl workflow cancel --workflow_id order-123
tctl workflow terminate --workflow_id order-123
tctl workflow showid order-123
tctl namespace register --namespace staging --retention 7
tctl namespace describe --namespace staging
tctl admin cluster describe
23. SDK Code Examples
Go — Complete Example
package main
import (
"context"
"fmt"
"time"
"go.temporal.io/sdk/activity"
"go.temporal.io/sdk/client"
"go.temporal.io/sdk/temporal"
"go.temporal.io/sdk/worker"
"go.temporal.io/sdk/workflow"
)
// ─── Workflow ───────────────────────────────────────────
type OrderInput struct {
OrderID string
CustomerID string
Amount float64
}
type OrderResult struct {
Status string
TrackingID string
}
func OrderWorkflow(ctx workflow.Context, input OrderInput) (*OrderResult, error) {
logger := workflow.GetLogger(ctx)
logger.Info("Starting order workflow", "orderId", input.OrderID)
// Activity options
actOpts := workflow.ActivityOptions{
StartToCloseTimeout: 30 * time.Second,
RetryPolicy: &temporal.RetryPolicy{
InitialInterval: time.Second,
BackoffCoefficient: 2.0,
MaximumInterval: 30 * time.Second,
MaximumAttempts: 3,
},
}
ctx = workflow.WithActivityOptions(ctx, actOpts)
// Step 1: Validate order
var validated bool
err := workflow.ExecuteActivity(ctx, ValidateOrder, input).Get(ctx, &validated)
if err != nil {
return nil, fmt.Errorf("validation failed: %w", err)
}
// Step 2: Charge payment
var paymentResult PaymentResult
err = workflow.ExecuteActivity(ctx, ChargePayment, input.OrderID, input.Amount).
Get(ctx, &paymentResult)
if err != nil {
return nil, fmt.Errorf("payment failed: %w", err)
}
// Step 3: Ship order (longer timeout)
shipCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
StartToCloseTimeout: 5 * time.Minute,
HeartbeatTimeout: 30 * time.Second,
})
var trackingID string
err = workflow.ExecuteActivity(shipCtx, ShipOrder, input.OrderID).Get(ctx, &trackingID)
if err != nil {
// Compensate: refund payment
_ = workflow.ExecuteActivity(ctx, RefundPayment, paymentResult.TxnID).Get(ctx, nil)
return nil, fmt.Errorf("shipping failed: %w", err)
}
// Step 4: Send confirmation
_ = workflow.ExecuteActivity(ctx, SendConfirmation, input.CustomerID, trackingID).
Get(ctx, nil)
return &OrderResult{Status: "completed", TrackingID: trackingID}, nil
}
// ─── Signal Handler ─────────────────────────────────────
func OrderWithApprovalWorkflow(ctx workflow.Context, input OrderInput) (*OrderResult, error) {
var approved bool
approvalCh := workflow.GetSignalChannel(ctx, "approval")
// Wait for approval signal (with timeout)
timerCtx, cancelTimer := workflow.WithCancel(ctx)
timerFuture := workflow.NewTimer(timerCtx, 24*time.Hour)
selector := workflow.NewSelector(ctx)
selector.AddReceive(approvalCh, func(ch workflow.ReceiveChannel, more bool) {
var signal ApprovalSignal
ch.Receive(ctx, &signal)
approved = signal.Approved
cancelTimer()
})
selector.AddFuture(timerFuture, func(f workflow.Future) {
// Timeout — not approved
approved = false
})
selector.Select(ctx)
if !approved {
return &OrderResult{Status: "rejected"}, nil
}
// Continue with order processing...
return &OrderResult{Status: "approved"}, nil
}
// ─── Query Handler ──────────────────────────────────────
func OrderWithQueryWorkflow(ctx workflow.Context, input OrderInput) (*OrderResult, error) {
status := "initialized"
// Register query handler
err := workflow.SetQueryHandler(ctx, "getStatus", func() (string, error) {
return status, nil
})
if err != nil {
return nil, err
}
status = "processing"
// ... do work ...
status = "completed"
return &OrderResult{Status: status}, nil
}
// ─── Activities ─────────────────────────────────────────
type PaymentResult struct {
TxnID string
}
type ApprovalSignal struct {
Approved bool
Reason string
}
func ValidateOrder(ctx context.Context, input OrderInput) (bool, error) {
logger := activity.GetLogger(ctx)
logger.Info("Validating order", "orderId", input.OrderID)
// ... validation logic
return true, nil
}
func ChargePayment(ctx context.Context, orderID string, amount float64) (*PaymentResult, error) {
// ... charge payment gateway
return &PaymentResult{TxnID: "txn-" + orderID}, nil
}
func ShipOrder(ctx context.Context, orderID string) (string, error) {
// Long-running activity with heartbeating
for i := 0; i < 10; i++ {
// Check for cancellation
activity.RecordHeartbeat(ctx, i)
// ... process shipping step
time.Sleep(time.Second) // simulate work
}
return "TRACK-" + orderID, nil
}
func RefundPayment(ctx context.Context, txnID string) error {
// ... refund logic
return nil
}
func SendConfirmation(ctx context.Context, customerID, trackingID string) error {
// ... send email/notification
return nil
}
// ─── Worker ─────────────────────────────────────────────
func runWorker() error {
c, err := client.Dial(client.Options{
HostPort: "localhost:7233",
Namespace: "default",
})
if err != nil {
return err
}
defer c.Close()
w := worker.New(c, "order-processing", worker.Options{
MaxConcurrentWorkflowTaskExecutions: 100,
MaxConcurrentActivityTaskExecutions: 50,
})
// Register workflows
w.RegisterWorkflow(OrderWorkflow)
w.RegisterWorkflow(OrderWithApprovalWorkflow)
w.RegisterWorkflow(OrderWithQueryWorkflow)
// Register activities
w.RegisterActivity(ValidateOrder)
w.RegisterActivity(ChargePayment)
w.RegisterActivity(ShipOrder)
w.RegisterActivity(RefundPayment)
w.RegisterActivity(SendConfirmation)
return w.Run(worker.InterruptCh())
}
// ─── Client Usage ───────────────────────────────────────
func startWorkflow() error {
c, err := client.Dial(client.Options{})
if err != nil {
return err
}
defer c.Close()
// Start workflow
we, err := c.ExecuteWorkflow(context.Background(), client.StartWorkflowOptions{
ID: "order-123",
TaskQueue: "order-processing",
}, OrderWorkflow, OrderInput{
OrderID: "order-123",
CustomerID: "cust-456",
Amount: 99.99,
})
if err != nil {
return err
}
// Wait for result
var result OrderResult
err = we.Get(context.Background(), &result)
fmt.Printf("Result: %+v\n", result)
// Signal a workflow
err = c.SignalWorkflow(context.Background(), "order-123", "", "approval",
ApprovalSignal{Approved: true})
// Query a workflow
resp, err := c.QueryWorkflow(context.Background(), "order-123", "", "getStatus")
var status string
resp.Get(&status)
return nil
}
TypeScript — Complete Example
// ─── workflows.ts ───────────────────────────────────────
import {
proxyActivities,
defineSignal,
defineQuery,
defineUpdate,
setHandler,
condition,
sleep,
continueAsNew,
patched,
ApplicationFailure,
CancellationScope,
log,
} from '@temporalio/workflow';
import type * as activities from './activities';
const { validateOrder, chargePayment, shipOrder, refundPayment, sendConfirmation } =
proxyActivities<typeof activities>({
startToCloseTimeout: '30s',
retry: {
initialInterval: '1s',
backoffCoefficient: 2,
maximumInterval: '30s',
maximumAttempts: 3,
},
});
// Signals, Queries, Updates
export const approvalSignal = defineSignal<[{ approved: boolean; reason?: string }]>('approval');
export const getStatusQuery = defineQuery<string>('getStatus');
export const updatePriceUpdate = defineUpdate<
{ newPrice: number }, // result
[{ price: number }] // args
>('updatePrice');
// ─── Main Workflow ──────────────────────────────────────
export interface OrderInput {
orderId: string;
customerId: string;
amount: number;
}
export async function orderWorkflow(input: OrderInput): Promise<{ status: string; trackingId?: string }> {
log.info('Starting order workflow', { orderId: input.orderId });
let status = 'initialized';
let approved = false;
// Query handler
setHandler(getStatusQuery, () => status);
// Signal handler
setHandler(approvalSignal, (signal) => {
approved = signal.approved;
});
// Update handler (with validator)
setHandler(updatePriceUpdate, (args) => {
input.amount = args.price;
return { newPrice: args.price };
}, {
validator: (args) => {
if (args.price <= 0) throw ApplicationFailure.nonRetryable('Price must be positive');
},
});
// Wait for approval (max 24 hours)
status = 'waiting_approval';
const gotApproval = await condition(() => approved, '24h');
if (!gotApproval) {
return { status: 'timeout' };
}
status = 'processing';
// Validate
await validateOrder(input);
// Charge
const paymentResult = await chargePayment(input.orderId, input.amount);
// Ship (with compensation on failure)
try {
const trackingId = await shipOrder(input.orderId);
status = 'shipped';
await sendConfirmation(input.customerId, trackingId);
status = 'completed';
return { status: 'completed', trackingId };
} catch (err) {
status = 'compensating';
await refundPayment(paymentResult.txnId);
throw err;
}
}
// ─── Long-running workflow with ContinueAsNew ───────────
export async function eventProcessorWorkflow(state: {
processedCount: number;
lastEventId: string;
}): Promise<void> {
const { fetchEvents, processEvent } = proxyActivities<typeof activities>({
startToCloseTimeout: '10s',
});
for (let i = 0; i < 500; i++) {
const events = await fetchEvents(state.lastEventId);
if (events.length === 0) {
await sleep('5s');
continue;
}
for (const event of events) {
await processEvent(event);
state.processedCount++;
state.lastEventId = event.id;
}
}
// Reset history — continue with accumulated state
await continueAsNew<typeof eventProcessorWorkflow>(state);
}
// ─── activities.ts ──────────────────────────────────────
import { Context, heartbeat, log } from '@temporalio/activity';
export async function validateOrder(input: { orderId: string }): Promise<boolean> {
log.info('Validating order', { orderId: input.orderId });
return true;
}
export async function chargePayment(orderId: string, amount: number): Promise<{ txnId: string }> {
// ... call payment gateway
return { txnId: `txn-${orderId}` };
}
export async function shipOrder(orderId: string): Promise<string> {
// Long-running with heartbeat
for (let step = 0; step < 10; step++) {
Context.current().heartbeat(step);
// ... shipping logic per step
await new Promise((r) => setTimeout(r, 1000));
}
return `TRACK-${orderId}`;
}
export async function refundPayment(txnId: string): Promise<void> {
// ... refund logic
}
export async function sendConfirmation(customerId: string, trackingId: string): Promise<void> {
// ... send email
}
// ─── worker.ts ──────────────────────────────────────────
import { Worker, NativeConnection, Runtime } from '@temporalio/worker';
import * as activities from './activities';
async function run() {
// Enable metrics
Runtime.install({
telemetryOptions: {
metrics: { prometheus: { bindAddress: '0.0.0.0:9464' } },
},
});
const connection = await NativeConnection.connect({
address: 'localhost:7233',
});
const worker = await Worker.create({
connection,
namespace: 'default',
taskQueue: 'order-processing',
workflowsPath: require.resolve('./workflows'),
activities,
maxConcurrentWorkflowTaskExecutions: 100,
maxConcurrentActivityTaskExecutions: 50,
});
await worker.run();
}
run().catch(console.error);
// ─── client.ts ──────────────────────────────────────────
import { Client, Connection } from '@temporalio/client';
import { orderWorkflow, approvalSignal, getStatusQuery } from './workflows';
async function main() {
const connection = await Connection.connect({ address: 'localhost:7233' });
const client = new Client({ connection });
// Start workflow
const handle = await client.workflow.start(orderWorkflow, {
workflowId: 'order-123',
taskQueue: 'order-processing',
args: [{ orderId: 'order-123', customerId: 'cust-456', amount: 99.99 }],
searchAttributes: { CustomerId: ['cust-456'] },
});
// Signal
await handle.signal(approvalSignal, { approved: true });
// Query
const status = await handle.query(getStatusQuery);
console.log('Status:', status);
// Wait for result
const result = await handle.result();
console.log('Result:', result);
// Cancel
await handle.cancel();
// Terminate
await handle.terminate('manual cleanup');
}
Python — Complete Example
# ─── workflows.py ────────────────────────────────────────
import asyncio
from datetime import timedelta
from dataclasses import dataclass
from temporalio import workflow
from temporalio.common import RetryPolicy
with workflow.unsafe.imports_passed_through():
from activities import (
validate_order, charge_payment, ship_order,
refund_payment, send_confirmation,
OrderInput, PaymentResult,
)
@dataclass
class ApprovalSignal:
approved: bool
reason: str = ""
@workflow.defn
class OrderWorkflow:
def __init__(self):
self._status = "initialized"
self._approved = False
@workflow.run
async def run(self, input: OrderInput) -> dict:
workflow.logger.info(f"Starting order workflow: {input.order_id}")
# Wait for approval
self._status = "waiting_approval"
try:
await workflow.wait_condition(
lambda: self._approved,
timeout=timedelta(hours=24),
)
except asyncio.TimeoutError:
return {"status": "timeout"}
self._status = "processing"
# Validate
await workflow.execute_activity(
validate_order,
input,
start_to_close_timeout=timedelta(seconds=30),
retry_policy=RetryPolicy(maximum_attempts=3),
)
# Charge
payment_result = await workflow.execute_activity(
charge_payment,
args=[input.order_id, input.amount],
start_to_close_timeout=timedelta(seconds=30),
)
# Ship (with compensation)
try:
tracking_id = await workflow.execute_activity(
ship_order,
input.order_id,
start_to_close_timeout=timedelta(minutes=5),
heartbeat_timeout=timedelta(seconds=30),
)
except Exception:
self._status = "compensating"
await workflow.execute_activity(
refund_payment,
payment_result.txn_id,
start_to_close_timeout=timedelta(seconds=30),
)
raise
self._status = "completed"
await workflow.execute_activity(
send_confirmation,
args=[input.customer_id, tracking_id],
start_to_close_timeout=timedelta(seconds=30),
)
return {"status": "completed", "tracking_id": tracking_id}
@workflow.signal
async def approval(self, signal: ApprovalSignal):
self._approved = signal.approved
@workflow.query
def get_status(self) -> str:
return self._status
@workflow.update
async def update_price(self, new_price: float) -> float:
# Validate and update
if new_price <= 0:
raise ValueError("Price must be positive")
return new_price
# ─── activities.py ───────────────────────────────────────
from dataclasses import dataclass
from temporalio import activity
import asyncio
@dataclass
class OrderInput:
order_id: str
customer_id: str
amount: float
@dataclass
class PaymentResult:
txn_id: str
@activity.defn
async def validate_order(input: OrderInput) -> bool:
activity.logger.info(f"Validating order: {input.order_id}")
return True
@activity.defn
async def charge_payment(order_id: str, amount: float) -> PaymentResult:
return PaymentResult(txn_id=f"txn-{order_id}")
@activity.defn
async def ship_order(order_id: str) -> str:
for step in range(10):
activity.heartbeat(step)
await asyncio.sleep(1)
return f"TRACK-{order_id}"
@activity.defn
async def refund_payment(txn_id: str) -> None:
pass
@activity.defn
async def send_confirmation(customer_id: str, tracking_id: str) -> None:
pass
# ─── worker.py ───────────────────────────────────────────
import asyncio
from temporalio.client import Client
from temporalio.worker import Worker
from workflows import OrderWorkflow
from activities import (
validate_order, charge_payment, ship_order,
refund_payment, send_confirmation,
)
async def main():
client = await Client.connect("localhost:7233")
worker = Worker(
client,
task_queue="order-processing",
workflows=[OrderWorkflow],
activities=[
validate_order, charge_payment, ship_order,
refund_payment, send_confirmation,
],
max_concurrent_workflow_tasks=100,
max_concurrent_activities=50,
)
await worker.run()
if __name__ == "__main__":
asyncio.run(main())
# ─── client_usage.py ─────────────────────────────────────
import asyncio
from temporalio.client import Client
from workflows import OrderWorkflow, ApprovalSignal
from activities import OrderInput
async def main():
client = await Client.connect("localhost:7233")
# Start workflow
handle = await client.start_workflow(
OrderWorkflow.run,
OrderInput(order_id="order-123", customer_id="cust-456", amount=99.99),
id="order-123",
task_queue="order-processing",
)
# Signal
await handle.signal(OrderWorkflow.approval, ApprovalSignal(approved=True))
# Query
status = await handle.query(OrderWorkflow.get_status)
print(f"Status: {status}")
# Wait for result
result = await handle.result()
print(f"Result: {result}")
if __name__ == "__main__":
asyncio.run(main())
Java — Complete Example
// ─── OrderWorkflow.java (Interface) ────────────────────
@WorkflowInterface
public interface OrderWorkflow {
@WorkflowMethod
OrderResult run(OrderInput input);
@SignalMethod
void approval(ApprovalSignal signal);
@QueryMethod
String getStatus();
@UpdateMethod
double updatePrice(double newPrice);
@UpdateValidatorMethod(updateName = "updatePrice")
void validateUpdatePrice(double newPrice);
}
// ─── OrderWorkflowImpl.java ────────────────────────────
public class OrderWorkflowImpl implements OrderWorkflow {
private static final Logger logger = Workflow.getLogger(OrderWorkflowImpl.class);
private String status = "initialized";
private boolean approved = false;
private final OrderActivities activities = Workflow.newActivityStub(
OrderActivities.class,
ActivityOptions.newBuilder()
.setStartToCloseTimeout(Duration.ofSeconds(30))
.setRetryOptions(RetryOptions.newBuilder()
.setInitialInterval(Duration.ofSeconds(1))
.setBackoffCoefficient(2.0)
.setMaximumInterval(Duration.ofSeconds(30))
.setMaximumAttempts(3)
.build())
.build()
);
@Override
public OrderResult run(OrderInput input) {
status = "waiting_approval";
Workflow.await(Duration.ofHours(24), () -> approved);
if (!approved) return new OrderResult("timeout", null);
status = "processing";
activities.validateOrder(input);
PaymentResult payment = activities.chargePayment(input.getOrderId(), input.getAmount());
try {
String trackingId = activities.shipOrder(input.getOrderId());
status = "completed";
activities.sendConfirmation(input.getCustomerId(), trackingId);
return new OrderResult("completed", trackingId);
} catch (ActivityFailure e) {
status = "compensating";
activities.refundPayment(payment.getTxnId());
throw e;
}
}
@Override
public void approval(ApprovalSignal signal) {
this.approved = signal.isApproved();
}
@Override
public String getStatus() {
return status;
}
@Override
public double updatePrice(double newPrice) {
return newPrice;
}
@Override
public void validateUpdatePrice(double newPrice) {
if (newPrice <= 0) throw new IllegalArgumentException("Price must be positive");
}
}
// ─── Worker ────────────────────────────────────────────
WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
WorkflowClient client = WorkflowClient.newInstance(service);
WorkerFactory factory = WorkerFactory.newInstance(client);
Worker worker = factory.newWorker("order-processing", WorkerOptions.newBuilder()
.setMaxConcurrentWorkflowTaskExecutionSize(100)
.setMaxConcurrentActivityExecutionSize(50)
.build());
worker.registerWorkflowImplementationTypes(OrderWorkflowImpl.class);
worker.registerActivitiesImplementations(new OrderActivitiesImpl());
factory.start();
24. Advanced Patterns
Entity Workflow (Long-Lived Stateful Entity)
Model a persistent entity (user account, shopping cart, device) as a workflow:
export async function shoppingCartWorkflow(userId: string): Promise<void> {
const items: Map<string, CartItem> = new Map();
let checkedOut = false;
// Signal handlers for mutations
setHandler(addItemSignal, (item: CartItem) => {
items.set(item.productId, item);
});
setHandler(removeItemSignal, (productId: string) => {
items.delete(productId);
});
setHandler(checkoutSignal, () => {
checkedOut = true;
});
// Query handler for reads
setHandler(getCartQuery, () => Array.from(items.values()));
// Keep alive until checkout or 30-day timeout
const didCheckout = await condition(() => checkedOut, '30d');
if (didCheckout && items.size > 0) {
await processCheckout(userId, Array.from(items.values()));
}
// Cart expires — optionally continueAsNew for truly infinite entities
}
Async Activity Completion
Activity started in one process, completed from another:
// Start activity — return ErrResultPending
func ExternalApproval(ctx context.Context, request ApprovalRequest) (string, error) {
info := activity.GetInfo(ctx)
taskToken := info.TaskToken
// Send task token to external system (webhook, email link, etc.)
sendApprovalEmail(request.ApproverEmail, taskToken)
return "", activity.ErrResultPending // Activity stays open
}
// Later — complete from external system (webhook handler)
func handleWebhookApproval(c client.Client, taskToken []byte, result string) {
c.CompleteActivity(context.Background(), taskToken, result, nil)
// or: c.CompleteActivityByID(...)
}
Local Activities
Run an activity on the same worker as the workflow — no task queue dispatch:
localOpts := workflow.LocalActivityOptions{
StartToCloseTimeout: 5 * time.Second,
}
localCtx := workflow.WithLocalActivityOptions(ctx, localOpts)
err := workflow.ExecuteLocalActivity(localCtx, LookupCache, key).Get(ctx, &result)
Use when:
- Very short execution time (<1s)
- No need for independent retries
- Want to avoid task queue overhead
- Activity runs on same machine as workflow
Side Effects
Run non-deterministic code inside a workflow (recorded once, replayed from history):
// Go
encodedRandom := workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
return rand.Intn(100)
})
var random int
encodedRandom.Get(&random)
// TypeScript — uuid
import { uuid4 } from '@temporalio/workflow';
const id = uuid4(); // deterministic UUID
Workflow Interceptors (Middleware)
// Go — logging interceptor
type LoggingInterceptor struct {
interceptor.WorkflowInboundInterceptorBase
}
func (i *LoggingInterceptor) ExecuteWorkflow(ctx workflow.Context, in *interceptor.ExecuteWorkflowInput) (interface{}, error) {
log.Printf("Workflow started: %s", workflow.GetInfo(ctx).WorkflowType.Name)
result, err := i.Next.ExecuteWorkflow(ctx, in)
log.Printf("Workflow completed: %v, error: %v", result, err)
return result, err
}
Mutex / Semaphore (Cross-Workflow Locking)
Use a “mutex workflow” to serialize access to a shared resource:
func MutexWorkflow(ctx workflow.Context, resourceID string) error {
lockCh := workflow.GetSignalChannel(ctx, "lock")
unlockCh := workflow.GetSignalChannel(ctx, "unlock")
for {
// Wait for lock request
var requester string
lockCh.Receive(ctx, &requester)
// Grant lock (signal back to requester)
workflow.SignalExternalWorkflow(ctx, requester, "", "lock-acquired", nil)
// Wait for unlock
unlockCh.Receive(ctx, nil)
}
}
25. Use Cases
1. Order Processing & E-Commerce
Customer places order → Validate → Reserve inventory → Charge payment
→ Fulfill/Ship → Send confirmation → Handle returns (days later)
Why Temporal: Multi-step, spans multiple services, needs compensation on failure.
2. Payment Processing
Initiate payment → Fraud check → Charge → Settlement → Reconciliation
Why Temporal: Financial accuracy — every step must complete or compensate.
3. Microservice Orchestration (Saga)
Service A → Service B → Service C → Done
↓ fail
Compensate B → Compensate A
Why Temporal: Native saga support, guaranteed compensation.
4. Human-in-the-Loop Approvals
Submit request → Wait for manager approval (days) → Process → Notify
Why Temporal: Workflows can wait indefinitely for signals.
5. Subscription & Billing
Trial (14 days) → First charge → Monthly recurring → Renewal/Cancellation
Why Temporal: Long-running with timers and state transitions.
6. User Onboarding
Sign up → Welcome email → Wait 1 day → Tutorial email → Wait 3 days
→ Feature highlight → Check engagement → Branch based on behavior
Why Temporal: Complex scheduling with conditional branching.
7. Data Pipelines / ETL
Extract (paginated) → Transform (batch) → Load → Validate → Report
Why Temporal: Automatic retry, checkpointing via heartbeat, progress tracking.
8. Infrastructure Provisioning
Create VPC → Create subnet → Launch instances → Configure LB → DNS setup
Why Temporal: Long-running (minutes), needs compensation on partial failure.
9. CI/CD Pipelines
Build → Test → Deploy to staging → Integration tests → Approval gate
→ Deploy to production → Smoke tests → Rollback on failure
Why Temporal: Multi-stage with human gates and automatic rollback.
10. Document Processing
Upload → OCR/Extract → Classify → Validate → Route for review → Archive
Why Temporal: Mix of automated and human steps.
11. IoT Device Management
Device registers → Provision certificates → Configure → Monitor heartbeats
→ OTA update (with rollback) → Decommission
Why Temporal: Entity workflow pattern for device lifecycle.
12. Customer Support Ticket
Created → Auto-classify → Route to team → SLA timer starts → Escalate if no response
→ Resolution → Customer satisfaction survey (3 days later)
Why Temporal: SLA tracking with timers and escalation.
13. Batch Processing with Rate Limiting
Read 1M records → Process in batches of 1000 → Rate limit to 100/sec
→ Retry failures → Generate completion report
Why Temporal: ContinueAsNew for unbounded work, activity rate limiting.
14. Multi-Region / Multi-Cluster Replication
Write to primary → Replicate to region A → Replicate to region B → Confirm
Why Temporal: Durability guarantees across regions.
15. Machine Learning Pipelines
Fetch training data → Preprocess → Train model → Evaluate → A/B test
→ Deploy (if metrics pass) → Monitor in production
Why Temporal: Long-running training with checkpoints and branching.
26. Anti-Patterns & Pitfalls
DON’T: Non-Deterministic Workflow Code
// BAD — different on replay
if time.Now().Hour() > 12 { ... }
if rand.Intn(10) > 5 { ... }
// GOOD — use Temporal APIs
if workflow.Now(ctx).Hour() > 12 { ... }
encoded := workflow.SideEffect(ctx, func(ctx workflow.Context) interface{} {
return rand.Intn(10)
})
DON’T: Huge Payloads in Workflow History
// BAD — 10MB stored in history every time
result := workflow.ExecuteActivity(ctx, FetchLargeDataset).Get(ctx, &data)
// GOOD — store reference, not data
result := workflow.ExecuteActivity(ctx, FetchAndStoreInS3).Get(ctx, &s3Key)
DON’T: Unbounded Loops Without ContinueAsNew
// BAD — history grows forever
for {
workflow.Sleep(ctx, time.Minute)
workflow.ExecuteActivity(ctx, Poll).Get(ctx, nil)
}
// GOOD — reset after N iterations
for i := 0; i < 1000; i++ {
workflow.Sleep(ctx, time.Minute)
workflow.ExecuteActivity(ctx, Poll).Get(ctx, nil)
}
return workflow.NewContinueAsNewError(ctx, MyWorkflow, state)
DON’T: Use Workflow ID as a Counter
// BAD — no uniqueness, races possible
we := c.ExecuteWorkflow(ctx, opts, MyWorkflow)
// Two clients starting same workflow ID simultaneously
// GOOD — use idempotency keys
opts.ID = "order-" + orderID // natural business key
opts.WorkflowIDReusePolicy = enums.WORKFLOW_ID_REUSE_POLICY_REJECT_DUPLICATE
DON’T: Tight-Loop Polling in Workflows
// BAD — burns through history events
for !done {
workflow.Sleep(ctx, time.Second) // creates TimerStarted + TimerFired events
done = workflow.ExecuteActivity(ctx, CheckStatus).Get(ctx, nil)
}
// GOOD — use signals or longer intervals
signalCh := workflow.GetSignalChannel(ctx, "status-update")
signalCh.Receive(ctx, &status)
DON’T: Ignore Workflow Task Timeout
// BAD — large history causes replay > 10s, task times out, infinite retry loop
// Fix: increase WorkflowTaskTimeout for complex workflows
opts.WorkflowTaskTimeout = 30 * time.Second
DON’T: Block Workflow Thread
// BAD — blocks the workflow goroutine (Go)
time.Sleep(5 * time.Second)
http.Get("https://api.example.com")
// GOOD — use Temporal APIs
workflow.Sleep(ctx, 5*time.Second)
workflow.ExecuteActivity(ctx, CallAPI).Get(ctx, nil)
27. Quick Reference Cheatsheet
Workflow Lifecycle States
┌──────────┐
│ Running │
└────┬─────┘
│
┌─────────┼──────────┬────────────┐
▼ ▼ ▼ ▼
Completed Failed Cancelled Terminated
│ │ │ │
└─────────┼──────────┴────────────┘
▼
ContinueAsNew (new Run ID, same Workflow ID)
TimedOut (execution or run timeout exceeded)
Timeout Decision Tree
"How long should an activity take?"
├── Normal case: 1-30 seconds
│ └── StartToCloseTimeout: 30s, RetryPolicy: 3 attempts
├── External API with variable latency
│ └── StartToCloseTimeout: 2m, ScheduleToCloseTimeout: 10m
├── Long-running processing (minutes-hours)
│ └── StartToCloseTimeout: 2h, HeartbeatTimeout: 60s
└── Human task (external completion)
└── ScheduleToCloseTimeout: 7d, no heartbeat
Common Workflow Patterns
Sequential: A → B → C → Done
Parallel: A + B + C → Wait All → Done
Fan-out/in: Spawn N children → Collect results
Saga: A → B → C (fail) → Compensate B → Compensate A
Entity: Create → Handle signals forever → ContinueAsNew
Pipeline: Process batch → ContinueAsNew(remaining)
State Machine: Signal transitions: Created → Active → Suspended → Closed
Polling: Loop { Activity + Sleep } → ContinueAsNew
Timer: Sleep(duration) → Execute → Done
Human-in-loop: Request → Wait for Signal → Process
SDK Quick Reference
| Operation | Go | TypeScript | Python | Java |
|---|---|---|---|---|
| Sleep | workflow.Sleep(ctx, d) | await sleep(d) | await asyncio.sleep(d) | Workflow.sleep(d) |
| Execute Activity | workflow.ExecuteActivity(ctx, fn, args).Get(ctx, &res) | await fn(args) (proxied) | await workflow.execute_activity(fn, args, ...) | activities.fn(args) (stub) |
| Signal Channel | workflow.GetSignalChannel(ctx, name) | setHandler(signal, fn) | @workflow.signal | @SignalMethod |
| Query | workflow.SetQueryHandler(ctx, name, fn) | setHandler(query, fn) | @workflow.query | @QueryMethod |
| Child Workflow | workflow.ExecuteChildWorkflow(ctx, fn, args) | executeChild(fn, {args}) | await workflow.execute_child_workflow(fn, args) | Workflow.newChildWorkflowStub(I.class) |
| Timer | workflow.NewTimer(ctx, d) | sleep(d) | await asyncio.sleep(d) | Workflow.newTimer(d) |
| ContinueAsNew | workflow.NewContinueAsNewError(ctx, fn, args) | continueAsNew(args) | workflow.continue_as_new(args) | Workflow.continueAsNew(args) |
| Side Effect | workflow.SideEffect(ctx, fn) | N/A (use uuid4()) | N/A (use activities) | Workflow.sideEffect(cls, fn) |
| Now | workflow.Now(ctx) | Date.now() (intercepted) | workflow.now() | Workflow.currentTimeMillis() |
| Random | workflow.NewRandom(ctx) | Math.random() (intercepted) | N/A (use side effect) | Workflow.newRandom() |
| UUID | workflow.SideEffect | uuid4() | workflow.uuid4() | Workflow.randomUUID() |
Environment Variables
# Client connection
TEMPORAL_ADDRESS=localhost:7233
TEMPORAL_NAMESPACE=default
TEMPORAL_TLS_CERT=/path/to/client.crt
TEMPORAL_TLS_KEY=/path/to/client.key
TEMPORAL_TLS_CA=/path/to/ca.crt
# Temporal Cloud
TEMPORAL_ADDRESS=your-ns.your-account.tmprl.cloud:7233
TEMPORAL_NAMESPACE=your-ns.your-account
Docker Quick Start
# Fastest way to start (SQLite, ephemeral)
temporal server start-dev
# With persistent storage
temporal server start-dev --db-filename /tmp/temporal.db
# With custom namespace
temporal server start-dev --namespace my-app
# Full stack via Docker
docker compose up -d # using the compose file from section 20
Health Checks
# Server health
temporal operator cluster health
# Expected: SERVING
# Task queue has workers?
temporal task-queue describe --task-queue my-queue
# Check Pollers list — should not be empty
# Workflow stuck?
temporal workflow describe --workflow-id my-wf
# Check PendingActivities, PendingChildren
# History size growing?
temporal workflow show --workflow-id my-wf | wc -l
# > 10,000 events = consider ContinueAsNew
Dependency Versions (as of 2025)
| Component | Recommended Version |
|---|---|
| Temporal Server | 1.24+ |
| Go SDK | 1.29+ |
| TypeScript SDK | 1.11+ |
| Python SDK | 1.7+ |
| Java SDK | 1.25+ |
| .NET SDK | 1.2+ |
| PostgreSQL | 14+ |
| MySQL | 8.0+ |
| Elasticsearch | 7.10+ / OpenSearch 2.x |
Appendix: Glossary
| Term | Definition |
|---|---|
| Workflow | Deterministic, durable function orchestrating business logic |
| Activity | Non-deterministic function for side-effects |
| Worker | Your process that executes workflows and activities |
| Task Queue | Named queue connecting clients/workflows to workers |
| Namespace | Isolation boundary for workflows, queues, and config |
| Signal | Async message to a running workflow (fire-and-forget) |
| Query | Sync, read-only inspection of workflow state |
| Update | Sync, read-write mutation of workflow state with result |
| Event History | Append-only log of all workflow execution events |
| Replay | Re-executing workflow code using cached event history |
| Sticky Execution | Routing workflow tasks back to the same cached worker |
| Continue-As-New | Restarting a workflow with fresh history |
| Workflow Task | Internal task to advance workflow state (~milliseconds) |
| Activity Task | Task to execute an activity function |
| Schedule | Server-managed cron-like recurring workflow execution |
| Codec | Encoder/decoder for payload encryption or compression |
| Search Attribute | Indexed key-value metadata for workflow filtering |
| Workflow ID | User-defined unique identifier for a workflow |
| Run ID | System-generated UUID for a specific workflow run |
| Heartbeat | Periodic progress report from a long-running activity |
| Shard | Partition of the History service for parallelism |
| Ringpop | SWIM-based gossip protocol for server membership |
Last updated: 2026-03-27