Worker Service (Internal)

The Worker Service is an internal component of the Temporal cluster that hosts background processing tasks. Unlike user-hosted workers that execute workflow and activity code, the Worker Service performs cluster maintenance and system operations.

Do not confuse the Worker Service (internal cluster component) with Worker processes (user-hosted processes that execute workflow/activity code).

Overview

The Worker Service provides a framework for running internal Temporal Workflows and Activities that handle:

Cross-cluster replication
Workflow history archival
History and task queue scanning
Batch operations
System maintenance tasks

Core Components

Replicator

Handles cross-cluster replication in multi-region deployments:

Replication Tasks

Consumes replication tasks generated by remote Temporal clusters:

History replication tasks (workflow events)
Sync shard status tasks
Sync activity tasks
Namespace replication tasks

Conflict Resolution

Resolves conflicts when the same workflow is modified in multiple clusters:

Version vectors for causality tracking
Last-write-wins with version comparison
Handles namespace failover scenarios

Replication Queue

Maintains a queue of pending replication tasks:

Persisted to handle worker restarts
Processed in order per workflow execution
Retries on failure with backoff

// Code entrypoint: service/worker/replicator/
// Replication task processing logic

Multi-Cluster Setup

For cross-cluster replication:

Configure remote cluster endpoints
Create global namespaces with multiple clusters
Replicator pulls replication tasks from remote clusters
Tasks are applied to local History Service

# Example: Connecting two clusters
tctl --address 127.0.0.1:7233 adm cluster add-remote \
  --frontend_address "localhost:8233"

tctl --address 127.0.0.1:8233 adm cluster add-remote \
  --frontend_address "localhost:7233"

See Worker Service README for development quickstart.

Scanner

Performs maintenance scans on cluster data:

Executions Scanner

History Scavenger: Cleans up orphaned history branches
Execution Fixer: Repairs corrupted workflow executions
Concrete Execution Scanner: Validates execution data integrity

Task Queue Scanner

Cleans up stale task queues
Removes abandoned partitions
Consolidates task queue metadata

History Archival

Archives completed workflow histories to long-term storage:

Moves history from primary database to archival storage (S3, GCS, etc.)
Configurable retention period before archival
Reduces load on primary database
Enables compliance and auditing

// Code entrypoint: service/worker/scanner/
// Scanner workflow and activity implementations

Configuration

Scanners are configured via dynamic config:

Scan interval and batch size
Enable/disable specific scanners
Concurrent execution limits
Error handling policies

Batcher

Processes batch operations across multiple workflows: Use Cases:

Batch signal workflows matching a query
Batch cancel workflows
Batch terminate workflows
Batch reset workflows

Operation:

Query visibility to find target workflows
Process workflows in batches
Rate limit to avoid cluster overload
Report progress and errors

// Code entrypoint: service/worker/batcher/
// Batch operation workflow and activities

Batch Workflow

The batcher runs as a workflow:

// Simplified batch workflow structure:
func BatchWorkflow(ctx Context, params BatchParams) error {
    // 1. Query visibility for target workflows
    workflows := QueryVisibility(params.Query)
    
    // 2. Process in batches
    for batch := range Paginate(workflows, params.BatchSize) {
        activities.ProcessBatch(batch, params.Operation)
    }
    
    return nil
}

Parent Close Policy Worker

Handles child workflow cleanup when parent workflows close: Parent Close Policies:

ABANDON: Leave child workflows running
TERMINATE: Terminate all child workflows
REQUEST_CANCEL: Send cancel request to children

Implementation:

Triggered when parent workflow completes
Queries for child workflows
Applies policy to each child
Retries on failure

// Code entrypoint: service/worker/parentclosepolicy/
// Parent close policy workflow

Per-Namespace Workers

Dynamically creates and manages workers for specific namespaces: Features:

Worker count per namespace is configurable
Workers start/stop dynamically based on config
Each namespace can have custom worker options
Used for namespace-specific workflows (e.g., migrations)

// Code entrypoint: service/worker/pernamespaceworker.go
// Per-namespace worker management

Configuration:

# Dynamic config example
worker.perNamespaceWorkerCount:
  - namespace: special-namespace
    value: 5  # Run 5 workers for this namespace

worker.perNamespaceWorkerOptions:
  - namespace: special-namespace
    value:
      maxConcurrentActivityExecutionSize: 100
      maxConcurrentWorkflowTaskExecutionSize: 50

Worker Manager

The Worker Service includes a workerManager that:

Initializes and starts all internal workers
Manages worker lifecycle (start, stop, health check)
Coordinates shutdown during service termination
Handles worker registration with the cluster

// Code entrypoint: service/worker/worker.go
// WorkerManager implementation

SDK Client Factory

The Worker Service uses the Temporal SDK to run internal workflows:

Creates SDK clients connected to local cluster
Configures workers with appropriate task queues
Handles authentication and TLS
Manages connection pooling

// Internal workers use the same SDK as user workers
// This ensures consistency and leverages SDK features

System Namespaces

Internal workers operate on special system namespaces:

temporal-system: Main system namespace for internal workflows
Scanner workflows: Run in temporal-system
Batcher workflows: Run in temporal-system
Isolated from user namespaces

Task Queues

Internal workers poll specific task queues:

Replicator: temporal-replicator task queue
Scanner: temporal-scanner-taskqueue-{type} task queues
Batcher: temporal-sys-batcher-taskqueue
Parent Close Policy: temporal-sys-pcp-taskqueue

Service Configuration

// Code entrypoint: service/worker/service.go
type Config struct {
    // Scanner configuration
    ScannerCfg *scanner.Config
    
    // Parent close policy configuration
    ParentCloseCfg *parentclosepolicy.Config
    
    // Rate limiting
    PersistenceMaxQPS int
    
    // Batcher configuration
    EnableBatcher        bool
    BatcherRPS           int  // Per-namespace
    BatcherConcurrency   int  // Per-namespace
    
    // Per-namespace workers
    PerNamespaceWorkerCount   int  // Per-namespace
    PerNamespaceWorkerOptions sdkworker.Options
}

Failure Handling

Worker Restarts

Internal workers are resilient to restarts:

Workflows are durable and resume after restart
In-progress activities are retried
Replication queue persists pending tasks
Scanner state is checkpointed

Error Handling

Internal workflows implement retry policies:

Transient errors: Exponential backoff retry
Permanent errors: Fail workflow and alert
Rate limiting: Back off and retry

Monitoring

The Worker Service emits metrics:

Replication lag per remote cluster
Scanner progress and error rates
Batcher operation success/failure
Per-namespace worker health

Deployment Considerations

Dedicated Worker Service Instances

In production:

Run Worker Service on dedicated instances
Separate from Frontend/History/Matching for resource isolation
Scale independently based on workload
Consider resource-intensive tasks (scanner, replicator)

Resource Requirements

Worker Service needs:

CPU: For workflow execution and activity processing
Memory: For SDK worker caches and in-flight workflows
Network: For replication streams and archival uploads
Disk: For local activity retries and temporary storage

High Availability

Worker Service instances:

Multiple instances can run concurrently
Work is distributed via task queues (Matching Service)
No single point of failure
Automatic failover on instance failure

Development and Testing

Local Development

To run Worker Service locally:

# Start development server with all services
make start

# Worker Service starts automatically
# Logs show internal worker activity

Adding New Internal Workflows

To add a new internal workflow:

Create workflow and activity implementations
Register with worker manager
Configure task queue and worker options
Add metrics and logging
Write tests

// Example: Registering a new internal worker
func (s *Service) Start() {
    s.workerManager.RegisterWorker(
        "my-internal-workflow",
        "my-task-queue",
        myWorkflow,
        myActivities,
    )
}

History Service

How workflow executions are managed

Matching Service

Task distribution for internal workers

Architecture Overview

High-level system architecture

Workflow Lifecycle

How workflows execute

​Worker Service (Internal)

​Overview

​Core Components

​Replicator

​Multi-Cluster Setup

​Scanner

​Executions Scanner

​Task Queue Scanner

​History Archival

​Configuration

​Batcher

​Batch Workflow

​Parent Close Policy Worker

​Per-Namespace Workers

​Worker Manager

​SDK Client Factory

​System Namespaces

​Task Queues

​Service Configuration

​Failure Handling

​Worker Restarts

​Error Handling

​Monitoring

​Deployment Considerations

​Dedicated Worker Service Instances

​Resource Requirements

​High Availability

​Development and Testing

​Local Development

​Adding New Internal Workflows

​Further Reading

History Service

Matching Service

Architecture Overview

Workflow Lifecycle

Worker Service (Internal)

Overview

Core Components

Replicator

Multi-Cluster Setup

Scanner

Executions Scanner

Task Queue Scanner

History Archival

Configuration

Batcher

Batch Workflow

Parent Close Policy Worker

Per-Namespace Workers

Worker Manager

SDK Client Factory

System Namespaces

Task Queues

Service Configuration

Failure Handling

Worker Restarts

Error Handling

Monitoring

Deployment Considerations

Dedicated Worker Service Instances

Resource Requirements

High Availability

Development and Testing

Local Development

Adding New Internal Workflows

Further Reading