Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/temporalio/temporal/llms.txt

Use this file to discover all available pages before exploring further.

Worker Service (Internal)

The Worker Service is an internal component of the Temporal cluster that hosts background processing tasks. Unlike user-hosted workers that execute workflow and activity code, the Worker Service performs cluster maintenance and system operations.
Do not confuse the Worker Service (internal cluster component) with Worker processes (user-hosted processes that execute workflow/activity code).

Overview

The Worker Service provides a framework for running internal Temporal Workflows and Activities that handle:
  • Cross-cluster replication
  • Workflow history archival
  • History and task queue scanning
  • Batch operations
  • System maintenance tasks

Core Components

Replicator

Handles cross-cluster replication in multi-region deployments:
Consumes replication tasks generated by remote Temporal clusters:
  • History replication tasks (workflow events)
  • Sync shard status tasks
  • Sync activity tasks
  • Namespace replication tasks
Resolves conflicts when the same workflow is modified in multiple clusters:
  • Version vectors for causality tracking
  • Last-write-wins with version comparison
  • Handles namespace failover scenarios
Maintains a queue of pending replication tasks:
  • Persisted to handle worker restarts
  • Processed in order per workflow execution
  • Retries on failure with backoff
// Code entrypoint: service/worker/replicator/
// Replication task processing logic

Multi-Cluster Setup

For cross-cluster replication:
  1. Configure remote cluster endpoints
  2. Create global namespaces with multiple clusters
  3. Replicator pulls replication tasks from remote clusters
  4. Tasks are applied to local History Service
# Example: Connecting two clusters
tctl --address 127.0.0.1:7233 adm cluster add-remote \
  --frontend_address "localhost:8233"

tctl --address 127.0.0.1:8233 adm cluster add-remote \
  --frontend_address "localhost:7233"
See Worker Service README for development quickstart.

Scanner

Performs maintenance scans on cluster data:

Executions Scanner

  • History Scavenger: Cleans up orphaned history branches
  • Execution Fixer: Repairs corrupted workflow executions
  • Concrete Execution Scanner: Validates execution data integrity

Task Queue Scanner

  • Cleans up stale task queues
  • Removes abandoned partitions
  • Consolidates task queue metadata

History Archival

Archives completed workflow histories to long-term storage:
  • Moves history from primary database to archival storage (S3, GCS, etc.)
  • Configurable retention period before archival
  • Reduces load on primary database
  • Enables compliance and auditing
// Code entrypoint: service/worker/scanner/
// Scanner workflow and activity implementations

Configuration

Scanners are configured via dynamic config:
  • Scan interval and batch size
  • Enable/disable specific scanners
  • Concurrent execution limits
  • Error handling policies

Batcher

Processes batch operations across multiple workflows: Use Cases:
  • Batch signal workflows matching a query
  • Batch cancel workflows
  • Batch terminate workflows
  • Batch reset workflows
Operation:
  1. Query visibility to find target workflows
  2. Process workflows in batches
  3. Rate limit to avoid cluster overload
  4. Report progress and errors
// Code entrypoint: service/worker/batcher/
// Batch operation workflow and activities

Batch Workflow

The batcher runs as a workflow:
// Simplified batch workflow structure:
func BatchWorkflow(ctx Context, params BatchParams) error {
    // 1. Query visibility for target workflows
    workflows := QueryVisibility(params.Query)
    
    // 2. Process in batches
    for batch := range Paginate(workflows, params.BatchSize) {
        activities.ProcessBatch(batch, params.Operation)
    }
    
    return nil
}

Parent Close Policy Worker

Handles child workflow cleanup when parent workflows close: Parent Close Policies:
  • ABANDON: Leave child workflows running
  • TERMINATE: Terminate all child workflows
  • REQUEST_CANCEL: Send cancel request to children
Implementation:
  • Triggered when parent workflow completes
  • Queries for child workflows
  • Applies policy to each child
  • Retries on failure
// Code entrypoint: service/worker/parentclosepolicy/
// Parent close policy workflow

Per-Namespace Workers

Dynamically creates and manages workers for specific namespaces: Features:
  • Worker count per namespace is configurable
  • Workers start/stop dynamically based on config
  • Each namespace can have custom worker options
  • Used for namespace-specific workflows (e.g., migrations)
// Code entrypoint: service/worker/pernamespaceworker.go
// Per-namespace worker management
Configuration:
# Dynamic config example
worker.perNamespaceWorkerCount:
  - namespace: special-namespace
    value: 5  # Run 5 workers for this namespace

worker.perNamespaceWorkerOptions:
  - namespace: special-namespace
    value:
      maxConcurrentActivityExecutionSize: 100
      maxConcurrentWorkflowTaskExecutionSize: 50

Worker Manager

The Worker Service includes a workerManager that:
  • Initializes and starts all internal workers
  • Manages worker lifecycle (start, stop, health check)
  • Coordinates shutdown during service termination
  • Handles worker registration with the cluster
// Code entrypoint: service/worker/worker.go
// WorkerManager implementation

SDK Client Factory

The Worker Service uses the Temporal SDK to run internal workflows:
  • Creates SDK clients connected to local cluster
  • Configures workers with appropriate task queues
  • Handles authentication and TLS
  • Manages connection pooling
// Internal workers use the same SDK as user workers
// This ensures consistency and leverages SDK features

System Namespaces

Internal workers operate on special system namespaces:
  • temporal-system: Main system namespace for internal workflows
  • Scanner workflows: Run in temporal-system
  • Batcher workflows: Run in temporal-system
  • Isolated from user namespaces

Task Queues

Internal workers poll specific task queues:
  • Replicator: temporal-replicator task queue
  • Scanner: temporal-scanner-taskqueue-{type} task queues
  • Batcher: temporal-sys-batcher-taskqueue
  • Parent Close Policy: temporal-sys-pcp-taskqueue

Service Configuration

// Code entrypoint: service/worker/service.go
type Config struct {
    // Scanner configuration
    ScannerCfg *scanner.Config
    
    // Parent close policy configuration
    ParentCloseCfg *parentclosepolicy.Config
    
    // Rate limiting
    PersistenceMaxQPS int
    
    // Batcher configuration
    EnableBatcher        bool
    BatcherRPS           int  // Per-namespace
    BatcherConcurrency   int  // Per-namespace
    
    // Per-namespace workers
    PerNamespaceWorkerCount   int  // Per-namespace
    PerNamespaceWorkerOptions sdkworker.Options
}

Failure Handling

Worker Restarts

Internal workers are resilient to restarts:
  • Workflows are durable and resume after restart
  • In-progress activities are retried
  • Replication queue persists pending tasks
  • Scanner state is checkpointed

Error Handling

Internal workflows implement retry policies:
  • Transient errors: Exponential backoff retry
  • Permanent errors: Fail workflow and alert
  • Rate limiting: Back off and retry

Monitoring

The Worker Service emits metrics:
  • Replication lag per remote cluster
  • Scanner progress and error rates
  • Batcher operation success/failure
  • Per-namespace worker health

Deployment Considerations

Dedicated Worker Service Instances

In production:
  • Run Worker Service on dedicated instances
  • Separate from Frontend/History/Matching for resource isolation
  • Scale independently based on workload
  • Consider resource-intensive tasks (scanner, replicator)

Resource Requirements

Worker Service needs:
  • CPU: For workflow execution and activity processing
  • Memory: For SDK worker caches and in-flight workflows
  • Network: For replication streams and archival uploads
  • Disk: For local activity retries and temporary storage

High Availability

Worker Service instances:
  • Multiple instances can run concurrently
  • Work is distributed via task queues (Matching Service)
  • No single point of failure
  • Automatic failover on instance failure

Development and Testing

Local Development

To run Worker Service locally:
# Start development server with all services
make start

# Worker Service starts automatically
# Logs show internal worker activity

Adding New Internal Workflows

To add a new internal workflow:
  1. Create workflow and activity implementations
  2. Register with worker manager
  3. Configure task queue and worker options
  4. Add metrics and logging
  5. Write tests
// Example: Registering a new internal worker
func (s *Service) Start() {
    s.workerManager.RegisterWorker(
        "my-internal-workflow",
        "my-task-queue",
        myWorkflow,
        myActivities,
    )
}

Further Reading

History Service

How workflow executions are managed

Matching Service

Task distribution for internal workers

Architecture Overview

High-level system architecture

Workflow Lifecycle

How workflows execute