Temporal Server persists workflow state, history, and task queues in a database. This guide covers persistence layer operations and optimization.
Persistence Architecture
Temporal uses two types of data stores:
- Default Store - Core workflow data, task queues, and system state
- Visibility Store - Workflow search and list operations
Data Store Types
Supported databases:
- Cassandra - Horizontally scalable, high throughput
- PostgreSQL - ACID compliant, strong consistency
- MySQL - ACID compliant, widespread support
- SQLite - Development and testing only
Configuration
Basic Setup
persistence:
  defaultStore: "cassandra-default"
  visibilityStore: "elasticsearch-visibility"
  numHistoryShards: 4096
  datastores:
    cassandra-default:
      cassandra:
        hosts: "127.0.0.1"
        port: 9042
        keyspace: "temporal"
        user: "temporal_user"
        password: "${DB_PASSWORD}"
    elasticsearch-visibility:
      elasticsearch:
        url: "http://elasticsearch:9200"
        indices:
          visibility: "temporal_visibility_v1"
History Shards
Workflow executions are sharded across multiple partitions:
persistence:
  numHistoryShards: 4096 # Must be power of 2: 1, 2, 4, 8, ..., 16384
Shard Selection:
- Based on workflow ID hash
- Immutable after cluster creation
- Higher count = better parallelism
Recommended Shard Counts:
- Development: 1-4
- Small production: 128-512
- Medium production: 1024-2048
- Large production: 4096-16384
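The effect of hash-based shard selection can be sketched in Python. This is a minimal illustration, not Temporal's implementation: the function name shard_for_workflow is hypothetical, CRC32 stands in for whatever hash the server actually uses, and the 1-based shard range is an assumption of the sketch. It shows why the shard count is fixed at cluster creation: the same workflow ID generally maps to a different shard once numHistoryShards changes.

```python
import zlib


def shard_for_workflow(workflow_id: str, num_history_shards: int) -> int:
    """Map a workflow ID to a shard ID in the range 1..num_history_shards."""
    if num_history_shards < 1:
        raise ValueError("num_history_shards must be at least 1")
    # CRC32 is a stand-in hash for illustration only (assumption).
    digest = zlib.crc32(workflow_id.encode("utf-8"))
    return digest % num_history_shards + 1


if __name__ == "__main__":
    # Compare the shard assignment under two different shard counts.
    for workflow_id in ("order-1001", "order-1002", "payment-77"):
        print(
            workflow_id,
            shard_for_workflow(workflow_id, 4),
            shard_for_workflow(workflow_id, 4096),
        )
```

The mapping is deterministic for a fixed shard count, which is what lets any node locate the shard that owns a workflow.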
Cassandra Configuration
Connection Settings
persistence:
  datastores:
    default:
      cassandra:
        hosts: "cassandra-0,cassandra-1,cassandra-2"
        port: 9042
        user: "temporal_user"
        password: "${CASSANDRA_PASSWORD}"
        keyspace: "temporal"
        datacenter: "datacenter1"
        maxConns: 20
        connectTimeout: "600ms"
        timeout: "10s"
        writeTimeout: "10s"
Connection Pool:
- maxConns - Maximum connections per host (default: 2)
- Recommended: 10-20 for high throughput
- Total connections = maxConns × number of hosts × number of history nodes
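The connection total above is a simple product, shown here as a short Python sketch. The function name and the example figures (3 Cassandra hosts, 16 history nodes) are illustrative assumptions, not values from a real deployment.

```python
def total_cassandra_connections(
    max_conns: int, cassandra_hosts: int, history_nodes: int
) -> int:
    """Total connections = maxConns x number of hosts x number of history nodes."""
    return max_conns * cassandra_hosts * history_nodes


if __name__ == "__main__":
    # maxConns: 20, three Cassandra hosts, sixteen history nodes (example values).
    print(total_cassandra_connections(20, 3, 16))
```

Comparing the result against the connection capacity of the Cassandra cluster helps when choosing a value for maxConns.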
Consistency Configuration
persistence:
  datastores:
    default:
      cassandra:
        consistency:
          default:
            consistency: "LOCAL_QUORUM" # Read/write consistency
            serialConsistency: "LOCAL_SERIAL" # Serial consistency
Consistency Levels:
- LOCAL_QUORUM - Majority of replicas in local datacenter (recommended)
- QUORUM - Majority across all datacenters
- ONE - Single replica (not recommended)
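A "majority of replicas" follows the usual quorum arithmetic: floor(replication factor / 2) + 1. The Python sketch below works through that calculation; the function names are hypothetical and the code only illustrates the arithmetic, not any Temporal or Cassandra driver API.

```python
def quorum_size(replication_factor: int) -> int:
    """Number of replicas that must acknowledge a quorum read or write."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while a quorum is still reachable."""
    return replication_factor - quorum_size(replication_factor)


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(rf, quorum_size(rf), tolerated_replica_failures(rf))
```

With a replication factor of 3, a quorum is 2 replicas, so one replica in the local datacenter can be down without failing LOCAL_QUORUM operations.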
TLS Configuration
persistence:
  datastores:
    default:
      cassandra:
        tls:
          enabled: true
          certFile: "/path/to/client-cert.pem"
          keyFile: "/path/to/client-key.pem"
          caFile: "/path/to/ca-cert.pem"
          enableHostVerification: true
          serverName: "cassandra.example.com"
Address Translation
For environments where Cassandra returns non-routable IPs:
persistence:
  datastores:
    default:
      cassandra:
        addressTranslator:
          translator: "aws" # or "gcp", "azure"
          options:
            region: "us-east-1"
Cassandra Best Practices
- Replication Factor: 3 minimum for production
- Compaction Strategy: LeveledCompactionStrategy for temporal tables
- Read Repair: Disabled for better performance
- Monitoring: Track read/write latency, compaction lag
- Separate Clusters: Use different clusters for default and visibility
SQL Configuration (PostgreSQL/MySQL)
PostgreSQL
persistence:
  datastores:
    default:
      sql:
        pluginName: "postgres12" # or "postgres12_pgx"
        databaseName: "temporal"
        connectAddr: "postgres.example.com:5432"
        connectProtocol: "tcp"
        user: "temporal_user"
        password: "${DB_PASSWORD}"
        maxConns: 100
        maxIdleConns: 20
        maxConnLifetime: "1h"
        connectAttributes:
          sslmode: "require"
MySQL
persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql8"
        databaseName: "temporal"
        connectAddr: "mysql.example.com:3306"
        connectProtocol: "tcp"
        user: "temporal_user"
        password: "${DB_PASSWORD}"
        maxConns: 100
        maxIdleConns: 20
        maxConnLifetime: "1h"
        connectAttributes:
          tx_isolation: "READ-COMMITTED"
          parseTime: "true"
Connection Pool Tuning
persistence:
  datastores:
    default:
      sql:
        maxConns: 100 # Maximum open connections
        maxIdleConns: 20 # Maximum idle connections
        maxConnLifetime: "1h" # Connection lifetime
Pool Sizing:
maxConns = (numHistoryShards / numHistoryNodes) × 2-3
For 4096 shards across 16 history nodes:
maxConns = (4096 / 16) × 2.5 = 640
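The sizing rule can be written as a small Python helper. This is a sketch of the arithmetic only; the function name sql_max_conns is hypothetical, and rounding up to a whole connection is an assumption, since the formula above does not say how to round.

```python
import math


def sql_max_conns(
    num_history_shards: int, num_history_nodes: int, multiplier: float = 2.5
) -> int:
    """maxConns = (numHistoryShards / numHistoryNodes) x multiplier (2 to 3)."""
    if num_history_nodes < 1:
        raise ValueError("num_history_nodes must be at least 1")
    shards_per_node = num_history_shards / num_history_nodes
    # Round up so a fractional result never undersizes the pool (assumption).
    return math.ceil(shards_per_node * multiplier)


if __name__ == "__main__":
    # 4096 shards across 16 history nodes, as in the example above.
    for multiplier in (2, 2.5, 3):
        print(multiplier, sql_max_conns(4096, 16, multiplier))
```

The result should also be checked against the connection limit of the database server, because every history node opens its own pool.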
SQL TLS Configuration
PostgreSQL:
persistence:
  datastores:
    default:
      sql:
        connectAttributes:
          sslmode: "verify-full"
          sslcert: "/path/to/client-cert.pem"
          sslkey: "/path/to/client-key.pem"
          sslrootcert: "/path/to/ca-cert.pem"
MySQL:
persistence:
  datastores:
    default:
      sql:
        tls:
          enabled: true
          certFile: "/path/to/client-cert.pem"
          keyFile: "/path/to/client-key.pem"
          caFile: "/path/to/ca-cert.pem"
          enableHostVerification: true
          serverName: "mysql.example.com"
Vitess (MySQL Sharding)
For large-scale MySQL deployments:
persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql8"
        databaseName: "temporal"
        connectAddr: "vtgate.example.com:15306"
        taskScanPartitions: 4 # Number of Vitess shards
Visibility Store Configuration
Elasticsearch
persistence:
  visibilityStore: "elasticsearch-visibility"
  datastores:
    elasticsearch-visibility:
      elasticsearch:
        version: "v7" # or "v6", "v8"
        url: "http://elasticsearch:9200"
        username: "elastic"
        password: "${ES_PASSWORD}"
        indices:
          visibility: "temporal_visibility_v1"
        numShards: 5
        numReplicas: 1
        logLevel: "error"
Index Sharding:
- Start with 5 primary shards
- Increase to 10-20 for > 100M workflows
- Use 1-2 replicas for production
Dual Visibility
Run two visibility stores simultaneously:
persistence:
  visibilityStore: "elasticsearch-primary"
  secondaryVisibilityStore: "sql-secondary"
  datastores:
    elasticsearch-primary:
      elasticsearch:
        url: "http://elasticsearch:9200"
        indices:
          visibility: "temporal_visibility_v1"
    sql-secondary:
      sql:
        pluginName: "postgres12"
        databaseName: "temporal_visibility"
Useful for:
- Migration from one visibility store to another
- Comparing query results
- Fallback during maintenance
Schema Management
Initial Setup
Temporal provides schema files in the /schema directory:
Cassandra:
# Setup keyspace and schema
cassandra-tool \
  --ep 127.0.0.1 \
  --keyspace temporal \
  --replication-factor 3 \
  setup-schema \
  --version 1.0
# Setup visibility schema
cassandra-tool \
  --ep 127.0.0.1 \
  --keyspace temporal_visibility \
  --replication-factor 3 \
  setup-schema \
  --version 1.0
PostgreSQL:
# Setup database
temporal-sql-tool \
  --plugin postgres12 \
  --ep postgres.example.com \
  --db temporal \
  setup-schema \
  --version 1.0
# Setup visibility
temporal-sql-tool \
  --plugin postgres12 \
  --ep postgres.example.com \
  --db temporal_visibility \
  setup-schema \
  --version 1.0
MySQL:
# Setup database
temporal-sql-tool \
  --plugin mysql8 \
  --ep mysql.example.com \
  --db temporal \
  setup-schema \
  --version 1.0
# Setup visibility
temporal-sql-tool \
  --plugin mysql8 \
  --ep mysql.example.com \
  --db temporal_visibility \
  setup-schema \
  --version 1.0
Schema Updates
Upgrade to newer Temporal versions:
# Update the schema to the target version
temporal-sql-tool \
  --plugin postgres12 \
  --ep postgres.example.com \
  --db temporal \
  update-schema \
  --version 1.1
Schema Versioning
Temporal tracks the schema version in the database. The versioned schema files for each database live in:
- /schema/cassandra/temporal/versioned/ - Cassandra schemas
- /schema/postgresql/v12/temporal/versioned/ - PostgreSQL schemas
- /schema/mysql/v8/temporal/versioned/ - MySQL schemas
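When planning an upgrade, it helps to know which versioned schema steps sit between the version recorded in the database and the latest one on disk. The Python sketch below orders version labels numerically and lists the pending ones. It assumes version labels of the form v1.0, v1.1, and so on; that naming, the function names, and the sample list are illustrative assumptions rather than a description of the schema tool.

```python
from typing import List, Tuple


def parse_version(label: str) -> Tuple[int, ...]:
    """Turn 'v1.10' or '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in label.lstrip("v").split("."))


def pending_versions(available: List[str], current: str) -> List[str]:
    """Return the versions newer than `current`, oldest first."""
    current_key = parse_version(current)
    newer = [label for label in available if parse_version(label) > current_key]
    return sorted(newer, key=parse_version)


if __name__ == "__main__":
    # Hypothetical version labels; a plain string sort would misplace v1.10.
    available = ["v1.10", "v1.2", "v1.0", "v1.9"]
    print(pending_versions(available, "1.2"))
```

Numeric comparison matters here because a string sort places v1.10 before v1.2.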
Persistence Metrics
Operation Metrics
All persistence operations emit metrics:
# Format: Persistence{Operation}
PersistenceGetWorkflowExecution
PersistenceUpdateWorkflowExecution
PersistenceCreateWorkflowExecution
PersistenceDeleteWorkflowExecution
Each emits:
- Request count
- Error count
- Latency histogram
- Tagged with db_kind
Monitoring Query
# P99 latency for workflow updates
histogram_quantile(0.99,
  rate(PersistenceUpdateWorkflowExecution_latency_bucket[5m])
)
# Error rate
rate(PersistenceUpdateWorkflowExecution_errors[5m]) /
rate(PersistenceUpdateWorkflowExecution_requests[5m])
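The error-rate query divides one per-second counter rate by another. The Python sketch below reproduces that arithmetic from two counter samples taken at the start and end of a window. It is a simplified illustration with hypothetical function names and sample values, and it ignores counter resets, which a real monitoring system has to handle.

```python
def counter_rate(first: float, last: float, window_seconds: float) -> float:
    """Average per-second increase of a counter over a window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (last - first) / window_seconds


def error_ratio(
    errors_first: float,
    errors_last: float,
    requests_first: float,
    requests_last: float,
    window_seconds: float = 300.0,
) -> float:
    """Fraction of requests that failed during the window."""
    request_rate = counter_rate(requests_first, requests_last, window_seconds)
    if request_rate <= 0:
        return 0.0  # No traffic in the window, so no meaningful ratio.
    error_rate = counter_rate(errors_first, errors_last, window_seconds)
    return error_rate / request_rate


if __name__ == "__main__":
    # Example samples: errors 10 -> 25, requests 1000 -> 4000 over 5 minutes.
    print(error_ratio(10, 25, 1000, 4000))
```

Because both rates use the same window, the window length cancels out and the ratio depends only on the counter deltas.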
Critical Metrics
- Shard Operations
  - GetOrCreateShard - Should be fast (< 10ms)
  - UpdateShard - Latency impacts failover
- Workflow Operations
  - UpdateWorkflowExecution - Most frequent, optimize heavily
  - CreateWorkflowExecution - Directly affects start rate
- Task Operations
  - GetTransferTasks - Affects task dispatch latency
  - GetTimerTasks - Affects timer firing accuracy
Data Retention
Workflow Retention
Set retention per namespace:
tctl namespace register \
  --namespace production \
  --retention 30 # Days
Or update existing:
tctl namespace update \
  --namespace production \
  --retention 7
Retention Behavior:
- Applies to closed workflows only
- History deleted after retention period
- Visibility records removed
- Does not affect running workflows
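The retention rules above can be expressed as a small Python check. This is a sketch of the described behavior under stated assumptions, not Temporal's implementation: the function names are hypothetical, and a running workflow is modeled as having no close time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def deletion_time(
    close_time: Optional[datetime], retention_days: int
) -> Optional[datetime]:
    """When a workflow becomes eligible for deletion, or None if still running."""
    if close_time is None:
        return None  # Retention applies to closed workflows only.
    return close_time + timedelta(days=retention_days)


def is_eligible_for_deletion(
    close_time: Optional[datetime], retention_days: int, now: datetime
) -> bool:
    """True once the retention period has elapsed since the workflow closed."""
    eligible_at = deletion_time(close_time, retention_days)
    return eligible_at is not None and now >= eligible_at


if __name__ == "__main__":
    closed = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 20, tzinfo=timezone.utc)
    print(is_eligible_for_deletion(closed, 7, now))   # 7-day retention
    print(is_eligible_for_deletion(closed, 30, now))  # 30-day retention
    print(is_eligible_for_deletion(None, 7, now))     # still running
```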
Database Cleanup
Cassandra:
- Uses TTL on history tables
- Automatic compaction removes expired data
- No manual cleanup needed
SQL Databases:
- History scavenger deletes old records
- Runs as system workflow
- Configure via dynamic config:
worker.executionsScannerEnabled:
  - value: true
    constraints: {}
worker.executionsScannerConcurrency:
  - value: 5
    constraints: {}
Backup and Recovery
Cassandra Backup
# Take snapshot
nodetool snapshot temporal
nodetool snapshot temporal_visibility
# List snapshots
nodetool listsnapshots
# Clear old snapshots
nodetool clearsnapshot -t snapshot_name
PostgreSQL Backup
# Logical backup
pg_dump temporal > temporal_backup.sql
pg_dump temporal_visibility > temporal_visibility_backup.sql
# Physical base backup (the starting point for point-in-time recovery with WAL archiving)
pg_basebackup -D /var/lib/postgresql/backup
MySQL Backup
# Using mysqldump
mysqldump temporal > temporal_backup.sql
mysqldump temporal_visibility > temporal_visibility_backup.sql
# Using Percona XtraBackup
xtrabackup --backup --target-dir=/backup/
Recovery Considerations
- Consistency: Backup all datastores simultaneously
- Downtime: Stop Temporal services during restore
- Testing: Regularly test restore procedures
- Point-in-Time: Use transaction logs for precise recovery
Troubleshooting
High Latency
Symptoms:
- Persistence metrics show high p99 latency
- Workflow operations slow
Solutions:
- Check database server metrics (CPU, I/O)
- Review query execution plans
- Verify connection pool not exhausted
- Check network latency to database
- Add read replicas (not recommended for writes)
Connection Pool Exhaustion
Symptoms:
- connection refused errors
- too many connections errors
Solutions:
- Increase maxConns in config
- Add more history nodes to distribute load
- Increase database connection limits
- Check for connection leaks
Data Inconsistency
Symptoms:
- Workflow state does not match the expected state
- Missing history events
Solutions:
- Verify consistency settings (Cassandra)
- Check for split-brain scenarios
- Review replication lag
- Verify no partial failures during writes
Schema Version Mismatch
Symptoms:
- schema version mismatch errors
- Server fails to start
Solutions:
- Check schema version: SELECT * FROM schema_version;
- Run schema update tool
- Ensure all nodes use same version
- Review schema update logs
Performance Tuning
Cassandra
- Compaction: Use LeveledCompactionStrategy
- Caching: Enable row cache for small workflows
- GC: Tune JVM for low pause times
- Replication: Use LOCAL_QUORUM for better performance
PostgreSQL
- Indexes: Ensure all indexes are healthy
- VACUUM: Keep autovacuum enabled and tuned for high-churn tables
- Shared Buffers: Set to 25% of RAM
- Work Memory: Increase for large queries
- Connection Pooling: Use pgBouncer
MySQL
- InnoDB Buffer Pool: Set to 70-80% of RAM
- Binary Logging: Use ROW format
- Query Cache: Disable on MySQL 5.7 (the query cache was removed in 8.0)
- Connection Pooling: Use ProxySQL