# Consensus State Machine
A generic, extensible state machine implementation for building Byzantine Fault Tolerant (BFT) consensus protocols. This library provides a framework for implementing round-based consensus algorithms with cryptographic proofs.
## Overview
The state machine manages consensus engine state transitions through a well-defined set of states and events. It supports generic type parameters to allow different implementations of state data, votes, peer identities, and collected mutations.
## Features
- Generic Implementation: Supports custom types for state data, votes, peer IDs, and collected data
- Byzantine Fault Tolerance: Provides BFT consensus tolerating fewer than 1/3 Byzantine nodes, and is flexible enough to accommodate other probabilistic BFT variants
- Round-based Consensus: Implements a round-based state transition pattern
- Pluggable Providers: Extensible through provider interfaces for different consensus behaviors
- Event-driven Architecture: State transitions triggered by events with optional guard conditions
- Concurrent Safe: Thread-safe implementation with proper mutex usage
- Timeout Support: Configurable timeouts for each state with automatic transitions
- Transition Listeners: Observable state transitions for monitoring and debugging
## Core Concepts
### States
The state machine progresses through the following states:
- StateStopped: Initial state, engine is not running
- StateStarting: Engine is initializing
- StateLoading: Loading data and syncing with network
- StateCollecting: Collecting data/mutations for consensus round
- StateLivenessCheck: Checking peer liveness before proving
- StateProving: Generating cryptographic proof (leader only)
- StatePublishing: Publishing proposed state
- StateVoting: Voting on proposals
- StateFinalizing: Finalizing consensus round
- StateVerifying: Verifying and publishing results
- StateStopping: Engine is shutting down
### Events
Events trigger state transitions:
- EventStart, EventStop: Lifecycle events
- EventSyncComplete: Synchronization finished
- EventCollectionDone: Mutation collection complete
- EventLivenessCheckReceived: Peer liveness confirmed
- EventProverSignal: Leader selection complete
- EventProofComplete: Proof generation finished
- EventProposalReceived: New proposal received
- EventVoteReceived: Vote received
- EventQuorumReached: Voting quorum achieved
- EventConfirmationReceived: State confirmation received
- And more...
### Type Constraints
All generic type parameters must implement the Unique interface:
```go
type Unique interface {
	Identity() Identity // Returns a unique string identifier
}
```
## Provider Interfaces
### SyncProvider
Handles initial state synchronization:
```go
type SyncProvider[StateT Unique] interface {
	Synchronize(
		existing *StateT,
		ctx context.Context,
	) (<-chan *StateT, <-chan error)
}
```
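For orientation, here is a minimal sketch of a SyncProvider, assuming the MyState type from the Usage section below. It performs no network sync and simply hands the existing state back; the channel-closing behavior is an assumption about how the state machine drains the results, not a documented contract.
```go
// MySyncProvider is a no-op synchronizer: it reports the prior state as
// already current. A real implementation would stream states fetched from
// peers on the state channel and report failures on the error channel.
type MySyncProvider struct{}

func (p *MySyncProvider) Synchronize(
	existing *MyState,
	ctx context.Context,
) (<-chan *MyState, <-chan error) {
	states := make(chan *MyState, 1)
	errs := make(chan error, 1)

	go func() {
		defer close(states)
		defer close(errs)

		select {
		case <-ctx.Done():
			errs <- ctx.Err()
		case states <- existing:
			// Nothing newer to offer; return the prior state unchanged.
		}
	}()

	return states, errs
}
```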
### VotingProvider
Manages the voting process:
```go
type VotingProvider[StateT Unique, VoteT Unique, PeerIDT Unique] interface {
	SendProposal(proposal *StateT, ctx context.Context) error
	DecideAndSendVote(
		proposals map[Identity]*StateT,
		ctx context.Context,
	) (PeerIDT, *VoteT, error)
	IsQuorum(votes map[Identity]*VoteT, ctx context.Context) (bool, error)
	FinalizeVotes(
		proposals map[Identity]*StateT,
		votes map[Identity]*VoteT,
		ctx context.Context,
	) (*StateT, PeerIDT, error)
	SendConfirmation(finalized *StateT, ctx context.Context) error
}
```
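A hedged sketch of a VotingProvider over the MyState, MyVote, and MyPeerID types from the Usage section below, assuming the package is imported as consensus and Identity is its map-key type. The transport calls are stubbed, the returned peer IDs are simply this node's own ID, and the 2f+1 quorum rule and "highest round wins" selection are illustrative assumptions rather than a policy prescribed by the library.
```go
// MyVotingProvider votes for the proposal with the highest round and treats
// 2f+1 votes as a quorum; network sends are left as stubs.
type MyVotingProvider struct {
	self      MyPeerID
	peerCount int // assumed fixed-size network of 3f+1 peers
}

// pickBest returns the proposal with the highest round number, or nil.
func pickBest(proposals map[consensus.Identity]*MyState) *MyState {
	var best *MyState
	for _, s := range proposals {
		if best == nil || s.Round > best.Round {
			best = s
		}
	}
	return best
}

func (p *MyVotingProvider) SendProposal(proposal *MyState, ctx context.Context) error {
	// Broadcast the proposal to peers here.
	return nil
}

func (p *MyVotingProvider) DecideAndSendVote(
	proposals map[consensus.Identity]*MyState,
	ctx context.Context,
) (MyPeerID, *MyVote, error) {
	if pickBest(proposals) == nil {
		return p.self, nil, errors.New("no proposals to vote on")
	}
	// Broadcast the vote to peers here.
	return p.self, &MyVote{Voter: p.self.ID, Value: true}, nil
}

func (p *MyVotingProvider) IsQuorum(
	votes map[consensus.Identity]*MyVote,
	ctx context.Context,
) (bool, error) {
	f := (p.peerCount - 1) / 3
	return len(votes) >= 2*f+1, nil
}

func (p *MyVotingProvider) FinalizeVotes(
	proposals map[consensus.Identity]*MyState,
	votes map[consensus.Identity]*MyVote,
	ctx context.Context,
) (*MyState, MyPeerID, error) {
	best := pickBest(proposals)
	if best == nil {
		return nil, p.self, errors.New("nothing to finalize")
	}
	return best, p.self, nil
}

func (p *MyVotingProvider) SendConfirmation(finalized *MyState, ctx context.Context) error {
	// Broadcast the confirmation to peers here.
	return nil
}
```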
### LeaderProvider
Handles leader selection and proof generation:
```go
type LeaderProvider[
	StateT Unique,
	PeerIDT Unique,
	CollectedT Unique,
] interface {
	GetNextLeaders(prior *StateT, ctx context.Context) ([]PeerIDT, error)
	ProveNextState(
		prior *StateT,
		collected CollectedT,
		ctx context.Context,
	) (*StateT, error)
}
```
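As a rough illustration, a LeaderProvider sketch over the MyState, MyPeerID, and MyCollected types from the Usage section below (uses crypto/sha256 and encoding/hex). The round-robin rotation and the hash chaining in ProveNextState are stand-ins for whatever verifiable leader election and proof system a real deployment would use.
```go
// MyLeaderProvider rotates leadership round-robin over a static peer set and
// "proves" the next state by hashing the prior hash with the collected data.
type MyLeaderProvider struct {
	peers []MyPeerID // assumed identical ordering on every node
}

func (p *MyLeaderProvider) GetNextLeaders(
	prior *MyState,
	ctx context.Context,
) ([]MyPeerID, error) {
	if len(p.peers) == 0 {
		return nil, errors.New("no peers configured")
	}
	var round uint64
	if prior != nil {
		round = prior.Round
	}
	return []MyPeerID{p.peers[round%uint64(len(p.peers))]}, nil
}

func (p *MyLeaderProvider) ProveNextState(
	prior *MyState,
	collected MyCollected,
	ctx context.Context,
) (*MyState, error) {
	var round uint64
	prevHash := ""
	if prior != nil {
		round = prior.Round
		prevHash = prior.Hash
	}
	sum := sha256.Sum256(append([]byte(prevHash), collected.Data...))
	return &MyState{Round: round + 1, Hash: hex.EncodeToString(sum[:])}, nil
}
```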
### LivenessProvider
Manages peer liveness checks:
```go
type LivenessProvider[
	StateT Unique,
	PeerIDT Unique,
	CollectedT Unique,
] interface {
	Collect(ctx context.Context) (CollectedT, error)
	SendLiveness(prior *StateT, collected CollectedT, ctx context.Context) error
}
```
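And a matching LivenessProvider sketch, again assuming the MyState and MyCollected types from the Usage section below; the pending callback is a hypothetical hook for whatever locally queued data the node wants included this round, and the broadcast is stubbed.
```go
// MyLivenessProvider gathers this node's pending data for the round and
// announces liveness to peers (the broadcast itself is omitted here).
type MyLivenessProvider struct {
	pending func() []byte // hypothetical source of locally queued mutations
}

func (p *MyLivenessProvider) Collect(ctx context.Context) (MyCollected, error) {
	if err := ctx.Err(); err != nil {
		return MyCollected{}, err
	}
	return MyCollected{Data: p.pending()}, nil
}

func (p *MyLivenessProvider) SendLiveness(
	prior *MyState,
	collected MyCollected,
	ctx context.Context,
) error {
	// Broadcast a liveness check carrying the collected data to peers here.
	return nil
}
```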
## Usage
### Basic Setup
```go
// Define your types implementing Unique
type MyState struct {
	Round uint64
	Hash  string
}

func (s MyState) Identity() string { return s.Hash }

type MyVote struct {
	Voter string
	Value bool
}

func (v MyVote) Identity() string { return v.Voter }

type MyPeerID struct {
	ID string
}

func (p MyPeerID) Identity() string { return p.ID }

type MyCollected struct {
	Data []byte
}

func (c MyCollected) Identity() string { return string(c.Data) }

// Implement providers
syncProvider := &MySyncProvider{}
votingProvider := &MyVotingProvider{}
leaderProvider := &MyLeaderProvider{}
livenessProvider := &MyLivenessProvider{}

// Create state machine
sm := consensus.NewStateMachine[MyState, MyVote, MyPeerID, MyCollected](
	MyPeerID{ID: "node1"},               // This node's ID
	&MyState{Round: 0, Hash: "genesis"}, // Initial state
	true,                                // shouldEmitReceiveEventsOnSends
	3,                                   // minimumProvers
	syncProvider,
	votingProvider,
	leaderProvider,
	livenessProvider,
	nil, // Optional trace logger
)

// Add transition listener
sm.AddListener(&MyTransitionListener{})

// Start the state machine
if err := sm.Start(); err != nil {
	log.Fatal(err)
}

// Receive external events
sm.ReceiveProposal(peer, proposal)
sm.ReceiveVote(voter, vote)
sm.ReceiveLivenessCheck(peer, collected)
sm.ReceiveConfirmation(peer, confirmation)

// Stop the state machine
if err := sm.Stop(); err != nil {
	log.Fatal(err)
}
```
### Implementing Providers
See example/generic_consensus_example.go for a complete working example with mock provider implementations.
## State Flow
The typical consensus flow:
- Start → Starting → Loading
- Loading: Synchronize with network
- Collecting: Gather mutations/changes
- LivenessCheck: Verify peer availability
- Proving: Leader generates proof
- Publishing: Leader publishes proposal
- Voting: All nodes vote on proposals
- Finalizing: Aggregate votes and determine outcome
- Verifying: Confirm and apply state changes
- Loop back to Collecting for next round
## Configuration
### Constructor Parameters
- id: This node's peer ID
- initialState: Starting state (can be nil)
- shouldEmitReceiveEventsOnSends: Whether to emit receive events for own messages
- minimumProvers: Minimum number of active provers required
- traceLogger: Optional logger for debugging state transitions
### State Timeouts
Each state can have a configured timeout that triggers an automatic transition:
- Starting: 1 second → EventInitComplete
- Loading: 10 minutes → EventSyncComplete
- Collecting: 1 second → EventCollectionDone
- LivenessCheck: 1 second → EventLivenessTimeout
- Proving: 120 seconds → EventPublishTimeout
- Publishing: 1 second → EventPublishTimeout
- Voting: 10 seconds → EventVotingTimeout
- Finalizing: 1 second → EventAggregationDone
- Verifying: 1 second → EventVerificationDone
- Stopping: 30 seconds → EventCleanupComplete
## Thread Safety
The state machine is thread-safe. All public methods properly handle concurrent access through mutex locks. State behaviors run in separate goroutines with proper cancellation support.
## Error Handling
- Provider errors are logged but don't crash the state machine
- The state machine continues operating and may retry operations
- Critical errors during state transitions are returned to callers
- Use the TraceLogger interface for debugging
## Best Practices
- Message Isolation: When implementing providers, always deep-copy data before sending, so the state machine and other handlers never share mutable state (see the sketch after this list)
- Nil Handling: Provider implementations should handle nil prior states gracefully
- Context Usage: Respect context cancellation in long-running operations
- Quorum Size: Set appropriate quorum size based on your network (typically 2f+1 for f failures)
- Timeout Configuration: Adjust timeouts based on network conditions and proof generation time
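To make the message-isolation point concrete, here is a small hedged sketch; cloneState and broadcastProposal are hypothetical helpers, and MyState is the type from the Usage section above.
```go
// cloneState returns an independent copy of s so the state machine's copy
// cannot be mutated by downstream handlers (and vice versa).
func cloneState(s *MyState) *MyState {
	if s == nil {
		return nil
	}
	c := *s // MyState holds only value fields, so a shallow copy suffices here
	return &c
}

// broadcastProposal hands a copy of the proposal to the network layer.
func broadcastProposal(ctx context.Context, proposal *MyState) error {
	msg := cloneState(proposal)
	// Pass msg (not proposal) to the transport / other goroutines here.
	_ = msg
	return nil
}
```
For state types holding slices or maps, the copy would need to duplicate those fields as well.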
## Example
See example/generic_consensus_example.go for a complete working example
demonstrating:
- Mock provider implementations
- Multi-node consensus network
- Byzantine node behavior
- Message passing between nodes
- State transition monitoring
## Testing
The package includes comprehensive tests in state_machine_test.go covering:
- State transitions
- Event handling
- Concurrent operations
- Byzantine scenarios
- Timeout behavior