mirror of
https://github.com/QuilibriumNetwork/ceremonyclient.git
synced 2026-02-22 19:07:26 +08:00
301 lines
8.7 KiB
Markdown
301 lines
8.7 KiB
Markdown
# Consensus State Machine
|
|
|
|
A generic, extensible state machine implementation for building Byzantine Fault
|
|
Tolerant (BFT) consensus protocols. This library provides a framework for
|
|
implementing round-based consensus algorithms with cryptographic proofs.
|
|
|
|
## Overview
|
|
|
|
The state machine manages consensus engine state transitions through a
|
|
well-defined set of states and events. It supports generic type parameters to
|
|
allow different implementations of state data, votes, peer identities, and
|
|
collected mutations.
|
|
|
|
## Features
|
|
|
|
- **Generic Implementation**: Supports custom types for state data, votes, peer
|
|
IDs, and collected data
|
|
- **Byzantine Fault Tolerance**: Provides BFT consensus with < 1/3 byzantine
|
|
nodes, flexible to other probabilistic BFT implementations
|
|
- **Round-based Consensus**: Implements a round-based state transition pattern
|
|
- **Pluggable Providers**: Extensible through provider interfaces for different
|
|
consensus behaviors
|
|
- **Event-driven Architecture**: State transitions triggered by events with
|
|
optional guard conditions
|
|
- **Concurrent Safe**: Thread-safe implementation with proper mutex usage
|
|
- **Timeout Support**: Configurable timeouts for each state with automatic
|
|
transitions
|
|
- **Transition Listeners**: Observable state transitions for monitoring and
|
|
debugging
|
|
|
|
## Core Concepts
|
|
|
|
### States
|
|
|
|
The state machine progresses through the following states:
|
|
|
|
1. **StateStopped**: Initial state, engine is not running
|
|
2. **StateStarting**: Engine is initializing
|
|
3. **StateLoading**: Loading data and syncing with network
|
|
4. **StateCollecting**: Collecting data/mutations for consensus round
|
|
5. **StateLivenessCheck**: Checking peer liveness before proving
|
|
6. **StateProving**: Generating cryptographic proof (leader only)
|
|
7. **StatePublishing**: Publishing proposed state
|
|
8. **StateVoting**: Voting on proposals
|
|
9. **StateFinalizing**: Finalizing consensus round
|
|
10. **StateVerifying**: Verifying and publishing results
|
|
11. **StateStopping**: Engine is shutting down
|
|
|
|
### Events
|
|
|
|
Events trigger state transitions:
|
|
- `EventStart`, `EventStop`: Lifecycle events
|
|
- `EventSyncComplete`: Synchronization finished
|
|
- `EventCollectionDone`: Mutation collection complete
|
|
- `EventLivenessCheckReceived`: Peer liveness confirmed
|
|
- `EventProverSignal`: Leader selection complete
|
|
- `EventProofComplete`: Proof generation finished
|
|
- `EventProposalReceived`: New proposal received
|
|
- `EventVoteReceived`: Vote received
|
|
- `EventQuorumReached`: Voting quorum achieved
|
|
- `EventConfirmationReceived`: State confirmation received
|
|
- And more...
|
|
|
|
### Type Constraints
|
|
|
|
All generic type parameters must implement the `Unique` interface:
|
|
|
|
```go
|
|
type Unique interface {
|
|
Identity() Identity // Returns a unique string identifier
|
|
}
|
|
```
|
|
|
|
## Provider Interfaces
|
|
|
|
### SyncProvider
|
|
|
|
Handles initial state synchronization:
|
|
|
|
```go
|
|
type SyncProvider[StateT Unique] interface {
|
|
Synchronize(
|
|
existing *StateT,
|
|
ctx context.Context,
|
|
) (<-chan *StateT, <-chan error)
|
|
}
|
|
```
|
|
|
|
### VotingProvider
|
|
|
|
Manages the voting process:
|
|
|
|
```go
|
|
type VotingProvider[StateT Unique, VoteT Unique, PeerIDT Unique] interface {
|
|
SendProposal(proposal *StateT, ctx context.Context) error
|
|
DecideAndSendVote(
|
|
proposals map[Identity]*StateT,
|
|
ctx context.Context,
|
|
) (PeerIDT, *VoteT, error)
|
|
IsQuorum(votes map[Identity]*VoteT, ctx context.Context) (bool, error)
|
|
FinalizeVotes(
|
|
proposals map[Identity]*StateT,
|
|
votes map[Identity]*VoteT,
|
|
ctx context.Context,
|
|
) (*StateT, PeerIDT, error)
|
|
SendConfirmation(finalized *StateT, ctx context.Context) error
|
|
}
|
|
```
|
|
|
|
### LeaderProvider
|
|
|
|
Handles leader selection and proof generation:
|
|
|
|
```go
|
|
type LeaderProvider[
|
|
StateT Unique,
|
|
PeerIDT Unique,
|
|
CollectedT Unique,
|
|
] interface {
|
|
GetNextLeaders(prior *StateT, ctx context.Context) ([]PeerIDT, error)
|
|
ProveNextState(
|
|
prior *StateT,
|
|
collected CollectedT,
|
|
ctx context.Context,
|
|
) (*StateT, error)
|
|
}
|
|
```
|
|
|
|
### LivenessProvider
|
|
|
|
Manages peer liveness checks:
|
|
|
|
```go
|
|
type LivenessProvider[
|
|
StateT Unique,
|
|
PeerIDT Unique,
|
|
CollectedT Unique,
|
|
] interface {
|
|
Collect(ctx context.Context) (CollectedT, error)
|
|
SendLiveness(prior *StateT, collected CollectedT, ctx context.Context) error
|
|
}
|
|
```
|
|
|
|
## Usage
|
|
|
|
### Basic Setup
|
|
|
|
```go
|
|
// Define your types implementing Unique
|
|
type MyState struct {
|
|
Round uint64
|
|
Hash string
|
|
}
|
|
func (s MyState) Identity() string { return s.Hash }
|
|
|
|
type MyVote struct {
|
|
Voter string
|
|
Value bool
|
|
}
|
|
func (v MyVote) Identity() string { return v.Voter }
|
|
|
|
type MyPeerID struct {
|
|
ID string
|
|
}
|
|
func (p MyPeerID) Identity() string { return p.ID }
|
|
|
|
type MyCollected struct {
|
|
Data []byte
|
|
}
|
|
func (c MyCollected) Identity() string { return string(c.Data) }
|
|
|
|
// Implement providers
|
|
syncProvider := &MySyncProvider{}
|
|
votingProvider := &MyVotingProvider{}
|
|
leaderProvider := &MyLeaderProvider{}
|
|
livenessProvider := &MyLivenessProvider{}
|
|
|
|
// Create state machine
|
|
sm := consensus.NewStateMachine[MyState, MyVote, MyPeerID, MyCollected](
|
|
MyPeerID{ID: "node1"}, // This node's ID
|
|
&MyState{Round: 0, Hash: "genesis"}, // Initial state
|
|
true, // shouldEmitReceiveEventsOnSends
|
|
3, // minimumProvers
|
|
syncProvider,
|
|
votingProvider,
|
|
leaderProvider,
|
|
livenessProvider,
|
|
nil, // Optional trace logger
|
|
)
|
|
|
|
// Add transition listener
|
|
sm.AddListener(&MyTransitionListener{})
|
|
|
|
// Start the state machine
|
|
if err := sm.Start(); err != nil {
|
|
log.Fatal(err)
|
|
}
|
|
|
|
// Receive external events
|
|
sm.ReceiveProposal(peer, proposal)
|
|
sm.ReceiveVote(voter, vote)
|
|
sm.ReceiveLivenessCheck(peer, collected)
|
|
sm.ReceiveConfirmation(peer, confirmation)
|
|
|
|
// Stop the state machine
|
|
if err := sm.Stop(); err != nil {
|
|
log.Fatal(err)
|
|
}
|
|
```
|
|
|
|
### Implementing Providers
|
|
|
|
See the `example/generic_consensus_example.go` for a complete working example
|
|
with mock provider implementations.
|
|
|
|
## State Flow
|
|
|
|
The typical consensus flow:
|
|
|
|
1. **Start** → **Starting** → **Loading**
|
|
2. **Loading**: Synchronize with network
|
|
3. **Collecting**: Gather mutations/changes
|
|
4. **LivenessCheck**: Verify peer availability
|
|
5. **Proving**: Leader generates proof
|
|
6. **Publishing**: Leader publishes proposal
|
|
7. **Voting**: All nodes vote on proposals
|
|
8. **Finalizing**: Aggregate votes and determine outcome
|
|
9. **Verifying**: Confirm and apply state changes
|
|
10. Loop back to **Collecting** for next round
|
|
|
|
## Configuration
|
|
|
|
### Constructor Parameters
|
|
|
|
- `id`: This node's peer ID
|
|
- `initialState`: Starting state (can be nil)
|
|
- `shouldEmitReceiveEventsOnSends`: Whether to emit receive events for own
|
|
messages
|
|
- `minimumProvers`: Minimum number of active provers required
|
|
- `traceLogger`: Optional logger for debugging state transitions
|
|
|
|
### State Timeouts
|
|
|
|
Each state can have a configured timeout that triggers an automatic transition:
|
|
|
|
- **Starting**: 1 second → `EventInitComplete`
|
|
- **Loading**: 10 minutes → `EventSyncComplete`
|
|
- **Collecting**: 1 second → `EventCollectionDone`
|
|
- **LivenessCheck**: 1 second → `EventLivenessTimeout`
|
|
- **Proving**: 120 seconds → `EventPublishTimeout`
|
|
- **Publishing**: 1 second → `EventPublishTimeout`
|
|
- **Voting**: 10 seconds → `EventVotingTimeout`
|
|
- **Finalizing**: 1 second → `EventAggregationDone`
|
|
- **Verifying**: 1 second → `EventVerificationDone`
|
|
- **Stopping**: 30 seconds → `EventCleanupComplete`
|
|
|
|
## Thread Safety
|
|
|
|
The state machine is thread-safe. All public methods properly handle concurrent
|
|
access through mutex locks. State behaviors run in separate goroutines with
|
|
proper cancellation support.
|
|
|
|
## Error Handling
|
|
|
|
- Provider errors are logged but don't crash the state machine
|
|
- The state machine continues operating and may retry operations
|
|
- Critical errors during state transitions are returned to callers
|
|
- Use the `TraceLogger` interface for debugging
|
|
|
|
## Best Practices
|
|
|
|
1. **Message Isolation**: When implementing providers, always deep-copy data
|
|
before sending to prevent shared state between state machine and other
|
|
handlers
|
|
2. **Nil Handling**: Provider implementations should handle nil prior states
|
|
gracefully
|
|
3. **Context Usage**: Respect context cancellation in long-running operations
|
|
4. **Quorum Size**: Set appropriate quorum size based on your network (typically
|
|
2f+1 for f failures)
|
|
5. **Timeout Configuration**: Adjust timeouts based on network conditions and
|
|
proof generation time
|
|
|
|
## Example
|
|
|
|
See `example/generic_consensus_example.go` for a complete working example
|
|
demonstrating:
|
|
- Mock provider implementations
|
|
- Multi-node consensus network
|
|
- Byzantine node behavior
|
|
- Message passing between nodes
|
|
- State transition monitoring
|
|
|
|
## Testing
|
|
|
|
The package includes comprehensive tests in `state_machine_test.go` covering:
|
|
- State transitions
|
|
- Event handling
|
|
- Concurrent operations
|
|
- Byzantine scenarios
|
|
- Timeout behavior
|