# Overview
Faults are guaranteed to occur in any system, especially a distributed system that relies on an unreliable network. Fault tolerance is a system's ability to continue operating in the presence of these faults. It is achieved by designing for resilience, for example by using [[timeouts]] and retries with backoff during [[Exception Handling]].
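As an illustration of retries with backoff, here is a minimal sketch in Python. The function name `retry_with_backoff`, its parameters, and the choice of exponential backoff with full jitter are assumptions made for this example, not something prescribed by this note.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: only exceptions listed in retry_on are treated as
    transient faults; anything else propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the fault to the caller
            # Exponential backoff (base_delay, 2x, 4x, ...) capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random time in [0, ceiling] so that many
            # clients retrying at once do not hit the service in lockstep.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting. Retrying is only safe when the wrapped operation can be repeated without side effects, which is where [[idempotency]] comes in.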
# Key Considerations
## Levels of Fault Tolerance
- **[[Byzantine fault-tolerant]]** - the system continues to operate correctly even if some nodes are malfunctioning and not obeying the protocol, or if malicious attackers are interfering with the network.
# Implementation Details
## Fault Tolerance in [[Stream Processing]]
- [[Microbatching]]
- [[Checkpointing]]
- [[Database Transactions]]
- [[idempotency]]
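To show how two of these techniques fit together, the sketch below combines checkpointing with idempotent event application. Everything in it is hypothetical: the `Processor` class, the event shape (`id`, `key`, `amount`), and the in-memory state, which stands in for a durable checkpoint store and sink. A real stream processor would persist both.

```python
class Processor:
    """Toy stream processor: sums amounts per key from a list of events."""

    def __init__(self):
        self.checkpoint = 0       # offset of the next event to read (assumed durable)
        self.totals = {}          # sink state (assumed durable)
        self.applied_ids = set()  # ids already applied, used for idempotency

    def apply(self, event):
        # Idempotent write: replaying an already applied event is a no-op.
        if event["id"] in self.applied_ids:
            return
        self.totals[event["key"]] = self.totals.get(event["key"], 0) + event["amount"]
        self.applied_ids.add(event["id"])

    def run(self, events, checkpoint_every=2, crash_at=None):
        # Resume from the last checkpoint rather than from the beginning.
        for offset in range(self.checkpoint, len(events)):
            if offset == crash_at:
                raise RuntimeError("simulated crash")
            self.apply(events[offset])
            if (offset + 1) % checkpoint_every == 0:
                self.checkpoint = offset + 1
        self.checkpoint = len(events)


events = [
    {"id": "e1", "key": "a", "amount": 10},
    {"id": "e2", "key": "b", "amount": 5},
    {"id": "e3", "key": "a", "amount": 7},
    {"id": "e4", "key": "b", "amount": 1},
    {"id": "e5", "key": "a", "amount": 3},
]

processor = Processor()
try:
    processor.run(events, crash_at=3)  # e3 is applied but not yet checkpointed
except RuntimeError:
    pass
processor.run(events)                  # restart: e3 is replayed and skipped
```

After the restart, the processor re-reads `e3` because the last checkpoint was taken before it. The id check in `apply` keeps that replay from being counted twice, so the checkpoint only needs to bound how much work is repeated, not guarantee that nothing is.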
# Useful Links
# Related Topics
## Reference
#### Working Notes
#### Sources