Data Stream - Ryan Lynch's Hub

# Overview
A data stream is a continuous, real-time flow of data that can be captured, processed, and analyzed as it is generated. It is often used in situations where data enters a system as events via an [[Event Sourcing]] technique.

## General Use Cases for Data Streams #flashcard
- Process large amounts of data in real time
- Support complex processing scenarios (e.g., event sourcing)
- [[Pub-sub]] use cases, where multiple consumers read from the same stream

# Key Considerations
A data stream is often separate from the components that process it. These [[Stream Processing]] systems read from the stream to process its events.

## Consumer Groups
Streams allow for multiple consumer groups, so different consumers can independently read from the same stream.

## Windowing
An approach for grouping data in a stream based on time or count.

## Scalability
### Partitioning
Similar to databases, a stream can be partitioned across multiple servers to handle more events. This requires a [[Partitioning]] strategy that ensures related events are stored on the same partition.

# Implementation Details

# Useful Links
## Products
- [[Kafka]]
- [[AWS Kinesis]]
- [[Confluent]]

# Related Topics

## Reference
#### Working Notes
#### Sources
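#### Example Sketch
The ideas under Key Considerations (key-based partitioning, independent consumer groups, and time-based windowing) can be illustrated with a small in-memory toy. This is a minimal sketch, not how Kafka or Kinesis work internally: `InMemoryStream`, `partition_for`, and `tumbling_window_counts` are hypothetical names invented for this note, events are assumed to be `(key, timestamp, value)` tuples with integer timestamps in seconds, and hashing the key is just one possible partitioning strategy.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash, so related events
    (same key) always land on the same partition."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class InMemoryStream:
    """Toy stream: one append-only list per partition, plus a read
    offset per (consumer group, partition)."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]
        # Each consumer group tracks its own offsets, so groups read
        # the same events independently of one another.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def append(self, key: str, timestamp: int, value) -> int:
        """Store an event and return the partition it was written to."""
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, timestamp, value))
        return p

    def poll(self, group: str) -> list:
        """Return events this group has not read yet and advance its offsets."""
        group_offsets = self.offsets[group]
        events = []
        for p, log in enumerate(self.partitions):
            events.extend(log[group_offsets[p]:])
            group_offsets[p] = len(log)
        return events


def tumbling_window_counts(events, window_seconds: int) -> dict:
    """Group events into fixed, non-overlapping time windows and count them.
    Keys of the result are window start times."""
    counts = defaultdict(int)
    for _key, timestamp, _value in events:
        window_start = timestamp - (timestamp % window_seconds)
        counts[window_start] += 1
    return dict(counts)


stream = InMemoryStream(num_partitions=3)
for key, ts in [("user-1", 0), ("user-2", 5), ("user-1", 12), ("user-1", 61)]:
    stream.append(key, ts, {"action": "click"})

analytics_events = stream.poll("analytics")  # first group reads all 4 events
billing_events = stream.poll("billing")      # second group also reads all 4
window_counts = tumbling_window_counts(analytics_events, window_seconds=60)
```

Here both groups receive every event because each keeps its own offsets, and a second `stream.poll("analytics")` returns nothing until new events arrive. The windowing shown is a time-based tumbling window; a count-based window would instead close after a fixed number of events.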