# Overview
HyperLogLog is a... #flashcard
probabilistic data structure designed for counting the number of unique elements in a dataset (i.e., estimating [[cardinality]]).
This is done through the following process:
1. Each incoming data element is hashed to get a uniform binary string
2. Use the first `b` bits of the hash has a 'bucket index' and the remaining bits for the counting process
1. Using 2^b buckets helps improve the estimated counts by reducing variance
3. For each bucket, maintain a register that stores the maxiumum number of leading zeros (plus one) seen in the bucket
4. When a new element arrives, we only update if we see more leading zeros than what's currently stored in the register
5. Calculate the harmonic mean of the registers and apply some corrections for large and small ranges to determine an estimate
<!--ID: 1751507777615-->
![[2025-04-28_HyperLogLog (HLL)-1.png]]
# Key Considerations
# Pros
# Cons
- Does not provide exact counts
# Use Cases
A HLL works well when you need: #flashcard
- A count of unique items in a very large dataset
- Have limited memory available
- Can tolerate small estimation errors
<!--ID: 1751507777618-->
# Related Topics