Batch Processing - Ryan Lynch's Hub

# Overview
A batch processing system takes a large amount of input data, runs a job to process it, and produces some output data. Jobs often take a while (from a few minutes to several days), so there normally isn’t a user waiting for the job to finish. Instead, batch jobs are often scheduled to run periodically (for example, once a day). The primary performance measure of a batch job is usually [[Throughput]]: how much input data it processes per unit of time, or equivalently, how long it takes to crunch through an input dataset of a certain size.

# Key Considerations
In batch processing, data locality is very important. To achieve good throughput, a job should avoid making a network request for every record it processes (for example, querying a remote database once per input record), because per-record round trips are slow and can overload the remote service. This may require copying the data the job needs onto the system doing the batch processing (for example, a snapshot of the database stored alongside the input) so that it can be read locally.

# Implementation Details

# Useful Links

# Related Topics

## Reference

#### Working Notes

#### Sources

#### Related Topics
-
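A minimal Python sketch of the data-locality point from Key Considerations. Everything in it is hypothetical: `RemoteUserDB` stands in for a remote database, the 1 ms `LATENCY_S` per call is an assumed figure rather than a measurement, and the user and event data are made up. It enriches a batch of events in two ways. The first makes one remote lookup per record. The second fetches a single snapshot up front and joins in memory, which is a simple hash join that assumes the user table fits in memory. Both report throughput in records per second.

```python
import time
from typing import Callable, Dict, List, Tuple

# Assumed cost of one simulated network round trip (hypothetical figure).
LATENCY_S = 0.001

Event = Tuple[int, str]          # (user_id, url)
Enriched = Tuple[int, str, str]  # (user_id, url, user_name)


class RemoteUserDB:
    """Hypothetical stand-in for a remote database; counts every call made to it."""

    def __init__(self, users: Dict[int, str]) -> None:
        self._users = users
        self.calls = 0

    def get(self, user_id: int) -> str:
        # One simulated network round trip per lookup.
        self.calls += 1
        time.sleep(LATENCY_S)
        return self._users.get(user_id, "unknown")

    def snapshot(self) -> Dict[int, str]:
        # One bulk export, e.g. copied next to the batch input ahead of time.
        self.calls += 1
        time.sleep(LATENCY_S)
        return dict(self._users)


def enrich_remote(events: List[Event], db: RemoteUserDB) -> List[Enriched]:
    # Anti-pattern: a network call for every input record.
    return [(uid, url, db.get(uid)) for uid, url in events]


def enrich_local(events: List[Event], db: RemoteUserDB) -> List[Enriched]:
    # Data locality: fetch the table once, then join in memory (hash join).
    users = db.snapshot()
    return [(uid, url, users.get(uid, "unknown")) for uid, url in events]


def throughput(
    job: Callable[[List[Event], RemoteUserDB], List[Enriched]],
    events: List[Event],
    db: RemoteUserDB,
) -> Tuple[List[Enriched], float]:
    """Run a job and return its output plus records processed per second."""
    start = time.perf_counter()
    out = job(events, db)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return out, len(events) / elapsed


if __name__ == "__main__":
    users = {i: f"user{i}" for i in range(100)}
    events = [(i % 100, f"/page/{i}") for i in range(500)]
    for job in (enrich_remote, enrich_local):
        db = RemoteUserDB(users)
        out, rate = throughput(job, events, db)
        print(f"{job.__name__}: {len(out)} records, {db.calls} DB calls, {rate:,.0f} records/s")
```

The per-record version makes one simulated round trip per event, while the local version makes exactly one in total, so its throughput should be much higher. The exact rates depend on the machine running the sketch.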