What is the difference between batch and stream processing?
The typical explanation of the difference between batch processing and stream processing is that batch data is collected, stored for a period of time, and then processed and used at regular intervals (e.g. payroll, bank statements), while streaming data is processed and used closer to the time it is generated (think alerts from sensor data).
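To make the contrast concrete, here is a minimal Python sketch of the sensor-alert case. It assumes a hypothetical `Reading` record and an arbitrary alert threshold of 100; neither comes from any particular system. Both functions flag the same readings. What differs is when: `batch_report` can only run once the readings have been collected and stored, while `stream_alerts` inspects each reading as it arrives and emits an alert right away.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

THRESHOLD = 100.0  # hypothetical alert threshold, chosen only for illustration


@dataclass
class Reading:
    timestamp: int  # seconds since the sensor started reporting
    value: float


def batch_report(stored: list[Reading]) -> list[int]:
    """Batch: runs once over everything already collected (e.g. nightly).
    Returns the timestamps of readings that exceeded the threshold."""
    return [r.timestamp for r in stored if r.value > THRESHOLD]


def stream_alerts(readings: Iterable[Reading]) -> Iterator[int]:
    """Stream: inspects each reading as it arrives and yields an alert
    immediately, without waiting for the rest of the data."""
    for r in readings:
        if r.value > THRESHOLD:
            yield r.timestamp
```

With readings at timestamps 0 to 5 valued 42.0, 87.5, 120.0, 99.0, 60.0 and 130.0, both approaches flag timestamps 2 and 5, but a consumer of `stream_alerts` receives the first alert after only the third reading has been processed.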
While accurate, this answer fails to capture why the difference matters and why companies are moving aggressively towards stream processing architectures.
We experience the world as a constant flow of events. We make decisions by comparing this flow of information to our experiences and memories. We perceive and react to threats, or recognize and seize opportunities. Reacting in a timely manner is often rewarding: we avoid the snakebite or get the best seat at the movies. Stream processing more accurately reflects this very human mode of experience.
Businesses ingest as many streams of information as they can handle, examining the data as it flows for patterns that represent threats or opportunities, and when those patterns emerge, they act. The cost of inaction could be a data breach or a lost revenue opportunity.
Batch processing still works well when you need to process huge amounts of data and results can be delivered at regular intervals. But if recent trends continue, more of those jobs will shift to streaming, as companies find they can no longer absorb the hidden cost of waiting for the next batch and stay competitive.
A good example is insider trading. The cost of detecting someone about to execute an insider trade is now much lower than the cost of trying to unwind that trade later, once the batch process picks it up. Even if the batch runs every five minutes, that only means you find the trader sooner; it does not stop the trade. Ultimately, the difference between streaming and batch will show up on the balance sheet and in the stock price.
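The timing argument can be sketched as a toy model in Python, under loudly stated assumptions: each hypothetical `Order` carries a precomputed `suspicious` flag standing in for a real detection rule, the streaming path checks each order before it executes, and the batch path lets every order execute and scans them on a fixed five-minute schedule. It is an illustration of the timing difference, not a model of any real surveillance system.

```python
from dataclasses import dataclass

BATCH_INTERVAL = 300  # seconds: the "every five minutes" schedule from the text


@dataclass
class Order:
    order_id: str
    submitted_at: int  # seconds since market open
    suspicious: bool   # hypothetical stand-in for a real detection rule


def streaming_check(orders: list[Order]) -> tuple[list[str], list[str]]:
    """Check each order as it arrives, before it executes.
    Returns (executed order ids, blocked order ids)."""
    executed: list[str] = []
    blocked: list[str] = []
    for o in orders:
        if o.suspicious:
            blocked.append(o.order_id)  # stopped before the trade happens
        else:
            executed.append(o.order_id)
    return executed, blocked


def batch_check(orders: list[Order]) -> dict[str, int]:
    """Every order executes immediately; a batch job runs every BATCH_INTERVAL
    seconds over what has already executed.
    Returns {order_id: time the suspicious trade was detected}."""
    detected: dict[str, int] = {}
    for o in orders:
        if o.suspicious:
            # The first batch run strictly after submission picks it up.
            next_run = (o.submitted_at // BATCH_INTERVAL + 1) * BATCH_INTERVAL
            detected[o.order_id] = next_run
    return detected
```

For a suspicious order submitted 130 seconds after the open, `streaming_check` blocks it outright, while `batch_check` reports it at the 300-second run, 170 seconds after the trade has already gone through.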
The only potential argument against streaming is that it might not handle large volumes of data as cost-effectively as batch processing. However, with the advent of systems such as Kafka and Flink, and their cloud counterparts, such cases are becoming rare.