What's the problem with batch-oriented systems?

It's the DELAY!

Let's say, for example, that you are polling every hour, waiting 5 minutes after the hour for late events.

Even if you optimise your ETL to run in just 1 minute, you will be reacting with at least 6 minutes of delay from when an event landed in your source system.

Worst-case scenario? 66 minutes of delay...

Would you wait 66 minutes at a restaurant before the chef receives the order and starts preparing your meal?
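
The 6- and 66-minute figures above follow from simple arithmetic, sketched here in Python. The function and parameter names are hypothetical, and the sketch assumes each run starts exactly at the end of the grace period and only picks up events that landed before the top of the hour.

```python
def batch_delay_bounds(poll_interval_min, grace_min, etl_min):
    """Return (best, worst) end-to-end delay in minutes for a polling batch job.

    Assumption: a run covers one full polling window, starts grace_min
    minutes after the window closes, and takes etl_min minutes to finish.
    """
    # Best case: the event lands just before the window closes.
    best = grace_min + etl_min
    # Worst case: the event lands just as the window opens,
    # so it also waits out the whole polling interval.
    worst = poll_interval_min + grace_min + etl_min
    return best, worst


def event_delay(arrival_min, poll_interval_min=60, grace_min=5, etl_min=1):
    """Delay for an event landing arrival_min minutes into its window."""
    if not 0 <= arrival_min < poll_interval_min:
        raise ValueError("arrival_min must fall inside the polling window")
    return poll_interval_min + grace_min + etl_min - arrival_min


best, worst = batch_delay_bounds(poll_interval_min=60, grace_min=5, etl_min=1)
print(f"best case: {best} min, worst case: {worst} min")
```

With hourly polling, a 5-minute grace period and a 1-minute ETL, this gives 6 and 66 minutes. An event landing half-way through the hour (`event_delay(30)`) would wait 36 minutes.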


@ftisiot this isn't a fault of batch processing. What you describe is using the wrong approach for a specific use case.

Latency, throughput and cost all trade off against each other. So it depends which is more important for a specific situation.

There are times when you want lower-latency analytics, so solve those differently.

There are times when your data is naturally batched.

There are times when crunching data overnight is perfectly fine.

@intrbiz @ftisiot Imagine the restaurant in a fast-food environment: even one minute is too long.

Building this system as a batch system rather than a near-real-time one is the wrong approach.

