This post is a starting point for people who use Java or Scala to process large amounts of data and need a quick introduction to doing it in either Spark or Storm.
It is also for people who are new to the whole concept of distributed data processing and need a head start. It’s 2015, and my blog post is probably 5 years too late, but it’s never too late to get started!
What I plan to talk about:
- Distributed Thinking
- When to use it?
- Where to start?
- How to look at data?
- Processing Data Streams
- Processing Large Data Chunks
To begin, go to the first part, Part 1: When and Where.
Or use the list below to go directly to any part:
Part 0: Introduction
Part 1: When and Where
Part 2: How to look at data
Part 3: How to look at data (continued)
Part 4: Processing Data Streams
Part 5: Processing Large Data Chunks