
0. Introduction

This post is meant to serve as a starting point for people who use Java or Scala to process large amounts of data and need a quick introduction to doing it in either Spark or Storm.

It is not meant to be a Spark vs Storm debate; there are plenty of those out there. A quick Google search yields several Stack Overflow questions and technical blogs that discuss it at length.

This is meant to be a starting point for people who are new to the whole concept of distributed data processing and need a head start. It’s 2015, and my blog post is probably 5 years too late, but it’s never too late to get started!

What I plan to talk about:

  1. Distributed Thinking
    1. When to use it?
    2. Where to start?
    3. How to look at data?
  2. Processing Data Streams
  3. Processing Large Data Chunks

Click here to go to the first part - Part 1: When and Where

Click on any of the links below to go directly to any part:
Part 0: Introduction
Part 1: When and Where
Part 2: How to look at data
Part 3: How to look at data (continued)
Part 4: Processing Data Streams
Part 5: Processing Large Data Chunks