Apache Spark is a big data framework, somewhat like Hadoop. You can use it to run queries and algorithms over huge amounts of data, without having to think about how the work gets distributed across a large farm of servers.
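To give a flavor of what that looks like, here's a minimal PySpark sketch (my own, not from the workshop; the `events.csv` file and its `country` column are made up for illustration). The point is that the same few lines run unchanged on a laptop or on a cluster:

```python
from pyspark.sql import SparkSession

# Spark handles the distribution; this code doesn't care where it runs.
spark = SparkSession.builder.appName("demo").getOrCreate()

# Load a (hypothetical) CSV of events and run a simple aggregation.
events = spark.read.csv("events.csv", header=True, inferSchema=True)
events.groupBy("country").count().orderBy("count", ascending=False).show(10)

spark.stop()
```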

Yesterday, Dan Serban presented part of his workshop at the bi-weekly Softbinator meetup. You should check out his entire work. The setup is easy: you just spin up a Docker container. Then take each Jupyter notebook and go over every line carefully.

After the first three notebooks, I'm enthusiastic about what I can do with Spark. I much prefer its functional style to Hadoop's cumbersome MapReduce approach.
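The classic word count shows the difference: what needs a mapper class, a reducer class, and a job driver in plain Hadoop MapReduce collapses into a short chain of transformations in Spark. This is my own sketch, not one of the workshop notebooks, and `input.txt` is a placeholder path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
sc = spark.sparkContext

counts = (sc.textFile("input.txt")
            .flatMap(lambda line: line.split())  # split lines into words
            .map(lambda word: (word, 1))         # emit (word, 1) pairs
            .reduceByKey(lambda a, b: a + b))    # sum the counts per word

print(counts.take(5))
spark.stop()
```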

Have fun!