Introduction to distributed computing with Spark

This is a companion post to the talk “Introduction to distributed computing with Spark and Dask”, which I delivered together with Adrian Pino Alcalde at Kernel Analytics in June 2016. Adrian took care of the Dask part, while I concentrated on Spark. In this post we will provide an introduction to Apache Spark. We […]