Data preparation using Python & Spark in an Azure Databricks environment

Extracting and manipulating huge amounts of data can quickly become tedious, especially when the data has to be extracted from websites. In this video, we will show you how Azure Databricks can help you automate these tasks and manipulate large datasets by leveraging the power of Spark.

First, we will extract the openly available turnstile data for the year 2018 from the NYC MTA website. Next, we will load and manipulate this data in Azure Databricks, showing you how to navigate through it using different programming languages.
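The extraction step can be sketched in plain Python before Spark gets involved. This is a minimal sketch, not the method used in the video. It assumes the MTA publishes one turnstile file per week, dated on a Saturday and named `turnstile_YYMMDD.txt` under `http://web.mta.info/developers/data/nyct/turnstile/`. The URL pattern, the helper names (`saturdays`, `turnstile_urls`, `download`), and the `/dbfs/tmp/mta` target directory are all assumptions for illustration. The files may also have moved since 2018, so check the MTA developer page first.

```python
import datetime
import os
import urllib.request

# ASSUMPTION: URL pattern of the weekly MTA turnstile files (not given in the original text).
BASE_URL = "http://web.mta.info/developers/data/nyct/turnstile/turnstile_{:%y%m%d}.txt"


def saturdays(year):
    """Yield every Saturday of `year` (weekly files are assumed to be dated on Saturdays)."""
    day = datetime.date(year, 1, 1)
    # date.weekday(): Monday == 0 ... Saturday == 5; jump forward to the first Saturday.
    day += datetime.timedelta(days=(5 - day.weekday()) % 7)
    while day.year == year:
        yield day
        day += datetime.timedelta(days=7)


def turnstile_urls(year):
    """Build the (assumed) list of weekly file URLs for one year."""
    # date.__format__ delegates to strftime, so {:%y%m%d} renders e.g. 180106.
    return [BASE_URL.format(d) for d in saturdays(year)]


def download(urls, dest_dir="/dbfs/tmp/mta"):
    """Download each file into dest_dir.

    ASSUMPTION: /dbfs is the DBFS FUSE mount available on the Databricks cluster.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(url, path)  # fetch the file and save it to `path`
        paths.append(path)
    return paths
```

Once the files are on DBFS, a Databricks notebook could load them all into a single Spark DataFrame, for example with `spark.read.csv("dbfs:/tmp/mta/*.txt", header=True, inferSchema=True)`. Spark then distributes the manipulation across the cluster instead of running it on a single machine.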