Writing Continuous Applications with Structured Streaming in PySpark



AnacondaCon 2018. Jules S. Damji. We are in the midst of a Big Data Zeitgeist in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created the notion of a streaming application that reacts and interacts with data in real time. We call this a continuous application. In this talk we will explore the concepts and motivations behind continuous applications and how the Structured Streaming Python APIs in Apache Spark 2.x enable writing them. We will also examine the programming model behind Structured Streaming and the APIs that support it. Through a short demo and code examples, Jules will demonstrate how to write an end-to-end Structured Streaming application that reacts and interacts with both real-time and historic data to perform advanced analytics using the Spark SQL, DataFrames, and Datasets APIs.
