Apache Spark has been a great driver of not only Scala adoption, but introducing a new generation of developers to functional programming concepts. As Spark places more emphasis on its newer DataFrame & Dataset APIs, it’s important to ask ourselves how we can benefit from this while still keeping our fun functional roots. We will explore the cases where the Dataset APIs empower us to do cool things we couldn’t before, what the different approaches to serialization mean, and how to figure out when the shiny new API is actually just trying to steal your lunch money (aka CPU cycles).
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, BEAM, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a commiter on and PMC on Apache Spark and committer on SystemML & Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.