An open and introductory book for the Python API of Apache Spark.
The book "Introduction to pyspark" provides a quick introduction to the pyspark Python package, which is the Python API of Apache Spark.
Read the book at: https://pedropark99.github.io/Introd-pyspark/.
You can buy a copy of the book through Amazon: https://www.amazon.com/dp/B0CRYMVWDN.
Publication page: https://pedro-faria.netlify.app/publications/book/introd-pyspark/en/.
With pyspark, you can use the Python language to write Spark applications and run them on a Spark cluster in a scalable and elegant way. This book focuses on teaching the fundamentals of pyspark and how to use it for big data analysis.
Some of the main subjects discussed in the book are:
- How an Apache Spark application works.
- What Spark DataFrames are.
- How to transform and model your Spark DataFrame.
- How to import data into, and export data out of Apache Spark.
- How to work with SQL inside pyspark.
- Tools for manipulating specific data types (e.g. strings, dates and datetimes).
- How to use window functions.
Pedro Duarte Faria has a bachelor's degree in Economics from the Federal University of Ouro Preto, Brazil. Currently, he is a Data Platform Engineer at Blip, and an Associate Developer for Apache Spark 3.0 certified by Databricks.
The author has more than 5 years of experience in the data analysis market. He has developed data pipelines, reports, and analyses for research institutions and some of the largest companies in the Brazilian financial sector, such as BMG Bank, Sodexo, and Pan Bank, and has worked with databases of more than a billion rows.
Furthermore, Pedro specializes in the R programming language and has given several lectures and courses about it at graduate centers (such as PPEA-UFOP) and at federal and state organizations (such as FJP-MG). As a researcher, he has experience in the field of Science, Technology and Innovation Economics.
Personal Website: https://pedro-faria.netlify.app/
Twitter: @PedroPark9
Mastodon: @[email protected]
Copyright © 2024 Pedro Duarte Faria. This book is licensed under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0).