Avaiga/demo-pyspark-penguin-app

Demo PySpark Penguin App

Usage

What is Demo PySpark Penguin App

Taipy is a Python library for creating Business Applications. More information on our website.

Demo PySpark Penguin App shows how to integrate PySpark, the Python API for Apache Spark, with Taipy, a Python library used for pipeline orchestration and scenario management.

Demo Type

  • Level: Intermediate
  • Topic: GUI/Core

How to run

This demo requires Python 3.8 or later. Install the dependencies listed in requirements.txt, then run main.py.

Introduction

We'll design a workflow that performs two main tasks:

1- Spark task (spark_process):

  • Load the data;
  • Group the data by "species", "island" and "sex";
  • Find the mean of the other columns ("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g");
  • Save the data.

2- Python task (filter):

  • Load the output data saved previously by the Spark task;
  • Given a "species", "island" and "sex", return the aggregated values.
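The Python task could look something like the sketch below, assuming the aggregated data has already been loaded into a pandas DataFrame. The function name `filter_penguins` and its signature are hypothetical and may differ from the demo's own filter function.

```python
import pandas as pd


def filter_penguins(
    aggregated: pd.DataFrame, species: str, island: str, sex: str
) -> pd.DataFrame:
    """Return the aggregated rows matching the given species, island and sex."""
    mask = (
        (aggregated["species"] == species)
        & (aggregated["island"] == island)
        & (aggregated["sex"] == sex)
    )
    # Reset the index so the result starts at 0 regardless of the original row.
    return aggregated[mask].reset_index(drop=True)
```

If no row matches the three values, the function returns an empty DataFrame with the same columns, which a GUI can display without special handling.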

Directory Structure

  • src/: Contains the demo source code.
    • data/penguin.csv: the data as downloaded from the palmerpenguins git repo.
    • penguin_spark_app.py: Spark application.
    • config.py: Taipy configuration which models our data workflow.
    • main.py: the main script (including our application GUI).
  • CODE_OF_CONDUCT.md: Code of conduct for members and contributors of demo-pyspark-penguin-app.
  • CONTRIBUTING.md: Instructions to contribute to demo-pyspark-penguin-app.
  • INSTALLATION.md: Instructions to install demo-pyspark-penguin-app.
  • LICENSE: The Apache 2.0 License.
  • README.md: Current file.

License

Copyright 2022 Avaiga Private Limited

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Installation

Want to install Demo PySpark Penguin App? Check out our INSTALLATION.md file.

Contributing

Want to help build Demo PySpark Penguin App? Check out our CONTRIBUTING.md file.

Code of conduct

Want to be part of the Demo PySpark Penguin App community? Check out our CODE_OF_CONDUCT.md file.
