This project illustrates using PyCaret 3 to develop a predictive model pipeline and then recreating that pipeline with scikit-learn. The broader focus is on the classification, regression, and clustering modules; this particular project is scoped to a classification pipeline built with the AutoML features of PyCaret 3. The classification pipeline is a simple predictive model trained on the Palmer Penguins dataset.
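As a rough illustration of the PyCaret side of this workflow, the sketch below sets up a classification experiment and compares candidate models. The function name `compare_penguin_classifiers`, the `species` target column, and the seed value are assumptions made for this example, not details taken from the notebook.

```python
import pandas as pd


def compare_penguin_classifiers(df: pd.DataFrame, target: str = "species", seed: int = 123):
    """Run a PyCaret classification experiment and return the best model and score grid.

    Hypothetical helper: the name, the default target, and the seed are assumptions.
    """
    # Imported lazily so that defining this function does not require PyCaret.
    from pycaret.classification import compare_models, pull, setup

    # setup() infers column types and prepares the preprocessing steps.
    setup(data=df, target=target, session_id=seed, verbose=False)

    # compare_models() cross-validates a library of classifiers and returns the top one.
    best_model = compare_models()

    # pull() returns the most recent score grid as a DataFrame.
    leaderboard = pull()
    return best_model, leaderboard
```

Calling `compare_penguin_classifiers(df)` on the penguins data would return a fitted estimator together with the comparison table, which is the starting point for deciding what to rebuild by hand in scikit-learn.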
The intent of the project is to demonstrate a methodology for incorporating an AutoML technology, in this case PyCaret 3, into a training vehicle for "Citizen Data Scientists". The goal is to teach the statistical concepts required to create a predictive model using PyCaret, and then to apply the knowledge gained from building that pipeline by recreating it in a Jupyter notebook (.ipynb) using Python, pandas, NumPy, and scikit-learn.
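To make the "recreate in scikit-learn" step concrete, here is a minimal sketch of what a hand-built equivalent might look like. It assumes the standard Palmer Penguins column names, mean and most-frequent imputation, and a logistic regression classifier; the model PyCaret actually selects may differ, and the helper names are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names, following the published Palmer Penguins dataset.
NUMERIC_COLS = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
CATEGORICAL_COLS = ["island", "sex"]
TARGET_COL = "species"


def build_penguin_pipeline() -> Pipeline:
    """Build an unfitted preprocessing + classifier pipeline (illustrative choices)."""
    # Numeric features: fill missing values with the mean, then standardize.
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ])
    # Categorical features: fill missing values with the mode, then one-hot encode.
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric_steps, NUMERIC_COLS),
        ("cat", categorical_steps, CATEGORICAL_COLS),
    ])
    # Logistic regression is an assumption here, standing in for the chosen model.
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


def fit_penguin_pipeline(df: pd.DataFrame) -> Pipeline:
    """Fit the pipeline on a DataFrame that contains the feature and target columns."""
    features = df[NUMERIC_COLS + CATEGORICAL_COLS]
    return build_penguin_pipeline().fit(features, df[TARGET_COL])
```

Keeping the preprocessing inside a `Pipeline` means the same imputation and encoding are applied at prediction time, which mirrors how PyCaret bundles its preprocessing with the model.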
Data used within the examples will include:
- the Palmer Penguins dataset as sourced from Allison Horst's GitHub repository.
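For reference, loading the dataset with pandas might look like the sketch below. The raw CSV URL and the `load_penguins` helper are assumptions made for this example (the notebook may read the data from a different path), so verify the location against the repository before relying on it.

```python
import pandas as pd

# Assumed location of the CSV in Allison Horst's palmerpenguins repository.
PENGUINS_CSV_URL = (
    "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/"
    "main/inst/extdata/penguins.csv"
)


def load_penguins(source=PENGUINS_CSV_URL) -> pd.DataFrame:
    """Read the penguins CSV from a URL, path, or file-like object.

    Hypothetical helper: rows with no species label are dropped because
    they cannot be used to train or evaluate a classifier.
    """
    df = pd.read_csv(source)  # "NA" entries are parsed as missing values by default
    return df.dropna(subset=["species"]).reset_index(drop=True)
```

Calling `load_penguins()` with no arguments would fetch the file over the network; passing a local path instead keeps the notebook reproducible offline.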
The notebook can be viewed and downloaded in Jupyter Notebook Viewer.
Detail my steps..