This tutorial is about sktime, a unified framework for machine learning with time series. sktime is a widely used, scikit-learn compatible library that provides various time series algorithms and modular tools. It is easily extensible by anyone, and interoperable with the pydata/numfocus stack.
This advanced tutorial explains how to write your own sktime estimator, e.g., a forecaster, classifier, or transformer, using sktime’s extension templates and testing framework. A custom estimator can live in any local code base, and will be compatible with sktime pipelines and with scikit-learn.
If you are unfamiliar with sktime, it is recommended to work through the general sktime introduction tutorial first:
🎥 general sktime intro tutorial from PyData Global 2021
📺 YouTube video of the sktime intro at PyData Global 2021
🎥 Check out our previous tutorial on hierarchical & probabilistic forecasting from PyData Berlin 2022!
Writing sktime compatible estimators is meant to be easy.
This tutorial will explain:
- sktime base class and estimator architecture
- basic software design patterns used in extension
- how to use the extension templates
- how to validate your custom estimator
- testing in third party extensions and packages
Users can write sktime compatible estimators without a full developer setup, and without any need to contribute the estimator to the sktime codebase. The custom estimator can be used with any tuning, pipeline, composition, or reduction functionality in sktime, and will be compatible with scikit-learn, too. This philosophy enables interoperability with third-party projects, proprietary code bases, or custom extension packages to sktime.
How this works technically: sktime ensures that all estimators of a certain type, e.g., forecasters, adhere to the same interface contracts, by using the base class and strategy patterns.
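As a rough illustration of the strategy pattern, here is a minimal pure-Python sketch. The classes `ToyForecaster`, `LastValueForecaster`, `MeanForecaster` and the function `forecast_with` are hypothetical and not part of sktime; the point is only that code written against a shared `fit`/`predict` interface works unchanged with any estimator that satisfies it.

```python
from abc import ABC, abstractmethod


class ToyForecaster(ABC):
    """Hypothetical, heavily simplified stand-in for a forecaster base class."""

    @abstractmethod
    def fit(self, y):
        ...

    @abstractmethod
    def predict(self, fh):
        ...


class LastValueForecaster(ToyForecaster):
    """Predicts the last observed value for every requested horizon step."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, fh):
        return [self.last_ for _ in fh]


class MeanForecaster(ToyForecaster):
    """Predicts the mean of the training series for every horizon step."""

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, fh):
        return [self.mean_ for _ in fh]


def forecast_with(forecaster: ToyForecaster, y, fh):
    # Strategy pattern: this function relies only on the shared interface,
    # so any forecaster can be swapped in without changing this code.
    return forecaster.fit(y).predict(fh)


y = [1.0, 2.0, 3.0, 6.0]
print(forecast_with(LastValueForecaster(), y, fh=[1, 2]))  # [6.0, 6.0]
print(forecast_with(MeanForecaster(), y, fh=[1, 2]))  # [3.0, 3.0]
```

In sktime, the shared interface is defined by real base classes such as `BaseForecaster`, which cover much more (forecasting horizons, exogenous data, tags); this sketch deliberately omits all of that.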
Separate from this user-side contract is the extension contract, which "extenders" (users implementing their own estimators) have to satisfy. It is based on the template pattern, which keeps boilerplate out of the extension contract and leaves clearly defined "fill in your code" instructions in sktime's extension templates.
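The template pattern can be sketched in the same simplified setting. In this hypothetical example, the base class owns the public `fit` and `predict` methods with their input checks and state handling, while the extender only fills in the private `_fit` and `_predict` hooks. sktime's own base classes follow a similar split between public methods and private `_fit`/`_predict` implementations, but `BaseToyForecaster` and `DriftForecaster` below are illustrative stand-ins, not sktime code.

```python
class BaseToyForecaster:
    """Hypothetical simplified base class illustrating the template pattern.

    Public methods (fit, predict) hold the shared boilerplate;
    extenders only implement the private _fit and _predict hooks.
    """

    def __init__(self):
        self._is_fitted = False

    def fit(self, y):
        # boilerplate owned by the base class: input conversion and checks
        y = [float(v) for v in y]
        if len(y) == 0:
            raise ValueError("y must contain at least one observation")
        self._fit(y)
        self._is_fitted = True
        return self

    def predict(self, fh):
        # boilerplate: refuse to predict before fit, validate the horizon
        if not self._is_fitted:
            raise RuntimeError("call fit before predict")
        fh = list(fh)
        if any(h < 1 for h in fh):
            raise ValueError("horizon steps must be positive integers")
        return self._predict(fh)

    # --- "fill in your code" section for extenders ---
    def _fit(self, y):
        raise NotImplementedError

    def _predict(self, fh):
        raise NotImplementedError


class DriftForecaster(BaseToyForecaster):
    """Extender code: only the estimator logic, no boilerplate."""

    def _fit(self, y):
        self.last_ = y[-1]
        # average step between first and last observation (0 for one point)
        self.slope_ = (y[-1] - y[0]) / (len(y) - 1) if len(y) > 1 else 0.0

    def _predict(self, fh):
        return [self.last_ + h * self.slope_ for h in fh]


f = DriftForecaster().fit([1, 2, 3, 4])
print(f.predict([1, 2]))  # [5.0, 6.0]
```

Because the checks live in the base class, every extender gets them for free, and the extender's class stays focused on the estimator's logic.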
The extension templates are Python files with gaps that the extender fills in with the logic of a new estimator, guided by clear instructions in comments and free of boilerplate. Finally, the sktime test suite provides few-line validation for any custom estimator.
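To give an idea of what few-line validation means, here is a hypothetical mini test runner in the same toy setting: it takes an estimator class, runs a handful of interface-contract checks, and reports each as passed or failed. The function `check_toy_forecaster` and the two toy classes are assumptions made for illustration; sktime's actual test suite is far more extensive, and in recent versions is exposed through `check_estimator` in `sktime.utils.estimator_checks`.

```python
def check_toy_forecaster(estimator_class):
    """Hypothetical mini test suite running a few interface-contract checks.

    Returns a dict mapping each check name to "PASSED" or a failure message.
    """
    y = [1.0, 2.0, 3.0, 4.0]
    fh = [1, 2, 3]
    results = {}

    def fit_returns_self():
        est = estimator_class()
        assert est.fit(y) is est, "fit must return self"

    def predict_length_matches_fh():
        pred = estimator_class().fit(y).predict(fh)
        assert len(pred) == len(fh), "one prediction per horizon step"

    def fit_is_deterministic():
        p1 = estimator_class().fit(y).predict(fh)
        p2 = estimator_class().fit(y).predict(fh)
        assert p1 == p2, "repeated fits on the same data must agree"

    def fit_does_not_mutate_input():
        y_copy = list(y)
        estimator_class().fit(y_copy)
        assert y_copy == y, "fit must not modify the input data"

    checks = [fit_returns_self, predict_length_matches_fh,
              fit_is_deterministic, fit_does_not_mutate_input]
    for check in checks:
        try:
            check()
            results[check.__name__] = "PASSED"
        except Exception as err:  # record the failure and keep going
            results[check.__name__] = f"FAILED: {err!r}"
    return results


class ToyLastValueForecaster:
    """Hypothetical estimator that satisfies the toy contract."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, fh):
        return [self.last_ for _ in fh]


class ToyBuggyForecaster(ToyLastValueForecaster):
    """Hypothetical estimator with a contract bug: fit forgets to return self."""

    def fit(self, y):
        super().fit(y)


print(check_toy_forecaster(ToyLastValueForecaster))  # every check "PASSED"
print(check_toy_forecaster(ToyBuggyForecaster)["fit_returns_self"])
# FAILED: AssertionError('fit must return self')
```

Collecting results in a dict rather than stopping at the first failure lets an extender see every broken part of the contract in a single run.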
A full developer setup is typically not required to implement a custom estimator compatible with sktime.
🎥 Check out our previous tutorial (general intro, legacy version) from PyData Amsterdam 2020!
There are different options for running the tutorial notebooks:
- Run the notebooks in the cloud on Binder - for this you don't have to install anything!
- Run the notebooks on your machine. Clone this repository, get conda, install the required packages (sktime, pytest, seaborn, jupyter, pmdarima) in an environment, and open the notebooks with that environment. For detailed instructions, see below. For troubleshooting, see sktime's more detailed installation instructions.
- Alternatively, use python venv, and/or an editable install of this repo as a package. Instructions below.
If you're interested in contributing to sktime, you can find out more about how to get involved here.
Any contributions are welcome, not just code!
To clone the repository locally:
git clone https://github.com/sktime/sktime-workshop-pydata-london-2022.git
To set up a conda environment with the required packages:
- Create a conda environment:
conda create -y -n pydata_sktime python=3.9
- Install required packages:
conda install -y -n pydata_sktime pip sktime pytest seaborn jupyter pmdarima
- Activate your environment:
conda activate pydata_sktime
- If using jupyter: make the environment available in jupyter:
python -m ipykernel install --user --name=pydata_sktime
Alternatively, to set up a conda environment with an editable install of this repository:
- Create a conda environment:
conda create -y -n pydata_sktime python=3.9
- Make sure the environment has pip:
conda install -y -n pydata_sktime pip
- Activate your environment:
conda activate pydata_sktime
- Install the package in development mode (run this from the root of the cloned repository):
pip install -e .
- If using jupyter: make the environment available in jupyter:
python -m ipykernel install --user --name=pydata_sktime
Alternatively, to use python venv and install the required packages:
- Create a python virtual environment:
python -m venv .venv
- Activate your environment (on Windows, use .venv\Scripts\activate instead):
source .venv/bin/activate
- Install the requirements:
pip install sktime pytest seaborn jupyter pmdarima
- If using jupyter: make the environment available in jupyter:
python -m ipykernel install --user --name=pydata_sktime
Alternatively, to use python venv with an editable install of this repository:
- Create a python virtual environment:
python -m venv .venv
- Activate your environment (on Windows, use .venv\Scripts\activate instead):
source .venv/bin/activate
- Install the package in development mode (run this from the root of the cloned repository):
pip install -e .
- If using jupyter: make the environment available in jupyter:
python -m ipykernel install --user --name=pydata_sktime