- Compare the accuracy of various time series forecasting algorithms such as Prophet, DeepAR, VAR, DeepVAR, and LightGBM
- (Optional) Use `tsfresh` for automated feature engineering of time series data.
- The dataset can be downloaded from the M5 Forecasting - Accuracy competition on Kaggle.
- In addition to the Anaconda libraries, you need to install `altair`, `vega_datasets`, `category_encoders`, `mxnet`, `gluonts`, `kats`, `lightgbm`, `hyperopt`, and `pandarallel`. Note that `kats` requires Python 3.7 or higher.
- The M5 Competition aims to forecast daily sales for the next 28 days based on the previous 1,941 days of sales for 30,490 item-store IDs (3,049 products across 10 Walmart stores).
- Data includes (i) time series of daily sales quantity by ID, (ii) sales prices, and (iii) holiday and event information.
- Evaluation is done through the Weighted Root Mean Squared Scaled Error (WRMSSE). A detailed explanation is given in the M5 Participants Guide, and an implementation is available at this link (see the formula sketch below).
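For reference, here is my transcription of the metric, so treat it as a sketch rather than the Guide's exact notation: a per-series RMSSE is combined with revenue-based weights.

```latex
\mathrm{RMSSE}_i = \sqrt{\frac{\tfrac{1}{h}\sum_{t=n+1}^{n+h}\bigl(Y_t - \hat{Y}_t\bigr)^2}
                              {\tfrac{1}{n-1}\sum_{t=2}^{n}\bigl(Y_t - Y_{t-1}\bigr)^2}},
\qquad
\mathrm{WRMSSE} = \sum_i w_i \, \mathrm{RMSSE}_i
```

Here h = 28 is the forecast horizon, n is the length of the training series, and the weight w_i is proportional to each series' cumulative dollar sales over its last 28 observed days.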
- For hyperparameter tuning, 0.1% of the IDs were randomly sampled; another 1% were used to measure test set performance.
- Prophet can incorporate forward-looking related time series into the model, so additional features were created from the holiday and event information, as sketched below.
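As a rough illustration of that idea (the event names and dates below are hypothetical placeholders, not the actual M5 calendar), events can be passed to Prophet through its `holidays` argument, and forward-looking numeric features through `add_regressor`:

```python
import pandas as pd
from prophet import Prophet  # packaged as `fbprophet` in releases before 1.0

# Hypothetical events frame: one row per (event, date) pair.
events = pd.DataFrame({
    "holiday": ["SuperBowl", "Christmas"],
    "ds": pd.to_datetime(["2016-02-07", "2016-12-25"]),
})

m = Prophet(holidays=events, seasonality_mode="multiplicative")
m.add_regressor("snap")  # assumed binary column marking SNAP purchase days
```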
- Since a separate Prophet model must be fitted for each ID, rather than the `apply` function of the pandas dataframe I used `pandarallel` to maximize parallelization performance (see the sketch after the tuning table below).
- Prophet hyperparameters were tuned through 3-fold CV using the Bayesian optimization module built into the `Kats` library, with Tweedie applied as the loss function. Below is the hyperparameter tuning result.
seasonality_prior_scale | changepoint_prior_scale | changepoint_range | n_changepoints | holidays_prior_scale | seasonality_mode |
---|---|---|---|---|---|
0.01 | 0.046 | 0.93 | 5 | 100.00 | multiplicative |
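A minimal sketch of the per-ID parallel fitting mentioned above, assuming a long-format frame `df` with columns `id`, `ds`, and `y` (names hypothetical):

```python
import pandas as pd
from pandarallel import pandarallel
from prophet import Prophet

pandarallel.initialize(progress_bar=False)  # spawns one worker per CPU core

def fit_and_forecast(group: pd.DataFrame) -> pd.DataFrame:
    """Fit one Prophet model for a single ID and forecast 28 days ahead."""
    m = Prophet(seasonality_mode="multiplicative")
    m.fit(group[["ds", "y"]])
    future = m.make_future_dataframe(periods=28)
    return m.predict(future)[["ds", "yhat"]].tail(28)

# Drop-in replacement for groupby(...).apply(...) that fans out across cores.
forecasts = df.groupby("id").parallel_apply(fit_and_forecast)
```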
- The figures below show the actual sales (black dots), the point predictions with confidence intervals (blue lines and bands), and the test period (red dotted lines).
- Since VAR is a multivariate time series model, fitting more IDs simultaneously can improve performance, but the memory requirement grows steeply (the coefficient matrices grow quadratically in the number of series). A sketch follows.
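For illustration, fitting a VAR over a small block of IDs with `statsmodels` might look like this (the wide frame `wide_df`, one column of daily sales per ID, is assumed):

```python
import numpy as np
from statsmodels.tsa.api import VAR

# wide_df: rows = days, columns = one sales series per ID (assumed given).
model = VAR(wide_df)
result = model.fit(maxlags=7, ic="aic")  # lag order selected by AIC

# Forecast 28 days ahead from the last `k_ar` observations.
forecast = result.forecast(wide_df.values[-result.k_ar:], steps=28)
forecast = np.clip(forecast, 0.0, None)  # sales cannot be negative
```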
- DeepAR can incorporate metadata and forward-looking related time series into the model, so additional features were created from sales prices and holiday/event information. Dynamic categorical variables were encoded numerically via feature hashing.
- Choosing the output probability distribution is a critical hyperparameter; here the negative binomial distribution was used, which suits non-negative, overdispersed sales counts (see the sketch below).
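A sketch of the corresponding GluonTS estimator, assuming the MXNet backend this project installs (GluonTS module paths have shifted between releases, so adjust imports to your version; the epoch count and cardinalities are illustrative):

```python
from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.distribution import NegativeBinomialOutput
from gluonts.mx.trainer import Trainer

estimator = DeepAREstimator(
    freq="D",
    prediction_length=28,
    distr_output=NegativeBinomialOutput(),  # non-negative integer sales
    use_feat_dynamic_real=True,   # prices and holiday/event features
    use_feat_static_cat=True,     # item/store metadata
    cardinality=[3049, 10],       # e.g., item and store cardinalities
    trainer=Trainer(epochs=20),
)
# predictor = estimator.train(training_data=train_ds)  # train_ds: a GluonTS ListDataset
```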
- In the case of DeepVAR, a multivariate model, the choice of output probability distribution is limited (i.e., a multivariate Gaussian), which leads to a decrease in performance.
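The DeepVAR counterpart (again a version-dependent sketch, with an illustrative `target_dim`) makes that constraint visible: all series are modeled jointly, and the output distribution is a low-rank multivariate Gaussian rather than a count distribution:

```python
from gluonts.model.deepvar import DeepVAREstimator
from gluonts.mx.trainer import Trainer

estimator = DeepVAREstimator(
    freq="D",
    prediction_length=28,
    target_dim=305,              # number of series modeled jointly (illustrative)
    trainer=Trainer(epochs=20),
)
# The default output is a low-rank multivariate Gaussian, which suits
# real-valued data but not non-negative integer sales counts.
```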
- I used `tsfresh` to convert the time series into structured data features; even with minimal settings, it consumes a lot of computational resources (a minimal sketch follows the tuning table below).
- A LightGBM Tweedie regression model was fitted. Hyperparameters were tuned via 3-fold CV using the Bayesian optimization functionality of the `hyperopt` library. The following is the hyperparameter tuning result.
boosting | learning_rate | num_iterations | num_leaves | min_data_in_leaf | min_sum_hessian_in_leaf | bagging_fraction | bagging_freq | feature_fraction | extra_trees | lambda_l1 | lambda_l2 | path_smooth | max_bin |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gbdt | 0.01773 | 522 | 11 | 33 | 0.0008 | 0.5297 | 4 | 0.5407 | False | 2.9114 | 0.2127 | 217.3879 | 1023 |
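A minimal `tsfresh` extraction sketch for the feature engineering step above, assuming a long-format frame `df` with `id`, `date`, and `sales` columns (names hypothetical); even `MinimalFCParameters` is expensive at M5 scale:

```python
from tsfresh import extract_features
from tsfresh.feature_extraction import MinimalFCParameters

# df: long format, one row per (id, date) observation (assumed given).
features = extract_features(
    df,
    column_id="id",
    column_sort="date",
    column_value="sales",
    default_fc_parameters=MinimalFCParameters(),  # smallest built-in feature set
)
```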
- The sales forecast for day D+1 was fed back through feature engineering to predict the sales volume for day D+2, and this recursive process was repeated to measure performance over the 28-day test set, roughly as sketched below.
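The recursive evaluation loop might look roughly like this; `build_features` and `append_prediction` are hypothetical helpers standing in for the project's actual feature engineering, and `booster` is assumed to be the trained LightGBM model:

```python
import numpy as np

predictions = []
history = train_df.copy()  # observed sales up to the last training day (assumed)

for step in range(28):
    X_next = build_features(history)     # hypothetical: lags, rolling stats, prices
    y_next = booster.predict(X_next)     # one-step-ahead forecast for all IDs
    predictions.append(y_next)
    history = append_prediction(history, y_next)  # hypothetical: extend history by one day

forecast = np.column_stack(predictions)  # IDs x 28-day horizon
```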
Algorithm | WRMSSE | sMAPE | MAE | MASE | RMSE |
---|---|---|---|---|---|
DeepAR | 0.7513 | 1.4200 | 0.8795 | 0.9269 | 1.1614 |
LightGBM | 1.0701 | 1.4429 | 0.8922 | 0.9394 | 1.1978 |
Prophet | 1.0820 | 1.4174 | 1.1014 | 1.0269 | 1.4410 |
VAR | 1.2876 | 2.3818 | 1.5545 | 1.6871 | 1.9502 |
Naive Method | 1.3430 | 1.5074 | 1.3730 | 1.1077 | 1.7440 |
Mean Method | 1.5984 | 1.4616 | 1.1997 | 1.0708 | 1.5352 |
DeepVAR | 4.6933 | 4.6847 | 1.9201 | 1.3683 | 2.3195 |
- As a result, DeepAR was selected, and its predictions were submitted to Kaggle, achieving a WRMSSE of 0.8112 on the private leaderboard.
- Taylor SJ, Letham B. 2017. Forecasting at scale. PeerJ Preprints 5:e3190v2
- Prophet: Forecasting at Scale
- Stock, James H., and Mark W. Watson. 2001. Vector Autoregressions. Journal of Economic Perspectives, 15 (4): 101-115.
- David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski. 2020. DeepAR: Probabilistic forecasting with autoregressive recurrent networks, International Journal of Forecasting, 36 (3): 1181-1191.
- David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, Jan Gasthaus. 2019. High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes. In Advances in Neural Information Processing Systems. 6827–6837.
- Kats - One Stop Shop for Time Series Analysis in Python
- GluonTS - Probabilistic Time Series Modeling