Nested forecasting - common recipe feature engineering causing issue in model calibration #243
Hi again, I was digging into the code. I think the problem is that if any of the series do not share the same time index, processing steps that remove some features (like CORR, ZV) will create a discrepancy between the data used to train the model for those series and the data used to predict on.
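A minimal sketch of this effect (not from the original thread; the data and column names are invented for illustration, and only `step_zv()` from the `recipes` package is used). The same recipe spec, prepped on two series with different training data, keeps different predictor sets because the filter steps are data-driven:

```r
library(recipes)

# Two hypothetical series: in one of them, x2 happens to be constant
# over the training window.
make_series <- function(constant_x2) {
  data.frame(
    y  = rnorm(20),
    x1 = rnorm(20),
    x2 = if (constant_x2) rep(1, 20) else rnorm(20)
  )
}

rec <- recipe(y ~ ., data = make_series(FALSE)) %>%
  step_zv(all_predictors())   # data-driven: drops zero-variance predictors at prep() time

# Prepped on a series where x2 varies: x2 is kept.
names(bake(prep(rec, training = make_series(FALSE)), new_data = NULL))

# Prepped on a series where x2 is constant: x2 is dropped,
# so this series' model is trained on a different feature set.
names(bake(prep(rec, training = make_series(TRUE)), new_data = NULL))
```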
Ok, sorry, haven't had time to dig into it. But yeah, the logic there was that the recipe used on the first model can be used on the others. Might need to rethink that.
Hi @mdancho84,
First and foremost, thanks for this amazing suite of `modeltime` packages. I am trying to model many individual time series using nested forecasting as described here: https://business-science.github.io/modeltime/articles/nested-forecasting.html

I came across a peculiar problem when using a commonly defined recipe with date-based features on time series of differing lengths and not fully overlapping periods.
With a recipe like this:
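(The original code block did not carry over; the following is an illustrative sketch of the kind of spec meant here, assuming `step_timeseries_signature()` from `timetk` plus the data-driven filters `step_zv()`/`step_corr()`. The column names `value` and `date` are placeholders.)

```r
library(recipes)
library(timetk)

rec_spec <- recipe(value ~ date,
                   data = extract_nested_train_split(nested_data_tbl)) %>%
  step_timeseries_signature(date) %>%   # expand date into calendar-based features
  step_rm(date) %>%                     # drop the raw date column
  step_zv(all_predictors()) %>%         # data-driven: remove zero-variance features
  step_corr(all_numeric_predictors(),   # data-driven: remove highly correlated features
            threshold = 0.9)
```

Because `step_zv()` and `step_corr()` decide which columns to drop from the training data they are prepped on, the surviving feature set can differ from series to series.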
The training works well and models are fitted on all time series. I see from the recipes nested in the output of `modeltime_nested_fit()` that not all series were fitted with the same features (I guess it is the ZV and CORR removal steps deciding to remove different features for different series), which is OK and wanted.

Unfortunately, models for some series are lacking `.calibration_data`, so I was trying to figure out why. What I have found is that it works well for all series which end up with the same features as the original recipe definition, while it fails to produce `.calibration_data` for all the other series.
Simple example: I have 8 series. I build the recipe as stated above with `extract_nested_train_split(nested_data_tbl)`, which by default uses `.row_id = 1`, i.e. the first series. Let's say series nr. 7 and 8 were trained with different feature sets (because their training period was slightly different from that of series 1-6). Then the calculation of `.calibration_data` fails.

I can manually produce `new_data` using `prep()` and `bake()` with the recipe specifically extracted for series 7/8, and then `predict(model, new_data = ...)` works fine.

Finally, when I create the initial recipe with `extract_nested_train_split(nested_data_tbl, .row_id = 7)`, calibration fails for the first 6 series and works for series 7.

I don't know the implementation details well, but I think the problem is that when the prediction data for calibration is being constructed, it bakes the recipe trained on the data supplied when the recipe was instantiated, not on the actual (individual time series) training data. Hence it tries to predict with a model trained on one feature set using new data with a different feature set.
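The manual workaround mentioned above can be sketched roughly as follows (the extraction steps are illustrative; the exact accessors depend on the structure of the nested modeltime table):

```r
library(modeltime)
library(recipes)

# Pull the fitted workflow for series 7 out of the nested results; the
# recipe inside it was prepped on series 7's own training data.
wflw_7  <- nested_modeltime_tbl$.modeltime_tables[[7]]$.model[[1]]
rec_7   <- workflows::extract_recipe(wflw_7)        # prepped recipe for series 7
model_7 <- workflows::extract_fit_parsnip(wflw_7)   # fitted model for series 7

# Bake the test split with the recipe that matches this series' features.
test_7   <- extract_nested_test_split(nested_modeltime_tbl, .row_id = 7)
new_data <- bake(rec_7, new_data = test_7)

predict(model_7, new_data = new_data)  # predictions work fine
```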
Is my understanding correct? Thanks for any feedback. :)