Create and evaluate synthetic time series datasets effortlessly
Get Started • Tutorials • Augmentations • Generators • Metrics • Datasets • Contributing • Citing
TSGM is an open-source framework for synthetic time series dataset generation and evaluation.
The framework can be used for creating synthetic datasets (see 🔨 Generators), augmenting time series data (see 🎨 Augmentations), evaluating synthetic data with respect to consistency, privacy, downstream performance, and more (see 📈 Metrics), and accessing common time series datasets (TSGM provides easy access to more than 140 datasets, see 💾 Datasets).
We provide:
- Documentation with a complete overview of the implemented methods,
- Tutorials that describe practical use-cases of the framework.
pip install tsgm
To install tsgm on Apple M1 and M2 chips:
# Install tensorflow
conda install -c conda-forge tensorflow=2.9.1
# Install tsgm without dependencies
pip install tsgm --no-deps
# Install rest of the dependencies (separately here for clarity)
conda install tensorflow-probability scipy antropy statsmodels dtaidistance networkx optuna prettytable seaborn scikit-learn yfinance tqdm
import tsgm
from tensorflow import keras  # assumed import; keras is used for the optimizers and loss below
# ... Define hyperparameters ...
# dataset is a tensor of shape n_samples x seq_len x feature_dim
# Zoo contains several prebuilt architectures: we choose a conditional GAN architecture
architecture = tsgm.models.architectures.zoo["cgan_base_c4_l1"](
    seq_len=seq_len, feat_dim=feature_dim,
    latent_dim=latent_dim, output_dim=0)
discriminator, generator = architecture.discriminator, architecture.generator
# Initialize GAN object with selected discriminator and generator
gan = tsgm.models.cgan.GAN(
    discriminator=discriminator, generator=generator, latent_dim=latent_dim
)
gan.compile(
    d_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    g_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    loss_fn=keras.losses.BinaryCrossentropy(from_logits=True),
)
gan.fit(dataset, epochs=N_EPOCHS)
# Generate 100 synthetic samples
result = gan.generate(100)
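The quickstart above leaves the hyperparameters and the dataset undefined. As a rough sketch, assuming a toy dataset of random sine waves, a tensor of the expected shape (n_samples x seq_len x feature_dim) could be built with NumPy as follows. The sizes, seed, and frequency ranges are illustrative choices, not values taken from the TSGM documentation.

```python
import numpy as np

# Hypothetical hyperparameters (not from the TSGM docs); adjust to your data.
n_samples, seq_len, feature_dim = 256, 64, 2

rng = np.random.default_rng(seed=0)
t = np.linspace(0.0, 2.0 * np.pi, seq_len)

# Each sample and feature gets its own random frequency and phase.
freq = rng.uniform(1.0, 3.0, size=(n_samples, 1, feature_dim))
phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1, feature_dim))

# Broadcasting yields shape (n_samples, seq_len, feature_dim).
dataset = np.sin(freq * t[None, :, None] + phase).astype(np.float32)

print(dataset.shape)  # (256, 64, 2)
```

Sine values already lie in [-1, 1], so no extra scaling is applied here; real data would usually need normalization before training.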
- Introductory Tutorial Getting started with TSGM
- Tutorial Datasets in TSGM
- Tutorial Time Series Augmentations
- Tutorial Time Series Generation with VAEs
- Tutorial Conditional Time Series Generation with GANs
- Tutorial Evaluation of Synthetic Time Series Data
- Tutorial Model Selection
- Tutorial Multiple GPUs or TPU with TSGM
For more examples, see our tutorials.
TSGM provides a number of time series augmentations.
Augmentation | Class in TSGM | Reference |
---|---|---|
Gaussian Noise / Jittering | tsgm.augmentations.GaussianNoise | - |
Slice-And-Shuffle | tsgm.augmentations.SliceAndShuffle | - |
Shuffle Features | tsgm.augmentations.Shuffle | - |
Magnitude Warping | tsgm.augmentations.MagnitudeWarping | Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks |
Window Warping | tsgm.augmentations.WindowWarping | Data Augmentation for Time Series Classification using Convolutional Neural Networks |
DTW Barycentric Averaging | tsgm.augmentations.DTWBarycentricAveraging | A global averaging method for dynamic time warping, with applications to clustering. |
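To give a feel for what the two simplest augmentations in the table do, here is a minimal NumPy sketch of jittering and slice-and-shuffle. It is not TSGM's implementation: the function names, the equal-length segments, and the default parameters are assumptions made for illustration.

```python
import numpy as np

def gaussian_jitter(X, variance=0.1, rng=None):
    """Add zero-mean Gaussian noise to every time step (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    return X + rng.normal(0.0, np.sqrt(variance), size=X.shape)

def slice_and_shuffle(x, n_segments=4, rng=None):
    """Cut one series of shape (seq_len, feat_dim) into segments and permute them."""
    rng = np.random.default_rng() if rng is None else rng
    segments = np.array_split(x, n_segments, axis=0)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order], axis=0)

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(8, 32, 3))  # n_samples x seq_len x feature_dim

X_jittered = gaussian_jitter(X, variance=0.05, rng=rng)
x_shuffled = slice_and_shuffle(X[0], n_segments=4, rng=rng)
```

Both functions keep the input shape, so the augmented samples can be concatenated with the original data.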
TSGM implements several generative models for synthetic time series data.
Method | Link to docs | Type | Notes |
---|---|---|---|
Structural Time Series | sts.STS | Data-driven | Great for modeling time series when prior knowledge is available (e.g., trend or seasonality). |
GAN | GAN | Data-driven | A generic implementation of GAN for time series generation. It can be customized with architectures for generators and discriminators. |
WaveGAN | GAN | Data-driven | WaveGAN is the model for audio synthesis proposed in Adversarial Audio Synthesis. To use WaveGAN, set use_wgan=True when initializing the GAN class and use the zoo["wavegan"] architecture from the model zoo. |
ConditionalGAN | ConditionalGAN | Data-driven | A generic implementation of conditional GAN. It supports both scalar and temporal conditioning. |
BetaVAE | BetaVAE | Data-driven | A generic implementation of Beta VAE for TS. The loss function is customized to work well with multi-dimensional time series. |
cBetaVAE | cBetaVAE | Data-driven | Conditional version of BetaVAE. It supports temporal and scalar conditioning. |
TimeGAN | TimeGAN | Data-driven | TSGM implementation of TimeGAN from paper |
SineConstSimulator | SineConstSimulator | Simulator-based | Simulator-based synthetic signal that switches between constant and periodic functions. |
Lotka Volterra | LotkaVolterraSimulator | Simulator-based | Simulator-based synthetic signal that follows the Lotka-Volterra (predator-prey) equations. |
PdM Simulator | PdMSimulator | Simulator-based | Simulator of predictive maintenance with multiple pieces of equipment from paper |
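As an illustration of the simulator-based approach, the sketch below integrates the classical Lotka-Volterra predator-prey equations with explicit Euler steps. It is a standalone NumPy example, not TSGM's LotkaVolterraSimulator; the function name, parameter names, and parameter values are hypothetical.

```python
import numpy as np

def simulate_lotka_volterra(x0, y0, alpha, beta, delta, gamma, dt, n_steps):
    """Integrate the predator-prey equations with explicit Euler steps.

    dx/dt = alpha * x - beta * x * y   (prey)
    dy/dt = delta * x * y - gamma * y  (predators)
    """
    traj = np.empty((n_steps, 2))
    x, y = x0, y0
    for i in range(n_steps):
        traj[i] = (x, y)
        dx = alpha * x - beta * x * y
        dy = delta * x * y - gamma * y
        x, y = x + dt * dx, y + dt * dy
    return traj

# Hypothetical parameter values chosen only to produce visible oscillations.
series = simulate_lotka_volterra(
    x0=10.0, y0=5.0, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5,
    dt=0.01, n_steps=1000,
)
print(series.shape)  # (1000, 2)
```

Explicit Euler is used for brevity; it drifts over long horizons, so a higher-order integrator would be a better fit for long simulations.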
TSGM implements many metrics for synthetic time series evaluation. Check Section 3 from our paper for more detail on the evaluation of synthetic time series.
Metric | Link to docs | Type | Notes |
---|---|---|---|
Distance in the space of summary statistics | tsgm.metrics.DistanceMetric | Distance | Calculates a set of summary statistics in the original and synthetic data, and measures the distance between those. |
Maximum Mean Discrepancy (MMD) | tsgm.metrics.MMDMetric | Distance | This metric calculates MMD between real and synthetic samples. |
Discriminative Score | tsgm.metrics.DiscriminativeMetric | Distance | The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets. |
Demographic Parity Score | tsgm.metrics.DemographicParityMetric | Fairness | This metric assesses the difference in the distributions of a target variable among different groups in two datasets. Refer to this paper to learn more. |
Predictive Parity Score | tsgm.metrics.PredictiveParityMetric | Fairness | This metric assesses the discrepancy in the predictive performance of a model among different groups in two datasets. Refer to this paper to learn more. |
Privacy Membership Inference Attack Score | tsgm.metrics.PrivacyMembershipInferenceMetric | Privacy | The metric measures the possibility of membership inference attacks. |
Spectral Entropy | tsgm.metrics.EntropyMetric | Diversity | Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies. |
Shannon Entropy | tsgm.metrics.ShannonEntropyMetric | Diversity | Shannon Entropy calculated over the labels of a dataset. |
Pairwise Distance | tsgm.metrics.PairwiseDistanceMetric | Diversity | Measures pairwise distances in a set of time series. |
Downstream Effectiveness | tsgm.metrics.DownstreamPerformanceMetric | Downstream Effectiveness | The downstream performance metric evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data. |
Qualitative Evaluation | tsgm.utils.visualization | Qualitative | Various tools for visual assessment of a generated dataset. |
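To make the distance-type metrics more concrete, the following sketch computes a biased estimate of squared MMD with an RBF kernel between two sets of time series. It is an independent NumPy illustration, not the code behind tsgm.metrics.MMDMetric; the function names, the flattening of each series, and the default kernel width are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd_squared(X, Y, gamma=None):
    """Biased estimate of squared MMD between two sets of time series."""
    # Flatten each series (seq_len x feat_dim) into one vector.
    X = X.reshape(len(X), -1)
    Y = Y.reshape(len(Y), -1)
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default kernel width
    return (
        rbf_kernel(X, X, gamma).mean()
        + rbf_kernel(Y, Y, gamma).mean()
        - 2.0 * rbf_kernel(X, Y, gamma).mean()
    )

rng = np.random.default_rng(seed=0)
real = rng.normal(0.0, 1.0, size=(100, 16, 2))
close = rng.normal(0.0, 1.0, size=(100, 16, 2))    # same distribution
shifted = rng.normal(1.0, 1.0, size=(100, 16, 2))  # shifted mean

mmd_close = mmd_squared(real, close)
mmd_shifted = mmd_squared(real, shifted)
```

In this toy setup the shifted sample yields a larger value than the sample drawn from the same distribution, which is the behavior one would expect from a distance between real and synthetic data.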
Dataset | API | Description |
---|---|---|
UCR Dataset | tsgm.utils.UCRDataManager | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ |
Mauna Loa | tsgm.utils.get_mauna_loa() | https://gml.noaa.gov/ccgg/trends/data.html |
EEG & Eye state | tsgm.utils.get_eeg() | https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State |
Power consumption dataset | tsgm.utils.get_power_consumption() | https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption |
Stock data | tsgm.utils.get_stock_data(ticker_name) | Gets historical stock data from YFinance |
COVID-19 over the US | tsgm.utils.get_covid_19() | Covid-19 distribution over the US |
Energy Data (UCI) | tsgm.utils.get_energy_data() | https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction |
MNIST as time series | tsgm.utils.get_mnist_data() | https://en.wikipedia.org/wiki/MNIST_database |
Samples from GPs | tsgm.utils.get_gp_samples_data() | https://en.wikipedia.org/wiki/Gaussian_process |
Physionet 2012 | tsgm.utils.get_physionet2012() | https://archive.physionet.org/pn3/challenge/2012/ |
Synchronized Brainwave Dataset | tsgm.utils.get_synchronized_brainwave_dataset() | https://www.kaggle.com/datasets/berkeley-biosense/synchronized-brainwave-dataset |
TSGM provides an API for convenient use of many time series datasets (currently more than 140). The comprehensive list of datasets is available in the documentation.
We appreciate all contributions. To learn more, please check CONTRIBUTING.md.
git clone https://github.com/AlexanderVNikitin/tsgm
cd tsgm
pip install -e .
Run tests:
python -m pytest
To check static typing:
mypy
We provide two CLIs for convenient synthetic data generation:
- tsgm-gd generates data by a stored sample,
- tsgm-eval evaluates the generated time series.
Use tsgm-gd --help or tsgm-eval --help for documentation.
If you find this repo useful, please consider citing our paper:
@article{nikitin2023tsgm,
  title={TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series},
  author={Nikitin, Alexander and Iannucci, Letizia and Kaski, Samuel},
  journal={arXiv preprint arXiv:2305.11567},
  year={2023}
}