unquad

A Python library for uncertainty-quantified anomaly detection.

unquad is a wrapper for most PyOD detectors (see Supported Estimators) that enables uncertainty-quantified anomaly detection based on one-class classification and the principles of conformal inference.

What is Conformal Anomaly Detection?

Conformal anomaly detection (CAD) is based on the model-agnostic and non-parametric framework of conformal prediction (CP). While CP produces statistically valid prediction regions (prediction intervals or prediction sets) for any point predictor or classifier, CAD aims to control the false discovery rate (FDR) of any anomaly detector suitable for one-class classification, without compromising its statistical power.
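To make the two target quantities concrete, the sketch below computes the empirical false discovery proportion (the share of flagged points that are actually normal) and the empirical power (the share of true anomalies that get flagged). It assumes labels where 1 marks an anomaly and 0 a normal point; the function names are hypothetical and this is not unquad's own implementation of its metrics.

```python
def empirical_fdr(y_true, y_pred):
    """Share of flagged points (y_pred == 1) that are actually normal (y_true == 0)."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not flagged:
        return 0.0  # convention: no discoveries means no false discoveries
    return sum(1 for t in flagged if t == 0) / len(flagged)


def empirical_power(y_true, y_pred):
    """Share of true anomalies (y_true == 1) that were flagged (y_pred == 1)."""
    predictions_on_anomalies = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_on_anomalies:
        return 0.0
    return sum(1 for p in predictions_on_anomalies if p == 1) / len(predictions_on_anomalies)


# Assumed labelling: 1 = anomaly, 0 = normal.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(empirical_fdr(y_true, y_pred))    # 1 of 3 flagged points is normal
print(empirical_power(y_true, y_pred))  # 2 of 3 anomalies are flagged
```

Strictly speaking, the FDR is the expected value of this proportion over repeated samples; a single test set only yields one realization of it.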

CAD translates anomaly scores into statistical p-values by comparing the anomaly scores observed on test data to a retained set of calibration scores, previously obtained on normal data during model training (see One-Class Classification). The larger the discrepancy between the normal calibration scores and an observed test score, the lower the resulting (statistically valid) p-value. Unlike raw anomaly scores, these p-values enable FDR control through multiple-testing procedures such as Benjamini-Hochberg.
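The following sketch illustrates both steps with plain Python: the standard conformal p-value (1 + number of calibration scores at least as large as the test score) / (n + 1), followed by the Benjamini-Hochberg step-up procedure. It assumes the PyOD convention that larger scores mean "more anomalous"; the function names and the toy scores are hypothetical and not part of unquad's API.

```python
def conformal_p_values(calib_scores, test_scores):
    """Conformal p-value per test score; larger scores are treated as more anomalous."""
    n = len(calib_scores)
    return [
        (1 + sum(1 for c in calib_scores if c >= s)) / (n + 1)
        for s in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Return a 0/1 decision per p-value, controlling the FDR at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    rejected = set(order[:k_max])  # reject the k_max smallest p-values
    return [1 if i in rejected else 0 for i in range(m)]


# Hypothetical scores on held-out normal data and on four test points.
calib = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6]
test = [0.3, 0.58, 0.9, 1.2]

p = conformal_p_values(calib, test)
print(p)                                # [8/11, 2/11, 1/11, 1/11]
print(benjamini_hochberg(p, alpha=0.2))  # [0, 0, 1, 1]
```

Note that with n calibration scores the smallest attainable p-value is 1 / (n + 1), so a calibration set that is too small can make it impossible to reject anything at a strict FDR level.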

Getting started

pip install unquad

Usage: CV+

from pyod.models.iforest import IForest

from unquad.estimator.conformal_estimator import ConformalEstimator
from unquad.estimator.split_configuration import SplitConfiguration
from unquad.datasets.loader import DataLoader
from unquad.enums.adjustment import Adjustment
from unquad.enums.dataset import Dataset
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.THYROID)
x_train, x_test, y_test = dl.get_example_setup()

ce = ConformalEstimator(
    detector=IForest(behaviour="new"),
    method=Method.CV_PLUS,
    split=SplitConfiguration(n_split=10),
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.2,  # nominal FDR level
    seed=1
)

ce.fit(x_train)  # model fit and calibration
estimates = ce.predict(x_test, raw=False)

print(false_discovery_rate(y=y_test, y_hat=estimates))
print(statistical_power(y=y_test, y_hat=estimates))

Output:

0.174  # empirical FDR
0.826  # empirical Power
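The idea behind a cross-validation-based calibration such as CV+ can be sketched in a few lines: the training data is split into folds, one detector is fitted per fold on the remaining data, and the held-out (out-of-fold) scores together form the calibration set, so every training point contributes a calibration score without being scored by a model that saw it. This is a simplified sketch, not unquad's implementation: `MeanDistanceDetector` is a hypothetical toy stand-in for a PyOD detector, and averaging the test scores over all fold models is an assumption about how the fold models are combined.

```python
import random
from statistics import mean


class MeanDistanceDetector:
    """Hypothetical toy detector: anomaly score = |x - training mean|."""

    def fit(self, xs):
        self.center = mean(xs)
        return self

    def decision_function(self, xs):
        return [abs(x - self.center) for x in xs]


def cv_plus_calibrate(x_train, n_split=10, seed=1):
    """Fit one detector per fold; the out-of-fold scores form the calibration set."""
    idx = list(range(len(x_train)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_split] for i in range(n_split)]
    models, calib_scores = [], []
    for fold in folds:
        held_out = set(fold)
        train_part = [x_train[i] for i in idx if i not in held_out]
        model = MeanDistanceDetector().fit(train_part)
        calib_scores += model.decision_function([x_train[i] for i in fold])
        models.append(model)
    return models, calib_scores


def cv_plus_p_values(models, calib_scores, x_test):
    # Assumption: test scores are averaged over all fold models.
    per_model = [m.decision_function(x_test) for m in models]
    test_scores = [mean(col) for col in zip(*per_model)]
    n = len(calib_scores)
    return [(1 + sum(cal >= s for cal in calib_scores)) / (n + 1) for s in test_scores]


rng = random.Random(0)
x_train = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic "normal" data
models, calib = cv_plus_calibrate(x_train, n_split=10)
print(len(models), len(calib))  # 10 200
print(cv_plus_p_values(models, calib, [0.1, 8.0]))
```

The point near the bulk of the data receives a large p-value, while the far-away point receives the smallest attainable p-value, 1 / 201.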

Usage: Jackknife+-after-Bootstrap

from pyod.models.iforest import IForest

from unquad.estimator.conformal_estimator import ConformalEstimator
from unquad.estimator.split_configuration import SplitConfiguration
from unquad.datasets.loader import DataLoader
from unquad.enums.adjustment import Adjustment
from unquad.enums.dataset import Dataset
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.THYROID)
x_train, x_test, y_test = dl.get_example_setup()

ce = ConformalEstimator(
    detector=IForest(behaviour="new"),
    method=Method.JACKKNIFE_PLUS_AFTER_BOOTSTRAP,
    split=SplitConfiguration(n_split=0.95, n_bootstraps=40),
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.1,  # nominal FDR level
    seed=1,
)

ce.fit(x_train)  # model fit and calibration
estimates = ce.predict(x_test, raw=False)

print(false_discovery_rate(y=y_test, y_hat=estimates))
print(statistical_power(y=y_test, y_hat=estimates))

Output:

0.041 # empirical FDR
0.959 # empirical Power
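A bootstrap-based variant follows the same pattern, but calibrates with out-of-bag scores: each training point is scored only by the models whose bootstrap sample did not contain it. The sketch below is a simplified illustration under several assumptions: the "detector" is a hypothetical toy whose model is just the training mean, every bootstrap sample has the full training size (this sketch does not try to reproduce what `n_split=0.95` means in unquad), and test scores are averaged over all bootstrap models.

```python
import random
from statistics import mean


def fit_center(xs):
    """Hypothetical toy detector: the fitted 'model' is just the training mean."""
    return mean(xs)


def jab_calibrate(x_train, n_bootstraps=40, seed=1):
    rng = random.Random(seed)
    n = len(x_train)
    centers, in_bag = [], []
    for _ in range(n_bootstraps):
        sample = [rng.randrange(n) for _ in range(n)]  # draw indices with replacement
        centers.append(fit_center([x_train[i] for i in sample]))
        in_bag.append(set(sample))
    calib_scores = []
    for i, x in enumerate(x_train):
        # Only models whose bootstrap sample did NOT contain point i may score it.
        oob = [center for center, bag in zip(centers, in_bag) if i not in bag]
        if oob:  # a point drawn into every sample gets no calibration score
            calib_scores.append(mean(abs(x - center) for center in oob))
    return centers, calib_scores


def jab_p_values(centers, calib_scores, x_test):
    n = len(calib_scores)
    p_values = []
    for x in x_test:
        # Assumption: the test score aggregates all bootstrap models.
        s = mean(abs(x - center) for center in centers)
        p_values.append((1 + sum(cal >= s for cal in calib_scores)) / (n + 1))
    return p_values


rng = random.Random(0)
x_train = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic "normal" data
centers, calib = jab_calibrate(x_train, n_bootstraps=40)
print(len(centers))  # 40
print(jab_p_values(centers, calib, [0.1, 8.0]))
```

With 40 bootstrap samples, almost every training point is left out of at least one sample, so nearly all points contribute an out-of-bag calibration score.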

Supported Estimators

The package currently supports anomaly estimators that are suitable for unsupervised one-class classification. Because these detectors are fitted exclusively on normal (non-anomalous) data, parameters such as the threshold are set internally to the smallest possible value.

Models that are currently supported include:

  • Angle-Based Outlier Detection (ABOD)
  • Autoencoder (AE)
  • Cook's Distance (CD)
  • Copula-based Outlier Detector (COPOD)
  • Deep Isolation Forest (DIF)
  • Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
  • Gaussian Mixture Model (GMM)
  • Histogram-based Outlier Detection (HBOS)
  • Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
  • Isolation Forest (IForest)
  • Kernel Density Estimation (KDE)
  • k-Nearest Neighbor (kNN)
  • Kernel Principal Component Analysis (KPCA)
  • Linear Model Deviation-based Outlier Detection (LMDD)
  • Local Outlier Factor (LOF)
  • Local Correlation Integral (LOCI)
  • Lightweight Online Detector of Anomalies (LODA)
  • Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
  • GNN-based Anomaly Detection Method (LUNAR)
  • Median Absolute Deviation (MAD)
  • Minimum Covariance Determinant (MCD)
  • One-Class SVM (OCSVM)
  • Principal Component Analysis (PCA)
  • Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
  • Rotation-based Outlier Detection (ROD)
  • Subspace Outlier Detection (SOD)
  • Scalable Unsupervised Outlier Detection (SUOD)

Contact

Bug reporting: https://github.com/OliverHennhoefer/unquad/issues