Skip to content

harphub/harpPoint

Repository files navigation

harpPoint

harpPoint provides functionality for the verification of meteorological data at geographic points. Typically this would be the verification of forecasts interpolated to the locations of weather stations. Functions are provided for computing verification scores for both deterministic and ensemble forecasts. In addition, confidence intervals for scores, or the differences between scores for different forecast models can be computed using bootstrapping.

Installation

You can install harpPoint from GitHub with:

# install.packages("remotes")
remotes::install_github("harphub/harpPoint")

Verification

harpPoint functions for verification are designed to work with data read in using functions from harpIO. This means harp_df data frames and harp_lists. These data must include a column for observations against which the forecasts should be verified.

There are two main functions for verification:

det_verify() - for deterministic forecasts;

ens_verify() - for ensemble forecasts.

Both of these functions will output a harp_verif list. This is a list of data frames with scores separated out into summary scores and threshold scores. Threshold scores are computed when thresholds are provided to the functions and are computed for probabilities of threshold exceedance.

Deterministic scores

det_verify() computes scores with the following column names:

Summary scores

  • bias - The mean difference between forecasts and observations
  • rmse - The root mean squared error
  • mae - The mean of the absolute error
  • stde - The standard deviation of the error
  • hexbin - A heat map of paired hexagonal bins of forecasts and observations

Threshold scores

  • cont_tab - A contingency table of forecast hits, misses, false alarms and correct rejections
  • threat_score - The ratio of hits to the sum of hits, misses and false alarms
  • hit_rate - The ratio of hits to the sum of hits and misses
  • miss_rate - The ratio of misses to the sum of hits and misses
  • false_alarm_rate - The ratio of false alarms to the sum of false alarms and correct rejections
  • false_alarm_ratio - The ratio of false alarms to the sum of false alarms and hits
  • heidke_skill_score - The fraction of correct forecasts after eliminating those forecasts that would be correct purely due to random chance
  • pierce_skill_score - 1 - miss rate - false alarm rate
  • kuiper_skill_score - How well forecasts separates hits from false alarms
  • percent_correct - The ratio of the sum of hits and correct rejections to the total number of cases
  • frequency_bias - The ratio of the sum of hits and false alarms to the sum of hits and misses
  • equitable_threat_score - How well the forecast measures hits accounting for hits due to pure chance
  • odds_ratio - The ratio of the product of hits and correct rejections to the product of misses and false alarms
  • log_odds_ratio - The sum of the logs of hits and correct rejections minus the sum of the logs of misses and false alarms.
  • odds_ratio_skill_score - The ratio of the product of hits and correct rejections minus the product of misses and false alarms to the product of hits and correct rejections plus the product of misses and false alarms
  • extreme_dependency_score - The ratio of the difference between the logs of observations climatology and hit rate to the sum of the logs of observations climatology and hit rate
  • symmetric_eds - The symmetric extreme dependency score, which is the ratio of the difference between the logs of forecast climatology and hit rate to the sum of the logs of forecast climatology and hit rate
  • extreme_dependency_index - The ratio of the difference between the logs of false alarm rate and hit rate to the sum of the logs of false alarm rate and hit rate
  • symmetric_edi - The symmetric extreme dependency index, which is the ratio of the sum of the difference between the logs of false alarm rate and hit rate and the difference between the logs of the inverse hit rate and false alarm rate to the sum of the logs of hit rate, false alarm rate, inverse hit rate and inverse false alarm rate. Here the inverse is 1 - the value.

Ensemble scores

ens_verify() computes scores with the following column names:

Summary scores

  • mean_bias - The mean difference between the ensemble mean of forecasts and observations
  • rmse - The root mean squared error
  • stde - The standard deviation of the error
  • spread - The square root of the mean variance of the ensemble forecasts
  • hexbin - A heat map of paired hexagonal bins of forecasts and observations
  • rank_histogram - Observation counts ranked by ensemble member bins
  • crps- The cumulative rank probability score - the difference between the cumulative distribution of the ensemble forecasts and the step function of the observations
  • crps_potential - The crps that could be achieved with a perfectly reliable ensemble
  • crps_reliability - Measures the ability of the ensemble to produce a cumulative distribution with desired statisical properties.
  • fair_crps - The crps that would be achieved for either an ensemble with an infinite number of members, or for a number of members provided to the function

Threshold scores

  • brier_score - The mean of the squared error of the ensemble in probability space
  • fair_brier_score - The Brier score that would be achieved for either an ensemble with an infinite number of members, or for a number of members provided to the function
  • brier_skill_score - The Brier score compared to that of a reference probabilistic forecast (usually the observed climatology)
  • brier_score_reliability - A measure of the ensemble’s ability to produce reliable (forecast probability = observed frequency) forecasts
  • brier_score_resolution - A measure of the ensemble’s ability to discriminate between “on the day” uncertainty and climatological uncertainty
  • brier_score_uncertainty - The inherent uncertainty of the events
  • reliability - The frequency of observations for bins of forecast probability
  • roc - The relative operating characteristic of the ensemble - the hit rates and false alarm rates for forecast probability bins
  • roc_area - The area under a roc curve - summarises the ability of the ensemble to discriminate between events and non events
  • economic_value - The relative improvement in economic value of the forecast compared to climatology for a range of cost / loss ratios

Getting gridded data to points

For interpolation of gridded data to points see the Interpolate section of the Transforming model data article on the harpIO website, or the documentation for geo_points.