---
output: github_document
---
<!-- README.md is generated from README.Rmd -->
```{r, echo = FALSE}
# https://github.com/tidyverse/ggplot2/blob/main/README.Rmd
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
# feb2 <img src="man/figures/hex.png" align="right" alt="" width="120" />
## Overview
Every year on February 2, [Groundhog Day](https://en.wikipedia.org/wiki/Groundhog_Day), the famous groundhog [Punxsutawney Phil](https://en.wikipedia.org/wiki/Punxsutawney_Phil) and a growing cast of creatures (including stuffed animals, sock puppets, and mascots) emerge from their burrows to predict the weather. Punxsutawney Phil, the prognosticator of prognosticators, has been predicting the weather since 1887. If he sees his shadow, he predicts six more weeks of winter. If not, he predicts an early spring.
This package brings together prediction data from [Countdown to Groundhog Day](https://countdowntogroundhogday.com/) and weather data from the [U.S. National Oceanic and Atmospheric Administration (NOAA)](https://www.noaa.gov/) to help you evaluate which prognosticators you can trust.
## Installation
```{r, eval = FALSE}
# Install from GitHub
devtools::install_github("ericpgreen/feb2")
```
## Usage
Currently this is a function-less data package designed to facilitate your analyses of Groundhog Day prediction data.
```{r library}
library(feb2)
```
## Datasets
There are three main datasets and several supporting datasets.
```{r img, echo=FALSE, fig.cap="Datasets"}
knitr::include_graphics(path="man/figures/feb2 data.png")
```
### `prognosticators`
Michael Venos, the creator of [Countdown to Groundhog Day](https://countdowntogroundhogday.com/), maintains the internet's most comprehensive database about Groundhog Day predictions. He generously agreed to allow me to incorporate his data into this package. Currently he has data on a diverse collection of `r nrow(prognosticators)` prognosticators.
```{r prognosticators, message=FALSE}
library(tidyverse)
prognosticators %>%
group_by(prognosticator_status) %>%
count()
```
Rodents are by far the most common prognosticators. One lobster is holding it down for the arthropods.
```{r prognosticators-type}
prognosticators %>%
group_by(prognosticator_phylum, prognosticator_class, prognosticator_order) %>%
count()
```
### `predictions`
Michael has collected `r nrow(predictions)` predictions going back to Punxsutawney Phil's first prediction in 1887. Some years the prediction is uncertain or not recorded, accounting for the `NA`s.
```{r predictions}
predictions %>%
group_by(prediction) %>%
count()
```
### `weather_stations_ghcnd`
Each prognosticator is linked to a city. I used the {`tidygeocoder`} package to get coordinates for each city, then used the `meteo_nearby_stations()` function in the {`rnoaa`} package to search for weather stations with TMAX (daily maximum air temperature) data within 100 km of those coordinates. This dataset lists every eligible weather station within 100 km, but the package only uses weather data from (up to) the 10 closest stations.
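To make the "within 100 km, then the 10 closest" rule concrete, here is a minimal base R sketch on made-up stations. It is not the code used to build the dataset (that relied on `meteo_nearby_stations()`). The helper names `haversine_km()` and `nearest_stations()`, the haversine distance, and the toy station IDs and coordinates are all assumptions for illustration.

```r
# Great-circle distance in km between one point and a vector of points
# (haversine formula, assuming a mean Earth radius of 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: keep stations within `radius_km` of a city,
# then return (up to) the `n_keep` closest ones.
nearest_stations <- function(stations, city_lat, city_lon,
                             radius_km = 100, n_keep = 10) {
  stations$distance <- haversine_km(city_lat, city_lon,
                                    stations$latitude, stations$longitude)
  eligible <- stations[stations$distance <= radius_km, , drop = FALSE]
  eligible <- eligible[order(eligible$distance), , drop = FALSE]
  head(eligible, n_keep)
}

# Made-up stations near Gobbler's Knob (IDs and coordinates are illustrative).
toy_stations <- data.frame(
  id = c("TOY001", "TOY002", "TOY003"),
  latitude = c(40.95, 41.50, 43.00),
  longitude = c(-78.97, -79.00, -78.95)
)

closest <- nearest_stations(toy_stations, 40.9319, -78.9573)
closest
```

The third toy station sits more than 200 km north of the city, so only the first two survive the radius filter.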
```{r map, echo=FALSE, message=FALSE, eval=FALSE}
library(leaflet)
states <- geojsonio::geojson_read("https://rstudio.github.io/leaflet/json/us-states.geojson",
what = "sp")
leaflet(data = states) %>%
setView(-96, 37.8, 4) %>%
addProviderTiles(providers$Stamen.Toner) %>%
addCircleMarkers(data = weather_stations_ghcnd,
lng = ~longitude, lat = ~latitude, radius=0.5)
```
```{r map2, fig.width=8, message=FALSE, echo=FALSE}
library(rnaturalearth)
library(sp)
state_prov <- rnaturalearth::ne_states(c("united states of america", "canada"),
returnclass = "sf") %>%
filter(name != "Hawaii") %>%
filter(name != "Alaska")
ggplot(data = state_prov) +
geom_sf() +
geom_point(data = weather_stations_ghcnd, aes(x = longitude, y = latitude),
size=0.25, alpha = 0.4) +
theme_minimal() +
labs(title = "NOAA weather stations within 100km of prognosticators",
x=NULL,
y=NULL) +
theme(plot.title = element_text(face="bold"),
plot.title.position = "plot")
```
### `class_def1`
The biggest challenge for evaluating Groundhog Day predictions is defining what we mean by "early spring". So far in this package I follow the general approach of earlier analyses by [NOAA](https://www.ncei.noaa.gov/news/groundhog-day-forecasts-and-climate-history) and [538](https://fivethirtyeight.com/features/groundhogs-do-not-make-good-meteorologists/). I define early spring for a prognosticator's location as one month (February OR March) with an average high temperature above the historical average for that month.[^15days] Unlike the previous analyses, however, I use local data for each prognosticator. NOAA used U.S. national temperatures, and 538 looked across nine U.S. regions.[^years] I think it's just silly to expect a real or stuffed groundhog to be able to predict national or regional weather based on localized sunshine. I say let's evaluate their powers of prognostication using local data.[^def1]
[^15days]: 538 uses the 15-year rolling mean, and so do I.
[^years]: NOAA evaluated Phil's predictions from 2012-2021. 538 expanded the scope of the inquiry from 1 to 9 prognosticators, and included more years, 1994-2021.
[^def1]: I refer to this classification definition as `def1`.
Steps to construct the classification:
1. Identify the 10 weather stations with TMAX data closest to each prognosticator's city
2. Use `rnoaa::ghcnd_search()` to obtain daily historical high temperatures for each station
3. Calculate the mean daily high temperature for each prognosticator's city by averaging over the 10 closest weather stations
4. Calculate the mean monthly high temperature for each prognosticator's city by averaging over the daily data
5. Calculate the 15-year rolling mean high monthly temperature for each prognosticator's city
6. Use the `def1` definition to classify each year as "early spring" or "long winter"
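Steps 3 through 6 can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's actual build script: the function names are hypothetical, the rolling mean is assumed to cover the years before the one being classified, years are assumed to be consecutive, and "February OR March" is read as an inclusive or. The toy example uses a 3-year window so it stays small.

```r
# Steps 3-4: average daily highs across stations, then within each month.
# `daily` is assumed to have columns station, date (Date), and tmax.
monthly_means <- function(daily) {
  daily$year <- as.integer(format(daily$date, "%Y"))
  daily$month <- as.integer(format(daily$date, "%m"))
  daily$day <- as.integer(format(daily$date, "%d"))
  city_daily <- aggregate(tmax ~ day + month + year, data = daily, FUN = mean)
  aggregate(tmax ~ year + month, data = city_daily, FUN = mean)
}

# Step 5: mean of the `window` years before each year
# (NA until enough history exists; assumes no gaps between years).
trailing_mean <- function(x, window = 15) {
  vapply(seq_along(x), function(i) {
    if (i <= window) NA_real_ else mean(x[(i - window):(i - 1)])
  }, numeric(1))
}

# Step 6: "early spring" if February or March beats its rolling baseline.
classify_def1 <- function(monthly, window = 15) {
  monthly <- monthly[order(monthly$month, monthly$year), ]
  monthly$baseline <- ave(monthly$tmax, monthly$month,
                          FUN = function(x) trailing_mean(x, window))
  monthly$warm <- monthly$tmax > monthly$baseline
  feb <- monthly[monthly$month == 2, c("year", "warm")]
  mar <- monthly[monthly$month == 3, c("year", "warm")]
  both <- merge(feb, mar, by = "year", suffixes = c("_feb", "_mar"))
  both$class <- ifelse(both$warm_feb | both$warm_mar,
                       "early spring", "long winter")
  both[, c("year", "class")]
}

# Toy data: two stations, two days per month, flat highs of 5
# except for a warm March in 2004.
toy <- expand.grid(station = c("A", "B"), day = 1:2,
                   month = 2:3, year = 2001:2005)
toy$date <- as.Date(sprintf("%d-%02d-%02d", toy$year, toy$month, toy$day))
toy$tmax <- ifelse(toy$year == 2004 & toy$month == 3, 9, 5)
toy <- toy[, c("station", "date", "tmax")]

result <- classify_def1(monthly_means(toy), window = 3)
result
```

With a 3-year window, 2001 to 2003 have no baseline and stay `NA`. The warm March makes 2004 an early spring, and it also raises the March baseline for 2005, which comes out as a long winter.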
```{r Gobbler, message=FALSE, echo=FALSE}
gk <- tribble(
~prognosticator_city, ~id, ~name, ~latitude, ~longitude, ~distance,
"Punxsutawney, PA", "Gobbler's Knob", "Gobbler's Knob", 40.93185285540005, -78.95731896931149,
0)
phil <- weather_stations_ghcnd %>%
filter(prognosticator_city=="Punxsutawney, PA") %>%
arrange(distance) %>%
mutate(num = 1:n()) %>%
bind_rows(gk) %>%
mutate(keep = case_when(
name == "Gobbler's Knob" ~ "Gobbler's Knob",
num <= 10 ~ "Top 10",
TRUE ~ "Others"
))
state_prov %>%
filter(name=="Pennsylvania") %>%
ggplot() +
geom_sf() +
geom_point(data = phil,
aes(x = longitude, y = latitude, color = keep, alpha= keep),
size=3) +
scale_color_manual(values=c("red","#000000", "blue")) +
scale_alpha_manual(values=c(1, 0.5, 0.5)) +
theme_minimal() +
labs(title = "NOAA weather stations within 100km of Punxsutawney, PA",
subtitle = "Weather data for prediction classifications comes from 10 closest stations (blue)",
x=NULL,
y=NULL) +
theme(plot.title = element_text(face="bold"),
plot.title.position = "plot",
legend.position = "none")
```
The `class_def1` dataset does not contain the underlying weather data, but you can find it in `class_def1_data`. You can also use the {`rnoaa`} package and the weather station information in `weather_stations_ghcnd` and `weather_stations_isd` to query the weather data directly if you are interested in creating a new classification definition.
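Once each year has a class, scoring a prognosticator amounts to comparing predictions with classes. The sketch below uses made-up data frames. The column names (`prognosticator`, `year`, `prediction`, `class`) and the label strings are assumptions, so check the actual datasets before adapting it.

```r
# Hypothetical toy inputs; the real datasets may use different column names.
toy_predictions <- data.frame(
  prognosticator = c("Phil", "Phil", "Phil", "Lobster", "Lobster"),
  year = c(2019, 2020, 2021, 2020, 2021),
  prediction = c("early spring", "long winter", NA,
                 "early spring", "early spring")
)
toy_classes <- data.frame(
  prognosticator = c("Phil", "Phil", "Phil", "Lobster", "Lobster"),
  year = c(2019, 2020, 2021, 2020, 2021),
  class = c("early spring", "early spring", "long winter",
            "early spring", "early spring")
)

# Match each prediction to the class for the same prognosticator and year.
scored <- merge(toy_predictions, toy_classes, by = c("prognosticator", "year"))

# Drop years with a missing prediction or class before scoring.
scored <- scored[!is.na(scored$prediction) & !is.na(scored$class), ]
scored$correct <- scored$prediction == scored$class

# Share of correct predictions per prognosticator.
accuracy <- aggregate(correct ~ prognosticator, data = scored, FUN = mean)
accuracy
```

Dropping the `NA` rows first means a missing prediction counts neither for nor against a prognosticator.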
## Updates
I plan to update the prognosticators, predictions, and classifications data after the month of March (classification definition `def1` depends on March weather data).
## Data Use
NOAA weather data are in the public domain (as far as I know). Data on prognosticators and their predictions come with permission from [Countdown to Groundhog Day](https://countdowntogroundhogday.com/). You are welcome to use the data via this package for any purpose, but please do not post the raw data on any other public sites. Instead, give credit to Michael's tremendous effort by pointing back to [Countdown to Groundhog Day](https://countdowntogroundhogday.com/).
## Hex Sticker
The groundhog pixel art is a [DALL-E 2](https://openai.com/dall-e-2/) creation.
## Issues
Please [submit an issue](https://github.com/ericpgreen/feb2/issues) if you encounter any bugs or errors. This package comes with no warranty of any kind. Don't rely on me or these rodents to get it right. Though my family did live in Punxsutawney when I was 4, and I have been to Gobbler's Knob.
```{r me, echo=FALSE, fig.cap="Me visiting Gobbler's Knob as a child."}
knitr::include_graphics(path="man/figures/gk.png")
```