-
Notifications
You must be signed in to change notification settings - Fork 0
/
00_welcome.qmd
358 lines (216 loc) · 8.66 KB
/
00_welcome.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
---
title: "Introduction & Welcome!"
author: "Dr Anna Krystallli"
subtitle: "Reproducible Research Data and Project Management in R"
institute: R-RSE
materials_url: https://acce-rrresearch.netlify.app/
format:
revealjs:
logo: assets/logo/r-rse-logo2.png
theme: [default, assets/css/styles.scss, assets/css/reveal.scss]
footer: "[{{< fa home >}}](index.qmd)"
from: markdown+emoji
template-partials:
- assets/layouts/title-slide.html
editor: visual
preload-iframes: true
lightbox: true
---
## :wave: Hello
### me: **Dr Anna Krystalli**
::: nonincremental
- **Research Software Engineer**, [*`R-RSE`*](https://www.r-rse.eu/)
- mastodon annakrystalli\@fosstodon.org
- github \@annakrystalli
- email **r.rse.eu\[at\]gmail.com**
- Background in **Marine Macroecology**
- **Core Team:** [ReproHack](https://github.com/reprohack/reprohack-hq)
:::
## Ice Breaker
### Split into break out rooms
### Introduce yourselves
### Q: Why did you decide to join this course?
# Why are we here?
## The paper is the advertisement
> “an article about computational result is advertising, not scholarship. The actual scholarship is the **full software environment, code and data, that produced the result.**”
*John Claerbout paraphrased in [Buckheit and Donoho (1995)](https://statweb.stanford.edu/~wavelab/Wavelab_850/wavelab.pdf)*
. . .
### [The Scientific Paper Is Obsolete](https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/)
Here's what's next
<small>APR 5, 2018, The Atlantic</small>
<img src="assets/SciencePaperFlames-New.gif" height="100px" width="350px"/>
## Lessons from the Reproducibility/Replicability crisis
- Many issues statistical and a results of broken Academic incentive systems.
- Much can be tackled by transparency and better computational literacy.
<img src="assets/woes.png" width="450px"/>
## [Reproducible Research in Computational Science](http://science.sciencemag.org/content/334/6060/1226)
ROGER D. PENG, SCIENCE 02 DEC 2011 : 1226-1227
> Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.
![](assets/repro-spectrum.jpg){width="550"}
## Reinventing discovery by open sourcing science
*Nielsen, Michael. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, 2012. JSTOR, www.jstor.org/stable/j.ctt7s4vx.*
::: columns
::: {.column width="50%"}
- Sharing resources
- Collective intelligence
- Mass collaboration
:::
::: {.column width="50%"}
<img src="assets/reinventing-innovation.png" height="300px"/>
:::
:::
## The internet was built for open science
### Key to next generation networked science
```{r, echo=FALSE, out.width="70%"}
knitr:::include_graphics("assets/www.jpg")
```
## **The grand vision**
### Hans Rosling on open data (and data science) back in 2006
```{r}
knitr::include_url("https://goo.gl/ry6AiG")
```
::: fragment
> So how far have we come?
:::
## [gapminder.org](https://www.gapminder.org/): today
#### Fighting global misconceptions with data
[![](assets/gapminder-org.png)](https://www.gapminder.org/)
## gapmider at our fingertips
```{r, fig.show = "animate", message=FALSE, warning=FALSE}
#| echo: true
library(ggplot2)
p <- ggplot(gapminder::gapminder,
aes(gdpPercap, lifeExp, size = pop,
color = continent, frame = year)) +
geom_point() + scale_x_log10() + theme_bw()
```
```{r, message=FALSE, warning=FALSE, fig.height=4}
#| echo: true
plotly::ggplotly(p)
```
# How do we get there?
## **Research meta-responsibilities**
We need better digital curation of the workhorses of modern science: **code** & **data**
> **aim to create secure materials that are [FAIR](https://www.nature.com/articles/sdata201618)** *findable, accessible, interoperable, reusable*
![](assets/FAIRPrinciples.jpg){fig-align="center" width="700"}
## **Research meta-responsibilities**
::: columns
::: {.column width="50%"}
- Think about traceability and provenance.
- Follow community conventions.
- Prepare it to share it.
:::
::: {.column width="50%"}
### We all need to do our bit!
```{r, echo=FALSE}
knitr::include_graphics("assets/CultureShift.jpg")
```
:::
:::
## **Drivers of better digital management**
- **Funders**: value for money, impact, reputation
- **Publishers**: many now require code and data.
- Specialist journals have emerged for:
- **software**: [Journal of Open Source Software](http://joss.theoj.org/), [MEE](https://besjournals.onlinelibrary.wiley.com/journal/2041210x)
- **data**: [Scientific Data](https://www.nature.com/sdata/))
- **PIs, Supervisors and immediate research group**
- Your **wider scientific community**
- The **public**
## **Yourselves!**
**Be your own best friend:**
![](https://media.giphy.com/media/9Q249Qsl5cfLi/giphy.gif)
## **Ultimately it's about getting a handle on our research materials**
> "Agree on a community convention...then follow it""
![](assets/img/beer_messy_tidy.png)
## The concept of a Research Compendium
::: columns
::: {.column width="50%"}
> “ ...We introduce the **concept of a compendium** as both a **container** for the different **elements that make up the document and its computations** (i.e. text, code, data, ...), and as a **means for distributing, managing and updating the collection**."
[*Gentleman and Temple Lang, 2004*](https://biostats.bepress.com/bioconductor/paper2/)
:::
::: {.column width="50%"}
```{r, echo=FALSE}
knitr::include_graphics("assets/ResearchCompendium.jpg")
```
:::
:::
## Why Research Compendia?
[![Kartik Ram: rstudio::conf 2019 talk](assets/reproducible-data-analysis-02.png)](https://github.com/karthik/rstudio2019)
## Research Compendium Principles
[![Kartik Ram: rstudio::conf 2019 talk](assets/reproducible-data-analysis-04.png)](https://github.com/karthik/rstudio2019)
## R + Rstudio
### Next generation data science powerhouse
::: fragment
#### Backed by a diverse and active community of learners, users and developers
![](assets/logo-big.png){width="150"} ![](https://software-carpentry.org/files/2017/12/satrday-logo.png){width="150"} ![](assets/rladies-logo.png){width="150"} ![](https://forwards.github.io/images/forwards.svg){width="150"} ![](assets/ro-logo.png){width="150"}
:::
## Back to "Why are we here?"
- To show you howto use R + Rstudio to perform reproducible data analyses.
. . .
- To help you make the most of the real workhorses of your work, **YOUR CODE & DATA**!
. . .
- To help you be empowered by modern tools & technologies rather than be overwhelmed by them.
. . .
- To help you lead the culture change rather than be burdened by increased requirements.
. . .
- Ultimately, to **change how science works for better for everyone**!
- We'll do this by introducing you to **useful data and software tools and best practices**.
## Course Outline
::: columns
::: {.column width="30%"}
### Day 1
-
#### Basics
-
#### Project Management
-
#### Data Munging
:::
::: {.column width="30%"}
### Day 2
-
#### Metadata
-
#### Analysing & presenting data
:::
::: {.column width="30%"}
### Day 3
-
#### Version Control
-
#### Packaging Code
-
#### Research Compendia
:::
:::
**We'll take regular breaks and aim to break for lunch between 13:00 - 14:00 for an hour**
# Before we dive in
- We'll exploring best practice in data and workflow management. I've tried to focus on concepts and tools that I wish I knew when I started.
. . .
- We'll explore individual tools and concepts and show how they work nicely together.
. . .
- We'll be coding together and working in Posit Cloud.
. . .
- Feedback: After each day, let me know on the notepad:
- :green_book: : somethind you liked
- :red_circle: : somethind that could be improved
. . .
- Please feel free to ask questions if I use jargon you don't understand or need some clarification. Questions are helpful for everyone! :sparkles:
## Working on Zoom
- Have your mic on mute by default.
. . .
- Please try to have your camera on as much as possible.
. . .
- You can use the chat for questions but better to interrupt me.
. . .
- Please try to help each other!
. . .
- If you get lost, the materials and appendices are your friend!
. . .
- Use status reaction emojis to communicate how it's going.
. . .
## A note about Posit Cloud Projects
- Please also **do not share the link to Course Shared space on Posit Cloud**.
- Posit Cloud projects **will be deleted after a week** of the course ending. Please \[**download any work you want to keep.**\]
# Let's go!
### Get back [{{< fa home >}}](index.qmd)