## Metadata for occurrence data
In general, the more information you provide, the more knowledge can be gained from your occurrence data.
The first user of this data is future **yourself**. The time you spend now on providing and describing your data in detail is time you will gain later, when you want to reanalyse your data or integrate it in a broader study.
The best way is to save the data in a tabular format, such as a csv file or tab-delimited txt. It is not recommended to use Excel, because it has the habit of giving you unwanted help, resulting in messed up data...
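As a minimal sketch of the advice above, the base R snippet below reads a small csv with every column kept as text, so that nothing is silently reformatted on import. The column values are made up for illustration, and the csv is inlined only to keep the example self-contained.

```r
# A tiny csv, inlined here so the example is self-contained (hypothetical values).
csv_text <- "catalogNumber,eventDate,decimalLatitude
000023,1993-01-26,-65.25
000124,2008-04-25,-64.80"

# colClasses = "character" stops R from guessing column types, so leading
# zeros and date strings are kept exactly as they were written.
occ <- read.csv(text = csv_text, colClasses = "character")

occ$catalogNumber
```

Reading everything as text first and converting columns explicitly afterwards is one possible design choice; it trades a little convenience for control over how identifiers and dates are interpreted.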
Here we start with a description of the minimal metadata that you should provide in order to analyse your data, including building a species distribution model.
Each row in your dataset should correspond to an observation. The names of the columns in the dataset should correspond to Darwin Core (DwC) terms.
You can find a full description of the Darwin Core terms in the [Darwin Core quick reference guide](https://dwc.tdwg.org/terms/). Below we provide an overview of the most important terms. If you have information that doesn't fit the terms below, check the [Darwin Core quick reference guide](https://dwc.tdwg.org/terms/) or create an issue on the EG-ABI course GitHub or the [TDWG GitHub](https://github.com/tdwg/dwc-qa).
Your metadata should allow you or anybody else to determine What, Where, When, Who and How.
The Darwin Core standard has 8 absolutely required terms: **basisOfRecord, occurrenceID, scientificName, scientificNameID, occurrenceStatus, decimalLongitude, decimalLatitude, eventDate**
Besides those we recommend a number of other ones as well. In general it is better to provide as many as you can.
**basisOfRecord**
- *type*
- *associatedReferences*
**occurrenceID**
- *institutionCode*
- *collectionCode*
- *catalogNumber*
**scientificName, scientificNameID**
- *scientificNameAuthorship*
- *identificationQualifier*
- *taxonRank*
**occurrenceStatus**
- *organismQuantity*
- *organismQuantityType*
- *sex*
- *lifeStage*
- *behavior*
- *occurrenceRemarks*
- *identifiedBy*
- *dateIdentified*
**decimalLongitude, decimalLatitude**
- *geodeticDatum*
- *coordinateUncertaintyInMeters*
**eventDate**
- *year*
- *month*
- *day*
- *eventTime*
- *Timezone*
- *samplingProtocol*
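A quick way to see whether a dataset carries the required columns listed above is to compare its column names against that list. The sketch below uses base R and a hypothetical one-row dataset; it only checks that the columns exist, not that they are filled in correctly.

```r
required_terms <- c("basisOfRecord", "occurrenceID", "scientificName",
                    "scientificNameID", "occurrenceStatus",
                    "decimalLongitude", "decimalLatitude", "eventDate")

# Hypothetical dataset that is missing two of the required columns.
occ <- data.frame(
  occurrenceID     = "PS89_FF_000023",
  basisOfRecord    = "PreservedSpecimen",
  scientificName   = "Electrona antarctica",
  occurrenceStatus = "present",
  decimalLatitude  = -65.25,
  decimalLongitude = -64.10
)

# Required terms that do not appear among the column names.
missing_terms <- setdiff(required_terms, names(occ))
missing_terms
```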
### What
#### basisOfRecord
Additional term
*[type](https://dwc.tdwg.org/terms/#dwc:type)*,
*[associatedReferences](https://dwc.tdwg.org/terms/#dwc:associatedReferences)*
[basisOfRecord](https://dwc.tdwg.org/terms/#dwc:basisOfRecord) describes the nature of the record. It has to follow a specific vocabulary. Some of the values can overlap, which is why it is useful to use *[type](https://dwc.tdwg.org/terms/#dwc:type)* as well.
- **PreservedSpecimen** A specimen that has been preserved. This can be a museum specimen, a plant on an herbarium sheet, fish in a bag in a freezer.
- **FossilSpecimen** A resource describing a fossilized specimen.
- **LivingSpecimen** A resource describing a living specimen. For instance an animal in a zoo.
- **MaterialSample** A resource describing a material sample. For instance a sample taken from an organism in the wild and put into a collection (blood sample, biopsy).
- **HumanObservation** A resource describing an observation made by one or more people. For instance an organism observed in the wild and written down in field notes, or samples that were collected during an expedition but thrown back into the sea.
- **MachineObservation** An output of a machine observation process. Any kind of automated sensor like camera traps.
- **Occurrence** A resource describing an instance of the Occurrence class. NOTE: this value is ambiguous and hence should only be used when the resource type is unknown.
*[type](https://dwc.tdwg.org/terms/#dwc:type)*
The nature or genre of the resource. Must be populated with a value from the [DCMI type vocabulary](http://dublincore.org/documents/2010/10/11/dcmi-type-vocabulary/).
Options include: Image, StillImage, MovingImage, InteractiveResource, PhysicalObject, Sound, Text, Collection, Dataset, Event, Service, Software.
*[associatedReferences](https://dwc.tdwg.org/terms/#dwc:associatedReferences)*
A list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the occurrence.
Example: If you want to add a record from literature you should use *basisOfRecord:* HumanObservation with *type:* Text in combination with *associatedReferences:* the bibliographic reference.
#### [occurrenceID](https://dwc.tdwg.org/terms/#dwc:occurrenceID)
Additional Terms
*[institutionCode](https://dwc.tdwg.org/terms/#dwc:institutionCode)*, *[collectionCode](https://dwc.tdwg.org/terms/#dwc:collectionCode)*, *[catalogNumber](https://dwc.tdwg.org/terms/#dwc:catalogNumber)*
**[occurrenceID](https://dwc.tdwg.org/terms/#dwc:occurrenceID)** is an identifier for the occurrence record and should be globally unique and persistent.
In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique. Something like the combination of institutionCode, collectionCode, and catalogNumber is a generally used format.
There are a number of other terms that you could use as well like datasetName and datasetID.
**[institutionCode](https://dwc.tdwg.org/terms/#dwc:institutionCode)**
The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record.
**[institutionID](https://dwc.tdwg.org/terms/#dwc:institutionID)**
An identifier for the institution having custody of the object(s) or information referred to in the record.
For physical specimens, the recommended best practice is to use a code and identifier from a collections registry such as the Global Registry of Biodiversity Repositories [(http://grbio.org/)](http://grbio.org/).
```{r echo = FALSE}
showtbl(tribble(~"Institution code", ~"Institution name",
"AAS", "British Antarctic Survey",
"BRM", "Alfred-Wegener-Institut für Polar- und Meeresforschung",
"MNA", "The Museo Nazionale dell'Antartide (Italian National Antarctic Museum in Genoa)",
"RBINS", "Royal Belgian Institute of Natural Sciences"))
```
**[collectionCode](https://dwc.tdwg.org/terms/#dwc:collectionCode)**
The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.
**[datasetName](https://dwc.tdwg.org/terms/#dwc:datasetName)**
The name identifying the data set from which the record was derived.
**[datasetID](https://dwc.tdwg.org/terms/#dwc:datasetID)**
An identifier for the set of data. May be a globally unique identifier or an identifier specific to a collection or institution.
**[catalogNumber](https://dwc.tdwg.org/terms/#dwc:catalogNumber)**
An identifier (preferably unique) for the record within the data set or collection.
**[recordNumber](https://dwc.tdwg.org/terms/#dwc:recordNumber)**
An identifier given to the occurrence at the time it was recorded. Often serves as a link between field notes and an occurrence record, such as a specimen collector's number.
```{r echo = FALSE}
showtbl(tribble(~occurrenceID, ~institutionCode, ~collectionCode, ~catalogNumber,
"http://arctos.database.museum/guid/MSB:Mamm:233627", "MSB", "Mamm", "233627",
"PS89_FF_000023", "PS89", "FF", "000023")) %>% scroll_box(width = "100%")
```
It is a good practice to keep your catalogue/record numbers of equal length throughout your dataset.
To keep it persistent just don't change it. If you reference another source, keep the number.
For example, if you have samples that were collected during a marine expedition, usually all sampling events get their own ID. Don't invent a new ID for the event. This will help you to find any information related to that event at a later stage.
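The advice above on building an occurrenceID from other identifiers and keeping catalogue numbers of equal length can be sketched in base R as follows. The records are hypothetical, and both the underscore separator and the six-digit width are assumptions chosen to match the second row of the example table.

```r
# Hypothetical records; the codes mirror the second row of the example table.
occ <- data.frame(
  institutionCode = "PS89",
  collectionCode  = "FF",
  catalogNumber   = c(23, 124, 1507)
)

# Pad catalogue numbers with leading zeros to a fixed width of six digits.
occ$catalogNumber <- sprintf("%06d", as.integer(occ$catalogNumber))

# Combine the three identifiers into one occurrenceID.
occ$occurrenceID <- paste(occ$institutionCode, occ$collectionCode,
                          occ$catalogNumber, sep = "_")

# Every occurrenceID should occur only once in the dataset.
stopifnot(anyDuplicated(occ$occurrenceID) == 0)

occ$occurrenceID
```

An ID built this way is only unique within your own dataset; whether it is globally unique depends on the institution and collection codes being unique themselves.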
#### [scientificName](https://dwc.tdwg.org/terms/#taxon) and [scientificNameID](https://dwc.tdwg.org/terms/#taxon)
Additional Terms
*[kingdom](https://dwc.tdwg.org/terms/#dwc:kingdom)*, *[taxonRank](https://dwc.tdwg.org/terms/#dwc:taxonRank)*, *[scientificNameAuthorship](https://dwc.tdwg.org/terms/#dwc:scientificNameAuthorship)*, *[identificationQualifier](https://dwc.tdwg.org/terms/#dwc:identificationQualifier)*
Further additional terms (see Who)
*[identifiedBy](https://dwc.tdwg.org/terms/#dwc:identifiedBy)*, *[dateIdentified](https://dwc.tdwg.org/terms/#dwc:dateIdentified)*, *[typeStatus](https://dwc.tdwg.org/terms/#dwc:typeStatus)*
[scientificName](https://dwc.tdwg.org/terms/#taxon) In the spirit of keeping things persistent, this should always contain the originally recorded scientific name, even if the name is currently a synonym (a sample that has been reidentified is of course a different case). The scientific name should always be given to the lowest taxonomic rank of which you are certain. If the identification is uncertain and the name contains qualifiers such as cf., ? or aff., the qualifier, together with the uncertain part of the name (for example "aff. risso"), should go in *[identificationQualifier](https://dwc.tdwg.org/terms/#dwc:identificationQualifier)*.
*[scientificNameAuthorship](https://dwc.tdwg.org/terms/#dwc:scientificNameAuthorship)* OBIS recommends not including authorship in scientificName, and using only scientificNameAuthorship for that purpose.
[scientificNameID](https://dwc.tdwg.org/terms/#scientificNameID)
Following the OBIS guidelines a WoRMS LSID should be added in scientificNameID. LSIDs are persistent, location-independent, resource identifiers for uniquely naming biologically significant resources. More information on LSIDs can be found at http://www.lsid.info.
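As a small illustration, a WoRMS LSID can be assembled from the AphiaID of a taxon. The sketch below assumes the usual `urn:lsid:marinespecies.org:taxname:<AphiaID>` pattern and takes the two AphiaIDs from the marinespecies.org links in the *Electrona* example table of this chapter; check the LSID shown on the WoRMS taxon page before relying on it.

```r
# AphiaIDs taken from the marinespecies.org links in the Electrona example table.
aphia_id <- c(125821L, 217697L)   # Electrona, Electrona antarctica

# Assumed WoRMS LSID pattern: a fixed prefix followed by the AphiaID.
scientificNameID <- paste0("urn:lsid:marinespecies.org:taxname:", aphia_id)
scientificNameID
```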
*[taxonRank](https://dwc.tdwg.org/terms/#dwc:taxonRank)* indicates the level of the identification. The corresponding names can be given in the terms [kingdom](https://dwc.tdwg.org/terms/#dwc:kingdom), [phylum](https://dwc.tdwg.org/terms/#dwc:phylum), [class](https://dwc.tdwg.org/terms/#dwc:class), [order](https://dwc.tdwg.org/terms/#dwc:order), [family](https://dwc.tdwg.org/terms/#dwc:family), [genus](https://dwc.tdwg.org/terms/#dwc:genus), [subgenus](https://dwc.tdwg.org/terms/#dwc:subgenus), [specificEpithet](https://dwc.tdwg.org/terms/#dwc:specificEpithet) and [infraspecificEpithet](https://dwc.tdwg.org/terms/#dwc:infraspecificEpithet). It is good practice to always provide at least the kingdom. If you can provide a WoRMS LSID, the others are not really required; otherwise it is good to add them if possible.
[identificationQualifier](https://dwc.tdwg.org/terms/#identificationQualifier)
A brief phrase or a standard term ("cf.", "aff.") to express the determiner's doubts about the identification.
The use of **confer**, meaning compare and abbreviated to *cf.*, in a scientific name means that the person using the name is saying the animal should be compared to a given species, as it might not be exactly the same species. It's a way of applying a provisional name to a species and is most frequently used when new fish are discovered that look slightly different to the form normally encountered. It indicates that the fish might be a variant of the same species, but could also turn out to be something completely different.
**Species affinis** (commonly abbreviated to: sp. aff., aff., or affin.) indicates that available material or evidence suggests that the proposed species is related to, has an affinity to, but is not identical to, the species with the binomial name that follows.
**Species** abbreviated as sp. commonly occurs when authors are confident that some individuals belong to a particular genus but are not sure to which exact species they belong.
Authors may also use "spp." (**Species pluralis**) as a short way of saying that something applies to many species within a genus, but not to all.
```{r echo = FALSE}
showtbl(tribble(~"", ~scientificName, ~scientificNameAuthorship, ~identificationQualifier, ~taxonRank,
1, "[Electrona](http://www.marinespecies.org/aphia.php?p=taxdetails&id=125821)", "Goode & Bean, 1896", "aff. risso", "genus",
2, "[Electrona antarctica](http://www.marinespecies.org/aphia.php?p=taxdetails&id=217697)", "Günther, 1878", "", "species",
3, "[Electrona](http://www.marinespecies.org/aphia.php?p=taxdetails&id=125821)", "Goode & Bean, 1896", "sp.", "genus",
4, "[Electrona](http://www.marinespecies.org/aphia.php?p=taxdetails&id=125821)", "Goode & Bean, 1896", "spp.", "genus"))
```
What do these examples mean?
- Case 1: "I collected something that looks like *Electrona risso* but I'm not sure."
- Case 2: "I'm 100% sure this is *Electrona antarctica*."
- Case 3: "I'm sure about the genus, but it could be any one of multiple species"
- Case 4: "I have this bag with fish; I'm sure about the genus but it could be a variety of species within the genus"
#### [occurrenceStatus](https://dwc.tdwg.org/terms/#dwc:occurrenceStatus)
Additional Terms
*[organismQuantity](https://dwc.tdwg.org/terms/#dwc:organismQuantity)*, *[organismQuantityType](https://dwc.tdwg.org/terms/#dwc:organismQuantityType)*, *[sex](https://dwc.tdwg.org/terms/#dwc:sex)*, *[lifeStage](https://dwc.tdwg.org/terms/#dwc:lifeStage)*, *[behavior](https://dwc.tdwg.org/terms/#dwc:behavior)*, *[occurrenceRemarks](https://dwc.tdwg.org/terms/#dwc:occurrenceRemarks)*
*occurrenceStatus* is a statement about the presence or absence of a taxon at a location. It is an important term, because it allows us to distinguish between presence and absence records. It is an OBIS required term and should be filled in with either "present" or "absent".
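A simple check for this term is to list the rows whose occurrenceStatus is neither "present" nor "absent". The sketch below uses base R and hypothetical records; it treats values as case-sensitive, which is an assumption made here for strictness.

```r
# Hypothetical records: one valid value, one with wrong capitalisation, one missing.
occ <- data.frame(
  scientificName   = c("Electrona antarctica", "Electrona antarctica", "Electrona"),
  occurrenceStatus = c("present", "Absent", NA)
)

valid_status <- c("present", "absent")

# Row numbers whose occurrenceStatus is missing or not one of the two valid values.
bad_rows <- which(!(occ$occurrenceStatus %in% valid_status))
bad_rows
```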
*[organismQuantity](https://dwc.tdwg.org/terms/#dwc:organismQuantity)*, *[organismQuantityType](https://dwc.tdwg.org/terms/#dwc:organismQuantityType)*
These two always go together. *organismQuantity* is a number for the quantity of animals and *organismQuantityType* defines the type of quantification system used for the quantity of organisms.
```{r echo = FALSE}
showtbl(tribble(~organismQuantity, ~organismQuantityType,
"27", "individuals",
"12.5", "%biomass",
"r", "BraunBlanquetScale"))
```
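Because *organismQuantity* and *organismQuantityType* always go together, it can help to look for rows where only one of the two is filled in. A minimal base R sketch with hypothetical records:

```r
# Hypothetical records; rows 2 and 3 each have only one of the two terms.
occ <- data.frame(
  organismQuantity     = c("27", "12.5", NA, "r"),
  organismQuantityType = c("individuals", NA, "individuals", "BraunBlanquetScale")
)

# A value counts as filled in when it is neither NA nor an empty string.
is_filled <- function(x) !is.na(x) & x != ""

has_quantity <- is_filled(occ$organismQuantity)
has_type     <- is_filled(occ$organismQuantityType)

# Rows where exactly one of the pair is present.
incomplete_rows <- which(has_quantity != has_type)
incomplete_rows
```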
There are a few other terms as well that we want to mention. The old term that was used was *individualCount* but that wasn't really versatile. Still in some cases it can be useful to mention it.
Please take note that OBIS recommends all quantitative measurements and sampling facts to be treated in the ExtendedMeasurementOrFact extension and not in the Darwin Core files. OBIS recommends using ExtendedMeasurementOrFact in combination with Darwin Core Event Core, which is a little more advanced.
*[sex](https://dwc.tdwg.org/terms/#dwc:sex)*
The sex of the biological individual(s) represented in the occurrence. For the OBIS-recommended vocabulary for sex see [BODC vocab : S10](http://vocab.nerc.ac.uk/collection/S10/current/)
*[lifeStage](https://dwc.tdwg.org/terms/#dwc:lifeStage)*
The age class or life stage of the biological individual(s) at the time the occurrence was recorded. The OBIS-recommended vocabulary for life stage is [BODC vocab: S11](http://vocab.nerc.ac.uk/collection/S11/current/)
*[behavior](https://dwc.tdwg.org/terms/#dwc:behavior)*
The behavior shown by the subject at the time the occurrence was recorded (no OBIS-recommended vocabulary available).
*[occurrenceRemarks](https://dwc.tdwg.org/terms/#dwc:occurrenceRemarks)* can hold any comments or notes about the occurrence.
### Where
#### [decimalLatitude](https://dwc.tdwg.org/terms/#dwc:decimalLatitude) and [decimalLongitude](https://dwc.tdwg.org/terms/#dwc:decimalLongitude)
Additional Terms
*[geodeticDatum](https://dwc.tdwg.org/terms/#dwc:geodeticDatum)*, *[coordinateUncertaintyInMeters](https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters)*, *[locality](https://dwc.tdwg.org/terms/#dwc:locality)*, *[locationID](https://dwc.tdwg.org/terms/#dwc:locationID)*, *[footprintWKT](https://dwc.tdwg.org/terms/#dwc:footprintWKT)*
To determine the location of a point on the globe you need at least the *decimalLatitude* and *decimalLongitude* and a definition of the spatial reference system that was used. The spatial reference system defines the model of the Earth's shape that was used.
The system currently used by GPS is WGS84 (also known as WGS 1984, EPSG:4326). Quite often the geodeticDatum is not provided but it is actually essential.
*decimalLatitude* defines the position (in degrees) relative to the equator and ranges from 90 (North Pole) to -90 (South Pole).
*decimalLongitude* defines the position (in degrees) relative to the Greenwich Meridian and ranges from 180 to -180.
Make sure not to switch those around. The standard notation has them as latitude followed by longitude. Some people naturally tend to write them as longitude followed by latitude, because this is conceptually similar to a Cartesian (x-y) notation. Such switches can be hard to spot, especially for samples collected in the area of the Antarctic Peninsula, where latitude and longitude have similar values.
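The ranges given above allow a basic sanity check. The base R sketch below, using hypothetical coordinates, flags values outside the valid ranges and marks records whose latitude only makes sense as a longitude. Note that rows 1 and 2 are swapped versions of each other and both pass, which is exactly the Antarctic Peninsula problem described above: a range check cannot catch those.

```r
# Hypothetical coordinates; rows 1 and 2 are the same point with lat/lon swapped.
occ <- data.frame(
  decimalLatitude  = c(-65.25, -64.10, -120.5,  95.0),
  decimalLongitude = c(-64.10, -65.25,  -45.0, 200.0)
)

lat_ok <- !is.na(occ$decimalLatitude)  & abs(occ$decimalLatitude)  <= 90
lon_ok <- !is.na(occ$decimalLongitude) & abs(occ$decimalLongitude) <= 180

# Rows with at least one coordinate outside its valid range.
invalid_rows <- which(!(lat_ok & lon_ok))

# Latitude is out of range, but swapping the two values would give a valid pair.
likely_swapped <- which(!lat_ok &
                        abs(occ$decimalLongitude) <= 90 &
                        abs(occ$decimalLatitude)  <= 180)

invalid_rows
likely_swapped
```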
*[coordinateUncertaintyInMeters](https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters)*
The horizontal distance (in meters) from the given *decimalLatitude* and *decimalLongitude* describing the smallest circle containing the whole of the location. Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates). Zero is not a valid value for this term.
This term is closely linked to the number of decimals in your latitude and longitude values.
When decimal coordinates are calculated, they are often expressed with too many digits. More than 6 decimal places rarely makes sense in biology, and usually does not represent how precise the measurement was in the field.
To illustrate this, see this XKCD comic or the table below.
<img src="https://imgs.xkcd.com/comics/coordinate_precision.png" alt="coordinate precisionXKCD" title="coordinate precision XKCD" width="400" align="center"/>
```{r echo = FALSE}
showtbl(tribble(~"decimal places", ~"decimal degrees", ~DMS, ~"Object that can be unambiguously recognized at this scale", ~"N/S or E/W at equator", ~"E/W at 67N/S",
0, "1.0", "1° 00' 0\"", "country or large region", "111.32 km", "43.496 km",
1, "0.1", "0° 06' 0\"", "large city or district", "11.132 km", "4.3496 km",
2, "0.01", "0° 00' 36\"", "town or village", "1.1132 km", "434.96 m",
3, "0.001", "0° 00' 3.6\"", "neighborhood, street", "111.32 m", "43.496 m",
4, "0.0001", "0° 00' 0.36\"", "individual street, land parcel", "11.132 m", "4.3496 m",
5, "0.00001", "0° 00' 0.036\"", "individual trees, door entrance", "1.1132 m", "434.96 mm",
6, "0.000001", "0° 00' 0.0036\"", "individual humans", "111.32 mm", "43.496 mm",
7, "0.0000001", "0° 00' 0.00036\"", "practical limit of commercial surveying", "11.132 mm", "4.3496 mm",
8, "0.00000001", "0° 00' 0.000036\"", "specialized surveying (e.g. tectonic plate mapping)", "1.1132 mm", "434.96 µm"))
```
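The distances in the table can be reproduced with a short calculation. The sketch below assumes, as the table does, that one degree spans about 111.32 km at the equator, and that east-west distances shrink with the cosine of the latitude. The function name `metres_per_step` is made up for this example.

```r
# Assumption: one degree at the equator spans about 111.32 km (the table's value).
km_per_degree <- 111.32

# Ground distance in metres represented by one unit of the last decimal place.
# The latitude argument only matters for east-west (longitude) distances.
metres_per_step <- function(decimals, latitude = 0) {
  km_per_degree * 1000 * 10^(-decimals) * cos(latitude * pi / 180)
}

metres_per_step(4)                 # about 11.13 m at the equator
metres_per_step(4, latitude = 67)  # about 4.35 m east-west at 67 degrees N/S

# Rounding a calculated coordinate to 5 decimal places (roughly metre precision).
round(-64.123456789, 5)
```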
### When
#### [eventDate](https://dwc.tdwg.org/terms/#dwc:eventDate)
Additional Terms
*[year](https://dwc.tdwg.org/terms/#dwc:year)*,
*[month](https://dwc.tdwg.org/terms/#dwc:month)*,
*[day](https://dwc.tdwg.org/terms/#dwc:day)*,
*[eventTime](https://dwc.tdwg.org/terms/#dwc:eventTime)*,
*[Timezone](https://dwc.tdwg.org/terms/#dwc:Timezone)*
**[eventDate](https://dwc.tdwg.org/terms/#dwc:eventDate)** The date and time when the occurrence/event was recorded. Can be the date-time or interval during which an event occurred. This term uses the ISO 8601 standard. OBIS recommends using the extended ISO 8601 format with hyphens.
<img src="https://imgs.xkcd.com/comics/iso_8601.png" alt="iso 8601 time XKCD" title="iso 8601 time XKCD" width="400" align="center"/>
For time intervals ISO 8601 uses / as a separator.
Date and time are separated by T.
Times can have a time zone indicator at the end; if they do not, the time is assumed to be local time.
When a time is UTC, a Z is added. Some examples of ISO 8601 dates are:
1973-02-28T15:25:00
2005-08-31T12:11+12
1993-01-26T04:39+12/1993-01-26T05:48+12
2008-04-25T09:53
1948-09-13
1993-01/02
1993-01
1993
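Strings like the ones above can be produced from R date-time objects with `format()`. The sketch below is one way to do it in base R, assuming the times were recorded in UTC; the start and end times are hypothetical, and `iso_utc` is a helper name made up for this example.

```r
# Hypothetical start and end of a sampling event, recorded in UTC.
start <- as.POSIXct("1993-01-26 04:39:00", tz = "UTC")
end   <- as.POSIXct("1993-01-26 05:48:00", tz = "UTC")

# Extended ISO 8601 format: hyphens in the date, T before the time, Z for UTC.
iso_utc <- function(x) format(x, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# A single moment and an interval (start and end separated by a slash).
single_event   <- iso_utc(start)
interval_event <- paste(iso_utc(start), iso_utc(end), sep = "/")

single_event
interval_event
```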
It is an option to split the *eventDate* into its separate components. This can be a good safeguard against software like Excel reformatting your dates and times.
*year*,
*month*,
*day*,
*eventTime*,
*Timezone*
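Splitting a date into separate year, month and day columns takes only a few lines of base R. The sketch below uses hypothetical records and assumes that eventDate holds plain dates (no times or intervals).

```r
# Hypothetical records with date-only eventDate values.
occ <- data.frame(eventDate = c("1948-09-13", "2008-04-25"))

# Parse the text as dates, then pull out each component as an integer.
d <- as.Date(occ$eventDate, format = "%Y-%m-%d")
occ$year  <- as.integer(format(d, "%Y"))
occ$month <- as.integer(format(d, "%m"))
occ$day   <- as.integer(format(d, "%d"))

occ
```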
*verbatimEventDate* You can put the original date format here. It is not strictly needed, but it can be useful in cases where you are digitising old literature records.
### Who
*[identifiedBy](https://dwc.tdwg.org/terms/#dwc:identifiedBy)*, *[dateIdentified](https://dwc.tdwg.org/terms/#dwc:dateIdentified)*
*[identifiedBy](https://dwc.tdwg.org/terms/#dwc:identifiedBy)*
A list (concatenated and separated) of names of people, groups, or organizations who assigned the Taxon to the subject. Recommended best practice is to separate the values in the list with a space, a vertical bar and a space (` | `).
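In base R, such a list can be written with `paste()` and read back with `strsplit()`. The names below are hypothetical.

```r
# Hypothetical names of the people who made the identification.
identifiers <- c("Anna Example", "Ben Sample")

# Concatenate with space, vertical bar, space.
identifiedBy <- paste(identifiers, collapse = " | ")
identifiedBy

# Split the field back into individual names; fixed = TRUE treats the
# vertical bar as a literal character rather than a regular expression.
strsplit(identifiedBy, " | ", fixed = TRUE)[[1]]
```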
*[dateIdentified](https://dwc.tdwg.org/terms/#dwc:dateIdentified)*
The date on which the subject was identified. This term is very seldom filled in, but it is an important one and it should be specific. Likewise, *identifiedBy* should name the person who did the actual identification, not the supervisor.
In case of a reidentification it is often useful to be able to contact the person who did the initial identification.
### How
*[samplingProtocol](https://dwc.tdwg.org/terms/#dwc:samplingProtocol)*
This is a tricky one: it is quite relevant, but the field often doesn't provide enough space to describe what you did. To describe a specific gear you can use [BODC vocab : L22](https://www.bodc.ac.uk/resources/vocabularies/vocabulary_search/L22/)