
Sarah comments #6

Open
wright13 opened this issue Oct 17, 2022 · 2 comments
Comments

@wright13 (Collaborator)

  • Consider adding an Rmd template for QC checks, since we're not doing that in the DRR. It could load some useful QC packages, include some basic examples, and provide an outline to give folks a good starting point
  • If we expect a lot of user contributions to this package (I think that would be great!) then we should come up with some contribution guidelines
  • DCColCheck
    • Are we planning to use these column names as a standard?
  • dp_fuzzLocation
    • Consider vectorizing this (or at least validate that coords are length 1)
  • qc_getParkPolygonIRMA
    • Consider vectorizing
  • qc_ValidateCoordinates
    • Consider vectorizing
  • TECheck
    • Remind users that this is the federal list, not the state list
    • Remind users that taxonomy changes may cause some species to be missed
    • Consider reworking this function so that it behaves like a tidyverse function (TECheck(data, species_col, ParkCode)) and returns a vector of TRUE/FALSE indicating whether each species is T/E.
  • UTMtoLL
    • I think I have a slightly better function for this, just needs a tweak to use sf instead of sp
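The tidyverse-style rework suggested for TECheck could look something like the sketch below. This is only an illustration, not the package's actual API: the lookup of the federal T&E list is stubbed out as a plain character vector (`te_list`), and the function and argument names are assumptions.

```r
# Hypothetical sketch of a tidyverse-style TECheck. Assumes the federal
# T&E list for the park is available as a character vector of scientific
# names (`te_list`); in a real implementation, ParkCode would drive that
# lookup. All names here are illustrative.
library(dplyr)

te_check <- function(data, species_col, te_list) {
  # Adds a logical column `is_te` flagging species on the federal
  # T&E list. Matching is on exact names, so taxonomy changes may
  # cause some species to be missed.
  data %>%
    mutate(is_te = {{ species_col }} %in% te_list)
}

# Illustrative usage:
# obs <- tibble(sciName = c("Canis lupus", "Sciurus niger"))
# te_check(obs, sciName, te_list = c("Canis lupus"))
```

Returning the flag as a column (rather than a bare vector) keeps the function pipe-friendly, which is usually what "behaves like a tidyverse function" implies.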
@RobLBaker (Member)

  1. Good idea. First I'd want to get a sense of what sorts of checks would be most broadly usable and how people structure their data, so we can implement this well. Is there a standard (or at least common) way data are structured and checks are run?
  2. Great idea. We should chat about those.
  3. Yes, we are hoping people will adopt Darwin Core naming conventions, although it's by no means required.
  4. Vectorizing (dp_fuzzLocation, qc_getParkPolygonIRMA, qc_ValidateCoordinates): yes, I'll put that on the list!
  5. te_check: I'll put those caveats in the documentation for the function.
  6. Cool! Do share.

@wright13 (Collaborator, Author)

  1. Lots of QC checks are going to be pretty dataset-specific, but we could start people off by loading the skimr and/or dlookr packages and running some of the basic summaries included in those packages. And maybe include some code snippets for reading data from SQL, Access, and/or AGOL. I think it could also be helpful to come up with several common categories of QC checks (e.g. missing data, outliers, nonsensical values, spatial data) and put those into a sample outline with options to organize by SOP. I don't think this is a template that we can expect to work right out of the box, but hopefully suggesting some tools and structure will lower the barrier to reproducible QC.
  2. Sweet, let's find a time. If we come up with a rough draft of contribution guidelines, we could post it in the data sci CoP for feedback. It's also a good place to get feedback on a QC template.
  3. I will try to get my utm -> lat/long code updated and shared this week. Shouldn't take long, in theory...
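The sf-based replacement for UTMtoLL mentioned above might look roughly like the sketch below. The function name, column names, and hemisphere handling are assumptions and would need to match the package's conventions; only WGS84 northern-hemisphere zones are handled here.

```r
# Minimal UTM -> lat/long conversion using sf instead of the retired
# sp/rgdal stack. Assumes WGS84 UTM coordinates in the northern
# hemisphere (EPSG 326xx, where xx is the zone number).
library(sf)

utm_to_ll <- function(df, east_col = "UTM_E", north_col = "UTM_N",
                      zone = 12) {
  # Build an sf object in the UTM CRS, then transform to EPSG:4326
  # (WGS84 lat/long).
  pts <- st_as_sf(df, coords = c(east_col, north_col),
                  crs = 32600 + zone)
  ll <- st_coordinates(st_transform(pts, crs = 4326))
  # Append decimal-degree columns to the original data frame.
  df$decimalLongitude <- ll[, "X"]
  df$decimalLatitude <- ll[, "Y"]
  df
}
```

Since this operates on whole columns at once, it also addresses the vectorization point raised for the other coordinate functions.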
