
Sarah comments #6

Open
wright13 opened this issue Oct 17, 2022 · 2 comments
Comments

@wright13 (Collaborator)

  • Consider adding an Rmd template for QC checks, since we're not doing that in the DRR. It could load some useful QC packages, include some basic examples, and provide an outline to give folks a good starting point
  • If we expect a lot of user contributions to this package (I think that would be great!) then we should come up with some contribution guidelines
  • DCColCheck
    • Are we planning to use these column names as a standard?
  • dp_fuzzLocation
    • Consider vectorizing this (or at least validate that coords are length 1)
  • qc_getParkPolygonIRMA
    • Consider vectorizing
  • qc_ValidateCoordinates
    • Consider vectorizing
  • TECheck
    • Remind users that this is the federal list, not the state list
    • Remind users that taxonomy changes may cause some species to be missed
    • Consider reworking this function so that it behaves like a tidyverse function (TECheck(data, species_col, ParkCode)) and returns a vector of TRUE/FALSE indicating whether each species is T/E.
  • UTMtoLL
    • I think I have a slightly better function for this, just needs a tweak to use sf instead of sp
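The tidyverse-style rework suggested for TECheck could look something like the sketch below. This is only an illustration, not the package's actual API: the lookup of the federal T&E list is stubbed out as a plain character vector (`te_list`), and the function and argument names are assumptions.

```r
# Hypothetical sketch of a tidyverse-style TECheck. Assumes the federal
# T&E list for the park is available as a character vector of scientific
# names (`te_list`); in a real implementation, ParkCode would drive that
# lookup. All names here are illustrative.
library(dplyr)

te_check <- function(data, species_col, te_list) {
  # Adds a logical column `is_te` flagging species on the federal
  # T&E list. Matching is on exact names, so taxonomy changes may
  # cause some species to be missed.
  data %>%
    mutate(is_te = {{ species_col }} %in% te_list)
}

# Illustrative usage:
# obs <- tibble(sciName = c("Canis lupus", "Sciurus niger"))
# te_check(obs, sciName, te_list = c("Canis lupus"))
```

Returning the flag as a column (rather than a bare vector) keeps the function pipe-friendly, which is usually what "behaves like a tidyverse function" implies.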
@RobLBaker (Member)

  1. Good idea. First I'd want to get a sense of what sorts of checks would be most broadly usable and how people structure their data, so we can implement this well. Is there a standard (or at least common) way data are structured and checks are run?
  2. Great idea. We should chat about those.
  3. Yes, we are hoping people will adopt Darwin Core naming conventions, although it's by no means required.
  4. Vectorizing (dp_fuzzLocation, qc_getParkPolygonIRMA, qc_ValidateCoordinates): yes, I'll put that on the list!
  5. te_check: I'll put those caveats in the documentation for the function.
  6. Cool! Do share.

@wright13 (Collaborator, Author)

  1. Lots of QC checks are going to be pretty dataset-specific, but we could start people off by loading the skimr and/or dlookr packages and running some of the basic summaries included in those packages. And maybe include some code snippets for reading data from SQL, Access, and/or AGOL. I think it could also be helpful to come up with several common categories of QC checks (e.g. missing data, outliers, nonsensical values, spatial data) and put those into a sample outline with options to organize by SOP. I don't think this is a template that we can expect to work right out of the box, but hopefully suggesting some tools and structure will lower the barrier to reproducible QC.
  2. Sweet, let's find a time. If we come up with a rough draft of contribution guidelines, we could post it in the data sci CoP for feedback. It's also a good place to get feedback on a QC template.
  3. I will try to get my utm -> lat/long code updated and shared this week. Shouldn't take long, in theory...
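The sf-based replacement for UTMtoLL mentioned above might look roughly like the sketch below. The function name, column names, and hemisphere handling are assumptions and would need to match the package's conventions; only WGS84 northern-hemisphere zones are handled here.

```r
# Minimal UTM -> lat/long conversion using sf instead of the retired
# sp/rgdal stack. Assumes WGS84 UTM coordinates in the northern
# hemisphere (EPSG 326xx, where xx is the zone number).
library(sf)

utm_to_ll <- function(df, east_col = "UTM_E", north_col = "UTM_N",
                      zone = 12) {
  # Build an sf object in the UTM CRS, then transform to EPSG:4326
  # (WGS84 lat/long).
  pts <- st_as_sf(df, coords = c(east_col, north_col),
                  crs = 32600 + zone)
  ll <- st_coordinates(st_transform(pts, crs = 4326))
  # Append decimal-degree columns to the original data frame.
  df$decimalLongitude <- ll[, "X"]
  df$decimalLatitude <- ll[, "Y"]
  df
}
```

Since this operates on whole columns at once, it also addresses the vectorization point raised for the other coordinate functions.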
