✨ Add Sample Concept #683
Merged
6 commits
1106a14 ✨ Modify biospecimen_group to ingest into sample in kf (chris-s-friedman)
0ad1e1f ✨ Allow connecting biospecimen to sample (chris-s-friedman)
d6126f6 ✨ Actually ingest sample (chris-s-friedman)
d5bbddf ♻️ Make sample its own concept and allow sample or biospecimen (chris-s-friedman)
a956f02 📝 Document samples and specimens (chris-s-friedman)
3206940 📝 Document why biospecimen group is still around (chris-s-friedman)
@@ -0,0 +1,19 @@
.. _about_concepts:

==============
Concept Schema
==============

A key part of the ingest library is the intermediate model that raw data is
mapped to. This intermediate model is called the "concept schema". The concept
schema is essentially a list of column names, each following a standard that
names the concept (type) and the attribute (e.g. ``PARTICIPANT.GENDER``). Data
that has been mapped to the concept schema is later transformed into the final
schema that is used to create the tables in the target API.
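
As a rough illustration of the naming standard described above, the following sketch renames raw column headers to ``CONCEPT.ATTRIBUTE`` names and splits such a name back into its two parts. The helper functions, the raw headers, and the ``PARTICIPANT.ID`` column are assumptions made up for this example; they are not the ingest library's API.

```python
# Hypothetical illustration only: these helpers are not part of the ingest library.


def split_concept_column(column_name):
    """Split a 'CONCEPT.ATTRIBUTE' column name into (concept, attribute)."""
    concept, sep, attribute = column_name.partition(".")
    if not sep or not concept or not attribute:
        raise ValueError(f"not a CONCEPT.ATTRIBUTE name: {column_name!r}")
    return concept, attribute


def map_to_concept_columns(raw_rows, column_map):
    """Rename raw column headers to concept-schema names, dropping unmapped columns."""
    return [
        {column_map[key]: value for key, value in row.items() if key in column_map}
        for row in raw_rows
    ]


# Made-up raw data and a made-up mapping from raw headers to concept columns.
raw_rows = [{"subject": "P1", "sex": "Female", "notes": "n/a"}]
column_map = {"subject": "PARTICIPANT.ID", "sex": "PARTICIPANT.GENDER"}

mapped_rows = map_to_concept_columns(raw_rows, column_map)
print(mapped_rows)
print(split_concept_column("PARTICIPANT.GENDER"))
```

Keeping the concept and the attribute in a single column name means the mapped data can stay a flat table while still recording which concept each value belongs to.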

.. toctree::
   :maxdepth: 2

   samples_and_specimens.rst
@@ -0,0 +1,50 @@
.. _samples_and_specimens:

========================
Samples and Biospecimens
========================

Although similarly named, samples and biospecimens refer to different concepts:

* A **sample** represents a physical piece of tissue, blood, or other
  biologically distinct material taken from a patient.
* A **biospecimen** represents a portion or a part of that sample, e.g.
  an aliquot of a sample.

While samples and biospecimens are distinct concepts, they share much in
common. When the ingest library was first written, its primary target
API, the Kids First Data Service, only had a table for biospecimens. As a
result, the ingest library's architecture provides for a biospecimen to share
*all* the qualities of a sample. In fact, biospecimen is a child class of
sample!

This architecture allows the ingest library to be used against target APIs
that, like the older versions of the Kids First Data Service, only have a table
for biospecimens.

A sample has the following qualities:

* A sample may have information about itself, such as the type of tissue it
  is, the type of tumor it comes from, when the sample was collected from
  the participant, its volume, etc.
* A sample may have information about shipping, such as the date it was
  shipped and the shipment origin.


As discussed above, a biospecimen is a child class of sample, so biospecimens
may have all of the same qualities as a sample. In addition:

* A biospecimen may have information about its concentration.
* A biospecimen may have information about its analyte type (e.g. DNA vs
  RNA).
* A biospecimen may have information about the consent under which it was
  collected.
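
The parent/child relationship described above can be sketched with plain Python classes. This is only an illustration of the inheritance idea: the use of dataclasses and every attribute name below are assumptions chosen for the example, not the definitions used by the ingest library.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical attribute names; the real concept classes may look quite different.
@dataclass
class Sample:
    tissue_type: Optional[str] = None
    tumor_descriptor: Optional[str] = None
    volume: Optional[float] = None
    shipment_date: Optional[str] = None
    shipment_origin: Optional[str] = None


@dataclass
class Biospecimen(Sample):
    # Inherits every Sample attribute and adds the biospecimen-specific ones.
    concentration: Optional[float] = None
    analyte: Optional[str] = None
    consent_type: Optional[str] = None


sample_fields = {f.name for f in fields(Sample)}
biospecimen_fields = {f.name for f in fields(Biospecimen)}
biospecimen_only = biospecimen_fields - sample_fields

print(sorted(biospecimen_only))
```

Because ``Biospecimen`` subclasses ``Sample`` in this sketch, anything that can be recorded for a sample can also be recorded for a biospecimen, which mirrors the situation of a target API that only has a biospecimen table.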

Biospecimen is designed as a child class of sample to provide
backwards compatibility with older ingest packages that existed before the
sample concept.

Moving forward, it is advised to use the sample class when
extracting information that relates to the sample as a whole, and to use
biospecimen only when extracting information that is specific to the
biospecimen (such as concentration, analyte, and consent information).
@@ -40,6 +40,7 @@ described.
   :maxdepth: 1

   value_principles
   concepts/index.rst
   extract_mapping
   transform
   load
Maybe we should add a class docstring that explains `BIOSPECIMEN_GROUP` was a previously existing concept that has been used in ingest packages and that's why some fields (external_id) in the Biospecimen and/or Sample class first extract from `SAMPLE` and then secondarily extract from `BIOSPECIMEN_GROUP`.
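
For reference, the "first `SAMPLE`, then `BIOSPECIMEN_GROUP`" behaviour mentioned in this comment could be sketched roughly as below. This is not the code from the PR: the helper name and the `SAMPLE.ID` / `BIOSPECIMEN_GROUP.ID` column names are assumptions used only to show the fallback order.

```python
# Hypothetical sketch of a "prefer SAMPLE, fall back to BIOSPECIMEN_GROUP" lookup.


def first_present(row, candidate_columns):
    """Return the first non-empty value found among candidate_columns, else None."""
    for column in candidate_columns:
        value = row.get(column)
        if value is not None and value != "":
            return value
    return None


# Assumed column names, listed in priority order.
EXTERNAL_ID_SOURCES = ["SAMPLE.ID", "BIOSPECIMEN_GROUP.ID"]

new_style_row = {"SAMPLE.ID": "SA-1", "BIOSPECIMEN_GROUP.ID": "BG-1"}
old_style_row = {"BIOSPECIMEN_GROUP.ID": "BG-2"}

print(first_present(new_style_row, EXTERNAL_ID_SOURCES))
print(first_present(old_style_row, EXTERNAL_ID_SOURCES))
```

A docstring along the lines suggested above would explain why the second entry in such a priority list exists at all: older ingest packages only provide the `BIOSPECIMEN_GROUP` column.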