Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCRUM-3953: load expression experiments afer annotations #1723

Open
wants to merge 8 commits into
base: alpha
Choose a base branch
from

Conversation

abecerra
Copy link
Contributor

@abecerra abecerra commented Nov 12, 2024

No description provided.

Copy link
Member

@markquintontulloch markquintontulloch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're having problems running integration test locally then pushing the code here will result in them being run by Github actions.

@@ -39,21 +46,47 @@ public void execLoad(BulkLoadFileHistory bulkLoadFileHistory) {

bulkLoadFileHistory.setCount(geneExpressionIngestFmsDTO.getData().size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to set up separate counts for annotations and experiments here. See the GFF Transcript executor for how to do this. Otherwise your total record count will be less than the completed count as you are counting both annotations and experiments as "Records".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added counts: annotations, experiments

@Indexed
@Data
@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
@AGRCurationSchemaVersion(min = "1.7.3", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { SubmittedObject.class }, partial = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure the LinkML schema for this class has been updated since v1.7.3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@JsonSubTypes({ @JsonSubTypes.Type(value = GeneExpressionExperiment.class, name = "GeneExpressionExperiment") })
@Indexed
@Data
@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to define appropriate database indexes here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the mapped super class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the indexes defined on the child class either

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@@ -0,0 +1,39 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No index or foreign key definitions in this migration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added fkeys and indexes

ObjectResponse<GeneExpressionAnnotation> response = new ObjectResponse<>();
GeneExpressionAnnotation geneExpressionAnnotation;
String uniqueId;
String referenceCurie;

ObjectResponse<Reference> singleReferenceResponse = validateEvidence(geneExpressionFmsDTO);
if (singleReferenceResponse.hasErrors()) {
response.addErrorMessage("singleReference", singleReferenceResponse.errorMessagesString());
throw new ObjectValidationException(geneExpressionFmsDTO, response.errorMessagesString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be throwing the exception here as want to find any other validation errors before throwing exception

Copy link
Contributor Author

@abecerra abecerra Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check if an annotation its already in the DB the uniqueId (containing AGR reference ID) must be used. At this point, if the publication is not found, the only option is to throw the exception.

if (success) {
runCleanup(geneExpressionAnnotationService, bulkLoadFileHistory, dataProvider.name(), annotationIdsBefore, annotationIdsLoaded, "gene expression annotation");
loadExperiments(bulkLoadFileHistory);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably there needs to be some sort of mechanism to clean up experiments too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know a simple way to define a rule to obsolete an experiment (defined by a triple paperId, assayId, geneId)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as any other load I would have thought. Check experiment IDs before load, record IDs of experiments created/updated, compare lists at end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe if an experiment has (eventually) no annotations linked. But maybe they could be kept for historical reasons?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have thought that you would want to deprecate (make obsolete) an experiment if all the annotations relating to that experiment had been deprecated.

@@ -25,7 +25,7 @@
@Indexed
@Data
@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
@AGRCurationSchemaVersion(min = "1.7.3", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { SubmittedObject.class }, partial = true)
@AGRCurationSchemaVersion(min = "2.8.1", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { SubmittedObject.class }, partial = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be min=2.8.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

GeneExpressionAnnotationService geneExpressionAnnotationService;
@Inject GeneExpressionAnnotationService geneExpressionAnnotationService;
@Inject GeneExpressionExperimentService geneExpressionExperimentService;
static final String ANNOTATIONS = "gene expression annotations";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be inclined to shorten these to "Annotations" and "Experiments" to avoid overcrowding in the display of the data loads table

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I see you're also using these for the load descriptions, where the longer name is required. So, I'd suggest the shorter version for the counts and the longer for the load description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

try {
GeneExpressionExperiment experiment = geneExpressionExperimentService.upsert(experimentId, experiments.get(experimentId));
if (experiment != null) {
history.incrementCompleted();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to tell the method which count you're incrementing otherwise it will add to "Records". Same applies to all incrementXXX() methods used here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added experiments

@Entity
@Data
@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
@AGRCurationSchemaVersion(min = "2.8.1", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { ExpressionExperiment.class }, partial = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be min = "2.3.0"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
@AGRCurationSchemaVersion(min = "2.8.1", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { ExpressionExperiment.class }, partial = true)
@Table(indexes = {
@Index(name = "on_index", columnList = "id")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id is the primary key so doesn't need an index defining

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

CREATE INDEX geneexpressionexperiment_createdby_index ON geneexpressionexperiment USING btree (createdby_id);
CREATE INDEX geneexpressionexperiment_updatedby_index ON geneexpressionexperiment USING btree (updatedby_id);

CREATE SEQUENCE geneexpressionexperiment_geneexpressionannotation_seq START WITH 1 INCREMENT BY 50 NO MINVALUE NO MAXVALUE CACHE 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should just be called geneexpressionannotation_seq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That name would collide with the sequence for table geneexpressionannotation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, meant geneexpressionexperiment_seq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That name would collide with line 1

geneexpressionexperiment_id bigint NOT NULL,
expressionannotations_id bigint NOT NULL,

CONSTRAINT gen_exp_exp_experiment_fkey FOREIGN KEY (geneexpressionexperiment_id) REFERENCES geneexpressionexperiment (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
Copy link
Member

@markquintontulloch markquintontulloch Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name not very descriptive - maybe geexperiment_geannotation_geexperiment_id_fk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typical naming stragey is <table_name>_<field_name>_fk. When that's too long we should try to keep the structure of the name but with suitable abbreviations for each part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

expressionannotations_id bigint NOT NULL,

CONSTRAINT gen_exp_exp_experiment_fkey FOREIGN KEY (geneexpressionexperiment_id) REFERENCES geneexpressionexperiment (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT gen_exp_exp_annotation_fkey FOREIGN KEY (expressionannotations_id) REFERENCES geneexpressionannotation (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geexperiment_geannotation_expressionannotations_id_fk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants