SCRUM-3953: load expression experiments after annotations #1723

abecerra · 2024-11-12T15:30:21Z

No description provided.

markquintontulloch

If you're having problems running the integration tests locally, pushing the code here will get them run by GitHub Actions.

markquintontulloch · 2024-11-12T23:31:46Z

src/main/java/org/alliancegenome/curation_api/jobs/executors/GeneExpressionExecutor.java

@@ -39,21 +46,47 @@ public void execLoad(BulkLoadFileHistory bulkLoadFileHistory) {

 			bulkLoadFileHistory.setCount(geneExpressionIngestFmsDTO.getData().size());


I think you need to set up separate counts for annotations and experiments here. See the GFF Transcript executor for how to do this. Otherwise your total record count will be less than the completed count as you are counting both annotations and experiments as "Records".

Added counts: annotations, experiments
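
A minimal sketch of the per-type counting discussed above. `LoadCountsSketch` and its method names are hypothetical stand-ins, not the real `BulkLoadFileHistory` API; the point is only that each total and each increment names the counter it belongs to, so annotations and experiments are never lumped together as "Records".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a load history that keeps one counter per record type.
// It is NOT the real BulkLoadFileHistory class.
final class LoadCountsSketch {
    static final class Count {
        long total;
        long completed;
        long failed;
    }

    private final Map<String, Count> counts = new LinkedHashMap<>();

    // Returns the counter for a type, creating it on first use.
    Count get(String countType) {
        return counts.computeIfAbsent(countType, key -> new Count());
    }

    void setCount(String countType, long total) {
        get(countType).total = total;
    }

    void incrementCompleted(String countType) {
        get(countType).completed++;
    }

    void incrementFailed(String countType) {
        get(countType).failed++;
    }

    public static void main(String[] args) {
        LoadCountsSketch history = new LoadCountsSketch();
        // Totals are set separately for each record type.
        history.setCount("Annotations", 3);
        history.setCount("Experiments", 2);
        // Every increment names the counter it applies to.
        history.incrementCompleted("Annotations");
        history.incrementCompleted("Experiments");
        history.incrementFailed("Experiments");
        history.counts.forEach((type, c) ->
            System.out.println(type + ": " + c.completed + " completed, " + c.failed + " failed of " + c.total));
    }
}
```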

markquintontulloch · 2024-11-12T23:34:13Z

src/main/java/org/alliancegenome/curation_api/model/entities/ExpressionExperiment.java

+@Indexed
+@Data
+@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
+@AGRCurationSchemaVersion(min = "1.7.3", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { SubmittedObject.class }, partial = true)


Pretty sure the LinkML schema for this class has been updated since v1.7.3

markquintontulloch · 2024-11-12T23:35:36Z

src/main/java/org/alliancegenome/curation_api/model/entities/ExpressionExperiment.java

+@JsonSubTypes({ @JsonSubTypes.Type(value = GeneExpressionExperiment.class, name = "GeneExpressionExperiment") })
+@Indexed
+@Data
+@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)


Need to define appropriate database indexes here

This is the mapped super class

I don't see the indexes defined on the child class either

markquintontulloch · 2024-11-12T23:36:09Z

src/main/resources/db/migration/v0.38.0.7__gene_expression_experiment.sql

@@ -0,0 +1,39 @@
+


No index or foreign key definitions in this migration

Added fkeys and indexes

...org/alliancegenome/curation_api/controllers/crud/GeneExpressionAnnotationCrudController.java

src/main/java/org/alliancegenome/curation_api/jobs/executors/GeneExpressionExecutor.java

markquintontulloch · 2024-11-12T23:48:37Z

...genome/curation_api/services/validation/dto/fms/GeneExpressionAnnotationFmsDTOValidator.java

 		ObjectResponse<GeneExpressionAnnotation> response = new ObjectResponse<>();
 		GeneExpressionAnnotation geneExpressionAnnotation;
+		String uniqueId;
+		String referenceCurie;

 		ObjectResponse<Reference> singleReferenceResponse = validateEvidence(geneExpressionFmsDTO);
 		if (singleReferenceResponse.hasErrors()) {
 			response.addErrorMessage("singleReference", singleReferenceResponse.errorMessagesString());
 			throw new ObjectValidationException(geneExpressionFmsDTO, response.errorMessagesString());


We shouldn't throw the exception here, as we want to find any other validation errors before throwing.

To check whether an annotation is already in the DB, the uniqueId (which contains the AGR reference ID) must be used. At this point, if the publication is not found, the only option is to throw the exception.
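
One possible way to reconcile the two points above, as a sketch only: fail fast when the reference is missing (nothing can be looked up without the unique ID), but collect every remaining error before throwing. The DTO, its field names, and the exception type are hypothetical and do not mirror the real validator classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ValidationSketch {
    // Hypothetical DTO; the fields are for illustration only.
    static final class AnnotationDto {
        final String referenceId;
        final String geneId;
        final String assay;

        AnnotationDto(String referenceId, String geneId, String assay) {
            this.referenceId = referenceId;
            this.geneId = geneId;
            this.assay = assay;
        }
    }

    // Hypothetical exception carrying all collected field errors.
    static final class ValidationException extends RuntimeException {
        final Map<String, String> errors;

        ValidationException(Map<String, String> errors) {
            super(errors.toString());
            this.errors = errors;
        }
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns a unique ID for the annotation, or throws with all errors found.
    static String validate(AnnotationDto dto) {
        Map<String, String> errors = new LinkedHashMap<>();

        // The unique ID depends on the reference, so a missing reference ends validation here.
        if (isBlank(dto.referenceId)) {
            errors.put("singleReference", "reference not found");
            throw new ValidationException(errors);
        }

        // The remaining checks are independent: collect them all, then throw once.
        if (isBlank(dto.geneId)) {
            errors.put("geneId", "required");
        }
        if (isBlank(dto.assay)) {
            errors.put("assay", "required");
        }
        if (!errors.isEmpty()) {
            throw new ValidationException(errors);
        }

        return dto.referenceId + "|" + dto.geneId + "|" + dto.assay;
    }
}
```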

src/main/java/org/alliancegenome/curation_api/services/GeneExpressionAnnotationService.java

src/main/java/org/alliancegenome/curation_api/services/GeneExpressionExperimentService.java

markquintontulloch · 2024-11-12T23:58:39Z

src/main/java/org/alliancegenome/curation_api/jobs/executors/GeneExpressionExecutor.java

 			if (success) {
 				runCleanup(geneExpressionAnnotationService, bulkLoadFileHistory, dataProvider.name(), annotationIdsBefore, annotationIdsLoaded, "gene expression annotation");
+				loadExperiments(bulkLoadFileHistory);


Presumably there needs to be some sort of mechanism to clean up experiments too?

I don't know a simple way to define a rule to obsolete an experiment (defined by a triple paperId, assayId, geneId)

Same as any other load I would have thought. Check experiment IDs before load, record IDs of experiments created/updated, compare lists at end.

Or maybe if an experiment has (eventually) no annotations linked. But maybe they could be kept for historical reasons?

I would have thought that you would want to deprecate (make obsolete) an experiment if all the annotations relating to that experiment had been deprecated.
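
The "compare lists at end" approach suggested above can be sketched as a set difference. The class and method names are hypothetical; this does not reproduce the project's `runCleanup` implementation.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

final class CleanupSketch {
    // IDs that existed before the load but were neither created nor updated by it
    // are the candidates for being marked obsolete.
    static Set<Long> idsToObsolete(Collection<Long> idsBefore, Collection<Long> idsLoaded) {
        Set<Long> stale = new LinkedHashSet<>(idsBefore);
        stale.removeAll(new HashSet<>(idsLoaded));
        return stale;
    }

    public static void main(String[] args) {
        Set<Long> before = new LinkedHashSet<>();
        before.add(1L);
        before.add(2L);
        before.add(3L);
        Set<Long> loaded = new LinkedHashSet<>();
        loaded.add(2L);
        loaded.add(4L);
        // Experiments 1 and 3 were not touched by this load.
        System.out.println(idsToObsolete(before, loaded));
    }
}
```

The alternative rule raised in the thread, obsoleting an experiment once all of its annotations are obsolete, would need a query over the annotation links rather than this ID comparison.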

markquintontulloch · 2024-11-14T14:31:57Z

src/main/java/org/alliancegenome/curation_api/model/entities/ExpressionExperiment.java

@@ -25,7 +25,7 @@
 @Indexed
 @Data
 @EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
-@AGRCurationSchemaVersion(min = "1.7.3", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { SubmittedObject.class }, partial = true)
+@AGRCurationSchemaVersion(min = "2.8.1", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { SubmittedObject.class }, partial = true)


Should be min=2.8.0

markquintontulloch · 2024-11-14T14:36:13Z

src/main/java/org/alliancegenome/curation_api/jobs/executors/GeneExpressionExecutor.java

-	GeneExpressionAnnotationService geneExpressionAnnotationService;
+	@Inject GeneExpressionAnnotationService geneExpressionAnnotationService;
+	@Inject GeneExpressionExperimentService geneExpressionExperimentService;
+	static final String ANNOTATIONS = "gene expression annotations";


I'd be inclined to shorten these to "Annotations" and "Experiments" to avoid overcrowding in the display of the data loads table

Although I see you're also using these for the load descriptions, where the longer name is required. So, I'd suggest the shorter version for the counts and the longer for the load description.

markquintontulloch · 2024-11-14T14:39:05Z

src/main/java/org/alliancegenome/curation_api/jobs/executors/GeneExpressionExecutor.java

+			try {
+				GeneExpressionExperiment experiment = geneExpressionExperimentService.upsert(experimentId, experiments.get(experimentId));
+				if (experiment != null) {
+					history.incrementCompleted();


You have to tell the method which count you're incrementing otherwise it will add to "Records". Same applies to all incrementXXX() methods used here.

Added experiments

markquintontulloch · 2024-11-14T14:45:12Z

src/main/java/org/alliancegenome/curation_api/model/entities/GeneExpressionExperiment.java

+@Entity
+@Data
+@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
+@AGRCurationSchemaVersion(min = "2.8.1", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { ExpressionExperiment.class }, partial = true)


Should be min = "2.3.0"

markquintontulloch · 2024-11-14T14:48:44Z

src/main/java/org/alliancegenome/curation_api/model/entities/GeneExpressionExperiment.java

+@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
+@AGRCurationSchemaVersion(min = "2.8.1", max = LinkMLSchemaConstants.LATEST_RELEASE, dependencies = { ExpressionExperiment.class }, partial = true)
+@Table(indexes = {
+	@Index(name = "on_index", columnList = "id")


id is the primary key, so it doesn't need an index defined.

markquintontulloch · 2024-11-14T17:38:49Z

src/main/resources/db/migration/v0.38.0.8__gene_expression_experiment.sql

+CREATE INDEX geneexpressionexperiment_createdby_index ON geneexpressionexperiment USING btree (createdby_id);
+CREATE INDEX geneexpressionexperiment_updatedby_index ON geneexpressionexperiment USING btree (updatedby_id);
+
+CREATE SEQUENCE geneexpressionexperiment_geneexpressionannotation_seq START WITH 1 INCREMENT BY 50 NO MINVALUE NO MAXVALUE CACHE 1;


Should just be called geneexpressionannotation_seq

That name would collide with the sequence for table geneexpressionannotation

Sorry, meant geneexpressionexperiment_seq

That name would collide with line 1

markquintontulloch · 2024-11-14T17:39:13Z

src/main/resources/db/migration/v0.38.0.8__gene_expression_experiment.sql

+    geneexpressionexperiment_id bigint NOT NULL,
+    expressionannotations_id bigint NOT NULL,
+
+	CONSTRAINT gen_exp_exp_experiment_fkey FOREIGN KEY (geneexpressionexperiment_id) REFERENCES geneexpressionexperiment (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,


Name not very descriptive - maybe geexperiment_geannotation_geexperiment_id_fk

Typical naming strategy is <table_name>_<field_name>_fk. When that's too long we should try to keep the structure of the name but with suitable abbreviations for each part.
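
A small sketch of that naming rule. The helper is hypothetical, and the join-table name used in the example is an assumption inferred from the sequence name in this migration. PostgreSQL truncates identifiers longer than 63 bytes by default, which is why the sketch abbreviates only when the full name exceeds that limit.

```java
final class FkNameSketch {
    // PostgreSQL's default maximum identifier length, in bytes.
    static final int MAX_IDENTIFIER_LENGTH = 63;

    // Builds <table_name>_<field_name>_fk; if that is too long, keeps the same
    // structure but substitutes shortForm for every occurrence of longForm.
    static String fkName(String table, String field, String longForm, String shortForm) {
        String full = table + "_" + field + "_fk";
        if (full.length() <= MAX_IDENTIFIER_LENGTH) {
            return full;
        }
        return full.replace(longForm, shortForm);
    }

    public static void main(String[] args) {
        // Assumed join-table name, for illustration only.
        String joinTable = "geneexpressionexperiment_geneexpressionannotation";
        System.out.println(fkName(joinTable, "geneexpressionexperiment_id", "geneexpression", "ge"));
        System.out.println(fkName(joinTable, "expressionannotations_id", "geneexpression", "ge"));
    }
}
```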

markquintontulloch · 2024-11-14T17:42:30Z

src/main/resources/db/migration/v0.38.0.8__gene_expression_experiment.sql

+    expressionannotations_id bigint NOT NULL,
+
+	CONSTRAINT gen_exp_exp_experiment_fkey FOREIGN KEY (geneexpressionexperiment_id) REFERENCES geneexpressionexperiment (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
+    CONSTRAINT gen_exp_exp_annotation_fkey FOREIGN KEY (expressionannotations_id)  REFERENCES geneexpressionannotation (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION


geexperiment_geannotation_expressionannotations_id_fk

markquintontulloch requested changes Nov 13, 2024

abecerra added 3 commits November 14, 2024 09:17

SCRUM-3953: load expression experiments afer annotations

cff10f7

SCRUM-3953: migration number

1bcf5ac

SCRUM-3953: fix migration and remove logging

812385b

abecerra force-pushed the SCRUM-3953 branch from 6ff9944 to 812385b Compare November 14, 2024 14:06

SCRUM-3953: linkml version

1b78723

markquintontulloch reviewed Nov 14, 2024

SCRUM-3953: change list to set

e0c8981

markquintontulloch reviewed Nov 14, 2024

abecerra added 3 commits November 15, 2024 11:53

Merge branch 'alpha' into SCRUM-3953

d7f1856

SCRUM-3953: add @Index annotations

d749d99

SCRUM-3953: rename constraints

12184f2
