Feature/mvp #576

Draft: wants to merge 21 commits into base: main

Commits
95483cf
added new param target_site to datacheck to filter metakeys
vinay-ebi Jan 11, 2024
bc32e3a
Update lib/Bio/EnsEMBL/DataCheck/Checks/ControlledMetaKeys.pm
vinay-ebi Jan 12, 2024
4aac929
fix PR changes
vinay-ebi Jan 12, 2024
e92e37b
Update DbDataChecks_conf.pm
marcoooo Feb 6, 2024
956a22d
added new param target_site to datacheck to filter metakeys
vinay-ebi Jan 11, 2024
902c42a
Update lib/Bio/EnsEMBL/DataCheck/Checks/ControlledMetaKeys.pm
vinay-ebi Jan 12, 2024
3cf83e7
fix PR changes
vinay-ebi Jan 12, 2024
2954a72
Update DbDataChecks_conf.pm
marcoooo Feb 6, 2024
540c34e
Merge branch 'feature/mvp' into feature/mvp-rebase-main
marcoooo Feb 6, 2024
603c24b
add target_site param to DbDataChecks_conf
vinay-ebi Feb 7, 2024
087f326
add target_site params to dbsubmission module
vinay-ebi Feb 7, 2024
6310788
add target_site params to dbsubmission module
vinay-ebi Feb 7, 2024
0799ab2
add support for GCF in beta/rapid
JAlvarezJarreta Feb 8, 2024
a4ee179
Merge pull request #572 from JAlvarezJarreta/jalvarez/support_gcf
marcoooo Feb 8, 2024
7f85dbd
Merge branch 'feature/mvp' into feature/mvp-rebase-main
marcoooo Feb 8, 2024
7710e50
accept GCF for accession and add alt_accession meta key
JAlvarezJarreta Feb 8, 2024
2c79241
allow GCFs as well
JAlvarezJarreta Feb 8, 2024
96fc44d
Merge pull request #571 from Ensembl/feature/mvp-rebase-main
marcoooo Feb 8, 2024
cc0f63c
Merge pull request #574 from JAlvarezJarreta/jalvarez/support_gcf
marcoooo Feb 8, 2024
d53d027
Skip checking the meta table for blanks
leannehaggerty Feb 23, 2024
f879608
Merge pull request #575 from Ensembl/update/allow_blanks_core_meta
marcoooo Feb 23, 2024
13 changes: 7 additions & 6 deletions lib/Bio/EnsEMBL/DataCheck/Checks/BlankNulls.pm
@@ -57,13 +57,14 @@ sub tests {

   foreach my $nullable (@$nullables) {
     my ($table, $column) = @$nullable;
-
-    my $desc = "Nullable column $table.$column has no '' or 'NULL' string values";
-    my $sql = qq/
-      SELECT COUNT(*) FROM $table
-      WHERE $column = '' OR $column = 'NULL'
-    /;
+    if ($table ne "meta"){
+      my $desc = "Nullable column $table.$column has no '' or 'NULL' string values";
+      my $sql = qq/
+        SELECT COUNT(*) FROM $table
+        WHERE $column = '' OR $column = 'NULL'
+      /;
     is_rows_zero($self->dba, $sql, $desc);
+    }
   }
 }

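The skip added in this hunk could also be written as a guard clause, which avoids re-indenting the loop body. The following is only a sketch under assumptions: `blank_null_queries`, the `$skip` hash, and the sample table and column names are hypothetical and not part of the PR, and the sketch collects the queries rather than calling `is_rows_zero`.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the PR): returns one [description, SQL] pair
# per nullable column, skipping every table named in %$skip.
sub blank_null_queries {
  my ($nullables, $skip) = @_;
  my @queries;

  foreach my $nullable (@$nullables) {
    my ($table, $column) = @$nullable;
    next if $skip->{$table};    # guard clause instead of a nested if-block

    my $desc = "Nullable column $table.$column has no '' or 'NULL' string values";
    my $sql  = "SELECT COUNT(*) FROM $table WHERE $column = '' OR $column = 'NULL'";
    push @queries, [$desc, $sql];
  }
  return \@queries;
}

# Illustrative input only; the real list comes from the schema.
my $queries = blank_null_queries(
  [ ['meta', 'meta_value'], ['gene', 'description'] ],
  { meta => 1 },
);
print "$_->[0]\n" for @$queries;
```

Keeping the skipped tables in a hash would make it cheap to exempt further tables later, if that ever becomes necessary.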
11 changes: 8 additions & 3 deletions lib/Bio/EnsEMBL/DataCheck/Checks/ControlledMetaKeys.pm
@@ -36,22 +36,27 @@ use constant {

 sub tests {
   my ($self) = @_;

   my $species_id = $self->dba->species_id;
   my $group = $self->dba->group;

   my $sql = qq/
     SELECT meta_key, COUNT(*) FROM meta
     WHERE species_id = $species_id OR species_id IS NULL
     GROUP BY meta_key
   /;

   my $helper = $self->dba->dbc->sql_helper;
   my %meta_keys = %{ $helper->execute_into_hash(-SQL => $sql) };

+  #check target site is main / new and select mandatory metakeys
+  my $filter_metakeys = '';
+  if (defined $self->target_site){
+    $filter_metakeys = " AND target_site like '\%".$self->target_site."\%' ";
+  }
+
   my $prod_sql = qq/
     SELECT name, is_optional
     FROM meta_key
-    WHERE FIND_IN_SET('$group', db_type) AND is_current = 1
+    WHERE FIND_IN_SET('$group', db_type) AND is_current = 1 $filter_metakeys
   /;
   my $prod_dba = $self->get_dba('multi', 'production');
   my $prod_helper = $prod_dba->dbc->sql_helper;

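Review note: the new filter interpolates `$self->target_site` directly into the SQL string, so a value containing a quote would break the query. One possible alternative is to pass the values as bind parameters. The sketch below only builds the statement and its bind list; `build_meta_key_query` is a hypothetical name, and whether the SQL helper used here accepts bind values for this call is an assumption to verify before adopting it.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the PR): returns the SQL text plus an array
# reference of bind values. Neither $group nor $target_site is interpolated.
sub build_meta_key_query {
  my ($group, $target_site) = @_;

  my $sql = q/
    SELECT name, is_optional
    FROM meta_key
    WHERE FIND_IN_SET(?, db_type) AND is_current = 1
  /;
  my @params = ($group);

  # Same substring match as the PR, expressed as a LIKE placeholder.
  if (defined $target_site) {
    $sql .= q/ AND target_site LIKE ?/;
    push @params, '%' . $target_site . '%';
  }
  return ($sql, \@params);
}

my ($sql, $params) = build_meta_key_query('core', 'new');
print "$sql\n";
print join(', ', @$params), "\n";
```

With plain DBI the pair would be used as `$dbh->prepare($sql)` followed by `$sth->execute(@$params)`.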
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/DisplayNameFormat.pm
@@ -40,7 +40,7 @@ sub tests {
   my $mca = $self->dba->get_adaptor("MetaContainer");

   # Check that the format of the display name conforms to expectations.
-  my $format = '[A-Za-z0-9\ ]+ \([A-Za-z0-9\(\)\/\-\_,\#\. ]+\) \- GCA_\d+\.\d+(?:\s\[[\w ]+\])?';
+  my $format = '[A-Za-z0-9\ ]+ \([A-Za-z0-9\(\)\/\-\_,\#\. ]+\) \- GC[AF]_\d+\.\d+(?:\s\[[\w ]+\])?';

   my $desc = "Display name has correct format";
   my $display_name = $mca->single_value_by_key('species.display_name');

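To see what the widened pattern accepts, it can be exercised outside the datacheck. This is a small sketch, assuming the pattern is applied to the whole string (the anchoring is not visible in this hunk); `display_name_ok` and the sample display names are illustrative only.

```perl
use strict;
use warnings;

# Pattern copied from the "+" line of the hunk above.
my $format = '[A-Za-z0-9\ ]+ \([A-Za-z0-9\(\)\/\-\_,\#\. ]+\) \- GC[AF]_\d+\.\d+(?:\s\[[\w ]+\])?';

# Hypothetical helper; whole-string anchoring is an assumption.
sub display_name_ok {
  my ($name) = @_;
  return $name =~ /\A(?:$format)\z/ ? 1 : 0;
}

for my $name (
  'Homo sapiens (Human) - GCA_000001405.29',    # GenBank-style prefix
  'Mus musculus (Mouse) - GCF_000001635.27',    # RefSeq-style prefix, newly accepted
  'Homo sapiens (Human) - GCX_000001405.29',    # any other prefix is still rejected
) {
  printf "%-45s %s\n", $name, display_name_ok($name) ? 'ok' : 'not ok';
}
```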
3 changes: 2 additions & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/MetaKeyFormat.pm
@@ -44,7 +44,8 @@ sub tests {
   my %formats = (
     'annotation.provider_url' => '(https?:\/\/.+|www.*\.ensembl\.org)',
     'assembly.provider_url' => '(https?:\/\/.+|www.*\.ensembl\.org)',
-    'assembly.accession' => 'GCA_\d+\.\d+',
+    'assembly.accession' => 'GC[AF]_\d+\.\d+',
+    'assembly.alt_accession' => 'GCA_\d+\.\d+',
     'assembly.date' => '\d{4}-\d{2}',
     'assembly.default' => '[\w\.\-]+',
     'genebuild.id' => '\d+',

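After this change the two accession keys are no longer symmetric: `assembly.accession` accepts either prefix, while `assembly.alt_accession` accepts only `GCA_`. A table-driven sketch of that behaviour follows; `meta_value_ok` is a hypothetical helper, and matching against the whole value is an assumption about how the datacheck applies these patterns.

```perl
use strict;
use warnings;

# The two accession entries from the hunk above.
my %formats = (
  'assembly.accession'     => 'GC[AF]_\d+\.\d+',
  'assembly.alt_accession' => 'GCA_\d+\.\d+',
);

# Hypothetical helper; whole-value anchoring is an assumption.
sub meta_value_ok {
  my ($key, $value) = @_;
  my $format = $formats{$key};
  return 1 unless defined $format;    # keys without a format are not checked here
  return $value =~ /\A(?:$format)\z/ ? 1 : 0;
}

for my $pair (
  [ 'assembly.accession',     'GCF_000001635.27' ],
  [ 'assembly.alt_accession', 'GCF_000001635.27' ],
  [ 'assembly.alt_accession', 'GCA_000001635.9'  ],
) {
  my ($key, $value) = @$pair;
  printf "%-24s %-18s %s\n", $key, $value, meta_value_ok($key, $value) ? 'ok' : 'not ok';
}
```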
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/SpeciesTaxonomy.pm
@@ -49,7 +49,7 @@ sub tests {
   # scientific name, to disambiguate in the case of multiple strains
   # or assemblies of the same species. Since the taxonomy database does
   # not always have that information, remove it before comparing.
-  $sci_name =~ s/ \(GCA_\d+\)//;
+  $sci_name =~ s/ \(GC[AF]_\d+\)//;
   $sci_name =~ s/ (str\.|strain) .*//;

   my $desc_1 = 'Species-related meta data exists';

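The two substitutions can be tried in isolation to check what they strip. In this sketch `strip_disambiguation` is a hypothetical wrapper and the names are illustrative inputs. Note that the accession pattern as written has no `\.\d+` part, so a versioned accession in parentheses is left untouched.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions in the hunk above.
sub strip_disambiguation {
  my ($sci_name) = @_;
  $sci_name =~ s/ \(GC[AF]_\d+\)//;       # trailing " (GCA_...)" or " (GCF_...)"
  $sci_name =~ s/ (str\.|strain) .*//;    # strain information and everything after it
  return $sci_name;
}

print strip_disambiguation('Sus scrofa (GCF_000003025)'), "\n";
print strip_disambiguation('Escherichia coli str. K-12 substr. MG1655'), "\n";
print strip_disambiguation('Sus scrofa (GCA_000003025.6)'), "\n";    # unchanged: version suffix present
```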
9 changes: 9 additions & 0 deletions lib/Bio/EnsEMBL/DataCheck/DbCheck.pm
@@ -57,6 +57,15 @@ subtype 'Registry', as 'Str', where {

 =head1 METHODS

+=head2 target_site
+  Description: Fetch mandatory meta keys based on target site
+               current values are main/new
+=cut
+has 'target_site' => (
+  is => 'ro',
+  isa => 'Str | Undef',
+);
+
 =head2 db_types
   Description: Database types for which this datacheck is appropriate.
 =cut

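The POD says the current values are main/new, but the attribute accepts any string. If stricter checking were wanted, the value could be validated at construction time. The sketch below uses a plain Perl class rather than Moose so that it runs without extra modules; the package name, the constructor, and the allowed list are assumptions for illustration and are not what the PR implements.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the datacheck class (plain Perl, not Moose).
package Hypothetical::DbCheck {
  my @allowed = qw(main new);    # taken from the POD; assumed to be exhaustive

  sub new {
    my ($class, %args) = @_;
    my $site = $args{target_site};

    # undef stays legal, mirroring the 'Str | Undef' constraint in the PR.
    if (defined $site && !grep { $_ eq $site } @allowed) {
      die "target_site must be one of: @allowed\n";
    }
    return bless { target_site => $site }, $class;
  }

  # Read-only accessor, like is => 'ro'.
  sub target_site { return $_[0]{target_site} }
}

package main;

my $check = Hypothetical::DbCheck->new(target_site => 'new');
print $check->target_site, "\n";
```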
1 change: 1 addition & 0 deletions lib/Bio/EnsEMBL/DataCheck/Pipeline/DataCheckSubmission.pm
@@ -122,6 +122,7 @@ sub write_output {
     json_by_species => $self->param('json_by_species'),

     submission_job_id => $self->input_job->dbID,
+    target_site => $self->param('target_site'),
   };
   $self->dataflow_output_id($params, 1);
