diff --git a/config/gdi-starter-kit/config/load_data.sh b/config/gdi-starter-kit/config/load_data.sh
new file mode 100644
index 0000000..23b63b5
--- /dev/null
+++ b/config/gdi-starter-kit/config/load_data.sh
@@ -0,0 +1,32 @@
+set -e
+apk -q --no-cache add curl jq
+
+DIR="/data/EGAD00001008392/region_vcfs/"
+OUTPATH="region_vcfs"
+# Keep datasetid and uploader unchanged to match dummy visas from demo oidc
+DATASETID="DATASET0001"
+UPLOADER="dummy_gdi.eu"
+export SDA_CLI=/sda-cli
+export SDA_KEY=/shared/c4gh.pub.pem
+export MQ_URL=http://rabbitmq:15672
+export SDA_CONFIG=/shared/s3cfg
+
+mkdir -p $OUTPATH
+# make sure no unwanted data is uploaded
+rm -rf $OUTPATH/*
+cp -r $DIR/* $OUTPATH
+
+echo "uploading files..."
+./sda-admin --user $UPLOADER upload $OUTPATH
+
+echo "ingesting files..."
+./sda-admin --user $UPLOADER ingest $OUTPATH/
+sleep 5
+echo "accession..."
+./sda-admin --user $UPLOADER accession 'FILE_%02d' 1 $OUTPATH/
+sleep 2
+echo "linking to dataset..."
+./sda-admin --user $UPLOADER dataset $DATASETID $OUTPATH/
+sleep 2
+
+rm -rf $OUTPATH
diff --git a/config/gdi-starter-kit/config/sda-cli b/config/gdi-starter-kit/config/sda-cli
new file mode 100755
index 0000000..d0d5201
Binary files /dev/null and b/config/gdi-starter-kit/config/sda-cli differ
diff --git a/config/gdi-starter-kit/docker-compose.override.yml b/config/gdi-starter-kit/docker-compose.override.yml
index 79123e9..08a211b 100644
--- a/config/gdi-starter-kit/docker-compose.override.yml
+++ b/config/gdi-starter-kit/docker-compose.override.yml
@@ -68,6 +68,13 @@ services:
       - ingest
       - verify
     restart: no
+    volumes: !override
+      - ./config/load_data.sh:/load_data.sh
+      - ../../data/:/data/
+      - ./starter-kit-storage-and-interfaces/scripts/sda-admin:/sda-admin
+      - ./config/sda-cli:/sda-cli
+      - shared:/shared
+      - cacert:/cacert
 
   finalize:
     restart: no
diff --git a/docs/htsget.md b/docs/htsget.md
index c4d0de5..46737c4 100644
--- a/docs/htsget.md
+++ b/docs/htsget.md
@@ -30,55 +30,57 @@ pubkey=$(base64 -w0
keys/c4gh.pub.pem)
 ```
 For the decrypted case, the header `Client-Public-Key` below can be left out.
 
-Now you should be able make the requests to the htsget server. To request the (byte range of) chromosome 11 of the encrypted file `htsnexus_test_NA12878.bam` run:
+Now you should be able to make requests to the htsget server. To request the (byte range of) chromosome 19 of the file `Case7_IC.reg.vcf` run:
+
 ```sh
- curl -v -H "Client-Public-Key: $pubkey" -H "Authorization: Bearer $token" -H -k "http://localhost:8088/reads/DATASET0001/htsnexus_test_NA12878?referenceName=11"
+ curl -v -H "Client-Public-Key: $pubkey" -H "Authorization: Bearer $token" -k "http://localhost:8088/variants/DATASET0001/region_vcfs/Case7_IC.reg?referenceName=19&start=39955351"
 ```
+
 The request will return a ticket of how to download the requested partial file:
 ```sh
-{
+ {
   "htsget": {
-    "format": "BAM",
+    "format": "VCF",
     "urls": [
       {
         "url": "data:;base64,Y3J5cHQ0Z2gBAAAAAgAAAA=="
       },
       {
-        "url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
+        "url": "http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh",
         "headers": {
           "Range": "bytes=16-123",
           ...
         }
       },
       {
-        "url": "data:;base64,ZAAAAAAAAACxHxjMhagEVY+4bVEZYuqYGK5Ph3jrffrMhXpc3wYWenp2ofohEUwSBOuZF3kH6TEiQsjSPGaE1bvdMQ2uUuuHLWicplUneE77G079sTW8rJIJJ1VgZecPi9cTfQ=="
+        "url": "data:;base64,bAAAAAAAAAAOeAPEBUbfTcA0rho6fMu3D47GsC31V7Vd88JS4Wr2cvHhRpFHyQ20CE1+iIuMog/y8CtkrMLdEGIvzjUtuBj7K+/ZUcZS9FkSLYeMGQLUnqCmNL9DYXGUW7SGvbVSd/YU0V16"
       },
       {
-        "url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
+        "url": "http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh",
         "headers": {
-          "Range": "bytes=124-1049147",
+          "Range": "bytes=124-65687",
           ...
         }
       },
       {
-        "url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
+        "url": "http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh",
         "headers": {
-          "Range": "bytes=2557120-2598042",
-          "accept": "*/*",
+          "Range": "bytes=131252-151945",
           ...
-          }
+        }
+      }
     ]
   }
 }
 ```
-This repsonse contains byte ranges (eg. `"Range": "bytes=124-1049147"`) as parts of url requests.
-This should point you to doing requests to `http://localhost:8443/s3-encrypted` (`sda-download`, from `storage-and-interfaces`) that gets you data for chromosome 11 of the file:
+This response contains byte ranges (eg. `"Range": "bytes=124-65687"`) as parts of URL requests.
+These ranges point to requests against `http://localhost:8443/s3-encrypted` (`sda-download`, from `storage-and-interfaces`) that return the data for chromosome 19 of the file:
 ```sh
-curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=16-123" -o p11-00.bam.c4gh
-curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=124-1049147" -o p11-01.bam.c4gh
-curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=2557120-2598042" -o p11-02.bam.c4gh
+curl 'http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=16-123" -o p19-00.vcf.gz.c4gh
+curl 'http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=124-65687" -o p19-01.vcf.gz.c4gh
+curl 'http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh' -H
"Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=131252-151945" -o p19-02.vcf.gz.c4gh ``` The response from hstget also lists two data sections: @@ -87,22 +89,14 @@ The response from hstget also lists two data sections: ``` and ```sh -"url": "data:;base64,ZAAAAAAAAACxHxjMhagEVY+4bVEZYuqYGK5Ph3jrffrMhXpc3wYWenp2ofohEUwSBOuZF3kH6TEiQsjSPGaE1bvdMQ2uUuuHLWicplUneE77G079sTW8rJIJJ1VgZecPi9cTfQ== +"url": "data:;base64,bAAAAAAAAAAOeAPEBUbfTcA0rho6fMu3D47GsC31V7Vd88JS4Wr2cvHhRpFHyQ20CE1+iIuMog/y8CtkrMLdEGIvzjUtuBj7K+/ZUcZS9FkSLYeMGQLUnqCmNL9DYXGUW7SGvbVSd/YU0V16" ``` These segments are part of the requested data. Save the data (eg. `Y3J5cHQ0Z2gBAAAAAgAAAA==`) to files, `start.b64` and `mid.b64`, respectively. Then concatenate all segments: ```sh -{ htsnexus_11.bam.c4gh +{ case7.vcf.gz.c4gh ``` Make sure that the file can be decrypted with your private key: ```sh -crypt4gh decrypt -s keys/c4gh.sec.pem -f htsnexus_11.bam.c4gh -``` - -Finally, check that samtools can open the new file: -```sh -samtools view htsnexus_11.bam -``` -or, if you don't have samtools installed -```sh -docker run -it --rm -v $(pwd):/tmp staphb/samtools /bin/bash +crypt4gh decrypt -s keys/c4gh.sec.pem -f case7.vcf.gz.c4gh ``` +TODO: interesting/good way to verify that the vcf file is ok? diff --git a/docs/storage-and-interfaces.md b/docs/storage-and-interfaces.md index 3970a4f..33a599d 100644 --- a/docs/storage-and-interfaces.md +++ b/docs/storage-and-interfaces.md @@ -61,7 +61,7 @@ curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datase ```shell fileID=$(curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datasets/$datasetID/files" | jq -r '.[0].fileId') -filename=$(curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datasets/$datasetID/files" | jq -r '.[0].displayFileName' | cut -d '.' 
-f 1,2 )
+filename=$(curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datasets/$datasetID/files" | jq -r '.[0].displayFileName' | cut -d '.' -f 1,2,3 )
 curl -s -H "Authorization: Bearer $token" http://localhost:8443/files/$fileID -o "$filename"
 ```
-Check that the file `$filename` (`htsnexus_test_NA12878.bam`) has been created, and that it contains (binary) data.
+Check that the file `$filename` (eg. `Case1_IC.reg.vcf`) has been created, and that it contains data.
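
One possible way to address the TODO in `docs/htsget.md` (verifying the decrypted VCF) is a small shell check. The sketch below is a suggestion and not part of this change: `check_vcf` is a hypothetical helper, and it assumes that decryption leaves a gzip/bgzip-compressed file such as `case7.vcf.gz` whose first block carries the VCF header. It only tests that the compressed data is intact and that the header lines are present; it does not validate individual records.

```sh
# Hypothetical helper (not part of this change): sanity-check a gzip/bgzip-compressed VCF.
check_vcf() {
    file="$1"

    # 1. Every gzip member (BGZF blocks are gzip members) must decompress cleanly.
    gzip -t "$file" || { echo "not valid gzip data: $file" >&2; return 1; }

    # 2. The first line of a VCF file is the ##fileformat meta line.
    first_line=$(gzip -dc "$file" | head -n 1)
    case "$first_line" in
        "##fileformat=VCF"*) ;;
        *) echo "missing ##fileformat header: $file" >&2; return 1 ;;
    esac

    # 3. The column header line (#CHROM ...) must be present.
    gzip -dc "$file" | grep -q '^#CHROM' || { echo "missing #CHROM line: $file" >&2; return 1; }

    echo "ok: $file"
}
```

Assuming the decrypted output is named `case7.vcf.gz`, the check would be run as `check_vcf case7.vcf.gz`. If `bcftools` happens to be available, opening the file with it would be a stricter test than this header check.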