Load test data in sda #19

Merged: 7 commits, Jun 18, 2024
Changes from 5 commits
32 changes: 32 additions & 0 deletions config/gdi-starter-kit/config/load_data.sh
@@ -0,0 +1,32 @@
set -e
apk -q --no-cache add curl jq

DIR="/data/EGAD00001008392/region_vcfs/"
OUTPATH="region_vcfs"
# Keep datasetid and uploader unchanged to match dummy visas from demo oidc
DATASETID="DATASET0001"
UPLOADER="dummy_gdi.eu"
export SDA_CLI=/sda-cli
export SDA_KEY=/shared/c4gh.pub.pem
export MQ_URL=http://rabbitmq:15672
export SDA_CONFIG=/shared/s3cfg

mkdir -p "$OUTPATH"
# make sure no unwanted data is uploaded
rm -rf "${OUTPATH:?}"/*
cp -r "$DIR"/* "$OUTPATH"

echo "uploading files..."
./sda-admin --user "$UPLOADER" upload "$OUTPATH"

echo "ingesting files..."
./sda-admin --user "$UPLOADER" ingest "$OUTPATH/"
sleep 5 # wait for ingestion before assigning accession IDs
echo "accession..."
./sda-admin --user "$UPLOADER" accession 'FILE_%02d' 1 "$OUTPATH/"
sleep 2
echo "linking to dataset..."
./sda-admin --user "$UPLOADER" dataset "$DATASETID" "$OUTPATH/"
sleep 2

rm -rf "$OUTPATH"
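The `accession` step passes the pattern `'FILE_%02d'` and the start number `1`. Assuming `sda-admin` formats the pattern printf-style with a counter that starts at that number and increases by one per file (an assumption: the `sda-admin` script is not part of this diff), the IDs it would assign can be previewed with a hypothetical helper like this:

```sh
#!/bin/sh
# Preview accession IDs produced by a printf-style pattern.
# Assumption: sda-admin uses the pattern as a printf format with a counter
# starting at <start>; check the sda-admin script before relying on this.
preview_accessions() {  # preview_accessions <pattern> <start> <file>...
    pattern=$1
    counter=$2
    shift 2
    for f in "$@"; do
        # The pattern is deliberately used as the printf format string.
        printf '%s -> %s\n' "$f" "$(printf "$pattern" "$counter")"
        counter=$((counter + 1))
    done
}

# Hypothetical file names, for illustration only:
preview_accessions 'FILE_%02d' 1 first.vcf.gz second.vcf.gz
# first.vcf.gz -> FILE_01
# second.vcf.gz -> FILE_02
```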
Binary file added config/gdi-starter-kit/config/sda-cli
Binary file not shown.
9 changes: 8 additions & 1 deletion config/gdi-starter-kit/docker-compose.override.yml
@@ -25,7 +25,7 @@ services:
healthcheck:
test: curl -fq http://localhost:9000/minio/health/live
restart: no
ports: !override []
# ports: !override []
Review comment (Member):
I would keep this, or remap the ports to expose them here instead, ideally with an environment variable so that the external port can be configured, e.g.:

    ports: !override
      - ${PORT:-1234}:1234
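The suggestion relies on Compose variable interpolation, which supports the shell-style `${VAR:-default}` form: the value of `PORT` is used when it is set and non-empty, otherwise the default. A minimal shell illustration of how the mapping resolves (`PORT` and `1234` are the placeholders from the comment, `port_mapping` is a hypothetical helper):

```sh
#!/bin/sh
# ${PORT:-1234} expands to $PORT when it is set and non-empty, else to 1234.
# PORT and 1234 are placeholders, not real service settings.
port_mapping() {
    echo "${PORT:-1234}:1234"
}

unset PORT
port_mapping    # 1234:1234
PORT=9000
port_mapping    # 9000:1234
```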

Reply (Member Author):

Yes, sorry, this was only intended for testing but was committed by mistake. Reset in 01bd15d.


s3inbox:
depends_on:
@@ -68,6 +68,13 @@ services:
- ingest
- verify
restart: no
volumes: !override
- ./config/load_data.sh:/load_data.sh
- ../../data/:/data/
- ./starter-kit-storage-and-interfaces/scripts/sda-admin:/sda-admin
- ./config/sda-cli:/sda-cli
- shared:/shared
- cacert:/cacert

finalize:
restart: no
55 changes: 22 additions & 33 deletions docs/htsget.md
@@ -1,7 +1,3 @@
> [!NOTE]
> to be updated when
> - branches are merged, images have final names

## Setting up from GDI Starter kit source
1. Make sure you have the services in [storage-and-interfaces running](/docs/storage-and-interfaces.md). You might have to
restart all services.
@@ -28,55 +24,56 @@ pubkey=$(base64 -w0 keys/c4gh.pub.pem)
# macOS: pubkey=$(base64 -i keys/c4gh.pub.pem)
```
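Two variants are needed because GNU `base64` wraps its output at 76 columns unless given `-w0`. A single form intended to behave the same with GNU and macOS `base64`, sketched as an alternative rather than the guide's own command (`b64_oneline` is a hypothetical helper): read the file on stdin and strip any line breaks.

```sh
#!/bin/sh
# Single-line base64 of a file: base64 reads stdin on both GNU and macOS,
# and tr removes any line breaks (GNU base64 wraps at 76 columns by default).
b64_oneline() {
    base64 < "$1" | tr -d '\n'
}

# pubkey=$(b64_oneline keys/c4gh.pub.pem)
```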

Now you should be able make the requests to the htsget server. To request the (byte range of) chromosome 11 of the file `htsnexus_test_NA12878.bam` run:
Now you should be able to make requests to the htsget server. To request the byte ranges covering chromosome 19 of the file `Case7_IC.reg.vcf.gz`, starting at position 39955351, run:
```sh
curl -v -H "Client-Public-Key: $pubkey" -H "Authorization: Bearer $token" -H -k "http://localhost:8088/reads/DATASET0001/htsnexus_test_NA12878?referenceName=11"
curl -v -H "Client-Public-Key: $pubkey" -H "Authorization: Bearer $token" -k "http://localhost:8088/variants/DATASET0001/region_vcfs/Case7_IC.reg?referenceName=19&start=39955351"
```
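The query string uses the htsget `referenceName` and `start` parameters. A small helper that assembles such URLs, sketched with a hypothetical function name; the optional `end` parameter comes from the htsget specification and is not exercised in this guide (per the specification, `start` is 0-based inclusive and `end` is 0-based exclusive):

```sh
#!/bin/sh
# htsget_variants_url <base> <id> <referenceName> [start] [end]
# Builds a variants query URL; start/end are only added when given.
htsget_variants_url() {
    url="$1/variants/$2?referenceName=$3"
    if [ -n "$4" ]; then url="$url&start=$4"; fi
    if [ -n "$5" ]; then url="$url&end=$5"; fi
    printf '%s\n' "$url"
}

htsget_variants_url http://localhost:8088 DATASET0001/region_vcfs/Case7_IC.reg 19 39955351
# http://localhost:8088/variants/DATASET0001/region_vcfs/Case7_IC.reg?referenceName=19&start=39955351

# Usage with the headers from the request above:
# curl -H "Client-Public-Key: $pubkey" -H "Authorization: Bearer $token" \
#   "$(htsget_variants_url http://localhost:8088 DATASET0001/region_vcfs/Case7_IC.reg 19 39955351)"
```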


The request returns a ticket that describes how to download the requested part of the file:
```json
{
{
"htsget": {
"format": "BAM",
"format": "VCF",
"urls": [
{
"url": "data:;base64,Y3J5cHQ0Z2gBAAAAAgAAAA=="
},
{
"url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
"url": "http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh",
"headers": {
"Range": "bytes=16-123",
...
}
},
{
"url": "data:;base64,ZAAAAAAAAACxHxjMhagEVY+4bVEZYuqYGK5Ph3jrffrMhXpc3wYWenp2ofohEUwSBOuZF3kH6TEiQsjSPGaE1bvdMQ2uUuuHLWicplUneE77G079sTW8rJIJJ1VgZecPi9cTfQ=="
"url": "data:;base64,bAAAAAAAAAAOeAPEBUbfTcA0rho6fMu3D47GsC31V7Vd88JS4Wr2cvHhRpFHyQ20CE1+iIuMog/y8CtkrMLdEGIvzjUtuBj7K+/ZUcZS9FkSLYeMGQLUnqCmNL9DYXGUW7SGvbVSd/YU0V16"
},
{
"url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
"url": "http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh",
"headers": {
"Range": "bytes=124-1049147",
"Range": "bytes=124-65687",
...
}
},
{
"url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
"url": "http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh",
"headers": {
"Range": "bytes=2557120-2598042",
"accept": "*/*",
"Range": "bytes=131252-151945",
...
}
}
}
]
}
}
```
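The first `data:` URL is the start of a Crypt4GH file. Per the Crypt4GH specification, a file begins with the 8-byte magic string `crypt4gh`, followed by a little-endian 32-bit version and a little-endian 32-bit count of header packets. A sketch that decodes the URL from the example ticket and prints those fields (`u32le` is a helper introduced here):

```sh
#!/bin/sh
# Print the little-endian uint32 at byte offset $2 of file $1.
u32le() {
    set -- $(od -An -tu1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Decode the first data: URL of the example ticket (prefix removed).
tmp=$(mktemp)
printf '%s' 'Y3J5cHQ0Z2gBAAAAAgAAAA==' | base64 --decode > "$tmp"

printf 'magic:   %s\n' "$(head -c 8 "$tmp")"   # magic:   crypt4gh
printf 'version: %s\n' "$(u32le "$tmp" 8)"     # version: 1
printf 'packets: %s\n' "$(u32le "$tmp" 12)"    # packets: 2
rm -f "$tmp"
```

The two header packets presumably correspond to the packet at `bytes=16-123` of the stored file and the packet carried by the second `data:` URL.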

This repsonse contains byte ranges (eg. `"Range": "bytes=124-1049147"`) as parts of url requests.
This should point you to doing requests to `http://localhost:8443/s3-encrypted` (`sda-download`, from `storage-and-interfaces`) that gets you data for chromosome 11 of the file:
This response contains byte ranges (e.g. `"Range": "bytes=124-65687"`) as part of the URL requests.
Use them in requests to `http://localhost:8443/s3-encrypted` (`sda-download`, from `storage-and-interfaces`) to fetch the data for chromosome 19 of the file:
```sh
curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=16-123" -o p11-00.bam.c4gh
curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=124-1049147" -o p11-01.bam.c4gh
curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=2557120-2598042" -o p11-02.bam.c4gh
curl 'http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=16-123" -o p19-00.vcf.gz.c4gh
curl 'http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=124-65687" -o p19-01.vcf.gz.c4gh
curl 'http://localhost:8443/s3-encrypted/DATASET0001/region_vcfs/Case7_IC.reg.vcf.gz.c4gh' -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: bytes=131252-151945" -o p19-02.vcf.gz.c4gh
```
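The ranges line up with the Crypt4GH block layout: each 64 KiB plaintext segment is stored with a 12-byte nonce and a 16-byte MAC, so every encrypted block occupies 65564 bytes, and in this file the data blocks start after a 124-byte header (16 bytes of magic, version and packet count plus the 108-byte packet at `bytes=16-123`). A sketch computing the on-disk range of block *n*, with the header length inferred from this ticket (`block_range` is a hypothetical helper):

```sh
#!/bin/sh
# On-disk size of one Crypt4GH data block: 64 KiB of plaintext plus a
# 12-byte nonce and a 16-byte MAC.
BLOCK=$((65536 + 12 + 16))   # 65564

# block_range <header_len> <block_index>: byte range of that encrypted block.
# header_len (124 here) is inferred from the ticket, not read from the file.
block_range() {
    first=$(( $1 + $2 * BLOCK ))
    echo "bytes=$first-$(( first + BLOCK - 1 ))"
}

block_range 124 0   # bytes=124-65687
block_range 124 1   # bytes=65688-131251 (not requested in the ticket)
block_range 124 2   # bytes=131252-196815 (the ticket stops at 151945)
```

Block 0 matches the second range exactly, and the third range starts at block 2 but ends at 151945, presumably because that is the last, shorter block of the file. Block 1 is skipped, which suggests it holds no records for the requested region.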

The response from htsget also lists two `data:` sections:
@@ -85,22 +82,14 @@ The response from htsget also lists two data sections:
```
and
```sh
"url": "data:;base64,ZAAAAAAAAACxHxjMhagEVY+4bVEZYuqYGK5Ph3jrffrMhXpc3wYWenp2ofohEUwSBOuZF3kH6TEiQsjSPGaE1bvdMQ2uUuuHLWicplUneE77G079sTW8rJIJJ1VgZecPi9cTfQ==
"url": "data:;base64,bAAAAAAAAAAOeAPEBUbfTcA0rho6fMu3D47GsC31V7Vd88JS4Wr2cvHhRpFHyQ20CE1+iIuMog/y8CtkrMLdEGIvzjUtuBj7K+/ZUcZS9FkSLYeMGQLUnqCmNL9DYXGUW7SGvbVSd/YU0V16"
```
These segments are part of the requested data. Save the base64 payloads (e.g. `Y3J5cHQ0Z2gBAAAAAgAAAA==`, without the `data:;base64,` prefix) to the files `start.b64` and `mid.b64`, respectively. Then concatenate all segments in the order they appear in the ticket:
```sh
{ <start.b64 base64 --decode && cat p11-00.bam.c4gh && <mid.b64 base64 --decode && cat p11-01.bam.c4gh && cat p11-02.bam.c4gh ;} > htsnexus_11.bam.c4gh
{ <start.b64 base64 --decode && cat p19-00.vcf.gz.c4gh && <mid.b64 base64 --decode && cat p19-01.vcf.gz.c4gh && cat p19-02.vcf.gz.c4gh ;} > case7.vcf.gz.c4gh
```
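Saving the two payloads can be scripted with the POSIX `${var#prefix}` expansion, which removes the `data:;base64,` prefix. As a consistency check, the first four bytes of a Crypt4GH header packet hold its length as a little-endian 32-bit integer. A sketch using the strings from the example ticket (`packet_len` is a helper introduced here):

```sh
#!/bin/sh
# Write the base64 payloads of the two data: URLs to start.b64 and mid.b64,
# removing the "data:;base64," prefix.
prefix='data:;base64,'
start_url='data:;base64,Y3J5cHQ0Z2gBAAAAAgAAAA=='
mid_url='data:;base64,bAAAAAAAAAAOeAPEBUbfTcA0rho6fMu3D47GsC31V7Vd88JS4Wr2cvHhRpFHyQ20CE1+iIuMog/y8CtkrMLdEGIvzjUtuBj7K+/ZUcZS9FkSLYeMGQLUnqCmNL9DYXGUW7SGvbVSd/YU0V16'
printf '%s' "${start_url#"$prefix"}" > start.b64
printf '%s' "${mid_url#"$prefix"}" > mid.b64

# Print the little-endian uint32 in the first 4 bytes of a base64-encoded
# Crypt4GH header packet, i.e. the packet length.
packet_len() {
    set -- $(base64 --decode < "$1" | od -An -tu1 -N 4)
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

echo "mid packet length: $(packet_len mid.b64)"   # mid packet length: 108
```

The reported 108 bytes is the same size as the range `bytes=16-123` that holds the other header packet.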
Make sure that the file can be decrypted with your private key:
```sh
crypt4gh decrypt -s keys/c4gh.sec.pem -f htsnexus_11.bam.c4gh
```

Finally, check that samtools can open the new file:
```sh
samtools view htsnexus_11.bam
```
or, if you don't have samtools installed
```sh
docker run -it --rm -v $(pwd):/tmp staphb/samtools /bin/bash
crypt4gh decrypt -s keys/c4gh.sec.pem -f case7.vcf.gz.c4gh
```
TODO: interesting/good way to verify that the vcf file is ok?
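One possible check, offered as a suggestion rather than the guide's procedure: decompress the decrypted file and count the VCF data records per chromosome. Because htsget hands out whole blocks, records outside the requested region may also appear, so this mainly shows that the data decompresses and parses as tab-separated VCF. A sketch, assuming decryption produced `case7.vcf.gz` (`vcf_chrom_counts` is a hypothetical helper):

```sh
#!/bin/sh
# Count VCF data records per chromosome (first tab-separated column),
# skipping "##" meta lines and the "#CHROM" header. Reads VCF text on stdin.
vcf_chrom_counts() {
    awk -F '\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }'
}

# Usage, assuming decryption produced case7.vcf.gz (BGZF is gzip-compatible):
# gzip -dc case7.vcf.gz | vcf_chrom_counts
```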
4 changes: 2 additions & 2 deletions docs/storage-and-interfaces.md
@@ -61,7 +61,7 @@ curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datase

```shell
fileID=$(curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datasets/$datasetID/files" | jq -r '.[0].fileId')
filename=$(curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datasets/$datasetID/files" | jq -r '.[0].displayFileName' | cut -d '.' -f 1,2 )
filename=$(curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datasets/$datasetID/files" | jq -r '.[0].displayFileName' | cut -d '.' -f 1,2,3 )
curl -s -H "Authorization: Bearer $token" http://localhost:8443/files/$fileID -o "$filename"
```
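`cut -d '.' -f 1,2,3` keeps a fixed number of dot-separated fields, so for a display name such as `Case1_IC.reg.vcf.gz.c4gh` (an assumed example, following the ticket's `Case7_IC.reg.vcf.gz.c4gh`) it yields `Case1_IC.reg.vcf` and drops `.gz` together with `.c4gh`. Removing only the trailing `.c4gh` does not depend on the number of dots; a sketch with a hypothetical helper:

```sh
#!/bin/sh
# Turn a display name into a local file name by removing only a trailing
# ".c4gh", keeping every other extension.
local_name() {
    printf '%s\n' "${1%.c4gh}"
}

local_name Case1_IC.reg.vcf.gz.c4gh        # Case1_IC.reg.vcf.gz
local_name htsnexus_test_NA12878.bam.c4gh  # htsnexus_test_NA12878.bam

# filename=$(local_name "$(curl -s -H "Authorization: Bearer $token" "http://localhost:8443/metadata/datasets/$datasetID/files" | jq -r '.[0].displayFileName')")
```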
Check that the file `$filename` (`htsnexus_test_NA12878.bam`) has been created, and that it contains (binary) data.
Check that the file `$filename` (e.g. `Case1_IC.reg.vcf`) has been created, and that it contains data.