Address issues of missing bucket_name in s3fs paths #673

Merged
Changes from 15 commits
10 changes: 9 additions & 1 deletion .github/workflows/ci.yml
@@ -68,12 +68,13 @@ jobs:
S3_SECRET_KEY: ${{ secrets.s3_secret_key }}
S3_ENDPOINT: https://s3.sbg.cloud.ovh.net/
S3_REGION: sbg
S3_BUCKET_NAME: quetz
QUETZ_S3_BUCKET_NAME: quetz
run: |
# install dev dependencies
pip install -e .[all,dev]
pip install redis rq
pip install pytest-github-actions-annotate-failures

- name: Testing server
shell: bash -l -eo pipefail {0}
env:
@@ -85,6 +86,8 @@ jobs:
S3_SECRET_KEY: ${{ secrets.s3_secret_key }}
S3_ENDPOINT: https://s3.sbg.cloud.ovh.net/
S3_REGION: sbg
S3_BUCKET_NAME: quetz
QUETZ_S3_BUCKET_NAME: quetz
Collaborator

Can you briefly elaborate on the purpose of this env var?

Contributor Author

As I found out in fsspec/s3fs#824, s3fs needs the bucket name in its paths, so we need to know the name of the bucket we write to. Previously (at least for AWS S3) this information could be encoded in S3_ENDPOINT, but s3fs does not support that (the bucket cannot be encoded in the URL). We therefore need another variable that states the bucket name explicitly.
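
To make the path requirement concrete, here is a minimal sketch of how a channel name could map to an s3fs-style path once the bucket name is explicit. It mirrors the `_bucket_map` change and the `ValueError` guard in `quetz/pkgstores.py` from this PR; the helper name `bucket_path` and the sample values are illustrative assumptions, and no s3fs call is made.

```python
def bucket_path(bucket_name: str, prefix: str, channel: str, suffix: str = "") -> str:
    """Build a '<bucket>/<prefix><channel><suffix>' path (hypothetical helper)."""
    if not bucket_name:
        # Same check as the one added to S3Store.__init__ in this PR.
        raise ValueError("bucket_name is required in s3 configuration")
    return f"{bucket_name}/{prefix}{channel}{suffix}"


# Sample values only: bucket "quetz", prefix "test", channel "mychannel".
print(bucket_path("quetz", "test", "mychannel"))  # quetz/testmychannel
```

With the bucket as the first path component, the endpoint URL can stay bucket-free, which is what the updated `:url:` documentation in this PR asks for.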

Collaborator

But why do we need both S3_BUCKET_NAME and QUETZ_S3_BUCKET_NAME?

Contributor Author

I have already removed QUETZ_S3_BUCKET_NAME again. I only tested the QUETZ_ prefix to make sure that it would not be mapped to S3_BUCKET_NAME in the tests (as the env variable was missing).
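
The distinction between the two variables can be sketched as follows. This is an illustration only: it assumes the store tests read `S3_BUCKET_NAME` directly (as the `s3_config` dict in `test_pkg_stores.py` does) and that `QUETZ_S3_BUCKET_NAME` is a separate, prefixed setting (as the `test_cli.py` fixture in this PR suggests). Both function names are hypothetical.

```python
from typing import Mapping, Optional


def store_test_bucket(env: Mapping[str, str]) -> Optional[str]:
    """Bucket name as read by the s3_config dict in test_pkg_stores.py."""
    return env.get("S3_BUCKET_NAME")


def quetz_prefixed_bucket(env: Mapping[str, str]) -> Optional[str]:
    """Bucket name from the QUETZ_-prefixed variable (assumed lookup)."""
    return env.get("QUETZ_S3_BUCKET_NAME")


# Only the prefixed variable is set here.
env = {"QUETZ_S3_BUCKET_NAME": "quetz"}
print(store_test_bucket(env))      # None
print(quetz_prefixed_bucket(env))  # quetz
```

Setting only the prefixed variable leaves the store tests without a bucket name, which is why the workflow keeps `S3_BUCKET_NAME`.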

run: |
if [ "$TEST_DB_BACKEND" == "postgres" ]; then
export QUETZ_TEST_DATABASE="postgresql://postgres:mysecretpassword@${POSTGRES_HOST}:${POSTGRES_PORT}/postgres"
@@ -96,6 +99,10 @@ jobs:
fi

export QUETZ_IS_TEST=1

echo "S3_BUCKET_NAME: $S3_BUCKET_NAME"
echo "QUETZ_S3_BUCKET_NAME: $QUETZ_S3_BUCKET_NAME"
RobinHolzingerQC marked this conversation as resolved.

pytest -v ./quetz/tests/ --cov-config=pyproject.toml --cov=. --cov-report=xml

- name: Test the plugins
@@ -109,6 +116,7 @@ jobs:
S3_SECRET_KEY: ${{ secrets.s3_secret_key }}
S3_ENDPOINT: https://s3.sbg.cloud.ovh.net/
S3_REGION: sbg
S3_BUCKET_NAME: quetz
run: |
if [ "$TEST_DB_BACKEND" == "postgres" ]; then
export QUETZ_TEST_DATABASE="postgresql://postgres:mysecretpassword@${POSTGRES_HOST}:${POSTGRES_PORT}/postgres"
3 changes: 3 additions & 0 deletions .gitignore
@@ -23,3 +23,6 @@ test_quetz
.env
.envrc
.venv

# OS
.DS_Store
7 changes: 5 additions & 2 deletions docs/source/deploying/configuration.rst
@@ -156,16 +156,18 @@ Quetz can store package in object cloud storage compatible with S3 interface. To
[s3]
access_key = "AKIAIOSFODNN7EXAMPLE"
secret_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
url = "https://..."
url = "https://s3.amazonaws.com"
region = ""
bucket_name = ""
bucket_prefix="..."
bucket_suffix="..."


:access key:
:secret key: credentials to S3 account, if you use IAM roles, don't set them or set them to ``""``
:url: set to the S3 endpoint of your provider (for AWS, you can skip it)
:url: set to the S3 endpoint (excluding bucket name) of your provider (for AWS, you can skip it)
:region: region of the S3 instance
:bucket_name: name of the bucket to store the data in
:bucket_prefix:
:bucket_suffix: channel directories on S3 are created with the following semantics: ``{bucket_prefix}{channel_name}{bucket_suffix}``

@@ -276,5 +278,6 @@ Variable description values
``S3_SECRET_KEY`` secret key to s3 (used in tests) string
``S3_ENDPOINT`` s3 endpoint url string
``S3_REGION`` s3 region string
``S3_BUCKET_NAME`` s3 bucket name string
======================= ====================================== =========================== ===================

2 changes: 1 addition & 1 deletion environment.yml
@@ -5,7 +5,7 @@ dependencies:
- python>=3.7
- pip
- fastapi
- typer
- typer >=0.9,<1.0
- authlib=0.15.5
- psycopg2
- httpx>=0.22.0
4 changes: 3 additions & 1 deletion quetz/config.py
@@ -128,8 +128,9 @@ class Config:
[
ConfigEntry("access_key", str, default=""),
ConfigEntry("secret_key", str, default=""),
ConfigEntry("url", str, default=""),
ConfigEntry("url", str, default="https://s3.amazonaws.com"),
ConfigEntry("region", str, default=""),
ConfigEntry("bucket_name", str, default=""),
ConfigEntry("bucket_prefix", str, default=""),
ConfigEntry("bucket_suffix", str, default=""),
],
@@ -450,6 +451,7 @@ def get_package_store(self) -> pkgstores.PackageStore:
'secret': self.s3_secret_key,
'url': self.s3_url,
'region': self.s3_region,
'bucket_name': self.s3_bucket_name,
'bucket_prefix': self.s3_bucket_prefix,
'bucket_suffix': self.s3_bucket_suffix,
}
5 changes: 4 additions & 1 deletion quetz/pkgstores.py
@@ -292,8 +292,11 @@ def __init__(self, config):
client_kwargs = {}
url = config.get('url')
region = config.get("region")
self.bucket_name = config.get("bucket_name")
if url:
client_kwargs['endpoint_url'] = url
if not self.bucket_name:
raise ValueError("bucket_name is required in s3 configuration")
if region:
client_kwargs["region_name"] = region

@@ -326,7 +329,7 @@ def _get_fs(self):
raise ConfigError(f"{e} - check configured S3 credentials")

def _bucket_map(self, name):
return f"{self.bucket_prefix}{name}{self.bucket_suffix}"
return f"{self.bucket_name}/{self.bucket_prefix}{name}{self.bucket_suffix}"

def create_channel(self, name):
"""Create the bucket if one doesn't already exist
5 changes: 4 additions & 1 deletion quetz/tests/test_cli.py
@@ -7,12 +7,12 @@
from unittest import mock
from unittest.mock import MagicMock

import pytest
import sqlalchemy as sa
from alembic.script import ScriptDirectory
from pytest_mock.plugin import MockerFixture
from typer.testing import CliRunner

import pytest
from quetz import cli
from quetz.config import Config
from quetz.db_models import Base, Identity, User
@@ -554,9 +554,12 @@ def database_url_environment_variable(database_url) -> None:
@pytest.fixture()
def s3_environment_variable() -> None:
os.environ["QUETZ_S3_ACCESS_KEY"] = "fake_key"
os.environ["QUETZ_S3_BUCKET_NAME"] = "fake_bucket"
yield
if "QUETZ_S3_ACCESS_KEY" in os.environ:
del os.environ["QUETZ_S3_ACCESS_KEY"]
if "QUETZ_S3_BUCKET_NAME" in os.environ:
del os.environ["QUETZ_S3_BUCKET_NAME"]


@pytest.fixture()
5 changes: 4 additions & 1 deletion quetz/tests/test_pkg_stores.py
@@ -8,7 +8,6 @@
from pathlib import Path

import pytest

from quetz.pkgstores import (
AzureBlobStore,
GoogleCloudStorageStore,
@@ -23,6 +22,7 @@
'secret': os.environ.get("S3_SECRET_KEY"),
'url': os.environ.get("S3_ENDPOINT"),
'region': os.environ.get("S3_REGION"),
'bucket_name': os.environ.get("S3_BUCKET_NAME"),
'bucket_prefix': "test",
'bucket_suffix': "",
}
@@ -156,6 +156,9 @@ def test_remove_dirs(any_store, channel_name):

@pytest.fixture
def s3_store():
print(">>> s3_config", s3_config)
print(">>> s3_config.bucket_name", s3_config.get("bucket_name"))
print(">>> s3_config | os.environ", os.environ)
RobinHolzingerQC marked this conversation as resolved.
pkg_store = S3Store(s3_config)
return pkg_store

2 changes: 1 addition & 1 deletion setup.cfg
@@ -41,7 +41,7 @@ install_requires =
sqlalchemy-utils
tenacity
toml
typer
typer >=0.9,<1.0
typing_extensions
ujson
uvicorn