Initial commitment of documentation for clickhouse official image #2397

Merged: 4 commits merged on Nov 8, 2024

Changes from 3 commits

1 change: 1 addition & 0 deletions clickhouse/README-short.txt
@@ -0,0 +1 @@
ClickHouse is the fastest and most resource-efficient OSS database for real-time apps and analytics.
160 changes: 160 additions & 0 deletions clickhouse/content.md
@@ -0,0 +1,160 @@
# ClickHouse Server Docker Image

## What is ClickHouse?

%%LOGO%%

ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real time.

ClickHouse can run analytical queries 100-1000x faster than traditional database management systems, processing hundreds of millions to over a billion rows and tens of gigabytes of data per server per second. With a widespread user base around the globe, the technology has received praise for its reliability, ease of use, and fault tolerance.

For more information and documentation, see https://clickhouse.com/.

### Compatibility

- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A) and, additionally, the Load-Acquire RCpc instructions (LRCPC). These are optional in ARMv8.2-A and mandatory in [ARMv8.3-A](https://en.wikipedia.org/wiki/AArch64#ARMv8.3-A). They are supported on AWS Graviton 2 and later, as well as on Arm-based Azure and GCP instances. Examples of unsupported devices are the Raspberry Pi 4 (ARMv8.0-A) and the Jetson AGX Xavier/Orin (ARMv8.2-A).
Comment on lines +15 to +16 (Member):

As the author of opencontainers/image-spec@2d95dde, I love this -- I wish we had a way to specify that the amd64 image variant should be v2 (corresponding to the lowest level with SSE3) and the arm64 variant could be v8.2, but we don't currently have a way to specify either in our system and the latter would break folks (until containerd/platforms#8 and similar trickle down to popular runtimes), so for now I'm just happy to see it documented explicitly (hopefully ClickHouse also reports a clear error when these are missing). 😄 ❤️

- Since ClickHouse 24.11, the Ubuntu-based images use `ubuntu:22.04` as their base image, which requires Docker version >= `20.10.10` (the release containing [this patch](https://github.com/moby/moby/commit/977283509f75303bc6612665a04abf76ff1d2468)). As a workaround, you can use `docker run [--privileged | --security-opt seccomp=unconfined]` instead; however, that has security implications.
Felixoid marked this conversation as resolved.
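
To check a Linux host before pulling the image, you can look at the CPU feature flags the kernel reports in `/proc/cpuinfo`. The helper below, `check_clickhouse_cpu`, is a hypothetical sketch, not part of the image. It assumes a Linux host and relies on two kernel flag names: `pni` (the name Linux uses for SSE3) and `lrcpc` (the arm64 Load-Acquire RCpc feature). It checks only these two flags and does not verify the full ARMv8.2-A baseline.

```bash
# Hypothetical helper: report whether this host exposes the CPU feature the image needs.
# Assumption: Linux, where /proc/cpuinfo lists feature flags.
#   x86_64:  SSE3 appears under the flag name "pni".
#   aarch64: Load-Acquire RCpc appears under "lrcpc".
# Usage: check_clickhouse_cpu [cpuinfo-file] [architecture]
check_clickhouse_cpu() {
    cpuinfo=${1:-/proc/cpuinfo}
    arch=${2:-$(uname -m)}
    case "$arch" in
        x86_64|amd64)
            flag=pni; feature=SSE3 ;;
        aarch64|arm64)
            flag=lrcpc; feature=LRCPC ;;
        *)
            echo "unknown architecture: $arch"
            return 2 ;;
    esac
    # -w matches whole words only, so "ssse3" does not count as "pni".
    if grep -qw "$flag" "$cpuinfo" 2>/dev/null; then
        echo "$feature: present"
    else
        echo "$feature: missing"
        return 1
    fi
}
```

Run `check_clickhouse_cpu` without arguments to inspect the current host; the optional arguments exist mainly so the check can be tried against a saved copy of another machine's `/proc/cpuinfo`.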

## How to use this image

### Start server instance

```bash
docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
```

By default, ClickHouse will be accessible only via the Docker network. See the **networking** section below.

By default, the server instance started above can be accessed as the `default` user without a password.
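
`docker run -d` returns as soon as the container starts, which may be before the server accepts connections. A small polling loop can bridge that gap. The `wait_for_clickhouse` helper below is a hypothetical sketch, not part of the image: it retries a trivial query through `clickhouse-client` inside the container until it succeeds or the retry budget (assumed here to be one attempt per second) runs out.

```bash
# Hypothetical helper: poll until ClickHouse inside the named container answers a query.
# Usage: wait_for_clickhouse <container-name> [max-attempts]
wait_for_clickhouse() {
    container=$1
    attempts=${2:-30}
    n=0
    while [ "$n" -lt "$attempts" ]; do
        # clickhouse-client exits non-zero while the server is not yet reachable.
        if docker exec "$container" clickhouse-client --query 'SELECT 1' >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep 1
    done
    echo "ClickHouse in '$container' did not become ready after $attempts attempts" >&2
    return 1
}
```

For example, `wait_for_clickhouse some-clickhouse-server && docker exec -it some-clickhouse-server clickhouse-client` opens a client session only once the server responds.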

### Connect to it from a native client

```bash
docker run -it --rm --link some-clickhouse-server:clickhouse-server --entrypoint clickhouse-client %%IMAGE%% --host clickhouse-server
# OR
docker exec -it some-clickhouse-server clickhouse-client
```

More information about the [ClickHouse client](https://clickhouse.com/docs/en/interfaces/cli/).

### Connect to it using curl

```bash
echo "SELECT 'Hello, ClickHouse!'" | docker run -i --rm --link some-clickhouse-server:clickhouse-server buildpack-deps:curl curl 'http://clickhouse-server:8123/?query=' -s --data-binary @-
```

Felixoid marked this conversation as resolved.

More information about the [ClickHouse HTTP Interface](https://clickhouse.com/docs/en/interfaces/http/).

### Stopping / removing the container

```bash
docker stop some-clickhouse-server
docker rm some-clickhouse-server
```

### Networking

You can expose ClickHouse running in Docker by [mapping a particular port](https://docs.docker.com/config/containers/container-networking/) from inside the container to a port on the host:

```bash
docker run -d -p 18123:8123 -p 19000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
echo 'SELECT version()' | curl 'http://localhost:18123/' --data-binary @-
```

`22.6.3.35`

Alternatively, you can allow the container to use [host ports directly](https://docs.docker.com/network/host/) with `--network=host`, which can also achieve better network performance:

```bash
docker run -d --network=host --name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
echo 'SELECT version()' | curl 'http://localhost:8123/' --data-binary @-
```

`22.6.3.35`
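
Once a port is reachable from the host, the HTTP interface can also serve as a lightweight health check, because ClickHouse answers `GET /ping` with `Ok.`. The `clickhouse_ping` helper below is a hypothetical sketch: it assumes `curl` is installed on the host and takes the base URL (for example `http://localhost:18123` or `http://localhost:8123`) as its only argument.

```bash
# Hypothetical helper: succeed only if the HTTP interface answers /ping with "Ok.".
# Usage: clickhouse_ping <base-url>
clickhouse_ping() {
    base_url=${1%/}   # drop one trailing slash so the path is not doubled
    # -f: fail on HTTP errors, -s: no progress output, -S: still print errors.
    reply=$(curl -fsS "$base_url/ping") || return 1
    # Command substitution strips the trailing newline from "Ok.\n".
    [ "$reply" = "Ok." ]
}
```

This kind of check fits naturally into scripts or orchestration health probes, since it does not need credentials or a SQL client.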

### Volumes

Typically, you may want to mount the following folders inside your container to persist data:

- `/var/lib/clickhouse/` - main folder where ClickHouse stores the data
- `/var/log/clickhouse-server/` - logs

```bash
docker run -d \
-v "$PWD/ch_data:/var/lib/clickhouse/" \
-v "$PWD/ch_logs:/var/log/clickhouse-server/" \
--name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
```

You may also want to mount:

- `/etc/clickhouse-server/config.d/*.xml` - files with server configuration adjustments
- `/etc/clickhouse-server/users.d/*.xml` - files with user settings adjustments
- `/docker-entrypoint-initdb.d/` - folder with database initialization scripts (see below).

### Linux capabilities

ClickHouse has some advanced functionality that requires enabling several [Linux capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html).

These capabilities are optional and can be enabled using the following [Docker command-line arguments](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities):

```bash
docker run -d \
--cap-add=SYS_NICE --cap-add=NET_ADMIN --cap-add=IPC_LOCK \
--name some-clickhouse-server --ulimit nofile=262144:262144 %%IMAGE%%
```

Comment on lines +100 to +104 (Member):

Given that they're optional, you don't happen to have any documentation about what each of these are used for and when/why users might want to set them (specific to ClickHouse behavior), do you?

Contributor Author:

Hmm, is this URL worth adding here?

Contributor Author:

The line is added below:

Read more in [knowledge base](https://clickhouse.com/docs/knowledgebase/configure_cap_ipc_lock_and_cap_sys_nice_in_docker).

Member:

Yeah, for sure! I wonder if it could be improved to explain why, but that's not a blocker at all, just something I think would be useful. 😄 ❤️

Contributor Author:

There's a part of the ClickHouse logs related to the capabilities on the KB page:

`2023.04.19 08:04:10.022720 [ 1 ] {} <Information> Application: It looks like the process has no CAP_IPC_LOCK capability, binary mlock will be disabled. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_ipc_lock=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.`

`2023.04.19 08:04:10.065860 [ 1 ] {} <Information> Application: It looks like the process has no CAP_SYS_NICE capability, the setting 'os_thread_priority' will have no effect. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_sys_nice=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.`

Should I add it as well explicitly?

Member:

Naw, I think what you've got now is fine. 👍

Read more in [knowledge base](https://clickhouse.com/docs/knowledgebase/configure_cap_ipc_lock_and_cap_sys_nice_in_docker).
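
To confirm which of these capabilities a running container was actually granted, you can read the capability bounding set (`CapBnd`) that the kernel reports in `/proc/<pid>/status` inside the container. The helpers below are a hypothetical sketch; the bit numbers are the Linux constants `CAP_NET_ADMIN` (12), `CAP_IPC_LOCK` (14) and `CAP_SYS_NICE` (23). The bounding set only shows what Docker granted; whether ClickHouse makes use of a capability is still reported in the server log.

```bash
# Hypothetical helpers: inspect the capability bounding set of PID 1 in a container.
# Bit numbers follow linux/capability.h.
CAP_NET_ADMIN=12
CAP_IPC_LOCK=14
CAP_SYS_NICE=23

# has_cap <hex-mask> <bit>: succeed if <bit> is set in the hexadecimal mask.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# container_capbnd <container>: print the CapBnd mask of PID 1, e.g. 00000000a80425fb.
container_capbnd() {
    docker exec "$1" grep CapBnd /proc/1/status | awk '{ print $2 }'
}

# report_caps <container>: print which of the three optional capabilities are granted.
report_caps() {
    mask=$(container_capbnd "$1")
    [ -n "$mask" ] || return 1
    for pair in "NET_ADMIN:$CAP_NET_ADMIN" "IPC_LOCK:$CAP_IPC_LOCK" "SYS_NICE:$CAP_SYS_NICE"; do
        name=${pair%%:*}
        bit=${pair#*:}
        if has_cap "$mask" "$bit"; then
            echo "CAP_$name: granted"
        else
            echo "CAP_$name: not granted"
        fi
    done
}
```

For example, `report_caps some-clickhouse-server` should list all three as granted after starting the container with the `--cap-add` flags above, and none of them for a container started with Docker's default capability set.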

## Configuration

The container exposes port 8123 for the [HTTP interface](https://clickhouse.com/docs/en/interfaces/http_interface/) and port 9000 for the [native client](https://clickhouse.com/docs/en/interfaces/tcp/).

ClickHouse configuration is stored in the file `config.xml` ([documentation](https://clickhouse.com/docs/en/operations/configuration_files/)).

### Start server instance with custom configuration

```bash
docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 -v /path/to/your/config.xml:/etc/clickhouse-server/config.xml %%IMAGE%%
```
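
Rather than replacing the whole `config.xml`, you can also put small override files into `/etc/clickhouse-server/config.d/`, which ClickHouse merges into the main configuration. The sketch below writes one such file that changes the log level. The helper name `write_logger_override`, the file name `logger.xml` and the chosen level are illustrative assumptions, and it assumes a ClickHouse version that accepts `<clickhouse>` as the root element.

```bash
# Hypothetical helper: write a config.d override that sets the server log level.
# Usage: write_logger_override <host-dir> [level]
write_logger_override() {
    dir=$1
    level=${2:-warning}
    mkdir -p "$dir"
    # Unquoted EOF so that $level is substituted into the XML.
    cat > "$dir/logger.xml" <<EOF
<clickhouse>
    <logger>
        <level>$level</level>
    </logger>
</clickhouse>
EOF
}
```

For example, run `write_logger_override "$PWD/config.d"` and then `docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 -v "$PWD/config.d/logger.xml:/etc/clickhouse-server/config.d/logger.xml" %%IMAGE%%`. Mounting the single file, rather than the whole directory, leaves any files the image already ships in `config.d/` in place.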

### Start server as custom user

```bash
# $PWD/data/clickhouse and $PWD/logs/clickhouse should exist and be owned by the current user.
# bash does not set $GID, so the uid and gid are taken from id(1) instead.
docker run --rm --user "$(id -u):$(id -g)" --name some-clickhouse-server --ulimit nofile=262144:262144 -v "$PWD/logs/clickhouse:/var/log/clickhouse-server" -v "$PWD/data/clickhouse:/var/lib/clickhouse" %%IMAGE%%
```

When you use the image with local directories mounted, you probably want to specify the user to maintain proper file ownership. Use the `--user` argument and mount both `/var/lib/clickhouse` and `/var/log/clickhouse-server` inside the container; otherwise, the container will report an error and fail to start.
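
The mounted directories need to exist and belong to the user passed to `--user` before the container starts. A minimal sketch of that preparation step follows; `prepare_clickhouse_dirs` is a hypothetical helper, and the `data/clickhouse` and `logs/clickhouse` layout simply mirrors the example above.

```bash
# Hypothetical helper: create the host directories used above and check
# that they are owned by the current user.
# Usage: prepare_clickhouse_dirs [base-dir]
prepare_clickhouse_dirs() {
    base=${1:-$PWD}
    uid=$(id -u)
    for d in "$base/data/clickhouse" "$base/logs/clickhouse"; do
        mkdir -p "$d" || return 1
        # find prints the path only if its numeric owner matches $uid.
        if [ -z "$(find "$d" -maxdepth 0 -user "$uid")" ]; then
            echo "$d is not owned by uid $uid" >&2
            return 1
        fi
    done
}
```

Run it from the directory you intend to mount, before the `docker run` command above, so that a directory accidentally created by another user (for example by an earlier run as root) is caught early.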

### Start server as root (useful when user namespaces are enabled)

```bash
docker run --rm -e CLICKHOUSE_RUN_AS_ROOT=1 --name clickhouse-server-userns -v "$PWD/logs/clickhouse:/var/log/clickhouse-server" -v "$PWD/data/clickhouse:/var/lib/clickhouse" %%IMAGE%%
```

### How to create a default database and user on startup

Sometimes you may want to create a user (by default, the user named `default` is used) and a database when the container starts. You can do this with the environment variables `CLICKHOUSE_DB`, `CLICKHOUSE_USER`, `CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT` and `CLICKHOUSE_PASSWORD`:

```bash
docker run --rm -e CLICKHOUSE_DB=my_database -e CLICKHOUSE_USER=username -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 -e CLICKHOUSE_PASSWORD=password -p 9000:9000/tcp %%IMAGE%%
```
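
To check that the user and database were created, you could run a query as that user inside the container. The `check_default_login` helper below is a hypothetical sketch: it takes the container name explicitly (the command above does not set `--name`, so look it up with `docker ps`) and is meant to be called with the example values `username`, `password` and `my_database`. If the setup worked, the query should print the user name and the database name.

```bash
# Hypothetical helper: connect as the configured user and report the current
# user and selected database.
# Usage: check_default_login <container> <user> <password> <database>
check_default_login() {
    container=$1 user=$2 password=$3 database=$4
    docker exec "$container" clickhouse-client \
        --user "$user" --password "$password" --database "$database" \
        --query 'SELECT currentUser(), currentDatabase()'
}
```

For example: `check_default_login <container-name> username password my_database`. A login or "unknown database" error points at the environment variables passed to `docker run`.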

## How to extend this image

To perform additional initialization in an image derived from this one, add one or more `*.sql`, `*.sql.gz`, or `*.sh` scripts under `/docker-entrypoint-initdb.d`. Before starting the service, the entrypoint will run any `*.sql` and `*.sql.gz` files, run any executable `*.sh` scripts, and source any non-executable `*.sh` scripts found in that directory to do further initialization.
You can also provide the environment variables `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD`, which `clickhouse-client` will use during initialization.

For example, to add an additional database and table, add the following to `/docker-entrypoint-initdb.d/init-db.sh`:

```bash
#!/bin/bash
set -e

clickhouse client -n <<-EOSQL
CREATE DATABASE docker;
CREATE TABLE docker.docker (x Int32) ENGINE = Log;
EOSQL
```
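
The same initialization can also be written as a plain `*.sql` file and mounted into `/docker-entrypoint-initdb.d` without building a derived image. In the sketch below, the helper name `write_init_sql`, the `initdb` host directory and the `01-create.sql` file name are illustrative assumptions; `IF NOT EXISTS` keeps the statements harmless in case they run against a data directory where the objects already exist.

```bash
# Hypothetical helper: write an init SQL file into a host directory that will be
# mounted at /docker-entrypoint-initdb.d.
# Usage: write_init_sql <host-dir>
write_init_sql() {
    dir=$1
    mkdir -p "$dir"
    # Quoted 'EOSQL' stops the shell from expanding anything inside the SQL.
    cat > "$dir/01-create.sql" <<'EOSQL'
CREATE DATABASE IF NOT EXISTS docker;
CREATE TABLE IF NOT EXISTS docker.docker (x Int32) ENGINE = Log;
EOSQL
}
```

For example, run `write_init_sql "$PWD/initdb"` and then `docker run -d --name some-clickhouse-server --ulimit nofile=262144:262144 -v "$PWD/initdb:/docker-entrypoint-initdb.d" %%IMAGE%%`.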
1 change: 1 addition & 0 deletions clickhouse/github-repo
@@ -0,0 +1 @@
https://github.com/ClickHouse/docker-library
Member:

Mostly an FYI, this will get used for things like links in the "Quick Reference" section to GitHub issues -- is this where you want users to file issues related to the image, or would you rather they go to ClickHouse/ClickHouse? (do you plan to keep issues enabled on the docker-library repository?)

Contributor Author:

I misinterpret the readme, it should be the ClickHouse/ClickHouse for sure

1 change: 1 addition & 0 deletions clickhouse/license.md
@@ -0,0 +1 @@
View [license information](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) for the software contained in this image.
43 changes: 43 additions & 0 deletions clickhouse/logo.svg
1 change: 1 addition & 0 deletions clickhouse/maintainer.md
@@ -0,0 +1 @@
[ClickHouse Inc.](%%GITHUB-REPO%%)
7 changes: 7 additions & 0 deletions clickhouse/metadata.json
@@ -0,0 +1,7 @@
{
  "hub": {
    "categories": [
      "databases-and-storage"
    ]
  }
}