deployment issues in AKS #775

iferencik · 2024-02-06T18:08:42Z

Hello titilers,

I am writing about a possible issue related to a custom titiler deployment on AKS.
I have been running an older version of titiler in AKS for over a year without any issues. About two weeks ago I upgraded to 0.17 (some small code changes were also made to the server).

I am deploying this custom server to an 8 CPU/32 GB RAM machine in AKS, and it consistently gets stuck after running for an arbitrary amount of time (roughly 30 min). The cluster uses the default AKS nginx ingress load balancer. Here is the full definition in YAML:

apiVersion: v1
kind: Namespace
metadata:
  name: titiler
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: titiler
  namespace: titiler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: titiler
  template:
    metadata:
      labels:
        app: titiler
    spec:
      nodeSelector:
        type: "manual"
      containers:
        - name: titiler
          #image: ghcr.io/undp-data/cogserver:v0.0.3
          image: undpgeohub.azurecr.io/cogserver-debug
          imagePullPolicy: Always
          resources:
            limits:
              memory: "9G"
              cpu: "3000m"
          env:
            # - name: WEB_CONCURRENCY
            #   value: "1"
            
            # - name: MAX_WORKERS
            #   value: "1"
            # - name: WEB_CONCURRENCY
            #   value: "1"
            # - name: RIO_TILER_MAX_THREADS
            #   value: "1"
            # - name: API_CORS_ORIGIN
            #   value: "*"
---
apiVersion: v1
kind: Service
metadata:
  name: titiler
  namespace: titiler
  labels:
    app: titiler
spec:
  ports:
    - name: web
      port: 80
      targetPort: 80
  selector:
    app: titiler
  type: ClusterIP # LoadBalancer # NodePort #
  ## load balancer will make the service accessible on the internet using an external ip but no https
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: titiler-ssl-tls-ingress
  namespace: titiler
  annotations:
    kubernetes.io/ingress.class: addon-http-application-routing
    cert-manager.io/cluster-issuer: zerossl

spec:
  tls:
    - hosts:
        - titiler.undpgeohub.org # update IP address here
      secretName: titiler-cert
  rules:
    - host: titiler.undpgeohub.org # update IP address here
      http:
        paths:
          - path: "/"
            pathType: Prefix
            backend:
              service:
                name: titiler
                port:
                  number: 80

And this is the Dockerfile:

FROM ghcr.io/osgeo/gdal:ubuntu-small-latest as base

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
  libffi-dev python3-pip
RUN python3 -m pip install pipenv
WORKDIR /opt/server
RUN export PYTHON_VERSION="$(python3 --version | cut -d ' ' -f 2)" && pipenv --python ${PYTHON_VERSION}
RUN pipenv run pip install -U pip
#RUN pipenv run pip install uvicorn titiler asyncpg postgis --no-cache-dir  --upgrade
COPY requirements.txt requirements.txt
RUN pipenv run pip install -r requirements.txt
COPY src/cogserver cogserver
ENV HOST=0.0.0.0
ENV PORT=80
ENV WEB_CONCURRENCY=1
ENV CPL_TMPDIR=/tmp
ENV GDAL_CACHEMAX=75%
ENV GDAL_INGESTED_BYTES_AT_OPEN=32768
ENV GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR
ENV GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES
ENV GDAL_HTTP_MULTIPLEX=YES
ENV GDAL_HTTP_VERSION=2
ENV PYTHONWARNINGS=ignore
ENV VSI_CACHE=FALSE
#ENV RIO_TILER_MAX_THREADS=2



#CMD pipenv run uvicorn cogserver:app --host ${HOST} --port ${PORT} --log-config cogserver/logconf.yaml
CMD pipenv run uvicorn cogserver:app --host ${HOST} --port ${PORT} --log-level trace

I set WEB_CONCURRENCY to 1 to force a single worker. The load balancer reports 504 Gateway Timeout, and I can never find anything in the pod logs. It looks like the service just gets stuck, as if waiting on a thread or something similar, and becomes unresponsive.
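One way to see where the process is stuck when the logs stay empty is Python's standard `faulthandler` module, which can dump the stack of every thread on a signal. The following is only a minimal sketch: the helper name `enable_stack_dumps` is hypothetical, the choice of `SIGUSR1` is an assumption, and it presumes the helper can be called once when `cogserver` is imported.

```python
import faulthandler
import signal
import sys


def enable_stack_dumps(signum=signal.SIGUSR1, file=sys.stderr):
    """Dump the Python stack of every thread to `file` when `signum` arrives.

    `enable_stack_dumps` is a hypothetical helper name, not part of titiler.
    faulthandler installs a low-level signal handler, so the dump is written
    even if the event loop or a worker thread is blocked.
    """
    faulthandler.register(signum, file=file, all_threads=True)
```

With this in place, sending `SIGUSR1` to the uvicorn Python process inside the pod (for example with `kill -USR1 <pid>`, where the PID has to be looked up because the shell-form `CMD` means the Python process is probably not PID 1) should print the stacks to the pod logs. Only Python frames are shown, so a hang inside native code such as GDAL would appear as the Python call that entered it.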

I tried forcing RIO_TILER_MAX_THREADS to 1 and various other options, but I just cannot get it up and running properly.

I should mention that I run a dev server with an identical config, deployed in a different namespace (titiler-dev) on the same node.
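Since both deployments share a node, probing each Service directly from inside the cluster, bypassing the ingress, could show whether the 504 comes from a hung pod or from the ingress itself. The sketch below uses only the standard library. The URLs are assumptions: they presume the default `cluster.local` cluster domain and that the dev Service is also named `titiler` in the `titiler-dev` namespace.

```python
import time
import urllib.error
import urllib.request

# Assumed in-cluster Service URLs (default cluster domain; dev Service name
# is a guess based on the "identical config" description).
TARGETS = {
    "prod": "http://titiler.titiler.svc.cluster.local/",
    "dev": "http://titiler.titiler-dev.svc.cluster.local/",
}


def probe(url, timeout=5.0):
    """Issue one GET and return (outcome, elapsed_seconds).

    Any HTTP status, even 404, means the server answered; an "error_*"
    outcome means it refused the connection, timed out, or was unreachable.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            outcome = f"http_{resp.status}"
    except urllib.error.HTTPError as exc:
        # The server responded, just with a non-2xx status.
        outcome = f"http_{exc.code}"
    except OSError as exc:
        # Covers URLError, connection errors, and socket timeouts.
        outcome = f"error_{type(exc).__name__}"
    return outcome, time.monotonic() - start


def probe_all(targets, timeout=5.0):
    """Probe every named target once and return {name: (outcome, elapsed)}."""
    return {name: probe(url, timeout) for name, url in targets.items()}
```

Calling `probe_all(TARGETS)` periodically from a pod in the cluster would record when the production Service stops answering while the dev one still does, which would point at the pod rather than the ingress.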

I did not create an issue because I believe this is related to my deployment.

SSL/TLS is managed by cert-manager (Let's Encrypt and ZeroSSL). I also did not find any issues in the cert-manager pod logs.

I will be grateful for any hints or ideas.

Replies: 0 comments
