[newrelic-pixie] newrelic-pixie init container not running on arm64 #1152

Closed
maxlemieux opened this issue Oct 4, 2023 · 9 comments · May be fixed by #1198
Labels: bug, team/pixie, triage/pending

Comments

@maxlemieux
Contributor

Bug description

newrelic-pixie chart fails to install to arm64 nodes.

Version of Helm and Kubernetes

Any version, as long as the nodes are arm64. Tested on AKS, Kubernetes v1.26.6, with node pool template Standard_D2pds_v5 (arm64).

Which chart?

helm search repo newrelic-pixie
NAME                   	CHART VERSION	APP VERSION	DESCRIPTION                                      
newrelic/newrelic-pixie	2.1.2        	2.1.4      	A Helm chart for the New Relic Pixie integration.

What happened?

The newrelic-pixie job fails 5 times in quick succession after scheduling to an arm64 node.

Logs for the cluster-registration-wait container include this message:

exec /bin/sh: exec format error
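
For context, `exec format error` is how the kernel's ENOEXEC surfaces when a binary is not in a format the host can run, most commonly because it was built for a different CPU architecture. A minimal sketch of how one might confirm that for a given binary, assuming a little-endian ELF file; the helper name `elf_arch` is hypothetical and only two architectures are recognized:

```shell
# Print the CPU architecture an ELF binary was built for, based on the
# e_machine field (bytes 18-19 of the ELF header). Assumes little-endian ELF.
elf_arch() {
  # The first four bytes of any ELF file are 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not an ELF file: $1" >&2
    return 1
  fi
  # od prints the two e_machine bytes as hex; tr strips the whitespace.
  machine=$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')
  case "$machine" in
    3e00) echo "amd64" ;;               # EM_X86_64 = 62
    b700) echo "arm64" ;;               # EM_AARCH64 = 183
    *)    echo "unknown ($machine)" ;;
  esac
}
```

For example, after copying `/bin/sh` out of the image, `elf_arch ./sh` reporting `amd64` while the node is arm64 would be consistent with the error above.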

What you expected to happen?

Expecting the init container to work with arm64.

How to reproduce it?

Add an arm64 node pool to your cluster and taint the other node groups so that pods schedule only onto the arm64 nodes. Proceed per this guide.

Install the New Relic bundle with Pixie enabled.
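
The node setup in the reproduction steps could be sketched roughly as follows, assuming `kubectl` access to the cluster and the well-known `kubernetes.io/arch` node label. The function names and the `arch=amd64` taint key/value are hypothetical choices, not taken from the linked guide:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands
# without touching a cluster.
KUBECTL=${KUBECTL:-kubectl}

# Show every node together with its CPU architecture label.
list_node_arch() {
  $KUBECTL get nodes -L kubernetes.io/arch
}

# Taint all amd64 nodes so that pods without a matching toleration can
# only be scheduled onto the arm64 nodes.
# The taint "arch=amd64:NoSchedule" is an arbitrary, hypothetical choice.
taint_amd64_nodes() {
  $KUBECTL taint nodes -l kubernetes.io/arch=amd64 arch=amd64:NoSchedule
}
```

Running `KUBECTL=echo taint_amd64_nodes` first prints the command that would be executed, which is a cheap way to review it before applying the taint for real.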

Anything else we need to know?

This is the container image for the container that's not running on arm64:

Image:         gcr.io/pixie-oss/pixie-dev-public/curl:1.0
Image ID:      gcr.io/pixie-oss/pixie-dev-public/curl@sha256:b57f1d617b3eded350e2f78a5eece0c0839c59f59f1dece39f413f599dc382b1
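
One way to check which architectures an image tag was published for is to read its manifest. A rough sketch, assuming a Docker CLI with `docker manifest inspect` available and access to the registry; the helper name `image_archs` is hypothetical, and the JSON is filtered with `grep`/`sed` rather than a proper JSON parser:

```shell
# DOCKER can be overridden, e.g. to point at a stub when no registry is reachable.
DOCKER=${DOCKER:-docker}

# List the distinct CPU architectures found in an image's manifest.
# --verbose makes the output include a platform entry for each manifest.
image_archs() {
  $DOCKER manifest inspect --verbose "$1" \
    | grep -o '"architecture": *"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u
}
```

If the diagnosis in this issue is right, `image_archs gcr.io/pixie-oss/pixie-dev-public/curl:1.0` would not list `arm64`.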

maxlemieux added the bug and triage/pending labels on Oct 4, 2023

@ddelnano

The pixie repo seems to use this "multiarch" tagged image.

$ git grep 'pixie-dev-public\/curl' | grep '^k8s'
k8s/cloud/base/ory_auth/kratos/kratos_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/devinfra/buildbuddy-executor/values.yaml:  image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/kelvin_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/patch_sentry.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/query_broker_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/bootstrap/cloud_connector_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/etcd_metadata/base/metadata_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/etcd_metadata/base/metadata_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/pem/base/pem_daemonset.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/persistent_metadata/base/metadata_statefulset.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/sanitizer/kelvin_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd

My suspicion is that the helm chart may not have the latest changes to pull in the correct image.

@ddelnano

I missed that this wasn't the pixie-operator helm chart, but the newrelic-pixie chart. I believe we need to replace this image with the one I mentioned above.

@ddelnano

ddelnano commented Oct 19, 2023

After investigating this more, the curl image isn't the only thing that needs to be addressed. The newrelic/newrelic-pixie-integration repo isn't publishing container images for ARM either. I've validated with @maxlemieux's help that the chart installs successfully once both of those are fixed.

@ddelnano

The newrelic/newrelic-pixie-integration repo's v2.2.0 release supports ARM builds now. We can now update the helm-chart to use this version and fix the curl issue mentioned above.

@maxlemieux
Contributor Author

The curl container issue seems to be fixed with this update, but the main container (not the init container) now fails with the same exec format error.

@ddelnano

This will be addressed once #1198 is merged and a new nri-bundle release is made. Thanks for all your help through this @maxlemieux!

@workato-integration

All attempts at reproducing this issue failed, or not enough information was available to reproduce the issue. Reading the code produces no clues as to why this behavior would occur. If more information appears later, please reopen the issue.
