
[BUG?] ERROR: work type did not expect a signature when running health check in AWX with work-kubernetes in Receptor #14849

Open
x86-39 opened this issue Feb 7, 2024 · 4 comments

@x86-39

x86-39 commented Feb 7, 2024

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

When using the work-kubernetes type as described in the documentation, we get the following error when checking the health of the node from AWX.

ERROR 2024/02/07 00:40:42 : work type did not expect a signature

(screenshot: the failing health check error shown in the AWX UI)

Does AWX not yet support the work-kubernetes type, with the health check failing to report a readable error for this? The error is quite vague and I'm not sure what the issue is.

Our goal here is to run Receptor in a Kubernetes cluster so we can host execution and/or hop nodes in Kubernetes. I'm not certain whether this is an issue in AWX or in Receptor.

AWX version

23.7.0

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

No response

Operating system

Ubuntu 22.04

Web browser

No response

Steps to reproduce

The following receptor config is used:

receptor.conf
---
- node:
    id: 192.168.21.54
 
- work-verification:
    publickey: /etc/receptor/work_public_key.pem
 
- log-level: debug
 
- control-service:
    service: control
    filename: /tmp/receptor.sock
    permissions: 0660
    tls: tls_server
- tls-server:
    name: tls_server
    cert: /etc/receptor/tls/receptor.crt
    key: /etc/receptor/tls/receptor.key
    clientcas: /etc/receptor/tls/ca/mesh-CA.crt
    requireclientcert: true
    mintls13: False
 
- tls-client:
    name: tls_client
    cert: /etc/receptor/tls/receptor.crt
    key: /etc/receptor/tls/receptor.key
    rootcas: /etc/receptor/tls/ca/mesh-CA.crt
    insecureskipverify: false
    mintls13: False
- tcp-listener:
    port: 27199
    tls: tls_server
 
- work-kubernetes:
    worktype: kubeit
    authmethod: kubeconfig
    allowruntimeauth: true
    allowruntimepod: true
    allowruntimeparams: true
    verifysignature: true

After starting Receptor and checking the health of the instance, I get the error.
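
As a cross-check, the work types a node actually advertises can be listed over the control socket with receptorctl (assuming receptorctl is installed and pointed at the socket from the config above):

receptorctl --socket /tmp/receptor.sock status

For this config, the work types section of the output should show kubeit and, notably, no ansible-runner.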

Expected results

AWX should pass the health check and use Receptor to run workloads on the Kubernetes cluster with the kubeit worktype.

If this is not a supported use case yet, I would expect a clearer error message; this one seems quite arbitrary and confused us for days.

Actual results

We get an error

ERROR 2024/02/07 00:40:42 : work type did not expect a signature

This does not seem relevant to what we are trying to achieve. I looked through the code to see what causes this, and it seems related to the health check not using the correct work type (more information below).

Additional information

It seems this error occurs because the workType is given as ansible-runner instead of kubeit. I'm not too familiar with the code at work here, but I added some debug statements in Receptor.

func (c *workceptorCommand) processSignature(workType, signature string, connIsUnix, signWork bool) error {
	shouldVerifySignature := c.w.ShouldVerifySignature(workType, signWork)
	// debug output added for this report
	fmt.Println("shouldVerifySignature:", shouldVerifySignature)
	fmt.Println("workType:", workType)
	fmt.Println("connIsUnix:", connIsUnix)

	if !shouldVerifySignature && signature != "" {
		return fmt.Errorf("work type did not expect a signature")
	}
	if shouldVerifySignature && !connIsUnix {
		err := c.w.VerifySignature(signature)
		if err != nil {
			return err
		}
	}

	return nil
}

This prints:

shouldVerifySignature: false
workType: ansible-runner
connIsUnix: false

And in ShouldVerifySignature:

func (w *Workceptor) ShouldVerifySignature(workType string, signWork bool) bool {
	// if work unit is remote, just get the signWork boolean from the
	// remote extra data field
	if workType == "remote" {
		return signWork
	}
	w.workTypesLock.RLock()
	// debug output added for this report
	fmt.Print("w: ", w, " workTypes: ", w.workTypes, "\n")

	wt, ok := w.workTypes[workType]
	w.workTypesLock.RUnlock()
	// debug output added for this report
	fmt.Print("w: ", w, " wt: ", wt, " ok: ", ok, "\n")

	if ok && wt.verifySignature {
		return true
	}

	return false
}

This prints:

workTypes: map[kubeit:0xc000379e80 remote:0xc000379620]
w: &{0xc0002adb30 0x495fe0 0xc0001ef880 /tmp/receptor/192.168.21.54 0xc0000a74d0 map[kubeit:0xc000379e80 remote:0xc000379620] 0xc0000a74e8 map[]  5m0s /etc/receptor/work_public_key.pem}
wt: <nil>
ok: false

Am I correct that Receptor considers the only valid workTypes to be kubeit and remote, while AWX is sending ansible-runner for the health check?

@kurokobo
Contributor

kurokobo commented Feb 7, 2024

@diademiemi
Hi,

Our goal here is to run Receptor in a Kubernetes cluster so we can host execution and/or hop nodes in Kubernetes.

The current AWX implementation assumes that execution nodes are hosts where Ansible Runner runs locally and Podman is installed.
So running execution nodes in a Kubernetes cluster is hard in the first place: when we select an execution node for a job template, AWX asks Ansible Runner on that node to run the execution environment by creating a container with Podman, not on Kubernetes.

Alternatively, I recommend this to achieve a similar goal: we can define a Container Group with credentials for the remote Kubernetes cluster, which allows us to run EEs on that cluster: https://ansible.readthedocs.io/projects/awx/en/latest/administration/containers_instance_groups.html#create-a-container-group
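
For reference, a Container Group lets you customize the pod spec used for the EE pods. A minimal sketch of what the default spec looks like, going from memory of the AWX docs (namespace, image, and resources will differ per install):

apiVersion: v1
kind: Pod
metadata:
  namespace: awx
spec:
  serviceAccountName: default
  automountServiceAccountToken: false
  containers:
    - image: quay.io/ansible/awx-ee:latest
      name: worker
      args:
        - ansible-runner
        - worker
        - '--private-data-dir=/runner'
      resources:
        requests:
          cpu: 250m
          memory: 100Mi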

Running a hop node on a Kubernetes cluster is not so hard, since a hop node is never used to invoke any commands; neither Podman nor Ansible Runner is required. In addition, an "in-cluster hop node" feature called AWXMeshIngress will be implemented in the next release: #14640
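
That feature is expected to be a small custom resource handled by awx-operator; a hypothetical sketch based on the linked work, so names and fields may change before release:

apiVersion: awx.ansible.com/v1alpha1
kind: AWXMeshIngress
metadata:
  name: awx-hop
  namespace: awx
spec:
  deployment_name: awx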

Here are my answers to your questions, for your technical interest:

  • worktype is just a name. The documentation uses kubeit not because it is required for Kubernetes work, but simply as an example with a simple name.
  • AWX sends the ansible-runner worktype to run a health check. This invokes ansible-runner worker --worker-info on the execution node (see the example after this list).
  • Running jobs on Instance Groups means that AWX asks the remote Ansible Runner to run playbooks with process isolation via Podman. Ansible Runner can run ansible-playbook in an isolated environment (that is, in a Podman container), so in this case the EE container is created by Ansible Runner.
  • Running jobs on Container Groups means the playbook is run by Ansible Runner locally inside the pod. Receptor can create a Pod with a custom specification on a Kubernetes cluster, so in this case the EE container that runs Ansible Runner is created by Receptor (the kubernetes-runtime-auth or kubernetes-incluster-auth worktype).
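
To see what the health check gathers, the command from the second bullet can be run by hand on any node with ansible-runner installed:

ansible-runner worker --worker-info

It should print the node's capabilities (runner version, CPU count, memory, and so on), which AWX uses to set the instance's capacity and health.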

If you have further interest, my blog article may help you (sorry, it is in Japanese, so please use a translator): https://blog.kurokobo.com/archives/4847
Or ask further questions on the forum: https://forum.ansible.com/

@kurokobo
Contributor

kurokobo commented Feb 7, 2024

It would be appropriate to improve the error message, perhaps via an enhancement request on the Receptor side.

@fosterseth
Member

fosterseth commented Feb 7, 2024

As @kurokobo mentioned, container groups are designed for running jobs on remote k8s clusters.

AWX expects the execution node to have a work-command called ansible-runner for health checks.
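
For context, on a standard execution node that work type is declared in receptor.conf with a work-command stanza, roughly like the following (a sketch based on the usual AWX execution node setup; exact params may vary):

- work-command:
    worktype: ansible-runner
    command: ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: true

The config in this report has no such stanza, which is why the ansible-runner lookup fails and the signature check errors out.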

But when running jobs, AWX also uses this same work command. So even if you have a proper kubeit work-kubernetes setup in the config, AWX sadly is not going to use it. That would require some changes in AWX to get working.

Is there a use case for this that container groups don't cover?

@x86-39
Author

x86-39 commented Feb 8, 2024

Thank you for the detailed response! I understand a lot better now what this is doing under the hood.

I'll be checking out the AWXMeshIngress and Container Groups features today and tomorrow, and I'll get back to you on whether they cover our use case.
