This repository has been archived by the owner on Mar 25, 2024. It is now read-only.

jenkins master can't find jenkins agent which is buildpod #64

Open
junhee-yoo opened this issue Nov 8, 2019 · 7 comments

Comments

@junhee-yoo

junhee-yoo commented Nov 8, 2019

Hi, I'm using Rancher v2.3.2 and trying to use the pipeline feature.

I set up Rancher and Kubernetes on on-prem infrastructure and tried to build Docker images from an internal Git repo.
During the build, fetching from the Git repo works, but the next stage, a Build and Publish Image step, always fails.

I exec'd into the Jenkins master and agent pods and found that the Jenkins master cannot resolve the agent's address by its hostname, e.g. 'buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn'.
I tried 'nslookup' for other domains such as 'google.com', and it worked fine.

Please check my config and nslookup results below and let me know if I misconfigured something.

Thank you!

Network configuration:

    kube-api:
      always_pull_images: false
      pod_security_policy: false
      service_cluster_ip_range: 172.16.0.0/16
      service_node_port_range: 30000-32767
    kube-controller:
      cluster_cidr: 172.17.0.0/16
      service_cluster_ip_range: 172.16.0.0/16
    kubelet:
      cluster_dns_server: 172.16.0.10
      cluster_domain: cluster.local
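As a quick sanity check on the settings above, here is a minimal Python sketch using the standard `ipaddress` module. It confirms that `cluster_dns_server` lies inside `service_cluster_ip_range` (the cluster DNS Service is assigned a ClusterIP from that range) and that the pod range (`cluster_cidr`) does not overlap the service range. The function name `check_network_config` is hypothetical; this is an illustrative check, not something Rancher or RKE provides.

```python
import ipaddress

# Values copied from the network configuration excerpt above.
SERVICE_CIDR = ipaddress.ip_network("172.16.0.0/16")   # service_cluster_ip_range
CLUSTER_CIDR = ipaddress.ip_network("172.17.0.0/16")   # cluster_cidr (pod network)
CLUSTER_DNS = ipaddress.ip_address("172.16.0.10")      # kubelet cluster_dns_server


def check_network_config(service_cidr, cluster_cidr, dns_ip):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    # The DNS Service gets a ClusterIP, so it must sit inside the service range.
    if dns_ip not in service_cidr:
        problems.append(f"cluster_dns_server {dns_ip} is outside {service_cidr}")
    # Pod IPs and Service IPs should be allocated from disjoint ranges.
    if service_cidr.overlaps(cluster_cidr):
        problems.append(f"service range {service_cidr} overlaps pod range {cluster_cidr}")
    return problems


if __name__ == "__main__":
    issues = check_network_config(SERVICE_CIDR, CLUSTER_CIDR, CLUSTER_DNS)
    print(issues or "no obvious CIDR problems")
```

With the values shown, the check finds no problems, so the CIDR layout on its own does not look misconfigured.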

From the Jenkins master:

root@jenkins-574b5cc88f-fxsd8:/# cat /etc/resolv.conf 
nameserver 172.16.0.10
search p-8ppxw-pipeline.svc.cluster.local svc.cluster.local cluster.local nhnjp.ism line.ism lineinfra.com
options ndots:5
root@jenkins-574b5cc88f-fxsd8:/# nslookup buildpod.pipeline-p-jsk9d-1.1-6x35b-dlms6
Server:		172.16.0.10
Address:	172.16.0.10#53

** server can't find buildpod.pipeline-p-jsk9d-1.1-6x35b-dlms6: NXDOMAIN
root@jenkins-574b5cc88f-fxsd8:/# nslookup jenkins.p-8ppxw-pipeline
Server:		172.16.0.10
Address:	172.16.0.10#53

Name:	jenkins.p-8ppxw-pipeline.svc.cluster.local
Address: 172.16.151.169
root@jenkins-574b5cc88f-fxsd8:/# nslookup buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn.p-8ppxw-pipeline
Server:		172.16.0.10
Address:	172.16.0.10#53

** server can't find buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn.p-8ppxw-pipeline: NXDOMAIN
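To see which names are actually sent to the DNS server, here is a simplified Python model of how a glibc-style stub resolver combines the `search` list with `options ndots:5` from the `/etc/resolv.conf` shown above. It is a sketch under that assumption: other resolver libraries (musl, for instance) differ in details such as whether the search list is still tried after the absolute name, and the helper names `parse_resolv_conf` and `candidate_queries` are hypothetical.

```python
# Contents of /etc/resolv.conf as shown in the Jenkins master pod above.
RESOLV_CONF = """\
nameserver 172.16.0.10
search p-8ppxw-pipeline.svc.cluster.local svc.cluster.local cluster.local nhnjp.ism line.ism lineinfra.com
options ndots:5
"""


def parse_resolv_conf(text):
    """Return (search_domains, ndots) parsed from resolv.conf text."""
    search, ndots = [], 1  # glibc's default when no ndots option is given
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "search":
            search = fields[1:]  # the last "search" line wins
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def candidate_queries(name, search, ndots):
    """Fully qualified names tried, in order, when looking up `name`
    (simplified glibc-style behaviour)."""
    if name.endswith("."):
        return [name]  # already absolute: the search list is not used
    absolute = name + "."
    suffixed = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + suffixed  # enough dots: try the name as-is first
    return suffixed + [absolute]      # too few dots: search list first


if __name__ == "__main__":
    domains, ndots = parse_resolv_conf(RESOLV_CONF)
    for query in candidate_queries("buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn", domains, ndots):
        print(query)
```

For `buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn` (two dots, fewer than `ndots:5`), the first query is `buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn.p-8ppxw-pipeline.svc.cluster.local.`, i.e. the name is looked up as if it were a Service. Kubernetes DNS does not publish bare pod names as Service records, which is consistent with the NXDOMAIN answers above.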

From the Jenkins agent:

# cat /etc/resolv.conf 
nameserver 172.16.0.10
search p-8ppxw-pipeline.svc.cluster.local svc.cluster.local cluster.local nhnjp.ism line.ism lineinfra.com
options ndots:5
bash-4.4# nslookup jenkins.p-8ppxw-pipeline.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

Name:      jenkins.p-8ppxw-pipeline.svc.cluster.local
Address 1: 172.16.151.169 jenkins.p-8ppxw-pipeline.svc.cluster.local
bash-4.4# nslookup buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn
nslookup: can't resolve '(null)': Name does not resolve

Name:      buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn
Address 1: 172.17.6.17 buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn
bash-4.4# nslookup buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn.p-8ppxw-pipeline
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'buildpod.pipeline-p-jsk9d-3.1-gw3ck-65xvn.p-8ppxw-pipeline': Name does not resolve
@gitlawr
Contributor

gitlawr commented Nov 8, 2019

Can you show the error message from your builds?
A pod name (buildpod.pipeline-p-jsk9d-3) is not resolvable, while a service name is (there is a jenkins service resource in the namespace). I'm not sure whether the DNS resolution issue is related to your original problem.
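For reference, Kubernetes DNS publishes records for Services and, separately, for pods under IP-derived or hostname/subdomain names, but not under the bare pod name. The Python sketch below builds these name shapes. The pod IP `172.17.6.17` comes from the agent output above and the namespace `p-8ppxw-pipeline` is inferred from the pods' resolv.conf search list; the function names are hypothetical. The IP-based pod record is only answered when the cluster DNS is configured to serve pod records (e.g. CoreDNS with its `pods` option), and the hostname form requires a headless Service whose name matches the pod's `subdomain` field.

```python
import ipaddress

CLUSTER_DOMAIN = "cluster.local"  # kubelet cluster_domain from the config above


def service_record(service, namespace, cluster_domain=CLUSTER_DOMAIN):
    """A record name for a Service: <service>.<namespace>.svc.<domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def pod_ip_record(pod_ip, namespace, cluster_domain=CLUSTER_DOMAIN):
    """IP-derived pod record: the dots in the IPv4 address become dashes.
    Only answered if the cluster DNS is configured to serve pod records."""
    ip = ipaddress.ip_address(pod_ip)
    if ip.version != 4:
        raise ValueError("this sketch only handles IPv4 pod addresses")
    return f"{str(ip).replace('.', '-')}.{namespace}.pod.{cluster_domain}"


def pod_hostname_record(hostname, subdomain, namespace, cluster_domain=CLUSTER_DOMAIN):
    """Pod record via spec.hostname/spec.subdomain; requires a headless
    Service named `subdomain` in the same namespace."""
    return f"{hostname}.{subdomain}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    ns = "p-8ppxw-pipeline"  # inferred from the pods' resolv.conf search list
    print(service_record("jenkins", ns))
    print(pod_ip_record("172.17.6.17", ns))
    # "agent" and "builds" are hypothetical hostname/subdomain values.
    print(pod_hostname_record("agent", "builds", ns))
```

The first name matches the `jenkins.p-8ppxw-pipeline.svc.cluster.local` record that resolves in the thread; none of the three shapes is the plain `buildpod.…` pod name.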

@junhee-yoo
Author

Sorry for my late reply.

Here's the Jenkins log from the Rancher pipeline build.

From the Jenkins agent pod 'buildpod.pipeline-p-jsk9d-4.1-7wfvr-l152f':

+ /usr/local/bin/dockerd-entrypoint.sh /bin/drone-docker
+ /usr/local/bin/dockerd --data-root /var/lib/docker
Cannot contact buildpod.pipeline-p-jsk9d-4.1-7wfvr-l152f: java.lang.InterruptedException

The build status always ends up as 'failed' 1 to 1.5 hours after the message above.

@gitlawr
Contributor

gitlawr commented Nov 11, 2019

There's a 60-minute timeout by default, so it looks like the build is terminated by the timeout.

@junhee-yoo
Author

junhee-yoo commented Nov 11, 2019

Yes, and this is the message when the timeout is reached:

+ /usr/local/bin/dockerd-entrypoint.sh /bin/drone-docker
+ /usr/local/bin/dockerd --data-root /var/lib/docker
Cannot contact buildpod.pipeline-p-jsk9d-4.1-7wfvr-l152f: java.lang.InterruptedException
Could not connect to buildpod.pipeline-p-jsk9d-4.1-7wfvr-l152f to send interrupt signal to process

Even the timeout signal cannot reach 'buildpod.pipeline-p-jsk9d-4.1-7wfvr-l152f' here.
And the whole build process takes less than an hour when I run it on my laptop.

So I think the master cannot reach the 'buildpod.pipeline-p-jsk9d-4.1-7wfvr-l152f' pod, and I guess this is related to DNS because the Jenkins master cannot resolve the pod name.

Sorry for the unclear description :(
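One way to test the hypothesis that the failure is DNS-related, rather than a general connectivity problem, is to separate the two steps. The Python sketch below first resolves a name with `socket.getaddrinfo` and then attempts a TCP connection with a timeout, reporting which step failed. `diagnose` is a hypothetical helper, and any host and port you pass to it are your own assumption; nothing in this thread states which port the master uses to reach the agent.

```python
import socket


def diagnose(host, port, timeout=3.0):
    """Classify host:port reachability as 'dns_failure', 'connect_failure' or 'ok'.

    Step 1 resolves the name; step 2 tries a TCP connect to each resolved
    address, so a name that resolves but cannot be reached is reported
    differently from a name that does not resolve at all.
    """
    try:
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    for family, socktype, proto, _canonname, sockaddr in addrs:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
            continue
    return "connect_failure"
```

Run from inside the Jenkins master pod, a `dns_failure` result for the buildpod name combined with `ok` for the same pod's IP on the same port would point at name resolution rather than at the pod network.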

@LivesMountain

I have the same problem

@kindomcat

I have the same problem, too.

@kindomcat

You can try upgrading or downgrading the Jenkins Kubernetes plugin. On my Rancher 2.3.6, it works fine after upgrading this plugin from 1.18.2 to 1.18.3.
