Spoiler:
I fixed this by setting HttpPutResponseHopLimit to 2 on the EC2 instance's metadata options. This allows the AWS SDK to use IMDSv2 from inside a Docker container on an EC2 machine.
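For reference, the same change can be applied without Terraform using the AWS CLI's `modify-instance-metadata-options` command; the instance ID below is a placeholder:

```shell
# Raise the IMDSv2 hop limit so the PUT response survives the extra
# network hop added by the Docker bridge.
# i-0123456789abcdef0 is a placeholder instance ID.
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-endpoint enabled \
  --http-put-response-hop-limit 2
```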
Fixes:
It would be great if libdns/route53 had clearer error logging for this issue; it took me a very long time to figure out what was going wrong from the messages below.
Perhaps some documentation could be added.
I don't know enough about this issue to say that my change is the only way to fix it.
Details:
I was running into failures when using Route 53 via caddy-dns/route53 inside a Docker container on an EC2 node with an IAM role attached to the EC2 machine. Note that I was using staging Let's Encrypt, but I don't think that's relevant.
IMDSv1 and IMDSv2 were both enabled for the EC2 machine.
I'm still not sure why I ran into this issue only now; a machine on a different AWS account ran the same code without problems.
But I had run into issues in the past where tools that had updated to newer versions of the AWS SDK for Go failed until either IMDSv2 was disabled or an additional hop was allowed, so I had an idea of how to try to fix this.
The failing logs:
The two relevant parts I see are "could not determine zone for domain" and "unexpected response code 'SERVFAIL'".
{"level":"error","ts":1684773178.4666998,"logger":"http.acme_client","msg":"cleaning up solver","identifier":"example.com","challenge_type":"dns-01","error":"no memory of presenting a DNS record for \"_acme-challenge.example.com\" (usually OK if presenting also failed)"}
{"level":"error","ts":1684773178.5130796,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"example.com","issuer":"acme-staging-v02.api.letsencrypt.org-directory","error":"[example.com] solving challenges: presenting for challenge: could not determine zone for domain \"_acme-challenge.example.com\": unexpected response code 'SERVFAIL' for _acme-challenge.example.com. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/REDACTED/REDACTED) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)"}
{"level":"error","ts":1684773178.5131493,"logger":"tls.obtain","msg":"will retry","error":"[example.com] Obtain: [example.com] solving challenges: presenting for challenge: could not determine zone for domain \"_acme-challenge.example.com\": unexpected response code 'SERVFAIL' for _acme-challenge.example.com. (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/REDACTED/REDACTED) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)","attempt":26,"retrying_in":21600,"elapsed":324020.948514926,"max_duration":2592000}
Full fix with Terraform:
Since I was using Terraform, my full fix was to set the following block in my aws_instance:
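The actual snippet was lost when this page was captured; a minimal sketch of the kind of block meant, based on the hop-limit fix described above (the resource name and the http_tokens value are assumptions, since the original stated that both IMDSv1 and IMDSv2 were enabled):

```hcl
resource "aws_instance" "example" {
  # ... other instance configuration ...

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "optional" # assumption: both IMDSv1 and IMDSv2 allowed
    http_put_response_hop_limit = 2          # the fix: allow one extra hop for the container
  }
}
```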
Stumbled on this issue while trying DNS validations from Caddy running in a Docker container, with the route53 module used for DNS, and saw them failing. In my case the error was as below:
failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded
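When debugging this class of failure, a quick way to check whether IMDSv2 is reachable from inside the container is to request a session token by hand (169.254.169.254 is the standard IMDS endpoint; this only works on an EC2 instance):

```shell
# Request an IMDSv2 session token. This is a PUT, which is exactly what
# the hop limit applies to: with a limit of 1, the response dies at the
# Docker bridge and the request times out.
TOKEN=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Use the token to list the instance's IAM role credentials.
curl -sf -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
```

If the first request hangs or fails inside the container but succeeds on the host itself, the hop limit is the likely culprit.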
Versions:
Caddy: 2.6.4
caddy-dns/route53: v1.3.2
libdns/route53: v1.3.2