HTTP Middleware Error Logging: Vision on StatusBadGateway/ServiceUnavailable #192

joe-elliott · 2020-07-13T15:37:34Z

The HTTP logging middleware logs each request at either debug or warn level, depending on the response status. Generally, errors are logged at warn and successes at debug.

common/middleware/logging.go

Lines 56 to 64 in 4b18475

    if 100 <= statusCode && statusCode < 500 || statusCode == http.StatusBadGateway || statusCode == http.StatusServiceUnavailable {
        l.logWithRequest(r).Debugf("%s %s (%d) %s", r.Method, uri, statusCode, time.Since(begin))
        if l.LogRequestHeaders && headers != nil {
            l.logWithRequest(r).Debugf("ws: %v; %s", IsWSHandshakeRequest(r), string(headers))
        }
    } else {
        l.logWithRequest(r).Warnf("%s %s (%d) %s Response: %q ws: %v; %s",
            r.Method, uri, statusCode, time.Since(begin), buf.Bytes(), IsWSHandshakeRequest(r), headers)
    }

We need to log the error conditions below, which are currently logged at debug. Unfortunately, due to volume, we can't turn on debug logging.

statusCode == http.StatusBadGateway || statusCode == http.StatusServiceUnavailable

My guess is that these two statuses are logged at debug level because of the volume they generate when a backend is unavailable. We would like to log these failures at a higher level than debug, but we also recognize that the volume would be too great to log every one of them if a backend is down.

The change we'd like to make:

- Move http.StatusBadGateway and http.StatusServiceUnavailable to be logged at Warn level with the other errors.
- Use a configurable rate-limited logger for errors instead of logging 100% of errors at Warn.

Thoughts?

If this (or something similar) is acceptable I'd be glad to PR it.

@bboreham

bboreham · 2020-08-11T13:49:09Z

Background is here: cortexproject/cortex#810, http://github.com/weaveworks/common/pull/84

I would be OK with sampling the messages (we already have -event.sample-rate in cortex); I guess a rate limit is also fine.

You could also sample after the line hits the logfile?
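
For comparison with a rate limit, sampling in the sense of "keep one message out of every N" might look like the sketch below. It is a generic illustration only: the `sampler` type is hypothetical and makes no claim about how cortex implements -event.sample-rate.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sampler lets through one event out of every n (hypothetical helper).
type sampler struct {
	n     uint64
	count uint64
}

// sample reports whether the current event should be logged.
// It is safe for concurrent use because the counter is updated atomically.
func (s *sampler) sample() bool {
	if s.n <= 1 {
		return true // a rate of 0 or 1 keeps every event
	}
	return atomic.AddUint64(&s.count, 1)%s.n == 1
}

func main() {
	s := &sampler{n: 100}
	logged := 0
	for i := 0; i < 1000; i++ {
		if s.sample() {
			logged++
		}
	}
	fmt.Println("logged", logged, "of 1000 events") // prints "logged 10 of 1000 events"
}
```

The difference is in what gets bounded. Sampling keeps the logged volume proportional to the error volume, while a rate limit caps it at a fixed number of lines per second no matter how many requests fail.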

joe-elliott · 2020-08-11T20:41:44Z

Thanks for the background. As suspected, those two error codes were overwhelming the logs, so they were moved out of the warn path. It sounds like you're OK with the general idea, so I will submit a PR and we can discuss details there.

> You could also sample after the line hits the logfile?

I'm unsure what you mean by this. Do you mean logging everything but only pushing a subset of the logfile to the backend?

joe-elliott mentioned this issue Aug 14, 2020

Rate limited errors #195

Open
