-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zap.Any add benchmarks #1311
zap.Any add benchmarks #1311
Conversation
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ```
Codecov Report
@@ Coverage Diff @@
## master #1311 +/- ##
==========================================
- Coverage 98.08% 97.93% -0.16%
==========================================
Files 50 50
Lines 3242 3242
==========================================
- Hits 3180 3175 -5
- Misses 53 57 +4
- Partials 9 10 +1 see 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
logger_bench_test.go
Outdated
wg := sync.WaitGroup{} | ||
wg.Add(b.N) | ||
b.ResetTimer() | ||
for i := 0; i < b.N; i++ { | ||
go func() { | ||
logger.Info("", tt.typed()) | ||
wg.Done() | ||
}() | ||
} | ||
wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, is this right? It'll spawn b.N
goroutines and count that as part of a run.
I think maybe you want a fixed number of goroutines spawned inside each b.N loop, and wait for them to finish also in the loop?
const Count = 1000
for i := 0; i < b.N; i++ {
wg.Add(Concurrency)
for j := 0; j < Count; j++ {
go func() { ... }()
}
wg.Wait()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the actual issue, as (badly) reported by a user was: "if I start a new goroutine and zap.Any
in it, it's slow" - this is where this came from, with an exact translation.
The goal was also to have sequential progress:
- how long does it take to create a
Field
- how long does it take to log a
Field
- how long does it take to log a
Field
in a new goroutine - including the setup and teardown cost.
Spawning o 1000 goroutines would at least break the sequentially above, I think?
It seemed that as long as we're comparing the same cost for typed version and Any
, the benchmarks are representative.
logger_bench_test.go
Outdated
for i := 0; i < b.N; i++ { | ||
logger.Info("", tt.typed()) | ||
} | ||
}) | ||
b.Run(tt.name+"-any-logger", func(b *testing.B) { | ||
for i := 0; i < b.N; i++ { | ||
logger.Info("", Any(key, tt.anyArg)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these two cases, we might want to use withBenchedLogger.
That's used in existing logging benchmarks:
Although it'll use b.RunParallel
-- I don't know if that's okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't know if it matters.
Looking at https://pkg.go.dev/testing#B.RunParallel, it means we'll start a GOMAXPROCS goroutines, and then distribute the actual tests (which will start their own goroutines) there.
Does it matter? Does it not matter? I can test/try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, just meant these two cases, rather than all loggers.
BenchmarkAny/string-typ-logger 2959306 406.9 ns/op 64 B/op 1 allocs/op
BenchmarkAny/string-typ-logger-12 15433677 80.36 ns/op 64 B/op 1 allocs/op
BenchmarkAny/string-any-logger 2845083 413.0 ns/op 64 B/op 1 allocs/op
BenchmarkAny/string-any-logger-12 15772636 79.69 ns/op 64 B/op 1 allocs/op
the difference for 12- gomacprocs is huge.
I guess it makes sense to include it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to all tests.
logger_bench_test.go
Outdated
// The stack growing below simulates production setup where some other | ||
// goroutines exist and affect the starting goroutine stack size up. | ||
// Otherwise, for tests with 2+ goroutines, the cost of starting the goroutine | ||
// dominates and the cost of `any` stack overallocation is not visible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How reliable is this -- does it change the stack size immediately, does it have knock on effects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm hesitating about the value of this.
In my earlier experiments (#1303) the value of this was clear - the cost of "typed" version was half of the any
if the average stack was higher. Especially on high GOMAXPROCS tests.
As the benchmarking setup grew, this became less pronounced:
BenchmarkAny/string/log-go/typed-12 2151031 550.7 ns/op 99 B/op 2 allocs/op
BenchmarkAny/string/log-go/any-12 1931274 650.3 ns/op 115 B/op 2 allocs/op
BenchmarkAny/string/log-go-stack/typed-12 2042646 520.8 ns/op 96 B/op 2 allocs/op
BenchmarkAny/string/log-go-stack/any-12 2068048 622.4 ns/op 112 B/op 2 allocs/op
to an extend that I'm wondering about skipping this altogether.
I'll tinker with this a bit more, but maybe we should remove it. and rely on GOMACPROCS=2 as indication of our changes working instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it seems to work.
BenchmarkAny/string/log-go-stack/typed 1826714 659.1 ns/op 112 B/op 3 allocs/op
BenchmarkAny/string/log-go-stack/typed-12 8111128 143.7 ns/op 112 B/op 3 allocs/op
BenchmarkAny/string/log-go-stack/any 1606474 687.5 ns/op 128 B/op 3 allocs/op
BenchmarkAny/string/log-go-stack/any-12 3320598 373.5 ns/op 128 B/op 3 allocs/op
any-12
is twice slower than typed-12
, while any
is relatively as slow typed
(single maxprocs test)
logger_bench_test.go
Outdated
withBenchedLogger(b, func(log *Logger) { | ||
go func() { | ||
log.Info("", Any(key, tt.anyArg)) | ||
wg.Done() | ||
}() | ||
}) | ||
wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this works? (Related to the prior comment about b.N
goroutines.)
Each iteration (whether with RunParallel or with a plain for loop), will last only as long as spawning the goroutine takes. I'm not sure this will do what we want?
I know we're trying to measure the cost inside a new goroutine, but this doesn't seem right?
Maybe we need to spawn and wait for the goroutine in each iteration of the loop?
var wg sync.WaitGroup
wg.Add(1)
go func() {
...
wg.Done()
}()
wg.Wait()
This will measure the cost of the whole operation (including spawning the goroutine) per iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.
It was testing something reasonable, because the changes we were making to the stack were clearly having an effect. I agree that this is better though, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for all the fixes.
Please get @prashantv's stamp on the stack growing end of things.
logger_bench_test.go
Outdated
func dummy(wg *sync.WaitGroup, s string, i int) string { | ||
if i == 0 { | ||
wg.Wait() | ||
return "1" + s | ||
} | ||
return dummy(wg, s, i-1) | ||
} | ||
|
||
// increaseAvgStack starts a background goroutine with a variable | ||
// stack size. The goal is to move the average stack size higher, | ||
// since https://go-review.googlesource.com/c/go/+/345889 this affects | ||
// goroutine starting stack size. | ||
func increaseAvgStack(n int) *sync.WaitGroup { | ||
wg := sync.WaitGroup{} | ||
wg.Add(1) | ||
|
||
go dummy(&wg, "hi", n) | ||
return &wg | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use a simple var data[8000]byte
to grow the stack size.
func growAvgStack() {
var wg sync.WaitGroup
block := make(chan struct{})
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() {
defer wg.Done()
// Effectively the minimum size for the starting goroutine size.
var data [8000]byte
<-block
runtime.KeepAlive(data)
}()
}
close(block)
wg.Wait()
}
Verified that it works by reading + printing the starting stack size of metrics.Sample{Name: "/gc/stack/starting-size:bytes"}
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stack size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ```
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310.
We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307.
The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both
typed
andany
version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself.For #1307 in particular a test against interface type would likely be needed, so adding it here too.
The tests compare two code paths, with the same arguments, one using a strongly typed method and second using
zap.Any
. We have:The test results are:
On gotip:
on amd64 (similar, 2x worse stack growth impact)