-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize zap.Any to use less memory on stack (intermediate props verion) #1308
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov Report
@@ Coverage Diff @@
## master #1308 +/- ##
==========================================
+ Coverage 98.08% 98.11% +0.02%
==========================================
Files 50 50
Lines 3242 3281 +39
==========================================
+ Hits 3180 3219 +39
Misses 53 53
Partials 9 9
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
rabbbit
added a commit
that referenced
this pull request
Jul 30, 2023
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ```
Merged
rabbbit
added a commit
that referenced
this pull request
Aug 1, 2023
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - -a "create and long in goroutine in a pre-warmed system" that does the above- we decided it's not worth the complication. The test results are: ``` ❯ go test -bench BenchmarkAny -benchmem -cpu 1 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string/field-only/typed 161981473 7.374 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 82343354 14.67 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 2965648 416.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 2920292 418.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 1000000 1158 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 553144 2152 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 160509367 7.548 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 51330402 23.45 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 3221404 377.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 2726443 393.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 1000000 1129 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 558602 2147 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 19.426s ``` On gotip: ``` ❯ gotip test -bench BenchmarkAny -benchmem -cpu 1 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string/field-only/typed 155084869 7.603 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 82740788 14.55 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 2800495 411.6 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 2896258 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 1000000 1155 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 551599 2168 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 159505488 7.578 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 51406354 23.78 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 3011210 388.6 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 3010370 395.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 1000000 1161 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 553860 2161 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 19.391s ``` on amd64 (similar, 2x worse stack growth impact) ``` % go test -bench BenchmarkAny -benchmem -cpu 1 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/string/field-only/typed 47534053 25.23 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 36913526 32.57 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 1693508 725.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 1576172 765.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 516832 2343 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 243692 4404 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 48735537 24.73 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 26115684 47.24 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 1761630 677.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 1646913 705.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 534187 2275 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 273787 4348 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 18.890s ```
…ion) This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. - #1305 - my own PR that inline bunch of loops - https://github.com/uber-go/zap/compare/pawel/any-int5 that does the same as above, but is slightly easier to parse - #1307 - a reflect.TypeOf lookup version THIS PR IS INCOMPLETE - it shows a possible approach, but I wanted to get reviewers thoughts before typing everything in. I originally thought we can use a `type props strucy` intermediary struct to store the data, but that hits the same problem: every `props` would get it's own slot on the stack. This avoids this by returning the raw data. Pros: - the implementation is shared between `Any` and strongly typed Fields - no reflect or unsafe - reduced the stack significantly - we should be able to get to the same ~180 bytes as ~1305. - no peft penalty for strongly typed versions, at least on ARM64 it's compiled away. Cons: - the code gets a bit harder to maintain. It's significantly better than #1305 I would say though.
Closed in favor of #1310. |
abhinav
added a commit
that referenced
this pull request
Aug 1, 2023
Yet another attempt at reducing the stack size of zap.Any, borrowing from #1301, #1303, #1304, #1305, #1307, and #1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in #1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } **Stack size**: The stack size of zap.Any following this change is 0xc0 (192 bytes). % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than #1305, which is the smallest stack size of all other attempts. **Allocations**: Everything appears to get inlined with no heap escapes: % go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' [no output] (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value **Maintainability**: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. **Performance**: This is a net improvement against master on BenchmarkAny's log-go checks that log inside a new goroutine. ``` name old time/op new time/op delta Any/string/field-only/typed 25.2ns ± 1% 25.6ns ± 2% ~ (p=0.460 n=5+5) Any/string/field-only/any 56.9ns ± 3% 79.4ns ± 0% +39.55% (p=0.008 n=5+5) Any/string/log/typed 1.47µs ± 0% 1.49µs ± 4% +1.58% (p=0.016 n=4+5) Any/string/log/any 1.53µs ± 2% 1.55µs ± 1% +1.37% (p=0.016 n=5+5) Any/string/log-go/typed 5.97µs ± 6% 5.99µs ± 1% ~ (p=0.151 n=5+5) Any/string/log-go/any 10.9µs ± 0% 6.2µs ± 0% -43.32% (p=0.008 n=5+5) Any/stringer/field-only/typed 25.3ns ± 1% 25.5ns ± 1% +1.09% (p=0.008 n=5+5) Any/stringer/field-only/any 85.5ns ± 1% 124.5ns ± 0% +45.66% (p=0.008 n=5+5) Any/stringer/log/typed 1.43µs ± 1% 1.42µs ± 2% ~ (p=0.175 n=4+5) Any/stringer/log/any 1.50µs ± 1% 1.56µs ± 6% +4.20% (p=0.008 n=5+5) Any/stringer/log-go/typed 5.94µs ± 0% 5.92µs ± 0% -0.40% (p=0.032 n=5+5) Any/stringer/log-go/any 11.1µs ± 2% 6.3µs ± 0% -42.93% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/string/field-only/typed 0.00B 0.00B ~ (all equal) Any/string/field-only/any 0.00B 0.00B ~ (all equal) Any/string/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/string/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00B 0.00B ~ (all equal) Any/stringer/field-only/any 0.00B 0.00B ~ (all equal) Any/stringer/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/stringer/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/string/field-only/typed 0.00 0.00 ~ (all equal) Any/string/field-only/any 0.00 0.00 ~ (all equal) Any/string/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/string/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00 0.00 ~ (all equal) Any/stringer/field-only/any 0.00 0.00 ~ (all equal) Any/stringer/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` It causes a regression in "field-only" which calls the field constructor and discards the result without using it in a logger. I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
rabbbit
added a commit
that referenced
this pull request
Aug 2, 2023
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stack size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ```
rabbbit
pushed a commit
that referenced
this pull request
Aug 2, 2023
Yet another attempt at reducing the stack size of zap.Any, borrowing from #1301, #1303, #1304, #1305, #1307, and #1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in #1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } **Stack size**: The stack size of zap.Any following this change is 0xc0 (192 bytes). % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than #1305, which is the smallest stack size of all other attempts. **Allocations**: Everything appears to get inlined with no heap escapes: % go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' [no output] (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value **Maintainability**: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. **Performance**: This is a net improvement against master on BenchmarkAny's log-go checks that log inside a new goroutine. ``` name old time/op new time/op delta Any/string/field-only/typed 25.2ns ± 1% 25.6ns ± 2% ~ (p=0.460 n=5+5) Any/string/field-only/any 56.9ns ± 3% 79.4ns ± 0% +39.55% (p=0.008 n=5+5) Any/string/log/typed 1.47µs ± 0% 1.49µs ± 4% +1.58% (p=0.016 n=4+5) Any/string/log/any 1.53µs ± 2% 1.55µs ± 1% +1.37% (p=0.016 n=5+5) Any/string/log-go/typed 5.97µs ± 6% 5.99µs ± 1% ~ (p=0.151 n=5+5) Any/string/log-go/any 10.9µs ± 0% 6.2µs ± 0% -43.32% (p=0.008 n=5+5) Any/stringer/field-only/typed 25.3ns ± 1% 25.5ns ± 1% +1.09% (p=0.008 n=5+5) Any/stringer/field-only/any 85.5ns ± 1% 124.5ns ± 0% +45.66% (p=0.008 n=5+5) Any/stringer/log/typed 1.43µs ± 1% 1.42µs ± 2% ~ (p=0.175 n=4+5) Any/stringer/log/any 1.50µs ± 1% 1.56µs ± 6% +4.20% (p=0.008 n=5+5) Any/stringer/log-go/typed 5.94µs ± 0% 5.92µs ± 0% -0.40% (p=0.032 n=5+5) Any/stringer/log-go/any 11.1µs ± 2% 6.3µs ± 0% -42.93% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/string/field-only/typed 0.00B 0.00B ~ (all equal) Any/string/field-only/any 0.00B 0.00B ~ (all equal) Any/string/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/string/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00B 0.00B ~ (all equal) Any/stringer/field-only/any 0.00B 0.00B ~ (all equal) Any/stringer/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/stringer/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/string/field-only/typed 0.00 0.00 ~ (all equal) Any/string/field-only/any 0.00 0.00 ~ (all equal) Any/string/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/string/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00 0.00 ~ (all equal) Any/stringer/field-only/any 0.00 0.00 ~ (all equal) Any/stringer/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` It causes a regression in "field-only" which calls the field constructor and discards the result without using it in a logger. I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
sywhang
pushed a commit
that referenced
this pull request
Aug 2, 2023
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ``` reviews rename review farewell, my friend dummy
sywhang
pushed a commit
that referenced
this pull request
Aug 2, 2023
Yet another attempt at reducing the stack size of zap.Any, borrowing from #1301, #1303, #1304, #1305, #1307, and #1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in #1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } **Stack size**: The stack size of zap.Any following this change is 0xc0 (192 bytes). % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than #1305, which is the smallest stack size of all other attempts. **Allocations**: Everything appears to get inlined with no heap escapes: % go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' [no output] (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value **Maintainability**: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. **Performance**: This is a net improvement against master on BenchmarkAny's log-go checks that log inside a new goroutine. ``` name old time/op new time/op delta Any/string/field-only/typed 25.2ns ± 1% 25.6ns ± 2% ~ (p=0.460 n=5+5) Any/string/field-only/any 56.9ns ± 3% 79.4ns ± 0% +39.55% (p=0.008 n=5+5) Any/string/log/typed 1.47µs ± 0% 1.49µs ± 4% +1.58% (p=0.016 n=4+5) Any/string/log/any 1.53µs ± 2% 1.55µs ± 1% +1.37% (p=0.016 n=5+5) Any/string/log-go/typed 5.97µs ± 6% 5.99µs ± 1% ~ (p=0.151 n=5+5) Any/string/log-go/any 10.9µs ± 0% 6.2µs ± 0% -43.32% (p=0.008 n=5+5) Any/stringer/field-only/typed 25.3ns ± 1% 25.5ns ± 1% +1.09% (p=0.008 n=5+5) Any/stringer/field-only/any 85.5ns ± 1% 124.5ns ± 0% +45.66% (p=0.008 n=5+5) Any/stringer/log/typed 1.43µs ± 1% 1.42µs ± 2% ~ (p=0.175 n=4+5) Any/stringer/log/any 1.50µs ± 1% 1.56µs ± 6% +4.20% (p=0.008 n=5+5) Any/stringer/log-go/typed 5.94µs ± 0% 5.92µs ± 0% -0.40% (p=0.032 n=5+5) Any/stringer/log-go/any 11.1µs ± 2% 6.3µs ± 0% -42.93% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/string/field-only/typed 0.00B 0.00B ~ (all equal) Any/string/field-only/any 0.00B 0.00B ~ (all equal) Any/string/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/string/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00B 0.00B ~ (all equal) Any/stringer/field-only/any 0.00B 0.00B ~ (all equal) Any/stringer/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/stringer/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/string/field-only/typed 0.00 0.00 ~ (all equal) Any/string/field-only/any 0.00 0.00 ~ (all equal) Any/string/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/string/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00 0.00 ~ (all equal) Any/stringer/field-only/any 0.00 0.00 ~ (all equal) Any/stringer/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` It causes a regression in "field-only" which calls the field constructor and discards the result without using it in a logger. I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is an alternative to:
THIS PR IS INCOMPLETE - it shows a possible approach, but I wanted to get reviewers thoughts before typing everything in.
I originally thought we can use a
type props strucy
intermediary struct to store the data, but that hits the same problem: everyprops
would get it's own slot on the stack. This avoids this by returning the raw data.Pros:
Any
and strongly typed FieldsCons: