fix: check if reserve is configured #4837

Merged 1 commit on Oct 21, 2024

acha-bill
Contributor

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Check if reserve is configured before attempting to put chunk

Related Issue (Optional)

#4829

@istae
Member

istae commented Oct 8, 2024

This is not a real solution, I think. We first need to understand why the reserve is nil. Chances are this is a light node that was unable to find a peer to push the chunk to, so it tried storing it. In that case we need to plug that hole: a light node should not try to store a chunk in that scenario. But this is only one theory. We need to ask the issue creator which config options they were running with before hastily jumping onto a solution.

@acha-bill
Contributor Author

I also got this panic in a failed beekeeper test:

[pod/bootnode-0-0/bee] "time"="2024-10-21 06:07:39.750144" "level"="debug" "logger"="node/pusher" "v"=1 "msg"="chunk stays here, i'm the closest node" "chunk_address"="d43711cd5b3436c09ef09cb4e554eba026719af857220febc41f1db9fa311602"
[pod/bootnode-0-0/bee] panic: runtime error: invalid memory address or nil pointer dereference
[pod/bootnode-0-0/bee] [signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0xf33756]
[pod/bootnode-0-0/bee] 
[pod/bootnode-0-0/bee] goroutine 2528 [running]:
[pod/bootnode-0-0/bee] github.com/ethersphere/bee/v2/pkg/storer/internal/reserve.(*Reserve).Put(0x0, {0x1a55ca0, 0xc0043c9470}, {0x1a64ba0, 0xc004684870})
[pod/bootnode-0-0/bee] 	github.com/ethersphere/bee/v2/pkg/storer/internal/reserve/reserve.go:104 +0x76
[pod/bootnode-0-0/bee] github.com/ethersphere/bee/v2/pkg/storer.(*DB).ReservePutter.func1({0x1a55ca0?, 0xc0043c9470?}, {0x1a64ba0, 0xc004684870})
[pod/bootnode-0-0/bee] 	github.com/ethersphere/bee/v2/pkg/storer/reserve.go:300 +0x53
[pod/bootnode-0-0/bee] github.com/ethersphere/bee/v2/pkg/storage.PutterFunc.Put(0x4ed33c?, {0x1a55ca0?, 0xc0043c9470?}, {0x1a64ba0?, 0xc004684870?})
[pod/bootnode-0-0/bee] 	github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:56 +0x37
[pod/bootnode-0-0/bee] github.com/ethersphere/bee/v2/pkg/storer.putterWithMetrics.Put({{0x1a486c0, 0xc003b07470}, {{0xc0008c74d0}, {0xc0008c7560}, {0x1a641a8, 0xc0008ec340}, {0x1a641a8, 0xc0008ec3c0}, {0x1a5bf18, 0xc0008f8240}, ...}, ...}, ...)
[pod/bootnode-0-0/bee] 	github.com/ethersphere/bee/v2/pkg/storer/metrics.go:181 +0xb4
[pod/bootnode-0-0/bee] github.com/ethersphere/bee/v2/pkg/pusher.(*Service).pushDirect(0xc0008e84e0, {0x1a55ca0, 0xc0043c9470}, {0x1a64008, 0xc00137c730}, 0xc0043c9440)
[pod/bootnode-0-0/bee] 	github.com/ethersphere/bee/v2/pkg/pusher/pusher.go:321 +0x372
[pod/bootnode-0-0/bee] github.com/ethersphere/bee/v2/pkg/pusher.(*Service).chunksWorker.func2(0xc0043c9440)
[pod/bootnode-0-0/bee] 	github.com/ethersphere/bee/v2/pkg/pusher/pusher.go:161 +0x1f4
[pod/bootnode-0-0/bee] created by github.com/ethersphere/bee/v2/pkg/pusher.(*Service).chunksWorker in goroutine 300
[pod/bootnode-0-0/bee] 	github.com/ethersphere/bee/v2/pkg/pusher/pusher.go:230 +0x619

Config

  • full node mode: true
  • bootnode-mode: true

The reserve is not configured because of this condition:

if o.FullNodeMode && !o.BootnodeMode {

But pushsync only uses o.FullNodeMode:

pushSyncProtocol := pushsync.New(swarmAddress, networkID, nonce, p2ps, localStore, waitNetworkRFunc, kad, o.FullNodeMode, pssService.TryUnwrap, validStamp, logger, acc, pricer, signer, tracer, warmupTime)

if ps.fullNode {
	if cac.Valid(ch) {
		go ps.unwrap(ch)
	}
	return nil, topology.ErrWantSelf
}

So initializing pushsync with the same condition, o.FullNodeMode && !o.BootnodeMode, should fix it.
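
The mismatch can be sketched in isolation. This is a minimal illustration, not bee's actual code: `Options`, `storesChunks`, `newReserve`, `newPushSync`, and `consistent` are hypothetical names, and only the two flags and the condition `FullNodeMode && !BootnodeMode` come from this thread. The idea is to compute the condition once and hand the same value to both the reserve setup and pushsync, so that "I am a storing node" can never be true while the reserve is nil.

```go
package main

import "fmt"

// Options mirrors only the two flags mentioned in this thread (hypothetical type).
type Options struct {
	FullNodeMode bool
	BootnodeMode bool
}

// storesChunks is a hypothetical helper that evaluates the condition once,
// so every component sees the same answer.
func storesChunks(o Options) bool {
	return o.FullNodeMode && !o.BootnodeMode
}

// Reserve and PushSync are stand-ins for the real components.
type Reserve struct{}

type PushSync struct {
	fullNode bool
	reserve  *Reserve
}

// newReserve returns nil when the node is not supposed to store chunks.
func newReserve(o Options) *Reserve {
	if !storesChunks(o) {
		return nil
	}
	return &Reserve{}
}

// newPushSync derives fullNode from the same helper instead of from
// o.FullNodeMode alone.
func newPushSync(o Options, r *Reserve) *PushSync {
	return &PushSync{fullNode: storesChunks(o), reserve: r}
}

// consistent reports whether the invariant holds:
// a node that keeps chunks locally must have a reserve.
func (ps *PushSync) consistent() bool {
	return !ps.fullNode || ps.reserve != nil
}

func main() {
	// The configuration from the failing test: full node and bootnode.
	o := Options{FullNodeMode: true, BootnodeMode: true}
	ps := newPushSync(o, newReserve(o))
	fmt.Println(ps.fullNode, ps.reserve == nil, ps.consistent()) // prints "false true true"
}
```

With both consumers reading the same helper, the bootnode configuration above yields a nil reserve and `fullNode == false`, so the "chunk stays here" path is never taken.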

@istae
Member

istae commented Oct 21, 2024

does that mean that an upload was attempted from a bootnode?

Contributor

@martinconic martinconic left a comment

This is a very good finding. But when I investigated the issue, the reserve actually seemed not to be nil, otherwise it would have crashed even earlier. Please can you confirm that the reserve is not used in earlier places?

@acha-bill
Contributor Author

does that mean that an upload was attempted from a bootnode?

yes. See https://github.com/ethersphere/bee/actions/runs/11434065514/job/31807011143

msg="soc: submitting soc chunk xx to node bootnode-0"

Please can you confirm that the reserve is not used in earlier places?

I don't see any other place. The stack trace indicates that the receiver is nil.

(*Reserve).Put(0x0,....)

@acha-bill acha-bill merged commit 253d382 into master Oct 21, 2024
14 checks passed
@acha-bill acha-bill deleted the fix-reserve-put-panic branch October 21, 2024 18:36