Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Listener.handleNewSignedBlock doesn't correctly record any errors to trace span #3814

Open
odeke-em opened this issue Oct 5, 2024 · 1 comment · May be fixed by #3817
Open

core: Listener.handleNewSignedBlock doesn't correctly record any errors to trace span #3814

odeke-em opened this issue Oct 5, 2024 · 1 comment · May be fixed by #3817
Labels
external Issues created by non node team members needs:triage

Comments

@odeke-em
Copy link
Contributor

odeke-em commented Oct 5, 2024

If we look at the code in this function

func (cl *Listener) handleNewSignedBlock(ctx context.Context, b types.EventDataSignedBlock) error {
ctx, span := tracer.Start(ctx, "handle-new-signed-block")
defer span.End()
span.SetAttributes(
attribute.Int64("height", b.Header.Height),
)
eds, err := extendBlock(b.Data, b.Header.Version.App)
if err != nil {
return fmt.Errorf("extending block data: %w", err)
}
// generate extended header
eh, err := cl.construct(&b.Header, &b.Commit, &b.ValidatorSet, eds)
if err != nil {
panic(fmt.Errorf("making extended header: %w", err))
}
err = storeEDS(ctx, eh, eds, cl.store, cl.availabilityWindow)
if err != nil {
return fmt.Errorf("storing EDS: %w", err)
}
syncing, err := cl.fetcher.IsSyncing(ctx)
if err != nil {
return fmt.Errorf("getting sync state: %w", err)
}
// notify network of new EDS hash only if core is already synced
if !syncing {
err = cl.hashBroadcaster(ctx, shrexsub.Notification{
DataHash: eh.DataHash.Bytes(),
Height: eh.Height(),
})
if err != nil && !errors.Is(err, context.Canceled) {
log.Errorw("listener: broadcasting data hash",
"height", b.Header.Height,
"hash", b.Header.Hash(), "err", err) // TODO: hash or datahash?
}
}
// broadcast new ExtendedHeader, but if core is still syncing, notify only local subscribers
err = cl.headerBroadcaster.Broadcast(ctx, eh, pubsub.WithLocalPublication(syncing))
if err != nil && !errors.Is(err, context.Canceled) {
log.Errorw("listener: broadcasting next header",
"height", b.Header.Height,
"err", err)
}
return nil

in no place is the returned error intercepted and recorded on the span.

Remedy

Go supports named returns as a first class feature and that can be used to fix this problem simply by

diff --git a/core/listener.go b/core/listener.go
index 67e532d9..2f44a467 100644
--- a/core/listener.go
+++ b/core/listener.go
@@ -220,12 +220,36 @@ func (cl *Listener) listen(ctx context.Context, sub <-chan types.EventDataSigned
 	}
 }
 
-func (cl *Listener) handleNewSignedBlock(ctx context.Context, b types.EventDataSignedBlock) error {
+func (cl *Listener) handleNewSignedBlock(ctx context.Context, b types.EventDataSignedBlock) (err error) {
 	ctx, span := tracer.Start(ctx, "handle-new-signed-block")
 	defer span.End()
 	span.SetAttributes(
 		attribute.Int64("height", b.Header.Height),
 	)
+	defer func() {
+		var rerr error = err
+		r := recover()
+		if r != nil {
+			rerr, ok := r.(error)
+			if !ok {
+				rerr = fmt.Errorf("%v", r)
+			}
+		}
+
+		if rerr == nil {
+			return
+		}
+
+		// Otherwise now record the span error.
+		span.SetStatus(codes.Error, rerr.Error())
+
+		if r != nil { // Re-panic
+			panic(r)
+		}
+	}()
 
 	eds, err := extendBlock(b.Data, b.Header.Version.App)
 	if err != nil {

/cc @cristaloleg @Wondertan @liamsi

@github-actions github-actions bot added needs:triage external Issues created by non node team members labels Oct 5, 2024
odeke-em added a commit to orijtech/celestia-node that referenced this issue Oct 6, 2024
…us from returned error or panics

This change ensures that if a panic occurs, that span.RecordError
correctly records it and then if an error is returned that span.Status
correctly captures and records it.

Fixes celestiaorg#3814
@odeke-em
Copy link
Contributor Author

odeke-em commented Oct 6, 2024

I have mailed out PR #3817

odeke-em added a commit to orijtech/celestia-node that referenced this issue Oct 7, 2024
…us from returned error or panics

This change ensures that if a panic occurs, that span.RecordError
correctly records it and then if an error is returned that span.Status
correctly captures and records it.

Fixes celestiaorg#3814
odeke-em added a commit to orijtech/celestia-node that referenced this issue Oct 7, 2024
…us from returned error or panics

This change ensures that if a panic occurs, that span.RecordError
correctly records it and then if an error is returned that span.Status
correctly captures and records it.

Fixes celestiaorg#3814
odeke-em added a commit to orijtech/celestia-node that referenced this issue Oct 7, 2024
…us from returned error or panics

This change ensures that if a panic occurs, that span.RecordError
correctly records it and then if an error is returned that span.Status
correctly captures and records it.

Fixes celestiaorg#3814
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external Issues created by non node team members needs:triage
Projects
None yet
1 participant