Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument gRPC server span status codes #1127

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ OpenTelemetry Go Automatic Instrumentation adheres to [Semantic Versioning](http
- Support `google.golang.org/grpc` `1.67.1`. ([#1143](https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1143))
- Support Go `1.22.8`. ([#1143](https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1143))
- Support Go `1.23.2`. ([#1143](https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1143))
- Add gRPC status code attribute for server spans (`rpc.grpc.status_code`). ([#1127](https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1127))

### Changed

Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ require (
go.opentelemetry.io/otel/trace v1.31.0
golang.org/x/arch v0.11.0
golang.org/x/sys v0.26.0
google.golang.org/grpc v1.67.1
gopkg.in/yaml.v3 v3.0.1
)

Expand Down Expand Up @@ -75,6 +76,5 @@ require (
golang.org/x/text v0.19.0 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20241007155032-5fefd90f89a9 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20241007155032-5fefd90f89a9 // indirect
google.golang.org/grpc v1.67.1 // indirect
google.golang.org/protobuf v1.35.1 // indirect
)
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ struct grpc_request_t
{
BASE_SPAN_PROPERTIES
char method[MAX_SIZE];
u32 status_code;
};

struct
Expand Down Expand Up @@ -59,6 +60,8 @@ volatile const u64 frame_stream_id_pod;
volatile const u64 stream_id_pos;
volatile const u64 stream_ctx_pos;
volatile const bool is_new_frame_pos;
volatile const u64 status_s_pos;
volatile const u64 status_code_pos;

static __always_inline long dummy_extract_span_context_from_headers(void *stream_id, struct span_context *parent_span_context) {
return 0;
Expand Down Expand Up @@ -113,14 +116,14 @@ int uprobe_server_handleStream(struct pt_regs *ctx)
if (!get_go_string_from_user_ptr((void *)(stream_ptr + stream_method_ptr_pos), grpcReq->method, sizeof(grpcReq->method)))
{
bpf_printk("Failed to read gRPC method from stream");
goto done;
bpf_map_delete_elem(&streamid_to_grpc_events, &stream_id);
return 0;
}

// Write event
bpf_map_update_elem(&grpc_events, &key, grpcReq, 0);
start_tracking_span(go_context.data, &grpcReq->sc);
done:
bpf_map_delete_elem(&streamid_to_grpc_events, &stream_id);

return 0;
}

Expand Down Expand Up @@ -167,3 +170,65 @@ int uprobe_http2Server_operateHeader(struct pt_regs *ctx)

return 0;
}

// func (ht *http2Server) WriteStatus(s *Stream, st *status.Status)
// https://github.com/grpc/grpc-go/blob/bcf9171a20e44ed81a6eb152e3ca9e35b2c02c5d/internal/transport/http2_server.go#L1049
SEC("uprobe/http2Server_WriteStatus")
int uprobe_http2Server_WriteStatus(struct pt_regs *ctx) {
struct go_iface go_context = {0};
get_Go_context(ctx, 2, stream_ctx_pos, true, &go_context);
void *key = get_consistent_key(ctx, go_context.data);

// Get parent context if exists
void *stream_ptr = get_argument(ctx, 2);
u32 stream_id = 0;
bpf_probe_read(&stream_id, sizeof(stream_id), (void *)(stream_ptr + stream_id_pos));
struct grpc_request_t *grpcReq = bpf_map_lookup_elem(&streamid_to_grpc_events, &stream_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to look here at the grpc_events map.
If I understand this correctly, this probe should be called between the entry probe of handleStream (where an entry is created for the stream in the grpc_events map) and the return probe of handleStream.
If we can't find an entry it means the entry probe didn't run properly or didn't run at all, hence in that case we won't have a valid start time. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. If handleStream isn't called (say the server hits an error before that) then we might still want to record the status from WriteStatus. Maybe in that case, this probe should check if an event already exists in grpc_events and if not start a new span with the status?

It seems like WriteStatus might be used in places besides handleStream (SendMsg and RecvMsg are 2 instances I found). Do you think these are relevant? If not then I agree with your suggestion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that depends on the RPC life cycle the application is using.
The creation of a new stream can be called on the initialization phase of the instrumented process, that is also a case where we'll miss the entry probe of handleStream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creation of a new stream can be called on the initialization phase of the instrumented process, that is also a case where we'll miss the entry probe of handleStream

That sounds like a case where we would want to create a new trace in the WriteStatus probe, is that what you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. We won't have the RPC method in that case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteStatus does have the same Stream argument that is passed to handleStream, so we could try to get the RPC method from there. I just pushed an update that does that, ptal. I think this would be neat to cover the cases where we miss the original initialization like you mentioned. If not I'm all for just simplifying this down

if (grpcReq == NULL) {
// No parent span context, generate new span context
u32 zero = 0;
grpcReq = bpf_map_lookup_elem(&grpc_storage_map, &zero);
if (grpcReq == NULL) {
bpf_printk("failed to get grpcReq from storage map");
return -1;
}
}

void *grpcReq_event_ptr = bpf_map_lookup_elem(&grpc_events, &key);
// if grpcReq_event is null, then handleStream probe didn't run. Try starting a new span here
if (grpcReq_event_ptr == NULL)
{
grpcReq->start_time = bpf_ktime_get_ns();

start_span_params_t start_span_params = {
.ctx = ctx,
.sc = &grpcReq->sc,
.psc = &grpcReq->psc,
.go_context = &go_context,
// The parent span context is set by operateHeader probe
.get_parent_span_context_fn = dummy_extract_span_context_from_headers,
.get_parent_span_context_arg = NULL,
};
start_span(&start_span_params);

// Set attributes
if (!get_go_string_from_user_ptr((void *)(stream_ptr + stream_method_ptr_pos), grpcReq->method, sizeof(grpcReq->method)))
{
bpf_printk("Failed to read gRPC method from stream");
bpf_map_delete_elem(&streamid_to_grpc_events, &stream_id);
return 0;
damemi marked this conversation as resolved.
Show resolved Hide resolved
}
}

void *status_ptr = get_argument(ctx, 3);
void *s_ptr = 0;
bpf_probe_read_user(&s_ptr, sizeof(s_ptr), (void *)(status_ptr + status_s_pos));
// Get status code from Status.s pointer
bpf_probe_read_user(&grpcReq->status_code, sizeof(grpcReq->status_code), (void *)(s_ptr + status_code_pos));

bpf_map_update_elem(&grpc_events, &key, grpcReq, 0);
bpf_map_delete_elem(&streamid_to_grpc_events, &stream_id);
return 0;
}

UPROBE_RETURN(http2Server_WriteStatus, struct grpc_request_t, grpc_events, events, 2, stream_ctx_pos, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this return probe, since we already have UPROBE_RETURN(server_handleStream...).
The UPROBE_RETURN macro sends the event to user-space so I don't think we need to have it twice.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,13 @@ import (
"log/slog"

"github.com/hashicorp/go-version"
"golang.org/x/sys/unix"
"google.golang.org/grpc/codes"

"go.opentelemetry.io/otel/attribute"
otelcodes "go.opentelemetry.io/otel/codes"
semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
"go.opentelemetry.io/otel/trace"
"golang.org/x/sys/unix"

"go.opentelemetry.io/auto/internal/pkg/inject"
"go.opentelemetry.io/auto/internal/pkg/instrumentation/context"
Expand Down Expand Up @@ -60,6 +63,14 @@ func New(logger *slog.Logger) probe.Probe {
Key: "frame_stream_id_pod",
Val: structfield.NewID("golang.org/x/net", "golang.org/x/net/http2", "FrameHeader", "StreamID"),
},
probe.StructFieldConst{
Key: "status_s_pos",
Val: structfield.NewID("google.golang.org/grpc", "google.golang.org/grpc/internal/status", "Status", "s"),
},
probe.StructFieldConst{
Key: "status_code_pos",
Val: structfield.NewID("google.golang.org/grpc", "google.golang.org/genproto/googleapis/rpc/status", "Status", "Code"),
},
framePosConst{},
},
Uprobes: []probe.Uprobe{
Expand All @@ -72,6 +83,11 @@ func New(logger *slog.Logger) probe.Probe {
Sym: "google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders",
EntryProbe: "uprobe_http2Server_operateHeader",
},
{
Sym: "google.golang.org/grpc/internal/transport.(*http2Server).WriteStatus",
EntryProbe: "uprobe_http2Server_WriteStatus",
ReturnProbe: "uprobe_http2Server_WriteStatus_Returns",
},
},
SpecFn: loadBpf,
ProcessFn: convertEvent,
Expand Down Expand Up @@ -100,7 +116,8 @@ func (c framePosConst) InjectOption(td *process.TargetDetails) (inject.Option, e
// event represents an event in the gRPC server during a gRPC request.
type event struct {
context.BaseSpanProperties
Method [100]byte
Method [100]byte
StatusCode int32
}

func convertEvent(e *event) []*probe.SpanEvent {
Expand All @@ -125,18 +142,29 @@ func convertEvent(e *event) []*probe.SpanEvent {
pscPtr = nil
}

return []*probe.SpanEvent{
{
SpanName: method,
StartTime: utils.BootOffsetToTime(e.StartTime),
EndTime: utils.BootOffsetToTime(e.EndTime),
Attributes: []attribute.KeyValue{
semconv.RPCSystemKey.String("grpc"),
semconv.RPCServiceKey.String(method),
},
ParentSpanContext: pscPtr,
SpanContext: &sc,
TracerSchema: semconv.SchemaURL,
event := &probe.SpanEvent{
SpanName: method,
StartTime: utils.BootOffsetToTime(e.StartTime),
EndTime: utils.BootOffsetToTime(e.EndTime),
Attributes: []attribute.KeyValue{
semconv.RPCSystemKey.String("grpc"),
semconv.RPCServiceKey.String(method),
semconv.RPCGRPCStatusCodeKey.Int(int(e.StatusCode)),
},
ParentSpanContext: pscPtr,
SpanContext: &sc,
TracerSchema: semconv.SchemaURL,
}

// Set server status codes per semconv:
// See https://github.com/open-telemetry/semantic-conventions/blob/02ecf0c71e9fa74d09d81c48e04a132db2b7060b/docs/rpc/grpc.md#grpc-status
if e.StatusCode == int32(codes.Unknown) ||
e.StatusCode == int32(codes.DeadlineExceeded) ||
e.StatusCode == int32(codes.Unimplemented) ||
e.StatusCode == int32(codes.Internal) ||
e.StatusCode == int32(codes.Unavailable) ||
e.StatusCode == int32(codes.DataLoss) {
event.Status = probe.Status{Code: otelcodes.Error}
}
return []*probe.SpanEvent{event}
}
12 changes: 12 additions & 0 deletions internal/test/e2e/grpc/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,10 @@ import (
"time"

"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/credentials/insecure"
pb "google.golang.org/grpc/examples/helloworld/helloworld"
"google.golang.org/grpc/status"
)

const port = 1701
Expand All @@ -26,6 +28,9 @@ type server struct {

func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
log.Printf("Received: %v", in.GetName())
if in.GetName() == "unimplemented" {
return nil, status.Error(codes.Unimplemented, "unimplmented")
}
return &pb.HelloReply{Message: "Hello " + in.GetName()}, nil
}

Expand Down Expand Up @@ -68,6 +73,13 @@ func main() {
}
log.Printf("Greeting: %s", r.GetMessage())

// Contact the server expecting a server error
_, err = c.SayHello(ctx, &pb.HelloRequest{Name: "unimplemented"})
if err == nil {
log.Fatalf("expected an error but none was received")
}
log.Printf("received expected error: %+v", err)

s.GracefulStop()
<-done

Expand Down
Loading
Loading