Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QUIC proxy peering #47587

Draft
wants to merge 17 commits into
base: master
Choose a base branch
from
Draft

QUIC proxy peering #47587

wants to merge 17 commits into from

Conversation

espadolini
Copy link
Contributor

No description provided.

@espadolini espadolini added the no-changelog Indicates that a PR does not require a changelog entry label Oct 15, 2024
lib/proxy/peer/quicserver.go Outdated Show resolved Hide resolved
lib/proxy/peer/quicclient.go Show resolved Hide resolved

// Sent from the server to the client as a response to a DialRequest. The
// message is likewise sent in protobuf binary format, prefixed by its length
// encoded as a little endian uint32.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: conventionally data sent over the wire is big endian. GRPC performs length prefixing using big endian uint32s. unless there is some compelling reason to not to, I'd suggest sticking with that convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counterpoint: it's 2024.

gRPC over HTTP/2 uses big endian for length prefixes because the HTTP/2 spec uses big endian and that's just how they happened to write the spec; protobuf itself uses little endian for any fixed-length integers, so "convention" should clearly not be a factor in any new protocol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, ignoring convention... little endian byte order is an affront to god and nature and has no place in a civilized codebase. Especially for the case of a home-brewed API that we might be called upon to debug at some point, since visually parsing little endian data is annoying/weird.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have a slight preference for big endian for over-the-wire data, if nothing else because I would expect it to be the case. That said, as long as it's well documented I'm OK with it.

Copy link
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick (no pun intended) look at API and docs.

proto/teleport/quicpeering/v1alpha/dial.proto Outdated Show resolved Hide resolved
proto/teleport/quicpeering/v1alpha/dial.proto Outdated Show resolved Hide resolved
proto/teleport/quicpeering/v1alpha/dial.proto Outdated Show resolved Hide resolved
proto/teleport/quicpeering/v1alpha/dial.proto Show resolved Hide resolved
proto/teleport/quicpeering/v1alpha/dial.proto Show resolved Hide resolved
If the status is ok (signifying no error) then the stream will stay open,
carrying the data for the connection between the user and the agent, otherwise
the stream will be closed. For sanity's sake, the size of both messages is
limited and any oversized message is treated as a protocol violation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would it know that a message is oversized, though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We read the size first, if the message is oversized we just close the stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we capped at uint32, but later I saw we have a limit on top of the uint32, which then made this line make sense. Maybe we should mention that we have an arbitrary limit, so the messages can't use the full uint32 length?

lib/proxy/peer/quic.go Outdated Show resolved Hide resolved
lib/proxy/peer/quic.go Outdated Show resolved Hide resolved
lib/proxy/peer/quicclient.go Show resolved Hide resolved
lib/service/servicecfg/proxy.go Show resolved Hide resolved
Copy link
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server review. I think you would benefit from someone who understands QUIC (if we have any), but I did my best.

I'll wait for replies before catching up to the client, so you have time to catch up to it all.

// Note: it won't actually have an effect, since QUIC always uses TLS 1.3,
// and TLS 1.3 ciphersuites can't be configured in crypto/tls, but for
// consistency's sake this should be passed along from the agent
// configuration.
CipherSuites []uint16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this has no effect then what is the point in having it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was the easiest way to prevent "Why aren't you using utils.TLSConfig?" questions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kill the field and move the comment to type itself, or next to the appropriate initialization?

lib/proxy/peer/quicserver.go Outdated Show resolved Hide resolved
lib/proxy/peer/quicserver.go Show resolved Hide resolved
lib/proxy/peer/quicserver.go Outdated Show resolved Hide resolved
Comment on lines +201 to +202
s.wg.Add(1)
go s.handleConn(c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't spread out concurrency control, that makes ownership loose and makes it harder to reason about.

Suggested change
s.wg.Add(1)
go s.handleConn(c)
s.wg.Add(1)
go func() {
s.handleConn(c)
s.wg.Done()
}()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make sense if not for the fact that handleConn still needs to interact with the waitgroup when spawning goroutines for each stream - and this is actually somewhat important, because moving the wg.Done() out of handleConn would mean that the wg.Add(1) in it becomes very suspect, as it wouldn't be "guarded" by an active waitgroup count and thus could potentially race against waitgroup waiters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I politely disagree - both Serve and handleConn could use this same pattern to take care of the goroutines they spawn. A method shouldn't have to know it's running as a goroutine, or mandate this kind of setup to be called.

lib/proxy/peer/quicserver.go Outdated Show resolved Hide resolved
// an empty protobuf message has an empty wire encoding, so by sending a
// size of 0 (i.e. four zero bytes) we are sending an empty DialResponse
// with an empty Status, which signifies a successful dial
if _, err := st.Write(binary.LittleEndian.AppendUint32(nil, 0)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clever and all but I'd rather we just did a send() call with the corresponding "OK" message.

Copy link
Contributor Author

@espadolini espadolini Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to bring in a fallible operation that allocates a bunch of times and initializes hidden state just to send four NULs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can marshal an empty message once and reuse if you like, but the readability is needlessly suffering here.

_, err := io.Copy(nodeConn, st)
return trace.Wrap(err)
})
_ = eg.Wait()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a shot, I suspect it would be pretty spammy tho.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for debug or trace level.

// available streams during a connection (so we can set it to 0)
st, err := c.AcceptStream(context.Background())
if err != nil {
log.DebugContext(c.Context(), "Got an error accepting a stream.", "error", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment, I'd rather this function returned an error and the caller made the choice to swallow it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only possible error is caused by the connection getting closed - which is also why the log line is at debug level. I'm not convinced that moving the error logging one layer above will do much for the clarity of the code, at least while this is the only exit point for the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the logging up makes this behave like a regular erroring function, which is already a valuable readability improvement IMO.


syntax = "proto3";

package teleport.quicpeering.v1alpha;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Suggested change
package teleport.quicpeering.v1alpha;
package teleport.quicpeering.v1alpha1;

Bold of you to assume there will be only one alpha... ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1alpha seems to edge out (slightly) v1alpha1 in terms of google search result count - I wouldn't be opposed to v1alpha2 as a followup to v1alpha, either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it really depends on whether we expect more alphas. If yes, then go v1alpha1. Otherwise we can do v1alpha to v1alpha2 like you said.

@espadolini espadolini changed the base branch from master to espadolini/proxypeer-slog October 16, 2024 23:45
Base automatically changed from espadolini/proxypeer-slog to master October 17, 2024 13:14
Copy link
Contributor

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the added documentation/context. Very helpful!

// Sent from a proxy to a peer proxy in a fresh QUIC stream to dial a Teleport
// resource through a QUIC proxy peering connection. The message is sent in
// protobuf binary format, prefixed by its length encoded as a little endian
// 32-bit integer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// 32-bit integer.
// 32-bit unsigned integer.

@@ -4767,13 +4771,13 @@ func (process *TeleportProcess) initProxyEndpoint(conn *Connector) error {

process.RegisterCriticalFunc("proxy.peer.quic", func() error {
if _, err := process.WaitForEvent(process.ExitContext(), ProxyReverseTunnelReady); err != nil {
logger.DebugContext(process.ExitContext(), "Process exiting: failed to start QUIC peer proxy service waiting for reverse tunnel server.")
logger.DebugContext(process.ExitContext(), "process exiting: failed to start QUIC peer proxy service waiting for reverse tunnel server")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the "agreed" logging style is to capitalize the first word but avoid punctuation.

See the examples in RFD 0154 and this section:

The message should be a fragment, and not a full sentence. Terminating the message with punctuation should be avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Word of @rosstimothy is that the message shouldn't be capitalized and "the message should just be another key-value pair" (presumably with "message" as the key), although I agree that the RFD could be clearer on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should update the examples and add a section stating the preferred style, I think very few of us are getting this "right" (me included apparently).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a PR does not require a changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants