This repository has been archived by the owner on Feb 14, 2023. It is now read-only.
OK, this is super gross, but here's the general idea:
The stack overflow is happening when the code attempts to build the schema, since it gets caught in an infinite loop when going down a recursively defined protobuf. I've defined corresponding "recursive" methods below which compare the FieldDescriptor of the parent message to that of the child message. If the message of the parent and the child have the same type (as in the case of a recursively defined protobuf), then it sets the Spark type of the child message to just be a String.
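To make the schema-side check concrete, here is a minimal sketch in Python rather than the project's own code. Everything in it is an assumption for illustration: `MESSAGES` is a toy stand-in for protobuf descriptors, `SCALARS` is a made-up type mapping, and `build_schema` is a hypothetical name, not a method from this PR.

```python
# Toy stand-in for protobuf descriptors (hypothetical, for illustration only):
# message type name -> {field name: field type}. A field type is either a
# scalar name or the name of another message type.
MESSAGES = {
    "Node": {"id": "int64", "label": "string", "child": "Node"},
    "Envelope": {"sent": "bool", "payload": "Node"},
}

# Made-up mapping from scalar protobuf types to Spark-like type names.
SCALARS = {"int64": "long", "string": "string", "bool": "boolean"}


def build_schema(message_type):
    """Build a nested schema dict for message_type.

    A message-typed field whose type equals the enclosing (parent) message
    type is not expanded; it is mapped to a plain string instead, which is
    what stops the otherwise endless descent.
    """
    schema = {}
    for name, field_type in MESSAGES[message_type].items():
        if field_type in SCALARS:
            schema[name] = SCALARS[field_type]
        elif field_type == message_type:
            # Directly recursive field: same type as its parent message.
            schema[name] = "string"
        else:
            schema[name] = build_schema(field_type)
    return schema


node_schema = build_schema("Node")
envelope_schema = build_schema("Envelope")
```

With these toy definitions, the `child` field of `Node` comes out as a string, while `Envelope.payload` is still expanded into a nested struct because `Node` is not the same type as `Envelope`.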
In the step where the code transforms the protobuf to a dataframe (`messageToRow`), I simply pass in the parent message on each call to `toRowData`, and if the parent message has the same type as the child message (again, as in the case of a recursively defined protobuf), it simply returns `null`.
Note that this only works in simple cases of recursively defined protobufs, where the type of the child is the same as the parent. It will still barf in cases where a grandchild message has the same type as the grandparent (e.g., an `Event` which contains a `View`, which contains an `Event`). I don't believe we have any cases of that, and we can discourage it from happening, but I don't think we can guarantee it.
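A rough Python sketch of the row-conversion check and of the grandparent case, under the same caveats: `MESSAGES`, `SCALARS`, `to_row`, and `build_schema_with_ancestors` are hypothetical names, not code from this PR. The second function is one possible generalization that tracks every ancestor type instead of only the direct parent; this PR does not do that.

```python
# Toy stand-in for protobuf descriptors (hypothetical, for illustration only).
MESSAGES = {
    "Node": {"id": "int64", "child": "Node"},
    "Event": {"id": "int64", "view": "View"},
    "View": {"url": "string", "event": "Event"},
}
SCALARS = {"int64": "long", "string": "string"}


def to_row(message, message_type, parent_type=None):
    """Convert a dict-shaped message to a row dict.

    Mirrors the direct-parent check described above: a child message with
    the same type as its parent is not converted and becomes None.
    """
    if message_type == parent_type:
        return None
    row = {}
    for name, field_type in MESSAGES[message_type].items():
        value = message.get(name)
        if field_type in SCALARS or value is None:
            row[name] = value
        else:
            row[name] = to_row(value, field_type, parent_type=message_type)
    return row


def build_schema_with_ancestors(message_type, ancestors=()):
    """Assumed generalization: stop at any type already on the path.

    This also catches Event -> View -> Event, which a direct-parent
    comparison would keep expanding.
    """
    path = ancestors + (message_type,)
    schema = {}
    for name, field_type in MESSAGES[message_type].items():
        if field_type in SCALARS:
            schema[name] = SCALARS[field_type]
        elif field_type in path:
            schema[name] = "string"
        else:
            schema[name] = build_schema_with_ancestors(field_type, path)
    return schema


node_row = to_row({"id": 1, "child": {"id": 2, "child": None}}, "Node")
event_schema = build_schema_with_ancestors("Event")
```

In this sketch the nested `Node` is dropped to `None` during conversion, and the `Event` inside `View` is cut off as a string at schema time. Whether tracking the full ancestor path is worth the extra bookkeeping in the real code is a separate question.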