The current check only looks at the top-level struct, not at all structs. We could simply add a counter while we iterate through the fields with `TypeUtil.getProjectedIds`.
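To make the counter idea concrete, here is a minimal sketch that walks a schema depth-first and stops once a limit of leaf columns has been reached. It does not use Iceberg's classes: `Field`, `firstLeaves`, and `collect` are hypothetical stand-ins for illustration only, and the real change would presumably live in `MetricsConfig` and work with field IDs rather than dotted names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class LeafColumnLimit {
  // Hypothetical stand-in for a schema field; NOT Iceberg's Types.NestedField.
  // A field with no children is treated as a leaf (primitive) column.
  static final class Field {
    final String name;
    final List<Field> children;

    Field(String name, List<Field> children) {
      this.name = name;
      this.children = children;
    }

    static Field leaf(String name) {
      return new Field(name, Collections.<Field>emptyList());
    }

    static Field struct(String name, Field... children) {
      return new Field(name, Arrays.asList(children));
    }

    boolean isLeaf() {
      return children.isEmpty();
    }
  }

  // Returns the dotted paths of the first `limit` leaf columns in
  // depth-first order, counting nested leaves as well as top-level ones.
  static List<String> firstLeaves(List<Field> columns, int limit) {
    List<String> selected = new ArrayList<>();
    collect(columns, "", limit, selected);
    return selected;
  }

  private static void collect(List<Field> fields, String prefix, int limit, List<String> out) {
    for (Field field : fields) {
      if (out.size() >= limit) {
        return; // the counter has reached the limit; stop descending
      }
      String path = prefix.isEmpty() ? field.name : prefix + "." + field.name;
      if (field.isLeaf()) {
        out.add(path);
      } else {
        collect(field.children, path, limit, out);
      }
    }
  }
}
```

With this approach a single wide struct near the front of the schema can still use up the whole limit, which is the concern the prioritization discussion in this issue is about.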
Simply fixing this would be a behavior change that affects people upgrading Iceberg: nested structs that previously had metrics would suddenly stop getting them on new writes after the upgrade. We basically have two choices:
1. Just change the behavior and document it, along with a workaround that roughly preserves the existing behavior (for example, setting the value to 1K).
2. Leave the existing behavior alone, introduce a new config, and deprecate the old one.
Ideally, I think I'd want to prioritize top-level columns getting metrics first, and then the first fields in each struct, weighted equally. That way, if we have multiple nested structs, each one gets metrics for its first fields, rather than one large struct consuming all the metrics.
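One possible reading of that prioritization, as a rough sketch: top-level primitive columns are selected first, and the remaining budget is then handed out round-robin across the top-level structs, one leaf per struct per round. The `Field` type and the `allocate` and `flatten` helpers are hypothetical and only model the idea; they are not part of Iceberg, and how deeper nesting should be weighted is an open question (here each top-level struct's leaves are simply flattened depth-first).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MetricsBudgetSketch {
  // Hypothetical stand-in for a schema field; NOT Iceberg's Types.NestedField.
  static final class Field {
    final String name;
    final List<Field> children;

    Field(String name, List<Field> children) {
      this.name = name;
      this.children = children;
    }

    static Field leaf(String name) {
      return new Field(name, Collections.<Field>emptyList());
    }

    static Field struct(String name, Field... children) {
      return new Field(name, Arrays.asList(children));
    }

    boolean isLeaf() {
      return children.isEmpty();
    }
  }

  // Picks up to `budget` columns: top-level leaves first, then one leaf per
  // top-level struct per round, so no single struct consumes the whole budget.
  static List<String> allocate(List<Field> columns, int budget) {
    List<String> selected = new ArrayList<>();
    List<List<String>> structLeaves = new ArrayList<>();

    for (Field column : columns) {
      if (column.isLeaf()) {
        if (selected.size() < budget) {
          selected.add(column.name);
        }
      } else {
        List<String> leaves = new ArrayList<>();
        flatten(column, column.name, leaves);
        structLeaves.add(leaves);
      }
    }

    for (int round = 0; selected.size() < budget; round++) {
      boolean anyRemaining = false;
      for (List<String> leaves : structLeaves) {
        if (round < leaves.size()) {
          anyRemaining = true;
          if (selected.size() < budget) {
            selected.add(leaves.get(round));
          }
        }
      }
      if (!anyRemaining) {
        break; // every struct is exhausted before the budget is
      }
    }
    return selected;
  }

  // Collects the dotted paths of all leaves under a struct, depth-first.
  private static void flatten(Field struct, String path, List<String> out) {
    for (Field child : struct.children) {
      String childPath = path + "." + child.name;
      if (child.isLeaf()) {
        out.add(childPath);
      } else {
        flatten(child, childPath, out);
      }
    }
  }
}
```

For a schema with columns `id`, `a{x, y, z}`, `b{u}`, `ts` and a budget of 5, this sketch selects `id`, `ts`, `a.x`, `b.u`, `a.y`: both structs get their first field before `a` gets a second one.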
Willingness to contribute
I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time
Apache Iceberg version
1.5.1
Query engine
Spark
Please describe the bug 🐞
`write.metadata.metrics.max-inferred-column-defaults` only considers top-level columns, not nested columns.
iceberg/core/src/main/java/org/apache/iceberg/MetricsConfig.java
Line 136 in 1526c1f