Add predicate pushdown #75

mike-luabase · 2024-10-10T11:42:26Z

No description provided.

mike-luabase · 2024-10-10T11:49:04Z

@samansmink made the updates here. I tried getting rid of the data dir from the commit, but wasn't sure of the best way to do that.

mike-luabase · 2024-10-10T13:34:01Z

Enhance query performance by filtering data at the metadata level, reducing the amount of data read during scans.

Key Changes

Extended IcebergManifestEntry:
- Added lower_bounds and upper_bounds maps to store per-column statistics.

Utility Function:
- Implemented IcebergUtils::GetFullPath to resolve file paths accurately.

Metadata Retrieval:
- Added a GetEntries template in IcebergTable to fetch the relevant manifest entries, excluding deleted ones.

Predicate Evaluation:
- Created EvaluatePredicateAgainstStatistics to check whether a data file can satisfy the query predicates based on its statistics.
- For each predicate:
  - Identifies the column involved.
  - Checks whether the column has defined lower and upper bounds.
  - Based on the comparison type (e.g., =, >, <), determines whether the predicate can be satisfied given the file's bounds.
- If any predicate fails, the file is excluded from the scan.

Scan Expression Modification:
- Updated MakeScanExpression to filter data_file_entries using the predicates before scanning.

Binding Function Enhancements:
- Enhanced IcebergScanBindReplace with additional logging; it now prepares the data files based on the predicate results.
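
To make the Predicate Evaluation step concrete, here is a minimal, self-contained sketch of the bounds check. It is not the code from this PR: the `Comparison`, `Predicate`, and `FileStats` types and the `MayMatch` / `FileMayMatch` names are hypothetical, and bounds are simplified to `int64_t` values keyed by column id, whereas the extension works with DuckDB expressions and the bounds stored in the manifest.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical comparison kinds, standing in for DuckDB's expression types.
enum class Comparison { Equal, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual };

// Hypothetical predicate of the form `<column> <op> <constant>`.
struct Predicate {
    int column_id;
    Comparison op;
    int64_t constant;
};

// Hypothetical per-file statistics, keyed by column id.
struct FileStats {
    std::map<int, int64_t> lower_bounds;
    std::map<int, int64_t> upper_bounds;
};

// Returns false only when the bounds prove that no row in the file can match.
bool MayMatch(const FileStats &stats, const Predicate &pred) {
    auto lo = stats.lower_bounds.find(pred.column_id);
    auto hi = stats.upper_bounds.find(pred.column_id);
    if (lo == stats.lower_bounds.end() || hi == stats.upper_bounds.end()) {
        // Without both bounds nothing can be proven, so the file is kept.
        return true;
    }
    switch (pred.op) {
    case Comparison::Equal:
        return lo->second <= pred.constant && pred.constant <= hi->second;
    case Comparison::GreaterThan:
        return hi->second > pred.constant;
    case Comparison::GreaterThanOrEqual:
        return hi->second >= pred.constant;
    case Comparison::LessThan:
        return lo->second < pred.constant;
    case Comparison::LessThanOrEqual:
        return lo->second <= pred.constant;
    }
    return true; // Unknown comparison: keep the file.
}

// A file is scanned only if every predicate may match; one failure excludes it.
bool FileMayMatch(const FileStats &stats, const std::vector<Predicate> &predicates) {
    for (const auto &pred : predicates) {
        if (!MayMatch(stats, pred)) {
            return false;
        }
    }
    return true;
}
```

The check is deliberately conservative: a file is dropped only when its bounds rule out a match, and a column without statistics never causes a file to be skipped.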

This implementation lets Iceberg table scans use manifest metadata to skip data files whose column statistics rule out a match, which reduces the data read and the resources used per query.
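
The Metadata Retrieval and Scan Expression Modification steps listed under Key Changes can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `ManifestEntry`, `EntryStatus`, and `SelectDataFiles` are hypothetical names, and the statistics check is passed in as a callback. The status values follow the Iceberg table spec (0 = existing, 1 = added, 2 = deleted).

```cpp
#include <functional>
#include <string>
#include <vector>

// Manifest entry status codes as defined in the Iceberg table spec.
enum class EntryStatus { Existing = 0, Added = 1, Deleted = 2 };

// Hypothetical, simplified manifest entry.
struct ManifestEntry {
    EntryStatus status;
    std::string file_path;
};

// Collects the data files to scan: deleted entries are dropped first,
// then every remaining entry must pass the statistics check `may_match`.
std::vector<std::string> SelectDataFiles(const std::vector<ManifestEntry> &entries,
                                         const std::function<bool(const ManifestEntry &)> &may_match) {
    std::vector<std::string> files;
    for (const auto &entry : entries) {
        if (entry.status == EntryStatus::Deleted) {
            continue; // File is no longer part of the table.
        }
        if (!may_match(entry)) {
            continue; // Statistics rule out any matching row.
        }
        files.push_back(entry.file_path);
    }
    return files;
}
```

The result can be empty when every file is pruned, so the caller has to handle an empty file list, which is what the "empty file list handling" commits in this PR address.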

Mytherin

Thanks for the PR! Some comments from my side. Could you also look at the failing CI?

Mytherin · 2024-10-14T20:13:18Z

data/iceberg/generated_spec1_0_001/expected_results/last/count.csv

@@ -1,2 +0,0 @@
-count


Do we need to delete all of these files?

Mytherin · 2024-10-14T20:13:40Z

src/include/avro_codegen/iceberg_manifest_entry_partial.hpp



 #include <sstream>
-#include "boost/any.hpp"
+#include <any>


This code is generated - should this be modified?

Mytherin · 2024-10-14T20:14:19Z

src/iceberg_functions/iceberg_scan.cpp


- return make_uniq<ComparisonExpression>(ExpressionType::COMPARE_NOT_DISTINCT_FROM, std::move(data_filename_expr),


All these format changes make the code hard to review - can we leave the old format in place?

mike-luabase · 2024-10-16T13:40:29Z

@Mytherin thanks for the review! I cleaned up the issues here: #78

will close this one.

mike-luabase · 2024-10-16T13:40:51Z

closing, see above.

mike-luabase and others added 10 commits September 28, 2024 09:25

- e90ef07 adding predicate pushdown for iceberg
- 6c90a4a adding predicate pushdown for iceberg
- 0586b7c getting predicates now
- ed7e802 adding predicate pushdown for iceberg 2
- 55cb98c working
- 139d19a adding empty file list handling
- 21f6b34 added empty file list handling
- 814f8fa Merge remote-tracking branch 'upstream/main'
- 3541140 Update submodules duckdb and extension-ci-tools
- 5e29ff7 Update CMakeLists.txt

carlopi requested a review from samansmink October 14, 2024 13:15

Mytherin reviewed Oct 14, 2024

mike-luabase closed this Oct 16, 2024

samansmink mentioned this pull request Oct 18, 2024

Add predicate pushdown (updated) #78

Open
