Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial implementation of startsWith #78

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Liorba
Copy link

@Liorba Liorba commented Oct 23, 2018

Fixes #49

Initial implementation only for iceberg-api expressions

Open issues:

  1. should we add implementation for spark, parquet and avro?
  2. should we use case sensitive startsWith?
  3. should we support unicode characters startsWith?

// due to the fact that startWith function in Expressions accepts only string types casting is a must here.
// in Unbound#bind function we added a check for that

return startsWith((BoundReference<String>) pred.ref(), pred.literal().to(pred.ref().type()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this calling to? A BoundReference guarantees that the type of its literal is correct and has already been coerced during the binding process.

@@ -96,6 +96,10 @@ public static Expression not(Expression child) {
return new UnboundPredicate<>(Expression.Operation.NOT_EQ, ref(name), value);
}

public static <T> UnboundPredicate<T> startWith(String name, T value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this accept T and not String? Shouldn't this always be a string, or does this also allow startsWith evaluation on byte arrays?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is it a typo that this is startWith and not startsWith?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I leave this as String it won't compile, I wanted to have it throw ValidationException as the other operations
you can see the test here
if I change it to string it won't compile

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the compile error? I'm surprised it won't compile if you remove the type variable and use String.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a test checking that if I try to run startWith on anything other then string it will throw ValidationError which is the exception I throw in the bind method.
if startWith excepting only strings test would not compile.
I can remove this and change the type to String.
but does it mean that the bind method is useless for startWith?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always use the Expressions.predicate method to create the predicate with a non-string value. I think that it's better to make Expressions.startsWith accept only a String so that the type system helps make expressions correct, not just binding checks.

@rdblue
Copy link
Contributor

rdblue commented Oct 23, 2018

@Liorba, thanks for working on this! This is a good start.

To answer your questions:

  1. Integrating engines to use startsWith can be done in a follow-up.
  2. I think this should be case sensitive.
  3. I'm not sure what you mean.

This also needs to update the project methods in each transform. Otherwise, Iceberg wouldn't be able to convert STARTS_WITH data predicates to partition predicates.

@rdblue
Copy link
Contributor

rdblue commented Oct 29, 2018

@Liorba, thanks for updating, but this still needs to update the project methods in each transform so Iceberg can convert these into partition predicates.

evaluator.eval(TestHelpers.Row.of(("xyzffff"))));
}

@Test(expected = ValidationException.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a binding test, which should go in TestExpressionBinding.

@rdblue
Copy link
Contributor

rdblue commented Dec 7, 2018

@Liorba, if you want to continue working on this, please re-open it in the apache/incubator-iceberg repository. That's the project's new home. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants