The df.join(df2, JoinType("inner"), colInt("myCol")) notation is too verbose for me.
We have been thinking lately about joins in Doric and their equivalents in Spark. In Doric we have changed the signature of the join methods so that the join type is more explicit:
def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
def join(df2: Dataset[_], joinType: String, col: DoricColumn[_], cols: DoricColumn[_]*): DataFrame
Join type
The first improvement I would like to discuss is the use of a JoinType object. This way the developer would still get an error at runtime (just as Spark does), but all the errors would be reported together (the joinType error plus the Doric column errors).
This new method would live among the others:
def join(df2: Dataset[_], joinType: String, col: DoricColumn[_], cols: DoricColumn[_]*): DataFrame
df.join(df2, "inner", colInt("myCol"))
def join(df2: Dataset[_], joinType: JoinType, col: DoricColumn[_], cols: DoricColumn[_]*): DataFrame
df.join(df2, JoinType("inner"), colInt("myCol"))
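To make the error-accumulation idea concrete, here is a minimal, self-contained sketch. It is not the actual Doric implementation: the names JoinType, joinType, and combine are assumed for illustration, and plain Either stands in for whatever validation type Doric uses internally. The point is that a bad join-type string becomes a value-level error that can be merged with column errors instead of failing on its own.

```scala
// Hypothetical sketch, NOT the real Doric API: validate the join-type
// string and accumulate its error together with column errors.
object JoinTypeSketch {
  final case class JoinType(name: String)

  private val valid =
    Set("inner", "cross", "outer", "full", "left", "right", "semi", "anti")

  // Either an error list or the validated join type, so it composes
  // with other validations instead of throwing immediately.
  def joinType(name: String): Either[List[String], JoinType] =
    if (valid.contains(name.toLowerCase)) Right(JoinType(name.toLowerCase))
    else Left(List(s"Unsupported join type: $name"))

  // Error accumulation: if both sides fail, report both error lists,
  // so the user sees every problem in a single run.
  def combine[A, B](
      a: Either[List[String], A],
      b: Either[List[String], B]
  ): Either[List[String], (A, B)] =
    (a, b) match {
      case (Right(x), Right(y)) => Right((x, y))
      case (Left(e1), Left(e2)) => Left(e1 ++ e2)
      case (Left(e), _)         => Left(e)
      case (_, Left(e))         => Left(e)
    }
}
```

With this shape, df.join(df2, JoinType("iner"), colInt("myCol")) could report both the misspelled join type and any column errors in one pass, rather than one at a time.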
Join type enforcement
The idea of using a JoinType object is to avoid confusion between Doric and Spark usage and to get early errors, but maybe some other solution could simplify this. Today I thought about a join object that would expose the most commonly used join methods, so the type of join would be easier to see and a join-type error could never occur. It would look like this:
df.join.inner(df2, colLong(id))
df.join.leftAnti(df2, colLong(id))
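A rough sketch of how such a builder could be wired up, using stand-in types instead of Spark's Dataset and Doric's DoricColumn (all names here are assumptions, not the real Doric API). Each join type becomes a method, so an invalid join-type string is unrepresentable at the call site:

```scala
// Hypothetical sketch, NOT the real Doric API: a `join` builder exposing
// one method per join type, reached via an implicit extension on the
// dataset type. Stand-in case classes replace Spark/Doric types, and the
// "join" just renders a descriptive string so the sketch stays runnable.
object JoinBuilderSketch {
  final case class Dataset(name: String)
  final case class DoricColumn(name: String)

  final class JoinBuilder(left: Dataset) {
    private def join(right: Dataset, tpe: String, cols: Seq[DoricColumn]): String =
      s"${left.name} $tpe JOIN ${right.name} ON ${cols.map(_.name).mkString(", ")}"

    // One explicit method per join type: no join-type string to get wrong.
    def inner(right: Dataset, cols: DoricColumn*): String    = join(right, "INNER", cols)
    def leftAnti(right: Dataset, cols: DoricColumn*): String = join(right, "LEFT ANTI", cols)
  }

  // Extension method so `df.join.inner(...)` reads as in the proposal.
  implicit class DatasetOps(private val ds: Dataset) {
    def join: JoinBuilder = new JoinBuilder(ds)
  }
}
```

A design trade-off worth noting: this fixes the set of join types at compile time, so any join type Spark adds later needs a new method, whereas the String and JoinType variants pass through unchanged.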
EDIT
Let's vote on the different options using examples:
df.join(df2, "inner", colInt("myCol"))
df.join(df2, JoinType("inner"), colInt("myCol"))
df.join.inner(df2, colLong(id))
df.innerJoin(df2, colLong(id))