-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes nulls in framecols and improves column creation situation #925
Conversation
…KType.toColumnKind()
…ing KDocs for that suite of functions to explain what they're for
115033a
to
48fc9ff
Compare
…from before the PR. Added allColsMakesColGroup argument for createColumnGuessingType() and guessValueType() so the old behavior of createColumn() is now controlled in the same place as all other conversions
48fc9ff
to
32bee07
Compare
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/chunked.kt
Outdated
Show resolved
Hide resolved
…or CSV and other parsing usages
f80fd18
to
6c68f18
Compare
6c68f18
to
88ee08d
Compare
# Conflicts: # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/constructors.kt # plugins/kotlin-dataframe/src/org/jetbrains/kotlinx/dataframe/plugin/impl/DataFrameAdapter.kt
Generated sources will be updated after merging this PR. |
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/DataColumn.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/DataColumn.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/DataColumn.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/DataColumn.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/constructors.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/api/parse.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/columns/constructors.kt
Outdated
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/io/readJson.kt
Show resolved
Hide resolved
…Suggestion, working on feedback
…e check from guessing type. May be unexpected behavior. Fixed tests
e007cc3
to
222086a
Compare
sequenceOf(columnOf("a"), columnOf(1)), | ||
allColsMakesRow = true, | ||
) shouldBe typeOf<DataRow<*>>() | ||
guessValueType( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split two guessValueType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it would need to be split in 4 because we also have listifyValues: Boolean
. I think in this case it's good to have all logic of values -> column type
of the entire library handled by one function guessValueType()
. This makes it easier to debug and reason about.
core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/ParserTests.kt
Outdated
Show resolved
Hide resolved
core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/columns/DataColumns.kt
Outdated
Show resolved
Hide resolved
// Checks for nulls in the `values` list. | ||
// This only runs with `kotlin.dataframe.debug=true` in gradle.properties. | ||
if (BuildConfig.DEBUG) { | ||
require(!values.anyNull()) { "FrameColumn cannot null values." } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if we apply this check not only during DEBUG mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then each time we create a frame column, we'll have to check all values for nulls, which at worst is O(n)
. For efficiency it may be best to at least have a zero-checks path available for library calls.
* Returns the value type of the given [values] sequence. | ||
* Returns the guessed value type of the given [values] sequence. | ||
* | ||
* This function analyzes all [values] once and returns the expected column type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we cover in tests the situation when one column in file or in List has different types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have a lot of tests of this specific function: https://github.com/Kotlin/dataframe/blob/nulls-in-framecols/core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/types/UtilTests.kt.
These cases are probably covered by other column creation tests, like in https://github.com/Kotlin/dataframe/blob/nulls-in-framecols/core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/constructors.kt
* | ||
* This is generally safe, as [T] can be inferred, and more efficient than [createWithTypeInference]. | ||
* | ||
* Be careful when casting occurs; Values in [values] are NOT checked to adhere to the given/inferred type [T], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unsafe because of type? For me createUnsafe creates weird reference to something like sun.misc.Unsafe
. Could we rename it somehow to the createWithoutTypeChecking
or createAndFailFast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createAndFailFast
haha, another title could be createRaw
, but createWithoutTypeChecking
might also be clear. It's unsafe in the sense that it solely uses the given type T
to decide which type of column to instantiate. It doesn't check any of the values, which makes it fast, but, well, unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, on the other hand, then we can have DataColumn.createWithoutTypeInference(..., infer = Infer.Type)
. That's gonna be confusing haha. We need a title that says it will choose createValueColumn, createColumnGroup, or createFrameColumn based on the type T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I think I'm gonna go with:
- createByInference
- createByType
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/DataColumn.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've suggested just rename or answer some my questions before merging
@zaleslaw Thanks! I'll have a look. I'll also check the binary compatibility now that plugin is merged. I suspect I broke some stuff. |
Alright, thanks for the reviews :) they were most helpful! I think I covered all now, so I'll merge so we won't get further out of sync with other branches. Let me know if I missed something major, then I'll fix it :) |
Fixes #593
This PR has turned into a collection of a couple smaller fixes:
null
leaks in the values like before.DataColumn.Companion
cleaningcreate()
was renamed tocreateUnsafe()
for the same reason. Together with.toColumnKind()
, giving the typeAnyFrame?
will now instantiate a value column instead of a frame column.createFrameColumn()
withstartIndices
has been deprecated. This is not the "normal" way to instantiate aFrameColumn
and thus belongs somewhere else, in this case, inchunked.kt
createColumn()
andguessColumnType()
have been merged intocreateColumnGuessingType()
createColumn
were already covered byguessColumnType()
aside fromIterable<DataColumn>
->ColumnGroup
allColsMakesColGroup
argument was added tocreateColumnGuessingType()
to keep this behavior optionally. It is used bycolumnOf
for instance, but doesn't make sense everywhere.guessValueType()
was modified to handle this too.dataFrameOf
constructors had some fixes to align behavior with other column creation functions, added testsTODO: check whether other uses of
DataColumn.create(Unsafe)
need to be swapped to createWithTypeInference/createColumnGuessingType