Enable feather and parquet in S3 #361
d43deef to 93ae346
Looks good, except for the ToDo.
gokart/file_processor.py (outdated)
      def load(self, file):
    -     loaded_df = pd.read_feather(file.name)
    +     loaded_df = pd.read_feather(BytesIO(file.read()))
Can we avoid file.read() here? I think it will read the entire content into memory.
I hope BytesIO accepts file (a reader-like object) itself.
> I hope BytesIO accepts file (a reader-like object) itself.
Unfortunately, it does not seem to.
https://docs.python.org/3/library/io.html#binary-i-o
> Binary I/O (also called buffered I/O) expects bytes-like objects and produces bytes objects.
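For illustration, here is a small standard-library check of the point above. It is only a sketch: `source` is a hypothetical stand-in for a reader-like object (not luigi's S3 file class), and the payload is dummy bytes rather than a real feather file.

```python
from io import BytesIO

payload = b"dummy feather or parquet bytes"
source = BytesIO(payload)  # hypothetical stand-in for a reader-like object

# BytesIO wants a bytes-like object, so passing a reader raises TypeError.
try:
    BytesIO(source)
    accepted = True
except TypeError:
    accepted = False

# Calling read() first works, at the cost of holding the whole content in memory.
wrapped = BytesIO(source.read())
```

This is why the diff under review calls `file.read()` before wrapping: the wrapper needs the bytes themselves, not the object that produces them.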
Enabling streaming reads is a bit heavy because we would need to modify code in luigi. I left a FIXME comment there in 7abbc33. Can we merge this PR with these comments?
Additionally, in 65bb5b0 I added a conditional branch on whether the passed file is reader-like or not.
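A rough sketch of what such a branch could look like, using only the standard library. The helper `as_pandas_source` and the class `ReadOnlyStream` are hypothetical names invented for this example, and the `isinstance` check against `io.BufferedReader` is an assumption about how "reader-like" might be detected; the actual change in 65bb5b0 may differ.

```python
import io
import os
import tempfile


def as_pandas_source(file):
    """Hypothetical helper: pick an input that a pandas reader could consume."""
    if isinstance(file, io.BufferedReader):
        # Local file opened in binary mode: hand over the path, no copy needed.
        return file.name
    # Anything else (e.g. a remote stream): buffer the whole content in memory.
    return io.BytesIO(file.read())


class ReadOnlyStream:
    """Hypothetical stand-in for a remote file object that only offers read()."""

    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data


# Create a small local file to exercise the path-based branch.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    path = tmp.name

with open(path, "rb") as local_file:
    local_source = as_pandas_source(local_file)

remote_source = as_pandas_source(ReadOnlyStream(b"abc"))
os.remove(path)
```

Under these assumptions, local files keep the original path-based loading, and only non-local readers pay the in-memory copy.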
Reverted the loading method for everything except S3 in 0a6880f.
I tried it and found that it would take time. I wrote up the details in #363 and would like to proceed with the merge. May I do so?
LGTM!
Problem
When I use .feather or .parquet with S3, I get an error.
ToDo