suggestion: Upload dataset with jsonl format #38
Comments
You can already upload in JSONL format. We'll look into the bug with the newline at the end of the file, though. When you write a json.gz file, you should be able to upload and import it. Does that not work?
Yeah, I can upload this file. I was suggesting a helper function that would convert a CSV file to JSONL in the backend before uploading, as this might help reduce parsing mistakes. It would also enable the solvebio package to upload any file that the rio package can read (https://cran.r-project.org/web/packages/rio/index.html).
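To make the suggestion concrete, here is a rough sketch of what such a helper could look like. It assumes the jsonlite package is installed; csv_to_jsonl_gz and its reader argument are hypothetical names for illustration, not part of the solvebio package. The reader defaults to base R's read.csv, and something like rio::import could presumably be passed instead to cover other file formats.

```r
library(jsonlite)

# Hypothetical helper (not a solvebio function): read a tabular file and
# write it as gzip-compressed JSONL, one JSON object per row.
csv_to_jsonl_gz <- function(csv_path, jsonl_path, reader = utils::read.csv) {
  df <- reader(csv_path)
  con <- gzfile(jsonl_path, open = "wb")
  on.exit(close(con))
  # stream_out writes newline-delimited JSON to the open connection
  jsonlite::stream_out(df, con, verbose = FALSE)
  invisible(jsonl_path)
}

# Usage: round-trip mtcars through a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
out_path <- tempfile(fileext = ".json.gz")
csv_to_jsonl_gz(csv_path, out_path)

# Read the compressed file back to check that there is one line per row
con <- gzfile(out_path)
lines <- readLines(con)
close(con)
```

The resulting file would then be uploaded the same way as any other json.gz file.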
Good idea! That would probably make it a lot friendlier for R users to get data in safely.
And this is a slightly longer code snippet for the entire process
Enabling dataset upload via JSONL might improve performance and reduce parsing mistakes compared to CSV.
The bigrquery package takes this approach when uploading datasets with its insert_upload_job function. Specifically, bigrquery:::export_json() creates the appropriate file format.
This works:
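library(readr)
x <- bigrquery:::export_json(mtcars)  # unexported helper, hence the triple colon
x <- gsub("\n$", "", x)  # the solvebio parser fails if there is an extra line at the end of the file
write_lines(x, "mtcars.json.gz")  # readr gzips the output because of the .gz extension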
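A variant that avoids the unexported bigrquery:::export_json() might look like the sketch below. It assumes the jsonlite package is installed, and write_jsonl_gz is a hypothetical name, not a solvebio or bigrquery function. jsonlite::stream_out() ends each row with a single newline and adds no blank line after the last one; I have not verified how the solvebio parser handles that final newline.

```r
library(jsonlite)

# Hypothetical helper: write a data frame as gzip-compressed JSONL
# without relying on bigrquery internals.
write_jsonl_gz <- function(df, path) {
  con <- gzfile(path, open = "wb")
  on.exit(close(con))
  # One JSON object per row, each terminated by a single newline
  jsonlite::stream_out(df, con, verbose = FALSE)
  invisible(path)
}

jsonl_path <- file.path(tempdir(), "mtcars.json.gz")
write_jsonl_gz(mtcars, jsonl_path)

# Read the file back to confirm that no empty line was written
con <- gzfile(jsonl_path)
jsonl_lines <- readLines(con)
close(con)
```

Note that stream_out drops row names, so the car names in mtcars would need to be copied into a regular column first if they matter.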
Note: exporting JSON directly from Google's BigQuery formats integers incorrectly:
https://issuetracker.google.com/issues/35906037