How is the dataset size measured? #14

Open
hansbogert opened this issue Jun 9, 2015 · 0 comments
Comments

@hansbogert

According to the website/docs, the Rankings table is 6.38GB. However, it seems to be closer to 5.2GB, judging by both the on-disk size and the size Spark reports.
I downloaded the txt format, which I expected to be close to the reported 6.38GB since it is uncompressed.
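
For reference, this is roughly how the on-disk figure can be checked. It is only a minimal sketch: the helper names `dir_size_bytes` and `describe_size` are made up for illustration, and it assumes the downloaded txt files sit in a local directory. It reports the size in both decimal GB and binary GiB, since tools differ in which unit they print. Note that a unit difference alone would turn 6.38 GB into roughly 5.94 GiB, so it would not by itself account for a 5.2GB reading.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files under `path`.

    Hypothetical helper: symlinks are skipped so nothing is counted twice.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def describe_size(n_bytes):
    """Express a byte count in decimal GB (10^9) and binary GiB (2^30)."""
    return {"GB": n_bytes / 1000**3, "GiB": n_bytes / 1024**3}
```

Calling `describe_size(dir_size_bytes(path))` on the directory holding the Rankings files (the path is whatever the download was extracted to) gives both figures side by side.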
