ENH: Add sparse option in pivot_table / pivot / unstack #14493

Open
yupbank opened this issue Oct 25, 2016 · 10 comments
Labels
Enhancement; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Sparse (Sparse Data Type)

Comments

@yupbank

yupbank commented Oct 25, 2016

A small, complete example of the issue

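# Proposed API: the `sparse=True` option does not exist yet; it is what this issue requests.
# `pivot_table` is used here because `pivot` does not accept `aggfunc`.
sparse_df = df.pivot_table(index='a', columns='b', values='c', aggfunc=len, sparse=True)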

Expected Output

Output of pd.show_versions()

# Paste the output here
jreback changed the title from "Add sparse option in pivot_table" to "ENH: Add sparse option in pivot_table / pivot / unstack" on Oct 26, 2016
jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Sparse (Sparse Data Type), and Difficulty Advanced labels on Oct 26, 2016
jreback added this to the Next Major Release milestone on Oct 26, 2016
@jreback
Contributor

jreback commented Oct 26, 2016

This is possible, but it would have to be integrated into unstacking proper. We do something like this in get_dummies, but that is not fully generalizable, so this would take some effort. A community pull request would be needed here.
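For reference, the get_dummies behaviour mentioned above is exposed through its `sparse` parameter. A minimal sketch on made-up data (the series `s` and its values are illustrative assumptions, not taken from this thread):

```python
import pandas as pd

# Toy categorical data; the values are purely illustrative.
s = pd.Series(["a", "b", "a", "c"], name="item")

# sparse=True asks get_dummies to back each indicator column
# with a pandas SparseArray instead of a dense NumPy array.
dummies = pd.get_dummies(s, sparse=True)

print(dummies.dtypes)          # every column has a Sparse[...] dtype
print(dummies.sparse.density)  # fraction of explicitly stored values
```

This only covers one-hot indicators; it does not aggregate values the way `pivot_table` does, which is the gap the issue describes.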

@yupbank
Author

yupbank commented Oct 26, 2016

i might try my best for a pull-request :)

@yupbank
Author

yupbank commented Oct 26, 2016

@jreback I noticed that there aren't many options for sparse matrix output (in my case I want something like df.to_csr_matrix). Is it OK if I add more sparse matrix formats to the current collection, which is limited to scipy.sparse.coo_matrix?
Or do you suggest I talk to someone who knows pandas.sparse better?
And also the DataFrame.groupby people, since I noticed that the actual aggfunc/value assignment of pivot happens in groupby?

@jreback
Contributor

jreback commented Oct 26, 2016

adding sparse means pandas sparse structures: http://pandas.pydata.org/pandas-docs/stable/sparse.html
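The two views are closer than they might look: in recent pandas versions, a DataFrame whose columns are all pandas sparse arrays can be exported to scipy through the `DataFrame.sparse` accessor, and scipy itself converts COO to CSR. A minimal sketch, assuming scipy is installed and using made-up column names and values:

```python
import numpy as np
import pandas as pd

# A DataFrame backed by pandas sparse structures (illustrative data).
# For integer data the default fill value of SparseArray is 0.
df = pd.DataFrame({
    "x": pd.arrays.SparseArray([0, 1, 0, 0]),
    "y": pd.arrays.SparseArray([0, 0, 2, 0]),
})

# Export to a scipy COO matrix, then let scipy convert it to CSR.
coo = df.sparse.to_coo()
csr = coo.tocsr()

print(csr.shape, csr.nnz)
```

So a dedicated `to_csr_matrix` method is not strictly required for this use case; the conversion step lives on the scipy side.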

@jnothman
Contributor

jnothman commented Jan 16, 2018

I have implemented, I think, most of the logic for sparse unstacking of series (or homogeneous dtypes) at master...jnothman:sparse-unstack

So far I am:

  • currently overwriting the existing _Unstacker.get_result() implementation for ease of testing.
  • not handling the categorical case yet (cf. Behaviour of Categorical inputs to sparse data structures #19278).
  • not yet correctly returning a SparseDataFrame where one would be expected.
  • not yet adding a sparse parameter to let the user switch on sparse output.

I'd be very happy for someone else to complete the work!

@jnothman
Contributor

Maybe then it can be downgraded to Effort Medium :P

@jnothman
Contributor

jnothman commented Oct 30, 2019

In pandas 0.25, making a series have a sparse dtype and then unstacking it seems to produce a sparse DataFrame (see https://stackoverflow.com/questions/58617185/converting-a-list-of-counters-to-sparse-pandas-dataframe/58617186). I'm assuming this does not require a dense in-memory structure along the way. Maybe this can be closed?
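The behaviour described above can be sketched as follows. The index names and values are made up for illustration, and the sketch only checks that the result has sparse columns; it does not verify whether a dense intermediate is avoided internally.

```python
import numpy as np
import pandas as pd

# Long-format data with a two-level index (illustrative values).
index = pd.MultiIndex.from_tuples(
    [("o1", "a"), ("o1", "b"), ("o2", "c")],
    names=["order", "item"],
)
s = pd.Series([1.0, 2.0, 3.0], index=index)

# Give the series a sparse dtype with NaN as the fill value,
# then unstack the inner level into columns.
sparse_s = s.astype(pd.SparseDtype("float64", np.nan))
wide = sparse_s.unstack("item")

print(wide.dtypes)  # one Sparse[...] dtype per column
```

Note that this covers `unstack` on an already unique index; aggregating duplicates, as `pivot_table` does, would still need a separate groupby step first.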

@GeekLad

GeekLad commented Jul 18, 2022

I know this is a really old issue, but I believe there are many scenarios where this functionality would be useful. For instance, in my case, I'm trying to pivot order details (line items, where one order has multiple lines, with different items with different quantities/prices ordered). I want to see if I can develop a predictive model where quantities of particular items predict outcomes for an order, so I need to engineer features to pivot the item numbers into columns.

I have 10 million rows of item detail on 3.3 million orders with about 15K unique items. So the pivot would have 15K columns, with only about 3 columns populated on average. I run out of memory when I try to pivot with Pandas, and I have 64GB of RAM.
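To put the figures above in perspective, and as one possible workaround, here is a minimal sketch that first estimates the dense size and then builds the pivot directly as a scipy sparse matrix. It assumes scipy is installed and that quantities for a repeated (order, item) pair should be summed; the column names `order_id`, `item`, and `qty` and the toy data are hypothetical.

```python
import pandas as pd
from scipy import sparse

# Back-of-envelope dense size for the figures quoted above,
# assuming 8-byte (float64) cells.
n_orders, n_items = 3_300_000, 15_000
dense_gb = n_orders * n_items * 8 / 1e9
print(dense_gb)  # 396.0 GB, far beyond 64 GB of RAM

# Toy long-format order lines (hypothetical column names).
lines = pd.DataFrame({
    "order_id": [101, 101, 102, 103, 103],
    "item": ["A", "B", "A", "C", "C"],
    "qty": [2, 1, 5, 1, 3],
})

# Map each order and item to a row/column position.
row_codes, order_ids = pd.factorize(lines["order_id"])
col_codes, items = pd.factorize(lines["item"])

# Build a COO matrix from (value, (row, col)) triples; converting
# to CSR sums duplicate (row, col) entries.
mat = sparse.coo_matrix(
    (lines["qty"].to_numpy(), (row_codes, col_codes)),
    shape=(len(order_ids), len(items)),
).tocsr()

# Optionally wrap the result in a DataFrame with sparse columns.
wide = pd.DataFrame.sparse.from_spmatrix(mat, index=order_ids, columns=items)
print(wide.dtypes)
```

For modelling, the CSR matrix can often be passed to an estimator directly, so the final DataFrame step may not be needed at all.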

mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022
@alonba

alonba commented Dec 6, 2022

I need to pivot a big table (100 million rows, 4 columns). The pivoted table is insanely big. It has to be returned as a sparse matrix, but it isn't.
This feature is badly needed.
I am currently thinking of doing it this way, but it is an ugly workaround, if it even works.

@michelkluger

Here is a workaround (I would love to see this ENH become a feature):

https://medium.com/@michelkluger/pivot-to-sparse-dataframe-d0b1759a9d14


8 participants