ENH: Add sparse option in pivot_table / pivot / unstack #14493

yupbank · 2016-10-25T20:25:09Z

A small, complete example of the issue

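# Proposed usage: `sparse=True` is the requested option and does not exist yet.
# (aggfunc is a pivot_table argument; DataFrame.pivot does not accept it.)
sparse_df = df.pivot_table(index='a', columns='b', values='c', aggfunc=lambda x: len(x), sparse=True)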


jreback · 2016-10-26T10:48:32Z

This is possible, but it would have to be integrated into unstacking proper. We do something like this in get_dummies, but that is not fully generalizable, so this would take some effort. A community pull request would be needed here.
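
For reference, a minimal sketch of the existing sparse output in get_dummies that the comment above alludes to. The toy frame and its column names are made up for illustration; only `pd.get_dummies(..., sparse=True)` and `pd.SparseDtype` are real pandas API here.

```python
import pandas as pd

# Toy data; the column names are illustrative only.
df = pd.DataFrame({"a": [0, 0, 1, 2], "b": ["x", "y", "x", "z"]})

# get_dummies can already return sparse-backed indicator columns.
dummies = pd.get_dummies(df["b"], sparse=True)

# Each column carries a SparseDtype rather than a dense NumPy dtype.
print(dummies.dtypes)
print(all(isinstance(t, pd.SparseDtype) for t in dummies.dtypes))
```

This only covers 0/1 indicators of a single column, which is probably why it does not generalize directly to pivoting arbitrary values.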

yupbank · 2016-10-26T13:18:05Z

I might try my best at a pull request :)

yupbank · 2016-10-26T15:39:27Z

@jreback I noticed that there aren't many options for sparse matrices (in my case I want something like df.to_csr_matrix). Is it OK if I add some sparse matrix types to the current collection, which is limited to scipy.coo_matrix?
Or do you suggest I talk to someone who knows pandas.sparse better?
And also the DataFrame.groupby people, because I noticed that the actual agg_func/value assignment of pivot happens in groupby?

jreback · 2016-10-26T16:14:53Z

adding sparse means pandas sparse structures: http://pandas.pydata.org/pandas-docs/stable/sparse.html

jnothman · 2018-01-16T13:52:09Z

I have implemented, I think, most of the logic for sparse unstacking of series (or homogeneous dtypes) at master...jnothman:sparse-unstack

So far I am:

- currently overwriting the existing _Unstacker.get_result() implementation for ease of testing.
- not handling the categorical case yet (cf. Behaviour of Categorical inputs to sparse data structures #19278).
- not currently correctly outputting SparseDataFrame where it would be output.
- not adding a sparse parameter to allow the user to switch on sparse output.

I'd be very happy for someone else to complete the work!

jnothman · 2018-01-16T13:52:45Z

Maybe then it can be downgraded to Effort Medium :P

jnothman · 2019-10-30T00:32:21Z

In pandas 0.25, making a series have a sparse dtype and then unstacking it seems to produce a sparse DataFrame (see https://stackoverflow.com/questions/58617185/converting-a-list-of-counters-to-sparse-pandas-dataframe/58617186). I'm assuming this does not require a dense in-memory structure along the way. Maybe this can be closed?
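
A minimal sketch of the approach described above, assuming a recent pandas with `pd.SparseDtype`. The toy data and column names are made up, and whether a dense intermediate is avoided internally is not something this snippet demonstrates.

```python
import pandas as pd

# Long-format toy data: one row per (a, b) pair with a value c.
df = pd.DataFrame(
    {"a": [0, 0, 1, 2], "b": ["x", "y", "x", "z"], "c": [1, 2, 3, 4]}
)

# Index by the two pivot keys and give the values a sparse dtype.
s = df.set_index(["a", "b"])["c"].astype(pd.SparseDtype("int64", fill_value=0))

# Unstack the inner level; missing (a, b) combinations become the fill value.
wide = s.unstack(fill_value=0)

# Inspect the column dtypes to see whether the result stayed sparse.
print(wide.dtypes)
print(wide)
```

Note that unstack needs unique (a, b) pairs, so duplicates would have to be aggregated first (for example with groupby) before converting to a sparse dtype.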

GeekLad · 2022-07-18T16:44:59Z

I know this is a really old issue, but I believe there are many scenarios where this functionality would be useful. For instance, in my case, I'm trying to pivot order details (line items, where one order has multiple lines, with different items with different quantities/prices ordered). I want to see if I can develop a predictive model where quantities of particular items predict outcomes for an order, so I need to engineer features to pivot the item numbers into columns.

I have 10 million rows of item detail on 3.3 million orders with about 15K unique items. So the pivot would have 15K columns, with only about 3 columns populated on average. I run out of memory when I try to pivot with Pandas, and I have 64GB of RAM.
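
One possible workaround for a case like this, sketched under assumptions: build a SciPy sparse matrix from categorical codes and wrap it with `pd.DataFrame.sparse.from_spmatrix`. The column names (`order_id`, `item_id`, `qty`) and the toy data are hypothetical, and this has not been checked at the scale described above.

```python
import pandas as pd
from scipy import sparse

# Hypothetical order lines: several rows per order, one row per line item.
lines = pd.DataFrame(
    {
        "order_id": [101, 101, 102, 103, 103, 103],
        "item_id": ["A", "B", "A", "C", "C", "B"],
        "qty": [2, 1, 5, 1, 3, 4],
    }
)

# Categorical codes give compact integer row/column positions.
orders = lines["order_id"].astype("category")
items = lines["item_id"].astype("category")

# COO construction followed by conversion to CSR sums duplicate
# (order, item) entries, which acts like aggfunc="sum" in a pivot.
mat = sparse.coo_matrix(
    (
        lines["qty"].to_numpy(),
        (orders.cat.codes.to_numpy(), items.cat.codes.to_numpy()),
    ),
    shape=(len(orders.cat.categories), len(items.cat.categories)),
).tocsr()

# Wrap the matrix as a DataFrame with sparse columns, keeping the labels.
wide = pd.DataFrame.sparse.from_spmatrix(
    mat, index=orders.cat.categories, columns=items.cat.categories
)
print(wide)
```

If the downstream model accepts SciPy matrices directly, `mat` can be used as is and the DataFrame wrapper skipped.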

alonba · 2022-12-06T15:29:20Z

I need to pivot a big table (100 million rows, 4 columns). The pivoted table is extremely large, so it has to be returned as a sparse matrix, but it isn't.
This feature is badly needed.
I am currently thinking of using this way, but it is an ugly workaround, if it even works.

michelkluger · 2024-10-23T12:26:33Z

Here is a workaround (I would love to see this ENH become a feature):

https://medium.com/@michelkluger/pivot-to-sparse-dataframe-d0b1759a9d14

jreback changed the title ~~Add sparse option in pivot_table~~ ENH: Add sparse option in pivot_table / pivot / unstack Oct 26, 2016

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Sparse (Sparse Data Type), and Difficulty Advanced labels Oct 26, 2016

jreback added this to the Next Major Release milestone Oct 26, 2016

yupbank mentioned this issue Oct 27, 2016

ENH : First attempt to add sparse_pivot #14510

Closed

4 tasks

jreback mentioned this issue Jan 3, 2017

stack() method of SparseDataFrame should return a SparseSeries and optimize memory usage #15045

Closed

jnothman mentioned this issue Jan 15, 2018

ENH: Sparse DataFrame.pivot, pd.pivot_table, Series.unstack #19241

Closed

jbrockmendel removed Difficulty Advanced labels Oct 21, 2019

mroeschke added the Enhancement label Apr 20, 2020

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
