Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ExcelWriter() to add sheets to existing workbook #3441

Closed
ligon opened this issue Apr 23, 2013 · 41 comments · Fixed by #4933 or #21251
Closed

Allow ExcelWriter() to add sheets to existing workbook #3441

ligon opened this issue Apr 23, 2013 · 41 comments · Fixed by #4933 or #21251
Labels
Enhancement IO Excel read_excel, to_excel
Milestone

Comments

@ligon
Copy link

ligon commented Apr 23, 2013

The ability of ExcelWriter to save different dataframes to different worksheets is great for sharing those dfs with the python-deficient. But this quickly leads to a need to add worksheets to an existing workbook, not just creating one from scratch; something like:

df0=pd.DataFrame(np.arange(3))
df0.to_excel('foo.xlsx','Data 0')

df1=pd.DataFrame(np.arange(2))
df1.to_excel('foo.xlsx','Data 1')

The following little diff to io/parsers.py implements this behavior for *.xlsx files:

diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
index 89f892d..7f010ee 100644
--- a/pandas/io/parsers.py
+++ b/pandas/io/parsers.py
@@ -2099,12 +2099,19 @@ class ExcelWriter(object):
             self.fm_date = xlwt.easyxf(num_format_str='YYYY-MM-DD')
         else:
             from openpyxl.workbook import Workbook
-            self.book = Workbook()  # optimized_write=True)
-            # open pyxl 1.6.1 adds a dummy sheet remove it
-            if self.book.worksheets:
-                self.book.remove_sheet(self.book.worksheets[0])
+            from openpyxl.reader.excel import load_workbook
+
+            try:
+               self.book=load_workbook(filename = path)
+               self.sheets={wks.title:wks for wks in self.book.worksheets}
+            except InvalidFileException:
+                self.book = Workbook()  # optimized_write=True)
+                # open pyxl 1.6.1 adds a dummy sheet remove it
+                if self.book.worksheets:
+                    self.book.remove_sheet(self.book.worksheets[0])
+                self.sheets = {}
+
         self.path = path
-        self.sheets = {}
         self.cur_sheet = None

Doing this for *.xls files is a little harder.

@ligon ligon closed this as completed Apr 23, 2013
@ligon ligon reopened this Apr 23, 2013
@jreback
Copy link
Contributor

jreback commented Sep 22, 2013

@jtratner is this still a bug/needed enhancement?

@jtratner
Copy link
Contributor

Because of how to_excel is set up, this would mean reading in and then writing the file each time (because to_excel with a path argument saves the file). The right way to do this is to use ExcelWriter:

import pandas as pd
writer = pd.ExcelWriter('foo.xlsx')
df.to_excel(writer, 'Data 0')
df.to_excel(writer, 'Data 1')
writer.save()

I could see (eventually) adding an option to ExcelWriter that doesn't overwrite the file. But, yet again, that may mean writing in the entire file first. I don't know.

@ghost ghost assigned jtratner Sep 22, 2013
@jtratner
Copy link
Contributor

I'm going to add something to the docs about this, maybe a test case with this, and I'll look into adding an option to read in the file, but it depends on how xlwt and openpyxl work.

@jreback
Copy link
Contributor

jreback commented Sep 22, 2013

@jtratner what about a context manager get_excel?

with get_excel('foo.xlsx') as e:
    df.to_excel(e,'Data 0)
    df.to_excel(e,'Data 1)

?

@jtratner
Copy link
Contributor

how about we just make ExcelWriter into a contextmanager instead? it'll just call save at the end. Much simpler.

@jtratner
Copy link
Contributor

@ligon you can do this now this way:

with ExcelWriter('foo.xlsx') as writer:
    df.to_excel(writer, 'Data 0')
    df2.to_excel(writer, 'Data 1')

If you don't use the with statement, just have to call save() at the end.

@ligon
Copy link
Author

ligon commented Sep 23, 2013

Excellent. And great that it has an exit method.

Thanks,
-Ethan Ligon

Ethan Ligon, Associate Professor
Agricultural & Resource Economics
University of California, Berkeley

@dylancis
Copy link

I was extremely interesting by the request made by @ligon - but seems this is already there.
However using 0.12.0 pd version, when I am doing:
df = DataFrame([1,2,3])
df2 = DataFrame([5,5,6])
with ExcelWriter('foo.xlsx') as writer:
df.to_excel(writer, 'Data 0')
df2.to_excel(writer, 'Data 1')

Assumning foo.xlsx was containing a sheet named 'bar', basgot delete after the command run. While as per your comment, i was expecting to keep it in my foo excel file. Is that a bug?

@frenet
Copy link

frenet commented Apr 4, 2014

is it hard to add sheets to an existing excel file on the disk?
import pandas as pd
import numpy as np
a=pd.DataFrame(np.random.random((3,1)))
excel_writer=pd.ExcelWriter('c:\excel.xlsx')
a.to_excel(excel_writer, 'a1')
excel_writer.save()

excel_writer=pd.ExcelWriter('c:\excel.xlsx')
a.to_excel(excel_writer, 'a2')
excel_writer.save()

here only sheet 'a2" is save, but I like to save both 'a1' and 'a2'.

I know it is possible to add sheets to an existing workbook.

@jtratner
Copy link
Contributor

jtratner commented Apr 4, 2014

It's definitely possible to add sheets to an existing workbook, but it's
not necessarily easy to do it with pandas. I think you'd have to read the
workbook separately and then pass it into the ExcelWriter class... That
would be something we could consider supporting.

@jtratner
Copy link
Contributor

jtratner commented Apr 4, 2014

And I think if you subclass the ExcelWriter instance you want to use and
overwrite its__init__ method, as long as you set self.book it should work.
That said, no guarantee that this would continue to work in future
versions, since it's only a quasi-public API

@ankostis
Copy link

This stackoverflow workaround, which is based in openpyxl, may work
(EDIT: indeed works, checked with pandas-0.17.0):

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

@jreback
Copy link
Contributor

jreback commented Dec 18, 2015

this would be pretty easy to implement inside ExcelWriter (Oder patch above)

prob just need to add a mode=kw and default to w and make a be append

@pylang
Copy link

pylang commented May 11, 2016

Was this ever patched?

@jreback
Copy link
Contributor

jreback commented May 11, 2016

it seems that you can work around it (see above), but I suppose would be nice to actually do it from pandas.

@andreacassioli
Copy link

Hi, any follow up on this issue?
I can provide a use case: I have excel files with pivot tables and pivot graphs that I need to reach out people not proficient in Python.

My idea was to use pandas to add a sheet that contains the data for the pivot. But up to know I am stuck and the proposed workaround, thought not difficult, sounds a bit cumbersome . It would make sense to jsut have an option whether overwrite an existing file.

@zeluspudding
Copy link

zeluspudding commented Dec 11, 2016

Let me echo @jreback , it would be super nice if I could just add a sheet to an excel workbook from pandas.

@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.13, Someday Dec 11, 2016
@jorisvandenbossche
Copy link
Member

To be clear, we agree that this would be a nice functionality, and would certainly welcome a contribution from the community.
Best way to have this in pandas is to make a PR!

@aa3222119
Copy link

@jmcnamara how to use pandas.to_excel(Writer maybe use pandas.ExcelWriter) to add some data to an existed file , but not rewrite it??

@jorisvandenbossche
Copy link
Member

@aa3222119 That is exactly what this issue is about: an enhancement request to add this ability, which is not yet possible today.

(BTW, it is not needed to post the same question in multiple issues)

@aa3222119
Copy link

sorry. i will delete that one . BTW, will that be possible some day later? @jorisvandenbossche

@jmcnamara
Copy link
Contributor

BTW will that be possible some day later? @jmcnamara

This isn't and won't be possible when using XlsxWriter as the engine. It should be possible when using OpenPyXL. I've seen some examples on SO like this one: jmcnamara/excel-writer-xlsx#157

@aa3222119
Copy link

3Q very much! @jmcnamara
it is exactly what you said . use openpyxl 👍

import pandas as pd
from openpyxl import load_workbook
book = load_workbook('text.xlsx')
writer = pd.ExcelWriter('text.xlsx', engine='openpyxl')
writer.book = book
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet111',startrow=7,startcol=7)
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet123',startrow=0,startcol=0)
writer.save()
pd.DataFrame(userProfile,index=[1]).to_excel(writer,'sheet123',startrow=3,startcol=3)
writer.save()

all can be added to text.xlsx.
https://github.com/pandas-dev/pandas/issues/3441

@Themanwithoutaplan
Copy link
Contributor

@jgonzale
Copy link

jgonzale commented Mar 9, 2017

@ankostis, @aa3222119 when I follow the steps you comment, I always reach the following error:

Traceback (most recent call last):
File "./name_manipulation.py", line 60, in
df.to_excel(excel_writer, 'iadatasheet', startcol=0, startrow=5, columns=['codes', 'Zona Basica de Salud', month+"-"+year], index=False)
File "/Users/jgonzalez.iacs/Projects/SIIDI/PYTHON_ETLs/venv/lib/python3.4/site-packages/pandas/core/frame.py", line 1464, in to_excel
startrow=startrow, startcol=startcol)
File "/Users/jgonzalez.iacs/Projects/SIIDI/PYTHON_ETLs/venv/lib/python3.4/site-packages/pandas/io/excel.py", line 1306, in write_cells
wks = self.book.create_sheet()
AttributeError: 'str' object has no attribute 'create_sheet'

So, there is not solution yet, right?

Thanks

@ankostis
Copy link

ankostis commented Mar 9, 2017

Maybe the API has changed - it definitely worked back then.

@Themanwithoutaplan
Copy link
Contributor

@jgonzale which engine are you using?

@aa3222119
Copy link

@jgonzale by what python said , your excel_writer.book maybe just a str but not a workbook?

@jgonzale
Copy link

@aa3222119 Oh geez! You were right! Messing around with very similar names!

Thank you very much! 👏 👏 👏

@wxl3322335
Copy link

thank you very much!

@BLMeltdown
Copy link

Hello
I have some use case where it would be useful:
Even with the ExcelWriter trick as:

with ExcelWriter('foo.xlsx') as writer:
df.to_excel(writer, 'Data 0')
df2.to_excel(writer, 'Data 1')

you can't add a plot that you need without saving the file and reopening it. With the risk of meddling with any formatting you have in the workbook.

There is indeed the workaround to use the plotting functions from pandas to save these in the files, but (there is a but), when you need something a little more sophisticated like showing a PCA components graph you built from scikitlearn PCA and matplotlib, then it becomes tedious.

Hence
a pandas.nondf_manager (non df object or filename).to_excel(usual syntax)
would be exceedingly fine.
Thanks.

@orbitalz
Copy link

I don't know how it is possible, however, it works for me

create_excel = 0
if plot_spectra != 0:
    for x in range(min_sigma, max_sigma, step_size):
        # apply gaussian
        df1 = gaussian_filter(df, sigma=x, mode=padding_mode)
        df2 = pd.DataFrame(df1)
        if save_file:
            if save_csv:
                df2.to_csv('{} {}{}.csv'.format(Output_file, 'sigma_', x,))
            if save_xlsx:
                if os.path.isfile('{}.xlsx'.format(Output_file)):
                    print("Warning! Excel file is exist")
                    break
                if create_excel == 0:
                    xlsx_writer = pd.ExcelWriter('{}.xlsx'.format(Output_file), engine='xlsxwriter')                                          
                    create_excel += 1
                df2.to_excel(xlsx_writer, '{}{}'.format('sigma_', x))
                if x == max_sigma-1:
                    xlsx_writer.save()

At the end, I got the excel file which have several work sheets.

@jorisvandenbossche
Copy link
Member

@orbitalz you are creating an excel file the first time (xlsx_writer = pd.ExcelWriter(..)), and then adding multiple sheets to that file object. That is supported, but this issue is about adding sheets to an existing excel file.

@orbitalz
Copy link

orbitalz commented Aug 1, 2017

I'm sorry for misunderstanding the topic and Thank you for pointing me out :)

@tlysecust
Copy link

@orbitalz You solve my problem ,but I don't known how it works

@jreback jreback modified the milestones: Someday, 0.24.0 Jun 19, 2018
@ivoska
Copy link

ivoska commented Sep 23, 2019

mode={'a'} does not work as the documentation suggests
this is still a buggy mess

@codewithpatch
Copy link

Appending in the existing worksheet seems to work with
writer = pd.ExcelWriter('filename.xlsx', mode='a')

But, this only appends and does not overwrite sheets with the same sheetname

Example, my existing workbook has a sheetname 'mySheet'
If I try to do:
df.to_excel(writer, 'mySheet')
It will create a new sheet 'mySheet1' instead of rewriting the existing 'mySheet'

I wonder if there's any other way to append in the existing workbook, but overwriting sheets that you want to overwrite.

Hope someone helps.

@anvesha-nextsteps
Copy link

By using openpyxl as engine in ExcelWriter
writer = pd.ExcelWriter(filename, engine='openpyxl')
df.to_excel(writer, sheet_name)
at writer.save() i am getting this error
TypeError: got invalid input value of type <class 'xml.etree.ElementTree.Element'>, expected string or Element

@irishun
Copy link

irishun commented Mar 15, 2020

By using openpyxl as engine in ExcelWriter
writer = pd.ExcelWriter(filename, engine='openpyxl')
df.to_excel(writer, sheet_name)
at writer.save() i am getting this error
TypeError: got invalid input value of type <class 'xml.etree.ElementTree.Element'>, expected string or Element

I have met the same error. Has anyone solved this issue?

@LittleMoDel
Copy link

LittleMoDel commented Mar 20, 2020

engine should change to openyxl,because the default engine'xlsxwriter' NOT support append mode !

`
import pandas as pd

df= pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'], 'value': [1, 2, 3, 5]})

#engine should change to openyxl,because the default engine'xlsxwriter' NOT support append mode !

writer = pd.ExcelWriter('exist.xlsx',mode='a',engine='openpyxl')

df.to_excel(writer, sheet_name ='NewSheet')

writer.save()

writer.close()

`


Pandas chooses an Excel writer via two methods:

the engine keyword argument
the filename extension (via the default specified in config options)
By default, pandas uses the XlsxWriter for .xlsx, openpyxl for .xlsm, and xlwt for .xls files. If you have multiple engines installed, you can set the default engine through setting the config options io.excel.xlsx.writer and io.excel.xls.writer. pandas will fall back on openpyxl for .xlsx files if Xlsxwriter is not available.

To specify which writer you want to use, you can pass an engine keyword argument to to_excel and to ExcelWriter. The built-in engines are:

  • openpyxl: version 2.4 or higher is required
  • xlsxwriter
  • xlwt

@macifTest
Copy link

macifTest commented Jan 31, 2022

Hello,
I have an issue with the use of Pandas + ExcelWriter + load_workbook.
My need is to be able to modify data from an existing excel file (without deleting the rest).
It works partly, but when I check the size of the produced file and the original one the size is quite different.
Moreover, it seems to lack some properties. Which leads to an error message when I want to integrate the modified file into an application.
The code bellow :

data_filtered = pd.DataFrame([date, date, date, date], index=[2,3,4,5])
book = openpyxl.load_workbook(file_origin)
writer = pd.ExcelWriter(file_modif, engine='openpyxl',datetime_format='dd/mm/yyyy hh:mm:ss', date_format='dd/mm/yyyy')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_filtered.to_excel(writer, sheet_name="PCA pour intégration", index=False, startrow=2, startcol=5, header=False, verbose=True)
writer.save()`

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Excel read_excel, to_excel
Projects
None yet