Error trying to serialize a binary file handle #617

Open
neucer opened this issue Sep 3, 2023 · 3 comments

@neucer

neucer commented Sep 3, 2023

With fmode set to FILE_FMODE

import dill
dill.settings["fmode"] = dill.FILE_FMODE
with open('some_binary_file', 'rb') as file_handle:
    with open(f'some_binary_file.pkl', 'wb') as pkl_file:
        dill.dump(file_handle, pkl_file)

gives the following error

Traceback (most recent call last):
  File "C:\Users\neucer\PycharmProjects\pythonProject\tst.py", line 23, in <module>
    dill.dump(file_handle, pkl_file)
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 235, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 394, in dump
    StockPickler.dump(self, obj)
  File "C:\Program Files\Python310\lib\pickle.py", line 487, in dump
    self.save(obj)
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 388, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "C:\Program Files\Python310\lib\pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 1336, in save_file
    f = _save_file(pickler, obj, open)
  File "C:\Users\neucer\PycharmProjects\pythonProject\venv310\lib\site-packages\dill\_dill.py", line 1313, in _save_file
    fdata = f.read()
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1133: character maps to <undefined>

because the code tries to open the binary file as text.
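
The decode failure described above can be reproduced without dill. The following is a minimal sketch, assuming only the standard library; the scratch path and the helper names `read_as_text` and `read_as_bytes` are hypothetical and are not part of dill. It writes the byte `0x81` to a file, then reads it back in text mode (with the two codecs that appear in this thread's tracebacks) and in binary mode.

```python
import os
import tempfile

# Hypothetical scratch file holding the single byte that triggers the error.
path = os.path.join(tempfile.mkdtemp(), "xxx.db")
with open(path, "wb") as fh:
    fh.write(b"\x81")


def read_as_text(p, encoding):
    """Read p in text mode; return the text, or the UnicodeDecodeError raised."""
    try:
        with open(p, "r", encoding=encoding) as fh:
            return fh.read()
    except UnicodeDecodeError as exc:
        return exc


def read_as_bytes(p):
    """Read p in binary mode, so no codec is involved."""
    with open(p, "rb") as fh:
        return fh.read()


utf8_result = read_as_text(path, "utf-8")      # decoding fails on 0x81
cp1252_result = read_as_text(path, "cp1252")   # 0x81 is unmapped in cp1252
raw = read_as_bytes(path)                      # the byte comes back untouched

print(type(utf8_result).__name__, type(cp1252_result).__name__, raw)
```

Text mode fails under both codecs while binary mode returns the byte unchanged, which is consistent with the reading that the file is being reopened as text somewhere along the way.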

@mmckerns
Member

mmckerns commented Sep 4, 2023

The following seems to work for me with Python 3.8 and the latest dill:

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.settings['fmode'] = dill.FILE_FMODE
>>> f = open('xxx.db', 'wb')
>>> f.write(b'hello world')
11
>>> f.write(b'goodbye')
7
>>> f.close()
>>> f = open('xxx.db', 'rb')
>>> dill.dumps(f)
b'\x80\x04\x95a\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x12_create_filehandle\x94\x93\x94(\x8c\x06xxx.db\x94\x8c\x02rb\x94K\x00\x89\x8c\x02io\x94\x8c\x04open\x94\x93\x94\x89K\x02\x8c\x12hello worldgoodbye\x94t\x94R\x94.'
>>> dill.__version__
'0.3.8.dev0'

This also works:

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.settings['fmode'] = dill.FILE_FMODE
>>> with open('xxx.db', 'rb') as file_handle:
...   dill.dumps(file_handle)
... 
b'\x80\x04\x95a\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\x12_create_filehandle\x94\x93\x94(\x8c\x06xxx.db\x94\x8c\x02rb\x94K\x00\x89\x8c\x02io\x94\x8c\x04open\x94\x93\x94\x89K\x02\x8c\x12hello worldgoodbye\x94t\x94R\x94.'
>>>

and this also works:

Python 3.8.17 (default, Jun 11 2023, 01:54:00) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.settings['fmode'] = dill.FILE_FMODE
>>> with open('xxx.db', 'rb') as file_handle:
...   with open('xxx.pkl', 'wb') as pkl_file:
...     dill.dump(file_handle, pkl_file)
... 
>>> with open('xxx.pkl', 'rb') as pkl_file:
...   dill.load(pkl_file)
... 
<_io.BufferedReader name='xxx.db'>

What is your version of dill? It looks like you are using Python 3.10 on Windows. Can you confirm that, and give any further details? If my test code succeeds for you, can you provide an example binary file that fails so I can test it?

@neucer
Author

neucer commented Sep 5, 2023

Yes, I confirm. But the difference is probably the file. Try this:

import dill
dill.settings['fmode'] = dill.FILE_FMODE
f = open('xxx.db', 'wb')
f.write(b'\x81')
f.close()
f = open('xxx.db', 'rb')
dill.dumps(f)

@mmckerns
Member

mmckerns commented Sep 6, 2023

I can reproduce the error with that code, thanks.

It would seem that the error is pretty self-contained, as it attempts to pickle the file and immediately fails in the registered function _save_file.

...
>>> dill.detect.trace(True)
>>> dill.dumps(f)
┬ Fi: <_io.BufferedReader name='xxx.db'>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 278, in dumps
    dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 250, in dump
    Pickler(file, protocol, **_kwds).dump(obj)
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 418, in dump
    StockPickler.dump(self, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pickle.py", line 487, in dump
    self.save(obj)
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 412, in save
    StockPickler.save(self, obj, save_persistent_id)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 1362, in save_file
    f = _save_file(pickler, obj, open)
  File "/Users/mmckerns/lib/python3.8/site-packages/dill/_dill.py", line 1339, in _save_file
    fdata = f.read()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
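
Until the bug is fixed, one possible workaround is to avoid pickling the handle itself and instead pickle a snapshot of its state, read in binary mode. This is only a sketch under stated assumptions: `snapshot_binary_handle` and `restore_as_bytesio` are hypothetical helpers, not dill functions, and the restored object is an in-memory `io.BytesIO` rather than a real file handle.

```python
import io
import os
import pickle
import tempfile


def snapshot_binary_handle(handle):
    """Capture name, mode, position and raw bytes of an open binary file.

    Hypothetical helper: the file is reopened with 'rb', so the contents
    are never passed through a text codec.
    """
    position = handle.tell()
    with open(handle.name, "rb") as src:
        data = src.read()
    return {"name": handle.name, "mode": handle.mode,
            "position": position, "data": data}


def restore_as_bytesio(snapshot):
    """Rebuild an in-memory stream positioned where the original handle was."""
    buf = io.BytesIO(snapshot["data"])
    buf.seek(snapshot["position"])
    return buf


# Hypothetical scratch file starting with the problematic byte.
path = os.path.join(tempfile.mkdtemp(), "xxx.db")
with open(path, "wb") as fh:
    fh.write(b"\x81hello")

with open(path, "rb") as fh:
    fh.read(1)  # advance past the first byte before taking the snapshot
    payload = pickle.dumps(snapshot_binary_handle(fh))

restored = restore_as_bytesio(pickle.loads(payload))
remaining = restored.read()  # bytes after the saved position
```

The trade-off is that the unpickled object no longer refers to a file on disk, so this only suits cases where the contents and read position are what matter.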

mmckerns added the bug label Sep 6, 2023