Retrieve a query as a NumPy structured array #1156
Pinging the PR to see if there is anything I can help with to merge this feature.
94673fe to ad272dd
I ran the unit tests with a debug version of Python and it complained that the Unicode string I was building was invalid. The original code was a modified version of an older Python tuple repr implementation, so I looked at doing that again. However, cpython now uses an internal _PyUnicode_Writer class we don't have access to, so I'm cheating by creating a tuple. Since repr should not be in the critical path of most performance sensitive DB jobs, this will do for now.
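The "create a tuple and reuse its repr" idea can be illustrated at the Python level. This is only a sketch: the `Row` class below is a hypothetical stand-in, not pyodbc's actual C++ implementation.

```python
class Row:
    """Hypothetical stand-in for a database row; not pyodbc's real Row type."""

    def __init__(self, *values):
        self._values = list(values)

    def __repr__(self):
        # Copy the values into a tuple and reuse the tuple's repr
        # instead of assembling the string piece by piece.
        return repr(tuple(self._values))


row = Row(1, "abc", None)
print(repr(row))  # (1, 'abc', None)
```

The extra tuple costs an allocation per call, which matches the trade-off described above: acceptable because repr is rarely on a hot path.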
It is easier to build debug versions of Python now.
The "raw" encoding was Python 2.7 only. I originally created ODBCCHAR to replace SQLWCHAR because unixODBC would define it as wchar_t even when that was 4 bytes. The data in the buffer of 4-byte wchar_t's was still 2-byte data. Now I've just simplified to uint16_t. I added this to HACKING.md. Deleted Tuple wrapper. Use the Object wrapper and the PyTuple_ functions. This is to prepare for possibly using the ABI, which would not allow me to access the internal item pointers directly, so I could not use operator[] to set items. (Python has __getitem__ and __setitem__, but to overload __setitem__ in C++ you can only return a reference to the internal data.)
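To illustrate why 16-bit units are the natural type for this data, here is a small Python sketch that treats a buffer as a sequence of 2-byte code units and decodes it as UTF-16. The function name is made up for this example, and little-endian byte order is an assumption; none of this is pyodbc code.

```python
import struct


def decode_code_units(code_units):
    """Decode a sequence of 16-bit code units as UTF-16 (hypothetical helper)."""
    # Pack each unit into exactly 2 bytes (little-endian assumed),
    # regardless of how wide the platform's wchar_t happens to be.
    raw = struct.pack("<%dH" % len(code_units), *code_units)
    return raw.decode("utf-16-le")


# "hi" followed by one character outside the BMP (a surrogate pair).
units = [0x0068, 0x0069, 0xD83D, 0xDE00]
text = decode_code_units(units)
print(len(units), len(text))  # 4 3
```

Four code units produce three characters because the last two units form a single surrogate pair.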
Used subprocess in setup.py to eliminate warnings about the process still running. Removed connect() ansi parameter. Updated SQLWChar to allow it to be declared on the stack and initialized later. Turned into a class with an operator to convert to SQLWCHAR*.
Somehow I lost some changes.
This is a fix for GitHub security advisory GHSA-pm6v-h62r-rwx8. The old code had a hardcoded buffer of 100 bytes (and a comment asking why it was hardcoded!), and fetching a decimal with more than 100 digits would cause a buffer overflow. Author arturxedex128 supplied very simple code to reproduce the error, which was added to the 3 PostgreSQL unit tests as test_large_decimal. (Thank you arturxedex128!) Unfortunately the strategy is still that we have to parse decimals, but now Python strings / Unicode objects are used, so there is no arbitrary limit.
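The contents of test_large_decimal are not shown here, so the following is only a Python-level sketch of the underlying point: parsing decimal text through a string has no fixed-size scratch buffer to overflow. The helper name and the 150-digit value are assumptions chosen for illustration.

```python
from decimal import Decimal


def parse_decimal_text(text):
    """Parse decimal text of any length (hypothetical helper)."""
    # Decimal builds the value exactly from the string, however long it is.
    return Decimal(text)


# A value with well over 100 digits, the size of the old hardcoded buffer.
digits = "1" * 150
value = parse_decimal_text(digits + ".5")
print(len(value.as_tuple().digits))  # 151
```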
I have not ported this code path and I'm not as familiar with it as I need to be. To allow me to complete porting and testing the rest, I've temporarily commented it out. I will look into consolidating the binding for the two code paths. Also, I'd like to consider renaming it to "array binding" or "row wise binding" instead of "fast executemany". While the latter does tell us the goal of it, it is too generic. For one thing, what if we wanted to supply both row- and column-wise binding -- they are both "fast".
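The naming question is easier to see with the two data layouts side by side. This sketch only shows how the same parameter sets look when arranged row-wise versus column-wise; it makes no ODBC calls, and the helper name is hypothetical.

```python
def to_columns(rows):
    """Transpose row-wise parameter sets into column-wise arrays (hypothetical helper)."""
    # zip(*rows) groups the first item of every row, then the second, and so on.
    return [list(column) for column in zip(*rows)]


# Row-wise: one tuple per parameter set, as passed to executemany().
rows = [(1, "a"), (2, "b"), (3, "c")]

# Column-wise: one array per parameter marker.
columns = to_columns(rows)
print(columns)  # [[1, 2, 3], ['a', 'b', 'c']]
```

Either layout can be bound as an array, which is why a name describing the layout says more than "fast".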
I'm not sure where the minor fixes came from, like PyEvel_ -> PyObject. I'll need to test with older 3.x versions. I am going to use the test file naming convention of xxx_tests.py to make it easier to use tab completion in shells and editors.
I accidentally deleted it. It is required for simple local pytest. (See comment in the file.)
I'm porting the tests one at a time and want to ensure the ones ported are successful.
I've only tested on Linux so far. Next step is to get the Windows tests working on a local machine and/or AppVeyor.
I uncommented some sections and the indentation was off. Perhaps it had tabs.
I missed a version. While in there, I simplified this code and used the year as an int and consolidated the "not SQL Server" code into the _get_sqlserver_year function.
I also added flake8 and pylint to the dev requirements.
This commit modifies the `params.cpp` file to check if the given iterable has items in it when using an empty custom iterable object. This way, when executing the code below

```python
import collections.abc
import pyodbc

class MySequence(collections.abc.Sequence):
    def __getitem__(self, index):
        raise Exception

    def __len__(self):
        return 1

# `connection` is an already open pyodbc connection.
connection.execute("SELECT ?, ?", 123, MySequence()).fetchone()
```

a Python exception is returned, instead of a segfault.
...which have never worked. Maybe this will work with CIBUILDWHEEL eventually but until it does, drop them.
acc14a0 to 97da475
@mkleehammer this has been rebased against the |
Merge conflict flags were kept in `.github/workflows/ubuntu_build.yml` for some reason. This commit removes them.
@mkleehammer you will need to close out this PR so I can make a different one against the |
@mkleehammer this PR can be closed in favor of #1270 where I am still working on updating tests.
@ilanschnell You can close this PR, see #1270 for a comment about why this can be closed. @mkleehammer if Ilan does not close out this PR, you can close it out as you clean out stale PRs.
In this PR, we add a `.fetchdictarray()` method to the `pyodbc.Cursor` object. This adds numpy as an optional build and runtime dependency. The extension `src/npcontainer.cpp` is compiled only when numpy is available at build time. In addition, `WITH_NUMPY` will be defined so that `src/cursor.cpp` can add the method and `src/pyodbcmodule.cpp` can initialize numpy on import.

Here is the docstring of the `.fetchdictarray()` method:

Note: The code is based on https://github.com/ContinuumIO/TextAdapter (which was released in 2017 by Anaconda, Inc. under the BSD license). The original authors of the numpy container are Francesc Alted and Oscar Villellas.
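The optional build dependency described above could be wired up roughly as follows. This is a sketch under assumptions, not the PR's actual `setup.py`: the helper name `numpy_build_settings` is hypothetical, and the source list is limited to the files named in this description.

```python
import importlib.util


def numpy_build_settings():
    """Return (sources, define_macros, include_dirs) for the extension.

    Hypothetical helper; the real build script may be organized differently.
    """
    # Partial source list: only the files mentioned in the PR description.
    sources = ["src/cursor.cpp", "src/pyodbcmodule.cpp"]
    define_macros = []
    include_dirs = []

    # Only add the numpy container when numpy is importable at build time.
    if importlib.util.find_spec("numpy") is not None:
        import numpy

        sources.append("src/npcontainer.cpp")
        define_macros.append(("WITH_NUMPY", "1"))
        include_dirs.append(numpy.get_include())

    return sources, define_macros, include_dirs


sources, define_macros, include_dirs = numpy_build_settings()
print(sources, define_macros)
```

The three return values are shaped to match the `sources`, `define_macros`, and `include_dirs` arguments of a setuptools `Extension`.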