Add DB-API 2.0 cursor support to pandas DataFrame constructor `data` parameter #54376
@@ -1219,6 +1219,35 @@ cdef bint c_is_list_like(object obj, bint allow_sets) except -1:
)

def is_cursor(obj: object) -> bool:
You can just move this to the `sql.py` file itself; I don't think it would be used elsewhere.
The intent is that it would be used by the `DataFrame` constructor after checking whether the data is list-like (see the `pandas/core/frame.py` diff). I realize that is a bit invasive for such a special case, but it's the place where it is most generally useful for this use case.
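This excerpt does not show the body of `is_cursor`. The following is a hypothetical sketch only. It is a duck-typed check against the PEP 249 cursor interface, and the set of attributes it checks (`description`, `execute`, `fetchall`) is an assumption, not the PR's code:

```python
import sqlite3


def is_cursor(obj: object) -> bool:
    """Hypothetical duck-typed test for a PEP 249 (DB-API 2.0) cursor.

    PEP 249 cursors expose a ``description`` attribute plus ``execute`` and
    ``fetchall`` methods. Checking those instead of a concrete class keeps
    the test independent of any particular driver.
    """
    return (
        hasattr(obj, "description")
        and callable(getattr(obj, "execute", None))
        and callable(getattr(obj, "fetchall", None))
    )


conn = sqlite3.connect(":memory:")
cur = conn.execute("SELECT 1 AS id, 'a' AS name")

print(is_cursor(cur))         # True
print(is_cursor(conn))        # False: sqlite3 connections have execute() but no description
print(is_cursor([(1, "a")]))  # False: an ordinary list of rows
```

A plain list of tuples fails the check, so such a test would only change behavior for objects that look like cursors. A driver that omits any of these attributes would not be detected.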
@@ -131,6 +131,42 @@ def shape(self):
return self._values.shape

class MockDBCursor:
Do we really need this or can we just set up a test that assigns this to the cursor?
Not sure what you mean, but I could drop this overly specific test and add a test alongside the rest of the database tests. That would be a more real-world use of the feature.
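A test that uses a real cursor instead of a mock could look roughly like the sketch below. It assumes an in-memory sqlite3 database, and the helper and test names are hypothetical. It pins today's behavior: the constructor already consumes a cursor as a generic iterable of rows, but the rows are plain tuples, so the column labels are positional. The names are only available from `cursor.description`.

```python
import sqlite3

import pandas as pd


def make_cursor() -> sqlite3.Cursor:
    """Hypothetical fixture: a cursor over a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
    return conn.execute("SELECT id, name FROM t ORDER BY id")


def test_dataframe_from_sqlite3_cursor() -> None:
    cur = make_cursor()
    # Column names live in the cursor's description (name at index 0).
    names = [col[0] for col in cur.description]
    # The cursor is iterable, so the constructor turns it into a list of
    # tuple rows; plain tuples carry no names, so labels default to 0, 1.
    df = pd.DataFrame(cur)
    assert names == ["id", "name"]
    assert df.columns.tolist() == [0, 1]
    assert df.values.tolist() == [[1, "a"], [2, "b"]]
```

If the proposed feature landed, the positional-label assertion is the one that would flip to `["id", "name"]`.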
Is there an issue discussing this feature? We usually require an issue where enhancements are discussed with buy-in from the core team.
At first brush, I think the workaround seems adequate, and I'm not sure the DataFrame constructor needs to support this.
@mroeschke There isn't an open issue. I didn't realize that opening an issue was required before submitting a PR.
Just a general question: why this as opposed to
All of the pandas integrations with databases require SQLAlchemy. Supporting plain DB-API cursors in the
I guess my main motivation is consistency between database clients for our customers. For example, you can do
My particular issue is with returning StructSequences (because they are much faster than namedtuples), but unfortunately, even though they are advertised as equivalent to namedtuples, they don't have a
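The column-name behavior under discussion can be checked directly. The small sketch below uses `time.struct_time` as an example StructSequence; the comment does not say which StructSequence type the author's client returns.

```python
import time
from collections import namedtuple

import pandas as pd

Row = namedtuple("Row", ["id", "name"])

# namedtuple rows: pandas takes the column labels from Row._fields.
named = pd.DataFrame([Row(1, "a"), Row(2, "b")])

# Plain tuples carry no names, so the labels are positional.
plain = pd.DataFrame([(1, "a"), (2, "b")])

# A StructSequence such as time.struct_time exposes named attributes
# (tm_year, ...) but has no _fields, so it is treated like a plain tuple.
st = time.gmtime(0)
structseq = pd.DataFrame([st])

print(named.columns.tolist())      # ['id', 'name']
print(plain.columns.tolist())      # [0, 1]
print(hasattr(st, "_fields"))      # False
print(structseq.columns.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```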
I missed the overall goal of this on initial review, but I'm also not really sure we want the DataFrame constructor to support this. The fact that you can call the DataFrame constructor on a SQLAlchemy object is just an implementation detail at the moment, right? I don't recall that being a documented feature, and I think we should just stick to the `read_sql` interface.
SQLAlchemy uses namedtuples in its results, and pandas looks for a
Here's another idea. Rather than having something specific to the DB-API, how about adding support for some other arbitrary attribute / property on the
While the official stance is that DBAPI2 connections (other than sqlite3) are untested (and raise a UserWarning), they work fine as long as you're using a connection that returns a list of lists/tuples (dict cursors don't work since 2.0; xref #53028). In your case, you can run
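The exact command suggested in that comment is not preserved here. As a sketch of the DBAPI2 path it refers to, the example uses sqlite3, which is the one DBAPI2 connection pandas supports directly. Per the comment, other drivers' connections can be passed the same way but emit a UserWarning.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

# read_sql takes the column names from the query result, so nothing
# has to be copied out of cursor.description by hand.
df = pd.read_sql("SELECT id, name FROM t ORDER BY id", conn)
print(df.columns.tolist())  # ['id', 'name']
```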
Thanks for the PR, but it appears there isn't sufficient enthusiasm for this feature yet. I would recommend opening an issue first so this can be discussed more thoroughly before moving forward with a PR, so closing.
The current `DataFrame` constructor accepts many kinds of objects as `data`, including a list-like object whose elements are themselves list-like rows. Most DB-API clients support iterating over a cursor object to download result rows from a database. If the rows are `namedtuple`s, the column names are extracted from the `_fields` attribute of the `namedtuple`, but plain tuples and StructSequences are left without column names. This change adds support for detecting cursor objects and extracting the column names from the cursor when they haven't been specified explicitly.
Output:
The current way of doing this is:
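The original snippet is not preserved in this excerpt. A common DB-API pattern for this, sketched with sqlite3 and a hypothetical helper name, passes the names from `cursor.description` explicitly:

```python
import sqlite3
from typing import Any

import pandas as pd


def frame_from_cursor(cur: Any) -> pd.DataFrame:
    """Build a DataFrame from an executed DB-API cursor (hypothetical helper)."""
    # PEP 249: each description entry is a 7-item sequence whose first item
    # is the column name. Read the names before fetching the rows.
    columns = [col[0] for col in cur.description]
    return pd.DataFrame(cur.fetchall(), columns=columns)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
cur = conn.execute("SELECT id, name FROM t ORDER BY id")

df = frame_from_cursor(cur)
print(df.columns.tolist())  # ['id', 'name']
```

This is the step the proposed constructor change would fold into `pd.DataFrame(cur)`.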
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.