Add documentation for arrow
MathMagique committed Jun 5, 2017
1 parent 91f092f commit 137e42f
Showing 4 changed files with 63 additions and 4 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -15,7 +15,8 @@ that use databases for which no efficient native Python drivers are available.

For maximum compatibility, turbodbc complies with the
[Python Database API Specification 2.0 (PEP 249)](https://www.python.org/dev/peps/pep-0249/).
-For maximum performance, turbodbc offers built-in [NumPy](http://www.numpy.org) support
+For maximum performance, turbodbc offers built-in [NumPy](http://www.numpy.org) and
+[Apache Arrow](https://arrow.apache.org) support
and internally relies on batched data transfer instead of single-record communication as
other popular ODBC modules do.
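
As a rough illustration of the batched versus single-record idea mentioned above, here is a minimal sketch at the level of the PEP 249 cursor API. It uses the standard library's `sqlite3` module as a stand-in, because it needs no ODBC data source. The helper `iter_rows_batched` and the `numbers` table are hypothetical names made up for this example. They are not part of turbodbc, and the sketch does not show how turbodbc batches data internally.

```python
import sqlite3


def iter_rows_batched(cursor, batch_size=1000):
    """Yield rows one by one, but pull them from the cursor in batches.

    Hypothetical helper for illustration only; it relies solely on the
    PEP 249 method cursor.fetchmany().
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            # An empty batch means the result set is exhausted
            break
        for row in batch:
            yield row


# sqlite3 is used here only as a readily available PEP 249 module
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE numbers (n INTEGER)")
cursor.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(10)])

cursor.execute("SELECT n FROM numbers ORDER BY n")
values = [row[0] for row in iter_rows_batched(cursor, batch_size=4)]
connection.close()
```

With a batch size of 4, the ten rows are requested in three non-empty `fetchmany()` calls instead of ten `fetchone()` calls.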

3 changes: 2 additions & 1 deletion docs/index.rst
@@ -12,7 +12,8 @@ that use databases for which no efficient native Python drivers are available.

For maximum compatibility, turbodbc complies with the
`Python Database API Specification 2.0 (PEP 249) <https://www.python.org/dev/peps/pep-0249/>`_.
-For maximum performance, turbodbc offers built-in `NumPy <http://www.numpy.org>`_ support
+For maximum performance, turbodbc offers built-in `NumPy <http://www.numpy.org>`_ and
+`Apache Arrow <https://arrow.apache.org>`_ support
and internally relies on batched data transfer instead of single-record communication as
other popular ODBC modules do.

54 changes: 54 additions & 0 deletions docs/pages/advanced_usage.rst
@@ -113,6 +113,8 @@ and you can also check whether autocommit is currently enabled:
... connection.commit()


.. _advanced_usage_numpy:

NumPy support
-------------

@@ -192,3 +194,55 @@ are converted to NumPy columns:
+-------------------------------------------+-----------------------+
| ``VARCHAR``, strings, ``DECIMAL(>18, 0)`` | ``object_`` |
+-------------------------------------------+-----------------------+


.. _advanced_usage_arrow:

Apache Arrow support
--------------------

.. note::
    Turbodbc's Apache Arrow support requires the ``pyarrow`` package to be installed.
    For all source builds, Apache Arrow needs to be installed before installing turbodbc.
    Please check the :ref:`installation instructions <getting_started_installation>`
    for more details.

`Apache Arrow <https://arrow.apache.org>`_ is a high-performance data layer that
is built for cross-system columnar in-memory analytics using a
`data model <https://arrow.apache.org/docs/python/data.html>`_ designed to make the
most of the CPU cache and vector operations.

.. note::
    Apache Arrow support in turbodbc is still experimental and may not be as efficient
    as possible yet. Also, Apache Arrow support is not yet available for Windows and
    has some issues with Unicode fields. Stay tuned for upcoming improvements.

Obtaining Apache Arrow result sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is how to use turbodbc to retrieve the full result set in the form of an
Apache Arrow table:

::

    >>> cursor.execute("SELECT A, B FROM my_table")
    >>> table = cursor.fetchallarrow()
    >>> table
    pyarrow.Table
    A: int64
    B: string
    >>> table[0].to_pylist()
    [42]
    >>> table[1].to_pylist()
    [u'hello']

Inspecting the data column by column like this is of limited use. An Apache Arrow
table becomes more useful once it is handed to other tools; for example, you can
`convert it to a Pandas dataframe <https://arrow.apache.org/docs/python/pandas.html>`_
like this:

::

    >>> table.to_pandas()
        A      B
    0  42  hello
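
To make the column-oriented access pattern above (``table[0]``, ``table[1]``) more concrete, here is a minimal sketch that turns the rows of an ordinary PEP 249 cursor into one list per column. It uses the standard library's ``sqlite3`` module as a stand-in so that it runs without an ODBC data source or ``pyarrow``. The helper ``fetch_columns`` is a hypothetical name made up for this example. It only mimics the shape of a columnar result and is neither turbodbc's nor Apache Arrow's implementation.

```python
import sqlite3


def fetch_columns(cursor):
    """Return the remaining rows of a PEP 249 cursor as {column name: values}.

    Hypothetical helper for illustration only; real columnar formats such as
    Apache Arrow store typed, contiguous buffers rather than Python lists.
    """
    names = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    if not rows:
        return {name: [] for name in names}
    # zip(*rows) transposes the row tuples into one tuple per column
    return {name: list(values) for name, values in zip(names, zip(*rows))}


# sqlite3 is used here only as a readily available PEP 249 module
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE my_table (A INTEGER, B TEXT)")
cursor.executemany(
    "INSERT INTO my_table VALUES (?, ?)", [(42, "hello"), (17, "world")]
)

cursor.execute("SELECT A, B FROM my_table ORDER BY A DESC")
columns = fetch_columns(cursor)
connection.close()
```

Each entry of ``columns`` holds all values of one column, which is the same shape that ``table[0].to_pylist()`` and ``table[1].to_pylist()`` expose in the example above.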
7 changes: 5 additions & 2 deletions docs/pages/getting_started.rst
@@ -35,7 +35,9 @@ the following prerequisites are met:

Please ``pip install numpy`` before installing turbodbc, because turbodbc will search
for the ``numpy`` Python package at installation/compile time. If NumPy is not installed,
-turbodbc will not compile the optional NumPy support features.
+turbodbc will not compile the :ref:`optional NumPy support <advanced_usage_numpy>` features.
+Similarly, please ``pip install pyarrow`` before installing turbodbc if you would like
+to use the :ref:`optional Apache Arrow support <advanced_usage_arrow>`.

(1) The minimum viable Boost setup requires the libraries ``variant``, ``optional``,
``datetime``, and ``locale``.
@@ -69,7 +71,8 @@ If you require NumPy support, please

pip install numpy

-Sometime after installing turbodbc.
+Sometime after installing turbodbc. Apache Arrow support is not yet available
+on Windows.

.. _MSVS 2015 Update 3 Redistributable, 64 bit: https://www.microsoft.com/en-us/download/details.aspx?id=53840

