Add documentation for arrow

blue-yonder · Jun 5, 2017 · 137e42f
1 parent 91f092f
commit 137e42f

Showing 4 changed files with 63 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -15,7 +15,8 @@ that use databases for which no efficient native Python drivers are available.
 
 For maximum compatibility, turbodbc complies with the
 [Python Database API Specification 2.0 (PEP 249)](https://www.python.org/dev/peps/pep-0249/).
-For maximum performance, turbodbc offers built-in [NumPy](http://www.numpy.org) support
+For maximum performance, turbodbc offers built-in [NumPy](http://www.numpy.org) and
+[Apache Arrow](https://arrow.apache.org) support
 and internally relies on batched data transfer instead of single-record communication as
 other popular ODBC modules do.
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -12,7 +12,8 @@ that use databases for which no efficient native Python drivers are available.
 
 For maximum compatibility, turbodbc complies with the
 `Python Database API Specification 2.0 (PEP 249) <https://www.python.org/dev/peps/pep-0249/>`_.
-For maximum performance, turbodbc offers built-in `NumPy <http://www.numpy.org>`_ support
+For maximum performance, turbodbc offers built-in `NumPy <http://www.numpy.org>`_ and
+`Apache Arrow <https://arrow.apache.org>`_ support
 and internally relies on batched data transfer instead of single-record communication as
 other popular ODBC modules do.
 

diff --git a/docs/pages/advanced_usage.rst b/docs/pages/advanced_usage.rst
@@ -113,6 +113,8 @@ and you can also check whether autocommit is currently enabled:
     ...     connection.commit()
 
 
+.. _advanced_usage_numpy:
+
 NumPy support
 -------------
 
@@ -192,3 +194,55 @@ are converted to NumPy columns:
 +-------------------------------------------+-----------------------+
 | ``VARCHAR``, strings, ``DECIMAL(>18, 0)`` | ``object_``           |
 +-------------------------------------------+-----------------------+
+
+
+.. _advanced_usage_arrow:
+
+Apache Arrow support
+--------------------
+
+.. note::
+    Turbodbc's Apache Arrow support requires the ``pyarrow`` package to be installed.
+    For all source builds, Apache Arrow needs to be installed before installing turbodbc.
+    Please check the :ref:`installation instructions <getting_started_installation>`
+    for more details.
+
+`Apache Arrow <https://arrow.apache.org>`_ is a high-performance data layer that
+is built for cross-system columnar in-memory analytics using a
+`data model <https://arrow.apache.org/docs/python/data.html>`_ designed to make the
+most of the CPU cache and vector operations.
+
+.. note::
+    Apache Arrow support in turbodbc is still experimental and may not be as efficient
+    as possible yet. Also, Apache Arrow support is not yet available for Windows and
+    has some issues with Unicode fields. Stay tuned for upcoming improvements.
+
+Obtaining Apache Arrow result sets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here is how to use turbodbc to retrieve the full result set in the form of an
+Apache Arrow table:
+
+::
+
+    >>> cursor.execute("SELECT A, B FROM my_table")
+    >>> table = cursor.fetchallarrow()
+    >>> table
+    pyarrow.Table
+    A: int64
+    B: string
+    >>> table[0].to_pylist()
+    [42]
+    >>> table[1].to_pylist()
+    [u'hello']
+
+Inspecting the columns one by one like this is rarely what you want. An Apache Arrow
+table becomes far more useful once you hand it to other tools; for example, you can
+`convert it to a Pandas dataframe <https://arrow.apache.org/docs/python/pandas.html>`_
+like this:
+
+::
+
+    >>> table.to_pandas()
+        A      B
+    0  42  hello
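
Note on the `fetchallarrow()` example above: the snippet below is a minimal sketch of the same column inspection, written against a recent `pyarrow` release and without a database connection. The helper names `table_to_columns` and `build_example_table` are hypothetical. The calls `pa.table`, `Table.column_names`, and `Table.column(name)` are assumed from current `pyarrow` and are not taken from this commit. The one-row table only stands in for whatever `cursor.fetchallarrow()` would return.

```python
def table_to_columns(table):
    """Map each column name of a pyarrow.Table to a plain Python list."""
    return {
        name: table.column(name).to_pylist()
        for name in table.column_names
    }


def build_example_table():
    """Build a one-row stand-in for the table shown in the documentation.

    Assumption: a recent pyarrow release that provides ``pa.table``.
    """
    # Imported lazily because pyarrow is an optional dependency.
    import pyarrow as pa

    return pa.table({"A": [42], "B": ["hello"]})
```

With `pyarrow` installed, `table_to_columns(build_example_table())` should yield the same values as the two `to_pylist()` calls in the documentation, keyed by column name instead of by position.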
diff --git a/docs/pages/getting_started.rst b/docs/pages/getting_started.rst
@@ -35,7 +35,9 @@ the following prerequisites are met:
 
 Please ``pip install numpy`` before installing turbodbc, because turbodbc will search
 for the ``numpy`` Python package at installation/compile time. If NumPy is not installed,
-turbodbc will not compile the optional NumPy support features.
+turbodbc will not compile the :ref:`optional NumPy support <advanced_usage_numpy>` features.
+Similarly, please ``pip install pyarrow`` before installing turbodbc if you would like
+to use the :ref:`optional Apache Arrow support <advanced_usage_arrow>`.
 
 (1) The minimum viable Boost setup requires the libraries ``variant``, ``optional``,
 ``datetime``, and ``locale``.
@@ -69,7 +71,8 @@ If you require NumPy support, please
 
     pip install numpy
 
-Sometime after installing turbodbc.
+sometime after installing turbodbc. Apache Arrow support is not yet available
+on Windows.
 
 .. _MSVS 2015 Update 3 Redistributable, 64 bit: https://www.microsoft.com/en-us/download/details.aspx?id=53840
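
Note on the installation changes in `getting_started.rst` above: the text asks for `numpy` and `pyarrow` to be present before turbodbc is built from source. A pre-install check could look like the sketch below. It relies only on the standard library's `importlib.util.find_spec`; the function names `missing_optional_packages` and `report` are hypothetical and not part of turbodbc.

```python
import importlib.util


def missing_optional_packages(names=("numpy", "pyarrow")):
    """Return the packages from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=("numpy", "pyarrow")):
    """Build a short human-readable summary of the check."""
    missing = missing_optional_packages(names)
    if not missing:
        return "All optional packages found."
    return "Install before building turbodbc: " + ", ".join(missing)
```

Running `print(report())` before `pip install turbodbc` shows which of the two optional packages would still need to be installed first. The check only tells you whether a package can be located; it does not verify versions or platform support.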