
Add a helper class for using the Databricks ODBC driver #615

Merged 1 commit into r-dbi:main on Nov 14, 2023

Conversation

atheriel
Collaborator

atheriel commented Nov 8, 2023

The official Databricks ODBC driver supports a number of authentication methods, but they are difficult to configure correctly: users must pass credentials manually rather than having them picked up from the ambient environment, as the other Databricks SDKs do.

This commit adds a high-level helper class that implements (a subset of) the Databricks client unified authentication model (https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication) in its dbConnect() method, so that connection code running in Databricks-aware environments should work automatically, without manual configuration.
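The "ambient credentials" idea can be illustrated with a minimal sketch. This is not the merged implementation: the function name is hypothetical, and the specific AuthMech/Auth_Flow parameter values are assumptions based on the Simba Spark driver's documented conventions rather than taken from this PR.

```r
# Hypothetical sketch: assemble driver authentication arguments from
# environment variables commonly set in Databricks-aware environments,
# instead of requiring the user to pass them to dbConnect() manually.
databricks_auth_args <- function() {
  token <- Sys.getenv("DATABRICKS_TOKEN")
  if (nzchar(token)) {
    # Personal access tokens conventionally use AuthMech = 3 with the
    # literal username "token" in the Simba Spark driver.
    return(list(AuthMech = 3, UID = "token", PWD = token))
  }
  client_id <- Sys.getenv("DATABRICKS_CLIENT_ID")
  client_secret <- Sys.getenv("DATABRICKS_CLIENT_SECRET")
  if (nzchar(client_id) && nzchar(client_secret)) {
    # OAuth machine-to-machine credentials (parameter values assumed).
    return(list(
      AuthMech = 11,
      Auth_Flow = 1,
      Auth_Client_ID = client_id,
      Auth_Client_Secret = client_secret
    ))
  }
  stop("No Databricks credentials found in the environment.")
}
```

A helper along these lines is what lets dbConnect() build the authentication arguments itself, which is the difference between the two calls compared below.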

Compare the interface from this commit:

DBI::dbConnect(
  odbc::databricks(),
  http_path = "sql/protocolv1/o/4425955464597947/1026-023828-vn51jugj"
)

with filling all of the required parameters out manually:

DBI::dbConnect(
  odbc::odbc(),
  driver = "/opt/simba/spark/lib/64/libsparkodbc_sb64.so",
  host = gsub("https://", "", Sys.getenv("DATABRICKS_HOST")),
  HTTPPath = "sql/protocolv1/o/4425955464597947/1026-023828-vn51jugj",
  ThriftTransport = 2,
  UserAgentEntry = Sys.getenv("SPARK_CONNECT_USER_AGENT"),
  port = 443,
  Protocol = "https",
  SSL = 1,
  AuthMech = 11,
  Auth_Flow = 0,
  Auth_AccessToken = "<your OAuth token>"
)

@atheriel
Collaborator Author

atheriel commented Nov 8, 2023

Updated to use a new S4 class instead of a helper function.

atheriel changed the title from "Add a helper function for using the Databricks ODBC driver" to "Add a helper class for using the Databricks ODBC driver" on Nov 8, 2023
user_agent <- paste0("r-odbc/", utils::packageVersion("odbc"))
if (nchar(Sys.getenv("SPARK_CONNECT_USER_AGENT")) != 0) {
  # Respect the existing user-agent if present. Normally we'd append, but the
  # Databricks ODBC driver does not yet support space-separated entries in
Collaborator
Well that is most unfortunate.
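For context, a hedged sketch of what "respect the existing user-agent" might look like once completed. The merged code may differ; the point is only that an existing SPARK_CONNECT_USER_AGENT value replaces, rather than extends, the default, since the driver reportedly cannot handle multiple space-separated entries:

```r
# Sketch (not the merged implementation): prefer an existing
# SPARK_CONNECT_USER_AGENT over the package's default user agent.
databricks_user_agent <- function() {
  env <- Sys.getenv("SPARK_CONNECT_USER_AGENT")
  if (nzchar(env)) {
    return(env)
  }
  # Fall back to identifying the odbc package itself.
  paste0("r-odbc/", utils::packageVersion("odbc"))
}
```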

Comment on lines +151 to +166
default_path <- ""
if (Sys.info()["sysname"] == "Linux") {
  default_path <- "/opt/simba/spark/lib/64/libsparkodbc_sb64.so"
} else if (Sys.info()["sysname"] == "Darwin") {
  default_path <- "/Library/simba/spark/lib/libsparkodbc_sbu.dylib"
}

if (file.exists(default_path)) default_path else "Simba Spark ODBC Driver"
}
Collaborator
Yay for sensible defaults.

For Linux, would we want to default to /opt/rstudio-drivers/spark/bin/lib/libsparkodbc_sb64.so and fall back to the /opt/simba/spark location?

Collaborator Author

Is that the expected install location for the Pro driver? That could make sense.

Collaborator

Yes, that is the expected location of the driver. I don't quite understand bin/lib in the path, but it follows the convention of where the other drivers are installed.
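Putting the thread's suggestion together, a sketch of the resulting fallback order (hypothetical; the merged code may differ): try the Pro driver path under /opt/rstudio-drivers first, then the Simba default, then fall back to the plain driver name so a DSN-style lookup can still succeed.

```r
# Sketch of the candidate-path fallback discussed in this thread.
databricks_default_driver <- function() {
  candidates <- switch(
    Sys.info()[["sysname"]],
    Linux = c(
      "/opt/rstudio-drivers/spark/bin/lib/libsparkodbc_sb64.so",
      "/opt/simba/spark/lib/64/libsparkodbc_sb64.so"
    ),
    Darwin = "/Library/simba/spark/lib/libsparkodbc_sbu.dylib",
    character()  # no known default path on other platforms
  )
  found <- candidates[file.exists(candidates)]
  # Fall back to the driver name if no file is found on disk.
  if (length(found) > 0) found[[1]] else "Simba Spark ODBC Driver"
}
```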

@hadley
Member

hadley commented Nov 13, 2023

I'm happy to merge once the build failures are resolved.

@atheriel
Collaborator Author

CI issues are fixed and R CMD check is now clean. 🚀

hadley merged commit 1ed1faf into r-dbi:main on Nov 14, 2023
19 checks passed
@hadley
Member

hadley commented Nov 14, 2023

Thanks!
