bit64 integer64 is broken and should not be used #557

Closed
moonylo opened this issue Apr 6, 2023 · 4 comments

@moonylo

moonylo commented Apr 6, 2023

Issue Description and Expected Result

bit64 is fundamentally broken. Examples:

317.15 / bit64::as.integer64(25) == 317.15 / 25L
# FALSE # 12.68 vs 12.686

# From https://github.com/truecluster/bit64/issues/12 :
3.5 * bit64::as.integer64(6) == bit64::as.integer64(6) * 3.5
# FALSE # 18 vs 21

Thus odbc should never convert to bit64 integer64. Unfortunately, newer databases (e.g. Redshift / Athena) return COUNT() and SUM(some_int) as bigint, and the default for an odbc connection is bigint = "integer64", so dbGetQuery returns such columns as integer64.

Unfortunately, it also looks like bit64 has been abandoned. The underlying issues might not be a big deal, as there is even a pull request for the above-mentioned #12 here: r-lib/bit64#13. But there has been no change in 2 1/2 years.

Database

Any database that has bigint. For example Redshift or Athena.

Reproducible Example

library(odbc)
library(DBI)
library(tidyverse)

rs_con <- odbc::dbConnect(odbc::odbc(), "Redshift")
test <- odbc::dbGetQuery(rs_con, "SELECT CAST(25 as BIGINT) as int64")

test %>% 
  mutate(
    v1 = 317.15 / int64
    , v2 = 317.15 / as.numeric(int64)
    , v1 == v2 # FALSE
  )

test <- odbc::dbGetQuery(rs_con, "SELECT CAST(25 as BIGINT) as int64")
test %>% 
  mutate(
    v1 = 3.5 * int64
    , v2 = int64 * 3.5
    , v1 == v2 # FALSE
  )

Session Info

devtools::session_info()

─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_Germany.utf8
 ctype    English_Germany.utf8
 tz       Europe/Berlin
 date     2023-04-06
 rstudio  2023.03.0+386 Cherry Blossom (desktop)
 pandoc   NA

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 ! package     * version    date (UTC) lib source
   abind         1.4-5      2016-07-21 [1] CRAN (R 4.2.0)
   assertthat  * 0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
   backports     1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
   bit           4.0.5      2022-11-15 [1] CRAN (R 4.2.2)
   bit64         4.0.5      2020-08-30 [1] CRAN (R 4.2.0)
   blob          1.2.4      2023-03-17 [1] CRAN (R 4.2.3)
   broom         1.0.4      2023-03-11 [1] CRAN (R 4.2.3)
   BTYD          2.4.3      2021-11-17 [1] CRAN (R 4.2.0)
   BTYDplus      1.2.0      2021-01-21 [1] CRAN (R 4.2.0)
   cachem        1.0.7      2023-02-24 [1] CRAN (R 4.2.2)
   callr         3.7.3      2022-11-02 [1] CRAN (R 4.2.2)
   car           3.1-1      2022-10-19 [1] CRAN (R 4.2.2)
   carData       3.0-5      2022-01-06 [1] CRAN (R 4.2.0)
   cli           3.6.1      2023-03-23 [1] CRAN (R 4.2.3)
   coda          0.19-4     2020-09-30 [1] CRAN (R 4.2.0)
   colorspace    2.1-0      2023-01-23 [1] CRAN (R 4.2.2)
   conflicted  * 1.2.0      2023-02-01 [1] CRAN (R 4.2.2)
   contfrac      1.1-12     2018-05-17 [1] CRAN (R 4.2.0)
   cowplot       1.1.1      2020-12-30 [1] CRAN (R 4.2.0)
   crayon        1.5.2      2022-09-29 [1] CRAN (R 4.2.2)
   data.table    1.14.8     2023-02-17 [1] CRAN (R 4.2.2)
   DBI           1.1.3      2022-06-18 [1] CRAN (R 4.2.1)
   deSolve       1.35       2023-03-12 [1] CRAN (R 4.2.3)
   devtools      2.4.5      2022-10-11 [1] CRAN (R 4.2.2)
   digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
   dplyr       * 1.1.1      2023-03-22 [1] CRAN (R 4.2.3)
   dqrng         0.3.0      2021-05-01 [1] CRAN (R 4.2.0)
   ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
   elliptic      1.4-0      2019-03-14 [1] CRAN (R 4.2.0)
   fansi         1.0.4      2023-01-22 [1] CRAN (R 4.2.2)
   farver        2.1.1      2022-07-06 [1] CRAN (R 4.2.1)
   fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.2.2)
   forcats     * 1.0.0      2023-01-29 [1] CRAN (R 4.2.2)
   fs            1.6.1      2023-02-06 [1] CRAN (R 4.2.2)
   generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
   ggplot2     * 3.4.1      2023-02-10 [1] CRAN (R 4.2.3)
   ggpubr        0.6.0      2023-02-10 [1] CRAN (R 4.2.3)
   ggsignif      0.6.4      2022-10-13 [1] CRAN (R 4.2.2)
   glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
   gridExtra   * 2.3        2017-09-09 [1] CRAN (R 4.2.0)
   gtable        0.3.3      2023-03-21 [1] CRAN (R 4.2.3)
   hms           1.1.3      2023-03-21 [1] CRAN (R 4.2.3)
   htmltools     0.5.5      2023-03-23 [1] CRAN (R 4.2.3)
   htmlwidgets   1.6.2      2023-03-17 [1] CRAN (R 4.2.3)
   httpuv        1.6.9      2023-02-14 [1] CRAN (R 4.2.2)
   httr          1.4.5      2023-02-24 [1] CRAN (R 4.2.2)
   hypergeo      1.2-13     2016-04-07 [1] CRAN (R 4.2.0)
   keyring       1.3.1      2022-10-27 [1] CRAN (R 4.2.3)
   knitr         1.42       2023-01-25 [1] CRAN (R 4.2.2)
   labeling      0.4.2      2020-10-20 [1] CRAN (R 4.2.0)
   later         1.3.0      2021-08-18 [1] CRAN (R 4.2.0)
   lattice       0.20-45    2021-09-22 [2] CRAN (R 4.2.1)
   lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.2)
   lubridate   * 1.9.2      2023-02-10 [1] CRAN (R 4.2.2)
   magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
   mailR         0.8        2021-12-03 [1] CRAN (R 4.2.0)
   MASS          7.3-58.3   2023-03-07 [2] CRAN (R 4.2.3)
   Matrix        1.5-3      2022-11-11 [1] CRAN (R 4.2.3)
   memoise       2.0.1      2021-11-26 [1] CRAN (R 4.2.0)
   mgcv          1.8-42     2023-03-02 [2] CRAN (R 4.2.3)
   mime          0.12       2021-09-28 [1] CRAN (R 4.2.0)
   miniUI        0.1.1.1    2018-05-18 [1] CRAN (R 4.2.0)
   munsell       0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
   nlme          3.1-162    2023-01-31 [2] CRAN (R 4.2.3)
   numDeriv      2016.8-1.1 2019-06-06 [1] CRAN (R 4.2.0)
   odbc          1.3.4      2023-01-17 [1] CRAN (R 4.2.2)
   openxlsx    * 4.2.5.2    2023-02-06 [1] CRAN (R 4.2.3)
   optimx        2022-4.30  2022-05-10 [1] CRAN (R 4.2.0)
   patchwork   * 1.1.2      2022-08-19 [1] CRAN (R 4.2.2)
   pillar        1.9.0      2023-03-22 [1] CRAN (R 4.2.3)
   pkgbuild      1.4.0      2022-11-27 [1] CRAN (R 4.2.3)
   pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
   pkgload       1.3.2      2022-11-16 [1] CRAN (R 4.2.2)
   plyr          1.8.8      2022-11-11 [1] CRAN (R 4.2.2)
   prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.2.0)
   processx      3.8.0      2022-10-26 [1] CRAN (R 4.2.2)
   profvis       0.3.7      2020-11-02 [1] CRAN (R 4.2.1)
   promises      1.2.0.1    2021-02-11 [1] CRAN (R 4.2.0)
   ps            1.7.3      2023-03-21 [1] CRAN (R 4.2.3)
   purrr       * 1.0.1      2023-01-10 [1] CRAN (R 4.2.2)
   R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
   R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
   R.utils       2.12.2     2022-11-11 [1] CRAN (R 4.2.2)
   R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
   Rcpp          1.0.10     2023-01-22 [1] CRAN (R 4.2.2)
   readr       * 2.1.4      2023-02-10 [1] CRAN (R 4.2.2)
   remotes       2.4.2      2021-11-30 [1] CRAN (R 4.2.0)
   reshape2      1.4.4      2020-04-09 [1] CRAN (R 4.2.0)
 D rJava         1.0-6      2021-12-10 [1] CRAN (R 4.2.0)
   rlang         1.1.0      2023-03-14 [1] CRAN (R 4.2.3)
   rsconnect     0.8.29     2023-01-09 [1] CRAN (R 4.2.3)
   rstatix       0.7.2      2023-02-01 [1] CRAN (R 4.2.3)
   rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.1)
   scales      * 1.2.1      2022-08-20 [1] CRAN (R 4.2.1)
   sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
   shiny         1.7.4      2022-12-15 [1] CRAN (R 4.2.2)
   stringi       1.7.12     2023-01-11 [1] CRAN (R 4.2.2)
   stringr     * 1.5.0      2022-12-02 [1] CRAN (R 4.2.2)
   tibble      * 3.2.1      2023-03-20 [1] CRAN (R 4.2.3)
   tidyr       * 1.3.0      2023-01-24 [1] CRAN (R 4.2.2)
   tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.2)
   tidyverse   * 2.0.0      2023-02-22 [1] CRAN (R 4.2.2)
   timechange    0.2.0      2023-01-11 [1] CRAN (R 4.2.2)
   tzdb          0.3.0      2022-03-28 [1] CRAN (R 4.2.0)
   urlchecker    1.0.1      2021-11-30 [1] CRAN (R 4.2.1)
   usethis       2.1.6      2022-05-25 [1] CRAN (R 4.2.0)
   utf8          1.2.3      2023-01-31 [1] CRAN (R 4.2.2)
   vctrs         0.6.1      2023-03-22 [1] CRAN (R 4.2.3)
   withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
   xfun          0.38       2023-03-24 [1] CRAN (R 4.2.3)
   xtable        1.8-4      2019-04-21 [1] CRAN (R 4.2.0)
   zealtools   * 23.2.1     2023-04-05 [1] Github (A-Team/zealtools@d832e1b)
   zip           2.2.2      2022-10-26 [1] CRAN (R 4.2.2)

 [1] C:/Users/jan.stolle/R/R-Library
 [2] C:/Users/jan.stolle/R/R-4.2.1/library

 D ── DLL MD5 mismatch, broken installation.
@moonylo moonylo changed the title bit64 integer64 should not be used bit64 integer64 is broken and should not be used Apr 6, 2023
@detule
Collaborator

detule commented Apr 6, 2023

Hi - thanks for the note.

Looks like the operator behavior is noted in the bit64 documentation under Design Considerations, so perhaps "broken" is not the right term to describe the behavior. In fact, I saw this in the package documentation:

WARNING: do not use them as replacement for 32bit integers, integer64 are not
supported for subscripting by R-core and they have different semantics when
combined with double

As I am not aware of a better alternative, and as there is already a mechanism to change the default, I am biased to leave as is.
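
For readers looking for that mechanism, here is a minimal sketch. It assumes the `bigint` argument of `odbc::dbConnect()` and a configured DSN named "Redshift" as in the reproducible example; `connect_bigint_numeric` is a hypothetical helper name, not part of odbc.

```r
library(DBI)

# Hypothetical helper: open an odbc connection that returns BIGINT columns as
# doubles instead of bit64::integer64. Doubles are exact only up to 2^53, so
# this trades integer64's range for ordinary R arithmetic semantics.
connect_bigint_numeric <- function(dsn, ...) {
  DBI::dbConnect(odbc::odbc(), dsn, bigint = "numeric", ...)
}

# Usage, assuming a DSN named "Redshift" exists on this machine:
# con  <- connect_bigint_numeric("Redshift")
# test <- DBI::dbGetQuery(con, "SELECT CAST(25 AS BIGINT) AS int64")
# class(test$int64)
# DBI::dbDisconnect(con)
```

Other accepted values for `bigint` are "integer" and "character"; which one fits depends on how large the returned values can get.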

Thanks again.

@nivr

nivr commented Apr 9, 2023

Thanks for this, I am similarly puzzled.

If I understand that note correctly, then any bit64 output from a database can only feed operations with bit64 outputs. For example, a dplyr summarise would be ok in the case of a sum or median but in the case of a mean you are stuck with no solution.

Did I misread something? What would you recommend as the best way to summarise a bit64 column returned from a database?

@detule
Collaborator

detule commented Apr 9, 2023

Hi:

I think you are wrestling with a limitation of R (its handling of 64-bit values) more so than anything to do with package:odbc.

I don't mean to brush your question away but I think it is probably best discussed in a different forum.
Thanks again.
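
On the summarising question above, one possible workaround is to convert the column to double explicitly before averaging it or combining it with doubles. This is only a sketch: it assumes every value has magnitude below 2^53, and the data frame `df` and its values are made up for illustration.

```r
library(bit64)

# Stand-in for a BIGINT column returned by the database (hypothetical data).
df <- data.frame(n = as.integer64(c(1, 2, 4)))

# Convert to double first, so later arithmetic follows ordinary double rules
# rather than integer64's coercion rules.
n_dbl <- as.numeric(df$n)

avg   <- mean(n_dbl)     # mean computed on doubles
ratio <- 317.15 / n_dbl  # double / double, no integer64 involved
```

If the column can hold values beyond 2^53, the conversion loses precision, and keeping integer64 (or reading the column as character) may be the safer choice.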

@hadley hadley closed this as completed Apr 24, 2023
@MichaelChirico

Hi all. Thanks for flagging this concern. I have taken over maintenance of {bit64}, so do feel free to take up any issues you have directly on the issue tracker there -- I can't promise what will/won't be fixed (there are some fundamental issues that might not be fixable), but will be happy to see highlighted the concerns that users have.

Ultimately I'd prefer base R to get an INT64SXP but I'm not sure we're any closer to that now than when {bit64} first launched :\
