Releases: tidyverse/dbplyr
dbplyr 2.5.0
Improved tools for qualified table names
-
Specification of table names with schema/catalogs has been overhauled to
make it simpler. This includes the following features and fixes:-
The simplest way to refer to a qualified table is now to wrap it in
I()
, e.g.I("schema_name.table_name")
. -
Use of
sql()
andident_q()
insidein_catalog()
andin_schema()
is once again supported (#1388). -
It's ok to use
ident_q()
once again (#1413) and you should no longer
see unsuppressable warnings about usingin_schema()
(#1408). -
The names of the arguments to
Id()
no longer matter, only their order
(#1416). Additionally, thanks to changes to the DBI package, you no
longer need to name each argument. -
If you accidentally pass a named vector to any of the database identifer
functions, those names will be automatically stripped (#1404). -
tbl_sql(check_from)
is now deprecated.
-
-
dbplyr now exports some tools to work with the internal
table_path
class
which is useful for certain backends that need to work with this
data structure (#1300).
Improved SQL
-
New translations for clock functions
add_years()
,add_days()
,
date_build()
,get_year()
,get_month()
,get_day()
,
andbase::difftime()
on SQL server, Redshift, Snowflake, and Postgres. -
select()
will keep computed columns used toarrange()
subqueries that are
eliminated by a subsequent select (@ejneer, #1437). -
semi_join()
will no longer inline away an aggregate filter (i.e.HAVING
clause) that was followed by aselect()
(@ejneer, #1474) -
Improved function translations:
-
Functions qualified with the base namespace are now also translated, e.g.
base::paste0(x, "_1")
is now translated (@mgirlich, #1022). -
-1 + x
now generates a translation instead erroring (#1420). -
x$name
never attempts to evaluatename
(#1368). -
You can once again use
NULL
on the LHS of an infix operator in order
to generate SQL with unusual syntax (#1345). -
Namespaced calls now error if the function doesn't exist, or a translation
is not available (#1426). -
lead()
translation coercesn
to an integer.
-
-
Databricks: now supports creating non-temporary tables too (#1418).
-
Oracle:
-
db_explain()
now works (@thomashulst, #1353). -
as.Date()
works when applied to a string (#1389). -
head()
is once again translated toFETCH FIRST
. This does require Oracle
12c or newer, but it actually works, compared to the approach using
ROWNUM
from #1292 (#1436). -
Added support for
str_replace()
andstr_replace_all()
via
REGEXP_REPLACE()
(@thomashulst, #1402).
-
-
Snowflake (@nathanhaigh, #1406)
-
Added support for
str_starts()
andstr_ends()
viaREGEXP_INSTR()
-
Refactored
str_detect()
to useREGEXP_INSTR()
so now supports
regular expressions. -
Refactored
grepl()
to useREGEXP_INSTR()
so now supports
case-insensitive matching throughgrepl(..., ignore.case = TRUE)
-
-
SQL server:
-
MySQL:
as.integer()
gets correct translation (@krlmlr, #1375).
Minor improvements and bug fixes
-
Deprecation status of functions deprecated in previous versions (at least
2 years old) have been advanced. In particular,src_sql()
is now defunct,
as is the use ofpartial_eval()
with characterdata
. -
Database errors now show the generated SQL, which hopefully will make it
faster to track down problems (#1401). -
When dbplyr creates an index on a table in a schema (e.g.
schema.table
),
it now only includes the table name in the index name, not the schema name. -
The class of remote sources now includes all S4 class names, not just
the first (#918). -
compute()
passes additional arguments all the way to
sql_query_save()
-methods (@rsund). -
db_sql_render()
correctly passes on...
when re-calling with
sql_options
set (#1394). -
reframe()
now gives an informative error that it isn't supported (#1148). -
rows_patch(in_place = FALSE)
now works when more than one column should be
patched (@gorcha, #1443). -
sql_translator()
now checks for duplicated definitions (@krlmlr, #1374).
dbplyr 2.4.0
Breaking changes
-
Using
compute(temporary = FALSE)
without providing a name is now
deprecated (@mgirlich, #1154). -
ntile()
's first argument has been renamed fromorder_by
tox
to
match the interface ofdplyr::ntile()
(@mgirlich, #1242). -
simulate_vars()
andsimulate_vars_is_typed()
were removed as they weren't
used and tidyselect now offerstidyselect_data_proxy()
and
tidyselect_data_has_predicates()
(@mgirllich, #1199). -
sql_not_supported()
now expects a function name without parentheses. -
sql_query_append()
,sql_query_insert()
,sql_query_update()
,
sql_query_upsert()
, andsql_query_delete()
changed their arguments to
make them more consistent to the othersql_query_*()
functions:x_name
was renamed totable
.y
was renamed tofrom
and must now be a table identifier or SQL instead
of a lazy table.sql_query_append()
andsql_query_insert()
have gained the argumentcols
.
-
remote_name()
now returns a string with the name of the table. To get the
qualified identifier use the newly addedremote_table()
(@mgirlich, #1280). -
tbl_lazy()
losessrc
argument after it has been deprecated for years
(@mgirlich, #1208). -
translate_sql()
now requires thecon
argument (@mgirlich, #1311).
Thevars
argument has been removed after it threw an error for the last 7
years (@mgirlich).
Improved SQL
-
Preliminary databricks Spark SQL backend (#1377).
-
Joins
-
*_join()
now allows specifying the relationship argument. It must be
NULL
or"many-to-many"
(@bairdj, #1305). -
Queries now qualify
*
with the table alias for better compatibility
(@mgirlich, #1003). -
full_join()
can now handle column names that only differ in case
(@ejneer, #1255). -
The
na_matches
argument ofsemi_join()
andanti_join()
works again
(@mgirlich, #1211). -
A
semi/anti_join()
on fitleredy
is inlined when possible (@mgirlich, #884). -
Joins now work again for Pool and Oracle connections (@mgirlich, #1177, #1181).
-
A sequence of
union()
resp.union_all()
now produces a flat query
instead of subqueries (@mgirlich, #1269). -
Added translations for:
-
if_any()
andif_all()
translations are now wrapped in parentheses.
This makes sure it can be combined via&
with other conditions
(@mgirlich, #1153). -
nth()
,first()
, andlast()
now support thena_rm
argument
(@mgirlich, #1193).
Minor improvements and bug fixes
-
across()
now supports namespaced functions, e.g.
across(x, dplyr::dense_rank)
(@mgirlich, #1231). -
db_copy_to(overwrite = TRUE)
now actually works. -
db_copy_to()
's...
are now passed todb_write_table()
(@mgirlich, #1237). -
Added
db_supports_table_alias_with_as()
to customise whether a backend
supports specifying a table alias withAS
or not (@mgirlich). -
db_write_table()
anddb_save_query()
gain theoverwrite
argument. -
dbplyr_pivot_wider_spec()
is now exported. Unlikepivot_wider()
this can
be lazy. Note that this will be removed soon afterpivot_wider_spec()
becomes a generic (@mgirlich). -
filter()
ing with window functions now generates columns calledcol01
rather thanq01
(@mgirlich, #1258). -
pivot_wider()
now matches tidyrNA
column handling (@ejneer #1238). -
select()
can once again be used afterarrange(desc(x))
(@ejneer, #1240). -
show_query()
andremote_query()
gain the argumentsql_options
that allows
to control how the SQL is generated. It can be created viasql_options()
which has the following arguments:cte
: use common table expressions?use_star
: useSELECT *
or explicitly select every column?qualify_all_columns
: qualify all columns in a join or only the ambiguous ones?
(@mgirlich, #1146).
Consequently the
cte
argument ofshow_query()
andremote_query()
has
been deprecated (@mgirlich, #1146). -
slice_min/max()
can now order by multiple variables like dplyr, e.g. use
slice_min(lf, tibble(x, y))
(@mgirlich, #1167). -
slice_*()
now supports the data masking pronouns.env
and.data
(@mgirlich, #1294). -
sql_join_suffix()
gains the argumentsuffix
so that methods can check
whether the suffix is valid for the backend (@mgirlich). -
sql_random()
is now deprecated. It was used to powerslice_sample()
which
is now done via the translation forrunif()
(@mgirlich, #1200). -
tbl()
now informs when the user probably forgot to wrap the table identifier
within_schema()
orsql()
(@mgirlich, #1287).
Backend specific improvements
-
Access
-
DuckDB
- now supports the
returning
argument ofrows_*()
.
- now supports the
-
MySQL/MariaDB:
rows_update()
androws_patch()
now give an informative error when the
unsupportedreturning
argument is used (@mgirlich, #1279).rows_upsert()
now gives an informative error that it isn't supported
(@mgirlich, #1279).rows_*()
use the column types ofx
when auto copyingy
(@mgirlich, #1327).copy_inline()
now works (@mgirlich, #1188).- Fix translation of
as.numeric()
,as.POSIXct()
,as_datetime()
, and
as.integer64()
(@avsdev-cw, #1189).
-
MS SQL:
row_number()
now works when no order is specified (@ejneer, @fh-mthomson, #1332)
-
Oracle:
- Fix translation of
rows_upsert()
(@mgirlich, @TBlackmore, #1286) head(n)
is now translated toWHERE ROWNUM <= n
to also support old
versions <= 11.2 (@JeremyPasco, #1292).
- Fix translation of
-
Postgres
-
SQLite
- Subqueries now also get an alias. This makes it consistent with other
backends and simplifies the implementation.
- Subqueries now also get an alias. This makes it consistent with other
-
SQL Server
-
Snowflake:
na.rm = TRUE
is now respected inpmin()
andpmax()
instead of being silently ignored (@fh-mthomson, #1329)row_number()
now works when no order is specified (@fh-mthomson, #1332)
-
Teradata
distinct()
+head()
now work (@mgirlich, #685).as.Date(x)
is now translate toCAST(x AS DATE)
again unlessx
is a
string (@mgirlich, #1285).row_number()
no longer defaults to partitioning by groups (now aligned with other databases when no order is specified:ROW_NUMBER()
defaults toORDER BY (SELECT NULL)
) (@fh-mthomson, #1331)
dbplyr 2.3.4
- Hot patch release to resolve R CMD check failures.
dbplyr 2.3.3
- Hot patch release to resolve R CMD check failures.
dbplyr 2.3.2
- Hot patch release to resolve R CMD check failures.
dbplyr 2.3.1
Breaking changes
window_order()
now only accepts bare symbols or symbols wrapped indesc()
.
This breaking change is necessary to allowselect()
to drop and rename
variables used inwindow_order()
(@mgirlich, #1103).
Improved error messages
-
quantile()
andmedian()
now error for SQL Server when used insummarise()
and for PostgreSQL when used inmutate()
as they can't be properly
translated (@mgirlich, #1110). -
Added an informative error for unsupported join arguments
unmatched
and
multiple
(@mgirlich). -
Using predicates, e.g.
where(is.integer)
, inacross()
now produces an
error as they never worked anyway (@mgirlich, #1169). -
Catch unsupported argument
pivot_wider(id_expand = TRUE)
and
pivot_longer(cols_vary)
(@mgirlich, #1109).
Bug fixes in SQL generation
-
Fixed an issue when using a window function after a
summarise()
and
select()
(@mgirlich, #1104). -
Fixed an issue when there where at least 3 joins and renamed variables
(@mgirlich, #1101). -
mutate()
andselect()
afterdistinct()
now again produce a subquery to
generate the correct translation (@mgirlich, #1119, #1141). -
Fixed an issue when using
filter()
on a summarised variable (@mgirlich, #1128). -
mutate()
+filter()
now again produces a new query if themutate()
uses a window function or SQL (@mgirlich, #1135). -
across()
andpick()
can be used (again) indistinct()
(@mgirlich, #1125). -
The
rows_*()
function work again for tables in a schema in PostgreSQL
(@mgirlich, #1133).
Minor improvements and bug fixes
-
sql()
now evaluates its arguments locally also when used inacross()
(@mgirlich, #1039). -
The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,
percent_rank()
, andcume_dist()
) now support multiple variables by
wrapping them intibble()
, e.g.rank(tibble(x, y))
(@mgirlich, #1118). -
Added support for
join_by()
added in dplyr 1.1.0 (@mgirlich, #1074). -
Using
by = character()
to perform a cross join is now soft-deprecated in
favor ofcross_join()
. -
full_join()
andright_join()
are now translated directly toFULL JOIN
andRIGHT JOIN
for SQLite as native support was finally added (@mgirlich, #1150). -
case_match()
now works with strings on the left hand side (@mgirlich, #1143). -
The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,
percent_rank()
, andcume_dist()
) now work again for variables wrapped in
desc()
, e.g.row_number(desc(x))
(@mgirlich, #1118). -
Moved argument
auto_index
after...
in*_join()
(@mgirlich, #1115). -
across()
now uses the original value when a column is overriden to match
the behaviour of dplyr. For examplemutate(df, across(c(x, y), ~ .x / x))
now producesSELECT `x` / `x` AS `x`, `y` / `x` AS `y` FROM `df`
instead of
SELECT `x`, `y` / `x` AS `y` FROM ( SELECT `x` / `x` AS `x`, `y` FROM `df` )
-
Restricted length of table aliases to avoid truncation on certain backends (e.g., Postgres) (@fh-mthomson, #1096)
dbplyr 2.3.0
New features
-
stringr::str_like()
(new in 1.5.0) is translated to the closestLIKE
equivalent (@rjpat, #509) -
In preparation for dplyr 1.1.0:
- The
.by
argument is supported (@mgirlich, #1051). - Passing
...
toacross()
is deprecated because the evaluation timing
of...
is ambiguous. Now instead of (e.g.)
across(a:b, mean, na.rm = TRUE)
use pick()
is translated (@mgirlich, #1044).case_match()
is translated (@mgirlich, #1020).case_when()
now supports the.default
argument (@mgirlich, #1017).
- The
-
Variables that aren't found in either the data or in the environment now
produce an error (@mgirlich, #907).
SQL optimisation
-
dbplyr now produces fewer subqueries resulting in shorter, more readable, and,
in some cases, faster SQL. The following combination of verbs now avoids a
subquery if possible: -
dbplyr now uses
SELECT *
after a join instead of explicitly selecting every
column, where possible (@mgirlich, #898). -
Joins only use the table aliases ("LHS" and "RHS") if necessary (@mgirlich).
-
When using common table expressions, the results of joins and set operations
are now reused (@mgirlich, #978).
Improved error messages
-
Many errors have been improved and now show the function where the error
happened instead of a helper function (@mgirlich, #907). -
Errors produced by the database, e.g. in
collect()
orrows_*()
, now show
the verb where the error happened (@mgirlich). -
window_order()
now produces a better error message when applied to a data
frame (@mgirlich, #947). -
Using a named
across()
now gives a clear error message (@mgirlich, #761).
Minor improvements and bug fixes
-
Keyword highlighting can now be customised via the option
dbplyr_highlight
.
Turn it off viaoptions(dbplyr_highlight = FALSE)
or pass a custom ansi
style, e.g.options(dbplyr_highlight = cli::combine_ansi_styles("bold", "cyan"))
(@mgirlich, #974). -
The rank functions (
row_number()
,min_rank()
,rank()
,dense_rank()
,
percent_rank()
, andcume_dist()
) now give missing values the rank NA to
match the behaviour of dplyr (@mgirlich, #991). -
NA
s inblob()
s are correctly translated toNULL
(#983). -
copy_inline()
gains atypes
argument to specify the SQL column types
(@mgirlich, #963). -
distinct()
returns columns ordered the way you request, not the same
as the input data (@mgirlich). -
fill()
can now fill "downup" and "updown" (@mgirlich, #1057), and
now order by non-numeric columns also in the up direction (@mgirlich, #1057). -
filter()
now works when using a window function and an external vector
(#1048). -
group_by()
+ renamed columns works once again (@mgirlich, #928). -
last()
is correctly translated when no window frame is specified
(@mgirlich, #1063). -
setOldClass()
uses a namespace, fixing an installation issue (@mgirlich, #927). -
sql()
is now translated differently. The...
are now evaluated locally
instead of being translated withtranslate_sql()
(@mgirlich, #952).
Backend specific improvements
-
HANA:
-
MySQL:
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)
-
Oracle:
-
PostgreSQL:
- Generates correct literals for Dates (#727).
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)rows_*()
use the column types ofx
when auto copying (@mgirlich, #909).
-
Redshift:
round()
now respects thedigits
argument (@owenjonesuob, #1033).- No longer tries to use named windows anymore (@owenjonesuob, #1035).
copy_inline()
now works for Redshift (#949, thanks to @ejneer for an
initial implementation).str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993)
-
Snowflake:
-
numeric functions:
all()
,any()
,log10()
,round()
,cor()
,cov()
andsd()
. -
date functions:
day()
,mday()
,wday()
,yday()
,week()
,
isoweek()
,month()
,quarter()
,isoyear()
,seconds()
,minutes()
,
hours()
,days()
,weeks()
,months()
,years()
andfloor_date()
. -
string functions:
grepl()
,paste()
,paste0()
,str_c()
,str_locate()
,
str_detect()
,str_replace()
,str_replace_all()
,str_remove()
,
str_remove_all()
,str_trim()
,str_squish()
andstr_flatten()
(@fh-afrachioni, #860). -
str_flatten()
usescollapse = ""
by default (@fh-afrachioni, #993) -
SQLite:
-
SQL server:
-
Teradata:
dbplyr 2.2.1
-
Querying Oracle databases works again. Unfortunately, the fix requires every
column to be explicitly selected again (@mgirlich, #908). -
semi_join()
andanti_join()
work again for Spark (@mgirlich, #915). -
str_c()
is now translated to||
in Oracle (@mgirlich, #921). -
sd()
,var()
,cor()
andcov()
now give clear error messages on
databases that don't support them. -
any()
andall()
gain default translations for all backends.
dbplyr 2.2.0
dbplyr 2.2.0
New features
-
SQL formatting has been considerably improved with new wrapping and indenting.
show_query()
creates more readable queries by printing the keywords in blue
(@mgirlich, #644). When possible dbplyr now usesSELECT *
instead of
explicitly selecting every column (@mgirlich). -
Added support for
rows_insert()
,rows_append()
,rows_update()
,
rows_patch()
,rows_upsert()
, androws_delete()
(@mgirlich, #736). -
Added
copy_inline()
as acopy_to()
equivalent that does not need write
access (@mgirlich, #628). -
remote_query()
,show_query()
,compute()
andcollect()
have an
experimentalcte
argument. IfTRUE
the SQL query will use common table
expressions instead of nested queries (@mgirlich, #638). -
New
in_catalog()
, which works likein_schema()
, but allows creation of
table identifiers consisting of three components: catalog, schema, name
(#806, @krlmlr).
Improvements to SQL generation
-
When possible, dbplyr now uses
SELECT *
instead of explicitly selecting
every column (@mgirlich). -
Improved translations for specific backends:
as.Date()
for Oracle (@mgirlich, #661).case_when()
with a final clause of the formTRUE ~ ...
usesELSE ...
for SQLite (@mgirlich, #754).day()
,week()
,isoweek()
, andisoyear()
for Postgres (@mgirlich, #675).explain()
for ROracle (@mgirlich).fill()
for SQL Server (#651, @mgirlich) and RPostgreSQL (@mgirlich).quantile()
for SQL Server (@mgirlich, #620).str_flatten()
for Redshift (@hdplsa, #804)slice_sample()
for MySQL/MariaDB and SQL Server (@mgirlich, #617).union()
for Hive (@mgirlich, #663).
-
The backend function
dbplyr_fill0()
(used for databases that lack
IGNORE NULLS
support) now respects database specific translations
(@rsund, #753). -
Calls of the form
stringr::foo()
orlubridate::foo()
are now evaluated in
the database, rather than locally (#197). -
Unary plus (e.g.
db %>% filter(x == +1)
) now works (@mgirlich, #674). -
is.na()
,ifelse()
,if_else()
,case_when()
, andif()
generate slightly more compact SQL (@mgirlich, #738). -
if_else()
now supports themissing
argument (@mgirlich, #641). -
quantile()
no longer errors when using thena.rm
argument (@mgirlich, #600). -
remote_name()
now returns a name in more cases where it makes sense
(@mgirlich, #850). -
The partial evaluation code is now more aligned with
dtplyr
. This makes it
easier to transfer bug fixes and new features from one package to the other.
In this process the second argument ofpartial_eval()
was changed to a lazy
frame instead of a character vector of variables (@mgirlich, #766).
Partially evaluated expressions with infix operations are now correctly
translated. For exampletranslate_sql(!!expr(2 - 1) * x)
now works
(@mgirlich, #634).
Minor improvements and bug fixes
-
New
pillar::tbl_format_header()
method for lazy tables: Printing a lazy
table where all rows are displayed also shows the exact number of rows in the
header. The threshold is controlled bygetOption("pillar.print_min")
,
with a default of 10 (#796, @krlmlr). -
The 1st edition extension mechanism is formally deprecated (#507).
-
across()
,if_any()
andif_all()
now defaults to.cols = everything()
(@mgirlich, #760). If.fns
is not providedif_any()
andif_all()
work
like a parallel version ofany()
/any()
(@mgirlich, #734). -
across()
,if_any()
, andif_all()
can now translate evaluated lists
and functions (@mgirlich, #796), and accept the name of a list of functions
(@mgirlich, #817). -
Multiple
across()
calls inmutate()
andtransmute()
can now access
freshly created variables (@mgirlich, #802). -
add_count()
now doesn't change the groups of the input (@mgirlich, #614). -
compute()
can now handle whenname
is named by unnaming it first
(@mgirlich, #623), and now works whentemporary = TRUE
for Oracle
(@mgirlich, #621). -
explain()
passes...
to methods (@mgirlich, #783), and
works for Redshift (@mgirlich, #740). -
filter()
throws an error if you supply a named argument (@mgirlich, #764). -
Joins disambiguates columns that only differ in case (@mgirlich, #702).
New argumentsx_as
andy_as
allow you to control the table alias
used in SQL query (@mgirlich, #637). Joins withna_matches = "na"
now work
for DuckDB (@mgirlich, #704). -
mutate()
andtransmute()
use named windows if a window definition is
used at least twice and the backend supports named windows (@mgirlich, #624). -
mutate()
now supports the arguments.keep
,.before
, and.after
(@mgirlich, #802). -
na.rm = FALSE
only warns once every 8 hours across all functions (#899). -
nesting()
now supports the.name_repair
argument (@mgirlich, #654). -
pivot_longer()
can now pivot a column namedname
(@mgirlich, #692),
can repair names (@mgirlich, #694), and can work with multiplenames_from
columns (@mgirlich, #693). -
pivot_wider(values_fn = )
andpivot_longer(values_transform = )
can now be formulas (@mgirlich, #745). -
pivot_wider()
now supports the argumentsnames_vary
,names_expand
, and
unused_fn
(@mgirlich, #774). -
remote_name()
now returns a name in more cases where it makes sense
(@mgirlich, #850). -
sql_random()
is now exported. -
ungroup()
removes variables in...
from grouping (@mgirlich, #689).
dbplyr 2.1.1
-
New support for Snowflake (@edgararuiz)
-
compute()
,sql_table_index()
, andsql_query_wrap()
now work with
schemas (@mgirlich, #595). -
if_any()
andif_all()
are now translated. -
group_by()
now ungroups when the dots argument is empty and.add
isFALSE
(@mgirlich, #615). -
sql_escape_date()
andsql_escape_datetime
gain methods for MS Access
(@erikvona, #608).