optional ID name for single series #224

Open
fuleky opened this issue Sep 21, 2023 · 3 comments


@fuleky

fuleky commented Sep 21, 2023

tsbox is a fantastic package! Thank you for all your work on it.

I came across one issue that complicates my workflow: the different treatment of univariate and multivariate time series. Multivariate time series (e.g., in ts and xts format) have a name for each series, and tsbox retains those names across conversions.

While the ts time series format doesn't use (or drops) the series name when there is only a single series in the data set, the xts format retains the series name. This is a useful feature of the xts format since the user does not need to test for or distinguish between univariate vs multivariate time series.

Unfortunately, converting an xts object with ts_tbl() drops the series name when there is only a single series. Ideally, the series name would be retained and ts_tbl() would return the default long format with id, time, and value columns.

See the example below:

Session Info
library("tsbox")

# generate test data
(sers_wide <- tibble::tibble(
  time = seq.Date(from = as.Date("2001-01-01"), 
                  to = as.Date("2005-01-01"), 
                  by = "year"),
  series_1 = 1:5,
  series_2 = 101:105
))
#> # A tibble: 5 × 3
#>   time       series_1 series_2
#>   <date>        <int>    <int>
#> 1 2001-01-01        1      101
#> 2 2002-01-01        2      102
#> 3 2003-01-01        3      103
#> 4 2004-01-01        4      104
#> 5 2005-01-01        5      105

# example with two series:

# convert to long format (default of tsbox: id, time, value)
(sers_long <- tsbox::ts_long(sers_wide))
#> # A tibble: 10 × 3
#>    id       time       value
#>    <chr>    <date>     <int>
#>  1 series_1 2001-01-01     1
#>  2 series_1 2002-01-01     2
#>  3 series_1 2003-01-01     3
#>  4 series_1 2004-01-01     4
#>  5 series_1 2005-01-01     5
#>  6 series_2 2001-01-01   101
#>  7 series_2 2002-01-01   102
#>  8 series_2 2003-01-01   103
#>  9 series_2 2004-01-01   104
#> 10 series_2 2005-01-01   105

# convert to ts, names of series remain present
(sers_ts <- tsbox::ts_ts(sers_long))
#> Time Series:
#> Start = 2001
#> End = 2005
#> Frequency = 1
#>      series_1 series_2
#> 2001        1      101
#> 2002        2      102
#> 2003        3      103
#> 2004        4      104
#> 2005        5      105

# convert to default long format, recover original info
(ser_long_alt <- tsbox::ts_tbl(sers_ts))
#> # A tibble: 10 × 3
#>    id       time       value
#>    <chr>    <date>     <int>
#>  1 series_1 2001-01-01     1
#>  2 series_1 2002-01-01     2
#>  3 series_1 2003-01-01     3
#>  4 series_1 2004-01-01     4
#>  5 series_1 2005-01-01     5
#>  6 series_2 2001-01-01   101
#>  7 series_2 2002-01-01   102
#>  8 series_2 2003-01-01   103
#>  9 series_2 2004-01-01   104
#> 10 series_2 2005-01-01   105

# example with one series:

# pick one of the two series above (starting with id, time, value)
(ser_1_long <- tsbox::ts_pick(sers_long, "series_1"))
#> # A tibble: 5 × 3
#>   id       time       value
#>   <chr>    <date>     <int>
#> 1 series_1 2001-01-01     1
#> 2 series_1 2002-01-01     2
#> 3 series_1 2003-01-01     3
#> 4 series_1 2004-01-01     4
#> 5 series_1 2005-01-01     5

# convert to xts, name of series remains present
(ser_1_xts <- tsbox::ts_xts(ser_1_long))
#> Loading required namespace: xts
#>            series_1
#> 2001-01-01        1
#> 2002-01-01        2
#> 2003-01-01        3
#> 2004-01-01        4
#> 2005-01-01        5

# converting to tbl does not recover the long format, 
# id column is dropped, series name is lost
(ser_1_long_alt <- tsbox::ts_tbl(ser_1_xts))
#> # A tibble: 5 × 2
#>   time       value
#>   <date>     <int>
#> 1 2001-01-01     1
#> 2 2002-01-01     2
#> 3 2003-01-01     3
#> 4 2004-01-01     4
#> 5 2005-01-01     5

# following up with a conversion to long format cannot recover
# the lost name; ts_long() falls back to "value" as the id
(ser_1_long_long <- tsbox::ts_long(ser_1_long_alt))
#> # A tibble: 5 × 3
#>   id    time       value
#>   <chr> <date>     <int>
#> 1 value 2001-01-01     1
#> 2 value 2002-01-01     2
#> 3 value 2003-01-01     3
#> 4 value 2004-01-01     4
#> 5 value 2005-01-01     5

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Ventura 13.4
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Pacific/Honolulu
#>  date     2023-09-20
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version date (UTC) lib source
#>  P anytime       0.3.9   2020-08-27 [?] RSPM (R 4.3.0)
#>  P cachem        1.0.8   2023-05-01 [?] CRAN (R 4.3.0)
#>  P callr         3.7.3   2022-11-02 [?] CRAN (R 4.3.0)
#>  P cli           3.6.1   2023-03-23 [?] CRAN (R 4.3.0)
#>  P crayon        1.5.2   2022-09-29 [?] CRAN (R 4.3.0)
#>  P data.table    1.14.8  2023-02-17 [?] CRAN (R 4.3.0)
#>  P devtools      2.4.5   2022-10-11 [?] CRAN (R 4.3.0)
#>  P digest        0.6.33  2023-07-07 [?] CRAN (R 4.3.0)
#>  P ellipsis      0.3.2   2021-04-29 [?] CRAN (R 4.3.0)
#>  P evaluate      0.21    2023-05-05 [?] CRAN (R 4.3.0)
#>  P fansi         1.0.4   2023-01-22 [?] CRAN (R 4.3.0)
#>  P fastmap       1.1.1   2023-02-24 [?] CRAN (R 4.3.0)
#>  P fs            1.6.3   2023-07-20 [?] CRAN (R 4.3.0)
#>  P glue          1.6.2   2022-02-24 [?] CRAN (R 4.3.0)
#>  P htmltools     0.5.6   2023-08-10 [?] CRAN (R 4.3.0)
#>  P htmlwidgets   1.6.2   2023-03-17 [?] RSPM (R 4.3.0)
#>  P httpuv        1.6.11  2023-05-11 [?] RSPM (R 4.3.0)
#>  P knitr         1.44    2023-09-11 [?] CRAN (R 4.3.1)
#>  P later         1.3.1   2023-05-02 [?] RSPM (R 4.3.0)
#>  P lattice       0.21-8  2023-04-05 [?] CRAN (R 4.3.0)
#>  P lifecycle     1.0.3   2022-10-07 [?] CRAN (R 4.3.0)
#>  P magrittr      2.0.3   2022-03-30 [?] CRAN (R 4.3.0)
#>  P memoise       2.0.1   2021-11-26 [?] CRAN (R 4.3.0)
#>  P mime          0.12    2021-09-28 [?] CRAN (R 4.3.0)
#>  P miniUI        0.1.1.1 2018-05-18 [?] CRAN (R 4.3.0)
#>  P pillar        1.9.0   2023-03-22 [?] CRAN (R 4.3.0)
#>  P pkgbuild      1.4.2   2023-06-26 [?] CRAN (R 4.3.0)
#>  P pkgconfig     2.0.3   2019-09-22 [?] CRAN (R 4.3.0)
#>  P pkgload       1.3.2.1 2023-07-08 [?] RSPM (R 4.3.0)
#>  P prettyunits   1.1.1   2020-01-24 [?] CRAN (R 4.3.0)
#>  P processx      3.8.2   2023-06-30 [?] CRAN (R 4.3.0)
#>  P profvis       0.3.8   2023-05-02 [?] CRAN (R 4.3.0)
#>  P promises      1.2.1   2023-08-10 [?] RSPM (R 4.3.0)
#>  P ps            1.7.5   2023-04-18 [?] CRAN (R 4.3.0)
#>  P purrr         1.0.2   2023-08-10 [?] CRAN (R 4.3.0)
#>  P R.cache       0.16.0  2022-07-21 [?] CRAN (R 4.3.0)
#>  P R.methodsS3   1.8.2   2022-06-13 [?] CRAN (R 4.3.0)
#>  P R.oo          1.25.0  2022-06-12 [?] CRAN (R 4.3.0)
#>  P R.utils       2.12.2  2022-11-11 [?] CRAN (R 4.3.0)
#>  P R6            2.5.1   2021-08-19 [?] CRAN (R 4.3.0)
#>  P Rcpp          1.0.11  2023-07-06 [?] CRAN (R 4.3.0)
#>  P remotes       2.4.2.1 2023-07-18 [?] CRAN (R 4.3.0)
#>  P reprex        2.0.2   2022-08-17 [?] CRAN (R 4.3.0)
#>  P rlang         1.1.1   2023-04-28 [?] CRAN (R 4.3.0)
#>  P rmarkdown     2.25    2023-09-18 [?] CRAN (R 4.3.1)
#>  P rstudioapi    0.15.0  2023-07-07 [?] CRAN (R 4.3.0)
#>  P sessioninfo   1.2.2   2021-12-06 [?] CRAN (R 4.3.0)
#>  P shiny         1.7.5   2023-08-12 [?] RSPM (R 4.3.0)
#>  P stringi       1.7.12  2023-01-11 [?] CRAN (R 4.3.0)
#>  P stringr       1.5.0   2022-12-02 [?] CRAN (R 4.3.0)
#>  P styler        1.10.2  2023-08-29 [?] CRAN (R 4.3.0)
#>  P tibble        3.2.1   2023-03-20 [?] CRAN (R 4.3.0)
#>  P tsbox       * 0.4.1   2023-05-08 [?] RSPM (R 4.3.0)
#>  P urlchecker    1.0.1   2021-11-30 [?] CRAN (R 4.3.0)
#>  P usethis       2.2.2   2023-07-06 [?] CRAN (R 4.3.0)
#>  P utf8          1.2.3   2023-01-31 [?] CRAN (R 4.3.0)
#>  P vctrs         0.6.3   2023-06-14 [?] CRAN (R 4.3.0)
#>  P withr         2.5.0   2022-03-03 [?] CRAN (R 4.3.0)
#>  P xfun          0.40    2023-08-09 [?] CRAN (R 4.3.0)
#>  P xtable        1.8-4   2019-04-21 [?] RSPM (R 4.3.0)
#>  P xts           0.13.1  2023-04-16 [?] RSPM (R 4.3.0)
#>  P yaml          2.3.7   2023-01-23 [?] CRAN (R 4.3.0)
#>  P zoo           1.8-12  2023-04-13 [?] CRAN (R 4.3.0)
#> 
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-09-20 with reprex v2.0.2
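Until tsbox keeps the name of a single series, one possible workaround is to re-attach the column name by hand. The sketch below assumes the input is a single-column xts object whose column name should become the `id`; `ts_tbl_keep_id()` is a hypothetical helper, not part of tsbox, and it relies on `tibble::add_column()` to insert the column.

```r
library(tsbox)

# Hypothetical helper (not part of tsbox): convert with ts_tbl() and, if the
# id of a named single-column series was dropped, add it back as first column.
ts_tbl_keep_id <- function(x) {
  out <- ts_tbl(x)
  nm <- colnames(x)  # NULL for a univariate ts, the column name for xts
  if (!"id" %in% names(out) && length(nm) == 1) {
    out <- tibble::add_column(out, id = nm, .before = 1)
  }
  out
}

# same test data as in the reprex above
sers_wide <- tibble::tibble(
  time = seq.Date(from = as.Date("2001-01-01"),
                  to = as.Date("2005-01-01"),
                  by = "year"),
  series_1 = 1:5,
  series_2 = 101:105
)
ser_1_xts <- ts_xts(ts_pick(ts_long(sers_wide), "series_1"))

ser_1_long_alt <- ts_tbl_keep_id(ser_1_xts)
ser_1_long_alt
```

If ts_tbl() already returns an `id` column (multiple series, or a tsbox version that keeps single-series names), the helper leaves the result unchanged.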

@christophsax
Collaborator

Thanks! I had similar thoughts before. Keeping the names in the n = 1 case if they are already there may be useful.

@fuleky
Author

fuleky commented Sep 22, 2023

It seems that this issue affects more than just the ts_tbl() function. Sorry to pile on, but I came across another case where the series name gets lost in the n=1 case. Here it is the ts_summary() function that loses the name:

library("tsbox")

# generate test data
(sers_wide <- tibble::tibble(
  time = seq.Date(
    from = as.Date("2001-01-01"),
    to = as.Date("2005-01-01"),
    by = "year"
  ),
  series_1 = 1:5,
  series_2 = 101:105
)
)
#> # A tibble: 5 × 3
#>   time       series_1 series_2
#>   <date>        <int>    <int>
#> 1 2001-01-01        1      101
#> 2 2002-01-01        2      102
#> 3 2003-01-01        3      103
#> 4 2004-01-01        4      104
#> 5 2005-01-01        5      105

# example with two series
(sers_summary <- tsbox::ts_long(sers_wide) |>
  tsbox::ts_xts() |>
  tsbox::ts_summary()
)
#> Loading required namespace: xts
#>         id obs   diff freq      start        end
#> 1 series_1   5 1 year    1 2001-01-01 2005-01-01
#> 2 series_2   5 1 year    1 2001-01-01 2005-01-01

# example with one series
(ser_1_summary <- tsbox::ts_long(sers_wide) |>
  tsbox::ts_pick("series_1") |>
  tsbox::ts_xts() |> # the name is still present at this stage
  tsbox::ts_summary() # here the name is replaced by the deparsed call expression (see the id column in the output)
)
#>                                                                     id obs
#> 1 tsbox::ts_xts(tsbox::ts_pick(tsbox::ts_long(sers_wide), "series_1"))   5
#>     diff freq      start        end
#> 1 1 year    1 2001-01-01 2005-01-01

Created on 2023-09-21 with reprex v2.0.2
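A similar workaround can be sketched for ts_summary(). It assumes, as in the output above, that ts_summary() returns a data frame with an `id` column; `ts_summary_keep_id()` is a hypothetical helper, not part of tsbox. For a single named series it overwrites the id that ts_summary() derived from the call expression with the xts column name.

```r
library(tsbox)

# Hypothetical helper (not part of tsbox): run ts_summary() and, for a single
# named series, replace the expression-derived id with the column name.
ts_summary_keep_id <- function(x) {
  s <- ts_summary(x)
  nm <- colnames(x)  # NULL for a univariate ts, the column name for xts
  if (nrow(s) == 1 && length(nm) == 1) {
    s$id <- nm
  }
  s
}

# same test data as in the reprex above
sers_wide <- tibble::tibble(
  time = seq.Date(from = as.Date("2001-01-01"),
                  to = as.Date("2005-01-01"),
                  by = "year"),
  series_1 = 1:5,
  series_2 = 101:105
)

ser_1_summary_named <- ts_long(sers_wide) |>
  ts_pick("series_1") |>
  ts_xts() |>
  ts_summary_keep_id()
ser_1_summary_named
```

With two or more series the summary already has one row per named series, so the helper passes it through unchanged.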

@christophsax
Collaborator

Good point. Single series objects lose their name when converted to the internal dts class:

library(tsbox)
single_with_name <- 
  ts_c(fdeaths, mdeaths) |>
  ts_tbl() |> 
  ts_pick("fdeaths") |> 
  ts_xts()
#> Loading required namespace: tibble
#> Loading required namespace: xts

single_with_name |> 
  head()
#>            fdeaths
#> 1974-01-01     901
#> 1974-02-01     689
#> 1974-03-01     827
#> 1974-04-01     677
#> 1974-05-01     522
#> 1974-06-01     406

single_with_name |> 
  ts_dts() |> 
  attributes()
#> $names
#> [1] "time"  "value"
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> 
#> $class
#> [1] "dts"        "data.table" "data.frame"
#> 
#> $.internal.selfref
#> <pointer: 0x12b00ece0>
#> 
#> $cname
#> $cname$id
#> character(0)
#> 
#> $cname$time
#> [1] "time"
#> 
#> $cname$value
#> [1] "value"

Changing this probably requires adjustments in a few places. Originally, I was convinced that dropping the name was the right thing, I guess because single ts objects don't have a series name. So the principle that single series don't have a name was applied quite deeply.

But I don't see much of a problem if a single series can have an optional series name that is kept if we convert between objects that support this (such as xts or data frames). If we convert to ts, we still lose it, but that's ok.

@christophsax changed the title from "ts_tbl() drops the name if there is only a single time series stored in xts format" to "optional ID name for single series" on Sep 24, 2023.