Write columns of the imputed data in the same sequence as the incomplete data (long form) #569

stefvanbuuren · 2023-07-19T13:02:34Z

The call complete(imp, action = "long", include = TRUE) exports the imputed data in long form with (m + 1) * n records. The exported data contains two new variables with names:

".imp": the imputation sequence number, starting from 0 (for the original data) through 1:m
".id": the row.names of imp$data
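
As a quick check of the shape described above, the following sketch uses the nhanes data shipped with mice. The dataset, m = 2, and the seed are illustrative assumptions and not part of this PR.

```r
library(mice, warn.conflicts = FALSE)

# Illustrative setup (assumed for this sketch): small built-in data, m = 2
imp <- mice(nhanes, m = 2, printFlag = FALSE, seed = 1)

# Long form, including the original (incomplete) data as .imp == 0
long <- complete(imp, action = "long", include = TRUE)

# (m + 1) * n records: one copy of the original data plus m completed copies
n_expected <- (imp$m + 1) * nrow(imp$data)
stopifnot(nrow(long) == n_expected)

# .imp runs from 0 (original data) through m
table(long$.imp)
```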

Until now, the new variables were written to columns 1 and 2 of the data. The disadvantage is that this shifts every other column away from its position in the original data imp$data.

This PR writes the two new variables as the last two variables. In this way, the columns of the imputed data will have the same positions as in the original data, which is more user-friendly and easier to work with.

Commit cdb8bcf solves a problem in complete() that prevented proper transfer of the type (integer or character) of the .id variable.

Commit ca1e876 changes the column order in complete() and adapts functions that silently assumed that .imp and .id would be in columns 1 and 2, respectively (e.g. in plots and tests).

Note that any existing code that assumes that variables ".imp" and ".id" are in columns 1 and 2 will need to be modified. The advice is to modify the code using the variable names ".imp" and ".id".
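
A minimal sketch of the name-based approach, assuming the nhanes data from mice and arbitrary settings (m = 2, fixed seed) chosen only for illustration. It should behave the same whether .imp and .id sit in the first or the last columns.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, m = 2, printFlag = FALSE, seed = 1)
long <- complete(imp, action = "long", include = TRUE)

# Fragile: long[, 1:2] only returns .imp and .id under the old column order
# Robust: select the bookkeeping columns by name
book <- long[, c(".imp", ".id")]

# Robust: recover the original variables, in their original order
vars <- long[, names(imp$data)]

# Example: the first completed dataset, without the bookkeeping columns
first <- long[long$.imp == 1, names(imp$data)]
```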


hanneoberman

Sensible new default! Three questions:

1. Would there be a use for an optional argument to keep the legacy behavior available (e.g. order = "first")? I don't know of any use cases, but it might be nice to provide a shorthand so that old code relying on column ordering keeps working with a single argument.
2. About complete() in general: is the current default of returning just a single completed dataset 'foolproof'? See e.g. this interpretation of the default outcome of the function. Maybe it would be wise to warn the user in some way that this is not the full result of their MI procedure? Or to change the default to the list of completed datasets (i.e. action = "all")?
3. Another question about the existing complete() function, namely whether the row names could be used as the .id qualifier. See the reprex below.

library(mice)
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
rownames(boys)[1:10]
#>  [1] "3"  "4"  "18" "23" "28" "36" "37" "38" "39" "43"
imp <- mice(boys, printFlag = FALSE)
long <- complete(imp, "long")
long$.id[1:10]
#>  [1]  1  2  3  4  5  6  7  8  9 10
rownames(long)[1:10]
#>  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Created on 2023-07-20 with reprex v2.0.2

stefvanbuuren · 2023-07-20T15:48:53Z

Good suggestions:

1. I have added an order argument to complete() to support old code.
2. Important point, but unfortunately I cannot change the default action in complete(). The function is widely used in mice and outside, with a history of over 20 years. I programmed the warning, but the result was ugly, with distracting warning messages popping up from various functions. I do not wish to punish savvy users, so I removed it.
3. The glitch was indeed present on 3.16.0, and I also saw it. Here's what it does now:

library(mice, warn.conflicts = FALSE)
head(rownames(boys))
#> [1] "3"  "4"  "18" "23" "28" "36"
head(attr(boys, 'row.names'))
#> [1]  3  4 18 23 28 36
imp <- mice(boys, printFlag = FALSE)
long <- complete(imp, "long", include = TRUE)
head(long$.id)
#> [1]  3  4 18 23 28 36
imp2 <- as.mids(long)
head(rownames(imp2$data))
#> [1] "3"  "4"  "18" "23" "28" "36"
head(attr(imp2$data, "row.names"))
#> [1]  3  4 18 23 28 36

Created on 2023-07-20 with reprex v2.0.2
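
For code that still relies on column positions, the order argument from the first point might be used as sketched below. The value "first" follows the suggestion in the review above, and the data and settings are illustrative assumptions; ?complete in the installed version of mice is the authority on the accepted values.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, m = 2, printFlag = FALSE, seed = 1)

# New default: .imp and .id follow the original variables
new_style <- complete(imp, action = "long", include = TRUE)

# Assumed legacy switch: .imp and .id in columns 1 and 2, as before this PR
old_style <- complete(imp, action = "long", include = TRUE, order = "first")

# Compare the two column layouts
names(new_style)
names(old_style)
```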

Thanks for the feedback.

stefvanbuuren added 2 commits July 19, 2023 12:12

Preserve the rownames type (integer or character) across the mids object and the long form complete data

cdb8bcf

Write .imp and .id variable as last instead for first variables for exports to long format by complete(..., action = "long"). This will preserve the original variable sequence.

ca1e876

stefvanbuuren requested review from gerkovink and hanneoberman July 19, 2023 13:13

stefvanbuuren added the enhancement label Jul 19, 2023

hanneoberman approved these changes Jul 20, 2023


For backward compatibility, adds an order argument to complete()

8325dd1

hanneoberman approved these changes Jul 20, 2023


stefvanbuuren added 2 commits July 21, 2023 09:20

Merge branch 'master' into complete_func

1cb1411

Document change in NEWS.md

409acfd

stefvanbuuren merged commit 05e90b3 into master Jul 21, 2023

stefvanbuuren deleted the complete_func branch July 21, 2023 07:37
