Write columns of the imputed data in the same sequence as the incomplete data (long form) #569
…ect and the long form complete data
…xports to long format by complete(..., action = "long"). This will preserve the original variable sequence.
Sensible new default! Three questions:
- Would there be a use for an optional argument to keep the legacy code available (e.g. `order = "first"`)? I don't know of any use cases, but it might be nice to provide shorthand to keep old code relying on column ordering working with a single argument.
- About the `complete` function in general, is the current default of just a single complete dataset 'foolproof'? See e.g. this interpretation of the default outcome of the function. Maybe it would be wise to warn the user in some way that this is not the full result of their MI procedure? Or to change the default to the list of completed datasets (i.e. `action = "all"`)?
- Another question about the existing `complete` function, namely whether the `rownames` might be used as the `.id` qualifier? See reprex below.
library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
rownames(boys)[1:10]
#> [1] "3" "4" "18" "23" "28" "36" "37" "38" "39" "43"
imp <- mice(boys, printFlag = FALSE)
long <- complete(imp, "long")
long$.id[1:10]
#> [1] 1 2 3 4 5 6 7 8 9 10
rownames(long)[1:10]
#> [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
Created on 2023-07-20 with reprex v2.0.2
Good suggestions:
library(mice, warn.conflicts = FALSE)
head(rownames(boys))
#> [1] "3" "4" "18" "23" "28" "36"
head(attr(boys, 'row.names'))
#> [1] 3 4 18 23 28 36
imp <- mice(boys, printFlag = FALSE)
long <- complete(imp, "long", include = TRUE)
head(long$.id)
#> [1] 3 4 18 23 28 36
imp2 <- as.mids(long)
head(rownames(imp2$data))
#> [1] "3" "4" "18" "23" "28" "36"
head(attr(imp2$data, "row.names"))
#> [1] 3 4 18 23 28 36
Created on 2023-07-20 with reprex v2.0.2
Thanks for the feedback.
The call `complete(imp, action = "long", include = TRUE)` exports the imputed data in long form with `(m + 1) * n` records. The exported data contains two new variables:
- `".imp"`: the imputation number, which is 0 for the original data and `1:m` for the imputed datasets
- `".id"`: the `row.names` of `imp$data`
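As a quick check of the record count and the two added variables, the following sketch runs the call on the `nhanes` data that ships with mice. The choice of dataset, `m = 5`, and the seed are assumptions made only for illustration.

```r
library(mice, warn.conflicts = FALSE)

# Assumption: nhanes (25 rows), m = 5 and seed = 1 are arbitrary illustration choices.
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
long <- complete(imp, action = "long", include = TRUE)

n <- nrow(imp$data)  # number of rows in the incomplete data
m <- imp$m           # number of imputations

# One block of n rows for the original data (.imp == 0) plus one block per imputation.
nrow(long) == (m + 1) * n
sort(unique(long$.imp))

# Within the original-data block, .id should match the row names of imp$data.
all(long$.id[long$.imp == 0] == rownames(imp$data))
```

Because the checks refer to `.imp` and `.id` by name, they should not depend on where those columns sit in the exported data.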
Until now, the new variables were written to columns 1 and 2 of the data. The disadvantage is that this changes the column positions relative to the original data `imp$data`.
This PR writes the two new variables as the last two columns. In this way, the columns of the imputed data keep the same positions as in the original data, which is more user-friendly and easier to work with.
Commit cdb8bcf solves a problem with `complete()` that prevented proper transfer of the type of the `.id` variable (`integer` or `character`).
Commit ca1e876 changes the column order in `complete()` and adapts functions that silently assumed that `.imp` and `.id` would be in columns 1 and 2, respectively (e.g. in plots and tests).
Note that any existing code that assumes that variables `".imp"` and `".id"` are in columns 1 and 2 will need to be modified. The advice is to modify the code using the variable names `".imp"` and `".id"`.
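The difference between positional and name-based access can be sketched in base R. The two small data frames below are hand-built stand-ins for the old and new column layouts (they are not produced by mice), and `get_imputation()` is a hypothetical helper introduced only for this illustration.

```r
# Hand-built stand-in for a long-form export (hypothetical data, not mice output).
old_layout <- data.frame(.imp = c(0L, 0L, 1L, 1L),
                         .id  = c(1L, 2L, 1L, 2L),
                         age  = c(10, NA, 10, 12),
                         check.names = FALSE)
# New layout: same content, with .imp and .id moved to the last two columns.
new_layout <- old_layout[, c("age", ".imp", ".id")]

# Hypothetical helper: extract one imputation by variable name, never by position.
get_imputation <- function(long, i) {
  vars <- setdiff(names(long), c(".imp", ".id"))
  long[long$.imp == i, vars, drop = FALSE]
}

# Fragile: assumes .imp and .id sit in columns 1 and 2.
# old_layout[old_layout[, 1] == 1, -(1:2), drop = FALSE]

# Robust: compare the result for both layouts.
identical(get_imputation(old_layout, 1), get_imputation(new_layout, 1))
```

Code written in the second style should keep working regardless of which column order a given version of `complete()` produces.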