README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# dbmm: Dynamic Bayesian Measurement Models #

<!-- badges: start -->
<!-- badges: end -->

## Installation

You can install the development version of **dbmm** from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("devincaughey/dbmm")
```

You will also need to install **cmdstanr** (for instructions, see [here](https://mc-stan.org/cmdstanr/articles/cmdstanr.html)).

## Overview ##

The R package **dbmm** fits dynamic Bayesian measurement models using the
programming language [Stan](https://mc-stan.org) via the R package [**cmdstanr**](https://mc-stan.org/cmdstanr/). Currently, the only supported model is a dynamic factor (DF) model for indicators of mixed type (binary, ordinal, or metric). In the future, however, the package will incorporate other models, including the dynamic group-level item response theory (DGIRT) model currently implemented by the R package [**dgo**](https://github.com/jamesdunham/dgo).

## Workflow ##

Using **dbmm** involves the following steps:

  1. *Shape* the data into the list format required by **cmdstanr**.
  2. *Fit* the required model in Stan using.
  3. *Extract* parameter draws from the fitted model.
  4. If needed, *identify* the model the model by rotating and/or sign-flipping the draws.
  5. *Check* the convergence diagnostics of the fitted model.
  5. *Label* the draws with informative parameter names.
  6. *Summarize* and *plot* the posterior distributions.


### Step 1: Shape data

```{r, eval=FALSE}
## Load data on societal attributes of U.S. states in 2020 and 2021
data("social_outcomes_2020_2021")

## Drop observations with missing values
social_outcomes_2020_2021 <- na.omit(social_outcomes_2020_2021)

## Shape the data into list form
shaped_data <- shape_data(
    long_data = social_outcomes_2020_2021,
    unit_var = "st",                      
    time_var = "year",                    
    item_var = "outcome",                 
    value_var = "value",                  
    periods_to_estimate = 2020:2021,      
    ordinal_items = NA,
    binary_items = NA,
    max_cats = 10,
    standardize = TRUE,
    make_indicator_for_zeros = TRUE
)
```

### Step 2: Fit the model

You can specify additional arguments for `cmdstanr::sample()`. For details, see [here](https://mc-stan.org/cmdstanr/reference/model-method-sample.html).

```{r,eval=FALSE}
options(mc.cores = parallel::detectCores()) # for parallizing across chains
fitted <- fit(
    data = shaped_data,
    n_dim = 2,
    chains = 2,
    parallelize_within_chains = FALSE,
    constant_alpha = FALSE,
    separate_eta = TRUE,
    init_kappa = FALSE,
    force_recompile = FALSE,
    iter_warmup = 500, 
    iter_sampling = 500,
    adapt_delta = .9,
    refresh = 10,
    seed = 123
)
```

### Step 3: Extract draws ###

```{r,eval=FALSE}
fitted_draws <- extract_draws(fitted)
head(fitted_draws)
```

### Step 4: Identify the model ###

```{r,eval=FALSE}
identified_draws <- identify_draws(fitted_draws, rotate = TRUE)
## (To apply varimax rotation, set `rotate = TRUE`.)
```

### Step 5: Check convergence of the identified model ###

```{r,eval=FALSE}
## Basic check
check_convergence(identified_draws$id_draws)
## Traceplot of selected parameters
bayesplot::mcmc_trace(identified_draws$id_draws, pars = "lambda_metric[23,2]")
## More details 
summarized_draws <- summary(identified_draws$id_draws)
summary(summarized_draws)
```

### Step 6: Label parameters ###

```{r,eval=FALSE}
labeled_draws <- label_draws(identified_draws)
head(labeled_draws$eta)
head(labeled_draws$lambda_metric)
```

### Step 7: Summarizing and plotting the posterior draws

```{r,eval=FALSE}
library(tidyverse)

## Posterior mean and standard deviation of the factor scores and item loadings
eta_summ <- labeled_draws$eta %>%
    summarise(
        est = mean(value),
        err = sd(value),
        .by = c(TIME, UNIT, dim)
    )
head(eta_summ)
lambda_metric_summ <- labeled_draws$lambda_metric %>%
    summarise(
        est = mean(value),
        err = sd(value),
        .by = c(ITEM, dim)
    )
head(lambda_metric_summ)

## Plot item loadings
lambda_metric_summ %>%
    pivot_wider(
        id_cols = "ITEM",
        names_from = "dim",
        values_from = c("est", "err")
    ) %>%
    ggplot() +
    aes(x = est_1, y = est_2, label = ITEM) +
    geom_vline(xintercept = 0, linetype = "dotted") +
    geom_hline(yintercept = 0, linetype = "dotted") +
    geom_point() +
    geom_linerange(
        aes(xmin = est_1 - 1.96*err_1, xmax = est_1 + 1.96*err_1),
        alpha = 1/4, linewidth = 2
    ) +
    geom_linerange(
        aes(ymin = est_2 - 1.96*err_2, ymax = est_2 + 1.96*err_2),
        alpha = 1/4, linewidth = 2
    ) +
    ggrepel::geom_text_repel() +
    labs(title = "Item Loadings") +
    coord_fixed()
```