4a_Subgroup_effects_paired_tables.Rmd

---
author: "Leon Di Stefano"
date: "`r Sys.Date()`"
output: 
    html_document:
      keep_md: false
params:
  fit_name: "main_fit"
  outcome_min: 28
  outcome_max: 35
title: "`r paste('Subgroup effects - posterior tables', params$outcome_min, params$outcome_max, params$fit_name, sep = '-')`"

---

```{r}
knitr::opts_chunk$set(echo = TRUE)
require(here)
here::i_am(file.path("hcq_pooling_analysis", "4a_Subgroup_effects_paired_tables.Rmd"))

source(here("hcq_pooling_analysis", "common.R"))

require(brms)
require(tidybayes)
require(mice)
require(bayesplot)

bayesplot_theme_set(theme_cowplot())
bayesplot_theme_update(
  strip.background = element_blank(),
  strip.text.y = element_text(angle = 0),
  strip.text = element_text(face = "bold")
)

require(MASS, exclude = "select") # clashes with dplyr

out_stub         <- paste(params$outcome_min, params$outcome_max, sep = '-')
output_dir       <- here("hcq_pooling_analysis", "output", out_stub)
output_model_dir <- file.path(output_dir, params$fit_name)
fit_file <- file.path(output_model_dir, paste0(params$fit_name, ".rds"))

subgroup_dir <- file.path(output_model_dir, "subgroup_effects")
if(!dir.exists(subgroup_dir)) {
  dir.create(subgroup_dir, recursive = TRUE)
}
```

## Loading model and data

```{r}
brm_fit <- read_rds(fit_file)
```

First, work with the raw/incomplete data. Ultimately, **need to pool across imputations**:

```{r}
data_tbl <- read_rds(file.path(output_dir, "data_tbl.rds"))
mice_dfs <- read_rds(file.path(output_dir, "mice_complete_df_list.rds"))
```

Function to help create counterfactual tables:

```{r}
reverse_treatment <-
  function(x) {
    case_when(x == "HCQ"    ~ "no_HCQ",
              x == "no_HCQ" ~ "HCQ")
  }
```

## Creating the necessary tables of imputed values

### Creating the base counterfactual table

Create a table of multiply imputed data with potential outcomes under both observed and counterfactual treatment assignments:

```{r}
mice_dfs_po_merged <-
  imap(mice_dfs,
       (function(x, imputation_no) {
         bind_rows(
          x %>% mutate(assignment = "observed", 
                       .row = 1:n()),
          x %>% mutate(treat = reverse_treatment(treat),
                       assignment = "counterfactual")) %>%
           mutate(.imp = imputation_no)
       })
  ) %>%
  bind_rows() %>%
  mutate(.row = 1:n(),
         treat = factor(treat, levels = c("no_HCQ", "HCQ")))
```

### Posterior predictive imputations

For the finite population estimand, impute draws from the posterior predictive distribution in each case, averaging over imputations:


```{r}
data_tbl_po_pp <-
  mice_dfs_po_merged %>%
  add_predicted_draws(
    brm_fit, 
    n = 100 # Need to set this taking into account the number of imputations
    ) %>%
  # Each combination of an imputation and a posterior draw is a "true" posterior draw.
  mutate(.draw = str_c(.imp, "_", .draw))
```

```{r}
nrow(data_tbl_po_pp)
```

```{r}
head(data_tbl_po_pp)
```

Write this out:

```{r}
write_rds(data_tbl_po_pp, 
          file.path(subgroup_dir,
                    paste0(params$fit_name, "_data_tbl_po_pp.rds")))
```

### Posterior predictive imputations -- finite sample/matched version

For the finite sample/matched pair estimand, we want to set the value for `.prediction` to the observed outcome:

```{r}
data_tbl_po_pp_matched_pair <-
  data_tbl_po_pp %>%
  mutate(
    .prediction = case_when(
      assignment == "observed"       ~ as.numeric(niaid_outcome),
      assignment == "counterfactual" ~ as.numeric(.prediction)) %>%
        factor(levels = 1:7, labels = niaid_levels, ordered = TRUE)
  )
```

```{r}
nrow(data_tbl_po_pp_matched_pair)
```

```{r}
head(data_tbl_po_pp_matched_pair)
```

```{r}
write_rds(data_tbl_po_pp_matched_pair, 
          file.path(subgroup_dir,
          paste0(params$fit_name, "_data_tbl_po_pp_matched_pair.rds")))
```

### Posterior fitted/expected value imputations

For superpopulation estimands, we want a version of this with fitted values (expected response probabilities) rather than predictions:

```{r}
system.time(po_expected_draws <-
  mice_dfs_po_merged %>%
  posterior_epred(
    brm_fit, 
    newdata = .,
    nsamples = 100 # Need to set this taking into account the number of imputations. This is for each level within patient!
    ))
  # Each combination of an imputation and a posterior draw is a "true" posterior draw.
  # mutate(.draw = str_c(.imp, "_", .draw))
```

```{r}
system.time(data_tbl_po_fitted <-
  po_expected_draws %>% 
  aperm(c(2,3,1)) %>% 
  as_tibble() %>%
  mutate(.row = 1:n()) %>% 
  pivot_longer(-.row) %>% 
  separate(name, c(".category", ".draw"), sep = "\\.") %>%
  left_join(mice_dfs_po_merged) %>%
  mutate(.draw = str_c(.imp, "_", .draw)))
```

```{r}
nrow(data_tbl_po_fitted)
```

```{r}
head(data_tbl_po_fitted)
```

Write this out:

```{r}
write_rds(data_tbl_po_fitted,
          file.path(subgroup_dir,
          paste0(params$fit_name, "_data_tbl_po_fitted_.rds")))
```


```{r}
sessionInfo()
```


```{r}
Sys.time()
```