---
title: "Analyzing and comparing CHOIR Body Maps"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{compare-choirbms}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```

```{r setup}
library(CHOIRBM)
```

## Analyzing CBMs
Generally speaking, there are two ways we might analyze CHOIR Body Maps. We may interested to see if a categorical variable (such as gender) is related to a statistically significant difference in endorsement of different locations on the body map. Alternatively, we may want to see the effect of a continuous variable (such as age) on endorsement. There are helper functions implemented in CHOIRBM to assist with both of these analyses, and this vignette will demonstrate each of these with an example using the `validation` dataset.

First, we load the `validation` data into R and get a quick look at it. 

```{r load_data}
# loading the validation data included in the CHOIRBM package
data(validation)
head(validation)
```

### Example 1: Chi-Squared method
Now, we can examine the effect of a categorical variable on endorsement of different locations in the CBM, such ashe male and female CBMs to see which segments have more or less endorsement as related to gender difference. The `comp_choirbm_chi()` function uses a series of Chi-Squared tests to compare the observed percent endorsement versus the expected endorsement (in this case, we expect 50% male and 50% female endorsement). 

#### Data processing
Separate the data set into male and female groups before processing to get a single body map data frame for each group. See the vignette on preparing data from CHOIR databases for a full explanation of each step in this process.

```{r proc_data}
# isolate and process male data
male_data <- validation[validation[["gender"]] == "Male", ]
male_bodymap_list <- lapply(male_data[["bodymap_regions_csv"]], string_to_map)
male_bodymap_df <- agg_choirbm_list(male_bodymap_list)

# isolate and process female data
female_data <- validation[validation[["gender"]] == "Female", ]
female_bodymap_list <- lapply(
  female_data[["bodymap_regions_csv"]]
  , string_to_map
)
female_bodymap_df <- agg_choirbm_list(female_bodymap_list)

# quick snapshot of each
head(male_bodymap_df)
head(female_bodymap_df)
```

#### Compare the percent endorsements
Now use the `comp_choirbm_chi()` function to compare the percent endorsement of each segment of the CBM for men and women.
`comp_choirbm_chi()` takes a named list as an argument.

```{r comp_chi}
# name each data frame in the list as male or female
chi_res <- comp_choirbm_chi(
  list("male" = male_bodymap_df, "female" = female_bodymap_df)
  , method = "bonferroni"
)

# visualize the chi-square results
head(chi_res)
```

### Example 2: z-test method
The `comp_choirbm_ztest()` function uses a two-proportions z-test to compare observed proportions from two different groups.


```{r ztest}
comp_choirbm_ztest(list( "male" = male_data, "female" = female_data), tail = "two")
```

### Example 3: Logistic regression method
In this example, we use the `comp_choirbm_glm()` function to see if the continuous age variable in the validation data is related to whether or not an individual endorses a particular location on the body map using logistic regression. `comp_choirbm_glm()` accepts a data frame with a "bodymap_regions_csv" column and another column with the variable of interest. It returns a data.frame where each row is the result of one logistic regression using the continuous variable to predict CBM location endorsement.

```{r comp_glm}
# for the sake of speed, randomly sample 100 maps...
set.seed(123)
sampled_data <- validation[
  sample(
    seq_len(nrow(validation))
    , 100
    , replace = FALSE
  )
  , ]
colnames(sampled_data)[5] <- "bodymap"
model_output <- comp_choirbm_glm(sampled_data, "age", family = "binomial")
head(model_output)
```