---
title: "Processing data from a CHOIR database from start to finish"
author: "Eric Cramer"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{choir-db-ex}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```

```{r setup}
library(CHOIRBM)
library(ggplot2)
```

This vignette describes how to process and visualize data extracted from a CHOIR database and provides an example using the included `validation` dataset. In this example, we will visualize the percent endorsement of each region of the CHOIR Body Map (CBM) for men and women.

First, load the `validation` data into R and get a quick look at it.

```{r load_data}
# loading the validation data included in the CHOIRBM package
data(validation)
head(validation)
```

You will notice that each patient's CHOIR Body Map is stored as a comma-separated string in a single column, `bodymap_regions_csv`. Each of these strings needs to be converted into its own body map data frame before we can go further.
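
To make the storage format concrete, here is a minimal base R sketch of what a comma-separated body map string contains. The string and the region IDs in it are made up for illustration, and this is not necessarily how `string_to_map()` parses its input.

```{r csv_split_sketch}
# hypothetical string in the style of the bodymap_regions_csv column
example_csv <- "101,102,205"

# split on the commas and trim any stray whitespace around each ID
example_ids <- trimws(strsplit(example_csv, ",", fixed = TRUE)[[1]])
example_ids
```

Each element of `example_ids` stands for one endorsed region, which is the information the conversion step below turns into a data frame.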

## Data processing
Separate the data into male and female data frames.

```{r mf_split}
male_data <- validation[validation[["gender"]] == "Male", ]
female_data <- validation[validation[["gender"]] == "Female", ]
```
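
A caveat worth checking in your own extract: logical subsetting with `==` keeps an all-`NA` row for every record whose gender is missing. The toy data frame below is invented for illustration (it is not the `validation` data) and shows how wrapping the comparison in `which()` avoids that.

```{r na_subset_sketch}
# toy data with one missing gender value (hypothetical, for illustration only)
toy <- data.frame(
  id = 1:4,
  gender = c("Male", "Female", NA, "Male"),
  stringsAsFactors = FALSE
)

# the NA comparison yields NA, which is returned as an extra all-NA row
n_with_na_row <- nrow(toy[toy[["gender"]] == "Male", ])

# which() keeps only the positions where the comparison is TRUE
n_true_matches <- nrow(toy[which(toy[["gender"]] == "Male"), ])

c(with_na_row = n_with_na_row, true_matches = n_true_matches)
```

If the gender column in your data has no missing values, both forms return the same rows.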

For each group, create a list of body map data frames by applying the `string_to_map()` function to every string with R's `lapply()`. Then use `agg_choirbm_list()` to reduce the list to a single data frame by adding up the endorsement values for each region. Since we want to plot the percent endorsement, we then calculate the percentage as a separate column in the final data frame.

```{r data_proc_m}
male_bodymap_list <- lapply(male_data[["bodymap_regions_csv"]], string_to_map)
male_bodymap_df <- agg_choirbm_list(male_bodymap_list)

# we want to visualize the percent endorsement, so divide the values by
# the size of the data set and multiply by 100
male_bodymap_df[["perc"]] <- male_bodymap_df[["value"]] /
  nrow(male_data) * 100
head(male_bodymap_df)
```
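
The idea of reducing a list of body maps through addition can be sketched in base R on toy data. This is an illustration of the concept, not the implementation of `agg_choirbm_list()`. The `id` column and the region IDs are assumptions; only the `value` column name is taken from the code above.

```{r agg_sketch}
# three toy per-patient maps over the same regions (assumed layout)
toy_maps <- list(
  data.frame(id = c("101", "102", "103"), value = c(1, 0, 1)),
  data.frame(id = c("101", "102", "103"), value = c(0, 0, 1)),
  data.frame(id = c("101", "102", "103"), value = c(1, 1, 1))
)

# add the value columns element-wise; this assumes every map lists
# the same ids in the same order
toy_total <- toy_maps[[1]]
toy_total[["value"]] <- Reduce(
  `+`,
  lapply(toy_maps, function(m) m[["value"]])
)

# percent endorsement: count of endorsements over number of patients
toy_total[["perc"]] <- toy_total[["value"]] / length(toy_maps) * 100
toy_total
```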

```{r data_proc_f}
female_bodymap_list <- lapply(
  female_data[["bodymap_regions_csv"]],
  string_to_map
)

female_bodymap_df <- agg_choirbm_list(female_bodymap_list)

# we want to visualize the percent endorsement, so divide the values by
# the size of the data set and multiply by 100
female_bodymap_df[["perc"]] <- female_bodymap_df[["value"]] /
  nrow(female_data) * 100
head(female_bodymap_df)
```
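
The percentage calculation is identical for both groups, so it could be wrapped in a small function. The sketch below uses a hypothetical helper, `add_percent()`, which is not part of CHOIRBM, and demonstrates it on made-up counts.

```{r percent_helper_sketch}
# hypothetical helper (not a CHOIRBM function): adds a percent column
# computed from the summed endorsement values and the group size
add_percent <- function(map_df, n_patients) {
  stopifnot(n_patients > 0)
  map_df[["perc"]] <- map_df[["value"]] / n_patients * 100
  map_df
}

# made-up aggregated counts for a group of four patients
toy_df <- data.frame(id = c("101", "102"), value = c(3, 1))
toy_out <- add_percent(toy_df, n_patients = 4)

# a quick sanity check: percentages should fall between 0 and 100
range(toy_out[["perc"]])
```

With the real data, the same helper could be called as `add_percent(male_bodymap_df, nrow(male_data))`, assuming the aggregated data frame keeps its counts in the `value` column as shown above.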

## Plotting
Once the data are formatted and the values to plot are calculated, we can generate the CBMs. Plot the male and female body maps separately, passing the name of the column to plot through the `value` argument.

```{r plot_m_cbm}
plot_male_choirbm(male_bodymap_df, value = "perc") +
  theme(legend.position = "bottom") +
  labs(fill = "Percent Endorsement")
```

```{r plot_f_cbm}
plot_female_choirbm(female_bodymap_df, value = "perc") +
  theme(legend.position = "bottom") +
  labs(fill = "Percent Endorsement")
```
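
When the two maps are compared side by side, it can help to give both the same fill limits so that a given colour means the same percentage in each plot. The sketch below shows the idea with plain ggplot2 tiles on made-up data. Whether the CHOIRBM plotting functions accept a replacement fill scale in the same way is an assumption; ggplot2 generally replaces an existing scale and prints a message when a second one is added.

```{r shared_scale_sketch}
library(ggplot2)

# made-up percentages standing in for the two groups
toy_m <- data.frame(x = 1:3, y = 1, perc = c(10, 40, 25))
toy_f <- data.frame(x = 1:3, y = 1, perc = c(20, 60, 35))

# one set of limits covering both groups
shared_limits <- range(c(toy_m[["perc"]], toy_f[["perc"]]))
shared_fill <- scale_fill_gradient(
  limits = shared_limits,
  name = "Percent Endorsement"
)

# the same scale object is added to both plots
p_m <- ggplot(toy_m, aes(x, y, fill = perc)) +
  geom_tile() +
  shared_fill +
  theme(legend.position = "bottom")
p_f <- ggplot(toy_f, aes(x, y, fill = perc)) +
  geom_tile() +
  shared_fill +
  theme(legend.position = "bottom")
p_m
p_f
```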