Processing data from a CHOIR database from start to finish

library(CHOIRBM)
library(ggplot2)

This vignette describes how to process and visualize data extracted from a CHOIR database and provides an example using the included validation dataset. In this example, we will visualize the percent endorsement of each region of the CHOIR Body Map (CBM) for men and women.

First, load the validation data into R and take a quick look at it.

# loading the validation data included in the CHOIRBM package
data(validation)
head(validation)
#>   id gender  race age bodymap_regions_csv score
#> 1  1 Female White  45 128,117,107,105,223    58
#> 2  2 Female Other  46 110,106,105,235,233    60
#> 3  3 Female Other  42 219,218,213,212,210    73
#> 4  4 Female White  73 136,134,132,130,127    12
#> 5  5 Female Other  60 136,134,133,130,129   100
#> 6  6 Female White  51 128,125,118,117,214    33

Notice that each patient's CHOIR Body Map is stored as a single comma-separated string of endorsed region IDs in the bodymap_regions_csv column. Each string must be converted into its own body map data frame before we can go further.
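To see what one of these strings contains, here is a minimal base R sketch that splits a sample value into its region IDs. It only illustrates the string format; it is not a description of how string_to_map() works internally, and the variable names are chosen for illustration.

```r
# Sample value copied from the first row of the validation data shown above
regions_csv <- "128,117,107,105,223"

# Split on literal commas; strsplit() returns a list, so take its first element
region_ids <- strsplit(regions_csv, ",", fixed = TRUE)[[1]]

region_ids
length(region_ids)
```

The result is a character vector with one element per endorsed region.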

Data processing

Separate the data into male and female data frames.

male_data <- validation[validation[["gender"]] == "Male", ]
female_data <- validation[validation[["gender"]] == "Female", ]
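One thing worth checking when subsetting with `==` is how missing values behave. The sketch below uses a small hypothetical data frame (not the validation data) to show that a logical comparison against NA keeps an all-NA row, and that wrapping the comparison in which() drops it. Whether the validation data contains missing gender values is not stated here, so treat this as a precaution rather than a required fix.

```r
# Hypothetical toy data with one missing gender value
toy <- data.frame(
  id = 1:4,
  gender = c("Male", "Female", NA, "Male")
)

# Logical subsetting: the NA comparison yields an extra row filled with NA
male_naive <- toy[toy[["gender"]] == "Male", ]

# which() keeps only the positions where the comparison is TRUE
male_safe <- toy[which(toy[["gender"]] == "Male"), ]

nrow(male_naive)
nrow(male_safe)
```

If the two row counts differ on real data, the percentages computed later with nrow() would use an inflated denominator.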

Create a list of body map data frames for each group by applying the string_to_map() function to every string with R’s lapply(). Then use agg_choirbm_list() to reduce the list to a single data frame by adding up the endorsement values. Since we want the percent endorsement for plotting, we then calculate the percentage as a separate column in the final data frame. Start with the male data.

male_bodymap_list <- lapply(male_data[["bodymap_regions_csv"]], string_to_map)
male_bodymap_df <- agg_choirbm_list(male_bodymap_list)

# we want to visualize the percent endorsement, so divide the values by
# the size of the data set and multiply by 100
male_bodymap_df[["perc"]] <- male_bodymap_df[["value"]] /
  nrow(male_data) * 100
head(male_bodymap_df)
#>    id value group      perc
#> 1 101   147 Front 19.838057
#> 2 102   161 Front 21.727395
#> 3 103    66 Front  8.906883
#> 4 104    87 Front 11.740891
#> 5 105    57 Front  7.692308
#> 6 106    62 Front  8.367072

Repeat the same steps for the female data.

female_bodymap_list <- lapply(female_data[["bodymap_regions_csv"]], string_to_map)

female_bodymap_df <- agg_choirbm_list(female_bodymap_list)

# we want to visualize the percent endorsement, so divide the values by
# the size of the data set and multiply by 100
female_bodymap_df[["perc"]] <- female_bodymap_df[["value"]] /
  nrow(female_data) * 100
head(female_bodymap_df)
#>    id value group      perc
#> 1 101  1046 Front 16.532322
#> 2 102  1211 Front 19.140193
#> 3 103   713 Front 11.269164
#> 4 104   827 Front 13.070966
#> 5 105   430 Front  6.796270
#> 6 106   476 Front  7.523313
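The arithmetic behind the aggregation and percentage steps can be sketched in base R with toy data. This is an illustration only, not the implementation of agg_choirbm_list(). It assumes each per-patient map is a data frame with an id column and a 0/1 value column in the same row order, which is an assumption made for this example.

```r
# Three hypothetical patients, each with a 0/1 endorsement for three regions
toy_maps <- list(
  data.frame(id = c("101", "102", "103"), value = c(1, 0, 1)),
  data.frame(id = c("101", "102", "103"), value = c(0, 0, 1)),
  data.frame(id = c("101", "102", "103"), value = c(1, 0, 1))
)

# Pull out each value column and add them element-wise
total <- Reduce(`+`, lapply(toy_maps, `[[`, "value"))

# Combine the totals with the region IDs and compute the percent endorsement
toy_df <- data.frame(id = toy_maps[[1]][["id"]], value = total)
toy_df[["perc"]] <- toy_df[["value"]] / length(toy_maps) * 100

toy_df
```

Here the denominator is the number of patients in the list, which plays the same role as nrow(male_data) and nrow(female_data) above.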

Plotting

Once the data are formatted and the percentages calculated, we can generate the CBMs. Plot the male and female body maps separately, passing the name of the column to plot through the value argument.

plot_male_choirbm(male_bodymap_df, value = "perc") +
  theme(legend.position = "bottom") +
  labs(fill = "Percent Endorsement")

plot_female_choirbm(female_bodymap_df, value = "perc") +
  theme(legend.position = "bottom") +
  labs(fill = "Percent Endorsement")
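As a numeric companion to the two plots, the per-region percentages can be joined by region id and compared directly. The sketch below uses only the six rows printed by head() earlier in this vignette, re-entered by hand, so it is self-contained. With the full data you would merge male_bodymap_df and female_bodymap_df instead. The names male_head, female_head, and comparison are made up for this example.

```r
# First six regions, copied from the head() output shown above
male_head <- data.frame(
  id = 101:106,
  perc = c(19.838057, 21.727395, 8.906883, 11.740891, 7.692308, 8.367072)
)
female_head <- data.frame(
  id = 101:106,
  perc = c(16.532322, 19.140193, 11.269164, 13.070966, 6.796270, 7.523313)
)

# Join on region id; suffixes distinguish the two perc columns
comparison <- merge(
  male_head, female_head,
  by = "id", suffixes = c("_male", "_female")
)

# Difference in percentage points (male minus female)
comparison[["diff"]] <- comparison[["perc_male"]] - comparison[["perc_female"]]

# Show the regions with the largest absolute difference first
comparison[order(-abs(comparison[["diff"]])), ]
```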