Analyzing and comparing CHOIR Body Maps

library(CHOIRBM)

Analyzing CBMs

Generally speaking, there are two ways we might analyze CHOIR Body Maps. We may be interested in whether a categorical variable (such as gender) is associated with a statistically significant difference in endorsement of different locations on the body map. Alternatively, we may want to see the effect of a continuous variable (such as age) on endorsement. CHOIRBM implements helper functions to assist with both kinds of analysis, and this vignette demonstrates each of them with an example using the validation dataset.

First, we load the validation data into R and take a quick look at it.

# loading the validation data included in the CHOIRBM package
data(validation)
head(validation)
#>   id gender  race age bodymap_regions_csv score
#> 1  1 Female White  45 128,117,107,105,223    58
#> 2  2 Female Other  46 110,106,105,235,233    60
#> 3  3 Female Other  42 219,218,213,212,210    73
#> 4  4 Female White  73 136,134,132,130,127    12
#> 5  5 Female Other  60 136,134,133,130,129   100
#> 6  6 Female White  51 128,125,118,117,214    33

Example 1: Chi-Squared method

Now, we can examine the effect of a categorical variable on endorsement of different locations in the CBM. Here, we compare the male and female CBMs to see which segments have more or less endorsement in relation to gender. The comp_choirbm_chi() function uses a series of Chi-Squared tests, one per segment, to compare the observed percent endorsement with the expected endorsement (in this case, we expect 50% male and 50% female endorsement).

Data processing

Separate the data set into male and female groups before processing to get a single body map data frame for each group. See the vignette on preparing data from CHOIR databases for a full explanation of each step in this process.

# isolate and process male data
male_data <- validation[validation[["gender"]] == "Male", ]
male_bodymap_list <- lapply(male_data[["bodymap_regions_csv"]], string_to_map)
male_bodymap_df <- agg_choirbm_list(male_bodymap_list)

# isolate and process female data
female_data <- validation[validation[["gender"]] == "Female", ]
female_bodymap_list <- lapply(
  female_data[["bodymap_regions_csv"]]
  , string_to_map
)
female_bodymap_df <- agg_choirbm_list(female_bodymap_list)

# quick snapshot of each
head(male_bodymap_df)
#>    id value group
#> 1 101   147 Front
#> 2 102   161 Front
#> 3 103    66 Front
#> 4 104    87 Front
#> 5 105    57 Front
#> 6 106    62 Front
head(female_bodymap_df)
#>    id value group
#> 1 101  1046 Front
#> 2 102  1211 Front
#> 3 103   713 Front
#> 4 104   827 Front
#> 5 105   430 Front
#> 6 106   476 Front
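
To make the aggregation step more concrete, here is a rough base R sketch of what counting endorsements per segment amounts to. It is not how string_to_map() or agg_choirbm_list() are implemented; it only reuses the first three bodymap_regions_csv strings shown in head(validation) above, and the object names (csv_strings, id_list, counts) are made up for illustration.

```r
# Three bodymap_regions_csv strings copied from head(validation) above
csv_strings <- c(
  "128,117,107,105,223"
  , "110,106,105,235,233"
  , "219,218,213,212,210"
)

# Split each comma-separated string into a vector of integer segment ids
id_list <- lapply(strsplit(csv_strings, ",", fixed = TRUE), as.integer)

# Count how many of the three maps endorse each segment
counts <- table(unlist(id_list))

# Segment 105 appears in the first and second maps
counts[["105"]]
#> [1] 2
```

The package functions additionally return a row for every segment, along with the Front/Back grouping, which this sketch does not attempt to reproduce.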

Compare the percent endorsements

Now use the comp_choirbm_chi() function to compare the percent endorsement of each segment of the CBM for men and women. comp_choirbm_chi() takes a named list as an argument.

# name each data frame in the list as male or female
chi_res <- comp_choirbm_chi(
  list("male" = male_bodymap_df, "female" = female_bodymap_df)
  , method = "bonferroni"
)

# inspect the first few chi-square results
head(chi_res)
#>     p.value statistic parameter
#> 101       1 0.5678564        NA
#> 102       1 0.5856935        NA
#> 103       1 0.6898166        NA
#> 104       1 0.6554975        NA
#> 105       1 0.5866239        NA
#> 106       1 0.5921560        NA
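
Every adjusted p-value shown above is 1, which is a common side effect of a Bonferroni correction across many tests. The sketch below uses base R's p.adjust() to show the effect on a few made-up raw p-values. It assumes that the method argument of comp_choirbm_chi() names a p.adjust()-style correction and that one test is run for each of 74 segments; neither assumption is confirmed here, and the raw p-values are hypothetical.

```r
# Hypothetical raw p-values for four segments (not taken from the package)
raw_p <- c(0.0004, 0.01, 0.04, 0.5)

# Bonferroni multiplies each p-value by the number of tests and caps it at 1.
# n = 74 is an assumption: one test per body map segment.
adj_p <- p.adjust(raw_p, method = "bonferroni", n = 74)

adj_p
#> [1] 0.0296 0.7400 1.0000 1.0000
```

Under this correction, only raw p-values below roughly 0.05 / 74 (about 0.0007) would remain below 0.05 after adjustment.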

Example 2: z-test method

The comp_choirbm_ztest() function uses a two-proportions z-test to compare observed proportions from two different groups.

comp_choirbm_ztest(list("male" = male_data, "female" = female_data), tail = "two")
#>     id      z.score p.value
#> 1  101  2.273003765       1
#> 2  102  1.684708172       1
#> 3  103 -1.942801273       1
#> 4  104 -1.020893839       1
#> 5  105  0.911112839       1
#> 6  106  0.819458278       1
#> 7  107 -0.934256548       1
#> 8  108 -0.444974186       1
#> 9  109 -0.008068232       1
#> 10 110 -1.409100931       1
#> 11 111 -0.553608940       1
#> 12 112 -1.252048276       1
#> 13 113 -1.649404886       1
#> 14 114 -0.391848941       1
#> 15 115 -0.245246394       1
#> 16 116 -0.766555845       1
#> 17 117 -0.719781304       1
#> 18 118  0.285986876       1
#> 19 119 -0.278889154       1
#> 20 120 -0.108097507       1
#> 21 121 -1.114963447       1
#> 22 122  0.095689545       1
#> 23 123 -0.698629061       1
#> 24 124 -0.896183650       1
#> 25 125 -1.474061775       1
#> 26 126 -0.106066410       1
#> 27 127  1.678702250       1
#> 28 128 -0.573210612       1
#> 29 129 -0.174862366       1
#> 30 130  0.677552495       1
#> 31 131  0.668049869       1
#> 32 132  1.167133923       1
#> 33 133  1.689532191       1
#> 34 134  2.097045957       1
#> 35 135  0.779821513       1
#> 36 136  1.286920265       1
#> 37 201  1.796766623       1
#> 38 202  1.818273541       1
#> 39 203 -2.701441922       1
#> 40 204 -0.537523467       1
#> 41 205 -0.339703821       1
#> 42 206  0.348915179       1
#> 43 207  0.134313258       1
#> 44 208 -0.844070561       1
#> 45 209  0.511815167       1
#> 46 210 -0.303096121       1
#> 47 211  0.085185018       1
#> 48 212  0.066052762       1
#> 49 213  0.622265254       1
#> 50 214  0.350778672       1
#> 51 215  0.265084324       1
#> 52 216  0.348446244       1
#> 53 217  0.158319680       1
#> 54 218  1.769743298       1
#> 55 219  0.821379198       1
#> 56 220 -0.056519849       1
#> 57 221 -0.455625881       1
#> 58 222  0.264623646       1
#> 59 223  0.347898967       1
#> 60 224 -0.549020325       1
#> 61 225 -0.549447892       1
#> 62 226 -0.689612063       1
#> 63 227 -1.224670416       1
#> 64 228 -0.969939784       1
#> 65 229  0.102718167       1
#> 66 230 -1.610910338       1
#> 67 231 -0.238010337       1
#> 68 232  0.908229304       1
#> 69 233  0.779332135       1
#> 70 234  1.285332634       1
#> 71 235 -0.122160362       1
#> 72 236  0.656157465       1
#> 73 237  0.132668883       1
#> 74 238  0.436952027       1
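
For readers unfamiliar with the test, the following is a minimal sketch of a pooled two-proportions z-test for a single segment, written in base R. It is not the implementation used by comp_choirbm_ztest(); the helper name two_prop_z() and the counts passed to it are hypothetical, and no multiple-comparison adjustment is applied.

```r
# Pooled two-proportions z-test for a single segment.
# x1, x2: number of people endorsing the segment in each group
# n1, n2: total number of people in each group
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  # pooled proportion under the null hypothesis of equal endorsement
  p_pool <- (x1 + x2) / (n1 + n2)
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  # two-tailed p-value from the standard normal distribution
  p_value <- 2 * pnorm(-abs(z))
  c(z.score = z, p.value = p_value)
}

# Hypothetical counts: 30 of 200 in group 1 vs. 120 of 1000 in group 2
res <- two_prop_z(30, 200, 120, 1000)

# Cross-check: without continuity correction, prop.test() reports z^2
check <- prop.test(c(30, 120), c(200, 1000), correct = FALSE)
all.equal(res[["z.score"]]^2, unname(check$statistic))
```

A positive z-score means the first group endorses the segment at a higher rate than the second; a negative z-score means the opposite.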

Example 3: Logistic regression method

In this example, we use the comp_choirbm_glm() function to test, via logistic regression, whether the continuous age variable in the validation data is related to whether or not an individual endorses a particular location on the body map. comp_choirbm_glm() accepts a data frame with a “bodymap_regions_csv” column and another column with the variable of interest. It returns a data frame in which each row is the result of one logistic regression using the continuous variable to predict endorsement of one CBM location.

# for the sake of speed, randomly sample 100 maps...
set.seed(123)
sampled_data <- validation[
  sample(
    seq_len(nrow(validation))
    , 100
    , replace = FALSE
  )
  , ]
colnames(sampled_data)[5] <- "bodymap"
model_output <- comp_choirbm_glm(sampled_data, "age", family = "binomial")
head(model_output)
#>    id term     estimate  std.error   statistic p.value
#> 1 101  age -0.001175566 0.02053236 -0.05725428       1
#> 2 102  age -0.007904804 0.01921454 -0.41139696       1
#> 3 103  age  0.012616500 0.01980730  0.63696210       1
#> 4 104  age -0.031696760 0.01968127 -1.61050370       1
#> 5 105  age -0.001259361 0.02947968 -0.04271964       1
#> 6 106  age -0.008163480 0.03298135 -0.24751807       1
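
Each row above summarizes one logistic regression. As a rough illustration of what a single such model looks like, the sketch below fits one with base R's glm() on simulated data. The data frame toy, its endorsed column, and the simulated age effect are all invented for this example; this is not the code that comp_choirbm_glm() runs internally.

```r
set.seed(1)

# Simulated data: age in years and a 0/1 indicator of whether a person
# endorsed one particular (hypothetical) segment
toy <- data.frame(age = round(runif(200, min = 20, max = 80)))
toy$endorsed <- rbinom(
  nrow(toy)
  , size = 1
  , prob = plogis(-3 + 0.04 * toy$age)
)

# Logistic regression of endorsement on age
fit <- glm(endorsed ~ age, data = toy, family = binomial())

# Estimate, standard error, z statistic, and unadjusted p-value for age
age_row <- summary(fit)$coefficients["age", ]
age_row
```

The estimate is on the log-odds scale, so exp(age_row[["Estimate"]]) gives the multiplicative change in the odds of endorsement per additional year of age.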
