Generally speaking, there are two ways we might analyze CHOIR Body
Maps (CBMs). We may be interested in whether a categorical variable
(such as gender) is associated with a statistically significant
difference in endorsement of different locations on the body map.
Alternatively, we may want to see the effect of a continuous variable
(such as age) on endorsement. CHOIRBM implements helper functions to
assist with both of these analyses, and this vignette demonstrates each
with an example using the validation dataset.
First, we load the validation data into R and take a quick look at it.
# loading the validation data included in the CHOIRBM package
data(validation)
head(validation)
#> id gender race age bodymap_regions_csv score
#> 1 1 Female White 45 128,117,107,105,223 58
#> 2 2 Female Other 46 110,106,105,235,233 60
#> 3 3 Female Other 42 219,218,213,212,210 73
#> 4 4 Female White 73 136,134,132,130,127 12
#> 5 5 Female Other 60 136,134,133,130,129 100
#> 6 6 Female White 51 128,125,118,117,214 33
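To make the bodymap_regions_csv format concrete, here is a minimal base-R sketch that splits one of the strings shown above into individual segment IDs. It only illustrates the raw format; it is not the package's string_to_map() function, whose output is covered in the data-preparation vignette.

```r
# Base-R illustration only: split one comma-separated string of segment IDs.
# The string is copied from the first row of the output above.
regions_csv <- "128,117,107,105,223"

# strsplit() returns a list with one element per input string
segment_ids <- strsplit(regions_csv, ",", fixed = TRUE)[[1]]
segment_ids
#> [1] "128" "117" "107" "105" "223"

# Check whether this patient endorsed a given segment
"105" %in% segment_ids
#> [1] TRUE
```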
Now we can examine the effect of a categorical variable on endorsement
of different locations in the CBM, such as comparing the male and female
CBMs to see which segments have more or less endorsement in relation to
gender. The comp_choirbm_chi() function uses a series of Chi-squared
tests to compare the observed percent endorsement with the expected
endorsement (in this case, we expect 50% male and 50% female
endorsement).
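For intuition, a single observed-versus-expected comparison of this kind can be sketched with base R's chisq.test(). The numbers below are made up for illustration, and this is an assumption about the general shape of the test, not the internal implementation of comp_choirbm_chi().

```r
# Hypothetical percent endorsement of one segment in each group
# (made-up values, not taken from the validation data)
observed <- c(male = 12, female = 18)

# Goodness-of-fit test against an even 50/50 split between the groups
res <- chisq.test(observed, p = c(0.5, 0.5))

res$statistic  # chi-squared statistic for this one segment
res$p.value    # unadjusted p-value for this one segment
```

Repeating such a test once per segment yields many p-values, which is why a multiple-comparison adjustment is usually applied afterwards.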
Separate the data set into male and female groups before processing to get a single body map data frame for each group. See the vignette on preparing data from CHOIR databases for a full explanation of each step in this process.
# isolate and process male data
male_data <- validation[validation[["gender"]] == "Male", ]
male_bodymap_list <- lapply(male_data[["bodymap_regions_csv"]], string_to_map)
male_bodymap_df <- agg_choirbm_list(male_bodymap_list)
# isolate and process female data
female_data <- validation[validation[["gender"]] == "Female", ]
female_bodymap_list <- lapply(
  female_data[["bodymap_regions_csv"]]
  , string_to_map
)
female_bodymap_df <- agg_choirbm_list(female_bodymap_list)
# quick snapshot of each
head(male_bodymap_df)
#> id value group
#> 1 101 147 Front
#> 2 102 161 Front
#> 3 103 66 Front
#> 4 104 87 Front
#> 5 105 57 Front
#> 6 106 62 Front
head(female_bodymap_df)
#> id value group
#> 1 101 1046 Front
#> 2 102 1211 Front
#> 3 103 713 Front
#> 4 104 827 Front
#> 5 105 430 Front
#> 6 106 476 Front
Now use the comp_choirbm_chi() function to compare the percent
endorsement of each segment of the CBM for men and women.
comp_choirbm_chi() takes a named list as an argument.
# name each data frame in the list as male or female
chi_res <- comp_choirbm_chi(
  list("male" = male_bodymap_df, "female" = female_bodymap_df)
  , method = "bonferroni"
)
# inspect the first few chi-square results
head(chi_res)
#> p.value statistic parameter
#> 101 1 0.5678564 NA
#> 102 1 0.5856935 NA
#> 103 1 0.6898166 NA
#> 104 1 0.6554975 NA
#> 105 1 0.5866239 NA
#> 106 1 0.5921560 NA
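A p.value column that reads 1 throughout is what a Bonferroni adjustment tends to produce when many tests are run at once. The sketch below uses base R's p.adjust() with made-up raw p-values and the 74 segments of the body map (IDs 101 to 136 and 201 to 238). That comp_choirbm_chi() adjusts its p-values in this way is an assumption based on its method argument, not something confirmed here.

```r
# Made-up raw p-values for three segments (illustration only)
raw_p <- c(0.0005, 0.02, 0.50)

# Bonferroni multiplies each p-value by the number of tests and caps it at 1
adj_p <- p.adjust(raw_p, method = "bonferroni", n = 74)
adj_p

# 0.0005 * 74 = 0.037 stays below 0.05, while 0.02 * 74 and 0.50 * 74
# both exceed 1 and are therefore reported as 1
```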
The comp_choirbm_ztest() function uses a two-proportions z-test to
compare observed proportions from two different groups.
comp_choirbm_ztest(list("male" = male_data, "female" = female_data), tail = "two")
#> id z.score p.value
#> 1 101 2.273003765 1
#> 2 102 1.684708172 1
#> 3 103 -1.942801273 1
#> 4 104 -1.020893839 1
#> 5 105 0.911112839 1
#> 6 106 0.819458278 1
#> 7 107 -0.934256548 1
#> 8 108 -0.444974186 1
#> 9 109 -0.008068232 1
#> 10 110 -1.409100931 1
#> 11 111 -0.553608940 1
#> 12 112 -1.252048276 1
#> 13 113 -1.649404886 1
#> 14 114 -0.391848941 1
#> 15 115 -0.245246394 1
#> 16 116 -0.766555845 1
#> 17 117 -0.719781304 1
#> 18 118 0.285986876 1
#> 19 119 -0.278889154 1
#> 20 120 -0.108097507 1
#> 21 121 -1.114963447 1
#> 22 122 0.095689545 1
#> 23 123 -0.698629061 1
#> 24 124 -0.896183650 1
#> 25 125 -1.474061775 1
#> 26 126 -0.106066410 1
#> 27 127 1.678702250 1
#> 28 128 -0.573210612 1
#> 29 129 -0.174862366 1
#> 30 130 0.677552495 1
#> 31 131 0.668049869 1
#> 32 132 1.167133923 1
#> 33 133 1.689532191 1
#> 34 134 2.097045957 1
#> 35 135 0.779821513 1
#> 36 136 1.286920265 1
#> 37 201 1.796766623 1
#> 38 202 1.818273541 1
#> 39 203 -2.701441922 1
#> 40 204 -0.537523467 1
#> 41 205 -0.339703821 1
#> 42 206 0.348915179 1
#> 43 207 0.134313258 1
#> 44 208 -0.844070561 1
#> 45 209 0.511815167 1
#> 46 210 -0.303096121 1
#> 47 211 0.085185018 1
#> 48 212 0.066052762 1
#> 49 213 0.622265254 1
#> 50 214 0.350778672 1
#> 51 215 0.265084324 1
#> 52 216 0.348446244 1
#> 53 217 0.158319680 1
#> 54 218 1.769743298 1
#> 55 219 0.821379198 1
#> 56 220 -0.056519849 1
#> 57 221 -0.455625881 1
#> 58 222 0.264623646 1
#> 59 223 0.347898967 1
#> 60 224 -0.549020325 1
#> 61 225 -0.549447892 1
#> 62 226 -0.689612063 1
#> 63 227 -1.224670416 1
#> 64 228 -0.969939784 1
#> 65 229 0.102718167 1
#> 66 230 -1.610910338 1
#> 67 231 -0.238010337 1
#> 68 232 0.908229304 1
#> 69 233 0.779332135 1
#> 70 234 1.285332634 1
#> 71 235 -0.122160362 1
#> 72 236 0.656157465 1
#> 73 237 0.132668883 1
#> 74 238 0.436952027 1
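As a reference point for reading the z.score column, a pooled two-proportion z-test for a single segment can be computed by hand in base R. The counts below are hypothetical, and the sign convention and any p-value adjustment used by comp_choirbm_ztest() are not assumed to match this sketch.

```r
# Hypothetical counts for one segment: x = patients endorsing it, n = group size
x_male <- 30
n_male <- 100
x_female <- 45
n_female <- 120

# Observed proportions and the pooled proportion under the null hypothesis
p_male <- x_male / n_male
p_female <- x_female / n_female
p_pool <- (x_male + x_female) / (n_male + n_female)

# Standard error under the null, z statistic, and two-sided p-value
se <- sqrt(p_pool * (1 - p_pool) * (1 / n_male + 1 / n_female))
z <- (p_male - p_female) / se
p_two_sided <- 2 * pnorm(-abs(z))

# Cross-check: prop.test() without continuity correction reports z squared
chi_sq <- unname(prop.test(c(x_male, x_female), c(n_male, n_female),
                           correct = FALSE)$statistic)
all.equal(z^2, chi_sq)
```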
In this example, we use the comp_choirbm_glm() function with logistic
regression to see whether the continuous age variable in the validation
data is related to whether or not an individual endorses a particular
location on the body map. comp_choirbm_glm() accepts a data frame with
a "bodymap_regions_csv" column and another column with the variable of
interest. It returns a data.frame where each row is the result of one
logistic regression using the continuous variable to predict CBM
location endorsement.
# for the sake of speed, randomly sample 100 maps...
set.seed(123)
sampled_data <- validation[
  sample(
    seq_len(nrow(validation))
    , 100
    , replace = FALSE
  )
  , ]
colnames(sampled_data)[5] <- "bodymap"
model_output <- comp_choirbm_glm(sampled_data, "age", family = "binomial")
head(model_output)
#> id term estimate std.error statistic p.value
#> 1 101 age -0.001175566 0.02053236 -0.05725428 1
#> 2 102 age -0.007904804 0.01921454 -0.41139696 1
#> 3 103 age 0.012616500 0.01980730 0.63696210 1
#> 4 104 age -0.031696760 0.01968127 -1.61050370 1
#> 5 105 age -0.001259361 0.02947968 -0.04271964 1
#> 6 106 age -0.008163480 0.03298135 -0.24751807 1
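Each row of this output summarizes one logistic regression. As a rough illustration of what a single such model looks like, the sketch below fits one with base R's glm() on simulated data. The variable names and simulated values are hypothetical, and this is not the internal code of comp_choirbm_glm().

```r
set.seed(1)

# Simulated data (hypothetical): age and a 0/1 flag for endorsing one segment
n <- 200
age <- round(runif(n, min = 20, max = 80))
endorsed <- rbinom(n, size = 1, prob = plogis(-2 + 0.03 * age))

# Logistic regression of endorsement on age for this single segment
fit <- glm(endorsed ~ age, family = binomial())

# Estimate, standard error, z statistic, and p-value for the age term
age_row <- coef(summary(fit))["age", ]
age_row

# The estimate is on the log-odds scale; exponentiate for an odds ratio per year
odds_ratio <- exp(coef(fit)[["age"]])
```

The four values in age_row correspond in meaning to the estimate, std.error, statistic, and p.value columns shown above, before any multiple-comparison adjustment.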