Tutorial on Cross-Cultural Analysis

Josh Jackson, June 2nd, 2020

Resources

We are analyzing a recent study of how people use religion to explain phenomena. Before we get into any analyses, it’s useful to see the kinds of resources that go into these analyses.

I have also enclosed a pdf containing variables already coded in the Standard Cross-Cultural Sample. We will be using some of these today.

Getting Started

Let’s begin by reading in and taking a quick look at our data. The first line of this code sets our working directory. If you press the “tab” button, you will be able to navigate to the place where you have saved your data. I have put mine on the Desktop.

The second line here reads in the data and stores it under a name, which can be any sequence (i.e. “string”) of letters. I am storing my data (“Tutorial data.csv”) as “d”. If you are following along, you will see the dataset “d” appear in your environment.

setwd("~/Desktop")
d<-read.csv("Tutorial data.csv")
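
After reading in a file, it can help to confirm that it has the size and column types you expect. The sketch below is only an illustration: it writes a tiny made-up CSV to a temporary file (the values are invented, not taken from the study) and reads it back the same way.

```r
# Toy data for illustration only; these values are invented, not the study's
toy <- data.frame(Culture = c("A", "B", "C"),
                  TimeFocus = c(1900, 1920, 1850),
                  Exclude = c(0, 1, NA))

# Write to a temporary file so the example does not touch your Desktop
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)

# Read it back exactly as in the tutorial, then inspect it
toy_d <- read.csv(toy_path)
dim(toy_d)   # number of rows and columns
str(toy_d)   # column names and types
```

Running “dim” and “str” on the real “d” should show the 115 cases mentioned later in this tutorial.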

Now that we have the data in the environment, we can examine some of the data attributes. For example, let’s take a look at some of the cultures in our sample, the times that these cultures were coded, and the way that these cultures procured food (subsistence style).

The “head” function will show us the first few values of these variables. To call on the variables, we start with the name of the dataframe (“d”) and then put a dollar sign “$” and then the variable within the dataframe (“Culture”).

We can also use the “summary” function to look at summary statistics of numerical variables. Let’s use it with “TimeFocus” since this is the numerical variable out of the three that we’re looking at.

head(d$Culture)
head(d$TimeFocus)
head(d$Subsistence)

summary(d$TimeFocus)

Now we know that our data range from 1530 to 1966, with a median time of 1910. So we are looking at how people used different religious explanations roughly 100 years ago in a range of hunter-gatherer, horticultural, pastoral, and agricultural communities.
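
For a categorical variable such as subsistence style, “table” is usually more informative than “summary”. Here is a minimal sketch using invented values (not the study data) that shows both functions side by side.

```r
# Toy values for illustration only (invented, not the study's)
time_focus <- c(1850, 1900, 1910, 1920, 1950)
subsistence <- c("Hunter-gatherer", "Agricultural", "Pastoral",
                 "Agricultural", "Horticultural")

# Minimum, quartiles, median, mean, and maximum of a numerical variable
summary(time_focus)

# Number of societies in each category of a categorical variable
table(subsistence)
```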

Cleaning and Organizing our Data

Not all of these societies are appropriate to analyze. Some of them had unreliable ethnographies or insufficient data. So we are going to narrow down our file to only include the valid cases.

We will use the “Exclude” variable to drop cases that weren’t deemed good enough to analyze. In our environment, we can see that there are 115 cases. Using the “table” function we can see that, of these, 109 are suitable for analysis. To exclude the 6 extraneous cases, we are going to use square brackets to subset our dataframe and drop rows where the “Exclude” variable is NA or 1.

Finally, our last line of code transforms “99” values into missing values. “99” was the missing code in our analysis, but we need to tell R that this is a stand-in for missing values.

table(d$Exclude)

d<-d[!is.na(d$Exclude),] #drop cases where Exclude is missing
d<-d[d$Exclude==0,] #keep only the cases marked as valid

d[d==99]<-NA
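
The missing “Exclude” values are removed first because subsetting on a condition that evaluates to NA returns a row of NAs rather than dropping the case. The sketch below illustrates this on a small invented dataframe (“toy” and “kept” are names made up for this example).

```r
# Toy data for illustration only (invented values)
toy <- data.frame(Exclude = c(0, 1, NA, 0),
                  V_1s = c(2, 0, 1, 99))

# Subsetting on a condition that is NA yields an all-NA row,
# so this keeps two real rows plus one empty row
nrow(toy[toy$Exclude == 0, ])

# The same filter as above, written as a single condition
kept <- toy[!is.na(toy$Exclude) & toy$Exclude == 0, ]
nrow(kept)

# Recode the missing-value code 99 as NA
kept[kept == 99] <- NA
kept$V_1s
```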

Next we are going to transform our coding scale to a scale that is easier to analyze. Our original coding followed this scheme:

Code 0 = Absent
Code 1 = Rare
Code 2 = Common

However, we are going to make things simpler by looking at whether explanations are either common (“1”) or not common (“0”). We do this with the “ifelse” command, one of the most useful commands in R. The ifelse command checks whether values meet some condition, and then transforms them based on whether or not they meet this condition. Below, the “==” sign indicates the condition that needs to be met, the value after the first comma indicates what the new variable should become if the condition is met, and the value after the second comma indicates what the new variable should become if the condition is not met.

By using hashmarks below, we can comment on the code without interfering with it. My comments below indicate what kind of explanation each of these variables represents.

d$V_1s_common<-ifelse(d$V_1s==2,1,0) #religious explanations of pathogens
d$V_2s_common<-ifelse(d$V_2s==2,1,0) #religious explanations of natural causes of scarcity
d$V_3s_common<-ifelse(d$V_3s==2,1,0) #religious explanations of natural hazards
d$V_4s_common<-ifelse(d$V_4s==2,1,0) #religious explanations of warfare
d$V_5s_common<-ifelse(d$V_5s==2,1,0) #religious explanations of murder
d$V_6s_common<-ifelse(d$V_6s==2,1,0) #religious explanations of theft
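
One detail worth knowing is that “ifelse” leaves missing values missing, so the “99” codes we turned into NA do not get counted as “not common”. A minimal sketch with invented codes:

```r
# Toy codes for illustration: 0 = absent, 1 = rare, 2 = common, NA = missing
codes <- c(0, 1, 2, NA, 2)

# 1 where the code equals 2, 0 otherwise; missing values stay missing
common <- ifelse(codes == 2, 1, 0)
common

# Tabulate the result, keeping any NA visible
table(common, useNA = "ifany")
```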

After recoding these variables, we might be curious about comparing explanations based on their broader category. For example, disease, natural hazards, and scarcity can be loosely classified as phenomena in the “natural” world, whereas warfare, murder, and theft all represent “manmade” events.

The first block of code below will create a dataframe of each class of event using the “data.frame” function. Within this function, the part in quotes denotes the name of the new variables in this dataframe, and the part outside of quotes denotes the source of these new variables in the dataframe. If you are following along, you will see two new dataframe objects appear in your environment when you run this code.

The second block of code uses these new dataframes. The first line creates a summed variable in our original “d” dataframe that represents the total number of “natural (or manmade)” explanations in a society. The second line investigates the covariation between natural explanations. For example, does having one kind of natural religious explanation (e.g. natural hazards) make it more likely that a society will have another kind of natural religious explanation (e.g. disease)?

Natural_events<-data.frame("disease"=d$V_1s_common,
                           "scarcity"=d$V_2s_common,
                           "hazards"=d$V_3s_common)

d$Natural<-rowSums(Natural_events,na.rm=T)
psych::alpha(Natural_events)
                           
Manmade_events<-data.frame("war"=d$V_4s_common,
                           "murder"=d$V_5s_common,
                           "theft"=d$V_6s_common)

d$Manmade<-rowSums(Manmade_events,na.rm=T)   
psych::alpha(Manmade_events)
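
Two things happen in the blocks above that are easy to miss. With “na.rm=T”, a society with no coded explanations at all gets a sum of 0 rather than NA, and the “alpha” function summarizes how strongly the three items hang together. The sketch below uses invented data and computes Cronbach’s alpha from its textbook formula in base R. It is an illustration of the idea, not a replacement for the richer output of “psych::alpha”.

```r
# Toy binary items for illustration only (invented values)
items <- data.frame(disease  = c(1, 1, 0, NA, 1, 0),
                    scarcity = c(1, 0, 0, NA, 1, 1),
                    hazards  = c(1, 1, 0, NA, 0, 0))

# With na.rm = TRUE a row that is entirely missing sums to 0, not NA
total <- rowSums(items, na.rm = TRUE)
total

# Cronbach's alpha from its textbook formula, using complete rows only:
# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
complete <- items[complete.cases(items), ]
k <- ncol(complete)
item_vars <- sapply(complete, var)
alpha_by_hand <- (k / (k - 1)) * (1 - sum(item_vars) / var(rowSums(complete)))
alpha_by_hand
```

If the zero for all-missing rows is not what you want, one option is to set the sum to NA for rows where every item is missing.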

At this point, we have finished cleaning and organizing our data and we are ready for analyses!

Analyzing our Data

The primary aim of this project was to compare the frequency of religious explanations of natural phenomena with religious explanations of manmade (or social) phenomena. A simple way to start this is to just compare the means:

mean(d$Natural,na.rm=T)
mean(d$Manmade,na.rm=T)

Right away, we can see that the mean of natural explanations (2.45) is higher than for social explanations (1.42). Societies explain approximately 2.45 natural phenomena using the supernatural world, but only 1.42 social phenomena, on average. However, this comparison doesn’t establish whether the mean for natural explanations is significantly greater than for social explanations. We need some sort of inferential test to do this.

The line below uses a t-test approach to this question. In this t-test function, we need to specify whether it is a paired or independent groups t-test. Since every society has data on both natural and manmade explanations, a paired t-test is appropriate, and we indicate “paired=T” which stands for “paired=TRUE”.

The function will give us lots of useful output. At the bottom, we see the mean difference between natural and manmade explanations, which should match the mean difference we observed above. We should also see the p value, which represents the likelihood that we would have observed a mean difference at least this large in our sample if there was no difference between the frequency of natural and social explanations in the population at large. We also see the confidence interval, the 95% band around our mean difference. If both bounds of this interval are on the same side of zero, the difference is statistically significant.

t.test(d$Natural, d$Manmade,paired=T)
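
A paired t-test is the same thing as a one-sample t-test on the differences between the two variables, which is why the mean difference appears in the output. The sketch below checks this on invented scores (the vectors “natural” and “manmade” here are made up for the example).

```r
# Toy scores for illustration only (invented values)
natural <- c(3, 2, 3, 1, 2, 3, 2, 3)
manmade <- c(1, 2, 2, 0, 1, 3, 1, 2)

# Paired t-test, as in the tutorial
paired_fit <- t.test(natural, manmade, paired = TRUE)

# One-sample t-test on the differences gives the same t statistic
diff_fit <- t.test(natural - manmade, mu = 0)

paired_fit$estimate   # mean of the differences
paired_fit$conf.int   # 95% confidence interval around that mean
paired_fit$p.value
```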

We can also test whether these types of explanations are related to one another using regression. This is shown in the line below. This regression shows that for every additional religious explanation of a natural phenomenon, societies are estimated to have .38 more explanations of social phenomena. The p value shows that this would be very unlikely if there was no relationship between the variables.

summary(lm(Manmade~Natural,data=d))
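
To make the meaning of the slope concrete, here is a minimal sketch on invented data. With a single predictor, the slope from “lm” equals the covariance of the two variables divided by the variance of the predictor, and “predict” turns the fitted line into an expected value.

```r
# Toy data for illustration only (invented values)
toy <- data.frame(Natural = c(0, 1, 1, 2, 2, 3, 3, 3),
                  Manmade = c(0, 0, 1, 1, 2, 1, 2, 3))

fit <- lm(Manmade ~ Natural, data = toy)
coef(fit)   # intercept and slope

# With one predictor, the slope is cov(x, y) / var(x)
slope_by_hand <- cov(toy$Natural, toy$Manmade) / var(toy$Natural)
slope_by_hand

# Expected number of manmade explanations for a society with 2 natural ones
predict(fit, newdata = data.frame(Natural = 2))
```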

However, this regression does not account for the fact that societies are clustered in language families. When societies are clustered in language families, they may be especially similar because they share a common ancestor that may have passed down some cultural characteristic (e.g. religious explanations of disease) to a daughter culture. Here we can use multilevel modeling to nest our societies in their language families and probe whether the prevalence of natural and social explanations is clustered within language families.

library(lme4) #provides lmer(); loading lmerTest instead adds p values to the summary
summary(lmer(Natural~(1|Family),data=d))
summary(lmer(Manmade~(1|Family),data=d))

summary(lmer(Manmade~Natural+(1|Family),data=d))

There is no clustering of natural explanations within language family. For manmade explanations, there is some clustering, and the ICC formula of $0.08368/(0.08368+0.86850) \approx .088$ tells us that about 8.8% of variance is clustered by language family. This is not a huge amount, and we replicate the regression between manmade and natural explanations to show that their link is significant even beyond this clustering.
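
The ICC arithmetic above can be wrapped in a small helper so that it is easy to repeat for other models. The function name “icc” and its arguments are made up for this sketch. The two numbers passed to it are the language family variance and the residual variance quoted in the text.

```r
# Intraclass correlation: share of total variance attributable to the groups
# (hypothetical helper written for this tutorial, not part of any package)
icc <- function(var_group, var_residual) {
  var_group / (var_group + var_residual)
}

# Variance components quoted in the text for manmade explanations
icc(0.08368, 0.86850)

# For a fitted lme4 model, as.data.frame(VarCorr(model)) lists these
# variance components in its "vcov" column
```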

Plotting

Finally, we might be interested in plotting out the distribution of the different kinds of explanations. We can do this in many different ways, but here I’ll show you how to plot out the points on a world map.

At the beginning of this block of code, there are a series of “library” statements meaning that we are retrieving packages of functions from our library. If this is the first time you have used these packages, you would need to install them first using the “install.packages” command. If you have already installed them, you can just read them in using the “library” function.
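
If you are unsure which of these packages you already have, a check along the following lines can help. This is a sketch rather than a required step: it only reports what is missing, and the install line is left commented out so nothing is installed unless you choose to.

```r
# Packages loaded by the mapping code in this section
pkgs <- c("rworldmap", "rworldxtra", "ggmap", "RColorBrewer")

# requireNamespace() returns FALSE (instead of stopping) when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)   # uncomment to install
}
```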

After retrieving these packages, I am going to set the color palette for my map using colors that I think are intuitive. Here, darker reds represent more explanations and lighter reds represent fewer explanations, but you could pick whatever colors you find intuitive using the “colors in r” webpage.

The map creation itself involves three lines of code: 1. Retrieving the world map and setting the resolution. 2. Plotting the map and setting the x and y limits and the aspect ratio. 3. Placing the points from the data onto the map using the longitude, the latitude, and the variable that sets each point’s color. The “pch” indicates the type of point.

We repeat this for both natural and manmade explanations. This allows us to clearly see that natural explanations are more frequent than manmade explanations.

library(rworldmap)
library(rworldxtra)
library(ggmap)
library(RColorBrewer)

palette(c("lightgoldenrod2","orange2","firebrick2","firebrick4")) ##Setting the color palette

map <- getMap(resolution = "low")
plot(map, xlim = c(-130, 130), ylim = c(-130, 130), asp = 1.30)
points(d$Long, d$Lat, col=d$Natural+1,pch=16)

map <- getMap(resolution = "low")
plot(map, xlim = c(-130, 130), ylim = c(-130, 130), asp = 1.30)
points(d$Long, d$Lat, col=d$Manmade+1,pch=16)
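
The “+1” in the “col” argument is what links the counts to the palette: a society with 0 explanations gets the first color, and a society with 3 gets the fourth. The sketch below shows the same lookup with an ordinary character vector and invented counts, without drawing anything (“map_cols” and “scores” are names made up for this example).

```r
# Same four colors as the palette above, lightest to darkest
map_cols <- c("lightgoldenrod2", "orange2", "firebrick2", "firebrick4")

# Toy counts of explanations (0 to 3), invented for illustration
scores <- c(0, 3, 1, 2, 3)

# Adding 1 turns a count of 0..3 into a position 1..4 in the color vector
point_cols <- map_cols[scores + 1]
point_cols

# A lookup table that could be used to label a legend
data.frame(explanations = 0:3, color = map_cols)
```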