dat <- read.csv(file.choose(), header=TRUE, na.strings="", stringsAsFactors=TRUE) # Read text columns as factors (not the default since R 4.0.0).
nums <- sapply(dat, is.numeric) # Identify all numeric variables.
xNumber <- dat[ , nums] # Subset data set to include only numeric variables.
factors <- sapply(dat, is.factor) # Identify all categorical variables.
xFactor <- dat[ , factors] # Subset data set to include only categorical variables.
names(xFactor)
xFactor <- xFactor[,-c(1:3, 12)] # Drop columns 1-3 and 12; adjust these indices for your own data.
names(xFactor) # Verify that it worked
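The split above only finds categorical variables if they are stored as factors, and since R 4.0.0 read.csv() leaves text columns as character unless told otherwise. The following is a minimal sketch on a made-up data frame (toy and its column names are hypothetical, not from the original file) that converts character columns to factors and then splits the data the same way.

```r
# Hypothetical toy data; none of these columns come from the original file.
toy <- data.frame(
  age    = c(23, 35, 41, 29),
  income = c(40, 52, 61, 48),
  sex    = c("F", "M", "F", "M"),
  region = c("N", "S", "S", "N"),
  stringsAsFactors = FALSE  # mimic the read.csv() default in R >= 4.0.0
)

# Convert every character column to a factor so is.factor() can find it.
chars <- sapply(toy, is.character)
toy[chars] <- lapply(toy[chars], factor)

# drop = FALSE keeps a data frame even if only one column matches.
toyNumber <- toy[, sapply(toy, is.numeric), drop = FALSE]
toyFactor <- toy[, sapply(toy, is.factor), drop = FALSE]

names(toyNumber)
names(toyFactor)
```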
To do the next part, you will need to install the “plyr” and “psych” packages. You only need to do this once, so if you have already done it, skip ahead.
install.packages("plyr", dependencies = TRUE)
install.packages("psych", dependencies = TRUE)
This next code chunk runs chi-square tests for all pairwise combinations of categorical variables and saves the results to a CSV file.
library(plyr)
combos <- combn(ncol(xFactor),2)
# Run one test per pair of columns and collect the results in a single data frame.
results <- adply(combos, 2, function(x) {
test <- chisq.test(xFactor[, x[1]], xFactor[, x[2]])
data.frame("Row" = colnames(xFactor)[x[1]]
, "Column" = colnames(xFactor)[x[2]]
, "Chi.Square" = round(test$statistic,3)
, "df"= test$parameter
, "p.value" = round(test$p.value, 3)
)
})
# Write the file once, so the header row is not repeated for every test.
write.csv(results, file="Chi-Square Results.csv", row.names=FALSE)
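The same pairwise idea can be sketched in base R, without plyr. This is only an illustration under stated assumptions: toyFactor and varPairs are hypothetical names, and the data are randomly generated rather than taken from the original data set.

```r
# Hypothetical toy data: three randomly generated categorical variables.
set.seed(1)
toyFactor <- data.frame(
  a = factor(sample(c("x", "y"), 200, replace = TRUE)),
  b = factor(sample(c("p", "q", "r"), 200, replace = TRUE)),
  c = factor(sample(c("u", "v"), 200, replace = TRUE))
)

# Every unordered pair of column names, as a list of length-2 vectors.
varPairs <- combn(names(toyFactor), 2, simplify = FALSE)

# Run one chi-square test per pair and stack the one-row results.
results <- do.call(rbind, lapply(varPairs, function(p) {
  test <- chisq.test(toyFactor[[p[1]]], toyFactor[[p[2]]])
  data.frame(Row        = p[1],
             Column     = p[2],
             Chi.Square = round(unname(test$statistic), 3),
             df         = unname(test$parameter),
             p.value    = round(test$p.value, 3))
}))

results
```

Collecting the rows first and writing them out afterwards keeps a single header row in the output file.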
library("psych")
# xNumber is already in the workspace, so it does not need to be loaded with data().
Correlations <- cor(xNumber, use="pairwise.complete.obs")
write.csv(Correlations, file="Correlations.csv")
Imagine that you have a data set (data) that contains two variables (troublesomeness and verisimilitude).
To run a chi-square test, use the following script:
chisq.test(data$troublesomeness, data$verisimilitude)
If you wish to save time typing the variable names, you can attach the data set and your script will change as follows:
attach(data) # You only have to attach the data set once.
chisq.test(troublesomeness, verisimilitude)
If one or both variables are numeric, you can convert them to factors as follows.
chisq.test(as.factor(troublesomeness), as.factor(verisimilitude))
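As a small worked example, the sketch below runs the same test on randomly generated, numerically coded data. The data frame name survey and its values are made up for illustration; with() is used here as one alternative to attach() that avoids leaving the data set attached.

```r
# Hypothetical data: two categorical variables stored as numeric codes.
set.seed(42)
survey <- data.frame(
  troublesomeness = sample(1:3, 150, replace = TRUE),
  verisimilitude  = sample(0:1, 150, replace = TRUE)
)

# with() evaluates the expression inside the data frame, so the
# variable names can be typed without the survey$ prefix.
result <- with(survey,
               chisq.test(as.factor(troublesomeness),
                          as.factor(verisimilitude)))

result$statistic  # chi-square value
result$parameter  # degrees of freedom
result$p.value    # the value to record as the result
```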
As you conduct each test, record the names of both variables that were tested and the result. For a correlation, the result is the correlation coefficient; for a chi-square test, it is the p-value. Use a spreadsheet to organize this into three columns (Variable_1, Variable_2, Result) and save it as a CSV file.
There are fancy ways to do all this in R, but that is the topic for another day.
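As a small preview, here is a sketch of how a correlation matrix could be reshaped into the three-column layout described above. It uses the mtcars data set that ships with R as stand-in data; the names corMat, idx, and edges, and the temporary output file, are assumptions made for this illustration.

```r
# Stand-in numeric data: three columns of the built-in mtcars data set.
corMat <- cor(mtcars[, c("mpg", "hp", "wt")], use = "pairwise.complete.obs")

# Row and column positions of the cells above the diagonal,
# so that each pair of variables is listed exactly once.
idx <- which(upper.tri(corMat), arr.ind = TRUE)

edges <- data.frame(
  Variable_1 = rownames(corMat)[idx[, "row"]],
  Variable_2 = colnames(corMat)[idx[, "col"]],
  Result     = round(corMat[idx], 3)
)

edges

# Save without a header row, to match a later read.csv(..., header=FALSE).
outFile <- tempfile(fileext = ".csv")  # hypothetical location
write.table(edges, file = outFile, sep = ",",
            col.names = FALSE, row.names = FALSE)
```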
Once you have all your chi-square tests and correlations, use the following scripts to produce a network.
But before you do so, you’ll need to install “igraph”. You only have to install igraph once, and now is a good time.
install.packages("igraph", dependencies=TRUE)
Then start up the igraph package and begin constructing your network.
library(igraph)
el <- read.csv(file.choose(), header=FALSE)
el <- as.matrix(el[,1:2])
g <- graph_from_edgelist(el, directed=FALSE)
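To see what these steps do without choosing a file, the sketch below builds the same kind of network from a tiny, made-up edge list. The variable names in toyEdges are hypothetical, and the Result values are placeholders rather than real test results.

```r
library(igraph)

# Hypothetical three-column table in the Variable_1, Variable_2, Result layout.
toyEdges <- data.frame(
  Variable_1 = c("age", "age", "income"),
  Variable_2 = c("income", "weight", "weight"),
  Result     = c(0.41, 0.12, 0.30)  # placeholder values
)

# graph_from_edgelist() expects a two-column matrix: one row per edge.
el <- as.matrix(toyEdges[, 1:2])
g  <- graph_from_edgelist(el, directed = FALSE)

vcount(g)  # number of variables (nodes)
ecount(g)  # number of tested pairs (edges)
```

Only the first two columns define the edges here; the Result column is not used when the graph is built.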
To make a network visualization, use one of the two methods below:
plot(g, layout=layout_with_fr) # Fruchterman-Reingold layout; replaces the deprecated layout.fruchterman.reingold.
tkplot(g)