Our task is to explore correlations between “Conversion” (marketing campaign conversion rates by town) and a set of demographic variables (stored in columns 6:22 of the data frame).
We want to:
Loop through Pearson correlation tests for “Conversion” and all appropriate variables
Extract estimates and p-values for each test where p<.15
Produce a scatter plot for each correlation where p<.15
Load in data
post <- read.csv("post_current.csv")
Data Prep
post$PopChange <- as.numeric(post$PopChange)
post$MedVal <- as.numeric(post$MedVal)
post$Housing.Units <- as.numeric(post$Housing.Units)
post$Med..Household.Income <- as.numeric(post$Med..Household.Income)
post$Population <- as.numeric(post$Population)
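One caveat with the conversion above: as.numeric() applied to a factor returns the internal level codes rather than the printed values, and applied to strings such as "1,200" it returns NA. The sketch below shows a more defensive conversion. It assumes, without confirmation from the data file, that the affected columns were read in as text or factors with thousands separators. The helper name to_number and the toy values are hypothetical.

```r
# Hypothetical helper: convert text or factor columns to numeric safely.
to_number <- function(x) {
  # as.character() first, so factors yield their labels, not level codes;
  # gsub() then strips thousands separators before the numeric conversion
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

# Toy data standing in for post (values are made up for illustration)
toy <- data.frame(Population = factor(c("1,200", "850", "13,400")),
                  MedVal = c("210,000", "185,500", "342,000"),
                  stringsAsFactors = FALSE)

# Apply the helper to every column that needs converting
num_cols <- c("Population", "MedVal")
toy[num_cols] <- lapply(toy[num_cols], to_number)
str(toy)
```

If the columns in post are already plain character strings of digits, the original as.numeric() calls behave the same way and this helper is unnecessary.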
Here is a sample test:
cor.test(post$Conversion, post[,6])
##
## Pearson's product-moment correlation
##
## data: post$Conversion and post[, 6]
## t = 0.22608, df = 22, p-value = 0.8232
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3622887 0.4429384
## sample estimates:
## cor
## 0.04814411
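The printed summary is a display of a list object of class "htest", and the loop relies on pulling individual components out of that list. As a small illustration, the sketch below runs the same kind of test on the built-in mtcars data set (a stand-in for post, so it runs without the CSV file) and extracts the pieces used later.

```r
# mtcars ships with R; it stands in for post here
res <- cor.test(mtcars$mpg, mtcars$wt)

class(res)   # "htest"
names(res)   # components available for extraction

r  <- unname(res$estimate)  # Pearson's r, with the "cor" name dropped
p  <- res$p.value           # two-sided p-value
ci <- res$conf.int          # 95 percent confidence interval for r

cat(sprintf("r = %.3f, p = %.4g\n", r, p))
```

The same $estimate and $p.value accessors work on the result of cor.test(post$Conversion, post[,6]).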
To write the loop, we’ll need to do the following:
Step One: loop through all tests, print estimates and p-values
# length(post) is the number of columns, so this covers columns 6 through 22
for (i in 6:length(post)) {
  a <- cor.test(post$Conversion, post[,i])
  print(paste(colnames(post)[i], " est:", a$estimate, " p-value:", a$p.value))
}
## [1] "GenDifLab est: 0.0481441127519256 p-value: 0.823227900584166"
## [1] "MedVal est: 0.0798729980695959 p-value: 0.710636798286813"
## [1] "Housing.Units est: -0.137186524390174 p-value: 0.522673568637782"
## [1] "PopChange est: 0.348783974516281 p-value: 0.102864927153252"
## [1] "Pct.White est: -0.206345810207545 p-value: 0.333352107066312"
## [1] "Pct.U18 est: 0.11777891290728 p-value: 0.583617712585607"
## [1] "Pct.Fem est: -0.0760469494847729 p-value: 0.723956293474515"
## [1] "Pop.sq.mi est: 0.0448945179156487 p-value: 0.834994065974037"
## [1] "Med..Household.Income est: 0.204680942841935 p-value: 0.337353245609561"
## [1] "Over.25.HS est: 0.39184103684848 p-value: 0.0582685584589117"
## [1] "Over.25.College est: 0.399544751432363 p-value: 0.0530751387496134"
## [1] "Average.Household.Size est: 0.0616969225006032 p-value: 0.774579902724496"
## [1] "Population est: -0.147245442906709 p-value: 0.4923341284988"
## [1] "COL est: 0.0563897343913327 p-value: 0.803167125251977"
## [1] "HDR est: 0.182647465066667 p-value: 0.415895176217734"
## [1] "MedAge est: -0.183982769607545 p-value: 0.412435758082247"
## [1] "Groc est: -0.250624814544843 p-value: 0.260593971528363"
Step Two: add an if statement to specify which correlations to print (recall that we only want tests where p<.15)
# Goes inside the loop, after a has been assigned
if (a$p.value < .15) {
  print(paste(colnames(post)[i], " cor:", a$estimate, " p-value:", a$p.value))
}
Step Three: produce scatter plots for all correlations where p<.15 (the plot call goes inside the same if statement, so only those variables are plotted)
plot(post[,i], post$Conversion,
main=paste("Conversion Rate and", colnames(post)[i]), ylab="Conversion Rate",
xlab=colnames(post)[i])
Final loop!
for (i in 6:length(post)) {
  a <- cor.test(post$Conversion, post[,i])
  # Report and plot only the correlations with p < .15
  if (a$p.value < .15) {
    print(paste(colnames(post)[i], " cor:", a$estimate, " p-value:", a$p.value))
    plot(post[,i], post$Conversion,
         main=paste("Conversion Rate and", colnames(post)[i]), ylab="Conversion Rate",
         xlab=colnames(post)[i])
  }
}
## [1] "PopChange cor: 0.348783974516281 p-value: 0.102864927153252"
## [1] "Over.25.HS cor: 0.39184103684848 p-value: 0.0582685584589117"
## [1] "Over.25.College cor: 0.399544751432363 p-value: 0.0530751387496134"
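Printing works for a quick look, but the estimates and p-values are easier to reuse if they are collected into a data frame. The sketch below is one possible way to do that, not part of the original workflow. The helper name cor_table is hypothetical, and mtcars (with mpg playing the role of Conversion) stands in for post so the example runs on its own.

```r
# Hypothetical helper: run cor.test() of one outcome against many columns
# and return one row per variable.
cor_table <- function(data, outcome, cols) {
  rows <- lapply(cols, function(nm) {
    res <- cor.test(data[[outcome]], data[[nm]])
    data.frame(variable = nm,
               estimate = unname(res$estimate),
               p.value = res$p.value,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # stack the one-row data frames
}

# mtcars stands in for post; "mpg" plays the role of Conversion
results <- cor_table(mtcars, "mpg", setdiff(names(mtcars), "mpg"))

# Keep only the tests below the threshold, smallest p-value first
flagged <- results[results$p.value < 0.15, ]
flagged[order(flagged$p.value), ]
```

For the data in this post, the equivalent call would be cor_table(post, "Conversion", names(post)[6:22]), assuming columns 6:22 are all numeric after the data prep step.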