Our task is to explore correlations between “Conversion” (marketing campaign conversion rates by town) and other demographic variables (stored in columns 6:22 of the data frame).

We want to:

  1. Loop through Pearson correlation tests between “Conversion” and each appropriate variable

  2. Extract the estimate and p-value for each test where p<.15

  3. Produce a scatter plot for each correlation where p<.15

Load in data

post <- read.csv("post_current.csv")

Data Prep

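# Convert columns that were read in as text (or factors) to numbers.
# Going through as.character() first means a factor yields its values, not its level codes.
post$PopChange <- as.numeric(as.character(post$PopChange))
post$MedVal <- as.numeric(as.character(post$MedVal))
post$Housing.Units <- as.numeric(as.character(post$Housing.Units))
post$Med..Household.Income <- as.numeric(as.character(post$Med..Household.Income))
post$Population <- as.numeric(as.character(post$Population))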
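The conversions above assume the columns contain plain digits. If a column was read as text because of thousands separators such as "12,500" (an assumption about this CSV, not something the data shown here confirms), as.numeric() alone returns NA. A small helper, here called to_num() (a hypothetical name), strips the commas first and also handles factors safely. The toy data frame below is made up for illustration:

```r
# Hypothetical helper: convert a column to numeric, tolerating factors
# and comma thousands separators (an assumed input format).
to_num <- function(x) {
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

# Toy data frame standing in for post_current.csv (values are made up)
toy <- data.frame(
  MedVal = factor(c("250,000", "180,500", "320,000")),
  Population = c("12,400", "8,900", "15,100"),
  stringsAsFactors = FALSE
)

# Apply the helper to several columns at once
num_cols <- c("MedVal", "Population")
toy[num_cols] <- lapply(toy[num_cols], to_num)
str(toy)
```

With the real data, the same pattern would be post[cols] <- lapply(post[cols], to_num) for whichever columns need converting.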

Here is a single test, correlating Conversion with the first demographic column (column 6):

cor.test(post$Conversion, post[,6])
## 
##  Pearson's product-moment correlation
## 
## data:  post$Conversion and post[, 6]
## t = 0.22608, df = 22, p-value = 0.8232
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3622887  0.4429384
## sample estimates:
##        cor 
## 0.04814411
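The loop needs only two numbers from each test. cor.test() returns an object of class "htest", and the printed values above are stored as named list elements: estimate holds the correlation, p.value the p-value, and parameter the degrees of freedom. A sketch on simulated data (x and y are stand-ins, not columns of post):

```r
set.seed(1)
x <- rnorm(24)              # simulated stand-in for post$Conversion
y <- 0.5 * x + rnorm(24)    # simulated stand-in for one demographic column

res <- cor.test(x, y)       # Pearson is the default method
r  <- unname(res$estimate)  # correlation coefficient (named "cor")
p  <- res$p.value           # two-sided p-value
ci <- res$conf.int          # 95 percent confidence interval
c(r = r, p = p)
```

With 24 observations the degrees of freedom are 24 - 2 = 22, matching the df reported in the sample test.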

To write the loop, we’ll need to do the following:

Step One: loop through columns 6 onward, run a test for each, and print the estimate and p-value

# Loop from column 6 to the last column; ncol() states the intent more clearly than length()
for (i in 6:ncol(post)) {
  a <- cor.test(post$Conversion, post[, i])
  print(paste(colnames(post)[i], " est:", a$estimate, " p-value:", a$p.value))
}
## [1] "GenDifLab  est: 0.0481441127519256  p-value: 0.823227900584166"
## [1] "MedVal  est: 0.0798729980695959  p-value: 0.710636798286813"
## [1] "Housing.Units  est: -0.137186524390174  p-value: 0.522673568637782"
## [1] "PopChange  est: 0.348783974516281  p-value: 0.102864927153252"
## [1] "Pct.White  est: -0.206345810207545  p-value: 0.333352107066312"
## [1] "Pct.U18  est: 0.11777891290728  p-value: 0.583617712585607"
## [1] "Pct.Fem  est: -0.0760469494847729  p-value: 0.723956293474515"
## [1] "Pop.sq.mi  est: 0.0448945179156487  p-value: 0.834994065974037"
## [1] "Med..Household.Income  est: 0.204680942841935  p-value: 0.337353245609561"
## [1] "Over.25.HS  est: 0.39184103684848  p-value: 0.0582685584589117"
## [1] "Over.25.College  est: 0.399544751432363  p-value: 0.0530751387496134"
## [1] "Average.Household.Size  est: 0.0616969225006032  p-value: 0.774579902724496"
## [1] "Population  est: -0.147245442906709  p-value: 0.4923341284988"
## [1] "COL  est: 0.0563897343913327  p-value: 0.803167125251977"
## [1] "HDR  est: 0.182647465066667  p-value: 0.415895176217734"
## [1] "MedAge  est: -0.183982769607545  p-value: 0.412435758082247"
## [1] "Groc  est: -0.250624814544843  p-value: 0.260593971528363"
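Printing is fine for a quick look, but storing the estimates and p-values in a data frame turns the later filtering into a simple subset. A minimal sketch, using a simulated data frame dat in place of post (the column names A, B, C and all values are made up):

```r
set.seed(42)
n <- 24
dat <- data.frame(Conversion = runif(n), A = rnorm(n), B = rnorm(n), C = rnorm(n))
dat$B <- dat$B + 2 * dat$Conversion  # build in one association for illustration

cols <- 2:ncol(dat)  # the tutorial would use 6:ncol(post)

# One row per test: variable name, correlation estimate, p-value
results <- do.call(rbind, lapply(cols, function(i) {
  a <- cor.test(dat$Conversion, dat[[i]])
  data.frame(variable = names(dat)[i],
             estimate = unname(a$estimate),
             p.value = a$p.value,
             stringsAsFactors = FALSE)
}))
results
```

The rows where p < .15 are then results[results$p.value < .15, ].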

Step Two: add an if statement that decides which correlations to print (recall that we only want tests where p<.15). It goes inside the loop body, where a and i are defined:

# Inside the loop: print only when the p-value is below .15
if (a$p.value < .15) {
  print(paste(colnames(post)[i], " cor:", a$estimate, " p-value:", a$p.value))
}
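The if test applies the same comparison one variable at a time. As a vectorized check, here it is applied to the p-values copied (rounded to four decimals) from the Step One output; the variables that pass are the three that the final loop reports:

```r
# p-values from the Step One output, rounded to 4 decimals
pvals <- c(GenDifLab = 0.8232, MedVal = 0.7106, Housing.Units = 0.5227,
           PopChange = 0.1029, Pct.White = 0.3334, Pct.U18 = 0.5836,
           Pct.Fem = 0.7240, Pop.sq.mi = 0.8350, Med..Household.Income = 0.3374,
           Over.25.HS = 0.0583, Over.25.College = 0.0531,
           Average.Household.Size = 0.7746, Population = 0.4923,
           COL = 0.8032, HDR = 0.4159, MedAge = 0.4124, Groc = 0.2606)

keep <- pvals < 0.15   # logical vector, one entry per variable
names(pvals)[keep]
```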

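Step Three: produce a scatter plot for each correlation where p<.15. This call also goes inside the if block, so it runs only for the variables that pass the test:

plot(post[, i], post$Conversion,
     main = paste("Conversion Rate and", colnames(post)[i]),
     xlab = colnames(post)[i], ylab = "Conversion Rate")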
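When plot() runs inside a loop, each new plot can replace the previous one in the graphics window, depending on the device and environment. One way to keep every figure is to open a PDF device before the loop, so each plot() call becomes a separate page. A sketch on simulated data; the file name conversion_scatter.pdf is hypothetical:

```r
set.seed(7)
dat <- data.frame(Conversion = runif(24), PopChange = rnorm(24), Groc = rnorm(24))
out_file <- file.path(tempdir(), "conversion_scatter.pdf")  # hypothetical file name

pdf(out_file)  # every plot() call below is written as one page
for (i in 2:ncol(dat)) {
  plot(dat[[i]], dat$Conversion,
       main = paste("Conversion Rate and", names(dat)[i]),
       xlab = names(dat)[i], ylab = "Conversion Rate")
}
dev.off()      # close the device so the file is written
```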

Final loop, combining all three steps!

for (i in 6:ncol(post)) {
  a <- cor.test(post$Conversion, post[, i])
  # Keep only correlations with p < .15: print them and plot them
  if (a$p.value < .15) {
    print(paste(colnames(post)[i], " cor:", a$estimate, " p-value:", a$p.value))
    plot(post[, i], post$Conversion,
         main = paste("Conversion Rate and", colnames(post)[i]),
         xlab = colnames(post)[i], ylab = "Conversion Rate")
  }
}
## [1] "PopChange  cor: 0.348783974516281  p-value: 0.102864927153252"

## [1] "Over.25.HS  cor: 0.39184103684848  p-value: 0.0582685584589117"

## [1] "Over.25.College  cor: 0.399544751432363  p-value: 0.0530751387496134"
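The final loop can also be wrapped in a function so the target column, the candidate columns, and the threshold become arguments instead of hard-coded values. A sketch: cor_screen() and its arguments are hypothetical names, cols is assumed to be a vector of column indices, and the demo uses simulated data rather than post:

```r
# Hypothetical helper: run a Pearson test of `target` against each column in
# `cols`, return the rows with p < alpha, and optionally plot them.
cor_screen <- function(data, target, cols, alpha = 0.15, make_plots = TRUE) {
  rows <- lapply(cols, function(i) {
    a <- cor.test(data[[target]], data[[i]])
    data.frame(variable = names(data)[i],
               estimate = unname(a$estimate),
               p.value = a$p.value,
               stringsAsFactors = FALSE)
  })
  res <- do.call(rbind, rows)
  hits <- res[res$p.value < alpha, , drop = FALSE]
  if (make_plots) {
    for (v in hits$variable) {
      plot(data[[v]], data[[target]],
           main = paste(target, "and", v), xlab = v, ylab = target)
    }
  }
  hits
}

# Demo on simulated data (values are made up)
set.seed(3)
sim <- data.frame(Conversion = runif(30), X1 = rnorm(30), X2 = rnorm(30))
sim$X2 <- sim$X2 + 3 * sim$Conversion
hits <- cor_screen(sim, "Conversion", 2:3, make_plots = FALSE)
hits
```

With the tutorial's data the equivalent call would be cor_screen(post, "Conversion", 6:ncol(post)), using the same p < .15 cutoff as the loop above.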