Rebecca Johnston 12/02/2014
library(limma) # fit linear models
library(ggplot2) # graphing
library(limma) # graphing
library(reshape2) # for reshaping data from wide to tall format
library(plyr) # for data aggregation tasks
Load gene expression dataset:
prDat <- read.table("GSE4051_data.tsv")
str(prDat, max.level = 0)
## 'data.frame': 29949 obs. of 39 variables:
Load design matrix:
prDes <- readRDS("GSE4051_design.rds")
str(prDes)
## 'data.frame': 39 obs. of 4 variables:
## $ sidChar : chr "Sample_20" "Sample_21" "Sample_22" "Sample_23" ...
## $ sidNum : num 20 21 22 23 16 17 6 24 25 26 ...
## $ devStage: Factor w/ 5 levels "E16","P2","P6",..: 1 1 1 1 1 1 1 2 2 2 ...
## $ gType : Factor w/ 2 levels "wt","NrlKO": 1 1 1 1 2 2 2 1 1 1 ...
Explain gene expression in the wild type mice as a function of developmental stage (one-way ANOVA)?
Let's just work with the wild type data.
wtDat <- subset(prDat, select = prDes$gType == "wt")
str(wtDat, max.level = 0)
## 'data.frame': 29949 obs. of 20 variables:
wtDes <- subset(prDes, gType == "wt")
str(wtDes)
## 'data.frame': 20 obs. of 4 variables:
## $ sidChar : chr "Sample_20" "Sample_21" "Sample_22" "Sample_23" ...
## $ sidNum : num 20 21 22 23 24 25 26 27 28 29 ...
## $ devStage: Factor w/ 5 levels "E16","P2","P6",..: 1 1 1 1 2 2 2 2 3 3 ...
## $ gType : Factor w/ 2 levels "wt","NrlKO": 1 1 1 1 1 1 1 1 1 1 ...
Before we can use limma we must make our design matrix. Let's accept the default “reference + treatment effects” scheme for handling the devStage factor. I encourage you to inspect the design matrix and confirm it's what you expect.
wtDesMat <- model.matrix(~devStage, wtDes)
wtDesMat
## (Intercept) devStageP2 devStageP6 devStageP10 devStage4_weeks
## 12 1 0 0 0 0
## 13 1 0 0 0 0
## 14 1 0 0 0 0
## 15 1 0 0 0 0
## 28 1 1 0 0 0
## 29 1 1 0 0 0
## 30 1 1 0 0 0
## 31 1 1 0 0 0
## 36 1 0 1 0 0
## 37 1 0 1 0 0
## 38 1 0 1 0 0
## 39 1 0 1 0 0
## 20 1 0 0 1 0
## 21 1 0 0 1 0
## 22 1 0 0 1 0
## 23 1 0 0 1 0
## 5 1 0 0 0 1
## 6 1 0 0 0 1
## 7 1 0 0 0 1
## 8 1 0 0 0 1
## attr(,"assign")
## [1] 0 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$devStage
## [1] "contr.treatment"
Now we will fit the model, for all probes at once, and use eBayes() to moderate the estimated error variances:
wtFit <- lmFit(wtDat, wtDesMat)
wtEbFit <- eBayes(wtFit)
The first thing we might ask is “which genes show differential expression over the course of development”? This can be addressed with an overall F test for the model. In the language used in lecture, we will compare a “big” model to a “small” model, where the “big” model includes a mean parameter (or effect) for each level of devStage and the “small” model includes a single mean parameter, e.g. an intercept. You might expect this to be the F test performed by topTable() by default, i.e. when no specific coefficients or contrasts are given to the coef argument …
topTable(wtEbFit)
## X.Intercept. devStageP2 devStageP6 devStageP10
## 1423641_s_at 12.18 -0.0175 0.0750 0.0675
## 1438940_x_at 12.86 0.0850 0.1325 0.3425
## 1438657_x_at 12.78 0.1400 0.1250 -0.1850
## 1456736_x_at 12.32 0.1625 0.3050 0.2075
## 1436884_x_at 12.93 0.1775 0.3225 0.0300
## 1419700_a_at 12.32 0.1650 0.6475 0.8175
## 1435800_a_at 12.28 0.0450 0.6825 0.9000
## 1454613_at 12.47 -0.1075 -0.0500 -0.1025
## 1451240_a_at 13.00 0.3100 0.2800 0.2800
## 1450084_s_at 12.63 0.0825 0.0525 0.1725
## devStage4_weeks AveExpr F P.Value adj.P.Val
## 1423641_s_at 0.1800 12.24 45350 3.574e-36 5.201e-32
## 1438940_x_at 0.3500 13.04 44957 3.865e-36 5.201e-32
## 1438657_x_at -0.4500 12.71 43486 5.210e-36 5.201e-32
## 1456736_x_at 0.0725 12.47 39509 1.233e-35 6.725e-32
## 1436884_x_at 0.0250 13.04 39269 1.302e-35 6.725e-32
## 1419700_a_at 0.6825 12.78 39121 1.347e-35 6.725e-32
## 1435800_a_at 1.0200 12.81 36668 2.410e-35 1.031e-31
## 1454613_at -0.3825 12.34 35835 2.962e-35 1.078e-31
## 1451240_a_at -0.3700 13.10 35481 3.239e-35 1.078e-31
## 1450084_s_at 0.2600 12.75 34411 4.265e-35 1.234e-31
You'll see that, by default, topTable() reports the top 10 hits. But let's take more care and specify explicitly the coefficients we want to test for equality with zero.
colnames(coef(wtEbFit)) # remind yourself of the coef names
## [1] "(Intercept)" "devStageP2" "devStageP6" "devStageP10"
## [5] "devStage4_weeks"
(dsHits <- topTable(wtEbFit, coef = grep("devStage", colnames(coef(wtEbFit)))))
## devStageP2 devStageP6 devStageP10 devStage4_weeks AveExpr
## 1440645_at 0.3990 0.1952 0.9200 3.961 6.528
## 1416041_at 0.1580 0.4797 0.3327 5.115 9.383
## 1425222_x_at 0.8820 0.7995 1.5488 5.532 7.028
## 1451635_at 1.3025 1.1900 2.0160 6.188 8.319
## 1429028_at -2.4433 -3.4073 -4.3105 -4.602 8.045
## 1422929_s_at -2.9117 -3.6182 -3.5472 -3.661 7.278
## 1424852_at 0.4575 0.2298 0.5740 3.979 7.454
## 1425171_at 0.9980 3.0530 5.2787 6.079 9.620
## 1451617_at 0.7255 2.5128 4.9837 6.685 8.817
## 1451618_at 0.6028 2.8903 5.0507 6.288 9.431
## F P.Value adj.P.Val
## 1440645_at 425.4 1.588e-17 4.755e-13
## 1416041_at 195.5 1.522e-14 2.280e-10
## 1425222_x_at 173.4 4.348e-14 4.341e-10
## 1451635_at 157.3 1.013e-13 7.585e-10
## 1429028_at 148.8 1.646e-13 9.203e-10
## 1422929_s_at 146.9 1.844e-13 9.203e-10
## 1424852_at 143.2 2.290e-13 9.799e-10
## 1425171_at 138.8 3.002e-13 1.124e-09
## 1451617_at 136.5 3.485e-13 1.160e-09
## 1451618_at 134.2 4.032e-13 1.207e-09
You will notice that these are not the same hits we got with our first call to topTable(). Compare, e.g., the Affy IDs for the top hits and/or look at the typical F statistic magnitudes. And so we learn that you really must use the coef argument (or a contrasts workflow in more complicated settings) to explicitly define what you regard as a hit.
Use the hit list you stored above and your functions for extracting and plotting data to produce this plot for hits 3, 6, and 9 on the list.
Function for preparing gene expression data:
prepData <- function(x) {
luckyDf <- prDat[x, ]
luckyDf <- data.frame(gene = rownames(luckyDf), luckyDf)
luckyDf <- melt(luckyDf, id.vars = "gene", variable.name = "sidChar",
value.name = "gExp")
# Merge with wt experimental design
luckyDf <- merge(luckyDf, wtDes, by = "sidChar")
luckyDf <- arrange(luckyDf, devStage, gType)
}
Function for making strip plots:
makeStripplot <- function(x) {
ggplot(x, aes(devStage, gExp, colour = gType, group = gType)) + geom_point() +
facet_wrap(~gene) + stat_summary(fun.y = mean, geom = "line")
}
Run both functions given required gene list:
makeStripplot(newDat <- prepData(rownames(dsHits[c(3, 6, 9), ])))
str(newDat)
## 'data.frame': 60 obs. of 6 variables:
## $ sidChar : Factor w/ 39 levels "Sample_20","Sample_21",..: 1 1 1 2 2 2 3 3 3 4 ...
## $ gene : Factor w/ 3 levels "1422929_s_at",..: 3 2 1 1 3 2 2 3 1 3 ...
## $ gExp : num 5.83 5.49 9.88 10.7 5.72 ...
## $ sidNum : num 20 20 20 21 21 21 22 22 22 23 ...
## $ devStage: Factor w/ 5 levels "E16","P2","P6",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ gType : Factor w/ 2 levels "wt","NrlKO": 1 1 1 1 1 1 1 1 1 1 ...
Does it look plausible to you that – using only wild type data – these probes show the most compelling evidence for expression change over development?
Optional exercise: use lm() on one or all 3 of these probes and check if the F stats and p-values are similar. Don't expect exact equality because you must remember that limma has moderated the estimated error variance.
I'll choose one of the 3 probes, “1451617_at”, for a comparison:
mDat <- prepData("1451617_at")
str(mDat)
## 'data.frame': 20 obs. of 6 variables:
## $ sidChar : Factor w/ 39 levels "Sample_20","Sample_21",..: 1 2 3 4 8 9 10 11 16 17 ...
## $ gene : Factor w/ 1 level "1451617_at": 1 1 1 1 1 1 1 1 1 1 ...
## $ gExp : num 5.83 5.72 6.08 5.71 6.8 ...
## $ sidNum : num 20 21 22 23 24 25 26 27 28 29 ...
## $ devStage: Factor w/ 5 levels "E16","P2","P6",..: 1 1 1 1 2 2 2 2 3 3 ...
## $ gType : Factor w/ 2 levels "wt","NrlKO": 1 1 1 1 1 1 1 1 1 1 ...
Now call linear model using “lm” function:
mDatlm <- lm(gExp ~ devStage, mDat)
summary(mDatlm)
##
## Call:
## lm(formula = gExp ~ devStage, data = mDat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9730 -0.1863 -0.0371 0.2429 1.1810
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.835 0.259 22.54 5.6e-13 ***
## devStageP2 0.726 0.366 1.98 0.066 .
## devStageP6 2.513 0.366 6.86 5.4e-06 ***
## devStageP10 4.984 0.366 13.61 7.6e-10 ***
## devStage4_weeks 6.685 0.366 18.26 1.2e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.518 on 15 degrees of freedom
## Multiple R-squared: 0.969, Adjusted R-squared: 0.961
## F-statistic: 119 on 4 and 15 DF, p-value: 3.58e-11
Compare this to limma result:
dsHits["1451617_at", c("F", "P.Value")]
## F P.Value
## 1451617_at 136.5 3.485e-13
The results from lm vs. limma for this probe are fairly similar, but not exactly the same.
You need to learn to take control of topTable() by using various arguments to get the hits you want in the order you want. Furthermore, you should familiarize yourself with the output it returns, so you are comfortable extracting the output that you need.
How many probes have Benjamini-Hochberg (“BH”) adjusted p-values for the F test conducted above that are less than 1e-05?
cutoff <- 1e-05
dsHits <- topTable(wtEbFit, coef = grep("devStage", colnames(coef(wtEbFit))),
p.value = cutoff, n = nrow(wtDat))
nrow(dsHits)
## [1] 350
What is the 63rd hit on this list? Provide it's Affy ID, F statistic, BH adjusted p-value, and the estimated effect for developmental stage “P6” in that order.
dsHits[63, c("F", "adj.P.Val", "devStageP6")]
## F adj.P.Val devStageP6
## 1451633_a_at 64.01 1.049e-07 2.069
Consider the effects associated with developmental stages P2 and P10. Scatterplot the t statistics for the test that the P2 effect is zero against that for P10. Ideally this plot would be a high-volume scatterplot, include an x = y line, and have an aspect ratio of 1 and common axes, but just do your best.
First identify differentially expressed genes in P2 and P10 using topTable, make a dataframe containing the t statistic columns from the respective results, then plot using ggplot2:
# colnames(wtEbFit)
P2Hits <- topTable(wtEbFit, coef = "devStageP2", n = nrow(wtDat),
sort = "none")
P10Hits <- topTable(wtEbFit, coef = "devStageP10", n = nrow(wtDat),
sort = "none")
# head(P2Hits)
# head(P10Hits)
P2P10t <- data.frame(P2 = P2Hits$t, P10 = P10Hits$t)
summary(P2P10t)
## P2 P10
## Min. :-15.934 Min. :-20.055
## 1st Qu.: -1.650 1st Qu.: -1.830
## Median : 0.299 Median : 0.457
## Mean : 0.104 Mean : -0.033
## 3rd Qu.: 1.961 3rd Qu.: 1.916
## Max. : 7.012 Max. : 16.750
P10 has the largest range in values, therefore use these values for both the x and y limits of the plot for the aspect ratio to be 1:
ggplot(P2P10t, aes(P2, P10)) +
geom_point(alpha = 0.1) +
stat_density2d(geom = "polygon", aes(fill = ..level..)) +
geom_abline(intercept = 0, slope = 1) +
scale_x_continuous(limits = c(min(P2P10t$P10), max(P2P10t$P10))) +
scale_y_continuous(limits = c(min(P2P10t$P10), max(P2P10t$P10))) +
xlab("t-statistic for P2 effect") +
ylab("t-statistic for P10 effect")
Create a density plot of the associated adjusted p-values, so you can get a sense of which developmental stage, P2 or P10, is more clearly distinguished from baseline E16.
P2P10adjP <- data.frame(P2 = P2Hits$adj.P.Val, P10 = P10Hits$adj.P.Val)
P2P10adjPtall <- melt(P2P10adjP, variable.name = "DevStage",
value.name = "adjPVal")
ggplot(P2P10adjPtall, aes(adjPVal, colour = DevStage)) + geom_density() + xlab("Adjusted P Value")
Is this what you'd expect? Yes, I would expect the gene expression of P10 to be more clearly distinguished from baseline E16 than P2 as it is a later time point.
If you require a BH adjusted p-value less than 1e-03, how many hits do you get for P2? How many for P10? How much overlap is there?
addmargins(table(P2Hits$adj.P.Val < 0.001, P10Hits$adj.P.Val < 0.001, dnn = c("P2",
"P10")))
## P10
## P2 FALSE TRUE Sum
## FALSE 29201 695 29896
## TRUE 1 52 53
## Sum 29202 747 29949
There are 53 hits for P2, 747 hits for P10, and 53 overlap.
How can you do this test without using
table?
Now just focus on the P10 effect. Create a scatterplot matrix of raw p-values, BH adjusted p-values, and BY p-values.
P10pVals <- data.frame(raw = P10Hits$P.Value,
BH = P10Hits$adj.P.Val,
BY = p.adjust(P10Hits$P.Value, method = "BY"))
head(P10pVals)
## raw BH BY
## 1 0.5156 0.6027 1
## 2 0.0420 0.1056 1
## 3 0.1602 0.2498 1
## 4 0.9666 0.9742 1
## 5 0.1911 0.2839 1
## 6 0.2164 0.3112 1
The ggplot2 result from plotmatrix looked awful, so I'm resorting to base graphics. A message from the ggplot2 output suggested the GGally package intsead but this package masks several functions I use regularly such as melt!
pairs(P10pVals)
As expected, the raw and BH-adjusted p values are positively correlated. I'm not sure what BY-adjusted P values are so I can't comment on this.
Let's try to distinguish genes that have stable expression at the last three developmental stages (P6, P10, and 4_weeks) from those that do not. If expression doesn't change from P6 to P10 to 4_weeks, then the effects for all 3 of those developmental stages should be the same. That means that the difference between the P10 and P6 effects is zero and ditto for the difference between 4_weeks effect and P10 (or P6, for that matter). Let's form these contrasts.
colnames(wtDesMat)
## [1] "(Intercept)" "devStageP2" "devStageP6" "devStageP10"
## [5] "devStage4_weeks"
(cont.matrix <- makeContrasts(
P10VsP6 = devStageP10 - devStageP6,
fourweeksVsP10 = devStage4_weeks - devStageP10,
levels = wtDesMat))
## Warning: Renaming (Intercept) to Intercept
## Contrasts
## Levels P10VsP6 fourweeksVsP10
## Intercept 0 0
## devStageP2 0 0
## devStageP6 -1 0
## devStageP10 1 -1
## devStage4_weeks 0 1
wtFitCont <- contrasts.fit(wtFit, cont.matrix)
## Warning: row names of contrasts don't match col names of coefficients
wtEbFitCont <- eBayes(wtFitCont)
What does topTable() do with our contrasts?
topTable(wtEbFitCont)
## P10VsP6 fourweeksVsP10 AveExpr F P.Value adj.P.Val
## 1440645_at 0.7247 3.041 6.528 632.7 2.224e-17 6.662e-13
## 1416041_at -0.1470 4.782 9.383 302.4 1.473e-14 2.206e-10
## 1425222_x_at 0.7492 3.983 7.028 235.4 1.300e-13 1.297e-09
## 1424852_at 0.3442 3.405 7.454 225.1 1.910e-13 1.430e-09
## 1420726_x_at 0.1732 3.551 7.190 203.5 4.555e-13 2.640e-09
## 1451635_at 0.8260 4.172 8.319 200.0 5.289e-13 2.640e-09
## 1429394_at -0.0980 2.410 7.848 167.5 2.416e-12 1.034e-08
## 1455447_at -0.9765 -1.800 9.973 153.5 5.063e-12 1.896e-08
## 1429791_at 0.2480 1.658 8.026 145.7 7.877e-12 2.621e-08
## 1422612_at 0.4837 3.426 8.833 142.2 9.676e-12 2.840e-08
The top hits are probes where there is big change from P6 to P10, from P10 to 4_weeks, or both. Let's check that by plotting the data from the top 4 hits.
contHits <- topTable(wtEbFitCont)
makeStripplot(prepData(rownames(contHits)[1:4]))
So far, so good. These 4 probes show little expression change from P6 to P10 and a strong increase from P10 to 4_weeks. I would like to find some where there's a change in each case but perhaps in opposite direction. Let's press on.
Let's use decideTests() to adjust the p-values for both contrasts globally, i.e. all together and then threshhold them at a cutoff of 1e-04.
cutoff <- 1e-04
wtResCont <- decideTests(wtEbFitCont, p.value = cutoff, method = "global")
summary(wtResCont)
## P10VsP6 fourweeksVsP10
## -1 4 8
## 0 29945 29895
## 1 0 46
We see there are 4 probes that go down from P6 to P10 and no hits going the other way. There are 8 probes that go down from P10 to 4_weeks and 46 going the other way. Let's try to pull out various hits and plot their data. Here are the 4 that decline from P6 to P10:
(hits1 <- rownames(prDat)[which(wtResCont[, "P10VsP6"] < 0)])
## [1] "1416635_at" "1437781_at" "1454752_at" "1455260_at"
makeStripplot(prepData(hits1))
Here are 4 of the 8 that decline from P10 to 4_weeks:
(hits2 <- rownames(prDat)[which(wtResCont[, "fourweeksVsP10"] < 0)])
## [1] "1416021_a_at" "1423851_a_at" "1434500_at" "1435415_x_at"
## [5] "1437502_x_at" "1448182_a_at" "1452679_at" "1455447_at"
makeStripplot(prepData(hits2[1:4]))
Is there any overlap between these probes?
intersect(hits1, hits2)
## character(0)
Apparently not. Here are 4 of the 46 that increase from P10 to 4_weeks:
(hits3 <- rownames(prDat)[which(wtResCont[, "fourweeksVsP10"] > 0)])
## [1] "1416041_at" "1417280_at" "1418406_at" "1418710_at"
## [5] "1418789_at" "1419069_at" "1420725_at" "1420726_x_at"
## [9] "1421061_at" "1421818_at" "1422612_at" "1424852_at"
## [13] "1424895_at" "1425222_x_at" "1426059_at" "1426223_at"
## [17] "1427388_at" "1428763_at" "1429394_at" "1429791_at"
## [21] "1430580_at" "1431174_at" "1433699_at" "1434297_at"
## [25] "1434573_at" "1435436_at" "1435679_at" "1435727_s_at"
## [29] "1436265_at" "1436287_at" "1440402_at" "1440645_at"
## [33] "1441518_at" "1442243_at" "1443252_at" "1446484_at"
## [37] "1449170_at" "1449393_at" "1451042_a_at" "1451635_at"
## [41] "1452243_at" "1453385_at" "1455493_at" "1457878_at"
## [45] "1458418_at" "1459904_at"
makeStripplot(prepData(hits3[1:4]))
Is there any overlap between these probes and the previous “down” hits?
intersect(hits1, hits3)
## character(0)
intersect(hits2, hits3)
## character(0)
That's disappointing. If I revisit this workflow but make the p-value cutoff less stringent, maybe I can find the gene expression profile I'm looking for:
cutoff <- 0.01
nHits <- 8
wtResCont <- decideTests(wtEbFitCont, p.value = cutoff, method = "global")
summary(wtResCont)
## P10VsP6 fourweeksVsP10
## -1 40 49
## 0 29897 29636
## 1 12 264
hits1 <- rownames(prDat)[which(wtResCont[, "P10VsP6"] < 0)]
makeStripplot(prepData(hits1[1:nHits]))
hits2 <- rownames(prDat)[which(wtResCont[, "fourweeksVsP10"] < 0)]
makeStripplot(prepData(hits2[1:nHits]))
hits3 <- rownames(prDat)[which(wtResCont[, "P10VsP6"] > 0)]
makeStripplot(prepData(hits3[1:nHits]))
hits4 <- rownames(prDat)[which(wtResCont[, "fourweeksVsP10"] > 0)]
makeStripplot(prepData(hits4[1:nHits]))
vennDiagram(wtResCont)
hits5 <- rownames(prDat)[which(wtResCont[, "P10VsP6"] != 0 & wtResCont[, "fourweeksVsP10"] !=
0)]
makeStripplot(prepData(hits5))
hits6 <- rownames(prDat)[which(wtResCont[, "P10VsP6"] > 0 & wtResCont[, "fourweeksVsP10"] <
0)]
makeStripplot(prepData(hits6))
See if you can find one or more probes that have some expression changes up to P6 and then hold steady all the way to 4_weeks.
Let's make a new contrast matrix and make all contrasts across the developmental stages:
(cont.matrix <- makeContrasts(
P2VsE16 = devStageP2 - Intercept,
P6VsP2 = devStageP6 - devStageP2,
P10VsP6 = devStageP10 - devStageP6,
fourweeksVsP10 = devStage4_weeks - devStageP10,
fourweeksVsP6 = devStage4_weeks - devStageP6,
levels = wtDesMat))
## Warning: Renaming (Intercept) to Intercept
## Contrasts
## Levels P2VsE16 P6VsP2 P10VsP6 fourweeksVsP10 fourweeksVsP6
## Intercept -1 0 0 0 0
## devStageP2 1 -1 0 0 0
## devStageP6 0 1 -1 0 -1
## devStageP10 0 0 1 -1 0
## devStage4_weeks 0 0 0 1 1
wtFitCont <- contrasts.fit(wtFit, cont.matrix)
## Warning: row names of contrasts don't match col names of coefficients
wtEbFitCont <- eBayes(wtFitCont)
cutoff <- 1e-03
wtResCont <- decideTests(wtEbFitCont, p.value = cutoff, method = "global")
summary(wtResCont)
## P2VsE16 P6VsP2 P10VsP6 fourweeksVsP10 fourweeksVsP6
## -1 29927 13 69 80 376
## 0 22 29900 29861 29432 28769
## 1 0 36 19 437 804
# head(wtResCont)
Find hits where contrasts between P2 vs E16 and P6 vs P2 are not equal to 0:
hits7 <- rownames(prDat)[which(wtResCont[, "P2VsE16"] < 0 & wtResCont[, "P6VsP2"] <
0 & wtResCont[, "P10VsP6"] == 0 & wtResCont[, "fourweeksVsP10"] == 0 & wtResCont[,
"fourweeksVsP6"] == 0)]
makeStripplot(prepData(hits7))