# data import
urchin <- read.csv('fake-data.csv')
# view first few rows
head(urchin)Mock analysis of lead detections from Urchin tissue samples
From client:
My project involves analyzing the hard tissue of sea urchins (shell, teeth, and spines) for lead presence from two sample sites (the Cal Poly Pier, and a control site). I’m a little uncertain on how to organize my data when collecting it and how to run statistical tests to get results with said data. I have some very minor background using R coding, but not enough to feel confident setting up any code alone.
This document illustrates recommended data formatting based on our initial meeting and analysis based on some mock data.
Mock data
Here is some fake data formatted in the way we discussed. It is stored as a csv file. If needed, from excel/sheets/other simply open your spreadsheet and select Save as and under the file extension or file type dropdown menu look for Comma-separated values or .csv or similar. Move the resulting file to your working directory in R/RStudio (or alternatively, navigate to the file/project location using the RStudio file navigator and then select Gear icon > Set as working directory). Then read in the file as follows.
| urchin.id | site | spine | teeth | test |
|---|---|---|---|---|
| c.10 | ctrl | 0 | 0 | 0 |
| c.10 | ctrl | 0 | 1 | 0 |
| c.12 | ctrl | 0 | 1 | 0 |
| c.12 | ctrl | 0 | 0 | 0 |
| c.14 | ctrl | 0 | 0 | 1 |
| c.14 | ctrl | 1 | 0 | 0 |
The proportions of samples with lead detections by site in the mock data are:
| site | spine | teeth | test |
|---|---|---|---|
| ctrl | 0.15 | 0.2 | 0.2 |
| pier | 0.25 | 0.6 | 0.2 |
Testing for a difference in proportions
To test for a difference between sites in the share of urchins with lead present in teeth tissue, one first tallies the number of detections by site (table(...)) and then performs a test comparing binomial proportions (prop.test(...)):
# test comparing proportions
table(urchin$site, urchin$teeth) |>
prop.test()
2-sample test for equality of proportions with continuity correction
data: table(urchin$site, urchin$teeth)
X-squared = 5.1042, df = 1, p-value = 0.02387
alternative hypothesis: two.sided
95 percent confidence interval:
0.07281924 0.72718076
sample estimates:
prop 1 prop 2
0.8 0.4
The chi-square test (equivalent to the \(Z\)-test in case you learned that instead) indicates a significant difference. Below that the output provides an interval estimate and returns the sample proportions. You’ll notice that the estimates are inverted and show the share of nondetections (0’s) rather than detections (1’s); to get the estimates of interest simply recode the values:
# recode values by flipping 1's and 0's
table(urchin$site, 1 - urchin$teeth) |>
prop.test()
2-sample test for equality of proportions with continuity correction
data: table(urchin$site, 1 - urchin$teeth)
X-squared = 5.1042, df = 1, p-value = 0.02387
alternative hypothesis: two.sided
95 percent confidence interval:
-0.72718076 -0.07281924
sample estimates:
prop 1 prop 2
0.2 0.6
Lastly, the output reports an interval estimate for the difference prop 1 - prop 2, and above prop 1 is the control group, so the negative signs for the interval estimate indicate that the estimated proportion is higher for the pier site. So we have the statistical test result:
The data provide evidence of a difference in the proportion of urchins with lead present in teeth tissue between the pier site and the control site (\(\chi^2 = 5.1042\) on 1 degree of freedom, \(p = 0.02387\)).
And the estimated difference:
With 95% confidence, the share of urchins with lead present in teeth tissue is estimated to be between 7.28 and 72.72 percentage points greater at the pier site than at the control site.
That’s it! To analyze the other tissue samples, just replace $teeth above with other column names: urchin$spine and urchin$test for the mock data.
Troubleshooting
If any of your samples produce few to no detections and the estimated proportions are very low (or very high), the statistical test above may be unreliable. If so, R will warn you:
# note warning message -- this is due to one or more low estimated proportions
table(urchin$site, 1 - urchin$spine) |>
prop.test()Warning in prop.test(table(urchin$site, 1 - urchin$spine)): Chi-squared
approximation may be incorrect
2-sample test for equality of proportions with continuity correction
data: table(urchin$site, 1 - urchin$spine)
X-squared = 0.15625, df = 1, p-value = 0.6926
alternative hypothesis: two.sided
95 percent confidence interval:
-0.3959735 0.1959735
sample estimates:
prop 1 prop 2
0.15 0.25
In this instance, it is preferable to use Fisher’s exact test for the statistical inference:
# fisher's test
table(urchin$site, 1 - urchin$spine) |>
fisher.test()
Fisher's Exact Test for Count Data
data: table(urchin$site, 1 - urchin$spine)
p-value = 0.6948
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.07117728 3.32021494
sample estimates:
odds ratio
0.5378826
In this case, the p-values aren’t that different, so I’d feel fine about using the original test. If you did use Fisher’s test instead, unless you are already familiar with odds ratios from your prior STAT218 and know how to interpret the interval estimate for the odds ratio, I’d ignore the rest of the output below the p-value and simply report a point estimate for the difference in proportions.