In the following analyses, I conduct several types of regression to estimate whether the number of people a student knows who were victims of violence or crime on a college campus varies as a function of gender. I also investigate the effect of campus-related factors on this outcome, such as the percentage of drug- and alcohol-related crimes on a campus and the type of town in which the campus is located.
The data were obtained from a public-use dataset called DRUGS, ALCOHOL, AND STUDENT CRIME IN THE UNITED STATES. The survey was administered in 1989 for a study that examined different aspects of campus crime among undergraduate college students, using a national cross-sectional sample of undergraduates in the United States. The results of my analyses are as follows:
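# nlme: gls() and lme(); dplyr/magrittr: filter(), mutate(), %>%;
# ggplot2: plots; foreign: read.dta()
library(nlme)
library(dplyr)
library(magrittr)
library(tidyr)
library(haven)
library(lmerTest)
library(ggplot2)
library(texreg)
library(readr)
library(foreign)
# Read the Stata version of ICPSR study 09585, dataset DS0001
campus <- read.dta("/Users/sophia.halkitis/Desktop/R/Datasets/ICPSR_09585/DS0001/09585-0001-Data.dta")
## Warning: duplicated levels in factors are deprecated
# Keep plausible victim counts (values of 90 and above are treated as missing-data codes)
campus <- filter(campus, VICTIMS < 90)
# Keep ID, sex, campus location, number of victims known, and % drug/alcohol crimes
victim <- subset(campus, select = c(ID1, SEX, LOCATION, VICTIMS, DRGALC))
head(victim)
# Drop missing-value codes and add a dummy for sex (1 = male, 0 = female)
victim <- filter(victim, SEX != "MISSING", LOCATION != "MISSING", DRGALC != 999) %>%
  mutate(sexbin = ifelse(SEX == "MALE", 1, 0))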
In the code above, I selected a subset of variables from the original large dataset to measure the outcome and its predictors: sex, the type of town the campus is located in (rural, suburban, urban), the number of victims of campus crime that a respondent knows, and the percentage of drug- and alcohol-related crimes on that campus. I also dropped missing-value codes and created a binary sex indicator (sexbin = 1 for men, 0 for women). Of these variables, location and the percentage of drug/alcohol crimes describe the school, while sex and the number of victims known describe the individual student.
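A minimal base-R sketch of the same cleaning steps may make the filtering logic easier to follow. It runs on a tiny hypothetical data frame (made-up rows, not the ICPSR file) and assumes, as the filters above imply, that VICTIMS values of 90 or more and a DRGALC value of 999 are missing-data codes.

```r
# Hypothetical toy data: NOT the ICPSR 09585 file, only made-up rows
# that mimic the columns selected above
toy <- data.frame(
  ID1      = 1:6,
  SEX      = factor(c("MALE", "FEMALE", "MISSING", "FEMALE", "MALE", "FEMALE"),
                    levels = c("MALE", "FEMALE", "MISSING")),
  LOCATION = factor(c("URBAN", "RURAL", "URBAN", "MISSING", "RURAL", "URBAN"),
                    levels = c("URBAN", "RURAL", "MISSING")),
  VICTIMS  = c(2, 0, 1, 3, 99, 5),
  DRGALC   = c(40, 999, 25, 60, 30, 50)
)

# Keep rows with a plausible victim count and no missing-value codes
keep <- toy$VICTIMS < 90 &
  toy$SEX != "MISSING" &
  toy$LOCATION != "MISSING" &
  toy$DRGALC != 999
clean <- toy[keep, ]

# Dummy-code sex: 1 = male, 0 = female
clean$sexbin <- ifelse(clean$SEX == "MALE", 1, 0)
clean
```

Only rows 1 and 6 survive: the others carry a 999, a MISSING level, or a victim count of 99.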
ggplot()+
geom_bar(data = victim, aes(x = SEX, y = VICTIMS, fill = SEX), stat = "identity")+
theme(axis.text.x=element_blank())
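The graph above shows that, in total, female respondents report knowing more victims of campus crime than male respondents. Because stat = "identity" stacks every respondent's VICTIMS value, each bar is a sum, and the sample contains more women than men (see the table below), so this difference largely reflects group size rather than a per-person difference.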
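One way to check whether the difference in bar heights reflects group size is to compare totals with per-respondent means. The sketch below rebuilds respondent-level VICTIMS values from the SEX-by-VICTIMS frequency table in this section (the empty MISSING row is omitted) and uses base R rather than ggplot2.

```r
# Distinct VICTIMS values and their counts, copied from the frequency table
values <- c(0:13, 15, 20, 22, 25, 30, 50)
male   <- c(269, 109, 97, 77, 38, 34, 11, 7, 4, 4, 16, 2, 10, 1, 4, 4, 0, 1, 0, 0)
female <- c(422, 167, 157, 129, 62, 51, 15, 10, 9, 2, 27, 2, 5, 2, 3, 6, 1, 2, 1, 2)

# Expand the counts back into one VICTIMS value per respondent
victims <- c(rep(values, male), rep(values, female))
sex     <- rep(c("MALE", "FEMALE"), c(sum(male), sum(female)))

by_sex <- split(victims, sex)
totals <- sapply(by_sex, sum)    # what the stat = "identity" bars display
means  <- sapply(by_sex, mean)   # victims known per respondent
sapply(by_sex, length)           # respondents per sex
totals
round(means, 3)
```

The totals are 2366 for women and 1519 for men, but the means are about 2.201 and 2.208: the taller female bar comes from having 1075 female versus 688 male respondents, not from each woman knowing more victims.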
ggplot(data = victim)+
geom_bar(aes(x = SEX, y = VICTIMS, fill = SEX), stat = "identity")+
facet_wrap("LOCATION")+
theme(axis.text.x=element_blank())+
ylab("VICTIMS TO CAMPUS CRIME")
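The graph above shows that, within every location, female respondents report knowing more victims of campus crime in total than male respondents do. This gap is most pronounced among participants whose college is in a suburban town. As in the previous plot, the bars are totals rather than per-person averages, so part of this gap reflects the larger number of female respondents.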
table(victim$SEX, victim$VICTIMS)
          0   1   2   3  4  5  6  7 8 9 10 11 12 13 15 20 22 25 30 50
MALE    269 109  97  77 38 34 11  7 4 4 16  2 10  1  4  4  0  1  0  0
FEMALE  422 167 157 129 62 51 15 10 9 2 27  2  5  2  3  6  1  2  1  2
MISSING   0   0   0   0  0  0  0  0 0 0  0  0  0  0  0  0  0  0  0  0
This table shows, for each sex, how many respondents reported knowing each number of victims of campus crime (the column headings are the number of victims known).
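Because the two rows have different totals, raw counts are hard to compare across sexes. A minimal sketch that converts the same counts into within-sex proportions with prop.table(); the empty MISSING row is left out because dividing a zero row by its total would produce NaN.

```r
counts <- rbind(
  MALE   = c(269, 109, 97, 77, 38, 34, 11, 7, 4, 4, 16, 2, 10, 1, 4, 4, 0, 1, 0, 0),
  FEMALE = c(422, 167, 157, 129, 62, 51, 15, 10, 9, 2, 27, 2, 5, 2, 3, 6, 1, 2, 1, 2)
)
colnames(counts) <- c(0:13, 15, 20, 22, 25, 30, 50)

# margin = 1: each row is divided by its own total, so every row sums to 1
row_props <- prop.table(counts, margin = 1)
round(row_props[, c("0", "1", "2", "10")], 3)
```

Seen this way, the two distributions are close: about 39% of men (269 of 688) and of women (422 of 1075) report knowing no victims.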
ggplot()+
geom_bar(data = victim, aes(x = LOCATION, y = DRGALC, fill = LOCATION), stat = "identity")+
theme(axis.text.x=element_blank())+
ylab("DRUG/ALCOHOL CAMPUS CRIME")
The chart above shows the relationship between the location of a school and the percentage of drug- and alcohol-related crimes on that campus. According to the graph, suburban schools have the highest total. However, because stat = "identity" stacks each respondent's DRGALC value, a bar's height grows with the number of respondents in that location, and, as the chunk below shows, suburban schools account for more responses than rural or urban ones. The pattern in the graph may therefore reflect sample size rather than a higher percentage of drug/alcohol-related crime.
# Getting the mode: count respondents per campus location and report the most frequent one
tab <- table(victim$LOCATION)
tab
names(which.max(tab))
length(unique(victim$DRGALC))
[1] 52
victim %>%
group_by(DRGALC) %>%
summarise(n_people = n())
There are 52 distinct values for the percentage of drug- or alcohol-related campus crimes; each distinct value defines one group in the models below.
First, I conducted an ecological regression, which considers only school-level data: the outcome and the predictor are averaged within each of the 52 drug/alcohol-crime groups.
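# Collapse to one row per DRGALC value: mean victims known and proportion male
victimschool <- victim %>%
group_by(DRGALC) %>%
summarise(mean_v = mean(VICTIMS, na.rm = TRUE), mean_s = mean(sexbin, na.rm = TRUE))
head(victimschool)
# Regress group-mean victims on the group proportion of men
ecoreg <- lm(mean_v ~ mean_s, data = victimschool)
summary(ecoreg)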
Call:
lm(formula = mean_v ~ mean_s, data = victimschool)
Residuals:
Min 1Q Median 3Q Max
-2.5161 -1.2779 -0.4266 0.8301 9.1200
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.5161 0.4591 5.480 1.39e-06 ***
mean_s 0.7277 1.0052 0.724 0.472
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.155 on 50 degrees of freedom
Multiple R-squared: 0.01037, Adjusted R-squared: -0.00942
F-statistic: 0.5241 on 1 and 50 DF, p-value: 0.4725
Viewing the data in this way is problematic, because the school-level association (between the group means of the IV and the DV) may not hold for individual students, a problem known as the ecological fallacy. The slope is not statistically significant (p = 0.47). Taken at face value, it implies that moving from an all-female to an all-male group (a one-unit increase in the proportion of male students) is associated with an increase of about 0.73 in the average number of victims of campus crime known.
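To illustrate how group means can mislead, here is a deliberately constructed toy example (hypothetical numbers, not the survey data). Within every group, men and women know exactly the same number of victims, but groups with a larger share of men also have higher victim counts. The regression on group means finds a large positive "sex effect", while an individual-level regression that holds the group constant finds none.

```r
# Hypothetical data: 4 groups of 4 respondents each
group <- rep(1:4, each = 4)
# Number of men per group rises from 0 to 3
sexbin <- unlist(lapply(0:3, function(m) c(rep(1, m), rep(0, 4 - m))))
# Everyone in group g knows 2 * g victims, regardless of sex
victims <- 2 * group
toy <- data.frame(group, sexbin, victims)

# Ecological regression: one row per group (group means)
grp <- aggregate(cbind(victims, sexbin) ~ group, data = toy, FUN = mean)
eco <- lm(victims ~ sexbin, data = grp)

# Individual-level regression with a fixed effect for each group
ind <- lm(victims ~ sexbin + factor(group), data = toy)

coef(eco)[["sexbin"]]   # slope on proportion male across groups: 8
coef(ind)[["sexbin"]]   # within-group sex difference: 0 (up to rounding)
```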
Next, I fit a complete-pooling model. This model regresses the number of victims known on sex across all respondents, ignoring the grouping by drug/alcohol-crime percentage and any other school-level factors that may contribute to this relationship.
cpooling <- gls(VICTIMS ~ sexbin, data = victim, method = "ML")
summary(cpooling)
Generalized least squares fit by maximum likelihood
Model: VICTIMS ~ sexbin
Data: victim
AIC BIC logLik
9512.848 9529.272 -4753.424
Coefficients:
Value Std.Error t-value p-value
(Intercept) 2.2009302 0.1094600 20.107158 0.0000
sexbin 0.0069186 0.1752216 0.039485 0.9685
Correlation:
(Intr)
sexbin -0.625
Standardized residuals:
Min Q1 Med Q3 Max
-0.6155396 -0.6136107 -0.3348146 0.2227775 13.3261933
Residual standard error: 3.586851
Degrees of freedom: 1763 total; 1761 residual
The coefficient on sexbin is positive but tiny and not significant (p = 0.97): on average, men report knowing about 0.007 more victims of campus crime than women. This analysis is not ideal, because it ignores other factors that contribute to campus violence, such as the percentage of drug- and alcohol-related crimes on campus. This factor cannot be disregarded, as campuses with more drug/alcohol-related crime are likely to have more victims of campus crime.
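Because the gls() call above specifies no correlation or variance structure, it is an ordinary linear regression, so lm() should return the same coefficients and, since lm's logLik() is the maximum-likelihood value, the same log-likelihood as gls(method = "ML"). The sketch below checks this on respondent-level data rebuilt from the SEX-by-VICTIMS frequency table earlier in this section, assuming sexbin codes men as 1 and women as 0 as in the mutate() step.

```r
# Counts per VICTIMS value, copied from the SEX-by-VICTIMS table
values <- c(0:13, 15, 20, 22, 25, 30, 50)
male   <- c(269, 109, 97, 77, 38, 34, 11, 7, 4, 4, 16, 2, 10, 1, 4, 4, 0, 1, 0, 0)
female <- c(422, 167, 157, 129, 62, 51, 15, 10, 9, 2, 27, 2, 5, 2, 3, 6, 1, 2, 1, 2)

# One row per respondent
toy <- data.frame(
  VICTIMS = c(rep(values, male), rep(values, female)),
  sexbin  = rep(c(1, 0), c(sum(male), sum(female)))
)

fit <- lm(VICTIMS ~ sexbin, data = toy)
coef(fit)     # intercept = mean for women; slope = men minus women
logLik(fit)   # maximum-likelihood log-likelihood
```

The intercept (2.2009) is the women's mean and the slope (0.0069) is the men-minus-women difference, matching the gls output above.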
The intercept
Then, I fit a no-pooling model, which runs a separate regression within each of the 52 distinct percentages of drug/alcohol-related campus crime. I plotted histograms of the resulting intercepts and slopes to examine how they are distributed.
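# Fit one lm() per distinct DRGALC value (no pooling across groups)
npint <- victim %>%
group_by(DRGALC) %>%
do(mod = lm(VICTIMS ~ sexbin, data = .))
# Extract each group's intercept (named np_int to avoid masking the coef() function)
np_int <- npint %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(np_int, aes(x = intc)) + geom_histogram()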
The histogram shows that in most drug/alcohol-crime groups the intercept (the average number of victims known by female respondents) falls between 0 and 4. There are also some extreme outliers at 10 or 20 victims.
The slope
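# Same per-group fits as above
npslope <- victim %>%
group_by(DRGALC) %>%
do(mod = lm(VICTIMS ~ sexbin, data = .))
# Extract each group's sexbin slope (men minus women within that group)
np_slope <- npslope %>% do(data.frame(sexc = coef(.$mod)[2]))
ggplot(np_slope, aes(x = sexc)) + geom_histogram()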
The histogram shows a mostly neutral relationship: most group-specific slopes lie near zero, and a smaller number fall between 0 and 5. There are also outliers in which the relationship is negative.
Taken together, the two histograms show that both the intercepts and the slopes vary considerably across the 52 percentages of drug/alcohol-related crime: the coefficients differ greatly from one group to the next.
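The no-pooling code above fits the same per-group models twice, once for the intercepts and once for the slopes. A base-R sketch that fits one regression per group and collects both coefficients in a single pass is shown below. It uses hypothetical toy data built so that each group's intercept and slope are known in advance, which makes it easy to confirm the extraction is correct.

```r
# Hypothetical groups: intercepts a and slopes b are chosen by hand
a <- c(g1 = 1, g2 = 3, g3 = 2)
b <- c(g1 = 0, g2 = -1, g3 = 4)

# Each group: two women (a - 1, a + 1) and two men (a + b - 1, a + b + 1)
toy <- do.call(rbind, lapply(names(a), function(g) {
  data.frame(
    DRGALC  = g,
    sexbin  = c(0, 0, 1, 1),
    VICTIMS = c(a[[g]] - 1, a[[g]] + 1, a[[g]] + b[[g]] - 1, a[[g]] + b[[g]] + 1)
  )
}))

# One lm() per group; sapply() gives one column per group, t() one row per group
fits <- lapply(split(toy, toy$DRGALC), function(d) lm(VICTIMS ~ sexbin, data = d))
np   <- t(sapply(fits, coef))
np
# hist(np[, "(Intercept)"]); hist(np[, "sexbin"])  # the two distributions
```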
Random Intercept
I then fit a partial-pooling model, which lets the intercept vary across the drug/alcohol-crime groups by treating the group intercepts as random effects. This model estimates the standard deviation of the (assumed normally distributed) intercepts, but it does not let the gender difference in knowing victims of campus crime vary BY the percentage of drug/alcohol crimes on campus.
ri_lme <- lme(VICTIMS ~ sexbin, data = victim, random = ~1|DRGALC, method = "ML")
summary(ri_lme)
Linear mixed-effects model fit by maximum likelihood
Data: victim
AIC BIC logLik
9514.848 9536.747 -4753.424
Random effects:
Formula: ~1 | DRGALC
(Intercept) Residual
StdDev: 0.0002550538 3.586851
Fixed effects: VICTIMS ~ sexbin
Value Std.Error DF t-value p-value
(Intercept) 2.2009303 0.1094601 1710 20.107155 0.0000
sexbin 0.0069186 0.1752216 1710 0.039485 0.9685
Correlation:
(Intr)
sexbin -0.625
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.6155397 -0.6136107 -0.3348146 0.2227775 13.3261933
Number of Observations: 1763
Number of Groups: 52
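The random-intercept model again yields a small, non-significant coefficient: men report knowing about 0.007 more victims of college crime than women (p = 0.97). The estimated standard deviation of the random intercepts (0.00026) is essentially zero, so the model collapses to complete pooling and its fixed effects match the gls estimates above. This suggests that the spread seen in the no-pooling histograms mostly reflects sampling noise within small groups.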
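Why does the random-intercept model reproduce complete pooling? In a random-intercept model with known variance components, each group's intercept is approximately a weighted average of that group's own mean and the overall intercept, with weight w_j = tau^2 / (tau^2 + sigma^2 / n_j), where tau is the random-intercept SD, sigma the residual SD, and n_j the group size. The sketch plugs in tau and sigma from the output above and assumes equal group sizes of 1763 / 52 (about 34 respondents); the tau = 1 line is a hypothetical comparison.

```r
# Weight a random-intercept model places on a group's own data
shrink_weight <- function(tau, sigma, n) tau^2 / (tau^2 + sigma^2 / n)

sigma <- 3.586851     # residual SD from ri_lme
n_j   <- 1763 / 52    # average group size (assumes equal-sized groups)

shrink_weight(0.0002550538, sigma, n_j)  # tau as estimated: weight is essentially 0
shrink_weight(1, sigma, n_j)             # hypothetical tau = 1: weight about 0.72
```

With a weight this close to zero, every group intercept is pulled all the way to the overall intercept, which is why the fixed effects are identical to the complete-pooling fit.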
Random Slope
Since the random-intercept model does not show the gender difference by group, I fit a random-slope model, which lets the sex coefficient vary across groups defined by the percentage of drug and alcohol crimes on campus.
rs_lme <- lme(VICTIMS ~ sexbin, data = victim, random = ~ sexbin|DRGALC, method = "ML")
summary(rs_lme)
Linear mixed-effects model fit by maximum likelihood
Data: victim
AIC BIC logLik
9518.848 9551.696 -4753.424
Random effects:
Formula: ~sexbin | DRGALC
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 0.0001737849 (Intr)
sexbin 0.0004423964 0
Residual 3.5868509883
Fixed effects: VICTIMS ~ sexbin
Value Std.Error DF t-value p-value
(Intercept) 2.2009302 0.1094600 1710 20.107157 0.0000
sexbin 0.0069186 0.1752216 1710 0.039485 0.9685
Correlation:
(Intr)
sexbin -0.625
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.6155399 -0.6136107 -0.3348146 0.2227775 13.3261933
Number of Observations: 1763
Number of Groups: 52
Even with a random slope, which allows the sex effect to differ across groups, the effect of sex on the number of victims known to campus crime remains non-significant (p = 0.97), and both random-effect standard deviations are close to zero.
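The random-slope standard deviation gives a rough sense of how much the sex effect could vary across drug/alcohol-crime groups. Under the model's assumption that group-specific slopes are normally distributed around the fixed slope, about 95% of them would fall within the fixed slope plus or minus 1.96 random-slope SDs. The sketch below does that arithmetic with the estimates copied from the rs_lme output.

```r
fixed_slope <- 0.0069186      # fixed effect of sexbin
slope_sd    <- 0.0004423964   # SD of the sexbin random effect across DRGALC groups

# Central 95% range of group-specific sex effects implied by a normal random slope
slope_range <- fixed_slope + c(-1, 1) * qnorm(0.975) * slope_sd
slope_range
```

The implied range runs from roughly 0.0061 to 0.0078 victims, so the model finds essentially no between-group variation in the gender gap.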
anova(cpooling, ri_lme, rs_lme)
         Model df      AIC      BIC    logLik   Test      L.Ratio p-value
cpooling     1  3 9512.848 9529.272 -4753.424
ri_lme       2  4 9514.848 9536.747 -4753.424 1 vs 2 1.930275e-06  0.9989
rs_lme       3  6 9518.848 9551.696 -4753.424 2 vs 3 1.380249e-07  1.0000
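The p-values in the anova() output come from likelihood-ratio tests: twice the difference in log-likelihoods is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of parameters. Because the printed log-likelihoods are rounded to the same value, the sketch below starts from the reported L.Ratio statistics. The halving at the end is a commonly used correction (a 50:50 chi-square mixture) for testing a single variance component on the boundary of its parameter space; treat it as a rough adjustment.

```r
# Upper-tail chi-square p-value for a likelihood-ratio statistic
lr_pvalue <- function(stat, df_diff) pchisq(stat, df = df_diff, lower.tail = FALSE)

# Statistics reported by anova(cpooling, ri_lme, rs_lme)
p_ri <- lr_pvalue(1.930275e-06, df_diff = 4 - 3)  # complete pooling vs random intercept
p_rs <- lr_pvalue(1.380249e-07, df_diff = 6 - 4)  # random intercept vs random slope
round(c(p_ri, p_rs), 4)

# A variance cannot be negative, so the naive test is conservative;
# halving p_ri is the usual mixture correction for one added variance
p_ri / 2
```

Even the corrected value (about 0.50) is far from significance, so the conclusion does not change.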
According to the model comparison, complete pooling fits best: it has the lowest AIC and BIC, and neither likelihood-ratio test favors the more complex models (p = 0.9989 and p = 1.0000). Although none of the findings were significant, it was still informative to learn that gender does not predict how many victims of campus crime a student knows, and that this relationship does not vary across campuses with different percentages of drug/alcohol-related crime. These non-significant findings can be attributed to a few things:
Last, it is worth mentioning that non-significant findings are still findings! While it is frustrating to do this work without significant results, time constraints did not permit me to find a new dataset and conduct new analyses.
loc <- lme(VICTIMS ~ sexbin + LOCATION, data = victim, random = ~ sexbin|DRGALC, method = "ML")
summary(loc)
Linear mixed-effects model fit by maximum likelihood
Data: victim
AIC BIC logLik
9521.507 9565.305 -4752.753
Random effects:
Formula: ~sexbin | DRGALC
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 0.0001631929 (Intr)
sexbin 0.0003973851 0
Residual 3.5854871680
Fixed effects: VICTIMS ~ sexbin + LOCATION
                                 Value Std.Error   DF   t-value p-value
(Intercept)                  2.2995907 0.1601706 1708 14.357131  0.0000
sexbin                       0.0006277 0.1753888 1708  0.003579  0.9971
LOCATIONSUBURBAN/SMALL TOWN -0.1787662 0.1832245 1708 -0.975667  0.3294
LOCATIONRURAL                0.0924321 0.3465675 1708  0.266707  0.7897
Correlation:
(Intr) sexbin LOCATT
sexbin -0.431
LOCATIONSUBURBAN/SMALL TOWN -0.717 0.013
LOCATIONRURAL -0.363 -0.030 0.328
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.6673154 -0.5916774 -0.3126004 0.2450288 13.3037456
Number of Observations: 1763
Number of Groups: 52
Adding the school-level variable of location does not change this picture: neither suburban/small-town nor rural campuses differ significantly from the reference location (p = 0.33 and p = 0.79), so the location of one's school is also not a significant predictor of knowing people who were victims of campus violence.
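The fixed effects can be turned into predicted numbers of victims known for each combination of sex and location. With treatment coding, the location without a dummy in the output is the reference, and its predictions use the intercept alone. The sketch below copies the coefficients from the output above and labels that reference level generically, because its name does not appear in the output.

```r
# Fixed effects copied from summary(loc)
b <- c(intercept = 2.2995907, sexbin = 0.0006277,
       suburban = -0.1787662, rural = 0.0924321)

# Every sex-by-location combination; "reference" = the LOCATION level with no dummy
grid <- expand.grid(sexbin = c(0, 1),
                    location = c("reference", "suburban", "rural"))

# Predicted value = intercept + sex effect + the matching location dummy
grid$predicted <- b[["intercept"]] +
  b[["sexbin"]] * grid$sexbin +
  b[["suburban"]] * (grid$location == "suburban") +
  b[["rural"]] * (grid$location == "rural")
grid
```

All six predictions fall between about 2.12 and 2.39 victims, which illustrates how small the estimated sex and location differences are.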
Citation:
Bausell, Carole R., Charles E. Maloy, and Jan M. Sherrill. DRUGS, ALCOHOL, AND STUDENT CRIME IN THE UNITED STATES, APRIL-MAY 1989. 2nd ICPSR version. Towson, MD: Towson State University, Center for the Study and Prevention of Campus Violence [producer], 1990. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2001. http://doi.org/10.3886/ICPSR09585.v2