As enlightened, community minded Middlebury students, it’s easy to fall into the trap of assuming we know what’s going on in our community. We think we know what people care about, what the buzz is, how everything from global news to campus gossip is affecting our bubble; but how as an individual student can any of us actually perceive this? We considered different ways we might be able to probe the topics that are part of the collective consciousness in a numerical and chronological way, ideally approaching some form of objectivity. A zillion ethnographic interviews weren’t going to happen (especially not during finals) so we went straight to the proverbial horse’s mouth: The Campus.
We knew from the beginning that The Campus is not a perfect reflection of student views, as not everyone feels comfortable or included in The Campus space and the perspectives of the editorial board would be overemphasized, however it is a wealth of chronologically organized data regarding the goings on at Middlebury, and notably offers space for any student - not just staff writers and editors - to write about whatever they want in the Op-Ed section. Rebecca Conover developed a text-mining tool that we used to collect and count the occurrence of every single word in all the available digital archives of The Campus dating back to 2011.
We picked a number of topics to investigate based on our perceptions of what topics have been important in the larger campus discourse. Spoiler: Campus text data is not a perfect metric of student body consciousness, but it can offer some insight into notable events and shifts in attiude. We filtered out all of the normal, semantically uninteresting words (think: ‘and’, ‘the’, ‘but’, ‘very’, etc.) from the data and created semantically related categories of words having to do with what we perceived to be hot-button topics including the environment, feminism, racism, and mental-health to name a few. Then we graphed the frequency of the categories, as well as individual words, over time to identify spikes and trends, working under the assumption that the more frequently these category-indexing words are used, the more the campus community must be engaging with that topic.
Perhaps unsurprisingly, for several topics word usage correlated with specific events, either internal to the Middlebury community, or external involving nationwide or worldwide developments.
A look at the topic of terrorism demonstrates this phenomenon, with spikes occurring in January 2015 after the Charlie Hebdo attack and the massacre of a school auditorium in Peshwar, Pakistan. Similarly, the attack on a Paris nightclub in late fall of 2015 caused a sharp increase in such discussion. It’s nice to see Middlebury students care at least somewhat about what’s happening outside our immediate community.
We see that there is relatively consistent mention of elections in the campus throughout the year, presumably related to SGA and other campus groups. But, unsurprisingly, there are significant peaks during presidental elections. We found that the trends during the 2016 election were especially interesting. (Sarah comment: in what way were they interesting?)
When it came to specific candidates, election coverage in The Campus divereged from wider national trends. For most of the campaign, Senator Sanders was mentioned more often than any of the candidates in The Campus. This is not true of the national news media, where coverage of Trump dominated. But, in some ways this mirrors trends that took place in extremely liberal dominated communities. In these bubbles, the possibility of Trump winning the presidency appeared so absurd that most didn’t give it any thought. Thus, when he did win there was a major bump as the community readjusted their perceptions. While these trends are not necessarily suprising, they point to an notable dynamic: at Middlebury, it is very easy for most of us to insulate ourselves from the reality of the rest of the country.
Turning to Middlebury specific events, vandalism has similarly not been a consistent matter of concern. Mentions are unsurprisingly linked to certain events occurring on or near Middlebury College campus, such as the mysterious tree vandals perplexing the college arborist in September of 2013 (in this issue of the Campus there was a spike to 10 counts), the anonymous spray-paint graffiti of controversial statements like ‘too many cops, too little justice’ found on various campus buildings in March of 2015 (associated with a spike to 22 counts of vandalism related words in the same issue). Somewhat surprisingly, despite the severity of the symbolism, the more recent case of local vandalism where swastikas were scrawled on the entry of the Havurah house appeared in an issue that only contained 7 counts of the semantic group.
Alcohol is another topic that is highly Middlebury-community specific. Spikes in these words occurred in issues that included articles about tension between the College and town residents and “Fighting Alcoholism in College” (October of 2014), handbook violations and alcohol policies (November of 2013), a drunk driving accident involving Norwich University students and a piece where school administrators weighed in on the new registered party rules (October of 2012).
Interestingly, spikes in the conversation around alcohol appear in association with policy changes and other organized events, but remains in conversation through an overall downward trend. The consistency is perhaps due to the weekly Pub Safe Log that invariably includes 16 or 17 alcohol-related citations.
Certain topics notably fail to follow expected trends based on campus events. Mental health related words demonstrate a sharp increase in fall of 2014, but in actuality these two data points were caused by two articles: one focused on depression and the other suicide, which makes more sense when the various terms are separated. Both of them were written by the same student, trying to call attention to the issue of mental health on campus. Discussion about mental health does not show a peak where one might expect it: in the spring of 2015 when the tragic suicide of a student sparked a number of suicide attempts and greater discussion on campus of the seriousness with which we must consider mental health. It begs the question: why we weren’t talking more about these matters in our publication?
Our analysis of environmentalism-related words also showed surprising results. The Campus publishes a “green issue” every academic year focusing on topics related to the environment with both a local and national scope. As you might expect, there were spikes in environmentalism-related words in these issues. They covered things from the local to the national, including Solar Decathlon entries (2011, 2012, 2013), the XL Keystone pipeline and the Addison Rutland Natural Gas Project protests, and extensive discussions of Middlebury’s commitment to carbon-neutrality. Surprisingly, since 2011 the overall trend for the environmentalism indexing words has decreased, the highest spike occurring in 2012 with 142 words in a single issue. Since then it appears that even in subsequent “green issues” either the writers have started using more synonyms, or there is simply less related content being published. Even on the issue published on September 29, 2016 as we approached our goal of carbon neutrality, the word count came in at a measly 34, the feature spread aptly titled: “ “Environment or Enterprise? Opinions Split on College Brand and Focus”. Perhaps other issues have taken precedence over environment for our community.
Our investigation suggests that word counts for some topics cannot reflect events, but rather a collective consciousness. For example, extraction of words pertaining to assault show that the Campus discussed sexual assault the most when it concerned a highly-publicized incident that happened off-campus. Indeed, at this moment there were likely more students than normal discussing sexual assault. However, this cannot be interpreted as a time when sexual assault was particularly rampant on campus. Confidentiality and the anxiety of reporting sexual assault cases leads to a lack of awareness surrounding their prevalence.
The discussion around women’s issues appears to be consistent but somewhat limited. The average issue that mentions this topic will mention it about two times. Additionally, about half of the issues from the past five years mentioned this topic. Why is the discussion of such a prevalent issue behaving in this way? We see it as a reflection of the fact that while problems of sexism and misogyny are far from absent at Middlebury, they don’t often emerge as moments of direct conflict. There haven’t been any sharp peaks but the dialogue persists.
We also examined the use of gendered pronouns as a barometer of representation of women in the Campus. We found that overall, there was an equal use of masculine and feminine pronouns.
While this is does not mean that there is equal representation at Middlebury, these trends may be revealing of a lack of explicit gender bias in The Campus. What would a more productive dialogue look like? Will women’s issues become more or less prevalent in the future? This analysis does not give us very much insight.
The most problematic topics may in fact be those that are not discussed, but should be. Words with the root “race” exhibit the most distinct upward trend in the past several years, with an increase that is significant based on standard error. Is Middlebury more racist now than it was five years ago? Not necessarily. To the contrary, the fact that people are talking about it may mean that people are confronting the reality of racism on campus and in our country. Of course this does not imply that racism has decreased, but could show that we are moving in the right direction.
One of the early spikes in race-related words, the data point in 2012 that hovers around 19 mentions, is in part attributable to an article called “Fighting Bias by Finding it Within Ourselves,” in which a white student shares with great shock that she took an IAT test and discovered she is subconsciously racist. Her tone implies the prevalent attitude at the time was that Middlebury was a post-racial community and that lack of discussion more likely reflected lack of awareness as opposed to lack of racism. More recent articles focus heavily on racist incidents occurring both on our campus and nationwide, suggesting students are more in tune with the significance of racism in our society.
So what does this all mean? In essence, text data is not perfect for several reasons: cognates, summer vacations, separation of phrases, editorial board bias, etc. How do these findings compare with your perceptions of campus consciousness? How does one probe quantitatively for a campus consciousness? Most importantly, is there such thing as a collective campus concsciousness?
These are topics we arbitrarily felt to be important, but maybe your experiences predispose you toward others that seem more significant. Even for a small campus, it takes something huge to trigger a shift in attention that rises above the noise of individual interests: national elections, terrorist attacks, the Black Lives Matter movement. More often, people are focused on the daily struggles of their personal lives, home lives, grades, or extracurriculars. Nonetheless, we encourage you take a look through our data set and explore topics you think might have captured the transient collective consciousness of Middlebury College.
The Campus online archives go back to October 2001, but the formating of these issues was not ideal for our analysis as each article is archived as pdf. Starting in the 2011-2012 school year, Campus issues are archived as a pdf of the printed paper, which we could more efficiently scrape for text data. For that reason we began our analysis there. The archives can be found here. We utilized the sedja text extractor to convert the words from each issue to a text file.
Generating the data frame was the most labor intensive part of this project.
The code used to generate the data frame can be found at this directory: “/home/rconover/Final Project/campusdata/datagen.R”
Essentially we stripped white space from lines, split lines into words, removed punctuation, removed stop words, counted word occurrence per issue, extracted date information from the file name, and then merged the data for each issue into one data frame.
This book was very helpful for cleaning our data and doing exploratory analysis. If we had more time and more data from older Campus issues, these tools could fascilitate intersting discussions.
We definitely are wary of the arbitrariness of our analysis. Again, with more time, it may be fruitful to explore more conventional sociolinguistic models in the context of this data. Sources like this one could be useful here.