Download raw data from here and unzip it: https://www.kaggle.com/stackoverflow/statsquestions
library(tidyverse)
stats_questions <- "~/Downloads/statsquestions"
questions <- read_csv(file.path(stats_questions, "Questions.csv"))
answers <- read_csv(file.path(stats_questions, "Answers.csv"))
tags <- read_csv(file.path(stats_questions, "Tags.csv"))
Exclude questions from the most recent few months, since there may not have been adequate time for them to be answered. Also exclude 2009, when there were almost no questions.
library(lubridate)
answers_by_question <- answers %>%
  count(Id = ParentId) %>%
  rename(NumAnswers = n)
joined <- questions %>%
  filter(CreationDate >= "2010-01-01", CreationDate <= "2016-07-01") %>%
  left_join(answers_by_question, by = "Id") %>%
  replace_na(list(NumAnswers = 0))
by_year <- joined %>%
  group_by(Year = year(CreationDate)) %>%
  summarize(NumQuestions = n(),
            AverageAnswers = mean(NumAnswers),
            PercentAnswered = mean(NumAnswers > 0))
By both metrics (average answers per question and percent of questions answered), the rate at which questions get answered has been decreasing:
ggplot(by_year, aes(Year, AverageAnswers)) +
  geom_line()
ggplot(by_year, aes(Year, PercentAnswered)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent_format())
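Older questions have had more years to collect answers, so one possible robustness check is to count only answers posted within a fixed window after each question. The following is a minimal sketch rather than part of the original analysis. It assumes the answers table has a `CreationDate` column alongside `ParentId`; the helper `answers_within_window`, the 30-day default, and the toy tables are all hypothetical and only there so the block runs on its own.

```r
library(dplyr)
library(lubridate)

# Toy stand-ins for `questions` and `answers`; only the columns used
# below (Id, ParentId, CreationDate) are assumed to exist.
toy_questions <- tibble(
  Id = c(1, 2, 3),
  CreationDate = ymd_hms(c("2011-03-01 10:00:00",
                           "2011-06-01 12:00:00",
                           "2015-03-01 09:00:00"))
)
toy_answers <- tibble(
  ParentId = c(1, 1, 3),
  CreationDate = ymd_hms(c("2011-03-02 08:00:00",   # 1 day after question 1
                           "2013-01-01 00:00:00",   # almost 2 years later
                           "2015-03-05 09:00:00"))  # 4 days after question 3
)

# Hypothetical helper: count only answers posted within `window_days`
# of the question, so every question gets the same amount of time.
answers_within_window <- function(questions, answers, window_days = 30) {
  early <- answers %>%
    select(Id = ParentId, AnswerDate = CreationDate) %>%
    inner_join(select(questions, Id, CreationDate), by = "Id") %>%
    filter(AnswerDate <= CreationDate + days(window_days)) %>%
    count(Id, name = "NumEarlyAnswers")

  questions %>%
    left_join(early, by = "Id") %>%
    mutate(NumEarlyAnswers = coalesce(NumEarlyAnswers, 0L))
}

windowed <- answers_within_window(toy_questions, toy_answers)

# Same yearly summary as before, restricted to early answers.
windowed_by_year <- windowed %>%
  group_by(Year = year(CreationDate)) %>%
  summarize(AverageEarlyAnswers = mean(NumEarlyAnswers),
            PercentAnsweredEarly = mean(NumEarlyAnswers > 0))
```

On the real data the call would be `answers_within_window(questions, answers)`, with the same date filter applied first. If the yearly decline survives under a fixed window, it is less likely to be an artifact of older questions simply having been around longer.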
What about time of day? Note that times are in UTC (the same as UK time in winter; the UK is an hour ahead in summer).
joined %>%
  group_by(Year = year(CreationDate), Hour = hour(CreationDate)) %>%
  summarize(NumQuestions = n(),
            AverageAnswers = mean(NumAnswers),
            PercentAnswered = mean(NumAnswers > 0)) %>%
  ggplot(aes(Hour, PercentAnswered, color = Year, group = Year)) +
  geom_line() +
  expand_limits(y = 0) +
  scale_y_continuous(labels = scales::percent_format())
No trend: a question asked at midnight UTC is about as likely to get an answer as one asked at midday, and this holds within each year.
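One rough way to put a number on "no trend" is to pool the years and look at the gap between the best and worst hour. This is a sketch only: `toy_joined` is a made-up stand-in with the two columns of `joined` that the summary needs (`CreationDate` and `NumAnswers`), so the block runs on its own.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for `joined`; only CreationDate and NumAnswers are used.
toy_joined <- tibble(
  CreationDate = ymd_hms(c("2012-05-01 00:15:00", "2013-05-01 00:45:00",
                           "2012-05-01 12:10:00", "2013-05-01 12:50:00")),
  NumAnswers = c(1, 0, 0, 2)
)

# Share of questions with at least one answer, by hour of day (UTC).
by_hour <- toy_joined %>%
  group_by(Hour = hour(CreationDate)) %>%
  summarize(NumQuestions = n(),
            PercentAnswered = mean(NumAnswers > 0))

# Gap between the best and worst hour.
hour_range <- by_hour %>%
  summarize(Lowest = min(PercentAnswered),
            Highest = max(PercentAnswered),
            Spread = Highest - Lowest)
```

Replacing `toy_joined` with `joined` gives the pooled figures for the real data. A small `Spread` relative to the year-to-year change would support reading the hourly lines as flat.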
One problem is that this data doesn’t include closed or deleted questions, and older unanswered questions are more likely to have been deleted. The Stack Exchange Data Explorer could be used to get the data for closed (but not deleted) questions and investigate this further.
Still, I do think this is a real effect!