This document demonstrates what your project should look like. The results, tables and graphs I include here are only examples.
Each section includes a recommended word count to give you an idea about how much you should write.
The word count does not need to be exact, but try to stay reasonably close to the recommended range. For example, if I indicate that you can write anywhere between 100-300 words, do not submit that section with only 20 words or with 1,000 words!
For each section, make sure that you include all the requested information.
Recommended word count for the introduction: 100-200 words.
What to include in the introduction:
Recommended word count for the literature review: 100-300 words.
What to include in the literature review section: Discuss literature that is relevant to your research topic (3-4 papers will be enough, but you can include more).
Recommended word count for the data section: 100 words.
What to include in the data section:
education: Years of education
income: Annual income in GBP
age: Age of the respondent
This is where you summarize some of the important variables that describe the data.
For example, what is the average age in your sample?
What is the average education level?
What does the distribution of income look like? etc.
# Load the packages used in this tutorial
library(stargazer)
library(ggplot2)
# Simulate data (this is only for demonstration in this tutorial.
# In your project, you do not need to simulate anything; you just use
# the actual crime data.)
set.seed(123)
education <- rnorm(100, mean = 16, sd = 2)
income <- 20000 + 3000 * education + rnorm(100, mean = 0, sd = 5000)
age <- sample(25:60, 100, replace = TRUE)
# Use the data to generate summary statistics
data <- data.frame(education, income, age)
stargazer(data, type = "html", title = "Summary Statistics", digits = 3)
| Statistic | N | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|
| education | 100 | 16.181 | 1.826 | 11.382 | 20.375 |
| income | 100 | 68,004.700 | 7,123.882 | 52,069.920 | 84,127.540 |
| age | 100 | 43.730 | 10.209 | 26 | 60 |
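The questions above (average age, average education level, shape of the income distribution) can also be checked directly in base R before building a formatted table. The following is a minimal sketch that re-creates the simulated data from this tutorial so that it runs on its own; the name `summary_table` is only illustrative and not part of the tutorial code.

```r
# Re-create the tutorial's simulated data so this block runs on its own.
set.seed(123)
education <- rnorm(100, mean = 16, sd = 2)
income <- 20000 + 3000 * education + rnorm(100, mean = 0, sd = 5000)
age <- sample(25:60, 100, replace = TRUE)
data <- data.frame(education, income, age)

# Count, mean, standard deviation, minimum and maximum for every column.
# `summary_table` is an illustrative name, not part of the tutorial.
summary_table <- t(sapply(data, function(x) {
  c(N = length(x), Mean = mean(x), SD = sd(x), Min = min(x), Max = max(x))
}))
print(round(summary_table, 3))

# Quartiles give a quick sense of the shape of the income distribution.
print(quantile(data$income))
```

In your own project, you would skip the simulation lines and apply the same calls to your actual data frame.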
# Plot histogram of some of the important variables of interest (continuous variables e.g. income etc.)
ggplot(data, aes(x = income)) +
  geom_histogram(binwidth = 5000, fill = "blue", alpha = 0.7) +
  theme_minimal() +
  labs(title = "Distribution of Income",
       x = "Income (GBP)",
       y = "Frequency")
Recommended word count for the methodology section: 100 words.
What to include in the Methodology section: Discuss what estimation methods you will be using to analyze the data and to answer your research question.
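As an illustration, a methodology section for the example regression used in this document (income regressed on education and age) could state the model explicitly. The equation below is only a sketch; the symbols $\beta_0, \beta_1, \beta_2$ and $\varepsilon_i$ are notation assumed here and do not appear elsewhere in the tutorial.

$$
\text{income}_i = \beta_0 + \beta_1\,\text{education}_i + \beta_2\,\text{age}_i + \varepsilon_i
$$

Here $i$ indexes respondents, $\beta_1$ is the expected change in annual income (GBP) for one more year of education with age held constant, and $\varepsilon_i$ is the error term. The coefficients would be estimated by ordinary least squares, which is what `lm()` does in R.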
Recommended word count for the results/discussion section: 500-1000 words.
What to include in the results section: Here, you need to display your results and discuss their meaning. Please find more information/examples below.
Display estimation results:
# Fit a regression model
model <- lm(income ~ education + age, data = data)
# Create a regression table
stargazer(model, type = "html", title = "Regression Results", digits = 3)
| | Dependent variable: income |
|---|---|
| education | 2,859.074*** |
| | (262.371) |
| age | -101.376** |
| | (46.920) |
| Constant | 26,175.750*** |
| | (4,770.707) |
| Observations | 100 |
| R² | 0.562 |
| Adjusted R² | 0.553 |
| Residual Std. Error | 4,765.207 (df = 97) |
| F Statistic | 62.131*** (df = 2; 97) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
You can also use various visualizations of your results:
# Scatter plot with regression line
ggplot(data, aes(x = education, y = income)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  theme_minimal() +
  labs(title = "Effect of Education on Income",
       x = "Years of Education",
       y = "Income (GBP)")
Please make sure you do the following:
For each table or graph you display, you should include a brief discussion about what the results mean (interpretation, implications, statistical significance etc.)
For example, for a simple regression, you would need to interpret the regression coefficients and discuss their implications.
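Example: “Holding age constant, each additional year of education is associated with an increase in annual income of approximately £2,900 (coefficient: 2,859.074), and this estimate is statistically significant at the 1% level.”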
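To write an interpretation like the one above, it helps to pull the estimates, standard errors, p-values and confidence intervals out of the fitted model. The sketch below uses only base R and re-creates the tutorial's simulated data so that it runs on its own; `coef_table`, `beta_edu` and `p_edu` are illustrative names, not part of the tutorial code.

```r
# Re-create the tutorial's simulated data and regression model.
set.seed(123)
education <- rnorm(100, mean = 16, sd = 2)
income <- 20000 + 3000 * education + rnorm(100, mean = 0, sd = 5000)
age <- sample(25:60, 100, replace = TRUE)
data <- data.frame(education, income, age)
model <- lm(income ~ education + age, data = data)

# Coefficient table: estimate, standard error, t value and p-value.
coef_table <- coef(summary(model))
print(round(coef_table, 3))

# 95% confidence intervals for each coefficient.
print(confint(model, level = 0.95))

# Extract the education estimate and its p-value for the write-up.
beta_edu <- coef_table["education", "Estimate"]
p_edu <- coef_table["education", "Pr(>|t|)"]
cat(sprintf("One more year of education: %.0f GBP change in income (p = %.4f)\n",
            beta_edu, p_edu))
```

Reporting the confidence interval alongside the point estimate gives the reader a sense of how precisely the coefficient is estimated.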
Below, you can find some additional tips about how to generate tables:
# Customize the table
stargazer(model, type = "html",
          title = "Customized Regression Table",
          dep.var.labels = "Income (GBP)",
          covariate.labels = c("Years of Education", "Age"),
          notes = "Standard errors in parentheses", digits = 3)
| | Dependent variable: Income (GBP) |
|---|---|
| Years of Education | 2,859.074*** |
| | (262.371) |
| Age | -101.376** |
| | (46.920) |
| Constant | 26,175.750*** |
| | (4,770.707) |
| Observations | 100 |
| R² | 0.562 |
| Adjusted R² | 0.553 |
| Residual Std. Error | 4,765.207 (df = 97) |
| F Statistic | 62.131*** (df = 2; 97) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
| | Standard errors in parentheses |
# Fit another model
model2 <- lm(income ~ education, data = data)
# Compare models
stargazer(model, model2, type = "html",
          title = "Model Comparison",
          column.labels = c("Model 1", "Model 2"), digits = 3)
| | Dependent variable: income | |
|---|---|---|
| | Model 1 (1) | Model 2 (2) |
| education | 2,859.074*** | 2,868.821*** |
| | (262.371) | (267.197) |
| age | -101.376** | |
| | (46.920) | |
| Constant | 26,175.750*** | 21,584.850*** |
| | (4,770.707) | (4,350.615) |
| Observations | 100 | 100 |
| R² | 0.562 | 0.541 |
| Adjusted R² | 0.553 | 0.536 |
| Residual Std. Error | 4,765.207 (df = 97) | 4,853.574 (df = 98) |
| F Statistic | 62.131*** (df = 2; 97) | 115.278*** (df = 1; 98) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
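Besides placing the two models side by side, you may want a formal check of whether the extra variable improves the fit. One common option, which is not part of the tutorial code above and is shown here only as a sketch, is an F-test for nested models with `anova()`, optionally alongside `AIC()`. The block re-creates the simulated data so that it runs on its own; `comparison` is an illustrative name.

```r
# Re-create the tutorial's simulated data and both models.
set.seed(123)
education <- rnorm(100, mean = 16, sd = 2)
income <- 20000 + 3000 * education + rnorm(100, mean = 0, sd = 5000)
age <- sample(25:60, 100, replace = TRUE)
data <- data.frame(education, income, age)
model <- lm(income ~ education + age, data = data)   # Model 1
model2 <- lm(income ~ education, data = data)        # Model 2 (nested in Model 1)

# F-test: does adding age significantly improve the fit over Model 2?
comparison <- anova(model2, model)
print(comparison)

# Lower AIC suggests a better trade-off between fit and complexity.
print(AIC(model, model2))
```

The smaller model goes first in `anova()`, so the test row reports the improvement from adding `age`. If you include such a test, discuss its result in the text just as you would for a regression table.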
Recommended word count for the conclusion: 100-200 words.
What to include in the conclusion: