Home Assignment 4

Deadline: October 9 2023, 8:00 AM

Author

QAXTCV

Description

Follow the instructions! Label all of your chunks appropriately. If you are in doubt, consult the Quarto cheatsheet (it was included in the folder with the others before the first class) or visit the Quarto website for help. Most of the exercises are based on examples we saw during the lecture; the point of these exercises is to practice the syntax.

You should submit only a Quarto file named after your Neptun-ID: MyNeptunID.qmd. Make sure that the file can be rendered into HTML.

Question 1: basics

Change the YAML header as follows:

Set the author to your NeptunID
Set echo to true, so that the code is visible in the HTML output
Set warnings to false, so that they are not included in the output

Clear the environment.

rm(list=ls())

Code

## clear the environment here:
rm(list=ls())

## load the packages
library(tidyverse)
library(ggpubr)
library(scales)
library(grid)
library(modelsummary)
library(fixest)
library(lspline)
if(!require(kableExtra)){
  install.packages("kableExtra")
  library(kableExtra)  
}
library(ggthemes)

Load the data from the following url: ‘https://osf.io/y6jvb/download’, name the dataframe hotels.

Code

hotels <- read_csv('https://osf.io/y6jvb/download')

Manipulate the data: keep hotels with 2-4 stars (not missing) whose actual city is Vienna, with prices below 600 and a non-missing rating. Convert stars to a factor variable and create log-prices called ln_price.

Code

hotels <- hotels %>% 
  filter(accommodation_type=='Hotel',
         city_actual=='Vienna',
         stars>=2 & stars<=4,
         !is.na(stars),
         price<600,
         !is.na(rating)) %>% 
  mutate( stars = factor( stars ),
          ln_price = log( price ) )

Question 2: summary tables

Create a table with the kbl() and kable_styling() functions (based on the commands used in class) with the following descriptive statistics for price, log-price, and rating: mean, sd, N, Min, Max, Median, and the 10th and 90th percentiles (not the usual 5th and 95th; as before, you first have to define them as new functions).
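The percentile helpers asked for here are thin wrappers around base R's quantile(). A minimal sketch on a toy vector shows what such helpers return; the toy vector is an illustration only and is not taken from the hotels data.

```r
# Helpers in the shape used for summary statistics: a numeric vector goes in,
# a single (named) number comes out. na.rm = TRUE drops missing values.
P10 <- function(x) quantile(x, 0.10, na.rm = TRUE)
P90 <- function(x) quantile(x, 0.90, na.rm = TRUE)

toy <- c(1:11, NA)   # toy data with one missing value (an assumption)

P10(toy)   # 10th percentile of 1..11 with R's default quantile type: 2
P90(toy)   # 90th percentile of 1..11 with R's default quantile type: 10
```

Without na.rm = TRUE, quantile() stops with an error as soon as the input contains a missing value, which is why the helpers set it explicitly.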

Label the chunk such that the table can be referenced as Table 1, and give the table a caption!
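In Quarto, a table becomes cross-referenceable when its chunk label starts with tbl-, and the caption can be supplied through the tbl-cap chunk option. A minimal sketch of such a chunk body follows; the label tbl-descriptives, the caption text, and the toy data frame are placeholders chosen for illustration and are not part of the assignment.

```r
#| label: tbl-descriptives
#| tbl-cap: "Descriptive statistics (placeholder caption)"

# Toy data standing in for the real summary table (an assumption).
toy_stats <- data.frame(
  Variable = c("price", "rating"),
  Mean     = c(100, 4)
)

# knitr::kable() produces a table that Quarto can number and caption;
# fall back to plain printing if knitr is not installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::kable(toy_stats)
} else {
  print(toy_stats)
}
```

In the text, the table is then referenced as @tbl-descriptives, which Quarto renders as "Table 1" when it is the first labeled table. Figures work the same way with a fig- prefix and the fig-cap option.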

Code

P90 <- function(x){ quantile(x,.90,na.rm=TRUE)}
P10 <- function(x){ quantile(x,.10,na.rm=TRUE)}
datasummary( (`Price (EUR)` = price ) + 
              (`Distance (miles)` = distance ) + 
              (`Rating` = rating ) ~ Mean + SD + Min + 
              Max + Median + P10 + P90 + N, 
             data = hotels, 
             fmt = 1, 
             output = "dataframe"
             )  %>% 
  kbl(., digits = 3, booktabs = TRUE
      , table.attr = "data-quarto-disable-processing='true'") %>% 
  kable_styling(bootstrap_options = "striped",
                font_size = 20, full_width = FALSE,
                latex_options = c("HOLD_position", "scale_down"))

Table 1: Descriptive statistics for used variables
	Mean	SD	Min	Max	Median	P10	P90	N
Price (EUR)	107.0	42.5	33.0	383.0	99.0	67.1	157.0	222
Distance (miles)	1.5	1.1	0.0	6.6	1.2	0.3	3.3	222
Rating	4.0	0.4	2.0	4.8	4.0	3.5	4.5	222

Use group_by() and summarise() commands to count the number of observations by offer categories (offer_cat variable), and use the kable-formatting to make the table pretty. Label the chunk according to Table 2, and provide a caption as well.

Code

hotels %>%
  group_by(offer_cat) %>%
  summarise(Count = n()) %>%
  kbl(., digits = 2, booktabs = TRUE
      , table.attr = "data-quarto-disable-processing='true'") %>% 
  kable_styling(bootstrap_options = "striped",
                font_size = 20, full_width = FALSE,
                latex_options = c("HOLD_position", "scale_down"))

Table 2: Number of observations by offer categories
offer_cat	Count
0% no offer	62
1-15% offer	50
15-50% offer	99
50%-75% offer	11

Question 3: figures

Create a boxplot, labeled Figure 1, of prices (not logs) by the offer categories of the hotels, and give the figure a caption, following the example we covered in class.

The figure should have the following parameters:

the boxplot itself should be black, with size = 0.5, width = 0.1, alpha = 0.7
create an errorbar, which should be black, with width = 0.1 and linewidth = 0.1
add the mean as a point geom, with shape = 23, size 2, and color and fill being red
axis labels should be appropriate
set the theme to be something different from theme_bw()
text-size should be set to 12 in theme()

Code

# Boxplot of prices in levels (not logs) by offer category
fig_prices <- ggplot(hotels, aes(y = price, x = offer_cat)) +
  geom_boxplot(color = 'black', size = 0.5, width = 0.1, alpha = 0.7) +
  # whisker end caps
  stat_boxplot(geom = 'errorbar', width = 0.1,  linewidth = 0.1, color = 'black')+
  # group means as red diamonds
  stat_summary(fun=mean, geom='point', shape=23, size=2, color='red', fill='red') +
  labs(y='Price (EUR)',x='Offer categories')+
  theme_dark() +
  theme( text = element_text(size = 12))
fig_prices

Question 4: regressions

Run the following regressions with heteroskedasticity-robust standard errors and save the regression estimates:

log-prices on offer categories
log-prices on offer categories and stars
log-prices on offer categories, stars, and distance
log-prices on offer categories, stars, distance, and rating
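To make concrete what "heteroskedasticity-robust" means, here is a base-R sketch of a sandwich (HC1-type) variance estimator computed by hand on the built-in mtcars data. The data set and the model are purely illustrative, and the n / (n - k) small-sample adjustment is an assumption; the robust option of a regression package may differ in such details.

```r
# Illustrative model on a built-in data set (not the hotels data).
fit <- lm(log(mpg) ~ wt + factor(cyl), data = mtcars)

X <- model.matrix(fit)   # regressor matrix, including the intercept
u <- residuals(fit)      # OLS residuals
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^{-1}
meat  <- crossprod(X * u)      # sum over i of u_i^2 * x_i x_i'

# Sandwich estimator with an n / (n - k) small-sample adjustment (assumed).
V_robust <- n / (n - k) * bread %*% meat %*% bread

se_classical <- sqrt(diag(vcov(fit)))
se_robust    <- sqrt(diag(V_robust))

# The coefficients are identical; only the standard errors change.
cbind(estimate = coef(fit), se_classical, se_robust)
```

The point of the comparison is that robust standard errors leave the estimates untouched and only change the uncertainty attached to them.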

Code

reg_m1 <- feols(ln_price ~ offer_cat, data = hotels, vcov = 'hetero')
reg_m2 <- feols(ln_price ~ offer_cat + stars, data = hotels, vcov = 'hetero')
reg_m3 <- feols(ln_price ~ offer_cat + stars + distance, data = hotels, vcov='hetero')
reg_m4 <- feols(ln_price ~ offer_cat + stars + distance + rating, data = hotels, vcov= 'hetero')

Create a regression table with the four estimates using etable(). As we saw in class, add a dictionary with value labels such that the star-categories and the offer-categories are included (first check how they are displayed without the dictionary). Specify that we want 3 digits to be shown in the table for the estimates and the fit statistics, and we require only \(n\) and \(R^2\) to be visible in terms of fit statistics.

Use kbl() and kable_styling() to adjust the looks of the table (you can follow the class material or experiment; either way, the table should be visible in the end).

Code

dict_str <- c(
  "offer_cat15-50%offer" = "Offer category from 15-50%",
  "offer_cat50%-75% offer" = "Offer category from 50% to 75%",
  "offer_cat1-15%offer" = "Offer category from 1-15%",
  "stars2.5" = "2.5 stars",
  "stars3" = "3 stars",
  "stars3.5" = "3.5 stars",
  "stars4" = "4 stars"
)

reg_table <- etable( reg_m1, reg_m2, reg_m3, reg_m4,
       dict = dict_str, fitstat = c('n','r2'),
       depvar = TRUE, 
       digits = 3, 
       digits.stats = 3)

reg_table %>% 
  kbl(., digits = 3, booktabs = TRUE
      , table.attr = "data-quarto-disable-processing='true'") %>% 
  kable_styling(bootstrap_options = "striped", 
                font_size = 11, full_width = FALSE,
                latex_options = c("HOLD_position", "scale_down"))

	reg_m1	reg_m2	reg_m3	reg_m4
Dependent Var.:	ln_price	ln_price	ln_price	ln_price

Constant	4.69*** (0.049)	4.24*** (0.097)	4.42*** (0.093)	3.29*** (0.241)
Offer category from 1-15%	-0.060 (0.070)	-0.122* (0.061)	-0.128* (0.056)	-0.102. (0.053)
Offer category from 15-50%	-0.123* (0.058)	-0.169*** (0.048)	-0.203*** (0.045)	-0.207*** (0.043)
Offer category from 50% to 75%	-0.239 (0.166)	-0.196. (0.117)	-0.209. (0.109)	-0.168. (0.101)
2.5 stars		-0.022 (0.097)	-0.094 (0.092)	-0.165* (0.083)
3 stars		0.333*** (0.094)	0.384*** (0.090)	0.302*** (0.080)
3.5 stars		0.513*** (0.104)	0.557*** (0.094)	0.327*** (0.096)
4 stars		0.644*** (0.093)	0.638*** (0.087)	0.474*** (0.079)
distance			-0.119*** (0.021)	-0.098*** (0.018)
rating				0.304*** (0.058)
______________________________	_______________	_________________	_________________	_________________
S.E. type	Heteroske.-rob.	Heteroskeda.-rob.	Heteroskeda.-rob.	Heteroskeda.-rob.
Observations	222	222	222	222
R2	0.031	0.317	0.451	0.537