Home Assignment 4

Deadline: October 9 2023, 8:00 AM

Author

QAXTCV

Description

Follow the instructions! Label all of your chunks appropriately. If you are in doubt, consult the Quarto cheatsheet (it was included in the folder with the others before the first class) or visit the Quarto website for help. Most of the exercises are based on examples we saw during the lecture; the point of these exercises is to practice the syntax.

Submit only a Quarto file named after your Neptun-ID (MyNeptunID.qmd), and make sure that the file can be rendered to HTML.

Question 1: basics

Change the YAML header as follows:

  1. Change the author to your NeptunID
  2. Set echo to true, meaning that the code should be visible in the html
  3. Set warning to false, meaning that warnings should not be included

Clear the environment.

Code
## clear the environment here:
rm(list=ls())

## load the packages
library(tidyverse)
library(ggpubr)
library(scales)
library(grid)
library(modelsummary)
library(fixest)
library(lspline)
if(!require(kableExtra)){
  install.packages("kableExtra")
  library(kableExtra)  
}
library(ggthemes)

Load the data from the following url: ‘https://osf.io/y6jvb/download’, and name the data frame hotels.

Code
hotels <- read_csv('https://osf.io/y6jvb/download')

Manipulate the data: keep the hotels with 2-4 stars (not missing), actual city being Vienna, prices less than 600, and non-missing rating. Change stars to a factor variable and create log-prices called ln_price.

Code
hotels <- hotels %>% 
  filter(accommodation_type=='Hotel',
         city_actual=='Vienna',
         stars>=2 & stars<=4,
         !is.na(stars),
         price<600,
         !is.na(rating)) %>% 
  mutate( stars = factor( stars ),
          ln_price = log( price ) )

Question 2: summary tables

Create a table with the kbl() and kable_styling() functions (based on the commands used in class) with the following descriptive statistics for price, log-price, and rating: mean, SD, N, min, max, median, and the 10th and 90th percentiles (not the usual 5th and 95th). As before, you first have to define the percentiles as new functions.
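datasummary() expects each statistic on the right-hand side of its formula to be a function that returns a single number, which is why the percentiles need their own wrappers. A minimal sketch on a hypothetical toy vector (not the hotels data) shows what such wrappers return. The names P10 and P90 mirror the ones used for the table, and x is made up purely for illustration:

```r
# Hypothetical toy vector (an assumption, not the hotels data), with one missing value
x <- c(1:11, NA)

# Percentile wrappers: each returns one number and ignores missing values
P10 <- function(x) quantile(x, 0.10, na.rm = TRUE)
P90 <- function(x) quantile(x, 0.90, na.rm = TRUE)

# quantile() returns a named vector ("10%", "90%"); unname() drops the name
p10_x <- unname(P10(x))
p90_x <- unname(P90(x))

# With R's default quantile type, the values 1..11 give 2 and 10
c(p10_x, p90_x)
```

Without na.rm = TRUE, quantile() stops with an error as soon as the input contains a missing value, so the wrappers are also a convenient place to fix that behaviour once.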

Label the chunk so that the table is referred to as Table 1, and give the table a caption.
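In Quarto, a chunk whose label starts with tbl- and that sets a tbl-cap option is numbered as a table and can be cross-referenced in the text. The following is a sketch only: the label tbl-descriptives, the caption text, and the toy_stats data frame are hypothetical placeholders, and knitr::kable() stands in for the kbl() and kable_styling() pipeline used in class.

```r
#| label: tbl-descriptives
#| tbl-cap: "Descriptive statistics (hypothetical caption)"

# Hypothetical toy data frame standing in for the real summary table
toy_stats <- data.frame(
  Variable = c("price", "rating"),
  Mean     = c(100.0, 4.0)
)

# Build and print the table; Quarto adds the "Table N:" prefix and the caption
tbl <- knitr::kable(toy_stats, digits = 1)
tbl
```

With such a label in place, writing @tbl-descriptives in the body text should render as a reference to the numbered table.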

Code
P90 <- function(x){ quantile(x,.90,na.rm=TRUE)}
P10 <- function(x){ quantile(x,.10,na.rm=TRUE)}
datasummary( (`Price (EUR)` = price ) + 
              (`Distance (miles)` = distance ) + 
              (`Rating` = rating ) ~ Mean + SD + Min + 
              Max + Median + P10 + P90 + N, 
             data = hotels, 
             fmt = 1, 
             output = "dataframe"
             )  %>% 
  kbl(., digits = 3, booktabs = TRUE
      , table.attr = "data-quarto-disable-processing='true'") %>% 
  kable_styling(bootstrap_options = "striped",
                font_size = 20, full_width = FALSE,
                latex_options = c("HOLD_position", "scale_down"))

Table 1: Descriptive statistics for used variables

                   Mean    SD   Min    Max  Median   P10    P90    N
Price (EUR)       107.0  42.5  33.0  383.0    99.0  67.1  157.0  222
Distance (miles)    1.5   1.1   0.0    6.6     1.2   0.3    3.3  222
Rating              4.0   0.4   2.0    4.8     4.0   3.5    4.5  222

Use the group_by() and summarise() commands to count the number of observations by offer category (the offer_cat variable), and use kable formatting to style the table. Label the chunk so that the table is referred to as Table 2, and provide a caption as well.

Code
hotels %>%
  group_by(offer_cat) %>%
  summarise(Count = n()) %>%
  kbl(., digits = 2, booktabs = TRUE
      , table.attr = "data-quarto-disable-processing='true'") %>% 
  kable_styling(bootstrap_options = "striped",
                font_size = 20, full_width = FALSE,
                latex_options = c("HOLD_position", "scale_down"))

Table 2: Number of observations by offer categories

offer_cat      Count
0% no offer       62
1-15% offer       50
15-50% offer      99
50%-75% offer     11

Question 3: figures

Create Figure 1: a boxplot of prices (not logs) by the offer categories of the hotels, based on the example we covered in class. Label the chunk accordingly and provide a caption for the figure.

The figure should have the following parameters:

  • the boxplot itself should be black, with size 0.5, width = 0.1, alpha = 0.7
  • add an errorbar, which should be black, with width = 0.1 and linewidth = 0.1
  • add the mean as a point geom, with shape = 23, size 2, and color and fill being red
  • axis labels should be appropriate
  • set the theme to be something different from theme_bw()
  • text-size should be set to 12 in theme()

Code
# Boxplot of prices (levels, not logs) by offer category
fig_prices <- ggplot(hotels, aes(y = price, x = offer_cat)) +
  geom_boxplot(color = 'black', size = 0.5, width = 0.1, alpha = 0.7) +
  # end caps for the whiskers
  stat_boxplot(geom = 'errorbar', width = 0.1,  linewidth = 0.1, color = 'black')+
  # group means shown as red diamonds
  stat_summary(fun=mean, geom='point', shape=23, size=2, color='red', fill='red') +
  labs(y='Price (EUR)',x='Offer categories')+
  theme_dark() +
  theme( text = element_text(size = 12))
fig_prices

Figure 1: ?(caption)
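The ?(caption) placeholder is most likely what Quarto prints when a chunk carries a fig- label but no caption. A sketch of the chunk options that would supply one is given here; the label fig-price-by-offer, the caption text, and the toy data frame are hypothetical and only illustrate the mechanism.

```r
#| label: fig-price-by-offer
#| fig-cap: "Price by offer category (hypothetical caption)"

library(ggplot2)

# Hypothetical toy data standing in for the hotels data frame
toy <- data.frame(
  offer_cat = rep(c("0% no offer", "1-15% offer"), each = 4),
  price     = c(90, 110, 120, 100, 80, 95, 105, 85)
)

# A plain boxplot; the caption comes from the chunk option, not from ggplot
fig_price <- ggplot(toy, aes(x = offer_cat, y = price)) +
  geom_boxplot() +
  labs(x = "Offer categories", y = "Price (EUR)")
fig_price
```

As with tables, the fig- prefix in the label is what makes the figure numbered and referable as @fig-price-by-offer.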

Question 4: regressions

Run the following regressions with heteroskedasticity-robust standard errors and save the regression estimates:

  1. log-prices on offer categories
  2. log-prices on offer categories and stars
  3. log-prices on offer categories, stars, and distance
  4. log-prices on offer categories, stars, distance, and rating

Code
reg_m1 <- feols(ln_price ~ offer_cat, data = hotels, vcov = 'hetero')
reg_m2 <- feols(ln_price ~ offer_cat + stars, data = hotels, vcov = 'hetero')
reg_m3 <- feols(ln_price ~ offer_cat + stars + distance, data = hotels, vcov='hetero')
reg_m4 <- feols(ln_price ~ offer_cat + stars + distance + rating, data = hotels, vcov= 'hetero')

Create a regression table with the four estimates using etable(). As we saw in class, add a dictionary with value labels such that the star-categories and the offer-categories are included (first check how they are displayed without the dictionary). Specify that we want 3 digits to be shown in the table for the estimates and the fit statistics, and we require only \(n\) and \(R^2\) to be visible in terms of fit statistics.
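One way to check how the categories are named before writing the dictionary is to look at the coefficient names of a fitted model. The sketch below uses a hypothetical toy data frame (not the hotels data) with made-up values; only feols(), coef(), and names() are used.

```r
library(fixest)

# Hypothetical toy data (an assumption for illustration only)
toy <- data.frame(
  ln_price  = c(4.5, 4.7, 4.4, 4.6, 4.2, 4.3),
  offer_cat = factor(rep(c("0% no offer", "1-15% offer", "15-50% offer"),
                         each = 2))
)

# Same call pattern as in the exercise, with robust standard errors
toy_reg <- feols(ln_price ~ offer_cat, data = toy, vcov = "hetero")

# Coefficient names are the variable name pasted to the factor level;
# the first level ("0% no offer") is the omitted reference category
coef_names <- names(coef(toy_reg))
coef_names
```

These raw names are the ones the dictionary entries refer to, so printing them first avoids guessing how a level such as "1-15% offer" shows up in the table.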

Use kbl() and kable_styling() to adjust the look of the table (you can follow the class material or experiment, as long as the table is visible in the end).

Code
dict_str <- c(
  "offer_cat15-50%offer" = "Offer category from 15-50%",
  "offer_cat50%-75% offer" = "Offer category from 50% to 75%",
  "offer_cat1-15%offer" = "Offer category from 1-15%",
  "stars2.5" = "2.5 stars",
  "stars3" = "3 stars",
  "stars3.5" = "3.5 stars",
  "stars4" = "4 stars"
)

reg_table <- etable( reg_m1, reg_m2, reg_m3, reg_m4,
       dict = dict_str, fitstat = c('n','r2'),
       depvar = TRUE, 
       digits = 3, 
       digits.stats = 3)

reg_table %>% 
  kbl(., digits = 3, booktabs = TRUE
      , table.attr = "data-quarto-disable-processing='true'") %>% 
  kable_styling(bootstrap_options = "striped", 
                font_size = 11, full_width = FALSE,
                latex_options = c("HOLD_position", "scale_down"))

                                reg_m1             reg_m2             reg_m3             reg_m4
Dependent Var.:                 ln_price           ln_price           ln_price           ln_price
Constant                        4.69*** (0.049)    4.24*** (0.097)    4.42*** (0.093)    3.29*** (0.241)
Offer category from 1-15%       -0.060 (0.070)     -0.122* (0.061)    -0.128* (0.056)    -0.102. (0.053)
Offer category from 15-50%      -0.123* (0.058)    -0.169*** (0.048)  -0.203*** (0.045)  -0.207*** (0.043)
Offer category from 50% to 75%  -0.239 (0.166)     -0.196. (0.117)    -0.209. (0.109)    -0.168. (0.101)
2.5 stars                                          -0.022 (0.097)     -0.094 (0.092)     -0.165* (0.083)
3 stars                                            0.333*** (0.094)   0.384*** (0.090)   0.302*** (0.080)
3.5 stars                                          0.513*** (0.104)   0.557*** (0.094)   0.327*** (0.096)
4 stars                                            0.644*** (0.093)   0.638*** (0.087)   0.474*** (0.079)
distance                                                              -0.119*** (0.021)  -0.098*** (0.018)
rating                                                                                   0.304*** (0.058)
______________________________  _______________    _________________  _________________  _________________
S.E. type                       Heteroske.-rob.    Heteroskeda.-rob.  Heteroskeda.-rob.  Heteroskeda.-rob.
Observations                    222                222                222                222
R2                              0.031              0.317              0.451              0.537
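Because the dependent variable is in logs, the coefficients on the category dummies are differences in log points relative to the omitted category. A small worked sketch of the usual conversion to a percentage difference follows, using the 15-50% offer coefficient from the fourth model in the table above; the variable names are made up for illustration.

```r
# Coefficient on the 15-50% offer category in the fourth model (taken from the table)
b_offer_15_50 <- -0.207

# Percentage difference implied by a log-point coefficient: 100 * (exp(b) - 1)
pct_diff <- 100 * (exp(b_offer_15_50) - 1)

# Roughly -18.7, i.e. about 19% lower prices than the no-offer category,
# holding stars, distance, and rating fixed
round(pct_diff, 1)
```

For coefficients close to zero, 100 times the coefficient is a fair approximation; for larger ones such as the star dummies the exponential form is safer.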