Follow the instructions and label all of your chunks appropriately. If you are in doubt, consult the Quarto cheatsheet (it was included in the folder with the others before the first class) or visit the Quarto website for help. Most of the exercises are based on examples we saw during the lecture; the point of these exercises is to practice the syntax.
You should submit only a Quarto file named after your Neptun ID (MyNeptunID.qmd). Make sure that the file can be rendered into HTML.
Question 1: basics
Change the YAML header as follows:
Change the author to your Neptun ID
Set echo to true, meaning that the code should be visible in the HTML
Set warning to false, meaning that warnings should not be included
Clear the environment.
rm(list=ls())
    ## clear the environment here:
    rm(list=ls())
    ## load the packages
    library(tidyverse)
    library(ggpubr)
    library(scales)
    library(grid)
    library(modelsummary)
    library(fixest)
    library(lspline)
    if(!require(kableExtra)){
      install.packages("kableExtra")
      library(kableExtra)
    }
    library(ggthemes)
Load the data from the following URL: ‘https://osf.io/y6jvb/download’ and name the data frame hotels.
Manipulate the data: keep the hotels with 2-4 stars (not missing), with Vienna as the actual city, with prices less than 600, and with non-missing rating. Convert stars to a factor variable and create log prices in a new variable called ln_price.
Create a table with the kbl() and kable_styling() functions (based on the commands used in class) with the following descriptive statistics for price, log-price, and rating: mean, sd, N, Min, Max, Median, and the 10th and 90th quantiles (not the usual 5th and 95th; as we did before, you first have to create these as new functions).
Set the chunk label such that the table is referenced as Table 1, and give the table a caption.
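The filtering and transformation step described above can be sketched on a small made-up data frame. This is only an illustration under stated assumptions: the toy values are invented, and the column name city_actual is an assumption about how the city is stored in the real data, so check the actual column names after loading.

```r
library(dplyr)

# Toy data standing in for the real hotels data (all values are made up;
# the column name `city_actual` is an assumption)
toy <- data.frame(
  city_actual = c("Vienna", "Vienna", "Salzburg", "Vienna", "Vienna"),
  stars       = c(3, 5, 4, NA, 2.5),
  price       = c(80, 120, 90, 100, 700),
  rating      = c(4.1, 4.5, 3.9, 4.0, NA)
)

toy_clean <- toy %>%
  filter(
    !is.na(stars), stars >= 2, stars <= 4,  # star range, no missing values
    city_actual == "Vienna",                # keep one city only
    price < 600,                            # drop very high prices
    !is.na(rating)                          # rating must be observed
  ) %>%
  mutate(
    stars    = factor(stars),  # numeric -> factor
    ln_price = log(price)      # natural log of price
  )

toy_clean
```

Only the first toy row passes every condition, which makes it easy to check each filter by eye before applying the same pattern to hotels.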
    P90 <- function(x){ quantile(x, .90, na.rm = TRUE) }
    P10 <- function(x){ quantile(x, .10, na.rm = TRUE) }
    datasummary(
      (`Price (EUR)` = price) + (`Distance (miles)` = distance) + (`Rating` = rating) ~
        Mean + SD + Min + Max + Median + P10 + P90 + N,
      data = hotels,
      fmt = 1,
      output = "dataframe"
    ) %>%
      kbl(., digits = 3, booktabs = TRUE,
          table.attr = "data-quarto-disable-processing='true'") %>%
      kable_styling(bootstrap_options = "striped", font_size = 20, full_width = FALSE,
                    latex_options = c("HOLD_position", "scale_down"))
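Table 1: Descriptive statistics for used variables

|                  | Mean  | SD   | Min  | Max   | Median | P10  | P90   | N   |
|------------------|-------|------|------|-------|--------|------|-------|-----|
| Price (EUR)      | 107.0 | 42.5 | 33.0 | 383.0 | 99.0   | 67.1 | 157.0 | 222 |
| Distance (miles) | 1.5   | 1.1  | 0.0  | 6.6   | 1.2    | 0.3  | 3.3   | 222 |
| Rating           | 4.0   | 0.4  | 2.0  | 4.8   | 4.0    | 3.5  | 4.5   | 222 |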
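A table gets its "Table 1" numbering from the chunk options, not from the code itself. As a minimal sketch: in Quarto, a chunk label that starts with tbl- together with a tbl-cap option makes the table cross-referenceable, and it can then be cited in the text as @tbl-toy. The label name tbl-toy, the caption text, and the toy data below are made up for illustration.

```r
#| label: tbl-toy
#| tbl-cap: "A made-up caption for a toy table"

library(dplyr)
library(kableExtra)

# Toy summary table (values are invented)
toy_tab <- data.frame(variable = c("a", "b"), mean = c(1.5, 2.5))

out <- toy_tab %>%
  kbl(digits = 1) %>%
  kable_styling(full_width = FALSE)

out
```

The two lines starting with #| are ordinary comments to R, so the chunk also runs unchanged in the console; only Quarto interprets them when rendering.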
Use group_by() and summarise() commands to count the number of observations by offer categories (offer_cat variable), and use the kable-formatting to make the table pretty. Label the chunk according to Table 2, and provide a caption as well.
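The counting pattern can be sketched on toy data as follows. The category labels and the column name n_hotels are made up for illustration; the real offer_cat values will differ.

```r
library(dplyr)
library(kableExtra)

# Toy data: one row per hotel, with a made-up offer category
toy <- data.frame(
  offer_cat = c("no offer", "small offer", "no offer", "large offer", "no offer")
)

# One row per category, with the number of observations in it
counts <- toy %>%
  group_by(offer_cat) %>%
  summarise(n_hotels = n())

# Format the result with kable
counts_tab <- counts %>%
  kbl(col.names = c("Offer category", "Number of hotels")) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

counts_tab
```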
Create a boxplot, labeled Figure 1, that shows prices (not logs) by the offer categories of the hotels, and provide a caption for the figure, based on the example we covered in class.
The figure should have the following parameters:
the boxplot itself should be black, with size = 0.5, width = 0.1, alpha = 0.7
add an errorbar with width = 0.1, linewidth = 0.1, and color black
add the mean as a point geom, with shape = 23, size = 2, and both color and fill set to red
axis labels should be appropriate
set the theme to be something different from theme_bw()
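The layers listed above could be combined roughly as in the sketch below, which uses simulated toy data instead of the hotels data. The label fig-toy, the caption, the axis labels, and the choice of theme_minimal() are assumptions made for illustration, not the class solution. Depending on the ggplot2 version, passing size to geom_boxplot() may produce a note suggesting linewidth instead.

```r
#| label: fig-toy
#| fig-cap: "A made-up caption for a toy boxplot"

library(ggplot2)

# Simulated toy data (not the hotels data)
set.seed(1)
toy <- data.frame(
  offer_cat = rep(c("no offer", "small offer", "large offer"), each = 20),
  price     = c(rnorm(20, 110, 30), rnorm(20, 100, 25), rnorm(20, 90, 20))
)

p <- ggplot(toy, aes(x = offer_cat, y = price)) +
  # whisker caps drawn as an errorbar
  stat_boxplot(geom = "errorbar", width = 0.1, linewidth = 0.1, color = "black") +
  # the boxes themselves
  geom_boxplot(color = "black", size = 0.5, width = 0.1, alpha = 0.7) +
  # group means as red diamonds
  stat_summary(fun = mean, geom = "point", shape = 23, size = 2,
               color = "red", fill = "red") +
  labs(x = "Offer category", y = "Price (EUR)") +
  theme_minimal()

p
```

The errorbar layer is added before the boxplot so that the boxes are drawn on top of the whisker lines.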
Create a regression table with the four estimates using etable(). As we saw in class, add a dictionary with value labels such that the star-categories and the offer-categories are included (first check how they are displayed without the dictionary). Specify that we want 3 digits to be shown in the table for the estimates and the fit statistics, and we require only \(n\) and \(R^2\) to be visible in terms of fit statistics.
Use kbl() and kable_styling() to adjust the looks of the table (you can follow the class material or experiment; the table should be visible in the end).
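A minimal sketch of the etable() step with a dictionary, using two toy regressions on simulated data rather than the four estimates from the exercise. The variable names, the simulated values, and the dictionary labels are all assumptions for illustration; run etable() without dict first to see the coefficient names that your own models produce.

```r
library(dplyr)
library(fixest)
library(kableExtra)

# Simulated toy data (not the hotels data)
set.seed(1)
toy <- data.frame(
  ln_price = rnorm(100, 4.6, 0.3),
  distance = runif(100, 0, 5),
  stars    = factor(sample(c(3, 3.5, 4), 100, replace = TRUE))
)

reg1 <- feols(ln_price ~ distance, data = toy)
reg2 <- feols(ln_price ~ distance + stars, data = toy)

# dict maps raw variable/coefficient names to readable labels
tab <- etable(
  reg1, reg2,
  dict = c(ln_price = "Log price", distance = "Distance",
           stars3.5 = "3.5 stars", stars4 = "4 stars"),
  digits = 3,             # digits for the estimates
  digits.stats = 3,       # digits for the fit statistics
  fitstat = c("n", "r2")  # show only observations and R2
)

# etable() returns a data frame here, so it can be passed on to kbl()
reg_tab <- as.data.frame(tab) %>%
  kbl() %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

reg_tab
```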