R Markdown

What’s at stake in this report?

There are several ethical considerations to keep in mind when working with this type of data. They are less weighty than those raised by a data set on something such as cancer, but they are still present. One example is promoting a rental that is low in cost and has excellent qualities but is located in an area with a lot of crime. The potential stakeholders in the data being presented are people looking to rent an Airbnb and people listing their places for rent on Airbnb’s website. One limitation of this analysis is that not every variable in the Airbnb data was examined, so the conclusions are narrower than they could be. Here is my analysis:


Graphs of factors that may affect the price of Airbnbs in Columbus

ggplot(Columbus, aes(x = price, color = room_type, linetype = room_type)) +
  geom_freqpoly(binwidth = 50) +
  labs(x = "Price of rental", y = "Number of listings", title = "Room type compared to price \n for Airbnbs in Columbus") +
  theme(plot.title = element_text(hjust = 0.5))

Analyzing the graph

This graph shows the distribution of price by room type for Airbnbs in Columbus, Ohio. The most common room types for rent in Columbus are private rooms and entire homes/apartments. Entire homes and apartments tend to be the most expensive to rent, while shared rooms tend to be the least expensive. A tentative conclusion is that the more space a room type offers, the higher the price of the rental.

Columbus %>%
  ggplot(aes(x=price, y=accommodates)) +
  geom_jitter(alpha=1/2) +
  geom_smooth(method="lm") +
  labs(x="Price of rental", y="Number of people \n rental accommodates", title = "Price of Airbnb compared \n to number of people the rental accommodates") +
  theme(plot.title = element_text(hjust = 0.5))

Analyzing the graph

This graph shows the price of Airbnbs compared to the number of people each rental accommodates. Most rentals accommodate between 1 and 8 people. There is no clear trend between price and the number of people the Airbnb accommodates. One would expect the price to rise as the number of people accommodated increases, but that is not the case here. There are outliers in this graph with respect to price, but these high prices do not seem to be caused by the number of people the Airbnb accommodates. Other variables will have to be analyzed to understand why the price is higher for some rentals than for others.


Map of Columbus with Airbnbs plotted

Description of the map

This is a map of Columbus with the location of every Airbnb in the data set plotted on it. The map also shows the general price of each Airbnb in relation to its location. The price scale is given in the legend beside the map, with light blue being 1000 dollars to rent and dark blue being 0-250 dollars to rent.


Single-variable linear regression

mod1 <- lm(price~guests_included, data= Columbus)


summary(mod1)
## 
## Call:
## lm(formula = price ~ guests_included, data = Columbus)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -330.97  -47.02  -21.02   19.98  720.98 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       45.920      4.596   9.992   <2e-16 ***
## guests_included   33.096      1.314  25.190   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 102.4 on 1099 degrees of freedom
## Multiple R-squared:  0.366,  Adjusted R-squared:  0.3655 
## F-statistic: 634.5 on 1 and 1099 DF,  p-value: < 2.2e-16
step <- stepAIC(mod1, direction ="both")
## Start:  AIC=10195.45
## price ~ guests_included
## 
##                   Df Sum of Sq      RSS   AIC
## <none>                         11530558 10195
## - guests_included  1   6657533 18188091 10695
step$anova
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## price ~ guests_included
## 
## Final Model:
## price ~ guests_included
## 
## 
##   Step Df Deviance Resid. Df Resid. Dev      AIC
## 1                       1099   11530558 10195.45
autoplot(step)

Analyzing the line of regression

In this single-variable regression, we can see that a quality that strongly affects the price of an Airbnb in Columbus is the number of guests included (guests_included): each additional guest is associated with an increase of about 33 dollars in price. The R-squared of 0.366 means that the model explains about 37 percent of the variation in price. With the p-value for guests_included being less than 0.05, we reject the null hypothesis that the number of guests included has no linear relationship with price.
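As a rough illustration of how the slope p-value and the R-squared are read from a fitted model, here is a minimal sketch on simulated data. The data frame `sim` and the numbers used to generate it are invented for the example and are not the Columbus data.

```r
# Simulated stand-in for the real data (hypothetical values, not Columbus).
set.seed(1)
sim <- data.frame(guests_included = sample(1:8, 200, replace = TRUE))
sim$price <- 45 + 33 * sim$guests_included + rnorm(200, sd = 100)

fit <- lm(price ~ guests_included, data = sim)
s <- summary(fit)

# p-value for the null hypothesis that the slope of guests_included is zero
slope_p <- coef(s)["guests_included", "Pr(>|t|)"]

# Share of the variance in price that the model explains
r_squared <- s$r.squared

cat("slope p-value:", signif(slope_p, 3), "\n")
cat("R-squared:", round(r_squared, 3), "\n")
```

A small p-value says the slope is unlikely to be zero; the R-squared says how much of the spread in price the model accounts for. The two answer different questions.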


What is the null hypothesis of the shapiro.test?

The null hypothesis of the shapiro.test is that the sample comes from a normal distribution. The alternative hypothesis is that the sample does not come from a normal distribution.


The shapiro.test

shapiro.test(mod1$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  mod1$residuals
## W = 0.78941, p-value < 2.2e-16

Analyzing the shapiro.test

This shapiro.test checks whether the residuals of mod1 come from a normal distribution. Since the p-value for this test is less than 0.05, we reject the null hypothesis that the residuals are normally distributed. We therefore conclude that the residuals of mod1 do not come from a normal distribution.
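To show what the test reacts to, the sketch below runs shapiro.test on the residuals of two simulated regressions, one with right-skewed errors and one with normal errors. All variable names and values here are made up for illustration.

```r
# Simulated example (not the Columbus data).
set.seed(2)
x <- runif(300, 1, 8)

# Right-skewed errors: exponential noise shifted to have mean zero
y_skew <- 50 + 30 * x + (rexp(300, rate = 1 / 80) - 80)
# Normal errors for comparison
y_norm <- 50 + 30 * x + rnorm(300, sd = 80)

res_skew <- residuals(lm(y_skew ~ x))
res_norm <- residuals(lm(y_norm ~ x))

sw_skew <- shapiro.test(res_skew)
sw_norm <- shapiro.test(res_norm)

cat("skewed errors: W =", round(sw_skew$statistic, 3), "\n")
cat("normal errors: W =", round(sw_norm$statistic, 3), "\n")
```

The W statistic is close to 1 when the residuals look normal and drops as they become more skewed, which is the pattern behind a low W such as the one reported for mod1.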


What is the null hypothesis of the ncvTest?

The ncvTest computes a score test of the null hypothesis that the error variance is constant. The alternative hypothesis is that the error variance changes with the level of the response (the fitted values), or with a linear combination of predictors.


The ncvTest

ncvTest(mod1)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 617.2528, Df = 1, p = < 2.22e-16

Analyzing the ncvTest

This ncvTest shows that the residuals of mod1 are not homoscedastic. Because the p-value is below 0.05, we reject the null hypothesis of constant error variance in favor of the alternative hypothesis.
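For intuition, a score test of this kind can be sketched in base R. The code below is my own reconstruction of the Breusch-Pagan / Cook-Weisberg statistic on simulated data. It is an assumption that it mirrors what ncvTest does internally, and it is not a replacement for the function from the car package.

```r
# Simulated example (not the Columbus data): error spread grows with x.
set.seed(3)
x <- runif(400, 1, 8)
y <- 50 + 30 * x + rnorm(400, sd = 15 * x)
fit <- lm(y ~ x)

# Scale the squared residuals by their average, then regress them
# on the fitted values of the original model.
e <- residuals(fit)
u <- e^2 / mean(e^2)
aux <- lm(u ~ fitted(fit))

# Score statistic: half the regression sum of squares of the auxiliary fit
reg_ss <- sum((fitted(aux) - mean(u))^2)
chisq <- reg_ss / 2
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

cat("Chisquare =", round(chisq, 2), " p =", signif(p_value, 3), "\n")
```

If the error variance were constant, the squared residuals would show no trend against the fitted values and the statistic would be small.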


What do the results from the shapiro.test and ncvTest suggest?

They give me some concern about my model, but it is nothing that can’t be fixed. The residuals of my model are not normally distributed and are not homoscedastic, so the standard errors and p-values reported for the model should be read with some caution.
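One common remedy for skewed, non-constant residuals is to model the logarithm of the response. The sketch below tries this on simulated data; it is only an illustration of the idea, it was not run on the Columbus data, and it assumes every price is greater than zero.

```r
# Simulated example (not the Columbus data).
set.seed(4)
sim <- data.frame(guests_included = sample(1:8, 300, replace = TRUE))
# Multiplicative noise: price is right-skewed and its spread grows with its level
sim$price <- exp(4 + 0.2 * sim$guests_included + rnorm(300, sd = 0.5))

fit_raw <- lm(price ~ guests_included, data = sim)
fit_log <- lm(log(price) ~ guests_included, data = sim)  # needs price > 0

# Compare the Shapiro-Wilk W statistic of the two sets of residuals
w_raw <- shapiro.test(residuals(fit_raw))$statistic
w_log <- shapiro.test(residuals(fit_log))$statistic

cat("W, raw price:", round(w_raw, 3), "\n")
cat("W, log price:", round(w_log, 3), "\n")
```

Whether the transformation helps on real data would have to be checked by rerunning the shapiro.test and ncvTest on the new model.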


Pairs plot

Analyzing the pairs plot

This pairs plot slightly changes my mind about which variables I want to use for my model. I would still like to use the variables superhost, longitude, identity, and guests_included. None of these variables is strongly correlated with the others, which should help the accuracy of my model. I will not use the variables beds and latitude. These variables are strongly correlated with other variables, meaning they would negatively affect the accuracy of my model.
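The visual check from a pairs plot can be backed up with a correlation matrix. The sketch below uses simulated columns in which `beds` is deliberately built to track `guests_included`. The data, the relationship between the columns, and the 0.7 cut-off are all assumptions made for the example.

```r
# Simulated example (not the Columbus data).
set.seed(5)
n <- 200
guests_included <- sample(1:8, n, replace = TRUE)
sim <- data.frame(
  guests_included = guests_included,
  beds = guests_included + rnorm(n, sd = 0.5),  # built to follow guests_included
  longitude = rnorm(n, mean = -83, sd = 0.05)   # unrelated to the other columns
)

cors <- cor(sim)

# Keep each pair once (upper triangle) and flag |r| above the chosen cut-off
high <- which(abs(cors) > 0.7 & upper.tri(cors), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cors)[high[, "row"]],
  var2 = colnames(cors)[high[, "col"]],
  r = cors[high]
)
print(flagged)
```

Pairs that are flagged carry largely the same information, so keeping only one variable from each pair avoids unstable coefficient estimates.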


Multivariable linear regression

mod3 <- lm(price~superhost + longitude + identity+ guests_included, data=Columbus)



summary(mod3)
## 
## Call:
## lm(formula = price ~ superhost + longitude + identity + guests_included, 
##     data = Columbus)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -339.49  -49.84  -15.96   25.49  670.10 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -14761.011   4088.923  -3.610 0.000320 ***
## superhost          -40.541      6.288  -6.448  1.7e-10 ***
## longitude         -178.783     49.272  -3.628 0.000298 ***
## identity           -21.309      6.339  -3.362 0.000802 ***
## guests_included     31.186      1.298  24.021  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 99.05 on 1096 degrees of freedom
## Multiple R-squared:  0.4088, Adjusted R-squared:  0.4067 
## F-statistic: 189.5 on 4 and 1096 DF,  p-value: < 2.2e-16
step3 <- stepAIC(mod3, direction ="backward")
## Start:  AIC=10124.49
## price ~ superhost + longitude + identity + guests_included
## 
##                   Df Sum of Sq      RSS   AIC
## <none>                         10752147 10124
## - identity         1    110857 10863004 10134
## - longitude        1    129162 10881309 10136
## - superhost        1    407852 11159998 10164
## - guests_included  1   5660689 16412835 10588
step3$anova
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## price ~ superhost + longitude + identity + guests_included
## 
## Final Model:
## price ~ superhost + longitude + identity + guests_included
## 
## 
##   Step Df Deviance Resid. Df Resid. Dev      AIC
## 1                       1096   10752147 10124.49
autoplot(step3)

Analyzing the multivariable regression test

As can be seen in the multivariable linear regression above, the qualities that significantly affect price for Airbnbs in Columbus are: the host being a superhost, the longitude of the Airbnb, the host’s identity being verified, and how many guests are included. The R-squared of 0.4088 means that the model explains about 41 percent of the variation in price. With the p-value of every predictor being less than 0.05, we reject the null hypothesis that these variables have no relationship with price.


Predictions of rental prices

Model1 <- lm(price ~ bedrooms + guests_included + accommodates + property_type + room_type +
               bathrooms + beds + extra_people + maximum_nights, data = Columbus)

testdata <- data.frame(
  bedrooms = c(0, 4, 7), guests_included = c(1, 4, 14), accommodates = c(1, 6, 16),
  property_type = c("Loft", "Guesthouse", "House"),
  room_type = c("Shared room", "Private room", "Entire home/apt"),
  bathrooms = c(0, 1.5, 8), beds = c(1, 3, 14), extra_people = c(0, 20, 150),
  maximum_nights = c(1, 1125, 3000)
)

predict(Model1, testdata)
##         1         2         3 
##  63.99429 155.02395 517.43940

Description of prediction function

This prediction shows predicted prices for three hypothetical rentals based on their values and characteristics. Row 1 holds values that lead to a low predicted price (about 64 dollars). Row 2 holds values that lead to a mid-range predicted price (about 155 dollars). Finally, row 3 holds values that lead to a high predicted price (about 517 dollars).
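A single predicted price hides how uncertain it is. As a sketch, the `interval = "prediction"` option of predict can attach a range to each prediction. The model and data below are simulated with one predictor only, so the numbers say nothing about the Columbus listings.

```r
# Simulated example (not the Columbus data).
set.seed(6)
sim <- data.frame(guests_included = sample(1:8, 200, replace = TRUE))
sim$price <- 45 + 33 * sim$guests_included + rnorm(200, sd = 100)
fit <- lm(price ~ guests_included, data = sim)

# Three hypothetical listings; 14 lies outside the simulated range of 1 to 8
new_listings <- data.frame(guests_included = c(1, 4, 14))

# Columns: fit (point prediction), lwr and upr (95% prediction interval)
pred <- predict(fit, new_listings, interval = "prediction", level = 0.95)
print(pred)
```

The interval widens for values far from the bulk of the training data, which is worth keeping in mind when a test row uses values at or beyond the extremes of the data set.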


What are the best characteristics for opening an Airbnb at a high price in Columbus?

There are many characteristics of an Airbnb to take into consideration when renting one out in Columbus, Ohio. To open a rental at a high price in Columbus, the prediction above suggests these characteristics: seven bedrooms, room to accommodate up to 16 people, and 14 guests included. The property type should be a house, rented as an entire home, with a maximum stay of up to 3000 nights. It should have 8 bathrooms and 14 beds to accommodate guests. For a rental with these characteristics, the model predicts a price of about 517 dollars. That said, there was not enough time or space to dive into all of the variables in this data set. Therefore, more analysis is needed to understand what other characteristics go into renting out an Airbnb at a high price in Columbus.


Analysis of The Mediterranean Traveller (Article #1)

Based on the analysis I have done for Airbnbs in Columbus, I agree with the author of this article to the extent that Airbnbs are increasing in number and can destroy local communities. Looking at the map of Columbus with the Airbnbs plotted on it, it is easy to see that Airbnbs have become very popular and widespread. That said, I do not think that is a bad thing. As long as hosts and owners rent out their homes in a safe, legal, efficient, and friendly way, I do not see an issue with the growing number of Airbnbs. Whether Airbnbs are ethical really comes down to the owner and how they go about doing business with their rentals.


Analysis of Airbnb leads to median rent increase (Article #2)

Based on my analysis of Airbnbs in Columbus, I agree with the article that there is a plethora of Airbnbs in major cities, and that the number is only increasing. That said, I believe it is the owner’s right and choice to list their home as an Airbnb if they want to. As for Airbnbs raising rent in some places, this is a problem that needs to be addressed to a greater extent. Even so, I still support the use of Airbnbs and believe it is the owner’s right to decide whether to rent out their house.