Write all code in the chunks provided. Complete this .Rmd file and knit it into an .html. You must upload both files for credit.

Remember to unzip to a real directory before running everything!

1. Basics of R and R markdown

1.1. Create a vector containing elements 10, 22, 27, 19, 20 and assign it with a name.

vectory <- c(10, 22, 27, 19, 20)

1.2. Use R as a calculator to compute the following values.

  1. 27(38-17) = 567
  2. ln(14^7) = 18.4734
  3. sqrt(436/12) = 6.027714
#calculate 
27 * (38 - 17)
## [1] 567
#calculate
log(14^7)
## [1] 18.4734
#calculate
sqrt(436/12)
## [1] 6.027714
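
As a quick sanity check on the second calculation: log() in R is the natural logarithm by default, so log(14^7) should agree with 7 * log(14). A minimal sketch (the variable names are only illustrative):

```r
# log() with no base argument is the natural logarithm (ln)
direct  <- log(14^7)
by_rule <- 7 * log(14)   # log rule: ln(a^n) = n * ln(a)

# all.equal() compares doubles with a small tolerance instead of exact equality
all.equal(direct, by_rule)
```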

1.3. Run the code below to create a vector. Observe what e contains and use ?seq to view the help page for the function seq().

e <- seq(0, 10, length=5)
e
## [1]  0.0  2.5  5.0  7.5 10.0
?seq
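
The help page describes two common ways to specify a sequence: by the number of elements (length.out) or by the step size (by). The call above writes length=5, which R resolves to length.out through partial argument matching. A small sketch comparing the two forms, with e1 and e2 as illustrative names:

```r
# Five evenly spaced values from 0 to 10, argument name written in full
e1 <- seq(0, 10, length.out = 5)

# The same values described by their step size instead of their count
e2 <- seq(0, 10, by = 2.5)

# Both calls should describe the same vector
all.equal(e1, e2)
```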

1.4. Create vector b = (87, 86, 85, …, 56)

b <- 87:56

What are the 19th, 20th, and 21st elements of b? 69, 68, 67

elements <- b[c(19,20,21)]
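
Because the three positions are consecutive, the index vector can also be written as a range. A short sketch that also prints the result so the answer shows up in the knitted output:

```r
b <- 87:56

# 19:21 expands to c(19, 20, 21), so this selects the same three elements
elements <- b[19:21]
elements      # 69 68 67

length(b)     # 32 elements in total
```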

1.4. Compute the following statistics of b:

sum_b <- sum(b)
median_b <- median(b)
sd_b <- sd(b)

a) sum 2288
b) median 71.5
c) standard deviation 9.3808
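
For reference, sd() computes the sample standard deviation, which divides by n - 1 rather than n. A minimal sketch that reproduces it by hand (manual_sd is an illustrative name):

```r
b <- 87:56
n <- length(b)

# Sample standard deviation: squared deviations from the mean, divided by n - 1
manual_sd <- sqrt(sum((b - mean(b))^2) / (n - 1))

# Should match the built-in function up to floating-point tolerance
all.equal(manual_sd, sd(b))
```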

1.5. Following the example given in lab1, mix in-line R calculations with text and make reference to vector b. You must use in-line R calculations at least once (e.g. functions like mean(), sd(), max()) and may not hard-code any numbers referenced in your text. An example is given below:

The average of b is 71.5.

The standard deviation, 9.3808315, indicates the spread of the values in b.
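
The knitted output no longer shows the inline expressions that produced these numbers. In the .Rmd source, each number would come from an inline chunk (a backtick, the letter r, a space, the expression, and a closing backtick). The sketch below builds an equivalent sentence in plain R, with rounding added as one possible way to tidy the long decimal; the exact wording and rounding are assumptions, not the original source.

```r
b <- 87:56

# Each value is computed from b rather than typed in by hand,
# mirroring what the inline chunks do in the .Rmd source
sentence <- sprintf(
  "The average of b is %.1f and its standard deviation is %.2f.",
  mean(b), sd(b)
)
sentence
```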

2. Research Question (You don’t need code for this question)

For this problem you’ll answer some questions to help explore your interests in data science. These are questions that you’re interested in. They don’t have to be questions you already know the answer to, still less entirely new areas of study.

However, problem 3 asks you to come up with a ‘big data’ dataset that you think you might use to answer your question. If you’re new to R or not sure about what to do, I encourage you to use the Airbnb data that we’ll be using in class. In that case, make sure that your answers to problem 2 relate to the Airbnb data.

2.1: What are some areas of interest for you within sociology, big data, and computational social science?

I am interested in quantifying how new-age businesses like Airbnb affect our current economic and societal well-being, before we fully give in to letting businesses like Airbnb, Uber, and DoorDash completely change how things work for citizens and businesses.

3. Import data and identify variables

3.1. Import your data into R and output the column names.

library(readr)
seattle_airbnb <- read_csv("~/Desktop/Lab02 2/data/seattle_airbnb.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 101 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): id, name, neighbourhood_group, neighbourhood
## dbl (2): price, number_of_reviews
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(seattle_airbnb)
colnames(seattle_airbnb)
## [1] "id"                  "name"                "neighbourhood_group"
## [4] "neighbourhood"       "price"               "number_of_reviews"
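
The import above reports parsing issues and guesses the column types. One way to make the import explicit is to pass the column specification from the message to read_csv() and then inspect problems(). The sketch below is only illustrative: it reads two rows copied from the str() output in this report as inline text instead of the real file, it assumes readr 2.0 or later for I(), and csv_text and listings are hypothetical names.

```r
library(readr)

# Stand-in for the CSV file: header plus two rows taken from this report's output
csv_text <- paste(
  "id,name,neighbourhood_group,neighbourhood,price,number_of_reviews",
  "4291,Sunrise in Seattle Master Suite,Other neighborhoods,Roosevelt,82,54",
  "5682,\"Cozy Studio, min. to downtown -WiFi\",Delridge,South Delridge,48,428",
  sep = "\n"
)

# Column types written out as in the column specification printed by readr
listings <- read_csv(
  I(csv_text),   # I() marks the string as literal data rather than a file path
  col_types = cols(
    id = col_character(),
    name = col_character(),
    neighbourhood_group = col_character(),
    neighbourhood = col_character(),
    price = col_double(),
    number_of_reviews = col_double()
  )
)

problems(listings)   # one row per value that could not be parsed
colnames(listings)
```

With the real file, the first argument would be the path to seattle_airbnb.csv, and problems() would show which rows triggered the warning.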

3.2. Use View(), head() or tail() to check your data. What variables does it contain? How many rows are in your data? What is the unit of analysis in your data?

It contains 6 variables: id, name, neighbourhood_group, neighbourhood, price, and number_of_reviews. It contains 101 rows. The unit of analysis is a single Airbnb listing in the Seattle area.

View(seattle_airbnb)
str(seattle_airbnb)
## spc_tbl_ [101 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id                 : chr [1:101] "2318" "4291" "5682" "6606" ...
##  $ name               : chr [1:101] "Casa Madrona - Urban Oasis, 1 block from the Park!" "Sunrise in Seattle Master Suite" "Cozy Studio, min. to downtown -WiFi" "Fab, private seattle urban cottage!" ...
##  $ neighbourhood_group: chr [1:101] "Central Area" "Other neighborhoods" "Delridge" "Other neighborhoods" ...
##  $ neighbourhood      : chr [1:101] "Madrona" "Roosevelt" "South Delridge" "Wallingford" ...
##  $ price              : num [1:101] 296 82 48 90 70 80 165 125 120 125 ...
##  $ number_of_reviews  : num [1:101] 16 54 428 110 120 366 34 32 61 48 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_character(),
##   ..   name = col_character(),
##   ..   neighbourhood_group = col_character(),
##   ..   neighbourhood = col_character(),
##   ..   price = col_double(),
##   ..   number_of_reviews = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
nrow(seattle_airbnb)
## [1] 101

3.3. Discuss how some variables might serve your research interest as discussed in problem 2 above.

One of the easiest things to analyze would be the location density of the listings, which would show whether these properties are concentrated in certain neighborhoods or areas. You could then compare the average listing price in each area to the average rent there and see what the difference is. The listings may turn out to be cheaper or more expensive than the local average, and either outcome would affect the market in the areas where they are located. Once you understand more about how these properties affect values and prices in an area, you can make further inquiries into the societal value of this system.
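
A first pass at the density and price comparison could look like the sketch below. It uses only the first four listings shown in the str() output above as a stand-in for the full 101-row data, and the names listings and avg_price are illustrative. Average rents are not in this dataset and would have to come from another source.

```r
# First four listings from the str() output above (area and price only)
listings <- data.frame(
  neighbourhood_group = c("Central Area", "Other neighborhoods",
                          "Delridge", "Other neighborhoods"),
  price = c(296, 82, 48, 90)
)

# Number of listings per area: a rough measure of where listings concentrate
table(listings$neighbourhood_group)

# Average listing price per area
avg_price <- aggregate(price ~ neighbourhood_group, data = listings, FUN = mean)
avg_price
```

On the full data, the same two calls would apply directly to seattle_airbnb.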