1. Basics of R and R markdown
2. Research Question (You don’t need code for this question)
- 2.1: What are some areas of interest for you within sociology, big data, and computational social science?
- 2.2: Provide a link to a dataset which you think intersects with one of your interests. Explain the connection. You can find datasets by doing a Google search or by looking here: http://hadoopilluminated.com/hadoop_illuminated/Public_Bigdata_Sets.html or here: https://www.kaggle.com/
3. Import data and identify variables

Write all code in the chunks provided. Complete this .Rmd file and knit it into an .html. You must upload both files for credit.

Remember to unzip to a real directory before running everything!

1. Basics of R and R markdown

1.1. Create a vector containing elements 10, 22, 27, 19, 20 and assign it with a name.

vectory <- c(10, 22, 27, 19, 20)

1.2. Use R as a calculator to compute the following values.

27(38-17) = 567
ln(14^7) = 18.4734
sqrt(436/12) = 6.027714

# calculate 27(38 - 17)
27 * (38 - 17)

## [1] 567

# calculate ln(14^7); log() is the natural logarithm in R
log(14^7)

## [1] 18.4734

# calculate sqrt(436/12)
sqrt(436/12)

## [1] 6.027714

1.3. Run the below code to create a vector. Observe what e contains and use `?seq` to see help of function `seq()`.

e <- seq(0, 10, length=5)
e

## [1]  0.0  2.5  5.0  7.5 10.0

?seq
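As a small sketch of what `seq()` is doing here: `length = 5` partially matches the `length.out` argument, so R spaces five values evenly between 0 and 10. The names `step` and `e_by` are introduced only for this illustration and are not part of the assignment.

```r
# Five evenly spaced values from 0 to 10 (length.out spelled out in full)
e <- seq(0, 10, length.out = 5)

# The implied step is (to - from) / (length.out - 1)
step <- (10 - 0) / (5 - 1)

# The same vector, built by giving the step size instead of the length
e_by <- seq(0, 10, by = step)

print(e)
print(isTRUE(all.equal(e, e_by)))
```

Using `by` is convenient when the spacing is known, and `length.out` when the number of points is known.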

1.4. Create vector b = (87, 86, 85, ..., 56)

b <- 87:56

What are the 19th, 20th, and 21st elements of b? 69, 68, 67

elements <- b[c(19,20,21)]

1.4. Compute the following statistics of b:

sum_b <- sum(b)
median_b <- median(b)
sd_b <- sd(b)

a) sum: 2288
b) median: 71.5
c) standard deviation: 9.3808
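Because b is a run of consecutive integers, these answers can be cross-checked by hand. The sketch below is only a sanity check, not part of the assignment; `n`, `sum_formula`, and `sd_formula` are names made up for the illustration. It uses the arithmetic-series sum and the fact that the sample variance of n consecutive integers is n(n + 1)/12.

```r
b <- 87:56
n <- length(b)  # 32 elements

# Arithmetic series: n * (first + last) / 2
sum_formula <- n * (b[1] + b[n]) / 2

# Sample standard deviation of n consecutive integers: sqrt(n * (n + 1) / 12)
sd_formula <- sqrt(n * (n + 1) / 12)

# Compare the built-in functions with the closed forms
print(c(sum(b), sum_formula))
print(c(sd(b), sd_formula))

# With an even number of elements, the median is the mean of the two middle values
print(median(b))
```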

1.5. Following the example given in lab1, mix in-line R calculations with text and make reference to vector b. You must use in-line R calculations at least once (e.g. functions like mean(), sd(), max()) and may not hard-code any numbers referenced in your text. An example is given below:

The average of b is 71.5.

The standard deviation, 9.3808315, indicates the spread of the values in b.
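The knitted output hides the in-line code that produced these numbers. As a rough sketch of what the .Rmd source might look like (the exact wording of the source lines is an assumption, and `avg_sentence` is a name made up here), in-line R is written between backticks with a leading `r`:

```r
b <- 87:56

# In the .Rmd source, in-line code is wrapped in backticks and starts with r:
#   The average of b is `r mean(b)`.
#   The standard deviation, `r sd(b)`, indicates the spread of the values in b.

# Outside of knitting, the first sentence can be previewed with paste0()
avg_sentence <- paste0("The average of b is ", mean(b), ".")
print(avg_sentence)
```

Writing the numbers this way means the text updates automatically if b changes, which is why hard-coding is not allowed.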

2. Research Question (You don’t need code for this question)

For this problem you’ll answer some questions to help explore your interests in data science. These should be questions that you’re interested in. They don’t have to be questions you already know the answer to, and they certainly don’t have to open up new areas of study.

However, problem 3 asks you to come up with a ‘big data’ dataset that you think you might use to answer your question. If you’re new to R or not sure about what to do, I encourage you to use the Airbnb data that we’ll be using in class. In that case, make sure that your answers to problem 2 relate to the Airbnb data.

2.2: Provide a link to a dataset which you think intersects with one of your interests. Explain the connection. You can find datasets by doing a Google search or by looking here: http://hadoopilluminated.com/hadoop_illuminated/Public_Bigdata_Sets.html or here: https://www.kaggle.com/

If you’re using the Airbnb dataset, explain how it connects to your interests.

In the last couple of years I have seen many videos online of people who got rich by starting an “Airbnb business,” where they buy spaces for year-round rental on Airbnb and then use the profits to buy more properties, and so forth. I cannot imagine this is a good thing, so datasets like the Airbnb data can show how this affects an area’s property values and demand.

3. Import data and identify variables

3.1. Import your data into R and output the column names.

library(readr)
seattle_airbnb <- read_csv("~/Desktop/Lab02 2/data/seattle_airbnb.csv")

## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)

## Rows: 101 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): id, name, neighbourhood_group, neighbourhood
## dbl (2): price, number_of_reviews
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(seattle_airbnb)
colnames(seattle_airbnb)

## [1] "id"                  "name"                "neighbourhood_group"
## [4] "neighbourhood"       "price"               "number_of_reviews"
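For readers without the CSV file, the import step can be sketched on a small in-memory sample. This is a base R stand-in for the `read_csv()` call above, not the method used in the homework: the four rows are the first values shown in the `str()` output in section 3.2, and `csv_text` and `sample_airbnb` are names made up for the illustration.

```r
# First four listings, copied from the str() output in section 3.2
csv_text <- 'id,name,neighbourhood_group,neighbourhood,price,number_of_reviews
2318,"Casa Madrona - Urban Oasis, 1 block from the Park!",Central Area,Madrona,296,16
4291,Sunrise in Seattle Master Suite,Other neighborhoods,Roosevelt,82,54
5682,"Cozy Studio, min. to downtown -WiFi",Delridge,South Delridge,48,428
6606,"Fab, private seattle urban cottage!",Other neighborhoods,Wallingford,90,110'

# colClasses keeps id as text, matching the chr type that readr reported
sample_airbnb <- read.csv(
  text = csv_text,
  colClasses = c(id = "character"),
  stringsAsFactors = FALSE
)

# Column names and dimensions (rows, columns) of the sample
print(colnames(sample_airbnb))
print(dim(sample_airbnb))
```

Names containing commas are wrapped in double quotes so that they stay in a single column when parsed.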

3.2. Use View(), head() or tail() to check your data. What variables does it contain? How many rows are in your data? What is the unit of analysis in your data?

It contains 6 variables: id, name, neighbourhood_group, neighbourhood, price, and number_of_reviews. It contains 101 rows. The unit of analysis is an Airbnb listing in the Seattle area.

View(seattle_airbnb)
str(seattle_airbnb)

## spc_tbl_ [101 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id                 : chr [1:101] "2318" "4291" "5682" "6606" ...
##  $ name               : chr [1:101] "Casa Madrona - Urban Oasis, 1 block from the Park!" "Sunrise in Seattle Master Suite" "Cozy Studio, min. to downtown -WiFi" "Fab, private seattle urban cottage!" ...
##  $ neighbourhood_group: chr [1:101] "Central Area" "Other neighborhoods" "Delridge" "Other neighborhoods" ...
##  $ neighbourhood      : chr [1:101] "Madrona" "Roosevelt" "South Delridge" "Wallingford" ...
##  $ price              : num [1:101] 296 82 48 90 70 80 165 125 120 125 ...
##  $ number_of_reviews  : num [1:101] 16 54 428 110 120 366 34 32 61 48 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_character(),
##   ..   name = col_character(),
##   ..   neighbourhood_group = col_character(),
##   ..   neighbourhood = col_character(),
##   ..   price = col_double(),
##   ..   number_of_reviews = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

nrow(seattle_airbnb)

## [1] 101

3.3. Discuss how some variables might serve your research interest as discussed in problem 2 above.

One of the easiest things to analyze would be the location density of the listings, which would show whether these properties are concentrated in certain neighborhoods or areas. You could then compare the average price of the listings in these areas to the average rent in the same areas and see what the difference is. This could reveal that the listings are either cheaper or more expensive than the average price in the area; either outcome is going to affect the local market. Once you understand more about how these properties affect the value and price of an area, you can make further inquiries into the societal value of this system.
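The listing-side half of this idea can be sketched in a few lines of base R. The example below is only an illustration on four rows taken from the `str()` output in section 3.2, so its numbers say nothing about the full 101-row dataset; `sample_airbnb`, `listing_counts`, and `mean_price` are names made up here. Comparing against average rent would need a second data source that is not part of this homework.

```r
# Four listings taken from the str() output in section 3.2
sample_airbnb <- data.frame(
  neighbourhood_group = c("Central Area", "Other neighborhoods",
                          "Delridge", "Other neighborhoods"),
  price = c(296, 82, 48, 90),
  stringsAsFactors = FALSE
)

# Number of listings per neighbourhood group (a rough density measure)
listing_counts <- table(sample_airbnb$neighbourhood_group)

# Average listing price per neighbourhood group
mean_price <- aggregate(price ~ neighbourhood_group,
                        data = sample_airbnb, FUN = mean)

print(listing_counts)
print(mean_price)
```

On the real data, the same two calls would be run on `seattle_airbnb` instead of the four-row sample.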

Homework 1

Soc 225: Data & Society

GRIFFIN MCGRATH

2024-10-19
