Write your homework using R Markdown and submit it in HTML format via Canvas. Name your homework “HW#_LastName_FirstName”, where “#” is the homework number (1,2,…). To receive credit, you must include the necessary R code, output and textual explanations in your submitted work.

  1. Reading, exploring and editing data in R. The data set available on Canvas, named strike.txt, consists of annual observations on the level of strike volume (days lost due to industrial disputes per 1000 wage and salary earners) and its covariates in 18 OECD countries from 1951-1985. In particular, this data set includes the following variables: (1) country code; (2) year; (3) strike volume; (4) unemployment; (5) inflation; (6) parliamentary representation of social democratic and labor parties; and (7) a time-invariant measure of union centralization.

    1. Save the file strike.txt in your working directory, then load the data set into R using
    getwd()
    strikedat <- read.table("strike.txt", header=TRUE)

    Check if strikedat is a data frame.

        is.data.frame(strikedat)
    1. How many rows and columns does strikedat have? (If you do not have 625 rows and 7 columns, something is wrong; check the previous part to see what might have gone wrong.)
      dim(strikedat)
    1. What are the names of the columns of strikedat?
      names(strikedat)
    1. What is the value of row 123, column 4 of strikedat?
      strikedat[123, 4]
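    As a minimal sketch of the indexing used above, the same cell can be reached by position or by column name. The data frame toy and its values are made up for illustration and are not taken from strike.txt.

    ```r
    # Toy data frame (hypothetical values, not the strike data)
    toy <- data.frame(
      country      = c(1, 1, 2),
      year         = c(1951, 1952, 1951),
      unemployment = c(2.5, 3.0, 7.5)
    )

    # Three ways to read row 2 of the third column
    by_position <- toy[2, 3]               # row index, column index
    by_name     <- toy[2, "unemployment"]  # row index, column name
    by_dollar   <- toy$unemployment[2]     # extract the column, then index it

    # All three refer to the same cell
    identical(by_position, by_name)
    identical(by_name, by_dollar)
    ```

    Indexing by name tends to be more robust than indexing by position, since it still works if columns are later added or reordered.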
    1. Display the last 15 entries of the second column of strikedat in its entirety.
     tail(strikedat[, 2], 15)
    1. Explain what this command:
    names(strikedat) = c("natcode","year","strikevol","unemployment","inflation","leftwingprop","unioncentr")

    does, by running it on your data and examining the object. (You may find the display functions head() and tail() useful here.)
    head(strikedat)
    tail(strikedat)
    # This command replaces the column names of strikedat with the seven names given.


    1. The column named leftwingprop contains a percentage (between 0 and 100). Create a new column in the data frame called leftwingprop.scaled that contains the actual proportion (between 0 and 1). Display the first 5 rows of this dataset.
    strikedat$leftwingprop.scaled <- strikedat$leftwingprop / 100
    head(strikedat,5)
    1. Using this new column, create a line plot of leftwingprop.scaled for country 1 (hint: use the column named natcode) where the y axis is the proportion and the x axis is year (hint: use the function plot(x,y,type="l")). Is there an apparent trend over time?
    country1 <- strikedat[strikedat$natcode == 1, ]
    plot(country1$year, country1$leftwingprop.scaled, type = "l",
     xlab = "Year", ylab = "Leftwing Proportion (Scaled)", 
     main = "Trend of Leftwing Proportion Over Time for Country 1")
    
    # Over time, the left-wing proportion seems to increase overall, with steep decreases roughly every 10 years.
    1. Instead of appending columns, create a new data frame strikedat.fix that takes the original dataset and replaces the columns for unemployment and leftwingprop with proportions. Display the first five rows of this new dataset.
      strikedat.fix <- strikedat
      strikedat.fix$unemployment <- strikedat.fix$unemployment / 100
      strikedat.fix$leftwingprop <- strikedat.fix$leftwingprop / 100
      head(strikedat.fix, 5)
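    When several percentage columns need the same rescaling, one option is to select them by name and divide them in a single step. The sketch below uses a made-up data frame toy (hypothetical values) and a helper vector pct_cols that is not part of the assignment.

    ```r
    # Toy data frame standing in for strikedat (values are made up)
    toy <- data.frame(
      natcode      = c(1, 1, 2),
      unemployment = c(2.5, 3.0, 7.5),
      leftwingprop = c(40, 45, 50)
    )

    # Names of the columns stored as percentages (assumption for this sketch)
    pct_cols <- c("unemployment", "leftwingprop")

    # Copy first so the original stays untouched, then rescale the chosen columns
    toy.fix <- toy
    toy.fix[pct_cols] <- toy.fix[pct_cols] / 100

    head(toy.fix, 5)
    ```

    Working on a copy keeps the original percentages available for comparison.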
  2. Decathlon with Superheroes. In your R console, run the following line: install.packages("ade4") in order to install the package ade4 (you only have to do this once). Then, the following code:

    library(ade4)
    data(olympic)

    will load an object called olympic into your current R workspace, containing data about records of 33 athletes in the 10 events of a decathlon: 100 meters (100), long jump (long), shotput (poid), high jump (haut), 400 meters (400), 110-meter hurdles (110), discus throw (disq), pole vault (perc), javelin (jave) and 1500 meters (1500).

    1. olympic is a list. How many objects does it hold? What are the types and names of these objects?
    length(olympic)
    names(olympic)
    sapply(olympic, class)
    1. Take the first object in olympic and copy it into a new object called olympicmat. Cast it into a matrix, then back to a data frame. Did anything change?
    olympicmat <- olympic[[1]]
    olympicmat <- as.matrix(olympicmat)
    olympicmat <- as.data.frame(olympicmat)
    class(olympicmat)
    # class() returns "data.frame", so after the round trip the object is a data frame again.
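    To see when a matrix round trip can change a data frame, here is a small sketch on made-up data (num_df and mixed_df are hypothetical and unrelated to olympic). An all-numeric data frame should survive the round trip, while a data frame with a non-numeric column does not keep its numeric columns.

    ```r
    # All-numeric toy data frame (hypothetical values)
    num_df <- data.frame(a = c(1.5, 2.5), b = c(10, 20), row.names = c("x", "y"))
    roundtrip <- as.data.frame(as.matrix(num_df))

    class(roundtrip)               # class after the round trip
    all.equal(num_df, roundtrip)   # compares values, column names and row names

    # With a non-numeric column, as.matrix() produces a character matrix,
    # so the numeric column does not come back as numeric
    mixed_df   <- data.frame(a = c(1.5, 2.5), label = c("p", "q"))
    mixed_back <- as.data.frame(as.matrix(mixed_df))
    sapply(mixed_back, class)
    ```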
    1. Change the column names of the olympicmat into something more human-readable (although in general, usage of succinct variable names is good practice): replace the column names with their longer versions shown above (e.g., 100 into 100 meters). Show the first 10 rows after doing so.
    colnames(olympicmat) <- c("100 meters", "long jump", "shotput", "high jump", "400 meters",
                              "110-meter hurdles", "discus throw", "pole vault", "javelin", "1500 meters")
    head(olympicmat, 10)
    1. Suppose we have found out that the first three contestants are superheroes. Replace just these names with ironman, wolverine and hulk. Again, show the first 10 rows after doing so.
    rownames(olympicmat)[1:3] <- c("ironman", "wolverine", "hulk")
    head(olympicmat, 10)
    1. Now add a new datapoint to olympicmat, by appending the row shown below. Make sure to change the row name too. Show the last 10 rows after doing so. Is thor an extraordinary discus thrower? Draw a histogram using the function hist() to justify your answer.
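    thor = c(8.52, 10.31, 16.28, 4.51, 30.12, 13.62, 50.5, 10.1, 100.24, 200.12)
    # rbind() returns a new object, so assign the result back to olympicmat
    olympicmat <- rbind(olympicmat, thor)
    # rename the appended (last) row
    rownames(olympicmat)[nrow(olympicmat)] <- "thor"
    tail(olympicmat, 10)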
    
    hist(olympicmat[, "discus throw"], main = "Discus Throw Distribution",
     xlab = "Discus Throw Distance", ylab = "Frequency")
    # Thor is an extraordinary thrower: his distance of 50.5 is above the mean and median discus distances and is the highest of all data points.
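    A common pitfall when appending a row is forgetting that rbind() does not modify its input. The sketch below illustrates this on a made-up data frame scores (hypothetical values, not the decathlon records).

    ```r
    # Toy score table (hypothetical values)
    scores <- data.frame(discus = c(40.1, 42.3), javelin = c(60.2, 58.7),
                         row.names = c("a1", "a2"))
    newcomer <- c(50.5, 100.24)

    # The result is returned, but scores itself is unchanged
    unused <- rbind(scores, newcomer)
    rows_before <- nrow(scores)

    # Assign the result back to keep the new row, then label it
    scores <- rbind(scores, newcomer)
    rownames(scores)[nrow(scores)] <- "newcomer"
    tail(scores, 1)
    ```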
    1. Add the above changes back to olympic, by assigning your olympicmat back into the first object of olympic. Make sure you refer to an object of a list by its name (e.g., mylist[["mykey"]]), and not by its index (e.g., mylist[[3]]).
    olympic[["tab"]] <- olympicmat
    1. Now we will add a few objects to the list olympic. Add year and sporttype to the list in that order, with those same names.
    year = 1998
    sporttype = "decathlon"
    olympic[["year"]] <- year
    olympic[["sporttype"]] <- sporttype
    names(olympic)
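    The reason for preferring names over indices can be sketched on a small made-up list. Here mylist and the element name other are hypothetical stand-ins, not the actual contents of olympic.

    ```r
    # Toy list standing in for olympic (element names are assumptions)
    mylist <- list(tab = data.frame(x = 1:3), other = c(3.4, 2.6))

    # Assigning to a new name appends the element at the end, in assignment order
    mylist[["year"]] <- 1998
    mylist[["sporttype"]] <- "decathlon"
    names(mylist)

    # Reverse the element order to mimic a list whose layout has changed
    reordered <- mylist[rev(names(mylist))]

    # Access by name still finds the same object; access by index does not
    same_by_name  <- identical(reordered[["tab"]], mylist[["tab"]])
    same_by_index <- identical(reordered[[1]], mylist[[1]])
    ```

    Access by name keeps working when elements are added or reordered, which is why it is the safer habit for lists.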