Write your homework using R Markdown and submit it in HTML format via Canvas. Name your homework “HW#_LastName_FirstName”, where “#” is the homework number (1, 2, …). To receive credit, you must include the necessary R code, output, and textual explanations in your submitted work.
Reading, exploring and editing data in R. The
data set available on Canvas, named strike.txt, consists of
annual observations on the level of strike volume (days lost due to
industrial disputes per 1000 wage salary earners) and their covariates
in 18 OECD countries from 1951-1985. In particular, this data set
includes the following variables: (1) country code; (2) year; (3) strike
volume; (4) unemployment; (5) inflation; (6) parliamentary
representation of social democratic and labor parties; and (7) a
time-invariant measure of union centralization.
Save strike.txt in your working directory,
then load the data set into R using read.table().
getwd()
strikedat <- read.table("strike.txt", header=TRUE)
Check if strikedat is a data frame.
is.data.frame(strikedat)
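A minimal sketch of the same read-and-check pattern, run on a small made-up file so that it works without strike.txt. The file contents and the object name toy are illustrative assumptions, not part of the assignment data.

```r
# Write a tiny whitespace-delimited file with a header row (made-up values).
tmp <- tempfile(fileext = ".txt")
writeLines(c("country year volume",
             "1 1951 296",
             "1 1952 397",
             "2 1951 50"), tmp)

# header = TRUE tells read.table() that the first line holds the column names.
toy <- read.table(tmp, header = TRUE)

is.data.frame(toy)  # TRUE
dim(toy)            # 3 3
```

If the header argument were left out, the first line would be read as data and the columns would get default names such as V1, V2, V3.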
How many rows and columns does strikedat have? (If you
do not have 625 rows and 7 columns, something is wrong; check the
previous part to see what might have gone wrong.)
dim(strikedat)
What are the names of the columns of strikedat?
names(strikedat)
What is the value in row 123, column 4 of strikedat?
strikedat[123, 4]
strikedat in its entirety.
tail(strikedat[, 2], 15)
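The indexing calls used in these parts can be tried on a small made-up data frame. This is only a sketch; toy and its columns are hypothetical names. Note in particular the difference between selecting a whole column and showing only its last few values.

```r
# A made-up data frame with 20 rows and 2 columns.
toy <- data.frame(id = 1:20, score = seq(5, 100, by = 5))

dim(toy)            # number of rows, then number of columns
names(toy)          # column names
toy[3, 2]           # single value in row 3, column 2: 15
toy[, 2]            # the entire second column, as a vector of 20 values
tail(toy[, 2], 4)   # only the last 4 values: 85 90 95 100
```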
Explain what the following command does, by running it on your data and
examining the object. (You may find the display functions head() and
tail() useful here.)
names(strikedat) = c("natcode", "year", "strikevol", "unemployment",
                     "inflation", "leftwingprop", "unioncentr")
head(strikedat)
tail(strikedat)
# This command replaces the column names of strikedat with the seven names given, in order.
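The effect of assigning to names() can be seen on a small made-up data frame. This is a sketch with hypothetical column names, not the assignment data.

```r
toy <- data.frame(V1 = 1:3, V2 = c(2.5, 3.1, 4.0))
names(toy)                      # "V1" "V2"

# Assigning a character vector replaces all column names, in order.
names(toy) <- c("id", "rate")
names(toy)                      # "id" "rate"

# The values are untouched; only the labels change.
head(toy, 2)
```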
The column leftwingprop contains a percentage
(between 0 and 100). Create a new column in the data frame called
leftwingprop.scaled that contains the actual proportion
(between 0 and 1). Display the first 5 rows of this dataset.
strikedat$leftwingprop.scaled <- strikedat$leftwingprop / 100
head(strikedat, 5)
Plot leftwingprop.scaled for country 1 (hint: use the column
named natcode) where the y axis is the proportion and the x
axis is year (hint: use the function plot(x,y,type="l")).
Is there an apparent trend over time?
country1 <- strikedat[strikedat$natcode == 1, ]
plot(country1$year, country1$leftwingprop.scaled, type = "l",
     xlab = "Year", ylab = "Leftwing Proportion (Scaled)",
     main = "Trend of Leftwing Proportion Over Time for Country 1")
# Over time the left-wing proportion seems to increase, with steep drops about every 10 years.
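The two steps above (rescaling a percentage column, then keeping only the rows of one group before plotting) can be sketched on made-up data. All names and values here are hypothetical.

```r
toy <- data.frame(group = c(1, 1, 1, 2, 2),
                  year  = c(1951, 1952, 1953, 1951, 1952),
                  pct   = c(40, 45, 55, 10, 20))

# Percentage (0 to 100) converted to a proportion (0 to 1).
toy$prop <- toy$pct / 100

# Logical row index: keep rows where group is 1, and keep all columns.
g1 <- toy[toy$group == 1, ]
nrow(g1)          # 3
range(g1$prop)    # 0.40 0.55

# Line plot of the proportion against year for that group only.
plot(g1$year, g1$prop, type = "l", xlab = "Year", ylab = "Proportion")
```

The comma after the logical condition matters: without it, R would treat the condition as a column selector rather than a row selector.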
Create a new data frame called strikedat.fix that takes the original dataset and
replaces the columns for unemployment and
leftwingprop with proportions. Display the first five rows
of this new dataset.
strikedat.fix <- strikedat
strikedat.fix$unemployment <- strikedat.fix$unemployment / 100
strikedat.fix$leftwingprop <- strikedat.fix$leftwingprop / 100
head(strikedat.fix, 5)
Decathlon with Superheroes. In your R console,
run the following line: install.packages("ade4") in order
to install the package ade4 (you only have to do this
once). Then, the following code:
library(ade4)
data(olympic)
will load an object called olympic into your current R
workspace, containing data about records of 33 athletes in the 10 events
of a decathlon: 100 meters (100), long jump (long), shotput (poid), high
jump (haut), 400 meters (400), 110-meter hurdles (110), discus throw
(disq), pole vault (perc), javelin (jave) and 1500 meters (1500).
olympic is a list. How many objects does it hold? What
are the types and names of these objects?
length(olympic)
names(olympic)
sapply(olympic, class)
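The same three inspection calls, shown on a small made-up list so the output is easy to predict. The list mylist and its element names are hypothetical.

```r
mylist <- list(tab = data.frame(a = 1:2), label = "demo", n = 2L)

length(mylist)          # 3
names(mylist)           # "tab" "label" "n"

# sapply() applies class() to each element and returns a named character vector.
sapply(mylist, class)   # tab: "data.frame", label: "character", n: "integer"
```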
Extract the first object of olympic and copy it into a new
object called olympicmat. Cast it into a matrix, then back
to a data frame. Did anything change?
olympicmat <- olympic[[1]]
olympicmat <- as.matrix(olympicmat)
olympicmat <- as.data.frame(olympicmat)
class(olympicmat)
#
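One way to check whether a round trip through a matrix changes anything is to compare the object before and after. The sketch below does this on a made-up, all-numeric data frame. The names df, m and back are hypothetical, and the result for olympic should be confirmed by running the same comparisons on the real object.

```r
# A small all-numeric data frame with a non-syntactic column name ("100").
df <- data.frame(`100` = c(11.2, 10.9), long = c(7.4, 7.2),
                 check.names = FALSE, row.names = c("a", "b"))

m    <- as.matrix(df)        # numeric matrix; row and column names are kept
back <- as.data.frame(m)     # back to a data frame

is.matrix(m)                        # TRUE
is.data.frame(back)                 # TRUE
identical(dim(df), dim(back))       # same shape
identical(names(df), names(back))   # same column names
all.equal(df, back)                 # compares values and attributes
```

With mixed column types the picture would differ: as.matrix() would coerce every column to a single common type, so the round trip would not restore the original columns.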
Change the column names of olympicmat into
something more human-readable (although in general, usage of succinct
variable names is good practice): replace the column names with their
longer versions shown above (e.g., 100 into
100 meters). Show the first 10 rows after doing so.
colnames(olympicmat) <- c("100 meters", "long jump", "shotput", "high jump",
                          "400 meters", "110-meter hurdles", "discus throw",
                          "pole vault", "javelin", "1500 meters")
rownames(olympicmat)[1:3] <- c("ironman", "wolverine", "hulk")
head(olympicmat, 10)
Add a new athlete to olympicmat, by appending the
row shown below. Make sure to change the row name too. Show the last 10
rows after doing so. Is thor an extraordinary discus thrower? Draw a
histogram using the function hist() to justify your
answer.
thor = c(8.52, 10.31, 16.28, 4.51, 30.12, 13.62, 50.5, 10.1, 100.24, 200.12)
# rbind() returns a new object, so the result must be assigned back to olympicmat.
olympicmat <- rbind(olympicmat, thor)
rownames(olympicmat)[nrow(olympicmat)] <- "thor"
tail(olympicmat, 10)
hist(olympicmat[, "discus throw"], main = "Discus Throw Distribution",
     xlab = "Discus Throw Distance", ylab = "Frequency")
# thor is an extraordinary discus thrower: his distance of 50.5 is above both the mean and the median, and it is the highest of all data points.
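A point that is easy to miss when appending rows: rbind() does not modify its argument, it returns a new data frame. The sketch below shows this on made-up data, with hypothetical names toy and newrow.

```r
toy <- data.frame(speed = c(10.5, 11.0), dist = c(40, 45),
                  row.names = c("a", "b"))
newrow <- c(9.8, 50)

rbind(toy, newrow)         # prints three rows, but toy itself is unchanged
nrow(toy)                  # still 2

toy <- rbind(toy, newrow)  # assigning the result keeps the new row
rownames(toy)[nrow(toy)] <- "c"   # name the appended row explicitly
tail(toy, 1)
```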
Update olympic, by assigning
your olympicmat back into the first object of
olympic. Make sure you refer to objects of a list by their
names (e.g., mylist[["mykey"]]), and not by index (e.g.,
mylist[[3]]).
olympic[["tab"]] <- olympicmat
olympic. Add
year and sporttype to the list in that order,
with those same names.
year = 1998
sporttype = "decathlon"
olympic[["year"]] <- year
olympic[["sporttype"]] <- sporttype
names(olympic)
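Assigning to a list by name either replaces an existing element or appends a new one at the end, which is why the order of the assignments determines the order of the new elements. Here is a minimal sketch on a made-up list, with hypothetical element names and values.

```r
mylist <- list(tab = data.frame(a = 1:2))

# A name that is not yet in the list: the element is appended at the end.
mylist[["year"]] <- 2000
mylist[["kind"]] <- "demo"
names(mylist)                  # "tab" "year" "kind"

# A name that already exists: the element is replaced in place.
mylist[["tab"]] <- data.frame(a = 3:4)
names(mylist)                  # order unchanged: "tab" "year" "kind"
```

Referring to elements by name keeps the code correct even if the position of an element in the list changes later.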