The content of this webpage belongs to Lecture 1, activity 1.
presid_name= c("Obama","Bush","Bush","Clinton","Clinton","Bush Father","Reagan","Reagan","Carter","Nixon","Nixon","Johnson","Kennedy","Eisenhower","Eisenhower","Truman")
winner = c(185, 182, 182, 188, 188, 188, 185, 185, 177, 182, 182, 193, 183, 179, 179, 175)
opponent = c(175, 193, 185, 187, 188, 173, 180, 177, 183, 185, 180, 180, 182, 178, 178, 173)
year = seq(from = 2008, to = 1948, by = -4)
First, create a logical vector stating whether the winner is taller than the opponent:
isWinnerTaller= winner > opponent
Count how many times the winner has been taller than the opponent:
sum(isWinnerTaller)
## [1] 11
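Because `sum()` treats TRUE as 1 and FALSE as 0, the same idea gives the share of elections won by the taller candidate. The following is a minimal sketch that repeats the two height vectors from the top of this page so it runs on its own; the name `share_taller` is only illustrative.

```r
# Heights copied from the vectors defined at the top of this page
winner = c(185, 182, 182, 188, 188, 188, 185, 185, 177, 182, 182, 193, 183, 179, 179, 175)
opponent = c(175, 193, 185, 187, 188, 173, 180, 177, 183, 185, 180, 180, 182, 178, 178, 173)
isWinnerTaller = winner > opponent

# The mean of a logical vector is the proportion of TRUE values (11 out of 16)
share_taller = mean(isWinnerTaller)
share_taller
## [1] 0.6875

# table() counts both outcomes at once
table(isWinnerTaller)
```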
Create a data frame including all the created columns:
df_presidents= data.frame(presid_name, year, winner, opponent, isWinnerTaller)
Return only the first 6 rows of the data frame:
head(df_presidents)
Return the structure of the data frame:
str(df_presidents)
## 'data.frame': 16 obs. of 5 variables:
## $ presid_name : chr "Obama" "Bush" "Bush" "Clinton" ...
## $ year : num 2008 2004 2000 1996 1992 ...
## $ winner : num 185 182 182 188 188 188 185 185 177 182 ...
## $ opponent : num 175 193 185 187 188 173 180 177 183 185 ...
## $ isWinnerTaller: logi TRUE FALSE FALSE TRUE FALSE TRUE ...
Return all the column names of the data frame:
colnames(df_presidents)
## [1] "presid_name" "year" "winner" "opponent"
## [5] "isWinnerTaller"
Return the number of columns of the data frame:
ncol(df_presidents)
## [1] 5
Return the number of rows of the data frame:
nrow(df_presidents)
## [1] 16
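Both counts can also be read in one call with `dim()`, which returns the number of rows followed by the number of columns. This is a small sketch on a hypothetical three-row data frame, `df_example`, built from the first three elections above.

```r
# Hypothetical small data frame using the first three elections
df_example = data.frame(
  winner = c(185, 182, 182),
  opponent = c(175, 193, 185)
)

# dim() returns rows first, then columns
dim(df_example)
## [1] 3 2
```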
Create a new column “difference” that calculates the height difference between the winner and the opponent:
df_presidents$difference = winner - opponent
Return a new data frame without column no. 6, the newly generated “difference” column:
df_presidents [ , -6]
In order to save this change, update df_presidents to this new data frame:
df_presidents = df_presidents [, -6]
Return the new changed data frame:
df_presidents
Add the difference column again:
df_presidents$difference = winner - opponent
Finally, delete the column again, this time by its column name:
df_presidents= df_presidents[ , colnames(df_presidents) != 'difference' ]
Print the final data frame (not including the difference column since it got deleted):
df_presidents
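A third way to drop a column, not used in the steps above, is to assign `NULL` to it. The sketch below assumes a hypothetical three-row data frame, `df_example`, so that it runs on its own.

```r
# Hypothetical small data frame using the first three elections
df_example = data.frame(
  presid_name = c("Obama", "Bush", "Bush"),
  winner = c(185, 182, 182),
  opponent = c(175, 193, 185)
)
df_example$difference = df_example$winner - df_example$opponent

# Assigning NULL removes the column from the data frame
df_example$difference = NULL
colnames(df_example)
## [1] "presid_name" "winner"      "opponent"
```

Like the name-based deletion, this does not depend on the column's position, so it keeps working if columns are added or reordered.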
Return the second column of the data frame:
df_presidents[ , 2]
## [1] 2008 2004 2000 1996 1992 1988 1984 1980 1976 1972 1968 1964 1960 1956 1952
## [16] 1948
# alternative 2: df_presidents [, 'year']
# alternative 3: df_presidents$year
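All three forms return a plain vector rather than a data frame. If a one-column data frame is wanted instead, `drop = FALSE` can be added to the bracket form. A minimal sketch on a hypothetical `df_example`:

```r
# Hypothetical small data frame using the first three elections
df_example = data.frame(
  year = c(2008, 2004, 2000),
  winner = c(185, 182, 182)
)

# Selecting a single column drops the data frame structure by default
as_vector = df_example[, "year"]
# drop = FALSE keeps the result as a data frame with one column
as_frame = df_example[, "year", drop = FALSE]

is.data.frame(as_vector)
## [1] FALSE
is.data.frame(as_frame)
## [1] TRUE
```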
Return the last column:
df_presidents[ , ncol(df_presidents)]
## [1] TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
## [13] TRUE TRUE TRUE TRUE
# alternative: df_presidents[ , 5]
Return a subset of the data including the first 3 rows and columns 3 and 4:
df_presidents [c(1,2,3), c(3,4)]
# alternative: df_presidents [1:3 , c(3,4)]
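The bracket notation also accepts column names and a logical vector for the rows, which avoids hard-coded positions. The following sketch uses a hypothetical four-row data frame, `df_example`, taken from the first four elections.

```r
# Hypothetical small data frame using the first four elections
df_example = data.frame(
  presid_name = c("Obama", "Bush", "Bush", "Clinton"),
  year = c(2008, 2004, 2000, 1996),
  winner = c(185, 182, 182, 188),
  opponent = c(175, 193, 185, 187)
)

# Rows are chosen by a logical condition, columns by name
taller_rows = df_example[df_example$winner > df_example$opponent, c("presid_name", "year")]
taller_rows
```

Here only the 2008 and 1996 elections are kept, since those are the rows where the winner is taller.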
Use the subset() function to get the rows of the data frame where the winner is taller than the opponent:
subset (df_presidents, df_presidents$isWinnerTaller==TRUE)
# or, more succinct: subset (df_presidents, isWinnerTaller)
Get only the winners’ names for the cases where the winner is taller than the opponent:
subset (df_presidents$presid_name, df_presidents$isWinnerTaller==TRUE)
## [1] "Obama" "Clinton" "Bush Father" "Reagan" "Reagan"
## [6] "Nixon" "Johnson" "Kennedy" "Eisenhower" "Eisenhower"
## [11] "Truman"
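`subset()` also has a `select` argument, so rows and columns can be filtered in one call while keeping the result a data frame. This is a sketch on a hypothetical four-row `df_example`, not the full data set.

```r
# Hypothetical small data frame using the first four elections
df_example = data.frame(
  presid_name = c("Obama", "Bush", "Bush", "Clinton"),
  year = c(2008, 2004, 2000, 1996),
  winner = c(185, 182, 182, 188),
  opponent = c(175, 193, 185, 187)
)

# The second argument filters rows; select picks the columns to keep
taller_only = subset(df_example, winner > opponent, select = c(presid_name, year))
taller_only
```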
Create a new vector “party”:
party= c("Dem", "Rep", "Rep", "Dem", "Dem", "Rep", "Rep", "Rep", "Dem","Rep","Rep","Dem","Dem","Rep","Rep","Dem")
Put the column party in the data frame, calling it “presid_party”:
df_presidents$presid_party= party
df_presidents
Return the mean of the winners’ heights, separated by party:
tapply (df_presidents$winner, df_presidents$presid_party, mean)
## Dem Rep
## 184.1429 182.6667
Return the maximum of the winners’ heights, separated by party:
tapply (df_presidents$winner, df_presidents$presid_party, max)
## Dem Rep
## 193 188
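As an alternative to `tapply()`, `aggregate()` with a formula returns the grouped result as a data frame instead of a named vector. The sketch below repeats the `winner` and `party` vectors from this page so it runs on its own; `df_example` and `means_by_party` are illustrative names.

```r
# Vectors copied from this page
winner = c(185, 182, 182, 188, 188, 188, 185, 185, 177, 182, 182, 193, 183, 179, 179, 175)
party = c("Dem", "Rep", "Rep", "Dem", "Dem", "Rep", "Rep", "Rep", "Dem", "Rep", "Rep", "Dem", "Dem", "Rep", "Rep", "Dem")
df_example = data.frame(winner, presid_party = party)

# Formula reads: winner grouped by presid_party
means_by_party = aggregate(winner ~ presid_party, data = df_example, FUN = mean)
means_by_party
```

The result has one row per party, with the same means as the `tapply()` call above.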
Return the mean heights, separated by winner and opponent:
apply (df_presidents [, c("winner", "opponent")] ,2, mean)
## winner opponent
## 183.3125 181.0625
# alternative: colMeans (df_presidents [, c("winner", "opponent")])
Return the standard deviation of the heights, separated by winner and opponent:
apply (df_presidents [, c("winner", "opponent")] , 2, sd)
## winner opponent
## 4.629165 5.579352
Print the standard deviation of “winner” individually:
sd(df_presidents [, "winner"])
## [1] 4.629165
Print the standard deviation of “opponent” individually:
sd(df_presidents [, "opponent"])
## [1] 5.579352
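The mean and the standard deviation of both columns can also be collected in a single table with `sapply()` and a small anonymous function. This is a sketch that repeats the two height vectors from this page; `df_example` and `stats` are illustrative names.

```r
# Heights copied from the vectors defined at the top of this page
winner = c(185, 182, 182, 188, 188, 188, 185, 185, 177, 182, 182, 193, 183, 179, 179, 175)
opponent = c(175, 193, 185, 187, 188, 173, 180, 177, 183, 185, 180, 180, 182, 178, 178, 173)
df_example = data.frame(winner, opponent)

# Each column is passed to the function; the results are bound into a matrix
# with one row per statistic and one column per variable
stats = sapply(df_example, function(x) c(mean = mean(x), sd = sd(x)))
stats
```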