1. Consider the mtcars data set
mtcars[mtcars$gear == 4,]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
The above list has all the cars with 4 forward gears.
mtcars[mtcars$gear == 4 & mtcars$am == 1,]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
The above list has all of the cars with both 4 forward gears and manual transmission.
mtcars[mtcars$gear == 4 | mtcars$am == 1,]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
The above list has all the cars with either 4 forward gears or manual transmission (or both).
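The three subsets above can be compared by counting their rows. This is a small sketch using only the built-in mtcars data; the variable names four_gear, manual, n_gear, n_both and n_any are my own and are not part of the original code.

```r
# Logical vectors marking the rows that meet each condition
four_gear <- mtcars$gear == 4
manual    <- mtcars$am == 1

# Count the rows selected by each condition
n_gear <- nrow(mtcars[four_gear, ])           # 4 forward gears
n_both <- nrow(mtcars[four_gear & manual, ])  # both conditions hold
n_any  <- nrow(mtcars[four_gear | manual, ])  # at least one condition holds

c(gear4 = n_gear, both = n_both, either = n_any)
```

The counts should match the number of rows printed in the three lists above (12, 8 and 17).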
mean(mtcars[mtcars$carb == 2,]$mpg)
## [1] 22.4
The expression mtcars$carb == 2 limits the data to only cars with 2 carburetors, $mpg selects the mpg column from those rows, and mean gives the average of that column.
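The same average can probably be written in a couple of other ways. The sketch below assumes only the built-in mtcars data; mpg_two_carb is a name I made up for the example.

```r
# Select the mpg values first, keeping only cars with 2 carburetors
mpg_two_carb <- mtcars$mpg[mtcars$carb == 2]
mean(mpg_two_carb)

# with() evaluates the expression inside mtcars,
# so the data frame name does not need to be repeated
with(mtcars, mean(mpg[carb == 2]))
```

Both lines should give the same value as the original code, since they select the same rows and the same column.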
?rivers
## starting httpd help server ...
## done
Putting a question mark in front of a data set gives information about that data set. Above is some information about the rivers data.
mean(rivers)
## [1] 591.1844
sd(rivers)
## [1] 493.8708
Mean and sd give the average and standard deviation of the rivers data set.
hist(rivers)
The hist function displays a histogram of the given data set, in this case the rivers data.
summary(rivers)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 135.0 310.0 425.0 591.2 680.0 3710.0
The summary function gives the minimum, quartiles, median, mean and maximum of a given data set, in this case the rivers data.
min(rivers)
## [1] 135
max(rivers)
## [1] 3710
The min and max functions show the minimum and maximum values of a given data set; here the rivers max and min are shown.
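As a small aside, the minimum and maximum can also be obtained in one call. This sketch uses the built-in rivers data; river_range is a name chosen only for this example.

```r
# range() returns the minimum and the maximum together
river_range <- range(rivers)
river_range

# diff() of the range gives the gap between the shortest and longest river
diff(river_range)
```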
rivers[rivers > 1000]
## [1] 1459 1450 1243 2348 1171 3710 2315 2533 1306 1054 1270 1885 1100 1205
## [15] 1038 1770
Above are the lengths of all the rivers that are longer than 1000 miles.
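The comparison rivers > 1000 produces a TRUE/FALSE value for every river, and that logical vector can be used for more than subsetting. A short sketch, with long as a made-up variable name:

```r
long <- rivers > 1000   # logical vector, one TRUE/FALSE per river

sum(long)     # TRUE counts as 1, so this counts the long rivers
which(long)   # positions of the long rivers inside the rivers vector
mean(long)    # proportion of rivers that are longer than 1000 miles
```

The count from sum(long) should agree with the 16 values printed above.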
x <- c(1,2,3)
y <- c(6,5,4)
The above code assigns the vector (1,2,3) to the variable x and the vector (6,5,4) to the variable y.
x*2
## [1] 2 4 6
All of the numbers in the vector x will be multiplied by 2.
x*y
## [1] 6 10 12
The two vectors will be multiplied element by element (1*6, 2*5 and 3*4). This is an element-wise product, not a cross product.
x[1]*y[2]
## [1] 5
The first number in x (1) and the second number in y (5) will be multiplied together.
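To show how the * operator differs from other vector products, here is a small sketch. The helper cross3 is hypothetical (it is written for this example and is not a base R function), and the other variable names are also my own.

```r
x <- c(1, 2, 3)
y <- c(6, 5, 4)

elementwise <- x * y        # 1*6, 2*5, 3*4
dot_product <- sum(x * y)   # a single number: 6 + 10 + 12

# cross3 is a hypothetical helper, not part of base R:
# the cross product of two vectors of length 3
cross3 <- function(a, b) {
  c(a[2] * b[3] - a[3] * b[2],
    a[3] * b[1] - a[1] * b[3],
    a[1] * b[2] - a[2] * b[1])
}
cross_product <- cross3(x, y)

elementwise
dot_product
cross_product
```

The element-wise result is a vector of the same length as x and y, the dot product is one number, and the cross product is a new vector that is different from both.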
sum((1:100)^2)
## [1] 338350
A number with a colon followed by another number will make a vector of all integers from the first number to the second. Here each value in 1:100 is squared, and then all of the squared values are added together.
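The result can be checked against the well-known closed form for the sum of the first n squares, n(n+1)(2n+1)/6. The sketch below is only a check; n, direct and formula are names chosen for this example.

```r
n <- 100

direct  <- sum((1:n)^2)                    # square each integer, then add
formula <- n * (n + 1) * (2 * n + 1) / 6   # closed form for the same sum

direct == formula
```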
str(airquality)
## 'data.frame': 153 obs. of 6 variables:
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
Str shows that there are 153 observations of 6 variables.
What are the names of the variables? Looking above at the str function we can see it also shows the headers of the airquality data set, which are Ozone, Solar.R, Wind, Temp, Month and Day.
What type of data is each variable? Again looking at the str output we see that Ozone, Solar.R, Temp, Month and Day are all integers, and Wind is the only variable stored as num (numeric with decimal values).
Do you agree with the data type that has been given to each variable? What would have been some alternate choices? Most of the data could be interchangeable between integer and numeric, since the values are whole numbers that were saved with the integer data type. Since this doesn’t make a huge difference, I agree with most of the choices made. However, Month could easily have been saved as a character data type with the name of each month displayed instead of the number.
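One way the Month column might be relabeled is sketched below. It uses a factor rather than plain character so that the months keep their calendar order; that is my own choice and not something the assignment requires. The name month_label is made up for the example.

```r
# Storage class of every column in airquality
sapply(airquality, class)

# Month as labeled categories; only months 5 to 9 occur in the data,
# so only those five names are used as levels
month_label <- factor(month.name[airquality$Month],
                      levels = month.name[5:9])

# Number of observations per month
table(month_label)
```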
summary(state.region)
## Northeast South North Central West
## 9 16 12 13
The summary shows there are 4 regions that states can be found in: Northeast, South, North Central and West. There are 9 states in the Northeast, 16 in the South, 12 in the North Central and 13 in the West.
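Because state.region is a factor, the same counts can also be obtained with table, and the factor can be used to pick out the states of one region. A short sketch, with region_counts as a made-up name:

```r
# Count the states in each region
region_counts <- table(state.region)
region_counts

# Share of the 50 states that falls in each region
prop.table(region_counts)

# Names of the states in one region
state.name[state.region == "Northeast"]
```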
state.name[state.area < 10000]
## [1] "Connecticut" "Delaware" "Hawaii" "Massachusetts"
## [5] "New Hampshire" "New Jersey" "Rhode Island" "Vermont"
Above are the names of the 8 states with an area of less than 10000 square miles.
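To see the areas next to the names, the same logical condition can be used on both vectors. This is a sketch only; small and small_states are names invented for the example.

```r
# TRUE for every state with an area under 10000 square miles
small <- state.area < 10000

# Put the matching names and areas side by side
small_states <- data.frame(state = state.name[small],
                           area  = state.area[small])

# Show the smallest states first
small_states[order(small_states$area), ]
```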
state.name[which.min(state.center$y)]
## [1] "Florida"
Since the y axis is the north-south axis (latitude), the minimum value on the y axis will be the furthest south. The code above shows that Florida, the 9th state in the list, has the furthest south geographic center.
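The steps inside that one line can be separated to show what which.min returns. In this sketch idx is a name chosen only for the example.

```r
# which.min() returns the position of the smallest value, not the value itself
idx <- which.min(state.center$y)
idx

# The position is then used to look up the matching name and latitude
state.name[idx]
state.center$y[idx]
```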
library(Lahman)
Since knitting runs the document in a fresh R session, the library must be loaded again inside the document.
a. How many observations of how many variables are there?
str(Batting)
## 'data.frame': 101332 obs. of 22 variables:
## $ playerID: chr "abercda01" "addybo01" "allisar01" "allisdo01" ...
## $ yearID : int 1871 1871 1871 1871 1871 1871 1871 1871 1871 1871 ...
## $ stint : int 1 1 1 1 1 1 1 1 1 1 ...
## $ teamID : Factor w/ 149 levels "ALT","ANA","ARI",..: 136 111 39 142 111 56 111 24 56 24 ...
## $ lgID : Factor w/ 7 levels "AA","AL","FL",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ G : int 1 25 29 27 25 12 1 31 1 18 ...
## $ AB : int 4 118 137 133 120 49 4 157 5 86 ...
## $ R : int 0 30 28 28 29 9 0 66 1 13 ...
## $ H : int 0 32 40 44 39 11 1 63 1 13 ...
## $ X2B : int 0 6 4 10 11 2 0 10 1 2 ...
## $ X3B : int 0 0 5 2 3 1 0 9 0 1 ...
## $ HR : int 0 0 0 2 0 0 0 0 0 0 ...
## $ RBI : int 0 13 19 27 16 5 2 34 1 11 ...
## $ SB : int 0 8 3 1 6 0 0 11 0 1 ...
## $ CS : int 0 1 1 1 2 1 0 6 0 0 ...
## $ BB : int 0 4 2 0 2 0 1 13 0 0 ...
## $ SO : int 0 0 5 2 1 1 0 1 0 0 ...
## $ IBB : int NA NA NA NA NA NA NA NA NA NA ...
## $ HBP : int NA NA NA NA NA NA NA NA NA NA ...
## $ SH : int NA NA NA NA NA NA NA NA NA NA ...
## $ SF : int NA NA NA NA NA NA NA NA NA NA ...
## $ GIDP : int NA NA NA NA NA NA NA NA NA NA ...
Str shows that there are 101332 observations of 22 variables in the Batting data set.
head(Batting)
## playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB
## 1 abercda01 1871 1 TRO NA 1 4 0 0 0 0 0 0 0 0 0
## 2 addybo01 1871 1 RC1 NA 25 118 30 32 6 0 0 13 8 1 4
## 3 allisar01 1871 1 CL1 NA 29 137 28 40 4 5 0 19 3 1 2
## 4 allisdo01 1871 1 WS3 NA 27 133 28 44 10 2 2 27 1 1 0
## 5 ansonca01 1871 1 RC1 NA 25 120 29 39 11 3 0 16 6 2 2
## 6 armstbo01 1871 1 FW1 NA 12 49 9 11 2 1 0 5 0 1 0
## SO IBB HBP SH SF GIDP
## 1 0 NA NA NA NA NA
## 2 0 NA NA NA NA NA
## 3 5 NA NA NA NA NA
## 4 2 NA NA NA NA NA
## 5 1 NA NA NA NA NA
## 6 1 NA NA NA NA NA
Head shows the first six lines of data and the headings for the Batting data set.
max(Batting$X3B, na.rm = TRUE)
## [1] 36
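Max shows the maximum number of triples (X3B) found in any single row of the data set, that is, by one player in one stint of one season. The na.rm = TRUE argument ignores missing values.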
Batting[max(Batting$X3B, na.rm = TRUE),]$playerID
## [1] "ewellge01"
Batting[max(Batting$X3B, na.rm = TRUE),]$yearID
## [1] 1871
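The two lines of code above are the same except for displaying either the playerID or the yearID. Note that they do not actually find the player with the most triples: max returns the value 36, so Batting[36,] is selected, which is simply the 36th row of the data set (ewellge01 in 1871). To select the row that holds the maximum, the position of that row is needed, which is what which.max(Batting$X3B) returns, so Batting[which.max(Batting$X3B),] should be used instead.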
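The difference between indexing by the maximum value and indexing by its position can be seen on a tiny example. The data frame toy below is hypothetical and made up for illustration; it is not taken from the Lahman package.

```r
# Hypothetical data, made up for illustration only
toy <- data.frame(player  = c("a", "b", "c", "d", "e"),
                  triples = c(1, 3, 2, NA, 0))

top_value <- max(toy$triples, na.rm = TRUE)  # the largest value
top_row   <- which.max(toy$triples)          # the position of the largest value

# Indexing by the value picks whichever row has that number
by_value <- as.character(toy[top_value, ]$player)

# Indexing by the position picks the row that holds the maximum
by_position <- as.character(toy[top_row, ]$player)

by_value
by_position
```

Here the largest value is 3, so indexing by the value returns the third row, while which.max points at the second row, which is the one with the most triples. Applied to Batting the same pattern would be Batting[which.max(Batting$X3B),]; its output is not shown here because it was not run in the original document.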