[Video]
# Creating three 3's and four 4's, respectively
rep(3, 3)
## [1] 3 3 3
rep(4, 4)
## [1] 4 4 4 4
# Creating a vector with the first three even numbers and the first three odd numbers
seq(2, 6, by = 2)
## [1] 2 4 6
seq(1, 5, by = 2)
## [1] 1 3 5
# Re-creating the previous four vectors using the 'c' command
c(3, 3, 3)
## [1] 3 3 3
c(4, 4, 4, 4)
## [1] 4 4 4 4
c(2, 4, 6)
## [1] 2 4 6
c(1, 3, 5)
## [1] 1 3 5
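The chunks that follow operate on vectors x, y, and z that were preloaded by the exercise. Judging from the printed results (including the recycling warning further down), they could have been created as follows (an assumption inferred from the outputs, not shown in the original):
# Reconstructed from the outputs below (an assumption):
x <- 1:7           # 1 2 3 4 5 6 7
y <- 2 * (1:7)     # 2 4 6 8 10 12 14
z <- c(1, 1, 2)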
# Add x to y and print
print(x + y)
## [1] 3 6 9 12 15 18 21
# Multiply z by 2 and print
print(2*z)
## [1] 2 2 4
# Multiply x and y by each other and print
print(x*y)
## [1] 2 8 18 32 50 72 98
# Add x to z, if possible, and print
print(x + z)
## Warning in x + z: longer object length is not a multiple of shorter object
## length
## [1] 2 3 5 5 6 8 8
# Create a matrix of all 1's and all 2's that are 2 by 3 and 3 by 2, respectively
matrix(1, nrow = 2, ncol = 3)
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
print(matrix(2, nrow = 3, ncol = 2))
## [,1] [,2]
## [1,] 2 2
## [2,] 2 2
## [3,] 2 2
# Create a matrix, changing the byrow designation
B <- matrix(c(1, 2, 3, 2), nrow = 2, ncol = 2, byrow = FALSE)
B <- matrix(c(1, 2, 3, 2), nrow = 2, ncol = 2, byrow = TRUE)
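The sum below also involves a matrix A that was preloaded by the exercise; for the printed result to hold, A would have to be the 2-by-2 matrix of all 1's (an assumption inferred from the output):
# Inferred from the printed sum below (an assumption, not in the original):
A <- matrix(1, nrow = 2, ncol = 2)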
# Add A to the previously-created matrix
A + B
## [,1] [,2]
## [1,] 2 3
## [2,] 4 3
[Video]
Consider the matrix A created by the R code:
A = matrix(c(1, 2, 3, -1, 0, 3), nrow = 2, ncol = 3, byrow = TRUE)
Which of the following vectors b can be multiplied by A to create Ab?
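Since A has two rows and three columns, A %*% b is defined only for a vector b with three entries, and the product is a 2 x 1 matrix. (The particular b, B, and C used in the chunks below were preloaded by the exercise, with different values across chunks.) A quick check with a hypothetical b:
# Hypothetical length-3 vector, for illustration only
b <- c(1, 0, 1)
A %*% b   # a 2 x 1 matrix: (1 + 0 + 3, -1 + 0 + 3) = (4, 2)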
# Multiply A by b
A%*%b
## [,1]
## [1,] 4
## [2,] 1
# Multiply B by b
B%*%b
## [,1]
## [1,] 0.000000
## [2,] 1.666667
# Multiply A by b
A%*%b
## [,1]
## [1,] -2
## [2,] 1
# Multiply B by b
B%*%b
## [,1]
## [1,] 2
## [2,] -1
# Multiply C by b
C%*%b
## [,1]
## [1,] -8
## [2,] -2
[Video]
The two matrices generated by the R code below are (small) examples of the weight matrices used in neural network models to weight datasets for prediction:
A = matrix(c(1, 3, 2, -1, 0, 1), nrow = 2, ncol = 3)
B = matrix(c(-1, 1, 2, -3), nrow = 2, ncol = 2)
Oftentimes, these collections of weights are applied iteratively, using successive applications of matrix multiplication.
Are A and B compatible in any way in terms of matrix multiplication? Use A%*%B and B%*%A in the console to check. What are the dimensions of the resulting matrix?
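Note that with the A (2 x 3) and B (2 x 2) defined above, B %*% A is conformable (a 2 x 2 times a 2 x 3 gives a 2 x 3 matrix), while A %*% B is not. The 0.7071068 entries printed below equal 1/sqrt(2), so those chunks evidently ran against a different pair of 2 x 2 matrices loaded in the exercise environment. A sketch with the definitions as literally written:
# With the definitions above:
A <- matrix(c(1, 3, 2, -1, 0, 1), nrow = 2, ncol = 3)
B <- matrix(c(-1, 1, 2, -3), nrow = 2, ncol = 2)
try(A %*% B)   # error: non-conformable arguments
B %*% A        # works: a 2 x 3 matrix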
# Multiply A by B
A%*%B
## [,1] [,2]
## [1,] 0.7071068 0.7071068
## [2,] 0.7071068 -0.7071068
# Multiply A on the right of B
B%*%A
## [,1] [,2]
## [1,] 0.7071068 -0.7071068
## [2,] -0.7071068 -0.7071068
# Multiply the product of A and B by the vector b
A%*%B%*%b
## [,1]
## [1,] 1.414214
## [2,] 0.000000
# Multiply A on the right of B, and then by the vector b
B%*%A%*%b
## [,1]
## [1,] 0.000000
## [2,] -1.414214
# Take the inverse of the 2 by 2 identity matrix
solve(diag(2))
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
# Take the inverse of the matrix A
Ainv <- solve(A)
# Multiply A inverse by A
Ainv%*%A
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
# Multiply A by its inverse
A%*%Ainv
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
[Video]
A great deal of applied mathematics and statistics, as well as data science, ends in a matrix-vector equation of the form:
Ax = b
Which of the following is the most correct way to describe what solving this equation for x is trying to accomplish?
- Find the x that, upon some mysterious transformation, makes b.
- Find the x that is a linear combination of the elements of b.
- Create b using a linear combination of the columns of A.
- Create b using a linear combination of the rows of A.
# Print the Massey Matrix M
print(M)
## Atlanta Chicago Connecticut Dallas Indiana Los.Angeles Minnesota New.York
## 1 33 -4 -2 -3 -3 -3 -3 -3
## 2 -4 33 -3 -3 -3 -3 -2 -3
## 3 -2 -3 34 -3 -3 -3 -3 -4
## 4 -3 -3 -3 34 -3 -4 -3 -3
## 5 -3 -3 -3 -3 33 -3 -3 -3
## 6 -3 -3 -3 -4 -3 41 -8 -3
## 7 -3 -2 -3 -3 -3 -8 41 -3
## 8 -3 -3 -4 -3 -3 -3 -3 34
## 9 -3 -3 -4 -2 -3 -6 -4 -3
## 10 -3 -3 -3 -3 -3 -3 -3 -2
## 11 -3 -3 -3 -3 -2 -2 -3 -3
## 12 -3 -3 -3 -4 -4 -3 -6 -4
## Phoenix San.Antonio Seattle Washington
## 1 -3 -3 -3 -3
## 2 -3 -3 -3 -3
## 3 -4 -3 -3 -3
## 4 -2 -3 -3 -4
## 5 -3 -3 -2 -4
## 6 -6 -3 -2 -3
## 7 -4 -3 -3 -6
## 8 -3 -2 -3 -4
## 9 38 -3 -4 -3
## 10 -3 32 -4 -2
## 11 -4 -4 33 -3
## 12 -3 -2 -3 38
# Print the vector of point differentials f
print(f)
## Differential
## 1 -135
## 2 -171
## 3 152
## 4 -104
## 5 -308
## 6 292
## 7 420
## 8 83
## 9 -4
## 10 -213
## 11 -5
## 12 -7
## 13 0
# Find the sum of the first column of M
sum(M[, 1])
## [1] 0
# Find the sum of the vector f
sum(f)
## [1] 0
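In fact, every column of M sums to zero, not only the first: each team's games-played count on the diagonal is exactly cancelled by the games-against counts in the rest of the column. Similarly, sum(f) is zero because every point scored by one team is a point allowed by another. The zero column sums mean the rows of M are linearly dependent, which is why (as the next exercise shows) M has no inverse. A quick check, assuming the M printed above:
# All twelve column sums are zero, so M cannot be inverted
colSums(M)
qr(as.matrix(M))$rank   # expect 11, one short of full rank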
[Video]
For our WNBA Massey Matrix model, some adjustments need to be made for a solution to our rating problem to exist and be unique.
To see this, notice that the following code produces an error:
# Attempt to invert the 12 x 12 Massey matrix M printed above
solve(M)
## Error in solve.default(M) : system is computationally singular: reciprocal
## condition number = 3.06615e-17
Which of the conditions does M explicitly violate in this case?
In two dimensions, the solution structure of a system of two equations in two unknowns can be understood in a straightforward way via pictures, with the two equations representing lines (this is why it’s called linear algebra) in the x-y (or x1-x2) plane. A solution is any point (x, y) (equivalently, (x1, x2)) where the two lines intersect.
Which of the following three graphs is that of a linear system of two equations with two unknowns that has no solutions?
For our WNBA Massey Matrix model, some adjustments need to be made for a solution to our rating problem to exist and be unique. This is because the matrix M, printed in a previous exercise, usually does not (computationally) have an inverse, as shown by the error produced from running solve(M).
One way we can change this is to add a row of 1’s on the bottom of the matrix M, a column of -1’s to the far right of M, and a 0 to the bottom of the vector of point differentials f⃗.
What does that row of 1’s represent in the setting of rating teams? In other words, what does the final equation stipulate?
# Add a row of 1's
M_2 <- rbind(M, rep(1, 12))
# Add a column of -1's
M_3 <- cbind(M_2, rep(-1, 13))
# Change the element in the lower-right corner of the matrix
M_3[13, 13] <- 1
# Print M_3
print(M_3)
## Atlanta Chicago Connecticut Dallas Indiana Los.Angeles Minnesota New.York
## 1 33 -4 -2 -3 -3 -3 -3 -3
## 2 -4 33 -3 -3 -3 -3 -2 -3
## 3 -2 -3 34 -3 -3 -3 -3 -4
## 4 -3 -3 -3 34 -3 -4 -3 -3
## 5 -3 -3 -3 -3 33 -3 -3 -3
## 6 -3 -3 -3 -4 -3 41 -8 -3
## 7 -3 -2 -3 -3 -3 -8 41 -3
## 8 -3 -3 -4 -3 -3 -3 -3 34
## 9 -3 -3 -4 -2 -3 -6 -4 -3
## 10 -3 -3 -3 -3 -3 -3 -3 -2
## 11 -3 -3 -3 -3 -2 -2 -3 -3
## 12 -3 -3 -3 -4 -4 -3 -6 -4
## 13 1 1 1 1 1 1 1 1
## Phoenix San.Antonio Seattle Washington rep(-1, 13)
## 1 -3 -3 -3 -3 -1
## 2 -3 -3 -3 -3 -1
## 3 -4 -3 -3 -3 -1
## 4 -2 -3 -3 -4 -1
## 5 -3 -3 -2 -4 -1
## 6 -6 -3 -2 -3 -1
## 7 -4 -3 -3 -6 -1
## 8 -3 -2 -3 -4 -1
## 9 38 -3 -4 -3 -1
## 10 -3 32 -4 -2 -1
## 11 -4 -4 33 -3 -1
## 12 -3 -2 -3 38 -1
## 13 1 1 1 1 1
# Find the inverse of the adjusted matrix M_3
solve(M_3)
## [,1] [,2] [,3] [,4] [,5]
## Atlanta 0.032449804 0.005402927 0.003876665 0.004630004 0.004629590
## Chicago 0.005402927 0.032446789 0.004608094 0.004626913 0.004628272
## Connecticut 0.003876665 0.004608094 0.031714805 0.004613451 0.004629714
## Dallas 0.004630004 0.004626913 0.004613451 0.031707219 0.004649172
## Indiana 0.004629590 0.004628272 0.004629714 0.004649172 0.032447936
## Los.Angeles 0.004626242 0.004554829 0.004676789 0.005214940 0.004652111
## Minnesota 0.004611109 0.003985203 0.004651940 0.004727810 0.004678479
## New.York 0.004609212 0.004627729 0.005362761 0.004647832 0.004649262
## Phoenix 0.004610546 0.004608018 0.005295038 0.004013187 0.004613089
## San.Antonio 0.004630254 0.004631081 0.004608596 0.004609009 0.004587382
## Seattle 0.004629212 0.004631185 0.004646217 0.004595132 0.003854641
## Washington 0.004627769 0.004582295 0.004649264 0.005298666 0.005313685
## rep(-1, 13) -0.083333333 -0.083333333 -0.083333333 -0.083333333 -0.083333333
## [,6] [,7] [,8] [,9] [,10]
## Atlanta 0.004626242 0.004611109 0.004609212 0.004610546 0.004630254
## Chicago 0.004554829 0.003985203 0.004627729 0.004608018 0.004631081
## Connecticut 0.004676789 0.004651940 0.005362761 0.005295038 0.004608596
## Dallas 0.005214940 0.004727810 0.004647832 0.004013187 0.004609009
## Indiana 0.004652111 0.004678479 0.004649262 0.004613089 0.004587382
## Los.Angeles 0.027807608 0.007319076 0.004637275 0.006363490 0.004606288
## Minnesota 0.007319076 0.027810474 0.004677632 0.005388578 0.004578013
## New.York 0.004637275 0.004677632 0.031716432 0.004648253 0.003835528
## Phoenix 0.006363490 0.005388578 0.004648253 0.029212019 0.004646110
## San.Antonio 0.004606288 0.004578013 0.003835528 0.004646110 0.033267202
## Seattle 0.004032687 0.004573214 0.004607331 0.005265228 0.005427397
## Washington 0.004841998 0.006331805 0.005314087 0.004669776 0.003906474
## rep(-1, 13) -0.083333333 -0.083333333 -0.083333333 -0.083333333 -0.083333333
## [,11] [,12] [,13]
## Atlanta 0.004629212 0.004627769 8.333333e-02
## Chicago 0.004631185 0.004582295 8.333333e-02
## Connecticut 0.004646217 0.004649264 8.333333e-02
## Dallas 0.004595132 0.005298666 8.333333e-02
## Indiana 0.003854641 0.005313685 8.333333e-02
## Los.Angeles 0.004032687 0.004841998 8.333333e-02
## Minnesota 0.004573214 0.006331805 8.333333e-02
## New.York 0.004607331 0.005314087 8.333333e-02
## Phoenix 0.005265228 0.004669776 8.333333e-02
## San.Antonio 0.005427397 0.003906474 8.333333e-02
## Seattle 0.032485332 0.004585756 8.333333e-02
## Washington 0.004585756 0.029211757 8.333333e-02
## rep(-1, 13) -0.083333333 -0.083333333 2.220446e-16
[Video]
As we saw in the video, solving matrix-vector equations is as simple as multiplying both sides of the equation by A’s inverse, A⁻¹, should it exist. The analogy with solving linear equations like 5x = 7 is a good one.
If A⁻¹ doesn’t exist, this does not work. The equivalent analogy for linear equations would be a situation in which the coefficient in front of the x were 0, which is the only real number that does not have an inverse. Which of the following does NOT analogize in this situation?
# Solve for r and rename column
r <- solve(M)%*%f
colnames(r) <- "Rating"
# Print r
print(r)
## Rating
## Atlanta -4.012938e+00
## Chicago -5.156260e+00
## Connecticut 4.309525e+00
## Dallas -2.608129e+00
## Indiana -8.532958e+00
## Los.Angeles 7.850327e+00
## Minnesota 1.061241e+01
## New.York 2.541565e+00
## Phoenix 8.979110e-01
## San.Antonio -6.181574e+00
## Seattle -2.666953e-01
## Washington 5.468121e-01
## WNBA 1.043610e-14
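As an aside, you rarely need the explicit inverse: solve() can take the right-hand side directly, which avoids forming solve(M) and is faster and more numerically stable. A minimal sketch, assuming M and f are the numeric matrices used above:
# Solve M r = f in one step, without forming the inverse explicitly
r_direct <- solve(M, f)
head(r_direct)   # same ratings as above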
The dplyr package has been loaded for you, as has the solution to the previous question. The arrange() function in dplyr lets you reorder the rows of a data frame by the values of one of its columns.
In the previous exercise, you rated the teams at the end of the 2017 WNBA season using the solution to a matrix-vector equation.
Using the syntax
arrange(r, -Rating)
we can see which team was the best in the WNBA in 2017 (the negative sign (“-”) in front of the ordering variable Rating puts the values in descending order, as opposed to the ascending order used when Rating appears alone).
Which team was the best?
# arrange(r, -Rating)
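Since r is a matrix, arrange() needs a data frame to work with. A minimal sketch, assuming the r computed above:
# Convert the rating matrix to a data frame, then sort descending
library(dplyr)
r_df <- data.frame(Team = rownames(r), Rating = r[, "Rating"])
arrange(r_df, -Rating)   # Minnesota (rating ~10.6) comes out on top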
[Video]
Which of the following was NOT proposed as a method to solve matrix-vector equations with non-square matrices?
# Print M
print(M)
## Atlanta Chicago Connecticut Dallas Indiana Los.Angeles Minnesota New.York
## [1,] 33 -4 -2 -3 -3 -3 -3 -3
## [2,] -4 33 -3 -3 -3 -3 -2 -3
## [3,] -2 -3 34 -3 -3 -3 -3 -4
## [4,] -3 -3 -3 34 -3 -4 -3 -3
## [5,] -3 -3 -3 -3 33 -3 -3 -3
## [6,] -3 -3 -3 -4 -3 41 -8 -3
## [7,] -3 -2 -3 -3 -3 -8 41 -3
## [8,] -3 -3 -4 -3 -3 -3 -3 34
## [9,] -3 -3 -4 -2 -3 -6 -4 -3
## [10,] -3 -3 -3 -3 -3 -3 -3 -2
## [11,] -3 -3 -3 -3 -2 -2 -3 -3
## [12,] -3 -3 -3 -4 -4 -3 -6 -4
## [13,] 1 1 1 1 1 1 1 1
## Phoenix San.Antonio Seattle Washington WNBA
## [1,] -3 -3 -3 -3 -1
## [2,] -3 -3 -3 -3 -1
## [3,] -4 -3 -3 -3 -1
## [4,] -2 -3 -3 -4 -1
## [5,] -3 -3 -2 -4 -1
## [6,] -6 -3 -2 -3 -1
## [7,] -4 -3 -3 -6 -1
## [8,] -3 -2 -3 -4 -1
## [9,] 38 -3 -4 -3 -1
## [10,] -3 32 -4 -2 -1
## [11,] -4 -4 33 -3 -1
## [12,] -3 -2 -3 38 -1
## [13,] 1 1 1 1 1
# Find the rating vector the conventional way
r <- solve(M)%*%f
colnames(r) <- "Rating"
print(r)
## Rating
## Atlanta -4.012938e+00
## Chicago -5.156260e+00
## Connecticut 4.309525e+00
## Dallas -2.608129e+00
## Indiana -8.532958e+00
## Los.Angeles 7.850327e+00
## Minnesota 1.061241e+01
## New.York 2.541565e+00
## Phoenix 8.979110e-01
## San.Antonio -6.181574e+00
## Seattle -2.666953e-01
## Washington 5.468121e-01
## WNBA 1.043610e-14
# Find the rating vector using ginv() from the MASS package
r <- ginv(M)%*%f
colnames(r) <- "Rating"
print(r)
## Rating
## [1,] -4.012938e+00
## [2,] -5.156260e+00
## [3,] 4.309525e+00
## [4,] -2.608129e+00
## [5,] -8.532958e+00
## [6,] 7.850327e+00
## [7,] 1.061241e+01
## [8,] 2.541565e+00
## [9,] 8.979110e-01
## [10,] -6.181574e+00
## [11,] -2.666953e-01
## [12,] 5.468121e-01
## [13,] 5.773160e-14
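The pseudoinverse from ginv() also handles genuinely non-square systems, returning the least-squares solution. A small sketch with a hypothetical overdetermined system (A_tall and b_tall are made up for illustration):
library(MASS)
# Three equations in two unknowns: no exact solution in general
A_tall <- matrix(c(1, 1, 1, 0, 1, 2), nrow = 3, ncol = 2)
b_tall <- c(1, 2, 2)
ginv(A_tall) %*% b_tall    # least-squares solution (approx. 1.167 and 0.5)
qr.solve(A_tall, b_tall)   # the same solution via QR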
[Video]
Matrix-Vector Multiplications
Scalar multiplication: a scalar c times a vector \(\vec{x}\), written \(c\vec{x}\).
[Video]
In data science, what does the “big” in the term “big data” generally refer to?
# Print the first 6 observations of the dataset
head(combine)
## player position school year height weight forty vertical
## 1 Jaire Alexander CB Louisville 2018 71 192 4.38 35.0
## 2 Brian Allen C Michigan St. 2018 73 298 5.34 26.5
## 3 Mark Andrews TE Oklahoma 2018 77 256 4.67 31.0
## 4 Troy Apke S Penn St. 2018 74 198 4.34 41.0
## 5 Dorance Armstrong EDGE Kansas 2018 76 257 4.87 30.0
## 6 Ade Aruna DE Tulane 2018 78 262 4.60 38.5
## bench broad_jump three_cone shuttle
## 1 14 127 6.71 3.98
## 2 27 99 7.81 4.71
## 3 17 113 7.34 4.38
## 4 16 131 6.56 4.03
## 5 20 118 7.12 4.23
## 6 18 128 7.53 4.48
## drafted
## 1 Green Bay Packers / 1st / 18th pick / 2018
## 2 Los Angeles Rams / 4th / 111th pick / 2018
## 3 Baltimore Ravens / 3rd / 86th pick / 2018
## 4
## 5
## 6 Minnesota Vikings / 6th / 218th pick / 2018
# Find the correlation between variables forty and three_cone
cor(combine$forty, combine$three_cone)
## [1] 0.8315171
# Find the correlation between variables vertical and broad_jump
cor(combine$vertical, combine$broad_jump)
## [1] 0.8163375
Given the results of the previous parts of the exercise, what can you say about the dataset combine at this point?
- forty and three_cone are the only redundant variables we’ve found so far.
- vertical and broad_jump are the only redundant variables we’ve found so far.
[Video]
If the covariance between two columns of a matrix is positive and large, what can we say?
# Extract columns 5-12 of combine
A <- combine[, 5:12]
# Make A into a matrix
A <- as.matrix(A)
# Subtract the mean of each column
A[, 1] <- A[, 1] - mean(A[, 1])
A[, 2] <- A[, 2] - mean(A[, 2])
A[, 3] <- A[, 3] - mean(A[, 3])
A[, 4] <- A[, 4] - mean(A[, 4])
A[, 5] <- A[, 5] - mean(A[, 5])
A[, 6] <- A[, 6] - mean(A[, 6])
A[, 7] <- A[, 7] - mean(A[, 7])
A[, 8] <- A[, 8] - mean(A[, 8])
# Create the covariance matrix B = t(A) %*% A / (nrow(A) - 1), per the equation in the instructions
B <- t(A)%*%A/(nrow(A) - 1)
# Compare 1st element of the 1st column of B to the variance of the first column of A
B[1,1]
## [1] 7.159794
var(A[, 1])
## [1] 7.159794
# Compare 1st element of 2nd column of B to the 1st element of the 2nd row of B to the covariance between the first two columns of A
B[1, 2]
## [1] 90.78808
B[2, 1]
## [1] 90.78808
cov(A[, 1], A[, 2])
## [1] 90.78808
# Find the eigenvalues (and eigenvectors) of B
V <- eigen(B)
# Print eigenvalues
V$values
## [1] 2.187628e+03 4.403246e+01 2.219205e+01 5.267129e+00 2.699702e+00
## [6] 6.317016e-02 1.480866e-02 1.307283e-02
The eigenvalues of B are, when rounding to four digits,
2187.6283 44.0325 22.1921 5.2671 2.6997 0.0632 0.0148 0.0131
Roughly how much of the variability in the dataset can be explained by the first principal component? (That is, the first eigenvalue as a share of the total: 2187.6283 / 2261.9108 ≈ 97%.)
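The eight centering lines and the matrix product above can be condensed using scale() and cov(), which should reproduce the same covariance matrix (a sketch, assuming the combine data above):
# Mean-center all eight columns at once and form the covariance matrix
A_centered <- scale(as.matrix(combine[, 5:12]), center = TRUE, scale = FALSE)
B_cov <- cov(A_centered)    # equals t(A_centered) %*% A_centered / (n - 1)
eigen(B_cov)$values         # matches the eigenvalues printed above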
[Video]
# Scale columns 5-12 of combine
B <- scale(combine[, 5:12])
# Print the first 6 rows of the data
head(B)
## height weight forty vertical bench broad_jump
## [1,] -1.11844839 -1.30960025 -1.3435337 0.5624657 -1.1089286 1.45502476
## [2,] -0.37100257 1.00066356 1.6449741 -1.4281627 0.9238361 -1.49512459
## [3,] 1.12388907 0.08527601 -0.4407553 -0.3743006 -0.6398290 -0.02004991
## [4,] 0.00272034 -1.17883060 -1.4680548 1.9676151 -0.7961955 1.87647467
## [5,] 0.75016616 0.10707096 0.1818505 -0.6084922 -0.1707295 0.50676247
## [6,] 1.49761199 0.21604566 -0.6586673 1.3821362 -0.4834625 1.56038724
## three_cone shuttle
## [1,] -1.38083506 -1.5879750
## [2,] 1.16888714 1.1170258
## [3,] 0.07946038 -0.1057828
## [4,] -1.72852445 -1.4027010
## [5,] -0.43048406 -0.6616049
## [6,] 0.51986694 0.2647653
# Summarize the principal component analysis
summary(prcomp(B))
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.3679 0.9228 0.78904 0.61348 0.46811 0.37178 0.34834
## Proportion of Variance 0.7009 0.1064 0.07782 0.04704 0.02739 0.01728 0.01517
## Cumulative Proportion 0.7009 0.8073 0.88514 0.93218 0.95957 0.97685 0.99202
## PC8
## Standard deviation 0.25266
## Proportion of Variance 0.00798
## Cumulative Proportion 1.00000
# Subset combine only to "WR"
combine_WR <- subset(combine, position == "WR")
# Scale columns 5-12 of combine_WR
B <- scale(combine_WR[, 5:12])
# Print the first 6 rows of the data
head(B)
## height weight forty vertical bench broad_jump
## 7 1.4022982 0.88324903 1.20674474 -0.3430843 -0.3223377 0.07414249
## 17 0.5575402 -0.09700717 -0.80129388 -0.4969965 -0.7938424 -0.95388361
## 18 0.9799192 1.58343202 0.88968601 1.0421255 0.8564239 1.61618163
## 25 0.9799192 1.16332222 1.41811723 -1.5743819 -0.7938424 -1.29655897
## 29 -1.1319757 -1.56739147 -0.80129388 -0.1891721 -0.0865854 -1.29655897
## 46 0.1351613 0.11304773 0.04419607 0.2725645 -1.0295947 0.24548017
## three_cone shuttle
## 7 0.712845019 0.02833449
## 17 -1.098542478 0.84141123
## 18 -1.853287268 -1.46230619
## 25 -1.148858797 0.50262926
## 29 0.008416548 -0.64922946
## 46 0.109049187 0.84141123
# Summarize the principal component analysis
summary(prcomp(B))
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.5425 1.4255 1.0509 0.9603 0.77542 0.63867 0.59792
## Proportion of Variance 0.2974 0.2540 0.1380 0.1153 0.07516 0.05099 0.04469
## Cumulative Proportion 0.2974 0.5514 0.6894 0.8047 0.87987 0.93085 0.97554
## PC8
## Standard deviation 0.44235
## Proportion of Variance 0.02446
## Cumulative Proportion 1.00000
In the last exercise, you looked at the PCA of just the wide receivers in the NFL combine data. The PCA summaries for the whole combine dataset and for the wide-receiver subset are loaded as pca_summary and pca_summary_wr, respectively.
What is true about this data in relation to the dataset as a whole?
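To compare the two fits directly, the importance matrices inside the summary objects can be inspected (a sketch, assuming pca_summary and pca_summary_wr are the loaded summary.prcomp objects):
# Cumulative variance explained by the first two components in each case
pca_summary$importance["Cumulative Proportion", 1:2]      # full combine data
pca_summary_wr$importance["Cumulative Proportion", 1:2]   # wide receivers only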
[Video]
Michael is a hybrid thinker and doer—a byproduct of being a StrengthsFinder “Learner” over time. With 20+ years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands—ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.
Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate to receive his MS in Applied Analytics from Columbia University.
LinkedIn | Twitter | www.michaelmallari.com/data | www.columbia.edu/~mm5470