I used the “:” operator to create a sequence x of the integers 2 through 10. Then I displayed the length of x.
x = 2:10
x
## [1] 2 3 4 5 6 7 8 9 10
length(x)
## [1] 9
In this example I use seq() to create the same sequence, writing “seq(from, to)” instead of “from:to”.
seq(2,10)
## [1] 2 3 4 5 6 7 8 9 10
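As a quick check (a sketch, not part of the original exercise), the two forms can be compared directly. The `by` argument at the end is an extra feature of seq() that the “:” operator does not offer.

```r
a <- 2:10
b <- seq(2, 10)

# Both forms produce the same nine values
all(a == b)

# seq() can also step by something other than 1
evens <- seq(2, 10, by = 2)
evens
```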
I used the rnorm() function to generate a vector of 20 random normal values.
rnorm(20)
## [1] -0.03177165 0.24706403 -1.17940826 -0.16321583 0.59883704 0.53004176
## [7] 0.81720784 0.98077250 0.10380944 0.84273593 -0.49502044 -0.70402202
## [13] 1.38605278 2.81479273 -0.26258951 -0.49488177 -0.16289588 0.72066460
## [19] -0.88644209 0.80971229
I called set.seed() before rnorm() so that the same vector of 20 values is generated every time the code runs.
set.seed(2021)
rnorm(20)
## [1] -0.12245998 0.55245663 0.34864950 0.35963224 0.89805369 -1.92256952
## [7] 0.26174436 0.91556637 0.01377194 1.72996316 -1.08220485 -0.27282518
## [13] 0.18199540 1.50854179 1.60447011 -1.84147561 1.62331021 0.13138902
## [19] 1.48112247 1.51331829
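A minimal sketch of why the seed matters: resetting the seed to the same value replays the same random stream, while drawing again without a reset continues the stream and gives different values. The variable names here are only illustrative.

```r
set.seed(2021)
first_draw <- rnorm(5)

set.seed(2021)            # reset the generator to the same state
second_draw <- rnorm(5)

identical(first_draw, second_draw)   # TRUE: same seed, same values

third_draw <- rnorm(5)    # no reset, so the stream simply continues
identical(first_draw, third_draw)    # FALSE
```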
I used seq() to create a sequence of 50 evenly spaced values from -pi to pi. In addition to the start and end, I had to set “length.out” equal to 50.
seq(-pi,pi,length.out=50)
## [1] -3.14159265 -3.01336438 -2.88513611 -2.75690784 -2.62867957 -2.50045130
## [7] -2.37222302 -2.24399475 -2.11576648 -1.98753821 -1.85930994 -1.73108167
## [13] -1.60285339 -1.47462512 -1.34639685 -1.21816858 -1.08994031 -0.96171204
## [19] -0.83348377 -0.70525549 -0.57702722 -0.44879895 -0.32057068 -0.19234241
## [25] -0.06411414 0.06411414 0.19234241 0.32057068 0.44879895 0.57702722
## [31] 0.70525549 0.83348377 0.96171204 1.08994031 1.21816858 1.34639685
## [37] 1.47462512 1.60285339 1.73108167 1.85930994 1.98753821 2.11576648
## [43] 2.24399475 2.37222302 2.50045130 2.62867957 2.75690784 2.88513611
## [49] 3.01336438 3.14159265
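To see how “length.out” sets the spacing, the sketch below checks that consecutive values differ by (to - from) / (length.out - 1), which here is 2 * pi / 49. The names `s` and `step` are introduced only for this illustration.

```r
s <- seq(-pi, pi, length.out = 50)

length(s)              # 50 points, including both endpoints
step <- 2 * pi / 49    # (to - from) / (length.out - 1)

# Every gap between neighbours equals the step (up to rounding error)
all.equal(diff(s), rep(step, 49))
```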
I used the “rep(value, times)” function to repeat 0 ten times.
rep(0,10)
## [1] 0 0 0 0 0 0 0 0 0 0
I used “rep()” to repeat “NA” 10 times.
rep(NA,10)
## [1] NA NA NA NA NA NA NA NA NA NA
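One detail worth noting, shown here as a small sketch: the two vectors above have different types, because a bare NA is a logical value. When the vector is meant to hold numbers later, a typed NA such as NA_real_ can be used instead. The last two lines show the “times” and “each” arguments of rep().

```r
zeros <- rep(0, 10)
missing_vals <- rep(NA, 10)

class(zeros)          # "numeric"
class(missing_vals)   # "logical": a bare NA is a logical value

# Use a typed NA when the vector should be numeric from the start
missing_num <- rep(NA_real_, 10)
class(missing_num)    # "numeric"

# rep() can repeat a whole vector or each element in turn
rep(c(1, 2), times = 3)   # 1 2 1 2 1 2
rep(c(1, 2), each = 3)    # 1 1 1 2 2 2
```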
I used the “set.seed()” function to set the seed to 2021. I then created two vectors, x and y, each holding 100 random normal values generated from that seed. This graph is the result.
set.seed(2021)
x = rnorm(100)
y = rnorm(100)
plot(x,y)
I made the previous graph look nicer by adding a title and axis labels and by changing the plotting character and color of the points.
set.seed(2021)
x = rnorm(100)
y = rnorm(100)
plot(x, y,
     main = "Scatter Plot of X and Y",
     xlab = "this is the x-axis",
     ylab = "this is the y-axis",
     pch = 9, col = "blue")
I first created a 4x4 matrix with values 1-16 arranged by row using the “matrix()” function.
my_matrix = matrix(1:16, nrow = 4, byrow = TRUE)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16
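For comparison, here is a short sketch of what happens without “byrow = TRUE”: matrix() fills column by column by default, so the same 16 values end up in the transposed layout. The names `by_col` and `by_row` are only for this example.

```r
by_col <- matrix(1:16, nrow = 4)                 # default: fill down the columns
by_row <- matrix(1:16, nrow = 4, byrow = TRUE)   # fill across the rows

by_col[1, ]   # 1 5 9 13
by_row[1, ]   # 1 2 3 4

# The two layouts are transposes of each other
identical(t(by_col), by_row)
```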
I used matrix indexing, [row, column], to extract the value 10.
value_10 = my_matrix[3, 2]
value_10
## [1] 10
This is the first row of the matrix.
first_row = my_matrix[1,]
first_row
## [1] 1 2 3 4
This is the last column of the matrix.
last_column = my_matrix[, 4]
last_column
## [1] 4 8 12 16
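Note that selecting a single row or column returns a plain vector, which is why the outputs above have no row or column labels. If the matrix shape should be kept, “drop = FALSE” can be added to the index, as in this small sketch (the matrix `m` is rebuilt here so the example runs on its own).

```r
m <- matrix(1:16, nrow = 4, byrow = TRUE)

row_vec <- m[1, ]                  # plain vector of length 4
row_mat <- m[1, , drop = FALSE]    # still a 1 x 4 matrix

dim(row_vec)   # NULL
dim(row_mat)   # 1 4
```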
I created a sub-matrix of just the first two rows and the first two columns.
submatrix = my_matrix[1:2, 1:2]
submatrix
##      [,1] [,2]
## [1,]    1    2
## [2,]    5    6
I used the “dim()” function to display the dimensions of my original matrix.
matrix_dimension = dim(my_matrix)
matrix_dimension
## [1] 4 4
I used the “library()” function to attach the ISLR package. Then I loaded the “Wage” dataset with “data()”. Lastly, I displayed the dimensions of “Wage” with the “dim()” function.
library(ISLR)
data("Wage")
dim(Wage)
## [1] 3000 11
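Note: calling “dim(wage)” worked in my RStudio session, but knitting the Rmd file failed because it did not know what “wage” is. R is case-sensitive and the dataset is named “Wage”. Knitting runs in a fresh R session, so presumably a lowercase “wage” object existed only in my interactive workspace.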
I used the “str()” function to show the structure of the dataset: each variable, its type, and its first few values.
str(Wage)
## 'data.frame': 3000 obs. of 11 variables:
## $ year : int 2006 2004 2003 2003 2005 2008 2009 2008 2006 2004 ...
## $ age : int 18 24 45 43 50 54 44 30 41 52 ...
## $ maritl : Factor w/ 5 levels "1. Never Married",..: 1 1 2 2 4 2 2 1 1 2 ...
## $ race : Factor w/ 4 levels "1. White","2. Black",..: 1 1 1 3 1 1 4 3 2 1 ...
## $ education : Factor w/ 5 levels "1. < HS Grad",..: 1 4 3 4 2 4 3 3 3 2 ...
## $ region : Factor w/ 9 levels "1. New England",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ jobclass : Factor w/ 2 levels "1. Industrial",..: 1 2 1 2 2 2 1 2 2 2 ...
## $ health : Factor w/ 2 levels "1. <=Good","2. >=Very Good": 1 2 1 2 1 2 2 1 2 2 ...
## $ health_ins: Factor w/ 2 levels "1. Yes","2. No": 2 2 1 1 1 1 1 1 1 1 ...
## $ logwage : num 4.32 4.26 4.88 5.04 4.32 ...
## $ wage : num 75 70.5 131 154.7 75 ...
I used the “summary(object)” function to show summary information for all the variables in the dataset. “summary()” is a generic function, so it reports quantiles and the mean for numeric variables and level counts for factors.
summary(Wage)
## year age maritl race
## Min. :2003 Min. :18.00 1. Never Married: 648 1. White:2480
## 1st Qu.:2004 1st Qu.:33.75 2. Married :2074 2. Black: 293
## Median :2006 Median :42.00 3. Widowed : 19 3. Asian: 190
## Mean :2006 Mean :42.41 4. Divorced : 204 4. Other: 37
## 3rd Qu.:2008 3rd Qu.:51.00 5. Separated : 55
## Max. :2009 Max. :80.00
##
## education region jobclass
## 1. < HS Grad :268 2. Middle Atlantic :3000 1. Industrial :1544
## 2. HS Grad :971 1. New England : 0 2. Information:1456
## 3. Some College :650 3. East North Central: 0
## 4. College Grad :685 4. West North Central: 0
## 5. Advanced Degree:426 5. South Atlantic : 0
## 6. East South Central: 0
## (Other) : 0
## health health_ins logwage wage
## 1. <=Good : 858 1. Yes:2083 Min. :3.000 Min. : 20.09
## 2. >=Very Good:2142 2. No : 917 1st Qu.:4.447 1st Qu.: 85.38
## Median :4.653 Median :104.92
## Mean :4.654 Mean :111.70
## 3rd Qu.:4.857 3rd Qu.:128.68
## Max. :5.763 Max. :318.34
##
I used the “quantile(x,probs)” function to get the 90th percentile for ‘wage’ in the “Wage” dataset.
quantile(Wage$wage, 0.9)
## 90%
## 154.7036
I used the “table()” function to show the education levels of workers whose wage is above the 90th percentile. “table()” uses the cross-classifying factors it is given to build a contingency table of the counts at each combination of factor levels. With a single factor, as here, that is simply the count for each education level.
table(Wage[Wage$wage > quantile(Wage$wage, 0.90), "education"])
##
## 1. < HS Grad 2. HS Grad 3. Some College 4. College Grad
## 0 18 28 105
## 5. Advanced Degree
## 149
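The same filter-then-count pattern can be sketched on a tiny made-up data frame. The names `toy`, `pay`, and `level` are hypothetical and the values are not from the Wage data. It also shows why a level with no matching rows, like “1. < HS Grad” above, still appears with a count of 0: subsetting keeps all factor levels unless they are dropped explicitly.

```r
# Hypothetical data, not taken from the Wage dataset
toy <- data.frame(
  pay   = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
  level = factor(c("A", "A", "B", "B", "B", "C", "C", "C", "C", "C"))
)

cutoff <- quantile(toy$pay, 0.9)   # 90th percentile of pay
top <- toy[toy$pay > cutoff, ]     # rows strictly above the cutoff

table(top$level)                   # unused levels are kept with count 0
table(droplevels(top$level))       # drop levels that no longer occur
```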
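I used the “plot()” function on the whole data frame, which draws a scatterplot matrix of every pair of variables. It reveals little visible relationship between wage and most of the other variables. The only pairs showing a clear relationship are wage with logwage, which is expected because logwage is the logarithm of wage, and wage with education.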
plot(Wage)
This last graph shows that there is definitely an association between education and wage. Because education is a factor, “plot()” draws one box plot of wage per education level. People with more education tend to have higher wages.
plot(Wage$education, Wage$wage,
main = "Plot of Education vs Wage",
xlab = "Education",
ylab = "Wage")
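A numeric companion to the plot could be the median of the response within each group, computed with tapply(). The sketch below uses small made-up vectors (`pay` and `level` are hypothetical stand-ins for Wage$wage and Wage$education) so that it runs without the ISLR package.

```r
# Hypothetical data standing in for Wage$wage and Wage$education
pay <- c(50, 60, 70, 90, 100, 110, 150, 160, 170)
level <- factor(rep(c("HS", "College", "Advanced"), each = 3),
                levels = c("HS", "College", "Advanced"))

# Median of pay within each level of the factor
medians <- tapply(pay, level, median)
medians
```

With the real data loaded, the analogous call would be tapply(Wage$wage, Wage$education, median).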