Sequences in R

1.

I used the “:” operator to create a sequence x of the integers 2 through 10, then displayed the length of x.

x = 2:10
x
## [1]  2  3  4  5  6  7  8  9 10
length(x)
## [1] 9
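This is a minimal sketch of how “:” behaves at its edges. The names `down` and `frac` are only illustrative. Both endpoints are included, so the length is end - start + 1. The operator can also count down, or start from a non-integer value.

```r
x = 2:10
length(x) == 10 - 2 + 1   # TRUE: both endpoints are included
down = 10:2               # counts down: 10 9 8 7 6 5 4 3 2
frac = 1.5:4              # steps by 1 from 1.5, stops before passing 4: 1.5 2.5 3.5
```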

2.

In this example I use seq() to create the same sequence, writing “seq(from, to)” instead of “from:to”.

seq(2,10)
## [1]  2  3  4  5  6  7  8  9 10
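Unlike “:”, seq() also accepts a `by` step, and that is where it becomes more useful than the colon form. Here is a short sketch. The variable names are illustrative.

```r
same = seq(2, 10)                  # same values as 2:10
evens = seq(2, 10, by = 2)         # 2 4 6 8 10
down_evens = seq(10, 2, by = -2)   # 10 8 6 4 2 (a negative step counts down)
quarters = seq(0, 1, by = 0.25)    # 0.00 0.25 0.50 0.75 1.00
```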

3.

I used the rnorm() function to generate a vector of 20 random normal values.

rnorm(20)
##  [1] -0.03177165  0.24706403 -1.17940826 -0.16321583  0.59883704  0.53004176
##  [7]  0.81720784  0.98077250  0.10380944  0.84273593 -0.49502044 -0.70402202
## [13]  1.38605278  2.81479273 -0.26258951 -0.49488177 -0.16289588  0.72066460
## [19] -0.88644209  0.80971229
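By default, rnorm() draws from a standard normal distribution with mean 0 and standard deviation 1. The `mean` and `sd` arguments change that. Below is a quick sketch. The parameter values 50 and 10 are chosen only for illustration.

```r
z = rnorm(20)                            # standard normal: mean = 0, sd = 1 by default
scores = rnorm(20, mean = 50, sd = 10)   # same length, centred on 50 with spread 10
```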

4.

I called set.seed() before rnorm() so that it returns the same 20-value vector every time the code is run.

set.seed(2021)
rnorm(20)
##  [1] -0.12245998  0.55245663  0.34864950  0.35963224  0.89805369 -1.92256952
##  [7]  0.26174436  0.91556637  0.01377194  1.72996316 -1.08220485 -0.27282518
## [13]  0.18199540  1.50854179  1.60447011 -1.84147561  1.62331021  0.13138902
## [19]  1.48112247  1.51331829
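The sketch below checks the reproducibility claim directly. Resetting the same seed gives identical draws. Calling rnorm() again without a reset continues the random stream, so it returns new values. The names `a`, `b`, and `c_next` are illustrative.

```r
set.seed(2021)
a = rnorm(20)
set.seed(2021)
b = rnorm(20)         # same seed, so the same 20 values as a
c_next = rnorm(20)    # no reset: continues the stream, so different values
identical(a, b)       # TRUE
identical(b, c_next)  # FALSE
```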

5.

I used seq() to create a sequence from -pi to pi. Along with the start and the end, I specified “length.out = 50” to get 50 evenly spaced values.

seq(-pi,pi,length.out=50)
##  [1] -3.14159265 -3.01336438 -2.88513611 -2.75690784 -2.62867957 -2.50045130
##  [7] -2.37222302 -2.24399475 -2.11576648 -1.98753821 -1.85930994 -1.73108167
## [13] -1.60285339 -1.47462512 -1.34639685 -1.21816858 -1.08994031 -0.96171204
## [19] -0.83348377 -0.70525549 -0.57702722 -0.44879895 -0.32057068 -0.19234241
## [25] -0.06411414  0.06411414  0.19234241  0.32057068  0.44879895  0.57702722
## [31]  0.70525549  0.83348377  0.96171204  1.08994031  1.21816858  1.34639685
## [37]  1.47462512  1.60285339  1.73108167  1.85930994  1.98753821  2.11576648
## [43]  2.24399475  2.37222302  2.50045130  2.62867957  2.75690784  2.88513611
## [49]  3.01336438  3.14159265
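With `length.out`, seq() works out the step size itself as (to - from) / (length.out - 1). Here that is 2*pi/49, about 0.1282, which matches the gap between consecutive printed values. A minimal check follows. The names `s` and `step` are illustrative.

```r
s = seq(-pi, pi, length.out = 50)
step = (pi - (-pi)) / (50 - 1)                       # 2*pi/49, about 0.1282
all.equal(diff(s), rep(step, 49))                    # TRUE: evenly spaced
all.equal(s, seq(-pi, by = step, length.out = 50))   # same sequence built from the step
```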

6.

I used the “rep(x, times)” function to repeat 0 ten times.

rep(0,10)
##  [1] 0 0 0 0 0 0 0 0 0 0
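rep() can also repeat a whole vector, each of its elements in turn, or each element a different number of times. Here is a short sketch with illustrative names.

```r
whole = rep(c(1, 2), times = 3)           # 1 2 1 2 1 2 (whole vector repeated)
each_el = rep(c(1, 2), each = 3)          # 1 1 1 2 2 2 (each element repeated)
per_el = rep(c("a", "b"), times = c(2, 1)) # "a" "a" "b" (one count per element)
```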

7.

I used “rep()” to repeat “NA” 10 times.

rep(NA,10)
##  [1] NA NA NA NA NA NA NA NA NA NA
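A common reason to build a vector of NA values is to preallocate it and then fill it in a loop. Any slot left unfilled then stays visibly missing. Below is a minimal sketch. The name `squares` is hypothetical.

```r
squares = rep(NA, 10)   # placeholder vector; NA marks "not filled in yet"
typeof(squares)         # "logical": a bare NA is logical
for (i in 1:10) {
  squares[i] = i^2      # assigning a number converts the whole vector to double
}
typeof(squares)         # "double"
anyNA(squares)          # FALSE once every slot is filled
```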

8.

I used the “set.seed()” function to set the seed to 2021, then created two vectors, x and y, each holding 100 random normal values. plot(x, y) then draws them as a scatter plot.

set.seed(2021)
x = rnorm(100)
y = rnorm(100)
plot(x,y)
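Note that x and y do not hold the same 100 values. The seed fixes only the starting point of the random stream, and each call to rnorm() continues from where the previous call stopped. The sketch below checks this. The name `both` is illustrative.

```r
set.seed(2021)
x = rnorm(100)
y = rnorm(100)
set.seed(2021)
both = rnorm(200)
identical(c(x, y), both)   # TRUE: y picks up where x left off
identical(x, y)            # FALSE
```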

9.

I made the previous graph more readable by adding a title and axis labels, and by changing the point character (pch) and color (col).

set.seed(2021)
x = rnorm(100)
y = rnorm(100)
plot(x, y,
     main = "Scatter Plot of X and Y",
     xlab = "this is the x-axis",
     ylab = "this is the y-axis",
     pch = 9, col = "blue")

10.

I first created a 4x4 matrix with values 1-16 arranged by row using the “matrix()” function.

my_matrix = matrix(1:16, nrow = 4, byrow = TRUE)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16

I used [row, column] indexing to extract the value 10, which sits in row 3, column 2.

value_10 = my_matrix[3, 2]
value_10
## [1] 10

This is the first row of the matrix. An empty column index selects every column.

first_row = my_matrix[1,]
first_row
## [1] 1 2 3 4

This is the last column of the matrix. An empty row index selects every row.

last_column = my_matrix[, 4]
last_column
## [1]  4  8 12 16

I created a sub-matrix of just the first two rows and the first two columns.

submatrix = my_matrix[1:2, 1:2]
submatrix
##      [,1] [,2]
## [1,]    1    2
## [2,]    5    6

I used the “dim()” function to display the dimensions of my original matrix.

matrix_dimension = dim(my_matrix)
matrix_dimension
## [1] 4 4
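Several more indexing forms build on the same [row, column] notation:

- which() with `arr.ind = TRUE` works backwards from a value to its position.
- `drop = FALSE` keeps a single row as a matrix.
- A negative index removes rows.
- A logical condition selects the matching elements, in column-major order.

Here is a short sketch. The names are illustrative.

```r
my_matrix = matrix(1:16, nrow = 4, byrow = TRUE)
pos = which(my_matrix == 10, arr.ind = TRUE)   # row 3, col 2
pos
row_one = my_matrix[1, , drop = FALSE]         # stays a 1 x 4 matrix
dim(row_one)                                   # 1 4
without_first_row = my_matrix[-1, ]            # negative index drops row 1
dim(without_first_row)                         # 3 4
my_matrix[my_matrix > 12]                      # 13 14 15 16 (logical index)
```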

11.

I used the “library()” function to load the ISLR package. Then I loaded the “Wage” dataset with “data()”. Lastly, I displayed the dimensions of “Wage” with the “dim()” function.

library(ISLR)
data("Wage")
dim(Wage)
## [1] 3000   11

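Note: Calling just “dim(Wage)” works in RStudio once ISLR has been loaded in the console. Knitting the .Rmd file, however, runs the code in a fresh R session, so “Wage” is unknown unless library(ISLR) is also called inside the document.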
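dim() returns the number of rows first, then the number of columns. nrow() and ncol() each return one of the two. The sketch below uses mtcars as a stand-in so that it runs without ISLR installed. mtcars ships with base R.

```r
dim(mtcars)    # 32 11: rows (cars) first, then columns (variables)
nrow(mtcars)   # 32
ncol(mtcars)   # 11
```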

12.

I used the “str()” function to show the structure of the dataset: each variable's type and its first few values.

str(Wage)
## 'data.frame':    3000 obs. of  11 variables:
##  $ year      : int  2006 2004 2003 2003 2005 2008 2009 2008 2006 2004 ...
##  $ age       : int  18 24 45 43 50 54 44 30 41 52 ...
##  $ maritl    : Factor w/ 5 levels "1. Never Married",..: 1 1 2 2 4 2 2 1 1 2 ...
##  $ race      : Factor w/ 4 levels "1. White","2. Black",..: 1 1 1 3 1 1 4 3 2 1 ...
##  $ education : Factor w/ 5 levels "1. < HS Grad",..: 1 4 3 4 2 4 3 3 3 2 ...
##  $ region    : Factor w/ 9 levels "1. New England",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ jobclass  : Factor w/ 2 levels "1. Industrial",..: 1 2 1 2 2 2 1 2 2 2 ...
##  $ health    : Factor w/ 2 levels "1. <=Good","2. >=Very Good": 1 2 1 2 1 2 2 1 2 2 ...
##  $ health_ins: Factor w/ 2 levels "1. Yes","2. No": 2 2 1 1 1 1 1 1 1 1 ...
##  $ logwage   : num  4.32 4.26 4.88 5.04 4.32 ...
##  $ wage      : num  75 70.5 131 154.7 75 ...
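str() packs a lot into one line per column. When you only need the type of each column, sapply() with class() is a compact alternative. Below is a sketch on a hypothetical two-row stand-in data frame named `df`. Its values are copied from the first two rows shown above.

```r
# Two-row stand-in built from the first rows shown by str(Wage) above
df = data.frame(
  year = c(2006L, 2004L),
  wage = c(75, 70.5),
  jobclass = factor(c("1. Industrial", "2. Information"))
)
str(df)
sapply(df, class)     # year "integer", wage "numeric", jobclass "factor"
levels(df$jobclass)   # "1. Industrial" "2. Information"
```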

13.

I used the “summary(object)” function to show summary statistics for every variable in the dataset. “summary()” is a generic function.

summary(Wage)
##       year           age                     maritl           race     
##  Min.   :2003   Min.   :18.00   1. Never Married: 648   1. White:2480  
##  1st Qu.:2004   1st Qu.:33.75   2. Married      :2074   2. Black: 293  
##  Median :2006   Median :42.00   3. Widowed      :  19   3. Asian: 190  
##  Mean   :2006   Mean   :42.41   4. Divorced     : 204   4. Other:  37  
##  3rd Qu.:2008   3rd Qu.:51.00   5. Separated    :  55                  
##  Max.   :2009   Max.   :80.00                                          
##                                                                        
##               education                     region               jobclass   
##  1. < HS Grad      :268   2. Middle Atlantic   :3000   1. Industrial :1544  
##  2. HS Grad        :971   1. New England       :   0   2. Information:1456  
##  3. Some College   :650   3. East North Central:   0                        
##  4. College Grad   :685   4. West North Central:   0                        
##  5. Advanced Degree:426   5. South Atlantic    :   0                        
##                           6. East South Central:   0                        
##                           (Other)              :   0                        
##             health      health_ins      logwage           wage       
##  1. <=Good     : 858   1. Yes:2083   Min.   :3.000   Min.   : 20.09  
##  2. >=Very Good:2142   2. No : 917   1st Qu.:4.447   1st Qu.: 85.38  
##                                      Median :4.653   Median :104.92  
##                                      Mean   :4.654   Mean   :111.70  
##                                      3rd Qu.:4.857   3rd Qu.:128.68  
##                                      Max.   :5.763   Max.   :318.34  
## 
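Because summary() is generic, R chooses the method from the class of its argument. Numeric vectors get the quartiles plus the mean, while factors get a count per level. That is why the Wage summary mixes the two layouts. Below is a minimal sketch on toy vectors with illustrative names.

```r
nums = c(2, 4, 4, 10)
s = summary(nums)   # Min. 2, 1st Qu. 3.5, Median 4, Mean 5, 3rd Qu. 5.5, Max. 10
s
f = factor(c("a", "b", "a"))
summary(f)          # a 2, b 1
```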

14.

I used the “quantile(x, probs)” function to get the 90th percentile of “wage” in the “Wage” dataset.

quantile(Wage$wage, 0.9)
##      90% 
## 154.7036
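By default, quantile() uses linear interpolation (type = 7). For n sorted values and probability p, it computes the position h = (n - 1)p + 1 and interpolates between the neighbouring values. Here is a toy check on 1:10. For p = 0.9, the position is (10 - 1)(0.9) + 1 = 9.1, which lands between 9 and 10.

```r
q = quantile(1:10, probs = c(0.1, 0.5, 0.9))
q   # 10%: 1.9, 50%: 5.5, 90%: 9.1
```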

15.

I used the “table()” function to count the education levels of workers whose wage is above the 90th percentile. “table()” uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels.

table(Wage[Wage$wage > quantile(Wage$wage, 0.90), "education"])
## 
##       1. < HS Grad         2. HS Grad    3. Some College    4. College Grad 
##                  0                 18                 28                105 
## 5. Advanced Degree 
##                149
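The one-liner above combines three steps:

1. Compute the cutoff.
2. Build a logical index of the rows above it.
3. Tabulate the factor for the selected rows.

Because education is a factor, table() reports every level, including levels with no matches. That is why “1. < HS Grad” appears with a count of 0. Below is a sketch of the same steps on a hypothetical five-row data frame. The final line adds a second factor to show a two-way table.

```r
df = data.frame(
  wage = c(50, 80, 120, 200, 300),
  education = factor(c("HS", "HS", "College", "College", "Advanced"),
                     levels = c("HS", "College", "Advanced"))
)
cutoff = quantile(df$wage, 0.6)   # 152 with the default interpolation
high = df$wage > cutoff           # TRUE for 200 and 300
counts = table(df$education[high])
counts                            # HS 0, College 1, Advanced 1
table(df$education, high)         # two-way: education by above/below the cutoff
```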

16.

I used the “plot()” function on the whole dataset, which draws a scatterplot matrix of every pair of variables. It shows little relationship between wage and most of the other variables. The only clear relationships are wage with logwage and wage with education.

plot(Wage)
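The strong wage/logwage pattern is expected, because logwage is the natural log of wage (log(75) is about 4.32, matching the first row of str(Wage)). cor() can put a number on a single relationship. Below is a sketch using the rounded wage values printed by str(Wage). The variable names are illustrative.

```r
wage = c(75, 70.5, 131, 154.7, 75)
logwage = log(wage)
cor(wage, logwage)                        # Pearson: close to, but not exactly, 1
cor(wage, logwage, method = "spearman")   # rank-based: 1, since log() preserves order
```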

17.

This last graph shows a clear association between education and wage: workers with more education tend to earn higher wages.

plot(Wage$education, Wage$wage, 
     main = "Plot of Education vs Wage",
     xlab = "Education",
     ylab = "Wage")
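Since education is a factor, plot(Wage$education, Wage$wage) draws one box plot of wage per education level. The formula interface boxplot(wage ~ education, data = ...) produces the same kind of chart, and tapply() reports the group medians behind the boxes. Below is a sketch on hypothetical toy data.

```r
df = data.frame(
  wage = c(60, 70, 90, 110, 150, 170),
  education = factor(c("HS", "HS", "College", "College", "Advanced", "Advanced"),
                     levels = c("HS", "College", "Advanced"))
)
medians = tapply(df$wage, df$education, median)
medians   # HS 65, College 100, Advanced 160
boxplot(wage ~ education, data = df,
        main = "Wage by education (toy data)",
        xlab = "Education", ylab = "Wage")
```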