Assignment 1

1. What are the measures of central tendency and variation of data?
The mean (the average value) and the median (the middle value of the sorted data) are the measures of central tendency.
Although the mean is the basic measure of central tendency, it is sensitive to outliers. When outliers are present, the median or a trimmed mean (the mean after dropping extreme values) is a more robust measure.

Variation of data is measured by
* Variance (average of the squared deviations, where a deviation is the difference between an observed value and the mean)
* Standard deviation (square root of the variance)
* Mean absolute deviation (average of the absolute deviations from the mean)
All of these measures are sensitive to outliers.

A robust estimate of variability can be obtained with any of the following:
* MAD (median absolute deviation: the median of the absolute deviations from the median)
* IQR (interquartile range: difference between the 75th percentile and the 25th percentile)
* Trimmed standard deviation (standard deviation after dropping outliers)
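
As a rough illustration of the measures listed above, the sketch below computes each of them with base R on a small made-up vector that contains one outlier. The vector `x` and the result names `centre` and `spread` are illustrative only. Note that `mad()` multiplies by 1.4826 by default, so `constant = 1` is passed to obtain the plain median absolute deviation.

```r
# Hypothetical sample with one obvious outlier (100)
x <- c(2, 4, 4, 5, 7, 9, 100)

centre <- c(Mean         = mean(x),              # pulled upward by the outlier
            Median       = median(x),            # middle value, robust
            Trimmed_Mean = mean(x, trim = 0.2))  # drops 20% of the values from each end

spread <- c(Variance     = var(x),                  # sample variance (n - 1 denominator)
            Std_Dev      = sd(x),                   # square root of the variance
            Mean_Abs_Dev = mean(abs(x - mean(x))),  # average absolute deviation from the mean
            MAD          = mad(x, constant = 1),    # median absolute deviation from the median
            IQR          = IQR(x))                  # 75th percentile minus 25th percentile

centre
spread
```

On this sample the mean is far above the median, while the MAD and IQR stay small compared with the standard deviation, which is the sense in which they are robust.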

2. What are the different ways to create a vector in R?

#1) Vector creation for Numeric values
A <- c(100,101,102)
A <- c(C1=100,C2=101,C3=102)  # Same values with element names (replaces the unnamed A above)
A

##  C1  C2  C3 
## 100 101 102

B <- 1:10
B

##  [1]  1  2  3  4  5  6  7  8  9 10

C <- seq(1,10,by=0.5)
C

##  [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5
## [15]  8.0  8.5  9.0  9.5 10.0

#2) Vector creation for String values
D <- c("red","blue","green")
D

## [1] "red"   "blue"  "green"

#3) Vector creation by reading csv, txt, xlsx, or fwf files (the result is a data frame whose columns are vectors)

# 3a) Example of reading from csv files
data <- url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
iris <- read.csv(data, header=FALSE)
head(iris)

##    V1  V2  V3  V4          V5
## 1 5.1 3.5 1.4 0.2 Iris-setosa
## 2 4.9 3.0 1.4 0.2 Iris-setosa
## 3 4.7 3.2 1.3 0.2 Iris-setosa
## 4 4.6 3.1 1.5 0.2 Iris-setosa
## 5 5.0 3.6 1.4 0.2 Iris-setosa
## 6 5.4 3.9 1.7 0.4 Iris-setosa
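
Strictly speaking, read.csv() returns a data frame, and a vector is obtained by extracting one of its columns. A minimal sketch of that step follows, using inline text instead of a file so that it runs without network access. The data and the names `csv_text`, `small_df`, `sepal_length`, and `species` are made up for illustration.

```r
# Made-up CSV content supplied inline through the `text` argument
csv_text <- "5.1,3.5,Iris-setosa
4.9,3.0,Iris-setosa
4.7,3.2,Iris-setosa"

small_df <- read.csv(text = csv_text, header = FALSE)

# Each column of the data frame is itself a vector
sepal_length <- small_df$V1       # numeric vector
species      <- small_df[["V3"]]  # character vector in R >= 4.0.0 (factor in older versions)

sepal_length
is.vector(sepal_length)
```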

#4) Loading objects saved in a workspace (.RData) file, using the load() function
#5) Reading Apache-style log files, using the read_log() function from the readr package
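
A few other base R constructors also produce vectors. The sketch below is not an exhaustive list, and the variable names are arbitrary.

```r
# Repeat a pattern of values
r1 <- rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"

# Zero-filled numeric vector of a given length
z <- numeric(3)                     # 0 0 0

# Empty vector of a given type and length
v <- vector("character", 2)         # "" ""

# Integer sequence 1..n
s <- seq_len(5)                     # 1 2 3 4 5

# Evenly spaced values specified by count rather than by step
e <- seq(0, 1, length.out = 5)      # 0.00 0.25 0.50 0.75 1.00
```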

3. Create the following vector and check the class: ('x', 'x', 'x', 1,3,5,7,9,2,4,6,8,10)

A <- c('x','x', 'x', 1,3,5,7,9,2,4,6,8,10)
class(A)

## [1] "character"
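
The class is "character" because an atomic vector holds a single type, so c() coerces every element to the most flexible type present (logical, then integer, then double, then character). A small sketch with made-up values:

```r
mixed <- c("x", 1, 3)     # the numbers become the strings "1" and "3"
mixed
class(mixed)              # "character"

class(c(TRUE, 2L))        # "integer": TRUE is coerced to 1L
class(c(2L, 2.5))         # "numeric": the integer is coerced to double

# Converting back: strings that are not numbers become NA (with a warning)
nums <- suppressWarnings(as.numeric(mixed))
nums                      # NA 1 3
```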

4. Create a vector of positive odd integers less than 100

# 4. Creating odd integers vector
A <- 1:100
A <- A[A %% 2 != 0]
A

##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45
## [24] 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91
## [47] 93 95 97 99
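
The same vector can also be generated directly rather than by filtering. The sketch below compares the two approaches; the names `odds_seq` and `odds_filter` are illustrative.

```r
odds_seq    <- seq(1, 99, by = 2)          # step through the odd numbers directly
odds_filter <- (1:99)[(1:99) %% 2 == 1]    # filter approach: keep values with remainder 1

length(odds_seq)                 # 50
all(odds_seq == odds_filter)     # TRUE
```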

5. Remove the values greater than 60 and less than 80

# 5. Keeping only the values greater than 60 and less than 80 (values <= 60 and >= 80 are removed)
A <- A[A > 60 & A < 80]
A

##  [1] 61 63 65 67 69 71 73 75 77 79
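
The question can be read in two ways: keep only the values between 60 and 80 (the reading used in the answer), or drop exactly those values. Both follow from the same logical mask, as the sketch below shows. The names `odds`, `in_range`, `kept`, and `dropped` are illustrative.

```r
odds <- seq(1, 99, by = 2)

in_range <- odds > 60 & odds < 80   # logical mask: TRUE for 61, 63, ..., 79

kept    <- odds[in_range]    # keep only the values inside the range
dropped <- odds[!in_range]   # remove the values inside the range instead

kept
length(dropped)              # 40
```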

6. Write a function to return standard deviation, mean, and median of the vector from Question 5.

# 6. Calculating standard deviation, mean, and median of the vector

meanMedianStdDev <- function(A1){
  c(Mean=mean(A1), Median=median(A1), Standard_Deviation=sd(A1))
}
meanMedianStdDev(A)

##               Mean             Median Standard_Deviation 
##          70.000000          70.000000           6.055301
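
As a check on the reported standard deviation, the sketch below recomputes it by hand. It assumes the ten odd values from 61 to 79 and uses the n - 1 denominator that sd() applies; `v` and `manual_sd` are illustrative names.

```r
v <- seq(61, 79, by = 2)   # the ten values from Question 5
n <- length(v)

# Sample standard deviation: squared deviations from the mean, divided by n - 1
manual_sd <- sqrt(sum((v - mean(v))^2) / (n - 1))

manual_sd                             # about 6.055301
isTRUE(all.equal(manual_sd, sd(v)))   # TRUE
```

The squared deviations sum to 330, so the variance is 330 / 9 and its square root is approximately 6.055.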

7. Create two matrices of the form from the given set of numbers in two ways: X1 = {2,3,7,1,6,2,3,5,1} and X2 = {3,2,9,0,7,8,5,8,2}

X1 <- c(2,3,7,1,6,2,3,5,1)
X2 <- c(3,2,9,0,7,8,5,8,2)
# Matrix for X1 filled column by column (the default)
matX1 <- matrix(X1,3)
matX1

##      [,1] [,2] [,3]
## [1,]    2    1    3
## [2,]    3    6    5
## [3,]    7    2    1

# Matrix for X1 filled row by row (overwrites the column-wise matX1)
matX1 <- matrix(X1,3, byrow=TRUE)
matX1

##      [,1] [,2] [,3]
## [1,]    2    3    7
## [2,]    1    6    2
## [3,]    3    5    1

# Matrix for X2 filled column by column (the default)
matX2 <- matrix(X2,3)
matX2

##      [,1] [,2] [,3]
## [1,]    3    0    5
## [2,]    2    7    8
## [3,]    9    8    2

# Matrix for X2 filled row by row (overwrites the column-wise matX2)
matX2 <- matrix(X2,3, byrow=TRUE)
matX2

##      [,1] [,2] [,3]
## [1,]    3    2    9
## [2,]    0    7    8
## [3,]    5    8    2
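
The two fill orders are related by a transpose, and a matrix can also be assembled from row vectors. The sketch below illustrates both points for X1; the names `by_col`, `by_row`, and `stacked` are illustrative.

```r
X1 <- c(2, 3, 7, 1, 6, 2, 3, 5, 1)

by_col <- matrix(X1, nrow = 3)                 # column-wise fill (default)
by_row <- matrix(X1, nrow = 3, byrow = TRUE)   # row-wise fill

# The row-wise matrix is the transpose of the column-wise one
all(by_row == t(by_col))                       # TRUE

# Another construction: stack three row vectors with rbind()
stacked <- rbind(X1[1:3], X1[4:6], X1[7:9])
all(stacked == by_row)                         # TRUE
```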

8. Find the matrix product

# Matrix multiplication of the row-wise matX1 and matX2 (the last versions assigned above)
matX1 %*% matX2

##      [,1] [,2] [,3]
## [1,]   41   81   56
## [2,]   13   60   61
## [3,]   14   49   69
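
To see how one entry of the product arises, the sketch below rebuilds the row-wise matrices under the illustrative names `M1` and `M2` and checks entry [1, 1] by hand. It also contrasts `%*%` with the element-wise `*` operator.

```r
M1 <- matrix(c(2, 3, 7, 1, 6, 2, 3, 5, 1), nrow = 3, byrow = TRUE)
M2 <- matrix(c(3, 2, 9, 0, 7, 8, 5, 8, 2), nrow = 3, byrow = TRUE)

P <- M1 %*% M2

# Entry [1, 1] by hand: first row of M1 times first column of M2
sum(M1[1, ] * M2[, 1])   # 2*3 + 3*0 + 7*5 = 41
P[1, 1]                  # 41

# `*` is element-wise, so it gives a different result from `%*%`
(M1 * M2)[1, 1]          # 2*3 = 6
```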

9. Find the class of the 'iris' data frame and the class of each of its columns, and get the summary. Get the row names and the column names. Get the number of rows and the number of columns.

remove(iris)  # Drop the copy read from the URL earlier so that the built-in iris dataset is used
#install.packages("plyr")
library("plyr")
#head(iris)
# Class of the iris object
class(iris)

## [1] "data.frame"

# Class for each column of iris
paste( "class for colnames:" 
       ,class(iris$Sepal.Length)
       ,class(iris$Sepal.Width)
       ,class(iris$Petal.Length)
       ,class(iris$Petal.Width)
       ,class(iris$Species))

## [1] "class for colnames: numeric numeric numeric numeric factor"
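
An alternative that does not require naming each column is to apply class() over the columns. The sketch below refers to the built-in dataset explicitly as `datasets::iris`; `col_classes` is an illustrative name.

```r
# Use the built-in dataset explicitly, regardless of any `iris` object in the workspace
col_classes <- sapply(datasets::iris, class)
col_classes
```

The result is a named character vector with one entry per column, so it stays correct if columns are added or renamed.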

#Summary for iris
summary(iris)

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
##

#Column names for iris
colnames(iris)

## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"

#Row names for iris
rownames(iris)

##   [1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11" 
##  [12] "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22" 
##  [23] "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33" 
##  [34] "34"  "35"  "36"  "37"  "38"  "39"  "40"  "41"  "42"  "43"  "44" 
##  [45] "45"  "46"  "47"  "48"  "49"  "50"  "51"  "52"  "53"  "54"  "55" 
##  [56] "56"  "57"  "58"  "59"  "60"  "61"  "62"  "63"  "64"  "65"  "66" 
##  [67] "67"  "68"  "69"  "70"  "71"  "72"  "73"  "74"  "75"  "76"  "77" 
##  [78] "78"  "79"  "80"  "81"  "82"  "83"  "84"  "85"  "86"  "87"  "88" 
##  [89] "89"  "90"  "91"  "92"  "93"  "94"  "95"  "96"  "97"  "98"  "99" 
## [100] "100" "101" "102" "103" "104" "105" "106" "107" "108" "109" "110"
## [111] "111" "112" "113" "114" "115" "116" "117" "118" "119" "120" "121"
## [122] "122" "123" "124" "125" "126" "127" "128" "129" "130" "131" "132"
## [133] "133" "134" "135" "136" "137" "138" "139" "140" "141" "142" "143"
## [144] "144" "145" "146" "147" "148" "149" "150"

# Number of Rows
NROW(iris)

## [1] 150

# Number of Columns
NCOL(iris)

## [1] 5
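
For a data frame, dim(), nrow(), and ncol() give the same information. NROW() and NCOL() differ mainly in that they also work on plain vectors, treating them as one-column matrices. A short sketch, where `d` is an illustrative name:

```r
d <- dim(datasets::iris)   # rows first, then columns
d                          # 150 5
nrow(datasets::iris)       # 150
ncol(datasets::iris)       # 5

# On a plain vector nrow() returns NULL, while NROW() returns its length
is.null(nrow(1:3))         # TRUE
NROW(1:3)                  # 3
```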

10. Get the last two rows in the last 2 columns from iris dataset.

iris[149:150,4:5]

##     Petal.Width   Species
## 149         2.3 virginica
## 150         1.8 virginica

# Last 2 rows and columns when the number of rows and columns is not known in advance
iris[(NROW(iris)-1):NROW(iris),(NCOL(iris)-1):NCOL(iris)]

##     Petal.Width   Species
## 149         2.3 virginica
## 150         1.8 virginica
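
The same selection can be sketched with tail(), which avoids the index arithmetic. The names `iris_df`, `last_rows`, and `last_block` are illustrative.

```r
iris_df <- datasets::iris

last_rows  <- tail(iris_df, 2)                          # last two rows, all columns
last_block <- last_rows[, tail(names(iris_df), 2)]      # keep only the last two columns
last_block
```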

Monika Bansal

7/13/2018