Q1 Create three vectors, “name”, “height” and “weight”, and combine them into a data frame named “class”

name <- c('a','b','c','d','e')
height <- c(58,62,67,55,63)
weight <- c(158,162,167,155,163)
class <- data.frame(name,height,weight)
print(class)
##   name height weight
## 1    a     58    158
## 2    b     62    162
## 3    c     67    167
## 4    d     55    155
## 5    e     63    163

Q2 Load “cars” data (which is by default in R) and calculate:

2.1) Calculate the correlation coefficient value of ‘speed’ and ‘dist’

data(cars)
print(cor(cars['speed'],cars['dist']))
##            dist
## speed 0.8068949
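
Indexing with single brackets, as above, passes one-column data frames to cor(), so the result comes back as a 1x1 matrix. As a minimal sketch of an alternative, extracting the columns with `$` passes plain numeric vectors and returns a single number (the name `r` is only illustrative):

```r
# Load the built-in cars data set
data(cars)

# cars$speed and cars$dist are numeric vectors, so cor() returns a scalar
r <- cor(cars$speed, cars$dist)
print(r)
## [1] 0.8068949
```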

2.2) Calculate the correlation matrix of ‘speed’ and ‘dist’

print(cor(cars))
##           speed      dist
## speed 1.0000000 0.8068949
## dist  0.8068949 1.0000000

Q3 Load “iris” data (which is by default in R) and perform following tasks:

3.1) Show the 3rd column only

data(iris)

print(iris[,3])
##   [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4
##  [19] 1.7 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2
##  [37] 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4 4.7 4.5 4.9 4.0
##  [55] 4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.0 4.7 3.6 4.4 4.5 4.1 4.5 3.9 4.8 4.0
##  [73] 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.5 3.8 3.7 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0
##  [91] 4.4 4.6 4.0 3.3 4.2 4.2 4.2 4.3 3.0 4.1 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3
## [109] 5.8 6.1 5.1 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9 5.0 5.7 4.9 6.7 4.9 5.7 6.0
## [127] 4.8 4.9 5.6 5.8 6.1 6.4 5.6 5.1 5.6 6.1 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9
## [145] 5.7 5.2 5.0 5.2 5.4 5.1

3.2) Show the first 3 columns of the 4th row

print(iris[4,1:3])
##   Sepal.Length Sepal.Width Petal.Length
## 4          4.6         3.1          1.5

3.3) Show last 6 rows

print(tail(iris,6))
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica

3.4) Save “Sepal.Length” variable as a vector named as “slength”

slength <- iris$Sepal.Length

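3.5) Trim the “slength” vector to 10 records and calculate the average of the trimmed “slength”

Trim() from the DescTools package, called with trim = 70, drops the 70 smallest and the 70 largest values, leaving the middle 10 of the 150 observations.

library(DescTools)  # provides Trim()
Trimming <- Trim(slength,70)
mean(Trimming)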
## [1] 5.77
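
The same result can be cross-checked in base R without the Trim() call. The sketch below assumes the trimming removes the 70 smallest and 70 largest values, which is consistent with the mean of 5.77 shown above; the names `n_cut`, `sorted_slength` and `middle` are illustrative only:

```r
data(iris)
slength <- iris$Sepal.Length

# Number of values to drop from each end of the sorted vector
n_cut <- 70

# Sort, then keep positions 71 to 80 (the middle 10 of 150 values)
sorted_slength <- sort(slength)
middle <- sorted_slength[(n_cut + 1):(length(sorted_slength) - n_cut)]

print(length(middle))
## [1] 10
print(mean(middle))
## [1] 5.77
```

Base R's mean(slength, trim = ...) accepts only a fraction between 0 and 0.5 rather than a count, which is why the explicit sort-and-slice form is used here.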

Q4 Use “iris” data to draw the following graphs:

4.1) Draw a bar plot of the “Species” variable. Label the X axis ‘Species’ and the Y axis ‘Frequency’. Give the chart the title ‘Bar plot for species’ and give the bars different colors.

library(ggplot2)  # provides ggplot(), geom_bar(), xlab(), ylab() and labs()
ggplot(iris,aes(Species,fill=Species)) + geom_bar() + xlab("Species") + ylab("Frequency") + labs(title="Bar plot for species")

4.2) Draw a pie chart of the “Species” variable. Give the chart the title ‘Pie chart for species’ and give the slices different colors.

pie(table(iris$Species), main = "Pie chart for species", radius = 1)
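
The call above relies on pie()'s default palette to colour the slices. To choose the colours explicitly, one option is to pass a `col` vector with one entry per species. This is a minimal sketch; the colour names and the variables `counts` and `slice_cols` are arbitrary choices, not part of the original answer:

```r
data(iris)

# Frequency of each species (50 observations each)
counts <- table(iris$Species)

# One colour per slice, in the same order as names(counts)
slice_cols <- c("tomato", "gold", "steelblue")

pie(counts, col = slice_cols, main = "Pie chart for species", radius = 1)
```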

4.3) Draw a multiple correlation plot (pairs plot) for four variables ‘Sepal.Length’, ‘Sepal.Width’, ‘Petal.Length’ and ‘Petal.Width’ of the “iris” data.

pairs(iris[,1:4],col="blue",main="Correlation Plot")

Q5. Run a simple linear regression on “cars” data, with ‘dist’ as the dependent variable and ‘speed’ as the independent variable. Save the regression model in an object named “car.linreg” and show the summary statistics of the model.

Give the R-square and adjusted R-square values.

car.linreg <- lm(dist~speed,data=cars)
summary(car.linreg)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
  • The R-squared value is 0.6511 and the adjusted R-squared value is 0.6438

  • The p-value of the F-test is 1.49e-12, which indicates that speed is a highly significant predictor of dist and hence that the model is statistically significant
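
Rather than reading these numbers off the printed summary, they can be pulled out of the summary object directly. The following is a sketch using the standard components of summary.lm(); the variable names `model_summary`, `r2`, `adj_r2`, `f` and `p_value` are illustrative:

```r
data(cars)
car.linreg <- lm(dist ~ speed, data = cars)
model_summary <- summary(car.linreg)

# R-squared and adjusted R-squared are stored as list components
r2 <- model_summary$r.squared
adj_r2 <- model_summary$adj.r.squared
print(round(r2, 4))
## [1] 0.6511
print(round(adj_r2, 4))
## [1] 0.6438

# The overall p-value is not stored; recompute it from the F statistic
f <- model_summary$fstatistic
p_value <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# With a single predictor, R-squared equals the squared correlation from Q2
print(all.equal(r2, cor(cars$speed, cars$dist)^2))
## [1] TRUE
```

The last check ties Q5 back to Q2: 0.8068949 squared is approximately 0.6511.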