a) Interpreting the confusion matrix

In this example, the model correctly predicted Tested Positive 120 times and incorrectly predicted it 15 times. It correctly predicted Tested Negative 50 times and incorrectly predicted it 10 times.

The following can be computed from this confusion matrix:

  • The model made 170 correct predictions (120 + 50).
  • The model made 25 incorrect predictions (15 + 10).
  • There are 195 total scored cases (120 + 15 + 10 + 50).
  • The error rate is 25/195 = 0.1282.
  • The overall accuracy rate is 170/195 = 0.8718.
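
The arithmetic above can be checked with a short R sketch. The four counts come from the text; the row/column orientation and the placement of the two off-diagonal counts (15 and 10) are assumptions made only for illustration, and they do not affect the accuracy or the error rate.

```r
# Confusion matrix built from the counts quoted in the text.
# Assumption: rows are actual classes and columns are predicted classes.
# Which off-diagonal cell holds 15 and which holds 10 is a guess; the
# totals computed below are the same either way.
cm <- matrix(c(120, 10,
                15, 50),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("Tested Positive", "Tested Negative"),
                             predicted = c("Tested Positive", "Tested Negative")))

correct   <- sum(diag(cm))      # predictions on the main diagonal
total     <- sum(cm)            # all scored cases
incorrect <- total - correct    # everything off the diagonal

accuracy   <- correct / total
error_rate <- incorrect / total

round(c(accuracy = accuracy, error_rate = error_rate), 4)
```

Accuracy and error rate always sum to 1, which is a quick sanity check on the two figures.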

b) EDA and Codebook

EDA

Description

  • An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens.

Format

  • A data frame with 71 observations on the following 2 variables.

    weight = a numeric variable giving the chick weight.

    feed = a factor giving the feed type.

Details

  • Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Their weights in grams after six weeks are given along with feed types.

Dataset of Chicken Weight by Feed Type (first 6 rows)

##   weight      feed
## 1    179 horsebean
## 2    160 horsebean
## 3    136 horsebean
## 4    227 horsebean
## 5    217 horsebean
## 6    168 horsebean
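
A preview like the one above can be reproduced in base R, since chickwts ships with R's built-in datasets package. The calls below are a minimal sketch; the source does not show which commands were actually used.

```r
# chickwts is part of R's built-in datasets package, so nothing is downloaded.
data(chickwts)

# Structure: number of observations, variable names, and variable types.
str(chickwts)

# First six rows of the data frame.
first_rows <- head(chickwts)
first_rows
```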

Codebook

Categorical Variables

  • The number of observations for each feed type.
## 
##    casein horsebean   linseed  meatmeal   soybean sunflower 
##        12        10        12        11        14        12
  • Summary statistics of weight for each feed type.
## chickwts$feed: casein
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   216.0   277.2   342.0   323.6   370.8   404.0 
## ------------------------------------------------------------ 
## chickwts$feed: horsebean
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   108.0   137.0   151.5   160.2   176.2   227.0 
## ------------------------------------------------------------ 
## chickwts$feed: linseed
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   141.0   178.0   221.0   218.8   257.8   309.0 
## ------------------------------------------------------------ 
## chickwts$feed: meatmeal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   153.0   249.5   263.0   276.9   320.0   380.0 
## ------------------------------------------------------------ 
## chickwts$feed: soybean
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   158.0   206.8   248.0   246.4   270.0   329.0 
## ------------------------------------------------------------ 
## chickwts$feed: sunflower
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   226.0   312.8   328.0   328.9   340.2   423.0
  • The mean of the weights grouped by feed type.
## chickwts$feed: casein
## [1] 323.5833
## ------------------------------------------------------------ 
## chickwts$feed: horsebean
## [1] 160.2
## ------------------------------------------------------------ 
## chickwts$feed: linseed
## [1] 218.75
## ------------------------------------------------------------ 
## chickwts$feed: meatmeal
## [1] 276.9091
## ------------------------------------------------------------ 
## chickwts$feed: soybean
## [1] 246.4286
## ------------------------------------------------------------ 
## chickwts$feed: sunflower
## [1] 328.9167
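
Output in the "chickwts$feed: casein" layout is what base R's by() prints, so the codebook can plausibly be rebuilt along the following lines. This is a sketch under that assumption, not the calls actually used in the source. It also computes the per-group standard deviations with sd(), for comparison with the per-group means.

```r
data(chickwts)

# Number of chicks per feed type.
feed_counts <- table(chickwts$feed)
feed_counts

# Min, quartiles, median, and mean of weight for each feed type.
by(chickwts$weight, chickwts$feed, summary)

# Mean weight per feed type, returned as a named numeric vector.
feed_means <- tapply(chickwts$weight, chickwts$feed, mean)
feed_means

# Standard deviation of weight per feed type (sd() in place of mean()).
feed_sds <- tapply(chickwts$weight, chickwts$feed, sd)
feed_sds
```

tapply() is used for the last two steps because it returns a plain named vector, which is easier to reuse in later calculations than the printed by() object.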

c) Demonstration of dplyr for data manipulation

##    Year     Month Palestinians.Killed Israelis.Killed
## 1  2000  DECEMBER                  51               8
## 2  2000  NOVEMBER                 112              22
## 3  2000   OCTOBER                 104              10
## 4  2000 SEPTEMBER                  16               1
## 5  2001  DECEMBER                  67              36
## 6  2001  NOVEMBER                  39              14
## 7  2001   OCTOBER                  89              14
## 8  2001 SEPTEMBER                  59              13
## 9  2001    AUGUST                  37              26
## 10 2001      JULY                  32              10

i) Changing the existing column name.

Palestine1 <- dplyr::rename(Palestine, Palestinians_Killed = Palestinians.Killed, Israelis_Killed = Israelis.Killed)
  • Changed the column name from Palestinians.Killed to Palestinians_Killed.
  • Changed the column name from Israelis.Killed to Israelis_Killed.
  • These changes use (dplyr::rename(dataframe, new.name = existing.name)); the new name goes on the left of the equals sign.
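
The renaming step can be tried on a small stand-in for the data. The data frame below is rebuilt by hand from the first four rows printed earlier and is only an illustration; how the original Palestine data frame was loaded is not shown in the source. Note that dplyr::rename() expects pairs in the order new name = old name.

```r
# Hand-built stand-in: the first four rows of the table shown earlier.
Palestine <- data.frame(
  Year = c(2000, 2000, 2000, 2000),
  Month = c("DECEMBER", "NOVEMBER", "OCTOBER", "SEPTEMBER"),
  Palestinians.Killed = c(51, 112, 104, 16),
  Israelis.Killed = c(8, 22, 10, 1)
)

# rename() takes new_name = old_name pairs; other columns are left untouched.
Palestine1 <- dplyr::rename(
  Palestine,
  Palestinians_Killed = Palestinians.Killed,
  Israelis_Killed = Israelis.Killed
)

names(Palestine1)
```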

ii) Pick rows based on their values.

dplyr::filter(Palestine1, as.numeric(Palestinians_Killed) > 100)
##   Year    Month Palestinians_Killed Israelis_Killed
## 1 2000 NOVEMBER                 112              22
## 2 2000  OCTOBER                 104              10
  • Picked the rows where Palestinians_Killed is greater than 100 by using (dplyr::filter(dataframe, condition))
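
The as.numeric() call in the condition suggests that the count column may have been read in as text. The sketch below assumes exactly that (the source does not confirm the column type) and uses a hand-built four-row stand-in. Without the conversion, R would compare the values as strings, which can give misleading results for numbers with different digit counts.

```r
# Hand-built stand-in with the count column stored as character.
# Assumption: this mirrors how the original column was typed.
Palestine1 <- data.frame(
  Year = c(2000, 2000, 2000, 2000),
  Month = c("DECEMBER", "NOVEMBER", "OCTOBER", "SEPTEMBER"),
  Palestinians_Killed = c("51", "112", "104", "16"),
  Israelis_Killed = c(8, 22, 10, 1),
  stringsAsFactors = FALSE
)

# Convert to numeric inside the condition so the comparison is numeric.
over_100 <- dplyr::filter(Palestine1, as.numeric(Palestinians_Killed) > 100)
over_100
```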

iii) Add new columns to a data frame.

Palestine2 <- dplyr::mutate(Palestine1, Palestinian_Injured = as.numeric(Palestinians_Killed)*2)
  • Added a new column named “Palestinian_Injured”, whose values are (Palestinians_Killed*2), by using (dplyr::mutate(dataframe, new.variable = chosen.variable*2))
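
A minimal sketch of the same step on a hand-built four-row stand-in follows. The doubling is simply the illustrative formula used in this section, and the character type of the count column is an assumption suggested by the as.numeric() call.

```r
# Hand-built stand-in; the count column is assumed to be character.
Palestine1 <- data.frame(
  Year = c(2000, 2000, 2000, 2000),
  Month = c("DECEMBER", "NOVEMBER", "OCTOBER", "SEPTEMBER"),
  Palestinians_Killed = c("51", "112", "104", "16"),
  Israelis_Killed = c(8, 22, 10, 1),
  stringsAsFactors = FALSE
)

# mutate() keeps every existing column and appends the new one at the end.
Palestine2 <- dplyr::mutate(
  Palestine1,
  Palestinian_Injured = as.numeric(Palestinians_Killed) * 2
)
Palestine2
```

Because mutate() returns a new data frame, Palestine1 itself is unchanged; the result has to be assigned to keep the new column.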

iv) Combine data across two or more data frames

dplyr::bind_cols(Palestine2, PalestineCombine)
##    Year     Month Palestinians_Killed Israelis_Killed Palestinian_Injured
## 1  2000  DECEMBER                  51               8                 102
## 2  2000  NOVEMBER                 112              22                 224
## 3  2000   OCTOBER                 104              10                 208
## 4  2000 SEPTEMBER                  16               1                  32
## 5  2001  DECEMBER                  67              36                 134
## 6  2001  NOVEMBER                  39              14                  78
## 7  2001   OCTOBER                  89              14                 178
## 8  2001 SEPTEMBER                  59              13                 118
## 9  2001    AUGUST                  37              26                  74
## 10 2001      JULY                  32              10                  64
##    Israelis.Injuries
## 1                n/a
## 2                n/a
## 3                n/a
## 4                n/a
## 5                n/a
## 6                n/a
## 7                n/a
## 8                n/a
## 9                n/a
## 10               n/a
  • Combined two data frames, (Palestine2) and (PalestineCombine), into one data frame by using (dplyr::bind_cols(dataframe1, dataframe2))
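
dplyr::bind_cols() pastes data frames side by side by row position, so both inputs need the same number of rows in the same order. The sketch below uses hand-built four-row stand-ins; the contents of PalestineCombine are assumed from the printed output (a single Israelis.Injuries column holding "n/a"), since the source does not show how it was created.

```r
# Hand-built stand-in for Palestine2 (first four rows of the output above).
Palestine2 <- data.frame(
  Year = c(2000, 2000, 2000, 2000),
  Month = c("DECEMBER", "NOVEMBER", "OCTOBER", "SEPTEMBER"),
  Palestinians_Killed = c(51, 112, 104, 16),
  Israelis_Killed = c(8, 22, 10, 1),
  Palestinian_Injured = c(102, 224, 208, 32)
)

# Assumed shape of the second data frame: one column, same number of rows.
PalestineCombine <- data.frame(
  Israelis.Injuries = rep("n/a", 4),
  stringsAsFactors = FALSE
)

# Rows are matched purely by position, not by any key column.
combined <- dplyr::bind_cols(Palestine2, PalestineCombine)
combined
```

When the two data frames share a key column rather than a row order, a join such as dplyr::left_join() is usually the safer way to combine them.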