R - Week 2 Assignment

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

knitr::opts_chunk$set(echo = TRUE)
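# Read the list of Rdatasets data sets and summarise every column.
# stringsAsFactors = TRUE keeps the text columns as factors (the default before
# R 4.0.0), which is what produces the per-level counts shown below.
dts <- read.csv("C:\\data\\datasets.csv", stringsAsFactors = TRUE)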

summary1 <- summary(dts)
summary1

##       Package          Item    
##  Stat2Data:119   aids    :  2  
##  Ecdat    :104   channing:  2  
##  MASS     : 85   Cigar   :  2  
##  datasets : 72   Clothing:  2  
##  car      : 51   CO2     :  2  
##  boot     : 48   Crime   :  2  
##  (Other)  :448   (Other) :915  
##                                                                  Title    
##  Seven data sets showing a bifactor solution.                       :  8  
##  Individual Preferences Over Immigration Policy                     :  6  
##  John Snow's map and data on the 1854 London Cholera outbreak       :  5  
##  Automobile Data from 'Consumer Reports' 1990                       :  3  
##  Data from Minard's famous graphic map of Napoleon's march on Moscow:  3  
##  Diabetes in Pima Indian Women                                      :  3  
##  (Other)                                                            :899  
##                                                                                  csv     
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/acme.csv      :  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/aids.csv      :  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/aircondit.csv :  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/aircondit7.csv:  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/amis.csv      :  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/aml.csv       :  1  
##  (Other)                                                                           :921  
##                                                                                   doc     
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/acme.html      :  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/aids.html      :  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/aircondit.html :  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/aircondit7.html:  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/amis.html      :  1  
##  https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/aml.html       :  1  
##  (Other)                                                                            :921
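None of the five columns read from datasets.csv is numeric, so `mean()` and `median()` cannot be applied to them directly: on a factor, `mean()` returns `NA` with a warning and `median()` stops with "need numeric data". A minimal sketch, assuming two derived numeric attributes are acceptable for this exercise: the number of data sets listed per package and the number of characters in each Title. The helpers `mean_median()` and `describe_datasets()` are hypothetical, written only for this illustration.

```r
# Hypothetical helper: mean and median of a numeric vector, side by side
mean_median <- function(x) {
  c(mean = mean(x), median = median(x))
}

# Two derived numeric attributes for a data frame with the columns of datasets.csv
describe_datasets <- function(df) {
  # Attribute 1: number of data sets listed per package (empty levels dropped)
  per_package <- as.vector(table(droplevels(as.factor(df$Package))))
  # Attribute 2: number of characters in each Title
  title_length <- nchar(as.character(df$Title))
  rbind(datasets_per_package = mean_median(per_package),
        title_length         = mean_median(title_length))
}

# Runs only when the data frame read with read.csv() above is in the workspace
if (exists("dts")) print(describe_datasets(dts))
```

The result is a small matrix with one row per attribute and `mean`/`median` columns, which makes the later comparison with the subset straightforward.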

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.


# Keep three columns by name, then only the rows that belong to the car package.
# subset() looks up Package inside subdts, so no helper variables are needed.
subdts <- dts[, c("Package", "Item", "Title")]

subdts_car <- subset(subdts, Package == "car")

subdts_car

##     Package           Item
## 186     car      AMSsurvey
## 187     car          Adler
## 188     car         Angell
## 189     car       Anscombe
## 190     car        Baumann
## 191     car           Bfox
## 192     car      Blackmore
## 193     car           Burt
## 194     car         CanPop
## 195     car          Chile
## 196     car         Chirot
## 197     car         Cowles
## 198     car          Davis
## 199     car      DavisThin
## 200     car   Depredations
## 201     car         Duncan
## 202     car       Ericksen
## 203     car        Florida
## 204     car       Freedman
## 205     car       Friendly
## 206     car       Ginzberg
## 207     car         Greene
## 208     car          Guyer
## 209     car      Hartnagel
## 210     car       Highway1
## 211     car KosteckiDillon
## 212     car      Leinhardt
## 213     car           LoBD
## 214     car         Mandel
## 215     car      Migration
## 216     car          Moore
## 217     car           Mroz
## 218     car   OBrienKaiser
## 219     car       Ornstein
## 220     car        Pottery
## 221     car       Prestige
## 222     car        Quartet
## 223     car          Robey
## 224     car           SLID
## 225     car        Sahlins
## 226     car       Salaries
## 227     car          Soils
## 228     car         States
## 229     car       Transact
## 230     car             UN
## 231     car          USPop
## 232     car          Vocab
## 233     car     WeightLoss
## 234     car        Womenlf
## 235     car           Wong
## 236     car           Wool
##                                                                                      Title
## 186                                                      American Math Society Survey Data
## 187                                                              Experimenter Expectations
## 188                                                   Moral Integration of American Cities
## 189                                                 U. S. State Public-School Expenditures
## 190                                              Methods of Teaching Reading Comprehension
## 191                                            Canadian Women's Labour-Force Participation
## 192                           Exercise Histories of Eating-Disordered and Control Subjects
## 193                                           Fraudulent Data on IQs of Twins Raised Apart
## 194                                                               Canadian Population Data
## 195                                       Voting Intentions in the 1988 Chilean Plebiscite
## 196                                                    The 1907 Romanian Peasant Rebellion
## 197                                                Cowles and Davis's Data on Volunteering
## 198                                                      Self-Reports of Height and Weight
## 199                                                     Davis's Data on Drive for Thinness
## 200                                                        Minnesota Wolf Depredation Data
## 201                                                    Duncan's Occupational Prestige Data
## 202                                                        The 1980 U.S. Census Undercount
## 203                                                                  Florida County Voting
## 204                                         Crowding and Crime in U. S. Metropolitan Areas
## 205                                                               Format Effects on Recall
## 206                                                                     Data on Depression
## 207                                                                        Refugee Appeals
## 208                                                              Anonymity and Cooperation
## 209                                                       Canadian Crime-Rates Time Series
## 210                                                                      Highway Accidents
## 211                                                        Treatment of Migraine Headaches
## 212                                                               Data on Infant-Mortality
## 213 Cancer drug data use to provide an example of the use of the skew power distributions.
## 214                                                               Contrived Collinear Data
## 215                                                Canadian Interprovincial Migration Data
## 216                                               Status, Authoritarianism, and Conformity
## 217                                                 U.S. Women's Labor-Force Participation
## 218                                            O'Brien and Kaiser's Repeated-Measures Data
## 219                                   Interlocking Directorates Among Major Canadian Firms
## 220                                                        Chemical Composition of Pottery
## 221                                                       Prestige of Canadian Occupations
## 222                                                               Four Regression Datasets
## 223                                                            Fertility and Contraception
## 224                                                   Survey of Labour and Income Dynamics
## 225                                              Agricultural Production in Mazulu Village
## 226                                                                Salaries for Professors
## 227                             Soil Compositions of Physical and Chemical Characteristics
## 228                                   Education and Related Statistics for the U.S. States
## 229                                                                       Transaction data
## 230                                                               GDP and Infant Mortality
## 231                                                        Population of the United States
## 232                                                               Vocabulary and Education
## 233                                                                       Weight Loss Data
## 234                                            Canadian Women's Labour-Force Participation
## 235                                                               Post-Coma Recovery of IQ
## 236                                                                              Wool data
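The first summary reports 51 data sets for car, and the listing above runs from row 186 to row 236, which is also 51 rows. A quick sanity check, sketched with a hypothetical helper `check_subset()`, confirms that the filtered data frame holds every car row, nothing else, and only the selected columns.

```r
# Hypothetical check: does subset_df hold all rows of full_df for pkg, and only those?
check_subset <- function(full_df, subset_df, pkg, cols) {
  expected_rows <- sum(full_df$Package == pkg)
  nrow(subset_df) == expected_rows &&
    all(subset_df$Package == pkg) &&
    identical(names(subset_df), cols)
}

# With the objects from this section this should print TRUE
if (exists("dts") && exists("subdts_car")) {
  print(check_subset(dts, subdts_car, "car", c("Package", "Item", "Title")))
}
```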

3. Create new column names for the new data frame

# Assign the new names in column order: Package, Item, Title
colnames(subdts_car) <- c("PACKAGE", "ITEM_NAME", "DESCRIPTON")
subdts_car

##     PACKAGE      ITEM_NAME
## 186     car      AMSsurvey
## 187     car          Adler
## 188     car         Angell
## 189     car       Anscombe
## 190     car        Baumann
## 191     car           Bfox
## 192     car      Blackmore
## 193     car           Burt
## 194     car         CanPop
## 195     car          Chile
## 196     car         Chirot
## 197     car         Cowles
## 198     car          Davis
## 199     car      DavisThin
## 200     car   Depredations
## 201     car         Duncan
## 202     car       Ericksen
## 203     car        Florida
## 204     car       Freedman
## 205     car       Friendly
## 206     car       Ginzberg
## 207     car         Greene
## 208     car          Guyer
## 209     car      Hartnagel
## 210     car       Highway1
## 211     car KosteckiDillon
## 212     car      Leinhardt
## 213     car           LoBD
## 214     car         Mandel
## 215     car      Migration
## 216     car          Moore
## 217     car           Mroz
## 218     car   OBrienKaiser
## 219     car       Ornstein
## 220     car        Pottery
## 221     car       Prestige
## 222     car        Quartet
## 223     car          Robey
## 224     car           SLID
## 225     car        Sahlins
## 226     car       Salaries
## 227     car          Soils
## 228     car         States
## 229     car       Transact
## 230     car             UN
## 231     car          USPop
## 232     car          Vocab
## 233     car     WeightLoss
## 234     car        Womenlf
## 235     car           Wong
## 236     car           Wool
##                                                                                 DESCRIPTON
## 186                                                      American Math Society Survey Data
## 187                                                              Experimenter Expectations
## 188                                                   Moral Integration of American Cities
## 189                                                 U. S. State Public-School Expenditures
## 190                                              Methods of Teaching Reading Comprehension
## 191                                            Canadian Women's Labour-Force Participation
## 192                           Exercise Histories of Eating-Disordered and Control Subjects
## 193                                           Fraudulent Data on IQs of Twins Raised Apart
## 194                                                               Canadian Population Data
## 195                                       Voting Intentions in the 1988 Chilean Plebiscite
## 196                                                    The 1907 Romanian Peasant Rebellion
## 197                                                Cowles and Davis's Data on Volunteering
## 198                                                      Self-Reports of Height and Weight
## 199                                                     Davis's Data on Drive for Thinness
## 200                                                        Minnesota Wolf Depredation Data
## 201                                                    Duncan's Occupational Prestige Data
## 202                                                        The 1980 U.S. Census Undercount
## 203                                                                  Florida County Voting
## 204                                         Crowding and Crime in U. S. Metropolitan Areas
## 205                                                               Format Effects on Recall
## 206                                                                     Data on Depression
## 207                                                                        Refugee Appeals
## 208                                                              Anonymity and Cooperation
## 209                                                       Canadian Crime-Rates Time Series
## 210                                                                      Highway Accidents
## 211                                                        Treatment of Migraine Headaches
## 212                                                               Data on Infant-Mortality
## 213 Cancer drug data use to provide an example of the use of the skew power distributions.
## 214                                                               Contrived Collinear Data
## 215                                                Canadian Interprovincial Migration Data
## 216                                               Status, Authoritarianism, and Conformity
## 217                                                 U.S. Women's Labor-Force Participation
## 218                                            O'Brien and Kaiser's Repeated-Measures Data
## 219                                   Interlocking Directorates Among Major Canadian Firms
## 220                                                        Chemical Composition of Pottery
## 221                                                       Prestige of Canadian Occupations
## 222                                                               Four Regression Datasets
## 223                                                            Fertility and Contraception
## 224                                                   Survey of Labour and Income Dynamics
## 225                                              Agricultural Production in Mazulu Village
## 226                                                                Salaries for Professors
## 227                             Soil Compositions of Physical and Chemical Characteristics
## 228                                   Education and Related Statistics for the U.S. States
## 229                                                                       Transaction data
## 230                                                               GDP and Infant Mortality
## 231                                                        Population of the United States
## 232                                                               Vocabulary and Education
## 233                                                                       Weight Loss Data
## 234                                            Canadian Women's Labour-Force Participation
## 235                                                               Post-Coma Recovery of IQ
## 236                                                                              Wool data
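Assigning a complete vector to `colnames()` works, but it silently depends on the column order. A minimal alternative, assuming you only want to rename columns you can identify by their current names, is to match on the old names. `rename_cols()` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: rename columns by matching old names, leaving the rest untouched
rename_cols <- function(df, mapping) {
  # mapping is a named character vector: c(old_name = "new_name")
  hits <- match(names(mapping), names(df))
  if (anyNA(hits)) {
    stop("unknown column(s): ", paste(names(mapping)[is.na(hits)], collapse = ", "))
  }
  names(df)[hits] <- unname(mapping)
  df
}

# Rebuilds the renamed car subset from dts, independent of column order
if (exists("dts")) {
  car_cols <- rename_cols(
    subset(dts, Package == "car", select = c(Package, Item, Title)),
    c(Package = "PACKAGE", Item = "ITEM_NAME", Title = "DESCRIPTON")
  )
  print(names(car_cols))
}
```

Because unknown names raise an error, a misspelled old name is caught instead of producing a wrongly labelled column.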

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes and compare them with the values from the full data set.

summary2 <- summary(subdts_car)

summary2

##      PACKAGE       ITEM_NAME 
##  car     :51   Adler    : 1  
##  boot    : 0   AMSsurvey: 1  
##  cluster : 0   Angell   : 1  
##  COUNT   : 0   Anscombe : 1  
##  datasets: 0   Baumann  : 1  
##  Ecdat   : 0   Bfox     : 1  
##  (Other) : 0   (Other)  :45  
##                                        DESCRIPTON
##  Canadian Women's Labour-Force Participation: 2  
##  Agricultural Production in Mazulu Village  : 1  
##  American Math Society Survey Data          : 1  
##  Anonymity and Cooperation                  : 1  
##  Canadian Crime-Rates Time Series           : 1  
##  Canadian Interprovincial Migration Data    : 1  
##  (Other)                                    :44
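Two points stand out against the first summary. The subset still lists boot, cluster, COUNT and other packages with a count of 0, because subsetting a factor keeps its unused levels; `droplevels()` removes them. The mean and median comparison is also not printed yet. A sketch under the same assumption as before (string length as a derived numeric attribute), using hypothetical helpers `length_stats()` and `compare_stats()` and the column names from step 3:

```r
# Hypothetical helper: mean and median of the number of characters in each value
length_stats <- function(x) {
  n <- nchar(as.character(x))
  c(mean = mean(n), median = median(n))
}

# Hypothetical helper: the same statistics for the full data and the subset
compare_stats <- function(full_x, subset_x) {
  rbind(full = length_stats(full_x), subset = length_stats(subset_x))
}

if (exists("dts") && exists("subdts_car")) {
  # droplevels() removes factor levels with zero rows (boot, cluster, COUNT, ...)
  print(summary(droplevels(subdts_car)))
  # Attribute 1: characters in the data set name (Item / ITEM_NAME)
  print(compare_stats(dts$Item, subdts_car$ITEM_NAME))
  # Attribute 2: characters in the description (Title / DESCRIPTON)
  print(compare_stats(dts$Title, subdts_car$DESCRIPTON))
}
```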

BONUS - place the original .csv file in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.

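# read.csv() needs the raw file: a github.com/.../blob/... link returns the HTML
# page around the file, so read from raw.githubusercontent.com instead.
dts_github <- read.csv("https://raw.githubusercontent.com/jameskuruvilla/R-Assignments_MSDA/master/R-Assignments/datasets.csv",
                       stringsAsFactors = TRUE)

head(dts_github)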
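GitHub's `blob/` links open an HTML page that wraps the file, while raw.githubusercontent.com serves the file contents that `read.csv()` expects. A small sketch of the conversion; `github_raw_url()` is a hypothetical helper and assumes the usual `https://github.com/<user>/<repo>/blob/<branch>/<path>` layout.

```r
# Hypothetical helper: turn a github.com ".../blob/..." link into its raw-file URL
github_raw_url <- function(url) {
  pattern <- "^https://github\\.com/([^/]+)/([^/]+)/blob/(.+)$"
  if (!grepl(pattern, url)) stop("not a github.com blob URL: ", url)
  sub(pattern, "https://raw.githubusercontent.com/\\1/\\2/\\3", url)
}

blob <- "https://github.com/jameskuruvilla/R-Assignments_MSDA/blob/master/R-Assignments/datasets.csv"
github_raw_url(blob)
# returns "https://raw.githubusercontent.com/jameskuruvilla/R-Assignments_MSDA/master/R-Assignments/datasets.csv"
```

Appending `?raw=true` to the blob link is another way to reach the same raw file.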