1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
knitr::opts_chunk$set(echo = TRUE)
# Use the summary function to gain an overview of the data set. Then display the mean and
# median for at least two attributes.
dts <- read.csv("C:\\data\\datasets.csv")
summary1 <- summary(dts)
summary1
## Package Item
## Stat2Data:119 aids : 2
## Ecdat :104 channing: 2
## MASS : 85 Cigar : 2
## datasets : 72 Clothing: 2
## car : 51 CO2 : 2
## boot : 48 Crime : 2
## (Other) :448 (Other) :915
## Title
## Seven data sets showing a bifactor solution. : 8
## Individual Preferences Over Immigration Policy : 6
## John Snow's map and data on the 1854 London Cholera outbreak : 5
## Automobile Data from 'Consumer Reports' 1990 : 3
## Data from Minard's famous graphic map of Napoleon's march on Moscow: 3
## Diabetes in Pima Indian Women : 3
## (Other) :899
## csv
## https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/acme.csv : 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/aids.csv : 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/aircondit.csv : 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/aircondit7.csv: 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/amis.csv : 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/boot/aml.csv : 1
## (Other) :921
## doc
## https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/acme.html : 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/aids.html : 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/aircondit.html : 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/aircondit7.html: 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/amis.html : 1
## https://raw.github.com/vincentarelbundock/Rdatasets/master/doc/boot/aml.html : 1
## (Other) :921
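Every column in this data set is categorical, so `summary()` reports counts rather than means and medians. One way to still obtain a mean and a median is to derive numeric attributes first. The following is a minimal sketch under that assumption: it uses the character lengths of `Item` and `Title` as the two attributes. The three rows are copied from the `car` entries of the data set so the example runs on its own, and the names `sample_dts`, `item_len`, and `title_len` are made up for illustration.

```r
# Three rows copied from the data set so the example is self-contained.
sample_dts <- data.frame(
  Package = c("car", "car", "car"),
  Item = c("AMSsurvey", "Adler", "Angell"),
  Title = c("American Math Society Survey Data",
            "Experimenter Expectations",
            "Moral Integration of American Cities"),
  stringsAsFactors = FALSE
)

# Derive two numeric attributes (an assumption of this sketch):
# the number of characters in each Item name and in each Title.
sample_dts$item_len <- nchar(sample_dts$Item)
sample_dts$title_len <- nchar(sample_dts$Title)

# Mean and median for both derived attributes.
mean(sample_dts$item_len)
median(sample_dts$item_len)
mean(sample_dts$title_len)
median(sample_dts$title_len)
```

On the full data set the same calls could be applied to `dts$Item` and `dts$Title`, wrapped in `as.character()` if those columns were read as factors, because `nchar()` expects character input.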
2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
knitr::opts_chunk$set(echo = TRUE)
# Keep the first three columns (Package, Item, Title).
subdts <- dts[, 1:3]
# Keep only the rows that belong to the 'car' package.
subdts_car <- subset(subdts, Package == 'car')
subdts_car
## Package Item
## 186 car AMSsurvey
## 187 car Adler
## 188 car Angell
## 189 car Anscombe
## 190 car Baumann
## 191 car Bfox
## 192 car Blackmore
## 193 car Burt
## 194 car CanPop
## 195 car Chile
## 196 car Chirot
## 197 car Cowles
## 198 car Davis
## 199 car DavisThin
## 200 car Depredations
## 201 car Duncan
## 202 car Ericksen
## 203 car Florida
## 204 car Freedman
## 205 car Friendly
## 206 car Ginzberg
## 207 car Greene
## 208 car Guyer
## 209 car Hartnagel
## 210 car Highway1
## 211 car KosteckiDillon
## 212 car Leinhardt
## 213 car LoBD
## 214 car Mandel
## 215 car Migration
## 216 car Moore
## 217 car Mroz
## 218 car OBrienKaiser
## 219 car Ornstein
## 220 car Pottery
## 221 car Prestige
## 222 car Quartet
## 223 car Robey
## 224 car SLID
## 225 car Sahlins
## 226 car Salaries
## 227 car Soils
## 228 car States
## 229 car Transact
## 230 car UN
## 231 car USPop
## 232 car Vocab
## 233 car WeightLoss
## 234 car Womenlf
## 235 car Wong
## 236 car Wool
## Title
## 186 American Math Society Survey Data
## 187 Experimenter Expectations
## 188 Moral Integration of American Cities
## 189 U. S. State Public-School Expenditures
## 190 Methods of Teaching Reading Comprehension
## 191 Canadian Women's Labour-Force Participation
## 192 Exercise Histories of Eating-Disordered and Control Subjects
## 193 Fraudulent Data on IQs of Twins Raised Apart
## 194 Canadian Population Data
## 195 Voting Intentions in the 1988 Chilean Plebiscite
## 196 The 1907 Romanian Peasant Rebellion
## 197 Cowles and Davis's Data on Volunteering
## 198 Self-Reports of Height and Weight
## 199 Davis's Data on Drive for Thinness
## 200 Minnesota Wolf Depredation Data
## 201 Duncan's Occupational Prestige Data
## 202 The 1980 U.S. Census Undercount
## 203 Florida County Voting
## 204 Crowding and Crime in U. S. Metropolitan Areas
## 205 Format Effects on Recall
## 206 Data on Depression
## 207 Refugee Appeals
## 208 Anonymity and Cooperation
## 209 Canadian Crime-Rates Time Series
## 210 Highway Accidents
## 211 Treatment of Migraine Headaches
## 212 Data on Infant-Mortality
## 213 Cancer drug data use to provide an example of the use of the skew power distributions.
## 214 Contrived Collinear Data
## 215 Canadian Interprovincial Migration Data
## 216 Status, Authoritarianism, and Conformity
## 217 U.S. Women's Labor-Force Participation
## 218 O'Brien and Kaiser's Repeated-Measures Data
## 219 Interlocking Directorates Among Major Canadian Firms
## 220 Chemical Composition of Pottery
## 221 Prestige of Canadian Occupations
## 222 Four Regression Datasets
## 223 Fertility and Contraception
## 224 Survey of Labour and Income Dynamics
## 225 Agricultural Production in Mazulu Village
## 226 Salaries for Professors
## 227 Soil Compositions of Physical and Chemical Characteristics
## 228 Education and Related Statistics for the U.S. States
## 229 Transaction data
## 230 GDP and Infant Mortality
## 231 Population of the United States
## 232 Vocabulary and Education
## 233 Weight Loss Data
## 234 Canadian Women's Labour-Force Participation
## 235 Post-Coma Recovery of IQ
## 236 Wool data
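The same kind of row-and-column subset can also be written with bracket indexing. A minimal sketch, assuming a small stand-in data frame (`toy_dts`, a hypothetical name) so that the example does not depend on a local file path; the `Package` and `Item` values appear in the output above, while `row_id` is an invented extra column that the subset drops.

```r
# Stand-in rows: Package and Item values are taken from the output above,
# row_id is a hypothetical extra column used only to show column selection.
toy_dts <- data.frame(
  Package = c("boot", "boot", "car", "car"),
  Item = c("acme", "aids", "AMSsurvey", "Adler"),
  row_id = 1:4,
  stringsAsFactors = FALSE
)

# Rows are chosen by a logical test on Package, columns by name.
toy_car <- toy_dts[toy_dts$Package == "car", c("Package", "Item")]
toy_car
```

As in the printed subset above, the selected rows keep their original row numbers as row names.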
3. Create new column names for the new data frame
#knitr::opts_chunk$set(echo = TRUE)
#install.packages("stringr")
library(stringr)
colnames(subdts_car) <- c("PACKAGE", "ITEM_NAME", "DESCRIPTON")
subdts_car
## PACKAGE ITEM_NAME
## 186 car AMSsurvey
## 187 car Adler
## 188 car Angell
## 189 car Anscombe
## 190 car Baumann
## 191 car Bfox
## 192 car Blackmore
## 193 car Burt
## 194 car CanPop
## 195 car Chile
## 196 car Chirot
## 197 car Cowles
## 198 car Davis
## 199 car DavisThin
## 200 car Depredations
## 201 car Duncan
## 202 car Ericksen
## 203 car Florida
## 204 car Freedman
## 205 car Friendly
## 206 car Ginzberg
## 207 car Greene
## 208 car Guyer
## 209 car Hartnagel
## 210 car Highway1
## 211 car KosteckiDillon
## 212 car Leinhardt
## 213 car LoBD
## 214 car Mandel
## 215 car Migration
## 216 car Moore
## 217 car Mroz
## 218 car OBrienKaiser
## 219 car Ornstein
## 220 car Pottery
## 221 car Prestige
## 222 car Quartet
## 223 car Robey
## 224 car SLID
## 225 car Sahlins
## 226 car Salaries
## 227 car Soils
## 228 car States
## 229 car Transact
## 230 car UN
## 231 car USPop
## 232 car Vocab
## 233 car WeightLoss
## 234 car Womenlf
## 235 car Wong
## 236 car Wool
## DESCRIPTON
## 186 American Math Society Survey Data
## 187 Experimenter Expectations
## 188 Moral Integration of American Cities
## 189 U. S. State Public-School Expenditures
## 190 Methods of Teaching Reading Comprehension
## 191 Canadian Women's Labour-Force Participation
## 192 Exercise Histories of Eating-Disordered and Control Subjects
## 193 Fraudulent Data on IQs of Twins Raised Apart
## 194 Canadian Population Data
## 195 Voting Intentions in the 1988 Chilean Plebiscite
## 196 The 1907 Romanian Peasant Rebellion
## 197 Cowles and Davis's Data on Volunteering
## 198 Self-Reports of Height and Weight
## 199 Davis's Data on Drive for Thinness
## 200 Minnesota Wolf Depredation Data
## 201 Duncan's Occupational Prestige Data
## 202 The 1980 U.S. Census Undercount
## 203 Florida County Voting
## 204 Crowding and Crime in U. S. Metropolitan Areas
## 205 Format Effects on Recall
## 206 Data on Depression
## 207 Refugee Appeals
## 208 Anonymity and Cooperation
## 209 Canadian Crime-Rates Time Series
## 210 Highway Accidents
## 211 Treatment of Migraine Headaches
## 212 Data on Infant-Mortality
## 213 Cancer drug data use to provide an example of the use of the skew power distributions.
## 214 Contrived Collinear Data
## 215 Canadian Interprovincial Migration Data
## 216 Status, Authoritarianism, and Conformity
## 217 U.S. Women's Labor-Force Participation
## 218 O'Brien and Kaiser's Repeated-Measures Data
## 219 Interlocking Directorates Among Major Canadian Firms
## 220 Chemical Composition of Pottery
## 221 Prestige of Canadian Occupations
## 222 Four Regression Datasets
## 223 Fertility and Contraception
## 224 Survey of Labour and Income Dynamics
## 225 Agricultural Production in Mazulu Village
## 226 Salaries for Professors
## 227 Soil Compositions of Physical and Chemical Characteristics
## 228 Education and Related Statistics for the U.S. States
## 229 Transaction data
## 230 GDP and Infant Mortality
## 231 Population of the United States
## 232 Vocabulary and Education
## 233 Weight Loss Data
## 234 Canadian Women's Labour-Force Participation
## 235 Post-Coma Recovery of IQ
## 236 Wool data
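Assigning a full vector to `colnames()` replaces every name by position. When only one column needs a new name, it can be safer to match on the old name instead. A minimal sketch on a hypothetical two-row stand-in (`toy_car`), with values copied from the output above:

```r
# Two rows copied from the 'car' subset so the example is self-contained.
toy_car <- data.frame(
  Package = c("car", "car"),
  Item = c("AMSsurvey", "Adler"),
  Title = c("American Math Society Survey Data", "Experimenter Expectations"),
  stringsAsFactors = FALSE
)

# Rename only the Item column; the other names are left untouched.
names(toy_car)[names(toy_car) == "Item"] <- "ITEM_NAME"
names(toy_car)
```

Matching by name keeps the rename correct even if the column order changes later.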
4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary2 <- summary(subdts_car)
summary2
## PACKAGE ITEM_NAME
## car :51 Adler : 1
## boot : 0 AMSsurvey: 1
## cluster : 0 Angell : 1
## COUNT : 0 Anscombe : 1
## datasets: 0 Baumann : 1
## Ecdat : 0 Bfox : 1
## (Other) : 0 (Other) :45
## DESCRIPTON
## Canadian Women's Labour-Force Participation: 2
## Agricultural Production in Mazulu Village : 1
## American Math Society Survey Data : 1
## Anonymity and Cooperation : 1
## Canadian Crime-Rates Time Series : 1
## Canadian Interprovincial Migration Data : 1
## (Other) :44
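The zero counts in the PACKAGE column above (`boot : 0`, `cluster : 0`, and so on) suggest that the column was read as a factor, which keeps all of its original levels after subsetting. A minimal sketch of removing the unused levels with `droplevels()`, using a hypothetical stand-in data frame `toy_dts` whose values are taken from the output in this report:

```r
# Stand-in data with Package stored explicitly as a factor.
toy_dts <- data.frame(
  Package = factor(c("boot", "boot", "car", "car", "car")),
  Item = c("acme", "aids", "AMSsurvey", "Adler", "Angell"),
  stringsAsFactors = FALSE
)

# Subsetting keeps the unused "boot" level on the factor.
car_only <- toy_dts[toy_dts$Package == "car", ]
levels(car_only$Package)

# droplevels() removes levels that no longer occur in the data.
car_clean <- droplevels(car_only)
levels(car_clean$Package)
summary(car_clean$Package)
```

With the unused levels dropped, the summary of the subset lists only the packages that are actually present.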
BONUS - Place the original .csv in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.
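# Read from the raw file address; the github.com/.../blob/... page returns HTML, not the CSV itself.
latent.growth.data <- read.csv("https://raw.githubusercontent.com/jameskuruvilla/R-Assignments_MSDA/master/R-Assignments/datasets.csv")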
#latent.growth.data
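A link copied from the GitHub web page usually has the form `github.com/<user>/<repo>/blob/<branch>/<path>`, which serves an HTML page rather than the file, whereas `read.csv()` needs the address of the raw file. A minimal sketch of converting one form to the other; `to_raw_url` is a hypothetical helper, not part of any package, and it assumes the usual `raw.githubusercontent.com` address layout.

```r
# Hypothetical helper: turn a GitHub "blob" page URL into a raw file URL.
to_raw_url <- function(blob_url) {
  # Swap the host for the raw-content host.
  raw_url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", blob_url)
  # Remove the first "/blob/" path segment.
  sub("/blob/", "/", raw_url, fixed = TRUE)
}

blob_url <- "https://github.com/jameskuruvilla/R-Assignments_MSDA/blob/master/R-Assignments/datasets.csv"
raw_url <- to_raw_url(blob_url)
raw_url

# With network access, the file could then be read directly:
# dts_remote <- read.csv(raw_url)
```

The `read.csv()` call is left commented out so that the sketch runs without a network connection.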