The major difference between matrices and data frames is that a matrix is limited to a single type of data. For example, if one element of a matrix is numeric, every other element must be numeric too. Data frames are much more general: each column can hold its own type (numeric, character, factor, etc.), so different types can sit side by side in one data set.
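A minimal sketch of that difference, using made-up values (the names `m` and `df` and the two rows are illustrative only, not taken from the data set). `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version:

```r
# A matrix holds a single type: mixing numbers and text
# coerces every element to character.
m <- cbind(age = c(19, 21), habit = c("smoker", "nonsmoker"))
typeof(m)

# A data frame keeps a separate type for each column.
df <- data.frame(age = c(19, 21),
                 habit = c("smoker", "nonsmoker"),
                 stringsAsFactors = FALSE)
sapply(df, class)
```

In the matrix, the ages end up stored as text, while the data frame keeps `age` numeric and `habit` character.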
ncbirths <- read.csv("/resources/rstudio/BusinessStatistics/data/ncbirths.csv")
str(ncbirths)
## 'data.frame': 1000 obs. of 13 variables:
## $ fage : int NA NA 19 21 NA NA 18 17 NA 20 ...
## $ mage : int 13 14 15 15 15 15 15 15 16 16 ...
## $ mature : Factor w/ 2 levels "mature mom","younger mom": 2 2 2 2 2 2 2 2 2 2 ...
## $ weeks : int 39 42 37 41 39 38 37 35 38 37 ...
## $ premie : Factor w/ 3 levels "<NA>","full term",..: 2 2 2 2 2 2 2 3 2 2 ...
## $ visits : int 10 15 11 6 9 19 12 5 9 13 ...
## $ marital : Factor w/ 3 levels "<NA>","married",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ gained : int 38 20 38 34 27 22 76 15 NA 52 ...
## $ weight : num 7.63 7.88 6.63 8 6.38 5.38 8.44 4.69 8.81 6.94 ...
## $ lowbirthweight: Factor w/ 2 levels "low","not low": 2 2 2 2 2 1 2 1 2 2 ...
## $ gender : Factor w/ 2 levels "female","male": 2 2 1 2 1 2 2 2 2 1 ...
## $ habit : Factor w/ 3 levels "<NA>","nonsmoker",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ whitemom : Factor w/ 3 levels "<NA>","not white",..: 2 2 3 3 2 2 2 2 3 3 ...
summary(ncbirths)
##      fage           mage          mature            weeks
##  Min.   :14.00   Min.   :13   mature mom :133   Min.   :20.00
##  1st Qu.:25.00   1st Qu.:22   younger mom:867   1st Qu.:37.00
##  Median :30.00   Median :27                     Median :39.00
##  Mean   :30.26   Mean   :27                     Mean   :38.33
##  3rd Qu.:35.00   3rd Qu.:32                     3rd Qu.:40.00
##  Max.   :55.00   Max.   :50                     Max.   :45.00
##  NA's   :171                                    NA's   :2
##     premie          visits          marital          gained
##  <NA>     :  2   Min.   : 0.0   <NA>       :  1   Min.   : 0.00
##  full term:846   1st Qu.:10.0   married    :386   1st Qu.:20.00
##  premie   :152   Median :12.0   not married:613   Median :30.00
##                  Mean   :12.1                     Mean   :30.33
##                  3rd Qu.:15.0                     3rd Qu.:38.00
##                  Max.   :30.0                     Max.   :85.00
##                  NA's   :9                        NA's   :27
##      weight       lowbirthweight     gender         habit
##  Min.   : 1.000   low    :111      female:503   <NA>     :  1
##  1st Qu.: 6.380   not low:889      male  :497   nonsmoker:873
##  Median : 7.310                                 smoker   :126
##  Mean   : 7.101
##  3rd Qu.: 8.060
##  Max.   :11.750
##
##    whitemom
##  <NA>     :  2
##  not white:284
##  white    :714
##
##
##
##
There are 1000 newborns in the dataset.
There are 13 columns.
There are 13 variables.
The heaviest newborn is 11.75 lbs.
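These answers can also be computed directly instead of being read off the printed output. The sketch below uses a small stand-in data frame, `toy` (a hypothetical name), built from the first three values of `fage`, `mage`, and `weight` shown in the `str()` output, so that it runs without the CSV file; with the full data, the same calls would be applied to `ncbirths`.

```r
# Stand-in: first three rows of three columns from the str() output.
# With the full data, use ncbirths in place of toy.
toy <- data.frame(fage   = c(NA, NA, 19),
                  mage   = c(13, 14, 15),
                  weight = c(7.63, 7.88, 6.63))

nrow(toy)                      # number of observations (newborns)
ncol(toy)                      # number of columns, one per variable
max(toy$weight, na.rm = TRUE)  # heaviest recorded weight
colSums(is.na(toy))            # missing values in each column
```

`na.rm = TRUE` matters for columns such as `fage` that contain missing values; without it, `max()` returns `NA`.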