Self-grading Instructions (see also self-grading template)

  1. Grade your work by checking each question for (1) correct code, (2) R output, and (3) a written answer in complete sentences. Be as thorough and precise as possible, referring to the data dictionary/documentation.

  2. Reflection: Write a short recap of what you have learned and practiced this week.

  3. If you need revisions, list the question numbers and make the necessary corrections (both the code and the answer) in your original assignment.

Self-grading

  • Questions that need corrections:

  • My Reflections for this week:

  • Explain the corrections in the corresponding questions in your original assignment.

Answer Key

Answer the following questions using the appropriate dataset and codebook. For each question, provide (1) your code, (2) the R output, and (3) the answer in complete sentences.

Q1: Import the dataset named “birthweight_smoking.csv” and give it a name.

smoking_data <- read.csv("birthweight_smoking.csv", header = TRUE, sep = ",")

# Or, equivalently (header = TRUE and sep = "," are the read.csv() defaults):
# smoking_data <- read.csv("birthweight_smoking.csv")
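To check that the two calls above are equivalent, the sketch below reads a tiny made-up CSV (three hypothetical rows, not taken from birthweight_smoking.csv) through read.csv()'s text argument, once with the explicit header and sep arguments and once with the defaults only.

```r
# A tiny hypothetical CSV, supplied as text instead of a file on disk
csv_text <- "id,birthweight,smoker\n1,3420,0\n2,2900,1\n3,3750,0"

# Explicit arguments, as in the answer key
with_args <- read.csv(text = csv_text, header = TRUE, sep = ",")

# Defaults only: read.csv() already assumes header = TRUE and sep = ","
defaults_only <- read.csv(text = csv_text)

# Both calls produce the same data frame
identical(with_args, defaults_only)
## [1] TRUE
```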

Q2: How many observations and variables does the dataset have?

  • The dataset has 3,000 observations (rows) and 13 variables (columns).
dim(smoking_data)
## [1] 3000   13
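dim() reports rows first and columns second; nrow() and ncol() return each number on its own. A minimal sketch on a hypothetical four-row data frame (toy is an illustrative name, not part of the assignment):

```r
# Hypothetical stand-in for smoking_data: 4 observations (rows), 3 variables (columns)
toy <- data.frame(
  id     = 1:4,
  age    = c(22, 27, 31, 19),
  smoker = c(0, 1, 0, 0)
)

dim(toy)   # rows first, then columns
## [1] 4 3
nrow(toy)  # number of observations
## [1] 4
ncol(toy)  # number of variables
## [1] 3
```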

Q3: Use summary() function to find out the summary statistics of age. Report your findings.

  • In our sample, the mean age of mothers was 26.89 years (approximately 27 years), and the median age, the middle value, was also 27 years. The youngest mother (the minimum) was 14 years old and the oldest (the maximum) was 44 years old. The middle 50% of mothers were between 23 (1st quartile) and 31 (3rd quartile) years old.
# summary of every variable (optional; Q3 only needs age)
summary(smoking_data)
##        id          birthweight      nprevist        alcohol       
##  Min.   :   1.0   Min.   : 425   Min.   : 0.00   Min.   :0.00000  
##  1st Qu.: 750.8   1st Qu.:3062   1st Qu.: 9.00   1st Qu.:0.00000  
##  Median :1500.5   Median :3420   Median :12.00   Median :0.00000  
##  Mean   :1500.5   Mean   :3383   Mean   :10.99   Mean   :0.01933  
##  3rd Qu.:2250.2   3rd Qu.:3750   3rd Qu.:13.00   3rd Qu.:0.00000  
##  Max.   :3000.0   Max.   :5755   Max.   :35.00   Max.   :1.00000  
##      smoker        unmarried           educ            age       
##  Min.   :0.000   Min.   :0.0000   Min.   : 0.00   Min.   :14.00  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:12.00   1st Qu.:23.00  
##  Median :0.000   Median :0.0000   Median :12.00   Median :27.00  
##  Mean   :0.194   Mean   :0.2267   Mean   :12.91   Mean   :26.89  
##  3rd Qu.:0.000   3rd Qu.:0.0000   3rd Qu.:14.00   3rd Qu.:31.00  
##  Max.   :1.000   Max.   :1.0000   Max.   :17.00   Max.   :44.00  
##      drinks            tripre1         tripre2         tripre3     
##  Min.   : 0.00000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.: 0.00000   1st Qu.:1.000   1st Qu.:0.000   1st Qu.:0.000  
##  Median : 0.00000   Median :1.000   Median :0.000   Median :0.000  
##  Mean   : 0.05833   Mean   :0.804   Mean   :0.153   Mean   :0.033  
##  3rd Qu.: 0.00000   3rd Qu.:1.000   3rd Qu.:0.000   3rd Qu.:0.000  
##  Max.   :21.00000   Max.   :1.000   Max.   :1.000   Max.   :1.000  
##     tripre0    
##  Min.   :0.00  
##  1st Qu.:0.00  
##  Median :0.00  
##  Mean   :0.01  
##  3rd Qu.:0.00  
##  Max.   :1.00
summary(smoking_data$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.00   23.00   27.00   26.89   31.00   44.00
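Each number that summary() prints can also be computed with its own function, which is handy when a question asks for one statistic only. A sketch on a short vector of hypothetical ages (not the real age column):

```r
# Hypothetical ages, not taken from birthweight_smoking.csv
ages <- c(14, 23, 27, 27, 31, 44)

round(mean(ages), 2)
## [1] 27.67
median(ages)
## [1] 27
min(ages)
## [1] 14
max(ages)
## [1] 44

# 1st and 3rd quartiles (summary() uses the same default quantile method)
quantile(ages, c(0.25, 0.75))
```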

Q4: Find out and report the median and range of drinks.

  • For the number of drinks mothers had per week, the median was 0 and the range was 21 drinks (from a minimum of 0 to a maximum of 21). Note that R's range() returns the minimum and maximum rather than their difference.
summary(smoking_data$drinks)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.00000  0.00000  0.00000  0.05833  0.00000 21.00000
median(smoking_data$drinks)
## [1] 0
range(smoking_data$drinks)
## [1]  0 21
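Because range() returns a two-element vector, wrapping it in diff() gives the range as a single number (maximum minus minimum), which is the figure quoted in the answer. A sketch on hypothetical weekly drink counts that are mostly zeros, like the drinks variable:

```r
# Hypothetical weekly drink counts, not the real drinks column
drinks <- c(0, 0, 0, 2, 0, 21, 0)

range(drinks)        # minimum and maximum
## [1]  0 21
diff(range(drinks))  # range as one number: max - min
## [1] 21
median(drinks)
## [1] 0
```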

Q5: Use the table() function to report the summary statistics of smoker.

  • Of the 3,000 mothers in the dataset, 2,418 (80.6%) did not smoke during pregnancy, while 582 (19.4%) smoked.
# frequency table
table(smoking_data$smoker)
## 
##    0    1 
## 2418  582
# bonus: proportions (multiply by 100 for percentages)
table(smoking_data$smoker) / nrow(smoking_data)
## 
##     0     1 
## 0.806 0.194
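Base R's prop.table() turns a frequency table into proportions directly (the same as dividing by the total), and multiplying by 100 gives percentages. A sketch on a hypothetical ten-mother smoker indicator (1 = smoked, 0 = did not), not the real data:

```r
# Hypothetical smoker indicator: 1 = smoked during pregnancy, 0 = did not
smoker <- c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0)

counts <- table(smoker)              # frequencies: 8 non-smokers, 2 smokers
props  <- prop.table(counts)         # same as counts / sum(counts)
pcts   <- round(100 * props, 1)      # percentages, rounded to one decimal

counts
props
pcts
```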