Research Question

How many NYC students aged 5 and older received the COVID-19 vaccine in 2022?

## Rows: 1584 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): School DBN
## dbl (5): # of students over five active on register, # of students w/ at lea...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 6 × 6
##   `School DBN` # of students over five active on regist…¹ # of students w/ at …²
##   <chr>                                             <dbl>                  <dbl>
## 1 01M015                                              169                     62
## 2 01M019                                              168                     79
## 3 01M020                                              307                    159
## 4 01M034                                              244                     99
## 5 01M063                                              161                     76
## 6 01M064                                              185                     77
## # ℹ abbreviated names: ¹​`# of students over five active on register`,
## #   ²​`# of students w/ at least one dose`
## # ℹ 3 more variables: `% w/ at least one dose` <dbl>,
## #   `# of students fully vaccinated` <dbl>, `% fully vaccinated` <dbl>
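
The code that produced the table above is not shown. The column-specification message suggests it was read with readr's `read_csv()`; the sketch below uses base R's `read.csv()` as a dependency-free stand-in. The file path is an assumption (the source does not name the file), so the first six rows printed above are written to a temporary file to keep the example runnable on its own.

```r
# First six rows as printed in the output above, written to a temporary file
# so the sketch runs by itself. In practice, point `csv_path` at the CSV
# exported from NYC Open Data (its file name is not given in the source).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "School DBN,# of students over five active on register,# of students w/ at least one dose,% w/ at least one dose,# of students fully vaccinated,% fully vaccinated",
  "01M015,169,62,0.367,53,0.314",
  "01M019,168,79,0.47,65,0.387",
  "01M020,307,159,0.518,123,0.401",
  "01M034,244,99,0.406,71,0.291",
  "01M063,161,76,0.472,58,0.36",
  "01M064,185,77,0.416,66,0.357"
), csv_path)

# check.names = FALSE keeps the original headers (spaces, '#', '%') intact,
# so columns can be addressed exactly as in the rest of this report.
nyc_file <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

# Show the first rows, as in the table above.
head(nyc_file)
```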

Introduction

I decided to work with a public data set from the City of New York. The data describe students who received the COVID-19 vaccine in 2022.

Cases

There are 1,584 cases in this data set, each identified by a School DBN, and six columns.

Data Collection

For this data set, the data appear to come from several institutions, such as the DOE and the health department, which together supply what is needed to count the students vaccinated against COVID-19 in 2022. The columns report the number of students over five active on the register, the number and percentage with at least one dose, and the number and percentage fully vaccinated, which gives me rich data to work with.

Type of Study

I think this is an observational study: the data were collected by observing and recording vaccinations that had already taken place, and no treatment was assigned to students by a researcher. It is not an experiment, even though the vaccines were rolled out under emergency conditions.

Data Source

I will obtain the data from the NYC Open Data website: https://data.cityofnewyork.us/Education/Student-COVID-Vaccinations-3-24-2022-/q5xz-reje/about_data

Dependent Variables

The dependent variable in an observational study is the outcome or response that is being measured or observed. For this data set, the dependent variable is the number of students fully vaccinated, which gives an idea of the overall percentage of students fully vaccinated.
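
As a rough illustration of the overall percentage, the sketch below divides the total number of fully vaccinated students by the total number on the register, using only the first six schools printed earlier. The short column names (`on_register`, `fully_vaccinated`) are hypothetical stand-ins for the original headers. It also computes the plain average of the per-school percentages, which can differ because it gives small and large schools equal weight.

```r
# First six schools as printed in the output above; the short column names
# are hypothetical stand-ins for the original headers.
sample_schools <- data.frame(
  dbn              = c("01M015", "01M019", "01M020", "01M034", "01M063", "01M064"),
  on_register      = c(169, 168, 307, 244, 161, 185),
  fully_vaccinated = c(53, 65, 123, 71, 58, 66)
)

# Overall share: total fully vaccinated over total on register.
# This weights each school by its enrollment.
overall_pct <- sum(sample_schools$fully_vaccinated) / sum(sample_schools$on_register)

# Unweighted mean of the per-school shares, for comparison.
mean_school_pct <- mean(sample_schools$fully_vaccinated / sample_schools$on_register)

round(c(overall = overall_pct, mean_of_schools = mean_school_pct), 3)
```

On the full data set, the same two lines would apply to the original columns once any non-school rows have been set aside.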

Independent Variables

For this data set, I believe the independent variables are the number of students over five active on the register and the number of students with at least one dose.
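
One way to check whether a proposed independent variable is related to the dependent variable is a correlation and a simple linear fit. The sketch below does this for the six schools printed earlier, with hypothetical short column names. Six rows are far too few for any conclusion, so this only shows the mechanics. Both counts also grow with school size, so repeating the check on the percentage columns may be more informative.

```r
# Six schools from the printed output; column names are hypothetical.
sample_schools <- data.frame(
  one_dose         = c(62, 79, 159, 99, 76, 77),
  fully_vaccinated = c(53, 65, 123, 71, 58, 66)
)

# Pearson correlation between the predictor and the outcome.
r <- cor(sample_schools$one_dose, sample_schools$fully_vaccinated)

# Simple linear regression: fully vaccinated as a function of one-dose count.
fit <- lm(fully_vaccinated ~ one_dose, data = sample_schools)
slope <- unname(coef(fit)["one_dose"])

r
slope
```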

Relevant Summary Statistics

# Review Structure of the data set
str(nyc_file)
## spc_tbl_ [1,584 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ School DBN                                : chr [1:1584] "01M015" "01M019" "01M020" "01M034" ...
##  $ # of students over five active on register: num [1:1584] 169 168 307 244 161 185 296 165 325 283 ...
##  $ # of students w/ at least one dose        : num [1:1584] 62 79 159 99 76 77 164 58 158 101 ...
##  $ % w/ at least one dose                    : num [1:1584] 0.367 0.47 0.518 0.406 0.472 0.416 0.554 0.352 0.486 0.357 ...
##  $ # of students fully vaccinated            : num [1:1584] 53 65 123 71 58 66 153 41 126 75 ...
##  $ % fully vaccinated                        : num [1:1584] 0.314 0.387 0.401 0.291 0.36 0.357 0.517 0.248 0.388 0.265 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `School DBN` = col_character(),
##   ..   `# of students over five active on register` = col_double(),
##   ..   `# of students w/ at least one dose` = col_double(),
##   ..   `% w/ at least one dose` = col_double(),
##   ..   `# of students fully vaccinated` = col_double(),
##   ..   `% fully vaccinated` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# Get summary statistics of columns.
summary(nyc_file$"# of students w/ at least one dose")
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     12.0    128.0    215.0    625.9    351.2 495740.0
summary(nyc_file$"% fully vaccinated")
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1050  0.3040  0.4660  0.4726  0.6220  0.9320
summary(nyc_file$"# of students fully vaccinated")
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##      9.0     99.8    182.0    548.8    305.5 434618.0
summary(nyc_file$"% w/ at least one dose")
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1280  0.3980  0.5575  0.5558  0.7020  0.9460
boxplot(nyc_file$"% w/ at least one dose", main = "One Dose")

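# Scatterplot of the two percentages; both are continuous, so a grouped boxplot does not apply.
plot(nyc_file$"% fully vaccinated", nyc_file$"% w/ at least one dose", xlab = "% fully vaccinated", ylab = "% w/ at least one dose")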
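
In the summaries above, the maximum number of students with at least one dose (495,740) is far above the third quartile (351.2), and the mean (625.9) times 1,584 rows is roughly twice that maximum. This pattern suggests, but does not confirm, that one row holds a citywide total of all the others. The sketch below shows one way to test for such a row and set it aside before answering the research question. The sample uses the six schools printed earlier plus a total row whose label is hypothetical.

```r
# Six schools from the printed output, with hypothetical short column names.
schools <- data.frame(
  dbn      = c("01M015", "01M019", "01M020", "01M034", "01M063", "01M064"),
  one_dose = c(62, 79, 159, 99, 76, 77),
  stringsAsFactors = FALSE
)

# Hypothetical total row appended to mimic the suspected structure;
# the real label of such a row is not known from the source.
total_row  <- data.frame(dbn = "TOTAL (hypothetical)", one_dose = sum(schools$one_dose),
                         stringsAsFactors = FALSE)
nyc_sample <- rbind(schools, total_row)

# A row is treated as a total if its count equals the sum of all other rows.
i_max        <- which.max(nyc_sample$one_dose)
rest_sum     <- sum(nyc_sample$one_dose[-i_max])
is_total_row <- isTRUE(all.equal(nyc_sample$one_dose[i_max], rest_sum))

# Keep only per-school rows so totals and summaries are not double counted.
per_school <- if (is_total_row) nyc_sample[-i_max, ] else nyc_sample

# Students with at least one dose across the per-school rows.
students_one_dose <- sum(per_school$one_dose)
students_one_dose
summary(per_school$one_dose)
```

If the check holds on the full data set, the per-school summaries (mean, quartiles, boxplots) would be better computed on `per_school`, and the total row itself would answer the research question directly.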