Data-Exploration-Class 11-20-2025

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

1. Load Data:

  • First, I load my libraries and check the output for errors.

  • The dplyr library is used for data manipulation (selecting, filtering, transforming, and summarising data). Paired with the dbplyr backend, dplyr can also work with remote on-disk data stored in databases, which is useful when data does not all fit into memory simultaneously and an external storage engine is needed.

  • The ggplot2 library is loaded because it is a package for producing visualizations of our data. Unlike most other graphics packages, ggplot2 uses a conceptual framework based on the grammar of graphics. This lets me “speak” a graph from composable elements instead of being limited to a predefined set of charts.
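As a rough illustration of the dplyr verbs mentioned above, here is a minimal sketch on a small made-up data frame. The column names (`Diabetes_012`, `HighBP`, `BMI`) are assumptions modeled on the BRFSS file and the values are invented; working against a database table would additionally require the dbplyr backend, which this sketch does not use.

```r
library(dplyr)

# Hypothetical toy data standing in for the BRFSS file; the column names
# and values are assumptions for illustration only.
toy <- data.frame(
  Diabetes_012 = c(0, 2, 1, 2, 0, 0, 0),
  HighBP       = c(0, 1, 1, 1, 0, 0, 1),
  BMI          = c(22, 31, 28, 35, 24, 26, NA)
)

bmi_by_bp <- toy |>
  filter(!is.na(BMI)) |>                   # drop rows with a missing BMI
  mutate(diabetes = Diabetes_012 == 2) |>  # derive a logical outcome (assumed coding)
  group_by(HighBP) |>                      # one group per HighBP value
  summarise(
    n            = n(),                    # rows in each group
    mean_bmi     = mean(BMI),              # average BMI per group
    pct_diabetes = mean(diabetes) * 100,   # share flagged as diabetes, in percent
    .groups      = "drop"
  )

print(bmi_by_bp)
```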

1 + 1
[1] 2
#| echo: false

# Libraries loaded below: dplyr and ggplot2
#
# 1) dplyr: data manipulation (dbplyr extends it to data that does not fit in memory)
library(dplyr)
Warning: package 'dplyr' was built under R version 4.5.2

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
# 2) ggplot2: visualization of the data, using a conceptual framework
#    based on the grammar of graphics (layers, aesthetic mappings, scales)
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.5.2
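To show what “composable elements” means in practice, the sketch below builds one plot from separate pieces: data, an aesthetic mapping, two geometry layers, a colour scale, and labels. The toy data and column names are hypothetical, not taken from the BRFSS file.

```r
library(ggplot2)

# Hypothetical toy data; column names and values are assumptions.
toy <- data.frame(
  BMI    = c(22, 31, 28, 35, 24, 26),
  Age    = c(3, 9, 7, 11, 4, 5),
  HighBP = factor(c(0, 1, 1, 1, 0, 0), labels = c("No", "Yes"))
)

p <- ggplot(toy, aes(x = Age, y = BMI, colour = HighBP)) +           # data + aesthetic mapping
  geom_point(size = 2) +                                             # layer 1: points
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +          # layer 2: linear fit per colour group
  scale_colour_manual(values = c(No = "grey40", Yes = "firebrick")) + # colour scale
  labs(x = "Age category", y = "BMI", colour = "High BP")            # axis and legend labels

print(p)
```

Swapping `geom_point()` for another geom, or adding `facet_wrap(~ HighBP)`, changes the chart without starting over, which is the practical benefit of the grammar-of-graphics approach.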
  1. Load Data
    1. I used the diabetes_012_health_indicators_BRFSS2015 dataset.
    2. I named the resulting data frame T2DM_Inx_All.
T2DM_Inx_All <- read.csv("C:/Users/diana/OneDrive/Computer-1-All Documents-6-14-25/Loyola - Copy/2025/HIDS-411/diabetes_012_health_indicators_BRFSS2015 (1).csv")
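Because the `read.csv()` call points at a path on one particular machine, the sketch below runs its checks on a tiny stand-in data frame instead. `check_data()` is a hypothetical helper, not part of any package; it reports the dimensions, column classes, and missing-value counts that are worth confirming right after loading.

```r
# Hypothetical helper: quick structural checks on a freshly loaded data frame.
check_data <- function(df) {
  list(
    n_rows  = nrow(df),
    n_cols  = ncol(df),
    classes = vapply(df, function(col) class(col)[1], character(1)),  # first class of each column
    n_na    = colSums(is.na(df))                                      # missing values per column
  )
}

# Toy stand-in for T2DM_Inx_All (column names are assumptions).
toy <- data.frame(
  Diabetes_012 = c(0, 2, 1, NA),
  BMI          = c(22, 31, NA, 27)
)

res <- check_data(toy)
str(res)

# With the real file loaded, the same call would be:
# check_data(T2DM_Inx_All)
```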
  1. Proposed Research Question

The question should come from the data. That means you have to look at the data and see what it contains.
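One way to “look at the data” before settling on a question is to tabulate the outcome and cross it with a candidate predictor. This sketch uses base R's `table()` and `prop.table()` on made-up rows; the coding of `Diabetes_012` (0 = no diabetes, 1 = prediabetes, 2 = diabetes) and of `HighBP` (0/1) is an assumption to verify against the dataset's documentation.

```r
# Toy stand-in; the coding of Diabetes_012 and HighBP is assumed, not verified.
toy <- data.frame(
  Diabetes_012 = c(0, 0, 0, 1, 2, 2, 0, 2),
  HighBP       = c(0, 0, 1, 1, 1, 1, 0, 0)
)

# How is the outcome distributed overall?
outcome_counts <- table(toy$Diabetes_012)
outcome_props  <- prop.table(outcome_counts)

# Does the outcome look different across a candidate predictor?
by_bp     <- table(HighBP = toy$HighBP, Diabetes = toy$Diabetes_012)
row_props <- prop.table(by_bp, margin = 1)  # proportions within each HighBP row

print(outcome_props)
print(round(row_props, 2))
```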

The echo: false option disables the printing of code (only output is displayed).

  1. Variables
    1. Identifying our variables of interest (predictor and outcome variables)
  1. Power Analysis
    1. Do a power analysis to determine if you have sufficient data to answer your question.
  1. Answer to Research Question
    1. With the existing data, answer your question.
  1. Limitations of My Data

    1. Write the limitations of your data (e.g., there wasn’t enough power, but this was all the data available).
  2. How I Would Collect More Data

    1. Specify the kind of study, the collection methods, and any adaptations you would make so that the sample generalizes.
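For the power analysis step, one simple option when comparing a binary outcome (for example, diabetes yes/no) between two groups is base R's `power.prop.test()`. The proportions below are placeholders, not estimates from the BRFSS data, and a model with several predictors (such as a logistic regression) may call for a different approach; treat this as a sketch.

```r
# Placeholder proportions, NOT estimates from the BRFSS data:
# assumed diabetes prevalence of 10% in one group vs 15% in the other.
p_group1 <- 0.10
p_group2 <- 0.15

# Required sample size per group for 80% power at alpha = 0.05 (two-sided).
needed <- power.prop.test(p1 = p_group1, p2 = p_group2,
                          sig.level = 0.05, power = 0.80)
ceiling(needed$n)

# Achieved power if each group had a given size (hypothetical n = 500).
achieved <- power.prop.test(n = 500, p1 = p_group1, p2 = p_group2,
                            sig.level = 0.05)
achieved$power
```

`power.prop.test()` solves for whichever one of `n`, `p1`, `p2`, and `power` is left unspecified, so the same function answers both “how much data do I need?” and “how much power do I have with the data available?”.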