1. From the data you have chosen, select a variable that you are interested in.
2. Use pastecs::stat.desc to describe the variable. Include a few sentences about what the variable is and what it’s measuring.
3. Remove NAs if needed using dplyr::filter (or anything similar).
4. Provide a histogram of the variable (as shown in this lesson).
5. Transform the variable using the log transformation or square root transformation (whichever is more appropriate) using dplyr::mutate or something similar.
6. Provide a histogram of the transformed variable.
7. Submit via RPubs on Canvas.
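# Load packages, then the data (skip adjusted because the header rows are offset)
library(readxl)  # read_excel()
library(dplyr)   # %>% and mutate()
raw_data <- read_excel("2025-County-Health-Rankings-Texas-Data-v3.xlsx", sheet = "Select Measure Data", skip = 2)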
## New names:
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
## • `` -> `...17`
## • `` -> `...18`
## • `` -> `...19`
## • `` -> `...20`
## • `` -> `...21`
## • `` -> `...22`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## • `` -> `...36`
## • `` -> `...37`
## • `` -> `...41`
## • `` -> `...42`
## • `` -> `...46`
## • `` -> `...47`
## • `` -> `...48`
## • `` -> `...49`
## • `` -> `...50`
## • `` -> `...51`
## • `` -> `...52`
## • `` -> `...53`
## • `` -> `...54`
## • `` -> `...55`
## • `` -> `...56`
## • `` -> `...57`
## • `` -> `...58`
## • `` -> `...59`
## • `` -> `...60`
## • `` -> `...61`
## • `` -> `...62`
## • `` -> `...63`
## • `` -> `...64`
## • `` -> `...65`
## • `` -> `...66`
## • `` -> `...67`
## • `` -> `...71`
## • `` -> `...75`
## • `` -> `...77`
## • `` -> `...78`
## • `` -> `...79`
## • `` -> `...80`
## • `` -> `...81`
## • `` -> `...82`
## • `` -> `...84`
## • `` -> `...86`
## • `` -> `...90`
## • `` -> `...94`
## • `` -> `...98`
## • `` -> `...100`
## • `` -> `...101`
## • `` -> `...102`
## • `` -> `...103`
## • `` -> `...104`
## • `` -> `...105`
## • `` -> `...107`
## • `` -> `...108`
## • `` -> `...109`
## • `` -> `...110`
## • `` -> `...111`
## • `` -> `...112`
## • `` -> `...117`
## • `` -> `...130`
## • `` -> `...134`
## • `` -> `...135`
## • `` -> `...136`
## • `` -> `...137`
## • `` -> `...138`
## • `` -> `...139`
## • `` -> `...140`
## • `` -> `...141`
## • `` -> `...142`
## • `` -> `...143`
## • `` -> `...144`
## • `` -> `...145`
## • `` -> `...146`
## • `` -> `...147`
## • `` -> `...148`
## • `` -> `...149`
## • `` -> `...154`
## • `` -> `...156`
## • `` -> `...157`
## • `` -> `...158`
## • `` -> `...163`
## • `` -> `...165`
## • `` -> `...171`
## • `` -> `...177`
## • `` -> `...181`
## • `` -> `...185`
## • `` -> `...189`
## • `` -> `...190`
## • `` -> `...191`
## • `` -> `...192`
## • `` -> `...193`
## • `` -> `...194`
## • `` -> `...199`
## • `` -> `...200`
## • `` -> `...201`
## • `` -> `...202`
## • `` -> `...203`
## • `` -> `...204`
## • `` -> `...205`
## • `` -> `...206`
## • `` -> `...207`
## • `` -> `...208`
## • `` -> `...209`
## • `` -> `...210`
## • `` -> `...211`
## • `` -> `...212`
## • `` -> `...213`
## • `` -> `...214`
## • `` -> `...215`
## • `` -> `...216`
## • `` -> `...217`
## • `` -> `...218`
## • `` -> `...219`
## • `` -> `...220`
## • `` -> `...223`
## • `` -> `...225`

QUESTION 1 - Percentage of Annual Mammogram Screening Among Texas Female Medicare Enrollees 65-74

This analysis focuses on older adults (65+) and highlights preventive care disparities across Texas counties. The variable, “Mammo”, is the percentage of female Medicare enrollees aged 65-74 who receive an annual mammogram screening. It measures access to, and utilization of, preventive breast cancer screening among older women on Medicare.

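# Keep the county name (column 3) and the mammogram screening measure (column 106)
older_data <- raw_data[, c(3, 106)]

colnames(older_data) <- c("County", "Mammo")

# Convert the screening measure to numeric; any non-numeric entries become NA
older_data$Mammo <- as.numeric(older_data$Mammo)

# Drop rows with a missing county name or a missing screening value
older_data <- na.omit(older_data)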

QUESTION 2 - pastecs

pastecs::stat.desc(older_data$Mammo) # % with Annual Mammogram
##      nbr.val     nbr.null       nbr.na          min          max        range 
##  250.0000000    0.0000000    0.0000000   14.0000000   55.0000000   41.0000000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
## 8637.0000000   35.5000000   34.5480000    0.5045961    0.9938207   63.6543133 
##      std.dev     coef.var 
##    7.9783653    0.2309357

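The summary above indicates moderate variation in screening rates: values range from 14% to 55%, with a mean of 34.5%, a standard deviation of about 8.0, and a coefficient of variation of 0.23. The mean sits slightly below the median (35.5%), which is consistent with a slight left skew and suggests that a few Texas counties have low utilization of annual mammogram screening.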
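As a cross-check on how to read the stat.desc output, the derived statistics can be recomputed by hand from the reported count, sum, and standard deviation. This is a minimal sketch; the three input values are copied from the output above, and the formulas are the standard ones (mean = sum / n, SE = sd / sqrt(n), CI half-width from the t distribution).

```r
# Values copied from the stat.desc() output above
n       <- 250
total   <- 8637
std_dev <- 7.9783653

mean_val <- total / n                          # mean = sum / n
se_mean  <- std_dev / sqrt(n)                  # standard error of the mean
ci_half  <- qt(0.975, df = n - 1) * se_mean    # half-width of the 95% CI
coef_var <- std_dev / mean_val                 # coefficient of variation

round(c(mean = mean_val, SE.mean = se_mean,
        CI.mean.0.95 = ci_half, coef.var = coef_var), 4)
```

Read this way, CI.mean.0.95 is a half-width, so the 95% confidence interval for the mean screening rate runs from roughly 33.6% to 35.5%.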

QUESTION 3 - Remove NAs

# Coerce Mammo to numeric so that any non-numeric (bad) values become NA

older_data$Mammo <- as.numeric(as.character(older_data$Mammo))

# Drop rows where Mammo is NA

older_data <- older_data[complete.cases(older_data$Mammo), ]

nrow(older_data)  # 250 complete rows remain (one county name was missing and four counties had no data on % with annual mammogram screening)
## [1] 250
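To illustrate why the coercion and NA-removal steps drop rows, here is a small sketch on made-up data. The `toy` data frame and its values are hypothetical and are not taken from the County Health Rankings file.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  County = c("County A", "County B", NA, "County D", "County E"),
  Mammo  = c("35", "41", "38", NA, "x"),
  stringsAsFactors = FALSE
)

# as.numeric() turns entries that are not numbers into NA (with a warning)
toy$Mammo <- suppressWarnings(as.numeric(toy$Mammo))

colSums(is.na(toy))        # number of NAs per column before filtering
toy_clean <- na.omit(toy)  # drops rows with an NA in any column
nrow(toy_clean)
```

With dplyr loaded, `filter(toy, !is.na(County), !is.na(Mammo))` would keep the same rows.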

QUESTION 4 - Histogram

hist(older_data$Mammo)

QUESTION 5 - Transform the Variable

# Square-root transform of the screening percentage
older_data <- older_data %>% mutate(Mammo_transformed = sqrt(Mammo))
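One possible way to judge which transformation is more appropriate is to compare sample skewness before and after transforming. The sketch below uses a hand-written moment-based `skewness` helper and a short hypothetical vector `x`; neither comes from the original analysis. Square root and log both compress large values more than small ones, so they tend to reduce right skew and can accentuate left skew.

```r
# Moment-based sample skewness (third standardized moment); helper is illustrative
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical values with a longer lower tail, for illustration only
x <- c(14, 24, 29, 32, 34, 35, 36, 37, 38, 40, 42, 45)

# Skewness of the raw, square-root, and log-transformed values
skews <- c(raw  = skewness(x),
           sqrt = skewness(sqrt(x)),
           log  = skewness(log(x)))
round(skews, 2)
```

Running the same comparison on `older_data$Mammo` would show whether the square root moves the distribution closer to symmetry.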

QUESTION 6 - Histogram of the Transformed Variable

hist(older_data$Mammo_transformed)