library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(pastecs)
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
# Load the combined state-level dataset and pull out the variable of interest
research_data <- read.csv("state_data_combo.csv")
selected_variable <- research_data$Library.Visits.Per.Capita
  1. From the data you have chosen, select a variable that you are interested in

I am interested in one of my selected dependent variables, “Library.Visits.Per.Capita”.

  1. Use pastecs::stat.desc to describe the variable. Include a few sentences about what the variable is and what it’s measuring.
stat.desc(selected_variable, basic=TRUE, desc=TRUE)
##      nbr.val     nbr.null       nbr.na          min          max        range 
##   51.0000000    0.0000000    0.0000000    0.6000000    3.8200000    3.2200000 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##  118.8400000    2.2900000    2.3301961    0.1015756    0.2040206    0.5261980 
##      std.dev     coef.var 
##    0.7253950    0.3113021

This variable represents the average number of trips per year that a resident of a state makes to that state’s libraries. Using a per capita statistic helps to decrease the advantage that higher-population states have in raw visit counts. The higher a state’s annual visits per capita, the more heavily used its libraries can be said to be. Almost every state, with the exception of Hawaii (0.6 annual library visits per capita), has a per capita visit count higher than 1, meaning that total library visits in the state exceed its population. Because this is an average, it does not show that every resident went at least once. Across the USA’s fifty states and the District of Columbia, the mean is ~2.33 library visits per capita per year, and the median is 2.29. The standard deviation of annual per capita visits is ~0.73.
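As a quick sanity check, the derived figures in the stat.desc output can be reproduced by hand from the reported count, mean, and standard deviation. The sketch below uses only the numbers printed above; the variable names are illustrative.

```r
# Values copied from the stat.desc output above
n        <- 51
mean_val <- 2.3301961
sd_val   <- 0.7253950

# Standard error of the mean: sd / sqrt(n)
se_mean <- sd_val / sqrt(n)

# Half-width of the 95% confidence interval: t quantile (n - 1 df) times SE
ci_half <- qt(0.975, df = n - 1) * se_mean

# Coefficient of variation: sd relative to the mean
coef_var <- sd_val / mean_val

print(c(SE.mean = se_mean, CI.mean.0.95 = ci_half, coef.var = coef_var))
```

These should agree with the SE.mean, CI.mean.0.95, and coef.var entries to within rounding.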

  1. Remove NA’s if needed using dplyr:filter (or anything similar)

There were no NA values in this variable (nbr.na is 0 in the stat.desc output above), so no rows needed to be removed.
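Had there been missing values, a minimal sketch of the check and the removal could look like the following. The data frame here is a made-up stand-in for research_data (the states and values are invented for illustration); only the column name comes from the dataset.

```r
library(dplyr)

# Toy stand-in for research_data; these rows are invented for illustration
toy_data <- data.frame(
  State = c("A", "B", "C", "D"),
  Library.Visits.Per.Capita = c(2.1, NA, 3.4, 0.9)
)

# Count missing values in the column of interest
n_missing <- sum(is.na(toy_data$Library.Visits.Per.Capita))

# Keep only the rows where the variable is present
toy_clean <- toy_data %>%
  filter(!is.na(Library.Visits.Per.Capita))

print(n_missing)
print(toy_clean)
```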

  1. Provide a histogram of the variable (as shown in this lesson)
hist(selected_variable)

shapiro.test(selected_variable) 
## 
##  Shapiro-Wilk normality test
## 
## data:  selected_variable
## W = 0.98072, p-value = 0.5699
  1. Transform the variable using the log transformation or square root transformation (whatever is more appropriate) using dplyr::mutate or something similar
selected_variable_transformed<-log(selected_variable)
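The same transformation can also be done inside the data frame with dplyr::mutate, as the prompt suggests. This is a sketch on a toy data frame: 0.6 and 3.8 echo the observed minimum and (rounded) maximum, the other values are made up, and the new column names visits_log and visits_sqrt are hypothetical.

```r
library(dplyr)

# Toy stand-in for research_data; only the column name comes from the dataset
toy_data <- data.frame(
  Library.Visits.Per.Capita = c(0.6, 1.8, 2.3, 2.9, 3.8)
)

# Add log- and square-root-transformed versions as new columns
toy_transformed <- toy_data %>%
  mutate(
    visits_log  = log(Library.Visits.Per.Capita),
    visits_sqrt = sqrt(Library.Visits.Per.Capita)
  )

print(toy_transformed)
```

Note that the natural log of any value below 1 is negative, so a state such as Hawaii lands below zero on the log scale.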
  1. provide a histogram of the transformed variable
hist(selected_variable_transformed)

shapiro.test(selected_variable_transformed) 
## 
##  Shapiro-Wilk normality test
## 
## data:  selected_variable_transformed
## W = 0.93265, p-value = 0.006297

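The logarithmic transformation made the distribution less normal, not more. The untransformed variable is consistent with a normal distribution (Shapiro-Wilk W = 0.98072, p-value = 0.5699), while the log-transformed variable departs from normality significantly at the 0.05 level (W = 0.93265, p-value = 0.006297). Since the original data were already roughly symmetric, no transformation was needed here; a log transformation compresses large values and stretches small ones, which likely pulled the low-visit states into a left tail.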
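To illustrate why a log transformation can hurt data that are already close to normal, the sketch below compares the raw, log, and square root versions of a synthetic sample. The sample is not the real data: it is a set of evenly spaced normal quantiles whose size, mean, and standard deviation roughly match the stat.desc output, and the skewness helper is a simple moment-based function written for this example.

```r
# Synthetic, roughly normal sample (NOT the real data): evenly spaced normal
# quantiles with n, mean, and sd chosen to resemble the stat.desc output
sim_visits <- qnorm(ppoints(51), mean = 2.33, sd = 0.73)

# Simple moment-based skewness (illustrative helper, not from a package)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

candidates <- list(
  raw  = sim_visits,
  log  = log(sim_visits),
  sqrt = sqrt(sim_visits)
)

# Shapiro-Wilk W, p-value, and skewness for each version
comparison <- data.frame(
  transform = names(candidates),
  W         = sapply(candidates, function(x) unname(shapiro.test(x)$statistic)),
  p_value   = sapply(candidates, function(x) shapiro.test(x)$p.value),
  skewness  = sapply(candidates, skewness)
)

print(comparison)
```

On a symmetric sample like this one, the log version should show a lower W and negative skewness than the raw version, which is the same pattern as the result above.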