library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(pastecs)
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## The following object is masked from 'package:tidyr':
##
## extract
research_data<-read.csv("state_data_combo.csv")
selected_variable<-research_data$Library.Visits.Per.Capita
I am interested in one of my selected dependent variables, “Library.Visits.Per.Capita”.
stat.desc(selected_variable, basic=TRUE, desc=TRUE)
## nbr.val nbr.null nbr.na min max range
## 51.0000000 0.0000000 0.0000000 0.6000000 3.8200000 3.2200000
## sum median mean SE.mean CI.mean.0.95 var
## 118.8400000 2.2900000 2.3301961 0.1015756 0.2040206 0.5261980
## std.dev coef.var
## 0.7253950 0.3113021
This variable represents the average number of annual trips a resident of a state makes to that state’s libraries. Using a per capita statistic helps to decrease the advantage that higher-population states have in simple counting stats. The higher a state’s annual trips per capita, the more popular its libraries can generally be said to be. Nearly every state has a per capita visit count above 1, meaning its libraries recorded at least as many visits as the state has residents, although a mean above 1 does not show that every resident actually visited; the exception is Hawaii, with 0.6 annual visits per capita. Across the USA’s fifty states and the District of Columbia, the mean of the state-level values is ~2.33 library visits per capita per year. Because this is an unweighted mean of state values, it is not the same as the national per capita figure. The median state’s visits per capita is 2.29. The standard deviation of annual per capita visits is ~0.73.
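The derived columns in the `stat.desc()` output can be cross-checked by hand. The sketch below recomputes them from the rounded values printed above rather than from the raw data, and it assumes that `CI.mean.0.95` is the half-width of a t-based 95% confidence interval, which is consistent with the reported numbers.

```r
# Values copied from the stat.desc() output above (rounded as printed).
n       <- 51         # nbr.val
total   <- 118.84     # sum
std_dev <- 0.7253950  # std.dev

mean_val <- total / n                         # mean
se_mean  <- std_dev / sqrt(n)                 # SE.mean
ci_half  <- qt(0.975, df = n - 1) * se_mean   # assumed definition of CI.mean.0.95
coef_var <- std_dev / mean_val                # coef.var

# Interval implied by the half-width (an interpretation, not pastecs output).
ci_range <- c(lower = mean_val - ci_half, upper = mean_val + ci_half)

print(round(c(mean = mean_val, SE.mean = se_mean,
              CI.mean.0.95 = ci_half, coef.var = coef_var), 4))
print(round(ci_range, 2))
```

Under that reading, the 95% confidence interval for the mean of the state-level values runs from roughly 2.13 to 2.53 visits per capita.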
There were no NA values for this variable (nbr.na = 0 in the output above).
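The `nbr.na` count covers only the one variable passed to `stat.desc()`. A check across every column of a data frame could look like the following sketch, which uses a hypothetical toy data frame (`toy`, with made-up values) in place of `research_data`.

```r
# Hypothetical toy data frame standing in for research_data.
toy <- data.frame(
  state  = c("A", "B", "C", "D"),
  visits = c(2.1, NA, 3.4, 0.6),
  budget = c(10, 12, NA, NA)
)

na_per_column <- colSums(is.na(toy))       # missing values in each column
any_missing   <- anyNA(toy)                # TRUE if any cell is NA
complete_rows <- sum(complete.cases(toy))  # rows with no NA at all

print(na_per_column)
print(any_missing)
print(complete_rows)
```

Running the same three calls on `research_data` would show whether the other variables are also free of missing values.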
hist(selected_variable)
shapiro.test(selected_variable)
##
## Shapiro-Wilk normality test
##
## data: selected_variable
## W = 0.98072, p-value = 0.5699
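The test result can also be read programmatically and paired with a Q-Q plot as a visual check. The following is a minimal sketch on simulated data: `sim_visits` is a hypothetical stand-in, not the state data, and the 0.05 significance level is an assumed convention.

```r
# Simulated stand-in with roughly the mean and standard deviation reported above.
set.seed(42)
sim_visits <- rnorm(51, mean = 2.33, sd = 0.73)

sw    <- shapiro.test(sim_visits)
alpha <- 0.05  # assumed significance level

# The null hypothesis is that the data come from a normal distribution,
# so a small p-value is evidence against normality.
reject_normality <- sw$p.value < alpha
cat("W =", round(unname(sw$statistic), 4), " p =", round(sw$p.value, 4), "\n")
cat("Reject normality at alpha =", alpha, ":", reject_normality, "\n")

# Points close to the reference line suggest an approximately normal shape.
qqnorm(sim_visits)
qqline(sim_visits)
```

For the real variable, the reported p-value of 0.5699 is well above 0.05, so the test gives no evidence against normality.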
selected_variable_transformed<-log(selected_variable)
hist(selected_variable_transformed)
shapiro.test(selected_variable_transformed)
##
## Shapiro-Wilk normality test
##
## data: selected_variable_transformed
## W = 0.93265, p-value = 0.006297
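The logarithmic transformation did not help: it made the distribution substantially less normal than the untransformed data. The Shapiro-Wilk test does not reject normality for the original variable (W = 0.98072, p = 0.5699), but it does reject normality at the 0.05 level for the log-transformed variable (W = 0.93265, p = 0.006297). The untransformed variable is therefore the better choice for analyses that assume normality.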
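One way to see why a log transformation can hurt an already symmetric variable is to apply it to an idealized sample. The sketch below is an illustration under assumptions, not the state data: it builds 51 evenly spaced normal quantiles with roughly the mean and standard deviation reported above, and it uses a hypothetical helper, `skewness_simple()` (moment-based sample skewness), to compare the shape before and after `log()`.

```r
# Hypothetical helper: moment-based sample skewness (0 for a symmetric sample).
skewness_simple <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Idealized, perfectly symmetric sample of 51 normal quantiles (not the real data).
ideal     <- qnorm(ppoints(51), mean = 2.33, sd = 0.73)
ideal_log <- log(ideal)  # valid because every value in `ideal` is positive

skew_raw <- skewness_simple(ideal)
skew_log <- skewness_simple(ideal_log)

cat("Skewness before log:", round(skew_raw, 3), "\n")
cat("Skewness after log: ", round(skew_log, 3), "\n")
```

Because `log()` compresses large values and stretches small ones, a symmetric sample comes out with a longer left tail (negative skewness). That is consistent with the lower W statistic reported for the transformed variable.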