HOMEWORK

  1. From the data you have chosen, select a variable that you are interested in
  2. Use pastecs::stat.desc to describe the variable. Include a few sentences about what the variable is and what it’s measuring. Remember to load pastecs with library(pastecs)
  3. Remove NAs if needed using dplyr::filter (or anything similar)
  4. Provide a histogram of the variable (as shown in this lesson)
  5. Transform the variable using the log transformation or square root transformation (whichever is more appropriate) using dplyr::mutate or something similar
  6. Provide a histogram of the transformed variable
  7. Submit via RPubs on Canvas
setwd("~/Desktop/My Class Stuff/Project Data")
library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stargazer)
## 
## Please cite as: 
## 
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(pastecs)
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
#Data
project_data <- read_excel("texas federal funds.xlsx")
#Q1 & Q2
pastecs::stat.desc(project_data$`COOPERATIVE FISHERY STATISTICS`)
##      nbr.val     nbr.null       nbr.na          min          max        range 
## 1.500000e+01 0.000000e+00 0.000000e+00 5.941700e+04 9.679200e+04 3.737500e+04 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
## 1.109252e+06 7.130000e+04 7.395013e+04 2.102809e+03 4.510077e+03 6.632708e+07 
##      std.dev     coef.var 
## 8.144144e+03 1.101302e-01
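Several of the statistics printed above are derived from one another, which gives a quick way to sanity-check the output. The sketch below recomputes SE.mean, CI.mean.0.95, and coef.var in base R from the rounded nbr.val, mean, and std.dev shown above. The variable names are illustrative, and small differences from rounding are expected.

```r
# Values copied (rounded) from the stat.desc() output above
n       <- 15          # nbr.val
avg     <- 73950.13    # mean
std_dev <- 8144.144    # std.dev

se_mean  <- std_dev / sqrt(n)                # standard error of the mean
ci_half  <- qt(0.975, df = n - 1) * se_mean  # half-width of the 95% CI
coef_var <- std_dev / avg                    # coefficient of variation

print(c(SE.mean = se_mean, CI.mean.0.95 = ci_half, coef.var = coef_var))
```

Read this way, the 95% confidence interval for the mean is mean ± CI.mean.0.95, and a coef.var of about 0.11 means the standard deviation is roughly 11% of the mean.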
#Q3
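# filter() keeps only rows where the condition is TRUE, so NA rows
# (and any non-positive values, which log() cannot handle) are dropped
project_data <- project_data %>% filter(`COOPERATIVE FISHERY STATISTICS` > 0)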
head(project_data$`COOPERATIVE FISHERY STATISTICS`)
## [1] 96792 71300 71300 71300 59417 83183
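Because filter() keeps only rows where the condition evaluates to TRUE, a `> 0` test drops NA rows but also drops zeros and negative values. A small base R sketch on a made-up vector (not the project data) shows how that differs from an explicit is.na() check:

```r
# Hypothetical values for illustration only
x <- c(5, NA, 0, 12)

# which() keeps only positions where the test is TRUE, like dplyr::filter()
positive_only <- x[which(x > 0)]   # drops the NA and the zero
non_missing   <- x[!is.na(x)]      # drops only the NA

print(positive_only)
print(non_missing)
```

For this variable both approaches keep the same rows, since stat.desc reported nbr.na of 0 and a minimum of 59417.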
#Q4
hist(project_data$`COOPERATIVE FISHERY STATISTICS`)

#Q5
coopfishstatlog <- project_data %>%
  mutate(LOG_CFS = log(`COOPERATIVE FISHERY STATISTICS`)) %>%
  select(`COOPERATIVE FISHERY STATISTICS`, LOG_CFS)

head(coopfishstatlog)
## # A tibble: 6 × 2
##   `COOPERATIVE FISHERY STATISTICS` LOG_CFS
##                              <dbl>   <dbl>
## 1                            96792    11.5
## 2                            71300    11.2
## 3                            71300    11.2
## 4                            71300    11.2
## 5                            59417    11.0
## 6                            83183    11.3
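To judge which transformation is more appropriate, one option is to compare skewness before and after transforming. The sketch below uses only the six values printed by head() above (not all 15 rows), and skewness() here is a hand-written helper for illustration, not a function from pastecs or dplyr.

```r
# First six values shown by head(); the full column has 15 rows
cfs <- c(96792, 71300, 71300, 71300, 59417, 83183)

# Simple moment-based sample skewness (illustrative helper)
skewness <- function(v) {
  centered <- v - mean(v)
  mean(centered^3) / (mean(centered^2)^1.5)
}

# Skewness of the raw values and of each candidate transformation
skew_table <- c(
  raw  = skewness(cfs),
  sqrt = skewness(sqrt(cfs)),
  log  = skewness(log(cfs))
)
print(round(skew_table, 3))
```

Values closer to 0 indicate a more symmetric distribution. On the full column, pastecs::stat.desc with norm = TRUE also reports skewness.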
#Q6
hist(coopfishstatlog$LOG_CFS)

I feel that this would look like a bell curve if we were able to flip the x and y axes, though it looks like that can only be done using ggplot? I will resubmit if this is necessary.
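A horizontal histogram also seems possible in base R without ggplot: compute the bins with hist(plot = FALSE), then draw the counts with barplot(horiz = TRUE). The sketch below again uses only the six values printed by head(), so its bins will differ from those of the full data. Rotating the plot changes its orientation, not the bin counts, so the shape of the distribution stays the same.

```r
# Log of the first six values shown by head(); the full column has 15 rows
log_cfs <- log(c(96792, 71300, 71300, 71300, 59417, 83183))

# Compute the histogram bins without drawing anything
h <- hist(log_cfs, plot = FALSE)

# Draw the bin counts as horizontal bars, labelled by bin midpoint
barplot(
  h$counts,
  horiz     = TRUE,
  space     = 0,
  names.arg = round(h$mids, 2),
  las       = 1,
  xlab      = "Frequency",
  ylab      = "LOG_CFS",
  main      = "Histogram of LOG_CFS (horizontal)"
)
```

With ggplot2, adding coord_flip() to a geom_histogram() plot gives the same rotation.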