WF ED Class 4 Practice

Objective

This worksheet illustrates how I used Mark Biegert’s RPub file (retrieved from http://mathscinotes.com/2017/02/an-example-of-cleaning-untidy-data-with-tidyr/ ) to show examples of tidy vs. untidy data. I then used completely fictitious .xlsx files that I created myself to complete part 2 of the assignment.

Load in some libraries

These are the libraries that were loaded:

library(readxl)
library(dplyr)
library(csvread)

Issues with Untidy Data

I simply copied this file from the website linked above. It is an example of untidy data, with the following problems:

poor format
missing data
inconsistent: missing values are represented by “-”, “$-”, and empty cells; a currency symbol is sometimes used, but mostly not.
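
The inconsistencies listed above can be normalized before analysis. The following is a minimal sketch in base R, not the method used on the original file: the vector raw_cost and the helper clean_cost() are hypothetical names, and the values are made up to mimic the markers described in the list.

```r
# Hypothetical raw values mimicking the inconsistencies listed above
# (made up for illustration, not read from Untidy.xlsx).
raw_cost <- c("301221", "-", "$-", "", "$379162", NA)

# Hypothetical helper: turn inconsistent text cells into numbers.
clean_cost <- function(x) {
  x <- trimws(x)                        # drop stray leading/trailing spaces
  x <- gsub("$", "", x, fixed = TRUE)   # remove the currency symbol where present
  x[x %in% c("", "-")] <- NA            # treat empty and dash-only cells as missing
  as.numeric(x)                         # convert what is left to numeric
}

clean_cost(raw_cost)
```

As an alternative, read_excel() accepts an na argument, so a call such as read_excel(path, na = c("", "-", "$-")) should mark these cells as missing at import time; whether that covers every case in this particular file is an assumption that would need checking.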

#UNTIDY
untidy <- read_excel("~/Desktop/WFED 540 FA17/Class4Homework/Untidy.xlsx", col_names = FALSE)
untidy

# A tibble: 66 x 7
                                               X__1            X__2
                                              <chr>           <chr>
 1 "Type\tModel \t1941 \t1942 \t1943 \t1944 \t1945"            <NA>
 2                               Very Heavy Bombers            <NA>
 3                                   "B-29 \t\t897"  "730 \t \t605"
 4                                    Heavy Bombers            <NA>
 5                                     "B-17 \t301"     "221 \t258"
 6                                     "B-24 \t379"     "162 \t304"
 7                                 "B-32 \t- \t790" "433 \t- \t790"
 8                                   Medium Bombers            <NA>
 9                                     "B-25 \t180"     "031 \t153"
10                                     "B-26 \t261"     "062 \t239"
# ... with 56 more rows, and 5 more variables: X__3 <chr>, X__4 <chr>,
#   X__5 <chr>, X__6 <chr>, X__7 <chr>

#TIDY
tidy <- read_excel("~/Desktop/WFED 540 FA17/Class4Homework/Tidy.xlsx")
tidy

# A tibble: 280 x 4
    Type Model  Year   Cost
   <chr> <chr> <dbl>  <chr>
 1    NA  B-29  1941     NA
 2    NA  B-17  1941 301221
 3    NA  B-24  1941 379162
 4    NA  B-32  1941     NA
 5    NA  B-25  1941 180031
 6    NA  B-26  1941 261062
 7    NA  A-20  1941 136813
 8    NA  A-26  1941 224498
 9    NA  A-28  1941     NA
10    NA  A-29  1941     NA
# ... with 270 more rows
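
The tidy file has one row per model and year, instead of one column per year. A minimal sketch of that reshaping step is shown here, assuming the tidyr package is installed. The data frame wide and its values are hypothetical, made up for illustration, and this is not necessarily how the original Tidy.xlsx was produced.

```r
library(tidyr)

# Hypothetical wide table: one row per model, one column per year.
# The model names and values are made up, not taken from the source file.
wide <- data.frame(
  Model  = c("M-1", "M-2"),
  `1941` = c(100, 200),
  `1942` = c(110, NA),
  check.names = FALSE   # keep the year column names as "1941", "1942"
)

# Gather the year columns into Year/Cost pairs: one row per model and year.
long <- pivot_longer(
  wide,
  cols      = -Model,
  names_to  = "Year",
  values_to = "Cost"
)

# Column names arrive as text, so convert Year to a number.
long$Year <- as.integer(long$Year)
long
```

In the output above, Cost is stored as text (<chr>), so it would also need converting with as.numeric() before any arithmetic.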

Fictitious Data I Want To Say Something About

The three datasets below, Examples 1 through 3, are completely fictitious. In retrospect, I probably should have chosen data that covered a wider variety of variable types, which I will discuss in more detail below.

First, we will look at an Excel file that includes data about the year and age of participants:

Example 1: Year and Age

age <- read_excel("~/Desktop/WFED 540 FA17/Class4Homework/Example1.xlsx")
age

## # A tibble: 10 x 3
##     X__1  year   age
##    <dbl> <dbl> <dbl>
##  1     1  2017    22
##  2     2  2016    21
##  3     3  2015    20
##  4     4  2014    19
##  5     5  2013    18
##  6     6  2012    17
##  7     7  2011    16
##  8     8  2010    15
##  9     9  2009    14
## 10    10  2008    13

Kinds of Data 1

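Age is a ratio variable. Year is treated here as a nominal variable (a label for each record), although calendar year is more commonly classified as an interval variable.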

Now, we will look at an Excel file that includes data about student GPA and amount of time spent studying:

Example 2: GPA/Time Studying

gpa <- read_excel("~/Desktop/WFED 540 FA17/Class4Homework/Example2.xlsx")
gpa

## # A tibble: 10 x 3
##     X__1   gpa `hours studying`
##    <dbl> <dbl>            <dbl>
##  1     1   4.0             38.0
##  2     2   3.9             29.0
##  3     3   2.2              2.2
##  4     4   3.5             32.0
##  5     5   3.2             12.0
##  6     6   2.9             21.0
##  7     7   3.1             14.0
##  8     8   3.6             26.0
##  9     9   2.5             19.0
## 10    10   2.0              5.0

Kinds of Data 2

GPA is a ratio variable, and hours of study is also a ratio variable.

Finally, we will look at an Excel file that includes data about worship attendance and offering/collection received:

Example 3: Worship Attendance/Giving

giving <- read_excel("~/Desktop/WFED 540 FA17/Class4Homework/Example3.xlsx")
giving

## # A tibble: 10 x 3
##     X__1 attendance offering
##    <dbl>      <dbl>    <dbl>
##  1     1         35      572
##  2     2         57      999
##  3     3        125     1357
##  4     4         20      782
##  5     5         46      777
##  6     6        222     3456
##  7     7        350     9891
##  8     8        425    12145
##  9     9         87     1101
## 10    10        108     1234

Kinds of Data 3

Worship attendance is a ratio variable and giving is also a ratio variable.
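
One practical consequence of both variables being ratio variables is that dividing one by the other gives a meaningful quantity. The sketch below rebuilds the Example 3 values by hand (copied from the output above) so that it runs without the Excel file; the names giving_demo and per_attendee are hypothetical.

```r
# Values copied from the Example 3 output above.
giving_demo <- data.frame(
  attendance = c(35, 57, 125, 20, 46, 222, 350, 425, 87, 108),
  offering   = c(572, 999, 1357, 782, 777, 3456, 9891, 12145, 1101, 1234)
)

# Both variables have a true zero, so their ratio can be interpreted:
# here, the average offering per person attending.
giving_demo$per_attendee <- giving_demo$offering / giving_demo$attendance

round(giving_demo$per_attendee, 2)
```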