1. Examples of Tidy and Untidy Datasets

The data are from last year's NOCTI results at my school (Franklin County CTC).

#Import Untidy Dataset
library(readxl)
Untidy_dataset <- read_excel("~/Box Sync/PracticeAssignment_4/Untidy_dataset.xlsx")
Untidy_dataset
# A tibble: 6 x 4
  `Sending District` Advanced Competent Basic
               <chr>    <dbl>     <dbl> <dbl>
1       Chambersburg       31        12     5
2      Fannett Metal        6         1     0
3        Greencastle        6         4    11
4       Shippensburg       21        17     1
5          Tuscarora       12         3     4
6          Waynesoro       18         1     4
#Import Tidy Dataset
Tidy_dataset <- read_excel("~/Box Sync/PracticeAssignment_4/Tidy_dataset.xlsx", 
                           col_types = c("text", "text", "text"))
Tidy_dataset
# A tibble: 157 x 3
   Student_ID Sending_District NOCTI_results
        <chr>            <chr>         <chr>
 1          1     Chambersburg      Advanced
 2          2     Chambersburg      Advanced
 3          3     Chambersburg      Advanced
 4          4     Chambersburg      Advanced
 5          5     Chambersburg      Advanced
 6          6     Chambersburg      Advanced
 7          7     Chambersburg      Advanced
 8          8     Chambersburg      Advanced
 9          9     Chambersburg      Advanced
10         10     Chambersburg      Advanced
# ... with 147 more rows
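
The two layouts hold the same information, so the untidy table of counts can be reshaped into the tidy one-row-per-student form. The following is a minimal sketch, not the method used to build the spreadsheet above: it assumes the tidyr and tibble packages are installed, retypes the six rows of counts by hand rather than reading the Excel file, and generates Student_ID as a simple running number, which is an assumption about how the IDs were assigned.

```r
library(tibble)
library(tidyr)

# Counts retyped from the untidy output above (district names as printed there).
untidy <- tribble(
  ~`Sending District`, ~Advanced, ~Competent, ~Basic,
  "Chambersburg",      31L,       12L,        5L,
  "Fannett Metal",      6L,        1L,        0L,
  "Greencastle",        6L,        4L,       11L,
  "Shippensburg",      21L,       17L,        1L,
  "Tuscarora",         12L,        3L,        4L,
  "Waynesoro",         18L,        1L,        4L
)

# Step 1: move the three result columns into one NOCTI_results column,
# keeping each count in a helper column n.
long <- pivot_longer(
  untidy,
  cols = c(Advanced, Competent, Basic),
  names_to = "NOCTI_results",
  values_to = "n"
)

# Step 2: repeat each row n times so that every student gets one row.
tidy <- uncount(long, n)

# Step 3: match the tidy column naming and add a running ID (assumed scheme).
names(tidy)[names(tidy) == "Sending District"] <- "Sending_District"
tidy$Student_ID <- as.character(seq_len(nrow(tidy)))

nrow(tidy)
```

The counts sum to 157, which matches the number of rows reported for the tidy dataset.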

2. Three separate sets of 10 observations of two variables.

This dataset, from an online class, shows the students' pretest scores. The first variable (Gender) is a nominal measurement. Its values are labels used only to classify or name the observations; they carry no order or magnitude.

The second variable (Pretest_Score) is a ratio measurement. At this level of measurement, the observations have equal intervals and also a true zero point: a score of zero means that no points were earned.

testscore_dataset <- read_excel("~/Box Sync/PracticeAssignment_4/test_score_dataset.xlsx")
testscore_dataset
# A tibble: 10 x 2
   Gender Pretest_Score
    <chr>         <dbl>
 1 Female            72
 2   Male            70
 3     NA            74
 4 Female            80
 5 Female            75
 6 Female            72
 7   Male            81
 8 Female            74
 9 Female            87
10   Male            83

This dataset contains unemployment statistics from the US Department of Labor. The first variable (Year) is an interval variable. The interval level of measurement not only classifies and orders the observations, but also specifies that the distance between adjacent points is the same everywhere along the scale. The calendar's zero point is an arbitrary reference rather than an absence of time, which is why Year is not a ratio variable.

The second variable (Percent_Unemployed) is a ratio variable. In addition to having equal intervals, it has a true zero: a value of 0 percent would mean that no one is unemployed.

unemployment_dataset <- read_excel("~/Box Sync/PracticeAssignment_4/unemployment_dataset.xlsx")
unemployment_dataset
# A tibble: 10 x 2
    Year Percent_Unemployed
   <dbl>              <dbl>
 1  2000                4.0
 2  2001                4.7
 3  2002                5.8
 4  2003                6.0
 5  2004                5.5
 6  2005                5.1
 7  2006                4.6
 8  2007                4.6
 9  2008                5.8
10  2009                9.3

This dataset contains violent crime statistics from the FBI. Both variables (population and violent_crime) are ratio variables. Each is a count with equal intervals and a true zero, where zero would mean no people or no violent crimes.

crimerate_dataset <- read_excel("~/Box Sync/PracticeAssignment_4/us_crimerate_dataset.xlsx")
crimerate_dataset
# A tibble: 10 x 2
   population violent_crime
        <dbl>         <dbl>
 1  260327021       1857670
 2  262803276       1798792
 3  265228572       1688540
 4  267783607       1636096
 5  270248003       1533887
 6  272690813       1426044
 7  281421906       1425486
 8  285317559       1439480
 9  287973924       1423677
10  290788976       1383676
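
Because both variables are at the ratio level, dividing one by the other gives a meaningful quantity. The sketch below illustrates this with a violent crime rate per 100,000 people. It retypes only the first three rows of the output above, and the column name rate_per_100k is an illustrative assumption, not part of the FBI data.

```r
# First three rows retyped from the output above.
crimerate <- data.frame(
  population    = c(260327021, 262803276, 265228572),
  violent_crime = c(1857670, 1798792, 1688540)
)

# A ratio of two ratio-level variables, scaled to "per 100,000 people".
crimerate$rate_per_100k <- crimerate$violent_crime / crimerate$population * 1e5

round(crimerate$rate_per_100k, 1)
```

The same division would not be meaningful for an interval variable such as Year, since its zero point is arbitrary.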

I did not use the ordinal level of measurement, which describes an ordered relationship among a variable's observations without fixing the distance between them. Examples include size categories, rankings of favorite sports, class rankings, wellness rankings, and Likert scales.
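
For comparison, here is a minimal sketch of how R can represent nominal and ordinal variables using only base R. It treats the NOCTI result categories from the first section as ordered Basic < Competent < Advanced; that ordering is my assumption, and the short vectors are made up for illustration rather than taken from the datasets.

```r
# Nominal: an unordered factor. The labels only name groups.
gender <- factor(c("Female", "Male", NA, "Female"))
is.ordered(gender)

# Ordinal: an ordered factor with an explicit (assumed) level order.
results <- c("Advanced", "Basic", "Competent", "Advanced")
nocti <- factor(
  results,
  levels = c("Basic", "Competent", "Advanced"),
  ordered = TRUE
)
is.ordered(nocti)

# Order comparisons are defined for ordered factors only.
nocti < "Advanced"
max(nocti)
```

Comparisons such as `<` and `max()` work on the ordered factor, while arithmetic such as a mean is not defined for either kind of factor, which mirrors the difference between the ordinal level and the interval and ratio levels.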