Before we begin, I’d better make sure all the necessary packages are loaded.
library("utils")
library("datasets")
library("readxl")
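As a hedged aside, `library()` stops with an error when a package is not installed, so it can help to check first. The lines below are only a sketch of one way to do that; the names `needed`, `installed` and `missing_pkgs` are my own illustrations. `requireNamespace()` reports whether a package can be loaded, without attaching it.

```r
# Packages this document relies on.
needed <- c("utils", "datasets", "readxl")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Collect the names of any packages that could not be found.
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
}
```

The 'utils' and 'datasets' packages ship with R and are attached by default, so in practice only 'readxl' is likely to show up as missing.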
First, I will import the hypothetical untidy data set I created. It contains the number of full service employees (fse) and independent contractors (icdr) injured at six different sites of a fictional organization. The data set, which I call ‘hypothetical_untidy’, contains several variables and is not tidy. I will use the ‘read_excel’ function to import the data and then print it.
hypothetical_untidy <- read_excel("hypothetical_untidy.xlsx")
hypothetical_untidy
## # A tibble: 6 x 5
##   injured not_injured injured_icdr injured_male_fse injured_female_fse
##     <dbl>       <dbl>        <dbl>            <dbl>              <dbl>
## 1      15          12            9                4                  2
## 2      17          16            4                5                  8
## 3      27          23           16                6                  5
## 4      12          17            8                3                  1
## 5       0          15            0                0                  0
## 6      32          43           17               10                  5
I have tidied the hypothetical data set. I made sure each row holds one observation, each column holds one variable, and each value sits in its own cell. In this case, I decided to keep three variables: site, employment status, and number of injured.
hypothetical_tidy <- read_excel("hypothetical_tidy.xlsx")
hypothetical_tidy
## # A tibble: 12 x 3
##    site   employment_status total_injured
##    <chr>  <chr>                     <dbl>
##  1 site 1 fse                           6
##  2 site 1 icdr                          9
##  3 site 2 fse                          13
##  4 site 2 icdr                          4
##  5 site 3 fse                          11
##  6 site 3 icdr                         16
##  7 site 4 fse                           4
##  8 site 4 icdr                          8
##  9 site 5 fse                           0
## 10 site 5 icdr                          0
## 11 site 6 fse                          15
## 12 site 6 icdr                         17
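The tidy version above is read from its own file. As an illustration only, the same layout can also be derived in base R from the untidy counts. This is a minimal sketch under two assumptions of mine: that the six untidy rows correspond to site 1 through site 6 in order, and that the fse total is the sum of the male and female fse columns. The names `untidy` and `tidy_sketch` are hypothetical, and the counts are retyped from the printed table rather than read from the Excel file.

```r
# Untidy counts retyped from the printed tibble (one row per site).
untidy <- data.frame(
  injured_icdr       = c(9, 4, 16, 8, 0, 17),
  injured_male_fse   = c(4, 5, 6, 3, 0, 10),
  injured_female_fse = c(2, 8, 5, 1, 0, 5)
)

# Assumption: row order corresponds to site 1 through site 6.
site <- paste("site", seq_len(nrow(untidy)))

# Combine male and female fse counts into a single fse total per site.
fse_total <- untidy$injured_male_fse + untidy$injured_female_fse

# Stack into long format: two rows per site, one per employment status.
# rbind() followed by as.vector() interleaves fse and icdr site by site.
tidy_sketch <- data.frame(
  site              = rep(site, each = 2),
  employment_status = rep(c("fse", "icdr"), times = nrow(untidy)),
  total_injured     = as.vector(rbind(fse_total, untidy$injured_icdr)),
  stringsAsFactors  = FALSE
)

tidy_sketch
```

Each row now holds one observation (one site and one employment status), which is the layout shown in the tidy table.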
This is data I gathered from the Hyundai dealership where I took my car for regularly scheduled maintenance this week. I asked if they would share the number of cars that had come in for service during the past 10 days, and they did. The data set, which I call “service_center,” contains 10 observations of two variables.
I use the ‘read_excel’ function to import the data set. I’ve already made sure the ‘readxl’ package has been loaded. It’s time to import and print the first data set.
dataset1 <- read_excel("service_center.xlsx")
dataset1
## # A tibble: 10 x 2
##    day        cars
##    <chr>     <dbl>
##  1 Monday       23
##  2 Tuesday       7
##  3 Wednesday     5
##  4 Thursday     12
##  5 Friday       19
##  6 Saturday     26
##  7 Sunday        0
##  8 Monday       18
##  9 Tuesday       9
## 10 Wednesday    13
This is data I gathered from my own time sheets over the past 10 weeks. It contains information about the amount of time I’ve spent on custom client projects during those weeks.
The data set, which I call “custom work,” contains 10 observations of two variables.
The ‘readxl’ package has already been loaded, so it is time to import and print the second data set.
dataset2 <- read_excel("custom work.xlsx")
dataset2
## # A tibble: 10 x 2
##     week `hours spent`
##    <dbl>         <dbl>
##  1     1             5
##  2     2             7
##  3     3            12
##  4     4            14
##  5     5             2
##  6     6            60
##  7     7            50
##  8     8            13
##  9     9            17
## 10    10            33
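The backticks around `hours spent` in the printout appear because the column name contains a space, which makes it a non-syntactic name in R. The sketch below shows one way to work with such a column and, optionally, rename it. It is a base R illustration with the values retyped from the printed table; `custom_work` and `hours_spent` are names I made up for the example.

```r
# Values retyped from the printed tibble; check.names = FALSE keeps the space.
custom_work <- data.frame(
  week          = 1:10,
  `hours spent` = c(5, 7, 12, 14, 2, 60, 50, 13, 17, 33),
  check.names   = FALSE
)

# A name containing a space must be wrapped in backticks when used with $.
total_hours <- sum(custom_work$`hours spent`)

# Optional: rename the column to a syntactic name to avoid backticks later.
names(custom_work)[names(custom_work) == "hours spent"] <- "hours_spent"

total_hours
names(custom_work)
```

Renaming is a matter of taste; the backtick form works just as well as long as it is used consistently.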
The third data set is very interesting. This week, at work, we hosted a major event for the maritime industry. Several different companies from the maritime industry sent their representatives. I was curious about who each company sent - male representatives, female representatives or a combination.
The data set, which I call “Company reps,” contains 10 observations of two variables.
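The first variable is an arbitrary label I’ve assigned to each company. Although each label contains a number, it serves only as an identifier, so it is qualitative rather than quantitative. The second is also qualitative - whether the representatives are all males, all females or a combination.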
The level of measurement for both variables would be ‘nominal.’
The ‘readxl’ package has already been loaded, so it is time to import and print the third data set.
dataset3 <- read_excel("Company reps.xlsx")
dataset3
## # A tibble: 10 x 2
##    X__1       gender
##    <chr>      <chr>
##  1 company 1  males
##  2 company 2  females
##  3 company 3  males
##  4 company 4  males
##  5 company 5  both
##  6 company 6  both
##  7 company 7  females
##  8 company 8  both
##  9 company 9  males
## 10 company 10 males
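Because both variables are nominal, counting categories is about the only summary that makes sense for them. The sketch below shows one way R can represent a nominal variable, as an unordered factor, and then tabulate it. The values are retyped from the printed table. The names `company_reps` and `gender_counts` are my own, and I have used `company` in place of `X__1`, which looks like an automatically generated name for a column whose header cell was empty.

```r
# Values retyped from the printed tibble.
company_reps <- data.frame(
  company = paste("company", 1:10),
  gender  = c("males", "females", "males", "males", "both",
              "both", "females", "both", "males", "males"),
  stringsAsFactors = FALSE
)

# A plain factor stores categories with no implied order (nominal scale).
company_reps$gender <- factor(company_reps$gender,
                              levels = c("males", "females", "both"))

# FALSE here confirms the categories are not treated as ordered.
is.ordered(company_reps$gender)

# Frequency counts per category.
gender_counts <- table(company_reps$gender)
gender_counts
```

The order given in `levels` only controls how the categories are displayed; it does not rank them.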