It’s all about variables

Preparation: Loading Packages

Before I begin, I will load the packages I need. The ‘utils’ and ‘datasets’ packages are attached by default in a standard R session, so only ‘readxl’ strictly needs to be loaded here.

library("utils")
library("datasets")
library("readxl")

Importing untidy hypothetical data set

First, I will import the hypothetical untidy data set I created. This hypothetical data set contains the number of full service employees (fse) and independent contractors (icdr) injured at 6 different sites of a fictional organization. The data set, which I call ‘hypothetical_untidy,’ contains five variables and is not tidy: the columns mix overall totals with breakdowns by employment status and gender, and the site is identified only by row position. I will use the ‘read_excel’ function to import this data and then print it.

hypothetical_untidy <- read_excel("hypothetical_untidy.xlsx")
hypothetical_untidy
## # A tibble: 6 x 5
##   injured not_injured injured_icdr injured_male_fse injured_female_fse
##     <dbl>       <dbl>        <dbl>            <dbl>              <dbl>
## 1      15          12            9                4                  2
## 2      17          16            4                5                  8
## 3      27          23           16                6                  5
## 4      12          17            8                3                  1
## 5       0          15            0                0                  0
## 6      32          43           17               10                  5

Importing tidy hypothetical data set

I have tidied the hypothetical data set. I made sure each row holds one observation, each column holds one variable, and each value sits in its own cell. In this case, I decided to keep three variables: site, employment status, and number of injured.

hypothetical_tidy <- read_excel("hypothetical_tidy.xlsx")
hypothetical_tidy
## # A tibble: 12 x 3
##      site employment_status total_injured
##     <chr>             <chr>         <dbl>
##  1 site 1               fse             6
##  2 site 1              icdr             9
##  3 site 2               fse            13
##  4 site 2              icdr             4
##  5 site 3               fse            11
##  6 site 3              icdr            16
##  7 site 4               fse             4
##  8 site 4              icdr             8
##  9 site 5               fse             0
## 10 site 5              icdr             0
## 11 site 6               fse            15
## 12 site 6              icdr            17
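
As a rough sketch of how the same reshaping could be done in R rather than in the spreadsheet, the block below rebuilds the untidy table by hand from its printed output and derives the tidy layout with base R. It assumes that the rows of the untidy table correspond to sites 1 through 6 in order, and that the fse count for a site is the sum of ‘injured_male_fse’ and ‘injured_female_fse’. Both assumptions are inferred from the printed values, and the names ‘fse_injured’, ‘icdr_injured’ and ‘tidy_sketch’ are made up for this illustration.

```r
# Rebuild the untidy table by hand from the printed output
hypothetical_untidy <- data.frame(
  injured            = c(15, 17, 27, 12, 0, 32),
  not_injured        = c(12, 16, 23, 17, 15, 43),
  injured_icdr       = c(9, 4, 16, 8, 0, 17),
  injured_male_fse   = c(4, 5, 6, 3, 0, 10),
  injured_female_fse = c(2, 8, 5, 1, 0, 5)
)

n_sites <- nrow(hypothetical_untidy)

# Assumption: fse injuries = male fse + female fse for each site
fse_injured  <- hypothetical_untidy$injured_male_fse +
  hypothetical_untidy$injured_female_fse
icdr_injured <- hypothetical_untidy$injured_icdr

# One row per site and employment status (assumption: row i is "site i")
tidy_sketch <- data.frame(
  site              = rep(paste("site", seq_len(n_sites)), each = 2),
  employment_status = rep(c("fse", "icdr"), times = n_sites),
  # rbind() stacks the two vectors; as.vector() reads the result column
  # by column, which interleaves fse and icdr for each site
  total_injured     = as.vector(rbind(fse_injured, icdr_injured)),
  stringsAsFactors  = FALSE
)

print(tidy_sketch)
```

Under these assumptions, the ‘total_injured’ column reproduces the values in the imported tidy table.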

Importing Data set 1

This is data I gathered from the Hyundai dealership where I took my car for regularly scheduled maintenance this week. I asked if they would share with me the number of cars that had come in for service during the past 10 days, and they did. The data set, which I call “service_center,” contains 10 observations of two variables.

  • The first variable is a character or qualitative variable. It denotes the day of the week. The level of measurement for this variable would be ‘nominal’ (or ‘ordinal,’ if the order of the days is treated as meaningful).
  • The second variable is a numeric or quantitative variable. It denotes the number of cars that came in for service. The level of measurement for this variable would be ‘ratio.’

I use the ‘read_excel’ function to import the data set. I’ve already made sure the ‘readxl’ package has been loaded. It’s time to import and print the first data set.

dataset1 <- read_excel("service_center.xlsx")
dataset1
## # A tibble: 10 x 2
##          day  cars
##        <chr> <dbl>
##  1    Monday    23
##  2   Tuesday     7
##  3 Wednesday     5
##  4  Thursday    12
##  5    Friday    19
##  6  Saturday    26
##  7    Sunday     0
##  8    Monday    18
##  9   Tuesday     9
## 10 Wednesday    13
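
To check how R stores each variable, and to show which summaries suit each one, here is a small sketch. It rebuilds the data by hand from the printed output so it runs without the Excel file. The names ‘dataset1_sketch’, ‘column_classes’, ‘week_levels’ and ‘cars_per_day’ are made up for this illustration.

```r
# Rebuild the data set by hand from the printed output
dataset1_sketch <- data.frame(
  day  = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
           "Saturday", "Sunday", "Monday", "Tuesday", "Wednesday"),
  cars = c(23, 7, 5, 12, 19, 26, 0, 18, 9, 13),
  stringsAsFactors = FALSE
)

# Storage type of each column: day is character, cars is numeric
column_classes <- sapply(dataset1_sketch, class)
print(column_classes)

# Treat day as a categorical variable with the days in calendar order
week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")
dataset1_sketch$day <- factor(dataset1_sketch$day, levels = week_levels)

# Frequency counts are the natural summary for a categorical variable
print(table(dataset1_sketch$day))

# Sums and means are meaningful for a ratio-scale variable such as cars
cars_per_day <- tapply(dataset1_sketch$cars, dataset1_sketch$day, sum)
print(cars_per_day)
print(mean(dataset1_sketch$cars))
```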

Importing Data set 2

This is data I gathered from my own time sheets over the past 10 weeks. It contains information about the amount of time I’ve spent on custom client projects during those weeks.

The data set, which I call “custom work,” contains 10 observations of two variables.

  • Both variables are quantitative. The first one is the number of the week. The second is the number of hours spent on a certain kind of task.
  • The level of measurement for the first variable would be ‘interval,’ while the one for the second would be ‘ratio.’

The ‘readxl’ package has already been loaded, so it is time to import and print the second data set.

dataset2 <- read_excel("custom work.xlsx")
dataset2
## # A tibble: 10 x 2
##     week `hours spent`
##    <dbl>         <dbl>
##  1     1             5
##  2     2             7
##  3     3            12
##  4     4            14
##  5     5             2
##  6     6            60
##  7     7            50
##  8     8            13
##  9     9            17
## 10    10            33
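
The column name ‘hours spent’ contains a space, so it has to be wrapped in backticks whenever it is used with ‘$’. The sketch below rebuilds the data by hand from the printed output, shows the backtick syntax, and then renames the column to something easier to type. The names ‘dataset2_sketch’, ‘total_hours’, ‘hours_spent’ and ‘ratio_week6_to_week5’ are made up for this illustration.

```r
# Rebuild the data set by hand from the printed output;
# check.names = FALSE keeps the space in the column name
dataset2_sketch <- data.frame(
  week          = as.numeric(1:10),
  `hours spent` = c(5, 7, 12, 14, 2, 60, 50, 13, 17, 33),
  check.names   = FALSE
)

# A name with a space must be quoted with backticks
total_hours <- sum(dataset2_sketch$`hours spent`)
print(total_hours)

# Renaming the column avoids the backticks from here on
names(dataset2_sketch)[names(dataset2_sketch) == "hours spent"] <- "hours_spent"

# On a ratio scale, zero means "no hours", so ratios are meaningful:
# hours in week 6 relative to hours in week 5
ratio_week6_to_week5 <- dataset2_sketch$hours_spent[6] /
  dataset2_sketch$hours_spent[5]
print(ratio_week6_to_week5)
```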

Importing Data set 3

The third data set is very interesting. This week, at work, we hosted a major event for the maritime industry. Several different companies from the maritime industry sent their representatives. I was curious about who each company sent - male representatives, female representatives or a combination.

The data set, which I call “Company reps,” contains 10 observations of two variables.

  • The first variable is qualitative - an arbitrary label I’ve assigned to each company (it is stored as character data, not as a number). The second is also qualitative - whether the representatives are all males, all females or a combination.

  • The level of measurement for both variables would be ‘nominal.’

The ‘readxl’ package has already been loaded, so it is time to import and print the third data set.

dataset3 <- read_excel("Company reps.xlsx")
dataset3
## # A tibble: 10 x 2
##          X__1  gender
##         <chr>   <chr>
##  1  company 1   males
##  2  company 2 females
##  3  company 3   males
##  4  company 4   males
##  5  company 5    both
##  6  company 6    both
##  7  company 7 females
##  8  company 8    both
##  9  company 9   males
## 10 company 10   males
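
The column name ‘X__1’ looks like a placeholder generated at import, which would suggest that the first header cell in the spreadsheet was empty. That is an inference from the printed output, not something I have confirmed in the file. The sketch below rebuilds the data by hand, gives that column a descriptive name, and counts the categories, since frequency counts and the mode are the summaries that make sense for nominal data. The names ‘dataset3_sketch’, ‘company’, ‘gender_counts’ and ‘most_common’ are made up for this illustration.

```r
# Rebuild the data set by hand from the printed output
dataset3_sketch <- data.frame(
  X__1   = paste("company", 1:10),
  gender = c("males", "females", "males", "males", "both",
             "both", "females", "both", "males", "males"),
  stringsAsFactors = FALSE
)

# Replace the placeholder column name with a descriptive one
names(dataset3_sketch)[names(dataset3_sketch) == "X__1"] <- "company"

# Frequency table of the nominal variable
gender_counts <- table(dataset3_sketch$gender)
print(gender_counts)

# The mode is the category with the highest count
most_common <- names(gender_counts)[which.max(gender_counts)]
print(most_common)
```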

This concludes Assignment 4.