Charlie Stevens

bike_data_csv <- read.csv("bike_sharing_data.csv")
bike_data_txt <- read.delim("bike_sharing_data.txt")

Question 1:

Business Intelligence (BI) is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies. Its major objectives are to enable interactive access to data, to enable manipulation of data, and to give users the ability to conduct appropriate analyses. Specifically, by analyzing historical and current data, situations, and performances, decision makers gain valuable insights that enable them to make better-informed decisions.

Answer: True

Question 2:

Match each description to its data structure:

Atomic vector: A sequence of values of the same data type, created with the c() function.
Matrix: A two-dimensional structure whose elements all share the same data type.
List: An ordered collection whose elements may be of different data types (and may themselves be vectors, lists, or data frames).
Data frame: A list of vectors (columns), possibly of different data types, all of the same length. A data frame supports nrow(), ncol() or length(), rownames(), and colnames() or names().
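The four structures can be compared side by side in a short sketch. All values and object names below (v, m, l, df) are invented for illustration and are not taken from the bike sharing file.

```r
# Atomic vector: every element has the same type
v <- c(9.84, 9.02, 9.02)

# Matrix: two-dimensional, one type throughout
m <- matrix(1:6, nrow = 2, ncol = 3)

# List: elements may differ in type and length
l <- list(id = 1L, tags = c("a", "b"), flag = TRUE)

# Data frame: columns of equal length, possibly of different types
df <- data.frame(
  season = c(1L, 4L),
  source = c("ad campaign", "www.google.fi")
)

typeof(v)    # "double"
dim(m)       # 2 3
length(l)    # 3
dim(df)      # 2 2
length(df)   # 2, the same as ncol(df), because a data frame is a list of columns
names(df)    # "season" "source"
```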

Question 3:

A function call consists of the function name followed by one or more arguments within parentheses. You can check a function’s documentation by typing ?function_name. When calling a function, you can match arguments either by position or by name.

Answer: True
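As a small illustration of positional versus named matching, the sketch below calls the base function seq() both ways. The particular values are arbitrary.

```r
# Positional matching: values are assigned in the order seq() declares
# its arguments (from, to, by)
a <- seq(1, 10, 3)

# Named matching: the order in which arguments are written no longer matters
b <- seq(by = 3, to = 10, from = 1)

a                 # 1 4 7 10
identical(a, b)   # TRUE

# In an interactive session, ?seq opens the help page for seq()
```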

Question 4:

Which of the following lines of code would correctly import the bike sharing dataset into R?

bike1 <- read.table("bike_sharing_data.csv", sep=",", header=TRUE)
bike2 <- read.table("bike_sharing_data.txt", sep="\t", header=TRUE)
bike3 <- read.csv("bike_sharing_data.csv")
bike4 <- read.delim("bike_sharing_data.txt")

Answer: All of the above
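One way to check that these calls are interchangeable is to write a small file and read it back with each approach. The sketch below uses an invented two-row data frame (toy) and temporary files rather than the real dataset.

```r
# Invented toy rows standing in for the real file
toy <- data.frame(season = c(1L, 4L), temp = c(9.84, 9.02))

# Write the same data as a comma-separated and a tab-separated file
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(toy, csv_path, row.names = FALSE)
write.table(toy, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read each file with the generic reader and with its convenience wrapper
via_table_csv <- read.table(csv_path, sep = ",", header = TRUE)
via_csv       <- read.csv(csv_path)
via_table_txt <- read.table(txt_path, sep = "\t", header = TRUE)
via_delim     <- read.delim(txt_path)

all.equal(via_table_csv, via_csv)     # TRUE
all.equal(via_table_txt, via_delim)   # TRUE
```

read.csv() and read.delim() are wrappers around read.table() with different defaults (for example fill = TRUE and comment.char = ""), so the results can diverge on files with ragged rows, quotes, or # characters.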

Question 5:

What is the total number of observations and variables for the bike sharing dataset?

dim(bike_data_csv)

## [1] 17379    13

Answer: 17379, 13

Question 6:

If you import the bike sharing dataset into R using the approaches selected in Q4, what data type does R assign to humidity?

str(bike_data_csv)

## 'data.frame':    17379 obs. of  13 variables:
##  $ datetime  : chr  "1/1/11 0:00" "1/1/11 1:00" "1/1/11 2:00" "1/1/11 3:00" ...
##  $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ weather   : int  1 1 1 1 1 2 1 1 1 1 ...
##  $ temp      : num  9.84 9.02 9.02 9.84 9.84 ...
##  $ atemp     : num  14.4 13.6 13.6 14.4 14.4 ...
##  $ humidity  : chr  "81" "80" "80" "75" ...
##  $ windspeed : num  0 0 0 0 0 ...
##  $ casual    : int  3 8 5 3 0 0 2 1 1 8 ...
##  $ registered: int  13 32 27 10 1 1 0 2 7 6 ...
##  $ count     : int  16 40 32 13 1 1 2 3 8 14 ...
##  $ sources   : chr  "ad campaign" "www.yahoo.com" "www.google.fi" "AD campaign" ...

Answer: character (chr)
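A column of numbers is typically read as character when at least one entry cannot be parsed as a number. The sketch below shows one way to locate such entries. The vector humidity_raw and its "x61" entry are invented for illustration; the actual offending values in the file are not shown in the output above.

```r
# Toy stand-in for the humidity column; "x61" is a made-up bad entry
humidity_raw <- c("81", "80", "x61", "75")

# Entries that cannot be parsed become NA (the coercion warning is suppressed)
humidity_num <- suppressWarnings(as.numeric(humidity_raw))

# Values that were present in the raw data but failed to convert
bad_values <- humidity_raw[is.na(humidity_num) & !is.na(humidity_raw)]
bad_values   # "x61"
```

On the real data, the same steps would be applied to bike_data_csv$humidity.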

Question 7:

What is the value of season in row 6251?

bike_data_csv[6251, "season"]

## [1] 4

Answer: 4

Question 8:

How many observations have the season as winter?

season_table <- table(bike_data_csv$season)

season_table

## 
##    1    2    3    4 
## 4242 4409 4496 4232

Answer: 4232
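A single count can also be pulled out directly instead of read off the printed table. This sketch assumes season code 4 denotes winter, as the answer above implies, and uses an invented season vector.

```r
# Invented season codes standing in for bike_data_csv$season
season <- c(1, 4, 4, 2, 3, 4)

season_table <- table(season)

# Look the count up by name: "4" is the season code, not a position
winter_by_name <- as.integer(season_table["4"])

# Equivalent count without building the table
winter_by_sum <- sum(season == 4)

winter_by_name   # 3
winter_by_sum    # 3
```

Indexing by name is safer than season_table[4], which would return the wrong count if one of the codes were missing from the data.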

Question 9:

If you need multiple conditions to obtain a subset of a data frame (e.g., observations in the winter season and/or with high wind speed), you can combine the conditions with logical operators such as & (and) or | (or). Within a condition, %in% tests whether a value matches any element of a vector.

EXAMPLE:

subset_data <- bike_data_csv[bike_data_csv$season %in% c(1, 2), ]
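Combining conditions with & and | follows the same pattern as the %in% example. The sketch below uses an invented data frame (toy) whose column names follow the str() output in Question 6. The wind speed threshold of 25 is a hypothetical cutoff for "high", not one given in the assignment.

```r
# Invented toy rows; only the column names come from the real dataset
toy <- data.frame(
  season    = c(1, 2, 4, 4, 3),
  windspeed = c(5, 30, 35, 10, 40)
)

# AND: winter rows that also exceed the hypothetical wind speed threshold
winter_and_windy <- toy[toy$season == 4 & toy$windspeed > 25, ]

# OR: rows that are in winter, or windy, or both
winter_or_windy <- toy[toy$season == 4 | toy$windspeed > 25, ]

# %in%: rows whose season is any of the listed codes
spring_or_summer <- toy[toy$season %in% c(1, 2), ]

nrow(winter_and_windy)   # 1
nrow(winter_or_windy)    # 4
nrow(spring_or_summer)   # 2
```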