Prescriptive analytics support business decision-making with data and statistical models. A flexible statistical software package that allows analytic researchers to apply and create new models is essential. R is an integrated suite of software routines that facilitates advanced analytics.

In this first homework, we will be learning some of the basics of the R command language to:

  1. Conduct basic calculations
  2. Visualize your data
  3. Summarize information from a survey dataset (Ford)

Compile your answers into a .html file and upload it to Carmen Canvas. The instructor will provide instructions on how to compile a .html file. Insert an R code chunk when necessary to answer questions. Hide the results of the code when necessary.

We will be exploring the dataset (Ford_Data.csv), employing functions for summarization, and utilizing functions for visualization. Once the data (Ford_Data.csv) has been downloaded from Carmen, set the working directory and proceed to answer the following questions. Refer to FordSurveyInstrument.docx when necessary.

1. Read the Ford dataset (Ford_Data.csv) into your R workspace using the following command.
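
The command itself does not appear above. A minimal sketch of what it might look like, assuming Ford_Data.csv is in the current working directory and that the data frame is named Forddata, as the later questions suggest (the `csv_path` variable and the `file.exists()` guard are illustrative additions):

```r
# Assumption: Ford_Data.csv was downloaded into the working directory.
# If it lives elsewhere, call setwd() with your own folder first.
csv_path <- "Ford_Data.csv"

if (file.exists(csv_path)) {
  # Name the data frame Forddata to match the calls in the later questions
  Forddata <- read.csv(csv_path)
} else {
  # Report where R looked, so the working directory can be corrected
  message("Ford_Data.csv not found in ", getwd())
}
```

The guard only keeps the chunk from stopping with an error when the file is missing; once the working directory is set, the single `read.csv()` call is all that is needed.
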
2. How many columns and rows are there in this dataset? What do those numbers represent? (Hint: use ?str and ?dim for help)
dim(Forddata)
## [1] 547 248
#str(Forddata)

There are 547 rows and 248 columns, corresponding to the number of respondents and the number of variables, respectively.

3. What are the variable names? Print the first ten variable names. Hint: use names().
names(Forddata)[1:10]
##  [1] "No"                 "ID"                 "RespondentID"      
##  [4] "StartTime"          "EndTime"            "ILength"           
##  [7] "SampleRespondentID" "Status"             "S0"                
## [10] "S1_1"
4. What are the meanings of Forddata[1,], Forddata[,6], and Forddata[1, 6]?

Forddata[1, ] returns the first row of the data frame, that is, every variable recorded for the first respondent. Forddata[, 6] returns the sixth column, that is, the values of the sixth variable (ILength) for all respondents. Forddata[1, 6] returns the single value in the first row and sixth column.
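
The three indexing forms can be tried on a small stand-in. This is a sketch on a hypothetical toy data frame (`toy` and its columns are made up for illustration and are not part of the Ford data):

```r
# Hypothetical toy data frame, not the Ford data
toy <- data.frame(id      = c(101, 102, 103),
                  minutes = c(12, 15, 9),
                  status  = c("a", "b", "c"))

first_row  <- toy[1, ]   # one-row data frame containing every column
second_col <- toy[, 2]   # plain vector holding the second column
one_cell   <- toy[1, 2]  # single value at row 1, column 2

print(first_row)
print(second_col)
print(one_cell)
```

Leaving the row or column position empty means "all rows" or "all columns", which is why a single column comes back as a vector rather than a data frame.
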

Refer to Screeners

5. Find the number of respondents in the data set who have shopped for a laptop computer in the past 6 or 7 months. A good way to describe discrete data is with frequency counts. The table() function counts the observed prevalence of each value that occurs in a variable. Use the function table() to find the number of respondents who have shopped for a laptop computer in the past 6 or 7 months.
typeof(Forddata$S1_7)
## [1] "integer"
sum(Forddata$S1_7 == 2) # counts the respondents who selected 2 (each TRUE adds 1)
## [1] 49
sum(Forddata$S1_7 == 3) # counts the respondents who selected 3 (each TRUE adds 1)
## [1] 473
table(Forddata$S1_7)
## 
##   1   2   3 
##  25  49 473
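
The link between table() and the sum() calls can be checked on a small vector. A sketch using hypothetical response codes (`responses` is made up for illustration):

```r
# Hypothetical response codes, not the Ford data
responses <- c(3, 3, 2, 1, 3, 2, 3, 3)

counts <- table(responses)   # frequency of each distinct code
print(counts)

n_code2     <- counts[["2"]]        # look up one code by its name in the table
n_code2_alt <- sum(responses == 2)  # same count from a logical comparison

print(n_code2)
print(n_code2_alt)
```

Both approaches give the same count; table() is usually more convenient because it reports every code at once.
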

Refer to General Questions

6. What is the average purchase price of respondents’ current car?
mean(Forddata$S5_Price)
## [1] 30966.61

The average purchase price of respondents’ current cars is $30,966.61.

7. Draw a histogram of the purchase prices of the respondents’ current cars.
hist(Forddata$S5_Price, main= "Purchase Prices of Respondents' Current Cars", 
     ylab = " ", xlab = " Price", 
     xlim = c(0, 150000),
     nclass = 150)
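
In the hist() documentation, `nclass` is described as a compatibility argument equivalent to `breaks`. A sketch of the same idea with `breaks`, using simulated prices as a stand-in (the `prices` vector and its distribution are assumptions made only so the example runs on its own):

```r
# Simulated stand-in for purchase prices, not the Ford data
set.seed(1)
prices <- round(rlnorm(200, meanlog = 10.2, sdlog = 0.5))

# breaks is a suggestion for the number of bins; plot = FALSE returns the
# histogram object without drawing it, so the bins can be inspected
h <- hist(prices, breaks = 30, plot = FALSE)

print(length(h$counts))  # number of bins actually used
print(sum(h$counts))     # every observation falls into some bin
```

Dropping `plot = FALSE` draws the histogram as usual, and arguments such as `main`, `xlab`, and `xlim` work the same way as in the call above.
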

8. Refer to Q3 in the survey instrument. (Note: Q3 is named Q2 in Forddata. Use Forddata$Q2) What proportion of respondents enjoy using their car’s technologies to help them drive more defensively?
sum(Forddata$Q2_4) / length(Forddata$Q2_4)
## [1] 0.1078611

Approximately 11% of respondents enjoy using their car’s technologies to help them drive more defensively.

9. Refer to Q4 in the survey instrument. (Note: Q4 is named Q3 in Forddata. Use Forddata$Q3) What proportion of respondents find it difficult to drive SUVs?
sum(Forddata$Q3_8) / length(Forddata$Q3_8)
## [1] 0.06398537

Only about 6% of the respondents find it difficult to drive SUVs.
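
The sum() / length() calculations in questions 8 and 9 give a proportion only if each item is coded 1 for "selected" and 0 otherwise, which the calls above appear to assume. Under that assumption, mean() gives the same result. A sketch with a hypothetical indicator (`selected` is made up for illustration):

```r
# Hypothetical 0/1 indicator: 1 = statement selected, 0 = not selected
selected <- c(1, 0, 0, 1, 0, 0, 0, 0, 1, 0)

prop_sum  <- sum(selected) / length(selected)  # count of ones over sample size
prop_mean <- mean(selected)                    # mean of a 0/1 vector is the same proportion

print(prop_sum)
print(prop_mean)
```

If a column contained missing values, both forms would return NA; `mean(selected, na.rm = TRUE)` would then give the proportion among those who answered.
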

Refer to Demographic Questions

10. Explore the respondents’ age groups.
table(Forddata$D2)
## 
##  1  2  3  4  5  6  7  8  9 10 11 
## 12 24 45 52 57 54 65 49 53 61 75

The codes in the table correspond to the following age groups:

  1. 18 to 22 years old
  2. 23 to 27 years old
  3. 28 to 32 years old
  4. 33 to 37 years old
  5. 38 to 42 years old
  6. 43 to 47 years old
  7. 48 to 52 years old
  8. 53 to 57 years old
  9. 58 to 62 years old
  10. 63 to 67 years old
  11. Older than 67 years
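
Attaching the group labels to the codes makes the table readable without a lookup. A sketch using factor(), with labels taken from the age groups listed above and a short hypothetical vector of codes standing in for the D2 column:

```r
# Labels follow the age groups listed above (codes 1 to 11)
age_labels <- c("18-22", "23-27", "28-32", "33-37", "38-42", "43-47",
                "48-52", "53-57", "58-62", "63-67", "over 67")

# Hypothetical codes for six respondents, not the Ford data
age_code <- c(1, 3, 3, 11, 2, 3)

# levels gives the codes in order; labels gives the text shown for each code
age_group <- factor(age_code, levels = 1:11, labels = age_labels)

print(table(age_group))  # counts appear under the group labels, zeros included
```

The same pattern would apply to the gender, marital status, employment, and income codes, given the labels from the survey instrument.
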

11. Explore the respondents’ gender.
table(Forddata$D3)
## 
##   1   2 
## 277 270

Of the respondents, 277 were male and 270 were female.

12. Explore the respondents’ marital status.
table(Forddata$D4)
## 
##   1   2   3 
## 363 162  22

Of the respondents, 363 reported being married (code 1), 162 reported being unmarried (code 2), and 22 selected other (code 3).

13. Explore the respondents’ employment status.
table(Forddata$D6)
## 
##   1   2   3   4   5   6 
## 293  62 127  46  11   8

Of the respondents, 293 reported full-time employment (code 1), 62 reported part-time employment (code 2), 127 were retired (code 3), 46 were unemployed (code 4), 11 were students (code 5), and 8 selected other (code 6).

14. Explore the respondents’ annual household income categories.
table(Forddata$D8)
## 
##   1   2   3   4   5 
##  27 107 194 114 105

Of the respondents, 27 reported an annual household income of up to $24,999 (code 1), 107 reported between $25,000 and $49,999 (code 2), 194 reported between $50,000 and $99,999 (code 3), 114 reported between $100,000 and $149,999 (code 4), and 105 reported more than $150,000 (code 5).
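
Counts are easier to compare across categories as percentages. A sketch using prop.table() on the income counts reported above (the short bracket names are shorthand chosen for this example):

```r
# Counts copied from the table(Forddata$D8) output above
income_counts <- c("up to 24,999"    = 27,
                   "25,000-49,999"   = 107,
                   "50,000-99,999"   = 194,
                   "100,000-149,999" = 114,
                   "over 150,000"    = 105)

income_share <- prop.table(income_counts)  # each count divided by the total
income_pct   <- round(100 * income_share, 1)

print(income_pct)
```

With the real data, `prop.table(table(Forddata$D8))` would give the same shares directly from the column.
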

15. Provide a descriptive summary of the respondents’ demographics.

The survey respondents represent a diverse demographic profile. Male respondents slightly outnumber female respondents (277 versus 270). Respondents’ ages range from 18 to 80 years, with the majority falling between 30 and 60, that is, in middle age. Most respondents are married; the average household size is 3, with 1 to 2 vehicles owned or leased. Income is widely distributed, with the largest group of respondents (194) in the $50,000 to $99,999 bracket.