This is exercise 1.1, Simple Regression, from the course Econometrics: Methods and Applications (Erasmus University Rotterdam).
Notes:
• This exercise uses the datafile TrainExer11 and requires a computer.
• The dataset TrainExer11 is available on the website.
Read the CSV version of the data file into R:
trainset11 <- read.csv("~/Google Drive/MOOC - courses/ecnometrics-Erasmus/TrainExer11.csv")
Questions
Dataset TrainExer11 contains survey outcomes of a travel agency that wishes to improve recommendation strategies for its clients. The dataset contains 26 observations on age and average daily expenditures during holidays.
(a) Make two histograms, one of expenditures and the other of age. Make also a scatter diagram with expenditures on the vertical axis versus age on the horizontal axis.
library(rafalib)
mypar(1,2)
hist(trainset11$Expenditures)
hist(trainset11$Age)
mypar(1,1)
plot(trainset11$Age,trainset11$Expenditures)
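The same three plots can also be drawn with base graphics only, which avoids the dependency on rafalib. The sketch below is one possible version with labelled axes. The data frame `toy` and its values are hypothetical stand-ins (not the TrainExer11 data), used only so the example runs on its own; with the real data, replace `toy` by `trainset11`.

```r
# Hypothetical toy data (NOT the TrainExer11 values), only so the sketch runs.
toy <- data.frame(
  Age          = c(22, 28, 35, 45, 52, 58),
  Expenditures = c(104, 106, 108, 95, 96, 97)
)

# Two histograms side by side; par() returns the old settings so they can be restored.
old_par <- par(mfrow = c(1, 2))
hist(toy$Expenditures, main = "Expenditures", xlab = "Average daily expenditures")
hist(toy$Age, main = "Age", xlab = "Age (years)")
par(old_par)

# Scatter diagram: expenditures (vertical axis) against age (horizontal axis).
plot(toy$Age, toy$Expenditures,
     xlab = "Age (years)", ylab = "Average daily expenditures")
```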
(b) In what respect do the data in this scatter diagram look different from the case of the sales and price data discussed in the lecture?
Here the clients fall into two groups, a younger group (ages 20 to 40) and an older group (ages 40 to 60), with a gap in expenditures between them.
Within the 20 to 40 age group, expenditures tend to increase with age. The same holds within the 40 to 50 age group.
Overall, expenditures are lower at higher ages, but with large variability within each age group.
(c) Propose a method to analyze these data in a way that assists the travel agent in making recommendations to future clients.
Break the dataset into two age groups instead of analyzing all clients together. That makes it easier to give future clients a clear recommendation, because the way expenditures change with age can be described separately within each age range.
The scatter diagram indicates two groups of clients. Younger clients spend more than older ones. Further, expenditures tend to increase with age for younger clients, whereas the pattern is less clear for older clients.
(d) Compute the sample mean of expenditures of all 26 clients.
mean(trainset11$Expenditures) #Ans 101.1154
## [1] 101.1154
(e) Compute two sample means of expenditures, one for clients of age forty or more and the other for clients of age below forty.
# clients of age forty or more
newdata <- trainset11[which(trainset11$Age >= 40), ]
mean(newdata$Expenditures) #95.84615
## [1] 95.84615
# clients of age below forty
newdata1 <- trainset11[which(trainset11$Age < 40), ]
mean(newdata1$Expenditures) #106.3846
## [1] 106.3846
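Both group means can also be computed in one step by labelling each client with an age group and averaging per label. The following is a minimal sketch of that idea. The data frame `toy`, its values, and the column name `AgeGroup` are hypothetical (they are not part of TrainExer11); with the real data, replace `toy` by `trainset11`.

```r
# Hypothetical toy data (NOT the TrainExer11 values), only so the sketch runs.
toy <- data.frame(
  Age          = c(22, 28, 35, 45, 52, 58),
  Expenditures = c(104, 106, 108, 95, 96, 97)
)

# Label each client by age group; right = FALSE makes the intervals
# [-Inf, 40) and [40, Inf), so a client of exactly forty is in the older group.
toy$AgeGroup <- cut(toy$Age, breaks = c(-Inf, 40, Inf),
                    labels = c("below40", "40plus"), right = FALSE)

# Mean expenditures per age group in a single call.
group_means <- tapply(toy$Expenditures, toy$AgeGroup, mean)
print(group_means)
```

Keeping the group label as a column, rather than creating two separate data frames, also makes it easy to reuse the same split later, for example to colour the points in the scatter diagram.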
(f) What daily expenditures would you predict for a new client of fifty years old? And for someone who is twenty-five years old?
# Using the group means from (e): predict 106.4 for a 25-year-old and 95.8 for a 50-year-old.
# Fitting a separate linear model within each age group instead gives:
lm(newdata1$Expenditures~newdata1$Age)
##
## Call:
## lm(formula = newdata1$Expenditures ~ newdata1$Age)
##
## Coefficients:
##  (Intercept)  newdata1$Age
##      100.232         0.198
expen25 <- 100.232 + 0.198 * 25 # 105.182
expen25
## [1] 105.182
lm(newdata$Expenditures~newdata$Age)
##
## Call:
## lm(formula = newdata$Expenditures ~ newdata$Age)
##
## Coefficients:
##  (Intercept)  newdata$Age
##      88.8719       0.1465
expen50 <- 88.8719 + 0.1465 * 50 # 96.1969
expen50
## [1] 96.1969
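Instead of copying the rounded coefficients by hand, the predictions can be obtained with predict(). This only works as intended when the model is fitted with the formula/data interface (Expenditures ~ Age, data = ...), because predict() matches the columns of the new data frame against the variable names in the formula. The sketch below is one possible way to do this. The helper name `predict_expenditure`, the cutoff argument, and the data frame `toy` with its values are all hypothetical (they are not the TrainExer11 data); with the real data, pass `trainset11` instead of `toy`.

```r
# Hypothetical toy data (NOT the TrainExer11 values), only so the sketch runs.
# Younger clients follow 100 + 0.2 * Age, older clients 90 + 0.1 * Age exactly.
toy <- data.frame(
  Age          = c(20, 30, 35, 40, 50, 60),
  Expenditures = c(104, 106, 107, 94, 95, 96)
)

# Hypothetical helper: fit a simple regression within the age group the
# new client belongs to, then predict expenditures at the client's age.
predict_expenditure <- function(data, age, cutoff = 40) {
  # Keep only the clients in the same age group as the new client.
  same_group <- if (age >= cutoff) data$Age >= cutoff else data$Age < cutoff
  fit <- lm(Expenditures ~ Age, data = data[same_group, ])
  # predict() needs a data frame whose column name matches the regressor.
  unname(predict(fit, newdata = data.frame(Age = age)))
}

pred25 <- predict_expenditure(toy, 25)  # 105 with these toy values
pred50 <- predict_expenditure(toy, 50)  # 95 with these toy values
print(c(age25 = pred25, age50 = pred50))
```

Because predict() uses the unrounded coefficients, its results on the real data may differ slightly in the last decimals from the hand calculations above.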