Exam 3

About

Clearly show and organise your work. Each answer should appear below the question marked in red. Where necessary, add and execute your own code chunks to perform the required calculations.

Problem 1: (5 points)

A company manufactures two products (A and B), and the profit per unit sold is $3 and $5 respectively. Each product has to be assembled on a particular machine: each unit of product A takes 12 minutes of assembly time and each unit of product B takes 25 minutes. The company estimates that the machine used for assembly has an effective working week of only 30 hours due to maintenance/breakdown. Technological constraints mean that for every 1 unit of product A produced, at least 2 units of product B must be produced. How many units of product A and product B must be produced (all products produced are sold) to optimize profits?

1A) Write the mathematical formulation of the objective function and state the type of optimization. Define the constraints in a tabular or matrix form. Clearly identify the decision variables (no coding required) (5 pts)

Decision variables:
XA = number of units of product A produced
XB = number of units of product B produced
Z = total profit

This is a linear optimization (maximization) problem.

Constraints for this problem are:
12XA + 25XB <= 1800   (assembly time in minutes: 30*60)
-2XA + XB >= 0        (equivalent to XB >= 2XA)
XA >= 0
XB >= 0

We must maximize the objective function: Z = 3XA + 5XB

MATRIX FORM:
 XA    XB
 12    25    <=    1800
 -2     1    >=       0
  1     0    >=       0
  0     1    >=       0
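Although no coding is required for 1A, the formulation can be checked numerically. The sketch below is one possible approach in base R, not the method asked for in the exam: it rewrites every constraint as a "<=" row, intersects each pair of constraint boundaries, and keeps the feasible corner point with the highest profit. It assumes the feasible region is bounded and treats XA and XB as continuous (fractional units allowed); the names `A`, `b`, `profit`, and `best` are illustrative.

```r
# Constraints written as A %*% x <= b (">=" rows are multiplied by -1).
A <- rbind(c(12, 25),   # assembly time: 12*XA + 25*XB <= 1800
           c( 2, -1),   # ratio: -2*XA + XB >= 0  becomes  2*XA - XB <= 0
           c(-1,  0),   # XA >= 0
           c( 0, -1))   # XB >= 0
b <- c(1800, 0, 0, 0)
profit <- c(3, 5)       # objective: Z = 3*XA + 5*XB

# Candidate corner points: intersections of every pair of boundaries.
pairs <- combn(nrow(A), 2)
best <- NULL
for (k in seq_len(ncol(pairs))) {
  idx <- pairs[, k]
  M <- A[idx, ]
  if (abs(det(M)) < 1e-9) next              # skip parallel boundaries
  x <- solve(M, b[idx])
  if (all(A %*% x <= b + 1e-7)) {           # keep feasible points only
    z <- sum(profit * x)
    if (is.null(best) || z > best$z) best <- list(x = x, z = z)
  }
}
print(best)
```

Under these assumptions the best corner is where the assembly-time and ratio constraints are both tight (XB = 2XA), so a whole-unit production plan would need a further rounding or integer-programming step.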

Problem 2: (12 points)

The HR department of a company wants to determine the salary of a new hire based on years of experience. A data set was compiled for this purpose, showing the years of experience and yearly salary. The data is in file “salary”. Read the data in the file “salary” in RStudio and solve the questions below.

mydata <- read.csv("salary.csv", header=TRUE)
head(mydata)

##   Experience Salary
## 1        1.1  39343
## 2        1.3  46205
## 3        1.5  37731
## 4        2.0  43525
## 5        2.2  39891
## 6        2.9  56642

2A) Draw a scatter plot of salary vs experience. Make sure to label the X-axis with the independent (explanatory) variable and the Y-axis with the dependent (response) variable (2pts)

salary = mydata$Salary
experience = mydata$Experience

plot(experience, salary, xlab = "Experience", ylab = "Salary")

2B) Develop a simple linear regression model to describe the relationship between the two variables. Display the summary statistics, write down the mathematical representation for the linear regression model, and add the fitted line to the plot (2pts)

reg <- lm(salary ~ experience)
summary(reg)

## 
## Call:
## lm(formula = salary ~ experience)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8342.7 -4240.4   167.7  3082.7 10902.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  26335.6     2344.3   11.23 4.83e-11 ***
## experience    9450.3      393.6   24.01  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5600 on 24 degrees of freedom
## Multiple R-squared:   0.96,  Adjusted R-squared:  0.9584 
## F-statistic: 576.4 on 1 and 24 DF,  p-value: < 2.2e-16

#Mathematical representation: salary = 26335.6 + 9450.3*experience
plot(experience, salary, xlab = "Experience", ylab = "Salary", pch=16)
abline(reg, col="blue", lwd=2)

2C) Develop a non-linear regression model to describe the relationship between the two variables. Display the summary statistics, and write down the mathematical representation for the non-linear regression model. Hint: you have more than one choice for non-linear model; pick the one that you think is best and describe why you picked it (2pts)

#Quadratic model of the form y = b0 + b1*x + b2*x^2
experience2 = experience^2
reg2 = lm(salary ~ experience + experience2)
summary(reg2)

## 
## Call:
## lm(formula = salary ~ experience + experience2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8654  -3662    -62   3250  10439 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  24913.5     4535.2   5.493 1.38e-05 ***
## experience   10121.6     1863.9   5.430 1.61e-05 ***
## experience2    -59.5      161.3  -0.369    0.716    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5703 on 23 degrees of freedom
## Multiple R-squared:  0.9603, Adjusted R-squared:  0.9568 
## F-statistic: 277.9 on 2 and 23 DF,  p-value: < 2.2e-16

#Mathematical representation: salary = 24913.5 + 10121.6*experience - 59.5*experience^2
#I chose a quadratic model because the scatter plot looks close to linear, so I kept the polynomial at degree 2 rather than adding another degree, in order to stay close to the observed values. It fits better than I anticipated, but the experience2 term is not statistically significant (p = 0.716), so the linear model seems to be the better predictor.

2D) Graph a plot of salary vs experience based on actual data. Overlay the predicted / fitted values based on the non-linear model (2pts)

plot(experience, salary, xlab = "Experience", ylab = "Salary", pch=16)
#Fitted values from the quadratic model, one per observation
predicted2 = predict(reg2)
#Draw the fitted values on the same axes so they line up with the actual data
points(experience, predicted2, col="red", pch=16)
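A fitted quadratic is often easier to read as a smooth curve than as individual points. The sketch below is an illustration only: it uses just the six rows printed by head(mydata) above, so its coefficients will differ from the full-data fit, and it writes the squared term as I(Experience^2) inside the formula so that predict() can evaluate the model on a new grid. The names `demo`, `fit_quad`, and `grid` are hypothetical.

```r
# Six rows copied from the head(mydata) output; not the full data set.
demo <- data.frame(
  Experience = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9),
  Salary     = c(39343, 46205, 37731, 43525, 39891, 56642)
)

# Quadratic fit with the squared term defined inside the formula.
fit_quad <- lm(Salary ~ Experience + I(Experience^2), data = demo)

# Evenly spaced experience values across the observed range.
grid <- data.frame(
  Experience = seq(min(demo$Experience), max(demo$Experience), length.out = 50)
)
grid$Salary_hat <- predict(fit_quad, newdata = grid)

# Actual data as points, fitted curve as a line on the same axes.
plot(demo$Experience, demo$Salary, xlab = "Experience", ylab = "Salary", pch = 16)
lines(grid$Experience, grid$Salary_hat, col = "red", lwd = 2)
```

Defining the squared term in the formula avoids having to build a separate experience2 column for every new data set passed to predict().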

2E) Between the linear and non-linear model, which one is better and why? (2pts)

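#The linear model is better. Its adjusted R-squared is higher (0.9584 vs. 0.9568) and its residual standard error is lower (5600 vs. 5703). The quadratic model has a slightly higher R-squared (0.9603 vs. 0.96), but R-squared never decreases when a term is added, and the experience2 coefficient is not statistically significant (p = 0.716), so the extra term does not improve the model.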
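Because the linear model is nested inside the quadratic one, the two can also be compared with a partial F-test and with adjusted R-squared side by side. The sketch below is an illustration under a strong assumption: it uses only the six rows shown by head(mydata), so its numbers will not match the full-data summaries above. The names `demo`, `fit_lin`, `fit_quad`, and `cmp` are hypothetical.

```r
# Six rows copied from the head(mydata) output; not the full data set.
demo <- data.frame(
  Experience = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9),
  Salary     = c(39343, 46205, 37731, 43525, 39891, 56642)
)

fit_lin  <- lm(Salary ~ Experience, data = demo)
fit_quad <- lm(Salary ~ Experience + I(Experience^2), data = demo)

# Partial F-test: does the squared term explain significantly more variance?
cmp <- anova(fit_lin, fit_quad)
print(cmp)

# Adjusted R-squared penalizes the extra parameter; plain R-squared does not.
print(c(linear    = summary(fit_lin)$adj.r.squared,
        quadratic = summary(fit_quad)$adj.r.squared))
```

A large p-value in the second row of the anova table would point the same way as the non-significant experience2 coefficient: the simpler model is adequate.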

2F) Using the better model, estimate the salary of a candidate with 9 years of experience. Calculate the corresponding error squared (2pts)

salary_predicted = coef(reg)[1] + coef(reg)[2]*(9)
salary_predicted

## (Intercept) 
##    111388.1

error = 111388.1-105582
error

## [1] 5806.1

error2 = error^2
error2

## [1] 33710797

#Predicted value = 111388.1, Error = 5806.1, squared error = 33710797
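The same calculation can be reproduced by hand from the rounded coefficients in the 2B summary. The sketch below assumes those rounded values (26335.6 and 9450.3) and takes 105582, the figure subtracted above, as the actual salary; because of rounding, its prediction differs slightly from the coef(reg) result. The helper name `predict_salary` is illustrative.

```r
# Linear model from 2B, using the rounded coefficients from summary(reg).
predict_salary <- function(years, intercept = 26335.6, slope = 9450.3) {
  intercept + slope * years
}

actual <- 105582                      # actual salary used in the answer above
predicted <- predict_salary(9)        # prediction for 9 years of experience
squared_error <- (predicted - actual)^2

print(c(predicted = predicted,
        error = predicted - actual,
        squared_error = squared_error))
```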

Problem 3: (8 points)

Read the data in file “Chicago_Public_Schools_Train” in RStudio. Using the model you created in Exam 2 (the model that predicts ISAT Exceeding Math based on Safety Score and Parent Environment Score), you have to formulate a linear optimization problem. Your goal is to select the optimal values for the two predictors in order to optimize ISAT Exceeding Math. Assume maximum Safety Score = 99, and minimum Parent Environment Score = 40. Also, every increase in Parent Environment Score by 1 leads to an increase in Safety Score of at most 2. Safety Score and Parent Environment Score can’t be negative.

mydata2 = read.csv(file="Chicago_Public_Schools_Train.csv")
safetyscore = mydata2$Safety.Score
penvscore = mydata2$Parent.Environment.Score
exmath = mydata2$ISAT.Exceeding.Math
reg3 = lm(exmath ~ safetyscore+penvscore)
summary(reg3)

## 
## Call:
## lm(formula = exmath ~ safetyscore + penvscore)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.032  -7.241  -1.196   5.276  47.839 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18.40039   14.27842   1.289   0.2006    
## safetyscore  0.60722    0.05855  10.371   <2e-16 ***
## penvscore   -0.55923    0.28658  -1.951   0.0539 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.04 on 97 degrees of freedom
## Multiple R-squared:  0.5264, Adjusted R-squared:  0.5166 
## F-statistic:  53.9 on 2 and 97 DF,  p-value: < 2.2e-16

3A) Write the mathematical formulation of the objective function and state the type of optimization. Define the constraints in a tabular or matrix form. Clearly identify the decision variables (no coding required) (8 points)

Decision variables:
Xsafety = Safety Score
Xpenv = Parent Environment Score
Z = Exceeding Math Score

This is a linear optimization (maximization) problem.

We must maximize the objective function: Z = 0.60722Xsafety - 0.55923Xpenv
(The regression intercept, 18.40039, is a constant, so it does not change the optimal values and is left out of the objective.)

Constraints for this problem are:
-2Xpenv + Xsafety <= 0   (equivalent to Xsafety <= 2Xpenv)
Xsafety <= 99
Xpenv >= 40
Xsafety >= 0
Xpenv >= 0

MATRIX FORM:
 Xpenv    Xsafety
  -2         1       <=     0
   0         1       <=    99
   1         0       >=    40
   0         1       >=     0
   1         0       >=     0
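The formulation can be sanity-checked with a brute-force search over a grid of candidate scores. The sketch below is a rough check rather than a proper LP solver: the grid step of 0.5 and the upper limit of 100 on Parent Environment Score are assumptions made only for this illustration, and the coefficients are the rounded values from summary(reg3). The names `grid`, `feasible`, and `best` are hypothetical.

```r
# Rounded coefficients from summary(reg3).
intercept   <- 18.40039
coef_safety <- 0.60722
coef_penv   <- -0.55923

# Candidate values; the 0.5 step and the cap of 100 on penv are assumptions.
grid <- expand.grid(penv   = seq(40, 100, by = 0.5),   # Xpenv >= 40
                    safety = seq(0, 99, by = 0.5))     # 0 <= Xsafety <= 99

# Keep only points that satisfy Xsafety <= 2 * Xpenv.
feasible <- grid[grid$safety <= 2 * grid$penv, ]

# Objective without the intercept, then the point that maximizes it.
feasible$z <- coef_safety * feasible$safety + coef_penv * feasible$penv
best <- feasible[which.max(feasible$z), ]
best$predicted <- intercept + best$z                   # full predicted score
print(best)
```

The result matches what the signs of the coefficients suggest: push Safety Score to its maximum of 99 and keep Parent Environment Score as low as the constraints allow, which is 99/2 = 49.5 rather than 40 because of the Xsafety <= 2Xpenv constraint.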

Extra Credit: (3 points)

EC1) Which of the below models has the highest correlation between Y and X

        a)  Y = X       b)  Y = 3X          c) Y = - X

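Models A and B tie for the highest correlation. Correlation measures the strength and direction of a linear relationship, not its slope, so Y = X and Y = 3X both have a correlation of +1. Model C has a correlation of -1: the relationship is equally strong but negative.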
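The correlations for the three models in EC1 can be checked numerically. The sketch below uses an arbitrary illustrative vector x (any non-constant values would do) and base R's cor(), which returns the Pearson correlation.

```r
# Arbitrary illustrative values for X.
x <- 1:10

# Pearson correlation between X and Y for each model.
correlations <- c(a = cor(x, x),       # Y = X
                  b = cor(x, 3 * x),   # Y = 3X
                  c = cor(x, -x))      # Y = -X
print(correlations)
```

Multiplying X by a positive constant changes the slope but not the correlation, while a negative constant only flips the sign.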

EC2) Using the non-linear model that you created in 2C, identify the number of years of experience needed to maximize salary. Do you think the HR department should use the number you came up with, and why?

salary = 24913.5 + 10121.6*experience - 59.5*experience^2

Take the derivative with respect to experience (x) and set it equal to 0:
0 = 10121.6 - 119x
119x = 10121.6
x = 85.06, or about 85 years

The number of years needed to maximize salary is 85. This, however, may not be a good number for HR to use, because accumulating 85 years of experience is unrealistic for almost any career. The model may be improved if a constraint on experience is added so that only realistic values are used.
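The turning point of the quadratic can be confirmed in R. The sketch below uses the rounded coefficients from the 2C summary, computes the vertex analytically as -b1 / (2*b2), and cross-checks it with base R's optimize(); the search interval of 0 to 200 years is an assumption chosen only to bracket the maximum, and the names `b0`, `b1`, `b2`, and `salary_hat` are illustrative.

```r
# Rounded coefficients from summary(reg2).
b0 <- 24913.5
b1 <- 10121.6
b2 <- -59.5

salary_hat <- function(x) b0 + b1 * x + b2 * x^2

# Analytic maximum: derivative b1 + 2*b2*x = 0.
x_star <- -b1 / (2 * b2)

# Numerical cross-check over an assumed interval of 0 to 200 years.
check <- optimize(salary_hat, interval = c(0, 200), maximum = TRUE)

print(c(analytic = x_star, numeric = check$maximum))
```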

BSAD 343, Business Analytics, Spring 2020

Uriel Reyes Vazquez

4/16/2020