ISDA609: Mathematical Modeling Techniques for Data Analytics, Assignment 02

Page 69, Problem 12

For the scenarios presented in problems 9-17, identify a problem worth studying and list the variables that affect the behavior you have identified. Which variables would be neglected completely? Which might be considered as constants initially? Can you identify any submodels you would want to study in detail? Identify any data you would want collected.

Problem 12

A company with a fleet of trucks faces increasing maintenance costs as the age and mileage of the trucks increase.

Answer Problem 12

Identify a problem worth studying

Yes, this problem is worth studying because it illustrates the classical optimization problem of minimizing or maximizing an outcome subject to some constraints. In this problem we need to maximize our profit by minimizing our maintenance cost given the age of the trucks.

List the variables that affect the behavior you have identified.

Lease expense, license, taxes, insurance, number of trucks, number of mechanics, type of fuel, maintenance and repair, labor, number of breakdowns, wait time to repair, loss of revenue and delay penalties, driver retention and attrition, and number of customer reviews (negative and positive) of the service.

Which variables would be neglected completely?

Unless there are plans to relocate to a different state with different regulations, the following variables can be neglected completely: lease expense, licenses and permits, taxes, insurance, number of trucks, number of mechanics, type of fuel.

Which might be considered as constants initially?

Assuming that our mechanics are full-time employees, the labor cost can be considered constant. However, the cost of the parts and materials associated with the labor is not constant. Any one-time cosmetic fixes, such as a small paint job or seat cleaning, can also be considered constants.

Can you identify any sub models you would want to study in detail?

The sub model that I want to study in more detail is as follows:
\[\text{Truck ownership cost} = \text{truck depreciation} + \text{truck return on investment (ROI)}\]
As the truck depreciation is constant, the main focus will be on the truck return on investment (ROI):
\[\text{Truck ROI} = \frac{\text{gain from the truck} - \text{cost of investment}}{\text{cost of investment}}\]
Hence the detailed subsystem is as follows:
\[\text{Cost of investment} = \text{fuel cost} + \text{maintenance cost} + \text{breakdown cost} + \text{wait cost}\]
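The ROI submodel can be sketched in R as follows. This is only an illustration: the helper `truck_roi` and all dollar figures are hypothetical and are not taken from the assignment. It shows how ROI falls when maintenance and breakdown costs rise while the gain stays the same.

```r
# Hypothetical helper: ROI = (gain - cost of investment) / cost of investment,
# where cost of investment = fuel + maintenance + breakdown + wait cost.
truck_roi <- function(gain, fuel, maintenance, breakdown, wait) {
  cost_of_investment <- fuel + maintenance + breakdown + wait
  (gain - cost_of_investment) / cost_of_investment
}

# Made-up annual figures for one truck (illustration only, not real data)
roi_newer <- truck_roi(gain = 120000, fuel = 40000, maintenance = 10000,
                       breakdown = 5000, wait = 5000)
roi_older <- truck_roi(gain = 120000, fuel = 40000, maintenance = 25000,
                       breakdown = 10000, wait = 5000)

print(c(newer = roi_newer, older = roi_older))
```

Tracking this quantity per truck over time is one possible way to decide when a truck should be replaced.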

Identify any data you would want collected:

The data I would want collected are the maintenance costs and types of maintenance, and specifically the trucks and truck parts that break down the most; the wait time needed to fix and maintain the trucks; and, finally, customer reviews. In other words, I would collect any data that directly or indirectly impact revenues. With the collected data, I would be well informed about the best time to replace trucks that are performing very poorly and negatively impacting the bottom line of the company.

Page 79, Problem 11

In problems 7-12, determine whether the data set supports the stated proportionality model.

\[ y \propto x^3 \quad \text{(Equation 2.1)} \]

df <- data.frame(y=c(0,1,2,6,14,24,37,58,82,114), x=c(1,2,3,4,5,6,7,8,9,10))
df

##      y  x
## 1    0  1
## 2    1  2
## 3    2  3
## 4    6  4
## 5   14  5
## 6   24  6
## 7   37  7
## 8   58  8
## 9   82  9
## 10 114 10

Answer Page 79, Problem 11

First we have to determine whether or not y and \(x^3\) are proportional, i.e., whether or not there is a positive constant k satisfying y = k\(x^3\). If they are not, we don’t have to proceed.

For this purpose, we compute the ratio \(y^{1/3}/x\), because \(y^{1/3}\) and x are proportional if and only if y and \(x^3\) are proportional.

Therefore, we transform the given data into the pairs \((y^{1/3}, x)\) and examine the ratio \(y^{1/3}/x\), which should be roughly constant if the model holds.

Hence, our new data is:

df$y_cube_root<- (df$y)^(1/3)
#df$y_cube_root
df$y_cube_root_over_x <- (df$y)^(1/3) / df$x
#df$y_cube_root_over_x 
df

##      y  x y_cube_root y_cube_root_over_x
## 1    0  1    0.000000          0.0000000
## 2    1  2    1.000000          0.5000000
## 3    2  3    1.259921          0.4199737
## 4    6  4    1.817121          0.4542801
## 5   14  5    2.410142          0.4820285
## 6   24  6    2.884499          0.4807499
## 7   37  7    3.332222          0.4760317
## 8   58  8    3.870877          0.4838596
## 9   82  9    4.344481          0.4827202
## 10 114 10    4.848808          0.4848808

The mean of \(y^{1/3}/x\) is:

mean(df$y_cube_root_over_x)

## [1] 0.4264524

The mean value above is 0.4264524, which we truncate to 0.42. Therefore:

\[ \begin{aligned} \operatorname{mean}(y^{1/3}/x) &\approx 0.42\\ y &\approx (0.42)^3 x^3\\ y &\approx 0.074 x^3 \end{aligned} \]
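One thing worth checking is how much the first data point (y = 0 at x = 1), whose ratio is exactly 0, pulls the mean ratio down. The sketch below recomputes the constant with and without that point. The names `k_all` and `k_excl` are introduced here for illustration only, and dropping the point is an assumption about how one might treat it, not part of the original solution.

```r
# Problem 11 data, repeated so the block is self-contained
y <- c(0, 1, 2, 6, 14, 24, 37, 58, 82, 114)
x <- 1:10

ratio <- y^(1/3) / x

# Constant k in y = k * x^3, obtained by cubing the mean ratio
k_all  <- mean(ratio)^3          # uses all ten points
k_excl <- mean(ratio[y > 0])^3   # ignores the point where y = 0

print(c(k_all = k_all, k_excl = k_excl))
```

If the two values differ noticeably, the zero point has a visible influence on the fitted constant.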

Now let’s populate the data for \(y=0.074 x^3\)

df$y_c_xcube <- (.074)*((df$x)^3)
#data.frame(ypred = df$y_c_xcube)

Now plot the results

# Given initial data
library(ggplot2)

qplot(x,y, data=df, xlab = "x", ylab = "y")

# Modeled data
qplot(x,y_c_xcube, data=df, xlab= "x", ylab = "y = 0.074 x^3")

Although the graph does not go exactly through the origin, the data support the proportionality model, since the relative error \((y_a - y_p)/y_a\) = 0.337 (with \(y_a\) the actual value and \(y_p\) the predicted value) is small, as is the slope, 0.074.

Hence the data can be modeled by a function that goes through the origin, namely \(y = 0.074 x^3\)
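The relative error \((y_a - y_p)/y_a\) can be tabulated point by point with a short sketch like the one below. It assumes that \(y_a\) is the observed y and \(y_p\) the value predicted by \(y = 0.074 x^3\), and it skips the first point because the relative error is undefined when the observed y is 0.

```r
# Problem 11 data, repeated so the block is self-contained
y <- c(0, 1, 2, 6, 14, 24, 37, 58, 82, 114)
x <- 1:10

y_pred <- 0.074 * x^3

# Relative error is undefined where the observed y is 0, so drop that point
keep <- y > 0
rel_err <- (y[keep] - y_pred[keep]) / y[keep]

print(round(data.frame(x = x[keep], y = y[keep],
                       y_pred = y_pred[keep], rel_err = rel_err), 3))

# One possible summary: the mean absolute relative error
print(mean(abs(rel_err)))
```

Looking at the whole column rather than a single number shows whether the model misses consistently in one direction.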

Page 94, Problem 4

Lumber Cutters - Lumber cutters wish to use readily available measurements to estimate the number of board feet of lumber in a tree. Assume they measure the diameter of the tree in inches at waist height. Develop a model that predicts board feet as a function of diameter in inches.

Use the following data for your test:

lumber_df <- data.frame(x=c(17, 19, 20, 23, 25, 28, 32, 38, 39, 41),
                     y=c(19, 25, 32, 57, 71, 113, 123, 252, 259, 294))

lumber_df

##     x   y
## 1  17  19
## 2  19  25
## 3  20  32
## 4  23  57
## 5  25  71
## 6  28 113
## 7  32 123
## 8  38 252
## 9  39 259
## 10 41 294

The variable x is the diameter of a ponderosa pine in inches, and y is the number of board feet divided by 10.

Consider two separate assumptions, allowing each to lead to a model. Completely analyze each model.

i. Assume that all trees are right-circular cylinders, and are approximately the same height.

ii. Assume that all trees are right-circular cylinders and that the height of the tree is proportional to the diameter.

Which model appears to be better? Why? Justify your conclusions.

Answer Page 94, Problem 4

This is an example of geometric similarity, in which f(x) = \(\pi r^2 h\) is the volume of a right-circular cylinder.

i. Assume that all trees are right-circular cylinders, and are approximately the same height.

The assumption here is that h is constant, as all trees have the same height.
Hence our function f(x) = \(\pi r^2 h\) depends only on \(r^2\), and the radius r is half the diameter x. Therefore, \[ y \propto x^2 \]

First we have to determine whether or not y and \(x^2\) are proportional, i.e., whether or not there is a positive constant k satisfying y = k\(x^2\). If they are not, we don’t have to proceed.

For this purpose, we compute the ratio \(y^{1/2}/x\), because \(y^{1/2}\) and x are proportional if and only if y and \(x^2\) are proportional.

Therefore, we transform the given data into the pairs \((y^{1/2}, x)\) and examine the ratio \(y^{1/2}/x\), which should be roughly constant if the model holds.

# Square root of y, and its ratio to x
lumber_df$y_sqrt <- (lumber_df$y)^(1/2)
lumber_df$y_sqrt_over_x <- (lumber_df$y)^(1/2) / lumber_df$x
lumber_df

##     x   y    y_sqrt y_sqrt_over_x
## 1  17  19  4.358899     0.2564058
## 2  19  25  5.000000     0.2631579
## 3  20  32  5.656854     0.2828427
## 4  23  57  7.549834     0.3282537
## 5  25  71  8.426150     0.3370460
## 6  28 113 10.630146     0.3796481
## 7  32 123 11.090537     0.3465793
## 8  38 252 15.874508     0.4177502
## 9  39 259 16.093477     0.4126533
## 10 41 294 17.146428     0.4182056

The mean of \(y^{1/2}/x\) is:

mean(lumber_df$y_sqrt_over_x)

## [1] 0.3442542

The mean value above is 0.3442542, which we round to 0.34. Therefore:

\[ \begin{aligned} \operatorname{mean}(y^{1/2}/x) &\approx 0.34 \\ y &\approx (0.34)^2 x^2 \\ y &\approx 0.11 x^2 \end{aligned} \]

Now let’s populate the data for \(y=0.11 x^2\)

lumber_df$y_c_xsqr <- (.11)*((lumber_df$x)^2)
lumber_df

##     x   y    y_sqrt y_sqrt_over_x y_c_xsqr
## 1  17  19  4.358899     0.2564058    31.79
## 2  19  25  5.000000     0.2631579    39.71
## 3  20  32  5.656854     0.2828427    44.00
## 4  23  57  7.549834     0.3282537    58.19
## 5  25  71  8.426150     0.3370460    68.75
## 6  28 113 10.630146     0.3796481    86.24
## 7  32 123 11.090537     0.3465793   112.64
## 8  38 252 15.874508     0.4177502   158.84
## 9  39 259 16.093477     0.4126533   167.31
## 10 41 294 17.146428     0.4182056   184.91

Now plot the results

# Given initial data
library(ggplot2) 

qplot(x,y, data=lumber_df, xlab = "x", ylab = "y")

# Modeled data
qplot(x,y_c_xsqr, data=lumber_df, xlab= "x", ylab = "y = 0.11 x^2")

library(reshape2)
library(ggplot2)
final1 <- data.frame( x= lumber_df$x, y = lumber_df$y, ypredict = lumber_df$y_c_xsqr)
plot(final1)

DF1 <- melt(final1, id= 'x') 
ggplot(data = DF1, aes(x = x, y = value, color = variable)) +
  geom_point()

ii. Assume that all trees are right-circular cylinders and that the height of the tree is proportional to the diameter.

The assumption here is that h is not constant but proportional to the diameter x.
Hence our function f(x) = \(\pi r^2 h\) depends on both \(r^2\) and \(h\), each of which scales with x. Therefore, \[ y \propto x^3 \]

First we have to determine whether or not y and \(x^3\) are proportional, i.e., whether or not there is a positive constant k satisfying y = k\(x^3\). If they are not, we don’t have to proceed.

For this purpose, we compute the ratio \(y^{1/3}/x\), because \(y^{1/3}\) and x are proportional if and only if y and \(x^3\) are proportional.

Therefore, we transform the given data into the pairs \((y^{1/3}, x)\) and examine the ratio \(y^{1/3}/x\), which should be roughly constant if the model holds.

Hence, our new data is:

# Cube root of y, and its ratio to x
lumber_df$y_cube_root <- (lumber_df$y)^(1/3)
lumber_df$y_cube_root_over_x <- (lumber_df$y)^(1/3) / lumber_df$x

lumber_df

##     x   y    y_sqrt y_sqrt_over_x y_c_xsqr y_cube_root y_cube_root_over_x
## 1  17  19  4.358899     0.2564058    31.79    2.668402          0.1569648
## 2  19  25  5.000000     0.2631579    39.71    2.924018          0.1538957
## 3  20  32  5.656854     0.2828427    44.00    3.174802          0.1587401
## 4  23  57  7.549834     0.3282537    58.19    3.848501          0.1673261
## 5  25  71  8.426150     0.3370460    68.75    4.140818          0.1656327
## 6  28 113 10.630146     0.3796481    86.24    4.834588          0.1726639
## 7  32 123 11.090537     0.3465793   112.64    4.973190          0.1554122
## 8  38 252 15.874508     0.4177502   158.84    6.316360          0.1662200
## 9  39 259 16.093477     0.4126533   167.31    6.374311          0.1634439
## 10 41 294 17.146428     0.4182056   184.91    6.649400          0.1621805

The mean of \(y^{1/3}/x\) is:

mean(lumber_df$y_cube_root_over_x)

## [1] 0.162248

The mean value above is 0.162248, which we round to 0.16.

Therefore:

\[ \begin{aligned} \operatorname{mean}(y^{1/3}/x) &\approx 0.16 \\ y &\approx (0.16)^3 x^3 \\ y &\approx 0.004 x^3 \end{aligned} \]
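A proportionality model is only plausible if the corresponding ratio stays roughly constant across the data. One possible way to quantify this, sketched below, is the coefficient of variation (standard deviation divided by mean) of each ratio. The helper `cv` is introduced here for illustration and is not part of the original analysis.

```r
# Lumber data, repeated so the block is self-contained
x <- c(17, 19, 20, 23, 25, 28, 32, 38, 39, 41)
y <- c(19, 25, 32, 57, 71, 113, 123, 252, 259, 294)

ratio_sq   <- sqrt(y) / x     # should be constant if y is proportional to x^2
ratio_cube <- y^(1/3) / x     # should be constant if y is proportional to x^3

# Coefficient of variation: relative spread of a ratio around its mean
cv <- function(r) sd(r) / mean(r)

print(c(square = cv(ratio_sq), cube = cv(ratio_cube)))
```

The ratio with the smaller coefficient of variation is the one that behaves more like a constant.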

Now let’s populate the data for \(y=0.004 x^3\)

#data.frame(ypred = lumber_df$y_c_xcube)
lumber_df$y_c_xcube <- (.004)*((lumber_df$x)^3)
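As a cross-check on the constant 0.004, one could also estimate k by least squares through the origin. This is a different technique from the mean-ratio approach used above, swapped in here only for comparison. The formula `y ~ 0 + I(x^3)` tells `lm` to fit a single slope on \(x^3\) with no intercept.

```r
# Lumber data, repeated so the block is self-contained
x <- c(17, 19, 20, 23, 25, 28, 32, 38, 39, 41)
y <- c(19, 25, 32, 57, 71, 113, 123, 252, 259, 294)

# Least-squares fit of y = k * x^3 with the intercept forced to zero
fit  <- lm(y ~ 0 + I(x^3))
k_ls <- unname(coef(fit))

print(k_ls)
```

If `k_ls` lands close to 0.004, the two estimation methods agree reasonably well on this data set.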

Now plot the results

# Given initial data
library(ggplot2) 

qplot(x,y, data=lumber_df, xlab = "x", ylab = "y")

# Modeled data
qplot(x,y_c_xcube, data=lumber_df, xlab= "x", ylab = "y = 0.004 x^3")

library(reshape2)
library(ggplot2)
final2 <- data.frame( x= lumber_df$x, y = lumber_df$y, ypredict = lumber_df$y_c_xcube)
plot(final2)

DF2 <- melt(final2, id= 'x') 
ggplot(data = DF2, aes(x = x, y = value, color = variable)) +
  geom_point()

library(reshape2)
library(ggplot2)
library(grid)
library(gridExtra)

p1<- ggplot(data = DF1, aes(x = x, y = value, color = variable)) +
  geom_point() + labs(title = "Same height Trees, y = k.x^2")

p2<- ggplot(data = DF2, aes(x = x, y = value, color = variable)) +
  geom_point() + labs(title = "Height Proportional to Diameter, y = k.x^3")

grid.arrange(p1, p2, nrow=2, ncol=1)

b. Which model appears to be better? Why? Justify your conclusions.

From the plot above, the model in which heights are proportional to diameters appears to be much better than the one in which all trees have the same height. In other words, under geometric similarity, the volume model in which both diameter and height vary (\(y \propto x^3\)) fits the data better than the model in which only the diameter varies (\(y \propto x^2\)). Therefore, a detailed analysis is recommended before assuming that certain variables can be treated as constants.
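The visual comparison can be backed by a number. The sketch below computes the sum of squared errors (SSE) for both fitted models, using the constants 0.11 and 0.004 derived earlier. SSE is one reasonable choice of criterion here, not the only one.

```r
# Lumber data, repeated so the block is self-contained
x <- c(17, 19, 20, 23, 25, 28, 32, 38, 39, 41)
y <- c(19, 25, 32, 57, 71, 113, 123, 252, 259, 294)

pred_sq   <- 0.11  * x^2   # model i: same height for all trees
pred_cube <- 0.004 * x^3   # model ii: height proportional to diameter

# Sum of squared errors for each model; smaller means a closer fit
sse <- c(square = sum((y - pred_sq)^2),
         cube   = sum((y - pred_cube)^2))

print(sse)
```

The model with the smaller SSE tracks the observed board feet more closely over the measured range of diameters.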


Mohamed Elmoudni

February 13, 2016
