Data 604 - Homework # 8

Following problems pertain to “Simio and Simulation 3rd edition” chapter 6: Input Analysis.

Problem 1 page 213

Based on a set of Service time, we will use Stat::Fit to determine distribution model that best represents the captured data. We will first import the data from the spreadsheet and run some basic statistics in R.

url1 <- "C:/Users/vbrio/Documents/Cuny/DATA_604/3eExamplesForSimioV5.91/Problem_Dataset_06_01.csv"
data1 <- read.table(url1, header = FALSE)

head(data1)

##      V1
## 1 27.88
## 2  4.52
## 3  6.82
## 4  3.64
## 5  4.41
## 6 17.14

nrow(data1)

## [1] 42

summary(data1)

##        V1        
##  Min.   : 2.060  
##  1st Qu.: 5.053  
##  Median : 9.015  
##  Mean   :11.920  
##  3rd Qu.:15.898  
##  Max.   :48.650

We have 42 observations that represent Service Time. Hence the data represent stricly positve values. We would not expect to have a negative value nor zero value.
The minimum for the captured data is 2.06 and maximum is 48.65.

Let us look at the histogram for these values.

hist(data1$V1, main = "Call Center", xlab = "Service Times", border = "blue", col = "lightblue", xlim = c(2, 52), breaks = 17, prob=TRUE)
lines(density(data1$V1))

We will now conduct the analysis in Stat::Fit.

We will first copy the values from the table into Stat::Fit and change the number of intervals to 17 (from 6).

Stat::Fit Data

We will now look at the descriptive statistics and histogram.

Stat::Fit Statistics

Stat::Fit Histogram

Since we are looking to continous distribution with positive values, We would consider; exponential, lognormal, triangle, and uniform. Although from the histogram, we would expect to have either lognormalor exponential to be the best fit.

Using the calculation tab under Fit option in Stat::Fit, we will choose following settings:

Stat::Fit Fit Settings

Let us look at the density distribution.

Stat::Fit Density Distributions

From the density distribution, it would appears that exponential and lognormal are the best fit. Let us look at the details results of the statistical testing (under details in Stat::Fit).

Stat::Fit Summary of Fit

In the detail of the good fit analysis, we would reject triangle, uniform, and normal. Let us now look at the details between exponential and lognormal.
Stat::Fit Details of Fit

Comparing the p-value for both exponential and lognormal distributions, we can see that both have high p-value so we would not reject H0 hyothesis that an alternative is a better fit. Let us look at the residuals, qq-plots, and pp-plots.

Stat::Fit Errors of Fit

Since QQ-plot are more sensitive to discrepencies between data and fitted distributions in the tails and in QQ-plot the lognormal distribution is the best fit in the QQ-plot. We would select the lognormal distribution to represent the serive times when building model in simio.

Stat::Fit Export Distribution

Problem 2 page 214

Based on a set of calls duration, we will use Stat::Fit to determine distribution model that best represents the captured data. We will first import the data from the spreadsheet and run some basic statistics in R.

url2 <- "C:/Users/vbrio/Documents/Cuny/DATA_604/3eExamplesForSimioV5.91/Problem_Dataset_06_02.csv"
data2 <- read.table(url2, header = FALSE)

head(data2)

##      V1
## 1 36.13
## 2  8.58
## 3 13.56
## 4  5.44
## 5 22.43
## 6  8.40

nrow(data2)

## [1] 47

summary(data2)

##        V1       
##  Min.   : 3.80  
##  1st Qu.: 6.73  
##  Median : 9.14  
##  Mean   :10.89  
##  3rd Qu.:12.91  
##  Max.   :36.13

We have 47 observations that represent call durations. Hence the data represent stricly positve values. We would not expect to have a negative value nor zero value.
The minimum for the captured data is 3.8 and maximum is 36.13.

Let us look at the histogram for these values.

hist(data2$V1, main = "Call Center", xlab = "Calls Duration", border = "blue", col = "lightblue", xlim = c(2, 52), breaks = 17, prob=TRUE)
lines(density(data2$V1))

We will now conduct the analysis in Stat::Fit.

We will first copy the values from the table into Stat::Fit and change the number of intervals to 17 (from 6).

Stat::Fit Data

We will now look at the descriptive statistics and histogram.

Stat::Fit Statistics

Stat::Fit Histogram

Using the calculation tab under Fit option in Stat::Fit, we will choose following settings:

Stat::Fit Fit Settings

Let us look at the density distribution.

Stat::Fit Density Distributions

From the density distribution, it would appears that lognormal is the best fit. Let us look at the details results of the statistical testing (under details in Stat::Fit).

Stat::Fit Summary of Fit

Let us now look at the details of the statistical testing of null hyphothesis; H0 fitted distribution under consideration is best fit.

Stat::Fit Details of Fit

Comparing the p-value for both exponential and lognormal distributions, we can seep-value is much higher for lognormal distribution. so we would not reject H0 hyothesis that an alternative is a better fit. Let us look at the residuals, qq-plots, and pp-plots.

Stat::Fit Errors of Fit

Since QQ-plots show that both distribution fail on the tail end but based on p-values result, pp=plot, and residuals we would favor the lognormal distribution as the best fit in this case. We would model it as folows in simio.

Stat::Fit Export Distribution

Problem # 3 page 214

url3 <- "C:/Users/vbrio/Documents/Cuny/DATA_604/3eExamplesForSimioV5.91/Problem_Dataset_06_03.csv"
data3 <- read.table(url3, header = FALSE)

head(data3)

##   V1
## 1  2
## 2  5
## 3  3
## 4  3
## 5  3
## 6  1

nrow(data3)

## [1] 45

summary(data3)

##        V1       
##  Min.   :1.000  
##  1st Qu.:2.000  
##  Median :3.000  
##  Mean   :2.956  
##  3rd Qu.:4.000  
##  Max.   :8.000

We have 45 observations that represent the number of extra tech-support people to close a call. Hence the data represent non-zero discrete values. We would not expect to have a negative value.
The minimum for the captured data is 1 and maximum is 8.

Although the minimum for these observations is 1, we would model with a minimum of zero. We would expect call center to resolve problem within initial contact.

Let us look at the histogram for these values.

hist(data3$V1, main = "Call Center", xlab = "Extra Tech-Support", border = "blue", col = "lightblue", xlim = c(2, 52), breaks = 5, prob=TRUE)
lines(density(data3$V1))

We will now conduct the analysis in Stat::Fit.

We will first copy the values from the table into Stat::Fit and keep the number of intervals to the default (5).

Stat::Fit Data

We will now look at the descriptive statistics and histogram.

Stat::Fit Statistics

Stat::Fit Histogram

Since we are looking to discrete distribution with non-zero values, We would consider; binomial and poisson distributions.

Using the calculation tab under Fit option in Stat::Fit, we will choose following settings:

Stat::Fit Fit Settings

Let us look at the density distribution.

Stat::Fit Density Distributions

We will now consider the details results of the statistical testing (under details in Stat::Fit).

Stat::Fit Summary of Fit

Let us now look at the details of the statistical testing of null hyphothesis; H0 fitted distribution under consideration is best fit.

Stat::Fit Details of Fit

Comparing the p-value for both binomial and poisson distributions, we would select the binomial distribution as it represent a better fit since p-value is highest.

Stat::Fit Errors of Fit

The residuals and pp-plot graph is not adding more insight. Based on p-value, We would model a binomial distribution, as folows in simio.

Stat::Fit Export Distribution ***

Problem 4 page 214

A continuous random variable X has a uniform distribution, denoted U(a, b), if its probability density function is:

\(f\left( x \right) =\frac { 1 }{ b-a } ,\quad for\quad a<x<b\)

\(F(x)\quad =\quad \frac { 1 }{ b-a } x\quad +\quad C\) with F(a)=0

hence, we have:
\(F(a)\quad =\quad \frac { a }{ b-a } +\quad C\quad =\quad 0\), \(C\quad =\quad -\frac { a }{ b-a }\),

which leads to:

\(F(x)\quad =\quad \frac { x }{ b-a } -\frac { a }{ b-a } =\frac { x-a }{ b-a }\)

We will set this to u and solve for u to find inverse cdf.

\(\frac { x-a }{ b-a } =u\quad \Leftrightarrow \quad x-a\quad =\quad (b-a)u\quad \Leftrightarrow \quad x\quad =\quad (b-a)u\quad +\quad a\)

Problem 5 page 214

The probability density function of a Weibull random variable is:

\(f\left( x \right) \quad =\quad \begin{cases} k{ x }^{ k-1 }{ e }^{ { -x }^{ k } },\quad where\quad k>0\quad for\quad x>0 \\ 0,\quad for\quad x<0 \end{cases}\)

Substituing \(u={ x }^{ k }\) and \(\frac { du }{ dx } =k{ x }^{ k-1 }\) we get for F(X):

\(F(x)=\int _{ 0 }^{ x }{ { e }^{ -u }du }\), hence we have over interval [0,x]

\(F(x)=-{ e }^{ -u }-(-{ e }^{ 0 })=1-{ e }^{ -u }\),

Substituing back, we get:

F(x)=1-{ e }^{ -{ x }^{ k } }, setting this equal to v and solving for v we get:

\(1-{ e }^{ { -x }^{ k } }=v\), solving for v we will have

\({ e }^{ { -x }^{ k } }=1-v\), taking logarithm on both side:

\({ -x }^{ k }=\ln { (1-v)\quad \Leftrightarrow \quad { x }^{ k } } =\quad -\ln { (1-v) } \quad \Leftrightarrow \quad x={ (-\ln { (1-v)) } }^{ \frac { 1 }{ k } }\)

Problem 8 page 215

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.3.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

oat_size <-c(0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 7.5, 10)
oat_prob <-c(0.05, 0.07, 0.09, 0.11, 0.15, 0.25, 0.10, 0.09, 0.06, 0.03)
df_oat <- as.data.frame(cbind(oat_size, oat_prob))

pea_size <-c(0, 0.5, 1.0, 1.5, 2.0, 3.0)  
pea_prob <-c(0.1, 0.2, 0.2, 0.3, 0.1, 0.1)  
df_pea <- as.data.frame(cbind(pea_size, pea_prob))

bean_size <- c(0, 1.0, 3.0, 4.5)
bean_prob <- c(0.2, 0.4, 0.3, 0.1)
df_bean <- as.data.frame(cbind(bean_size, bean_prob)) 

barley_size <-c(0, 0.5, 1.0, 3.5)
barley_prob <-c(0.2, 0.4, 0.3, 0.1)
df_barley <-as.data.frame(cbind(barley_size, barley_prob))


demand <- function(df){
 # pick a n number at random between 0,1 
 a <- runif(1,0,1)
 b <- numeric()
 colnames(df)<- c("size", "prob")
 df_cum <- mutate(df, cum_prob=cumsum(prob))
 l <- nrow(df)
 for (i in 2:l){
       if(a < df_cum[i, 3]){
         a <- df_cum[i-1,1]
         break;
       }
   }
    
  return(a)
  
}

demand(df_barley)

## Warning: package 'bindrcpp' was built under R version 3.3.3

## [1] 0

Data 604 - Homework # 8

Valerie Briot

October 21, 2017

Problem 1 page 213

Problem 2 page 214

Problem # 3 page 214

Problem 4 page 214

Problem 5 page 214

Problem 8 page 215