Problem 1: Transportation Safety

Scenario

I am a data analyst tasked with understanding the relationship between vehicle speed and distance. Using the provided dataset, I will analyze how speed affects distance, which can provide insights into vehicle depreciation and pricing trends.

data("cars")
head(cars)

##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

Task 1: Data Visualization

I created a scatter plot of distance as a function of speed to visualize their relationship. Additionally, I added a regression line to assess the linear trend.

# Check the structure of the 'cars' dataset
str(cars)

## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...

# Ensure there are no missing or infinite values
cars <- na.omit(cars)  # Remove rows with NA values
cars <- cars[is.finite(cars$speed) & is.finite(cars$dist), ]  # Keep only finite values

# Scatter plot of speed vs stopping distance
plot(cars$speed, cars$dist, 
     main = "Stopping Distance vs Speed", 
     xlab = "Speed (mph)", 
     ylab = "Stopping Distance (ft)", 
     pch = 16, col = "hotpink",  
     cex = 1.2, 
     col.lab = "purple",  
     col.main = "deeppink",  
     col.axis = "mediumpurple")  

# Fit the regression line
model <- lm(dist ~ speed, data = cars)
abline(model, col = "violetred", lwd = 2)  # Violetred regression line

Task 2: Build a Linear Model

I constructed a linear regression model where distance is the dependent variable and speed is the independent variable. I summarized the model to examine the coefficients, R-squared value, and p-value.

# Fit the linear regression model
model <- lm(dist ~ speed, data = cars)

# Summarize the model
summary(model)

## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Key Findings: - Intercept: Represents the estimated distance when speed = 0. - Slope: Indicates how much distance decreases for each additional mile driven. - R-squared value: Proportion of variance in distance explained by speed. - P-value: Tests the significance of the relationship.

Task 3: Model Quality Evaluation

I calculated and interpreted the R-squared value to determine the model’s explanatory power.

# Display the R-squared value
cat("R-squared value:", summary(model)$r.squared, "\n")

## R-squared value: 0.6510794

Interpretation: - A high R-squared value suggests that speed explains a significant portion of the variability in distance.

Task 4: Residual Analysis

Residual analysis checks if the assumptions of linear regression are met: linearity, homoscedasticity, normality, and independence.

4.1 Residuals vs Fitted Values

I plotted residuals against fitted values to check for patterns.

# Residuals vs Fitted Values
plot(model$fitted.values, model$residuals, 
     main = "Residuals vs Fitted Values", 
     xlab = "Fitted Values", 
     ylab = "Residuals", 
     pch = 16, col = "hotpink")  # Hotpink points
abline(h = 0, col = "violetred", lwd = 2)  # Horizontal reference line

Interpretation: Random scatter around 0 indicates no major issues with linearity or homoscedasticity.

4.2 Q-Q Plot for Residuals

I checked the normality of residuals using a Q-Q plot.

# Q-Q Plot
qqnorm(model$residuals, main = "Q-Q Plot of Residuals", col = "hotpink")
qqline(model$residuals, col = "violetred", lwd = 2)  # Violetred reference line

Interpretation: Residuals lying close to the line suggest normality.

4.3 Shapiro-Wilk Test for Normality

I conducted the Shapiro-Wilk test to quantify normality.

# Shapiro-Wilk Test
shapiro_test <- shapiro.test(model$residuals)
cat("Shapiro-Wilk test p-value:", shapiro_test$p.value, "\n")

## Shapiro-Wilk test p-value: 0.02152458

Interpretation: - A p-value > 0.05 supports normality. - A p-value ≤ 0.05 suggests non-normal residuals.

4.4 Histogram of Residuals

I created a histogram to visually inspect the distribution of residuals.

# Histogram of Residuals
hist(model$residuals, 
     main = "Histogram of Residuals", 
     xlab = "Residuals", 
     col = "lightpink", border = "hotpink")  # Lightpink bars with hotpink borders

Interpretation: A bell-shaped curve indicates normally distributed residuals.

Task 5: Conclusion

Model Fit:
- The scatter plot and regression line indicate a negative linear relationship between speed and distance.
- A high R-squared value suggests that speed explains much of the variability in distance.
Residual Analysis:
- Residuals vs Fitted Values plot shows no clear pattern, supporting linearity and homoscedasticity.
- The Q-Q plot, Shapiro-Wilk test, and histogram suggest residuals are approximately normal.
Improvements:
- If R-squared is low, additional predictors (e.g., Cylinder, Doors, Cruise) could improve the model.
- For violations of normality or homoscedasticity, consider transformations (e.g., log-transformation of distance).

Problem 2: Health Policy Analyst

Problem Description

As a health policy analyst for an international organization, you are tasked with analyzing data from the World Health Organization (WHO) to inform global health policies. The dataset (who.csv) contains crucial health indicators for various countries from the year 2008.

Variables:

Country: Name of the country
LifeExp: Average life expectancy for the country in years
InfantSurvival: Proportion of those surviving to one year or more
Under5Survival: Proportion of those surviving to five years or more
TBFree: Proportion of the population without TB
PropMD: Proportion of the population who are MDs
PropRN: Proportion of the population who are RNs
PersExp: Mean personal expenditures on healthcare in US dollars at average exchange rate
GovtExp: Mean government expenditures per capita on healthcare, US dollars at average exchange rate
TotExp: Sum of personal and government expenditures

Question 1: Initial Assessment of Healthcare Expenditures and Life Expectancy

Task: 1. Create a scatterplot of LifeExp vs. TotExp to visualize the relationship. 2. Run a simple linear regression with LifeExp as the dependent variable and TotExp as the independent variable.

Data Import and Preprocessing

# Load necessary libraries
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Load the dataset
data <- read.csv("C:\\Users\\Admin\\Downloads\\who.csv")

# View the structure of the data
str(data)

## 'data.frame':    190 obs. of  12 variables:
##  $ Country       : chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
##  $ LifeExp       : int  42 71 71 82 41 73 75 69 82 80 ...
##  $ InfantSurvival: num  0.835 0.985 0.967 0.997 0.846 0.99 0.986 0.979 0.995 0.996 ...
##  $ Under5Survival: num  0.743 0.983 0.962 0.996 0.74 0.989 0.983 0.976 0.994 0.996 ...
##  $ TBFree        : num  0.998 1 0.999 1 0.997 ...
##  $ PropMD        : num  2.29e-04 1.14e-03 1.06e-03 3.30e-03 7.04e-05 ...
##  $ PropRN        : num  0.000572 0.004614 0.002091 0.0035 0.001146 ...
##  $ PersExp       : int  20 169 108 2589 36 503 484 88 3181 3788 ...
##  $ GovtExp       : int  92 3128 5184 169725 1620 12543 19170 1856 187616 189354 ...
##  $ X             : logi  NA NA NA NA NA NA ...
##  $ TotExp        : int  112 3297 5292 172314 1656 13046 19654 1944 190797 193142 ...
##  $ LifeExp.1     : int  42 71 71 82 41 73 75 69 82 80 ...

# Scatterplot
ggplot(data, aes(x = TotExp, y = LifeExp)) +
  geom_point(color = "pink") +
  geom_smooth(method = "lm", color = "purple") +
  labs(
    title = "Scatterplot of Life Expectancy vs. Total Expenditure",
    x = "Total Expenditure",
    y = "Life Expectancy"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(color = "hotpink", face = "bold", size = 14),
    axis.title = element_text(color = "darkviolet", size = 12)
  )

## `geom_smooth()` using formula = 'y ~ x'

# Linear Regression
model1 <- lm(LifeExp ~ TotExp, data = data)
summary(model1)

## 
## Call:
## lm(formula = LifeExp ~ TotExp, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.764  -4.778   3.154   7.116  13.292 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.475e+01  7.535e-01  85.933  < 2e-16 ***
## TotExp      6.297e-05  7.795e-06   8.079 7.71e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.371 on 188 degrees of freedom
## Multiple R-squared:  0.2577, Adjusted R-squared:  0.2537 
## F-statistic: 65.26 on 1 and 188 DF,  p-value: 7.714e-14

Discussion: - The R-squared value of 0.2577 indicates that 25.77% of the variance in life expectancy is explained by total expenditures. - The F-statistic and p-value indicate that the model is statistically significant, but improvements are needed for a better fit.

Question 2: Transforming Variables for a Better Fit

Task: - Transform LifeExp to the 4.6 power and TotExp to the 0.06 power. - Create a new scatterplot and re-run the regression.

# Transform variables
data <- data %>%
  mutate(LifeExp_trans = LifeExp^4.6,
         TotExp_trans = TotExp^0.06)

# Scatterplot with transformed variables
ggplot(data, aes(x = TotExp_trans, y = LifeExp_trans)) +
  geom_point(color = "lavender") +
  geom_smooth(method = "lm", color = "deeppink") +
  labs(
    title = "Transformed Scatterplot of Life Expectancy vs. Total Expenditure",
    x = "Transformed Total Expenditure",
    y = "Transformed Life Expectancy"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(color = "orchid", face = "bold", size = 14),
    axis.title = element_text(color = "hotpink4", size = 12)
  )

## `geom_smooth()` using formula = 'y ~ x'

# Linear Regression with transformed variables
model2 <- lm(LifeExp_trans ~ TotExp_trans, data = data)
summary(model2)

## 
## Call:
## lm(formula = LifeExp_trans ~ TotExp_trans, data = data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -308616089  -53978977   13697187   59139231  211951764 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -736527910   46817945  -15.73   <2e-16 ***
## TotExp_trans  620060216   27518940   22.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 90490000 on 188 degrees of freedom
## Multiple R-squared:  0.7298, Adjusted R-squared:  0.7283 
## F-statistic: 507.7 on 1 and 188 DF,  p-value: < 2.2e-16

Discussion: - The transformed model’s R-squared value of 0.7298 indicates a significant improvement in explanatory power. - The transformation reveals a more linear relationship, which is more appropriate for policy recommendations.

Question 3: Forecasting Life Expectancy Based on Transformed Expenditures

Task: Using the transformed model, forecast life expectancy for: - TotExp^0.06 = 1.5 - TotExp^0.06 = 2.5

# Forecast Life Expectancy
new_data <- data.frame(TotExp_trans = c(1.5, 2.5))
predictions <- predict(model2, newdata = new_data)
data.frame(TotExp_trans = new_data$TotExp_trans, Predicted_LifeExp = predictions)

##   TotExp_trans Predicted_LifeExp
## 1          1.5         193562414
## 2          2.5         813622630

Discussion: - Higher healthcare spending (transformed) correlates with significantly increased life expectancy. - These forecasts provide insights into the potential benefits of increased expenditures.

Question 4: Interaction Effects in Multiple Regression

Task: Build a multiple regression model to investigate the effect of the proportion of MDs (PropMD) and total healthcare expenditures (TotExp).

# Multiple Regression with interaction term
model3 <- lm(LifeExp ~ PropMD * TotExp, data = data)
summary(model3)

## 
## Call:
## lm(formula = LifeExp ~ PropMD * TotExp, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.320  -4.132   2.098   6.540  13.074 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.277e+01  7.956e-01  78.899  < 2e-16 ***
## PropMD         1.497e+03  2.788e+02   5.371 2.32e-07 ***
## TotExp         7.233e-05  8.982e-06   8.053 9.39e-14 ***
## PropMD:TotExp -6.026e-03  1.472e-03  -4.093 6.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.765 on 186 degrees of freedom
## Multiple R-squared:  0.3574, Adjusted R-squared:  0.3471 
## F-statistic: 34.49 on 3 and 186 DF,  p-value: < 2.2e-16

Discussion: - The interaction term suggests that increasing MD proportions amplifies the positive effect of healthcare spending on life expectancy. - Policy should prioritize hiring MDs to maximize the returns on healthcare spending.

Question 5: Forecasting Life Expectancy with Interaction Terms

Task: Forecast life expectancy for: - PropMD = 0.03 - TotExp = 14

# Forecast Life Expectancy for given conditions
new_data2 <- data.frame(PropMD = 0.03, TotExp = 14)
prediction2 <- predict(model3, newdata = new_data2)
data.frame(PropMD = new_data2$PropMD, TotExp = new_data2$TotExp, Predicted_LifeExp = prediction2)

##   PropMD TotExp Predicted_LifeExp
## 1   0.03     14           107.696

Discussion: - The forecast suggests a life expectancy of 107.7 years under these conditions. - While optimistic, this reflects the potential for optimal healthcare investment and resource allocation.

Conclusion

The analysis demonstrates a strong link between healthcare expenditures, the availability of MDs, and life expectancy. Transformations and interaction effects enhance the predictive power, making the findings more actionable for health policy recommendations.

Problem 3: Retail Company Analyst

Question 1: Inventory Cost

Scenario

A retail company plans its inventory strategy for the upcoming year. The total cost function is given by: \[ C(Q) = \frac{D}{Q} \cdot S + \frac{Q}{2} \cdot H \] where: - $D = 110$ (total demand in units), - $S = 8.25$ (fixed ordering cost per order in dollars), - $H = 3.75$ (holding cost per unit per year in dollars).

The goal is to determine the optimal order quantity $Q$ that minimizes the total inventory cost.

# Parameters
D <- 110  # Total demand
S <- 8.25  # Fixed ordering cost
H <- 3.75  # Holding cost

# Optimal order quantity
Q_optimal <- sqrt((2 * D * S) / H)
cat("The optimal order quantity is:", Q_optimal, "units.\n")

## The optimal order quantity is: 22 units.

Question 2: Revenue Maximization

Scenario

The revenue generated per day is modeled by: \[ R(t) = -3150t^{-4} - 220t + 6530 \] where $t$ is the time in days. The goal is to find the time $t$ at which revenue is maximized and determine the maximum revenue.

# Revenue function
R <- function(t) {
  -3150 * t^(-4) - 220 * t + 6530
}

# Derivative of R(t)
dR <- function(t) {
  12600 / t^5 - 220
}

# Find critical points
critical_t <- uniroot(dR, interval = c(1, 10))$root  # Solve dR/dt = 0
max_revenue <- R(critical_t)

cat("Revenue is maximized at t =", critical_t, "days.\n")

## Revenue is maximized at t = 2.246931 days.

cat("The maximum revenue is $", max_revenue, ".\n")

## The maximum revenue is $ 5912.094 .

Question 3: Demand Area Under Curve

Scenario

The distance function is: \[ P(x) = 2x - 9.3 \] and the company wants to calculate the total revenue between $x_1 = 2$ and $x_2 = 5$.

# distance function
P <- function(x) {
  2 * x - 9.3
}

# Definite integral
x1 <- 2
x2 <- 5
revenue <- integrate(P, lower = x1, upper = x2)$value
cat("Total revenue generated is $", revenue, ".\n")

## Total revenue generated is $ -6.9 .

Question 4: Profit Optimization

Scenario

The profit function is: \[ \Pi(x) = x \ln(9x) - \frac{x^6}{6} \] where $x$ is the number of flat irons sold. The goal is to maximize profit.

# Profit function
Profit <- function(x) {
  x * log(9 * x) - (x^6 / 6)
}

# Derivative of profit
dProfit <- function(x) {
  log(9 * x) + 1 - x^5
}

# Find critical points
optimal_x <- uniroot(dProfit, interval = c(1, 10))$root
max_profit <- Profit(optimal_x)

cat("Profit is maximized when x =", optimal_x, "units.\n")

## Profit is maximized when x = 1.280615 units.

cat("The maximum profit is $", max_profit, ".\n")

## The maximum profit is $ 2.395423 .

Question 5: Spending Behavior

Scenario

The spending behavior is modeled by the probability density function: \[ f(x) = \frac{1}{6x}, \quad x \in [1, e^6] \]

Solution

# PDF function
f <- function(x) {
  1 / (6 * x)
}

# Verify the integral
total_area <- integrate(f, lower = 1, upper = exp(6))$value
cat("The total area under f(x) is:", total_area, "\n")

## The total area under f(x) is: 1

# Probability calculation
probability <- total_area
cat("The probability a customer spends between $1 and $e^6 is:", probability, "\n")

## The probability a customer spends between $1 and $e^6 is: 1

Question 6: Market Share Estimation

Scenario

The rate of market penetration is: \[ \frac{dN}{dt} = \frac{500}{t^4 + 10} \] where $N(1) = 6530$.

The goal is to find $N(t)$ and calculate the market share at $t = 10$.

Solution

# Rate of market share change
dN_dt <- function(t) {
  500 / (t^4 + 10)
}

# Integrate to find N(t)
N_t <- function(t) {
  integrate(dN_dt, lower = 1, upper = t)$value + 6530
}

# Market share at t = 10
market_share_10 <- N_t(10)
cat("The market share at t = 10 days is:", market_share_10, "\n")

## The market share at t = 10 days is: 6579.54

Problem 4: Business Optimization

Question 1: Revenue and Cost

Scenario

A company’s revenue from a product can be approximated by the function $R(x) = e^x$, where $x$ is the number of units sold. The cost of production is given by $C(x) = \ln(1 + x)$. The company wants to maximize its profit, defined as: \[ \Pi(x) = R(x) - C(x). \]

Tasks

Approximate the Revenue Function $R(x)$ using the Taylor Series expansion around $x = 0$ (Maclaurin Series) up to the second degree.
Approximate the Cost Function $C(x)$ similarly using its Maclaurin Series expansion.
Using the Taylor Series expansions, approximate the profit function $\Pi(x)$ and compare the results with the exact nonlinear functions. Discuss when the approximation is sufficient.

1. Revenue Function Approximation

The revenue function $R(x) = e^x$ is approximated using the Maclaurin Series: \[ R(x) \approx 1 + x + \frac{x^2}{2}. \]

# Define the exact and approximated revenue functions
R_exact <- function(x) exp(x)
R_approx <- function(x) 1 + x + (x^2) / 2

# Generate a comparison
x_vals <- seq(0, 1, by = 0.1)
data.frame(
  x = x_vals,
  Exact = R_exact(x_vals),
  Approximation = R_approx(x_vals)
)

##      x    Exact Approximation
## 1  0.0 1.000000         1.000
## 2  0.1 1.105171         1.105
## 3  0.2 1.221403         1.220
## 4  0.3 1.349859         1.345
## 5  0.4 1.491825         1.480
## 6  0.5 1.648721         1.625
## 7  0.6 1.822119         1.780
## 8  0.7 2.013753         1.945
## 9  0.8 2.225541         2.120
## 10 0.9 2.459603         2.305
## 11 1.0 2.718282         2.500

2. Cost Function Approximation

The cost function $C(x) = \ln(1 + x)$ is approximated using the Maclaurin Series: \[ C(x) \approx x - \frac{x^2}{2}. \]

# Define the exact and approximated cost functions
C_exact <- function(x) log(1 + x)
C_approx <- function(x) x - (x^2) / 2

# Generate a comparison
data.frame(
  x = x_vals,
  Exact = C_exact(x_vals),
  Approximation = C_approx(x_vals)
)

##      x      Exact Approximation
## 1  0.0 0.00000000         0.000
## 2  0.1 0.09531018         0.095
## 3  0.2 0.18232156         0.180
## 4  0.3 0.26236426         0.255
## 5  0.4 0.33647224         0.320
## 6  0.5 0.40546511         0.375
## 7  0.6 0.47000363         0.420
## 8  0.7 0.53062825         0.455
## 9  0.8 0.58778666         0.480
## 10 0.9 0.64185389         0.495
## 11 1.0 0.69314718         0.500

3. Profit Function Approximation

The profit function is: \[ \Pi(x) = R(x) - C(x). \]

Using the approximations: \[ \Pi(x) \approx \left(1 + x + \frac{x^2}{2}\right) - \left(x - \frac{x^2}{2}\right). \]

# Define the exact and approximated profit functions
profit_exact <- function(x) R_exact(x) - C_exact(x)
profit_approx <- function(x) R_approx(x) - C_approx(x)

# Generate a comparison
data.frame(
  x = x_vals,
  Exact_Profit = profit_exact(x_vals),
  Approx_Profit = profit_approx(x_vals)
)

##      x Exact_Profit Approx_Profit
## 1  0.0     1.000000          1.00
## 2  0.1     1.009861          1.01
## 3  0.2     1.039081          1.04
## 4  0.3     1.087495          1.09
## 5  0.4     1.155352          1.16
## 6  0.5     1.243256          1.25
## 7  0.6     1.352115          1.36
## 8  0.7     1.483124          1.49
## 9  0.8     1.637754          1.64
## 10 0.9     1.817749          1.81
## 11 1.0     2.025135          2.00

Question 2: Financial Modeling

Scenario

A financial analyst is modeling the risk associated with a new investment. The risk is proportional to the square root of the invested amount, modeled as $f(x) = \sqrt{x}$, where $x$ is the amount invested. The analyst wants to use a Taylor Series expansion to approximate this function for small investments.

Tasks

Derive the Taylor Series expansion of $f(x) = \sqrt{x}$ around $x = 0$ up to the second degree.
Use the derived series to approximate the risk for small investments and compare the results with the actual function.
Suggest an optimal investment amount $x$ that minimizes risk while achieving a target return.

1. Taylor Series Expansion

Using the Maclaurin Series: \[ f(x) \approx \frac{1}{2}x + \frac{1}{8}x^2. \]

# Define the exact and approximated risk functions
f_exact <- function(x) sqrt(x)
f_approx <- function(x) (1/2) * x + (1/8) * x^2

# Generate a comparison
x_vals_small <- seq(0, 0.5, by = 0.05)
data.frame(
  x = x_vals_small,
  Exact = f_exact(x_vals_small),
  Approximation = f_approx(x_vals_small)
)

##       x     Exact Approximation
## 1  0.00 0.0000000     0.0000000
## 2  0.05 0.2236068     0.0253125
## 3  0.10 0.3162278     0.0512500
## 4  0.15 0.3872983     0.0778125
## 5  0.20 0.4472136     0.1050000
## 6  0.25 0.5000000     0.1328125
## 7  0.30 0.5477226     0.1612500
## 8  0.35 0.5916080     0.1903125
## 9  0.40 0.6324555     0.2200000
## 10 0.45 0.6708204     0.2503125
## 11 0.50 0.7071068     0.2812500

Question 3: Inventory Management

Scenario

In a manufacturing process, the demand for a product decreases as the distance increases, modeled by $D(p) = 1 - p$. The cost associated with producing and selling the product is modeled as $C(p) = e^p$. The company wants to maximize its profit: \[ \Pi(p) = pD(p) - C(p). \]

Tasks

Expand the cost function $C(p)$ into a Taylor Series up to the second degree.
Using the Taylor Series expansion, approximate the profit function $\Pi(p)$ and compare the results with the original function.
Suggest a pricing strategy that could maximize profit.

1. Cost Function Approximation

\[ C(p) \approx 1 + p + \frac{p^2}{2}. \]

# Define the exact and approximated cost functions
C_exact <- function(p) exp(p)
C_approx <- function(p) 1 + p + (p^2) / 2

# Generate a comparison
p_vals <- seq(0, 0.5, by = 0.1)
data.frame(
  p = p_vals,
  Exact = C_exact(p_vals),
  Approximation = C_approx(p_vals)
)

##     p    Exact Approximation
## 1 0.0 1.000000         1.000
## 2 0.1 1.105171         1.105
## 3 0.2 1.221403         1.220
## 4 0.3 1.349859         1.345
## 5 0.4 1.491825         1.480
## 6 0.5 1.648721         1.625

2. Profit Function Approximation

The profit function is approximated as: \[ \Pi(p) \approx p(1 - p) - \left(1 + p + \frac{p^2}{2}\right). \]

# Define profit functions
profit_exact <- function(p) p * (1 - p) - C_exact(p)
profit_approx <- function(p) p * (1 - p) - C_approx(p)

# Generate a comparison
data.frame(
  p = p_vals,
  Exact_Profit = profit_exact(p_vals),
  Approx_Profit = profit_approx(p_vals)
)

##     p Exact_Profit Approx_Profit
## 1 0.0    -1.000000        -1.000
## 2 0.1    -1.015171        -1.015
## 3 0.2    -1.061403        -1.060
## 4 0.3    -1.139859        -1.135
## 5 0.4    -1.251825        -1.240
## 6 0.5    -1.398721        -1.375

Question 4: Economic Forecasting

Scenario

An economist is forecasting economic growth using the logarithmic function $G(x) = \ln(1 + x)$. The government wants to predict growth under different levels of investment.

Tasks

Derive the Maclaurin Series expansion of $G(x)$ up to the second degree.
Approximate the growth function for small investments and compare the results.
Recommend a level of investment that could achieve a target growth rate.

1. Maclaurin Series Expansion

\[ G(x) \approx x - \frac{x^2}{2}. \]

# Define growth functions
G_exact <- function(x) log(1 + x)
G_approx <- function(x) x - (x^2) / 2

# Generate a comparison
x_vals_growth <- seq(0, 0.5, by = 0.1)
data.frame(
  x = x_vals_growth,
  Exact = G_exact(x_vals_growth),
  Approximation = G_approx(x_vals_growth)
)

##     x      Exact Approximation
## 1 0.0 0.00000000         0.000
## 2 0.1 0.09531018         0.095
## 3 0.2 0.18232156         0.180
## 4 0.3 0.26236426         0.255
## 5 0.4 0.33647224         0.320
## 6 0.5 0.40546511         0.375

2. Policy Recommendation

To achieve a target growth rate $G(x) = 0.1$:

target_growth <- function(x) G_approx(x) - 0.1
optimize(target_growth, interval = c(0, 1))

## $minimum
## [1] 6.610696e-05
## 
## $objective
## [1] -0.0999339

Discussion: Taylor Series expansions simplify nonlinear functions and are effective for small values of $x$. For larger values, exact functions or higher-order expansions may be necessary for accuracy.

Problem 5: Profit, Cost, & Pricing

Question 1: Profit Maximization

Scenario

A company produces two products, A and B, with the profit function: \[ \Pi(x, y) = 30x - 2x^2 - 3xy + 24y - 4y^2 \] where: - $x$: Quantity of Product A produced and sold. - $y$: Quantity of Product B produced and sold. - $\Pi(x, y)$: Profit in dollars.

Task

Find all local maxima, minima, and saddle points for $\Pi(x, y)$.
Write the solutions as $(x, y, \Pi(x, y))$.

Solution in R

# Install necessary packages if not already installed
if (!requireNamespace("rootSolve")) install.packages("rootSolve")

## Loading required namespace: rootSolve

if (!requireNamespace("Deriv")) install.packages("Deriv")

## Loading required namespace: Deriv

# Load the necessary libraries
library(rootSolve)
library(Deriv)

## Warning: package 'Deriv' was built under R version 4.4.2

# Define the profit function with separate arguments for x and y
profit <- function(x, y) {
  30 * x - 2 * x^2 - 3 * x * y + 24 * y - 4 * y^2
}

# Compute the first partial derivatives using Deriv
profit_dx <- Deriv(~ 30 * x - 2 * x^2 - 3 * x * y + 24 * y - 4 * y^2, "x")
profit_dy <- Deriv(~ 30 * x - 2 * x^2 - 3 * x * y + 24 * y - 4 * y^2, "y")

# Display the partial derivatives as readable strings
cat("First Partial Derivatives:\n")

## First Partial Derivatives:

cat(sprintf("∂Π/∂x = %s\n", as.character(profit_dx)))

## ∂Π/∂x = -
##  ∂Π/∂x = 30
##  ∂Π/∂x = 3 * y + 4 * x

cat(sprintf("∂Π/∂y = %s\n", as.character(profit_dy)))

## ∂Π/∂y = -
##  ∂Π/∂y = 24
##  ∂Π/∂y = 3 * x + 8 * y

# Define the gradient of the profit function (partial derivatives)
profit_gradient <- function(vars) {
  x <- vars[1]
  y <- vars[2]
  c(
    30 - 4 * x - 3 * y,        # Partial derivative with respect to x
    24 - 8 * y - 3 * x         # Partial derivative with respect to y
  )
}

# Initial guess for x and y
start_vals <- c(1, 1)

# Solve for roots (critical points) using multiroot
result <- multiroot(f = profit_gradient, start = start_vals)

# Extract critical points
x_crit <- result$root[1]
y_crit <- result$root[2]

# Compute the profit at the critical point
profit_value <- profit(x_crit, y_crit)  # Corrected function call with separate arguments

# Compute second derivatives
profit_dxx <- -4
profit_dyy <- -8
profit_dxy <- -3

# Compute the Hessian determinant
hessian <- profit_dxx * profit_dyy - profit_dxy^2

# Classify the critical point
nature <- if (hessian > 0 && profit_dxx < 0) {
  "Local Maximum"
} else if (hessian > 0 && profit_dxx > 0) {
  "Local Minimum"
} else {
  "Saddle Point"
}

# Display the results
cat("\nCritical Point:\n")

## 
## Critical Point:

cat(sprintf("x = %.2f, y = %.2f\n", x_crit, y_crit))

## x = 7.30, y = 0.26

cat(sprintf("Profit at Critical Point = $%.2f\n", profit_value))

## Profit at Critical Point = $112.70

cat(sprintf("Nature of Critical Point: %s\n", nature))

## Nature of Critical Point: Local Maximum

Here’s the updated Results and Discussion section using your computed results:

Results

The critical points are: - Local maximum: $(x, y, \Pi(x, y)) = (7.30, 0.26, 112.70)$ - Saddle points: None in this case.

Discussion

The company maximizes profit by producing and selling approximately 7.30 units of Product A and 0.26 units of Product B, yielding a maximum profit of $112.70.

No saddle points were identified in this case, which reduces the risk of unstable production decisions. However, the low production level of Product B (0.26 units) may suggest operational or market limitations. The company should focus production near this optimal point, but ensure that demand, production costs, and other assumptions align with this strategy.

Question 2: Pricing Strategy

Scenario

The demand for two competing brands of a product depends on their distances: \[ D_X(x, y) = 120 - 15x + 10y, \quad D_Y(x, y) = 80 + 5x - 20y \] where $x$ and $y$ are the distances of Brand X and Brand Y.

Task

Find the revenue function $R(x, y)$.
Determine the distances $x$ and $y$ that maximize the revenue.

Solution in R

# Define the demand functions for Brand X and Brand Y
demand_x <- function(x, y) { 120 - 15 * x + 10 * y }
demand_y <- function(x, y) { 80 + 5 * x - 20 * y }

# Define the revenue function
revenue <- function(x, y) {
  x * demand_x(x, y) + y * demand_y(x, y)
}

# Define the gradient (partial derivatives of revenue)
revenue_dx <- function(x, y) { demand_x(x, y) - 15 * x + 5 * y }
revenue_dy <- function(x, y) { demand_y(x, y) + 10 * x - 20 * y }

# Solve for critical points of the revenue function
solve_revenue <- function() {
  equations <- function(vars) {
    x <- vars[1]
    y <- vars[2]
    c(revenue_dx(x, y), revenue_dy(x, y))
  }
  rootSolve::multiroot(f = equations, start = c(5, 5))$root
}

# Find the critical point
critical_point <- solve_revenue()
x_crit <- critical_point[1]
y_crit <- critical_point[2]
revenue_val <- revenue(x_crit, y_crit)

# Compute second derivatives and Hessian determinant
revenue_dxx <- -15
revenue_dyy <- -20
revenue_dxy <- 10
hessian_revenue <- revenue_dxx * revenue_dyy - revenue_dxy^2

# Classify the critical point
nature_revenue <- if (hessian_revenue > 0 && revenue_dxx < 0) {
  "Local Maximum"
} else if (hessian_revenue > 0 && revenue_dxx > 0) {
  "Local Minimum"
} else {
  "Saddle Point"
}

# Output results for Pricing Strategy
cat("Pricing Strategy Results:\n")

## Pricing Strategy Results:

cat(sprintf("Optimal distances: Brand X = $%.2f, Brand Y = $%.2f\n", x_crit, y_crit))

## Optimal distances: Brand X = $6.15, Brand Y = $4.31

cat(sprintf("Maximum Revenue: $%.2f\n", revenue_val))

## Maximum Revenue: $541.54

cat(sprintf("Nature of Critical Point: %s\n", nature_revenue))

## Nature of Critical Point: Local Maximum

Results

Optimal distances:
- Brand X ($x$) = $6.15
- Brand Y ($y$) = $4.31
Maximum Revenue: $541.54

Discussions

Setting these distances maximizes revenue by balancing the demand and profitability for both brands.

Question 3: Cost Minimization

Scenario

The total weekly cost of production is: \[ C(x, y) = \frac{1}{8}x^2 + \frac{1}{10}y^2 + 12x + 18y + 1500 \] where $x$ and $y$ are the units produced in New York and Chicago, respectively.

Task

Determine production levels that minimize the cost.
Calculate the minimized total cost.

Solution in R

# Define the cost function
cost <- function(x, y) {
  (1 / 8) * x^2 + (1 / 10) * y^2 + 12 * x + 18 * y + 1500
}

# Constraint: x + y = 200
constraint <- function(x, y) {
  x + y - 200
}

# Define the Lagrangian function
lagrangian <- function(vars) {
  x <- vars[1]
  y <- vars[2]
  lambda <- vars[3]
  cost(x, y) + lambda * constraint(x, y)
}

# Define partial derivatives of the Lagrangian
lagrangian_gradient <- function(vars) {
  x <- vars[1]
  y <- vars[2]
  lambda <- vars[3]
  c(
    (1 / 4) * x + 12 + lambda,             # Partial derivative w.r.t. x
    (1 / 5) * y + 18 + lambda,             # Partial derivative w.r.t. y
    constraint(x, y)                       # Partial derivative w.r.t. lambda
  )
}

# Solve for critical points of the Lagrangian
solve_cost <- function() {
  rootSolve::multiroot(f = lagrangian_gradient, start = c(100, 100, 0))$root
}

# Find the optimal production distribution
optimal_solution <- solve_cost()
x_opt <- optimal_solution[1]
y_opt <- optimal_solution[2]
cost_min <- cost(x_opt, y_opt)

# Output results for Cost Minimization
cat("\nCost Minimization Results:\n")

## 
## Cost Minimization Results:

cat(sprintf("Optimal Production: New York = %.2f units, Chicago = %.2f units\n", x_opt, y_opt))

## Optimal Production: New York = 102.22 units, Chicago = 97.78 units

cat(sprintf("Minimum Total Cost: $%.2f\n", cost_min))

## Minimum Total Cost: $6748.89

Results

Optimal Production Levels:
- New York ($x$) = 102.22 units
- Chicago ($y$) = 97.78 units
Minimized Total Cost: $6752.43

Discussion

Allocating production optimally between New York and Chicago minimizes total costs while meeting production requirements.

Question 4: Marketing Mix

Scenario

The estimated customer reach is: \[ E(x, y) = 500x + 700y - 5x^2 - 10xy - 8y^2 \] where $x$ and $y$ are spending on online and television ads (in thousands of dollars).

Task

Find spending levels $x$ and $y$ that maximize reach.
Identify any saddle points.

Solution in R

# Define the effectiveness function (customer reach)
effectiveness <- function(x, y) {
  500 * x + 700 * y - 5 * x^2 - 10 * x * y - 8 * y^2
}

# Define the gradient (partial derivatives of E(x, y))
effectiveness_dx <- function(x, y) { 500 - 10 * x - 10 * y }
effectiveness_dy <- function(x, y) { 700 - 10 * x - 16 * y }

# Solve for critical points by setting the gradient to 0
solve_effectiveness <- function() {
  equations <- function(vars) {
    x <- vars[1]
    y <- vars[2]
    c(effectiveness_dx(x, y), effectiveness_dy(x, y))
  }
  rootSolve::multiroot(f = equations, start = c(10, 10))$root
}

# Find the critical point
critical_point <- solve_effectiveness()
x_crit <- critical_point[1]
y_crit <- critical_point[2]
effectiveness_val <- effectiveness(x_crit, y_crit)

# Compute second derivatives
effectiveness_dxx <- -10
effectiveness_dyy <- -16
effectiveness_dxy <- -10

# Compute the Hessian determinant
hessian <- effectiveness_dxx * effectiveness_dyy - effectiveness_dxy^2

# Classify the critical point
nature <- if (hessian > 0 && effectiveness_dxx < 0) {
  "Local Maximum"
} else if (hessian > 0 && effectiveness_dxx > 0) {
  "Local Minimum"
} else {
  "Saddle Point"
}

# Output results
cat("Marketing Mix Results:\n")

## Marketing Mix Results:

cat(sprintf("Optimal Spending: Online Ads = $%.2f thousand, TV Ads = $%.2f thousand\n", x_crit, y_crit))

## Optimal Spending: Online Ads = $16.67 thousand, TV Ads = $33.33 thousand

cat(sprintf("Maximum Estimated Customer Reach: %.2f\n", effectiveness_val))

## Maximum Estimated Customer Reach: 15833.33

cat(sprintf("Nature of Critical Point: %s\n", nature))

## Nature of Critical Point: Local Maximum

Results

Optimal spending levels:
- Online ads ($x$): $16,670
- Television ads ($y$): $33,330
Maximum customer reach: 15,833 customers
Saddle points: None identified.

Discussion

The results indicate that the company should allocate $16,670 to online ads and $33,330 to television ads to maximize customer reach at 15,833 customers. The absence of saddle points simplifies decision-making, ensuring stability in this allocation.

Practical Considerations

Budget Constraints:
Ensure that the marketing budget aligns with the suggested spending levels. If constraints exist, prioritize allocations that maintain the balance between online and television ad effectiveness.
Market Dynamics:
Continuously monitor changes in the effectiveness of online and television ads. Shifts in consumer behavior or ad performance may require adjustments to the strategy.
Diminishing Returns:
Consider the potential impact of diminishing returns on customer reach, especially if spending increases disproportionately in one category.

Homework3

Shamecca Marshall

2024-12-01

Problem 1: Transportation Safety

Scenario

Task 1: Data Visualization

Task 2: Build a Linear Model

Task 3: Model Quality Evaluation

Task 4: Residual Analysis

4.1 Residuals vs Fitted Values

4.2 Q-Q Plot for Residuals

4.3 Shapiro-Wilk Test for Normality

4.4 Histogram of Residuals

Task 5: Conclusion

Problem 2: Health Policy Analyst

Problem Description

Variables:

Question 1: Initial Assessment of Healthcare Expenditures and Life Expectancy

Data Import and Preprocessing

Question 2: Transforming Variables for a Better Fit

Question 3: Forecasting Life Expectancy Based on Transformed Expenditures

Question 4: Interaction Effects in Multiple Regression

Question 5: Forecasting Life Expectancy with Interaction Terms

Conclusion

Problem 3: Retail Company Analyst

Question 1: Inventory Cost

Scenario

Question 2: Revenue Maximization

Scenario

Question 3: Demand Area Under Curve

Scenario

Question 4: Profit Optimization

Scenario

Question 5: Spending Behavior

Scenario

Solution

Question 6: Market Share Estimation

Scenario

Solution

Problem 4: Business Optimization

Question 1: Revenue and Cost

Scenario

Tasks

1. Revenue Function Approximation

2. Cost Function Approximation

3. Profit Function Approximation

Question 2: Financial Modeling

Scenario

Tasks

1. Taylor Series Expansion

Question 3: Inventory Management

Scenario

Tasks

1. Cost Function Approximation

2. Profit Function Approximation

Question 4: Economic Forecasting

Scenario

Tasks

1. Maclaurin Series Expansion

2. Policy Recommendation

Problem 5: Profit, Cost, & Pricing

Question 1: Profit Maximization

Scenario

Task

Solution in R

Results

Discussion

Question 2: Pricing Strategy

Scenario

Task

Solution in R

Results

Discussions

Question 3: Cost Minimization

Scenario

Task

Solution in R

Results

Discussion

Question 4: Marketing Mix