data <- read.csv('Customerprofit.csv')
I used a simple assignment to load the Customer Profit data from the file Customerprofit.csv with read.csv().
summary(data)
##   Ship_Mode             Profit         Unit_Price     Shipping_Cost
##  Length:264         Min.   : -1766   Min.   :  2.88   Min.   : 0.50
##  Class :character   1st Qu.: 48154   1st Qu.:  5.28   1st Qu.:74.35
##  Mode  :character   Median :123915   Median : 40.42   Median :74.35
##                     Mean   :125237   Mean   :101.48   Mean   :70.51
##                     3rd Qu.:199676   3rd Qu.:120.98   3rd Qu.:74.35
##                     Max.   :275438   Max.   :500.98   Max.   :74.35
##  Customer_Name
##  Length:264
##  Class :character
##  Mode  :character
##
##
##
Make sure to state the hypothesis, express a confidence interval, \(p\) value, and state the conclusion in the proper statistical terms for the mean.
Answer:
Bootstrapping is a statistical technique that generates many simulated samples by resampling, with replacement, from a single dataset. It can be used to estimate standard errors, construct confidence intervals, and perform hypothesis tests for a wide range of sample statistics. Bootstrap methods are often simpler to understand and apply than classical hypothesis tests because they make fewer distributional assumptions.
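As a minimal sketch of how the bootstrap could be applied to a mean, the code below builds a 95% percentile confidence interval and a two-sided p-value for the hypotheses H0: mean = mu0 against Ha: mean ≠ mu0. The data are simulated as a stand-in for data$Profit, and the null value mu0 is hypothetical; neither comes from the analysis above.

```r
# Percentile bootstrap for a mean: a sketch on simulated data.
set.seed(1)                                        # reproducible resampling
profit <- rnorm(264, mean = 125000, sd = 80000)    # hypothetical stand-in for data$Profit
mu0 <- 100000                                      # hypothetical null value, H0: mean = mu0

n_boot <- 5000
# Resample with replacement and record the mean of each resample
boot_means <- replicate(n_boot, mean(sample(profit, replace = TRUE)))

# 95% percentile confidence interval for the mean
ci <- quantile(boot_means, c(0.025, 0.975))

# Two-sided bootstrap p-value: shift the sample so that H0 holds, then resample
shifted <- profit - mean(profit) + mu0
null_means <- replicate(n_boot, mean(sample(shifted, replace = TRUE)))
p_value <- mean(abs(null_means - mu0) >= abs(mean(profit) - mu0))

print(ci)
print(p_value)
```

To use it on the real data, replace profit with data$Profit and choose mu0 from the question being asked. If p_value is below the chosen significance level (for example 0.05), H0 is rejected; otherwise there is not enough evidence to reject it.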
Multiple linear regression of profit on customer name, ship mode, and shipping cost:
customer <- lm(data$Profit ~ data$Customer_Name + data$Ship_Mode + data$Shipping_Cost, data = data)
summary(customer)
##
## Call:
## lm(formula = data$Profit ~ data$Customer_Name + data$Ship_Mode +
##     data$Shipping_Cost, data = data)
##
## Residuals:
##     Min       1Q   Median       3Q      Max
## -134973   -70155      415    66743   140054
##
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)                          -16311.4    37756.7  -0.432    0.666
## data$Customer_NameBarry French        -9472.5    30684.6  -0.309    0.758
## data$Customer_NameCarl Ludwig          6636.6    29246.3   0.227    0.821
## data$Customer_NameCarlos Soltero       6443.0    33769.8   0.191    0.849
## data$Customer_NameClaudia Miner       -2384.6    33769.9  -0.071    0.944
## data$Customer_NameClay Rozendal       -3505.4    33769.9  -0.104    0.917
## data$Customer_NameDon Miller           9612.6    33769.8   0.285    0.776
## data$Customer_NameEdward Hooks         5745.6    38546.4   0.149    0.882
## data$Customer_NameEugene Barchas       2241.7    28320.4   0.079    0.937
## data$Customer_NameJack Garza          10476.7    33769.8   0.310    0.757
## data$Customer_NameJim Radford          2755.7    29246.0   0.094    0.925
## data$Customer_NameJulia West           1088.1    38546.4   0.028    0.978
## data$Customer_NameMuhammed MacIntyre -11735.9    33782.4  -0.347    0.729
## data$Customer_NameNeola Schneider     -1270.9    33769.9  -0.038    0.970
## data$Customer_NameSylvia Foulston     -5219.3    30695.5  -0.170    0.865
## data$Ship_ModeExpress Air               637.1    38504.0   0.017    0.987
## data$Ship_ModeRegular Air              1657.6    18497.4   0.090    0.929
## data$Shipping_Cost                     1979.1      329.2   6.013 6.55e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 82720 on 246 degrees of freedom
## Multiple R-squared:  0.1334, Adjusted R-squared:  0.07352
## F-statistic: 2.228 on 17 and 246 DF,  p-value: 0.00421
Answer:
Cross-validation is a statistical method for estimating how well a machine learning model performs on data it was not trained on.
It is easy to understand, simple to implement, and generally gives performance estimates with lower bias than other methods, such as a single train/test split. It is commonly used in machine learning to compare and select a model for a given predictive modelling problem.
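As a rough illustration of cross-validation, the sketch below runs 10-fold cross-validation for a simple linear model in base R. It uses simulated data with the same column names as the regression above; the data frame sim, the choice of k = 10, and the use of RMSE as the error measure are assumptions made for illustration, not part of the original analysis.

```r
# k-fold cross-validation of a linear model: a sketch on simulated data.
set.seed(1)
n <- 264
sim <- data.frame(Shipping_Cost = runif(n, 0.5, 75))            # hypothetical predictor
sim$Profit <- 2000 * sim$Shipping_Cost + rnorm(n, sd = 80000)   # hypothetical response

k <- 10
fold <- sample(rep(seq_len(k), length.out = n))   # random fold label for each row

rmse <- numeric(k)
for (i in seq_len(k)) {
  train <- sim[fold != i, ]                       # fit on the other k - 1 folds
  test  <- sim[fold == i, ]                       # evaluate on the held-out fold
  fit   <- lm(Profit ~ Shipping_Cost, data = train)
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$Profit - pred)^2))
}

cv_rmse <- mean(rmse)                             # average out-of-fold error
print(cv_rmse)
```

The formula here uses bare column names with the data argument, so predict() can score the held-out rows through newdata. Only Shipping_Cost is used as a predictor because a factor such as Customer_Name can have levels that appear in a test fold but not in the training folds, which would need extra handling.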