This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.
t = file.choose()
#"D:\\THU24\\LopAI-NCKH01.2026\\Obesity data .csv"
ob = read.csv(t)
ob = read.csv(t)
dim (ob)
## [1] 1217 11
head (ob)
## id gender height weight bmi age bmc bmd fat lean pcfat
## 1 1 F 150 49 21.8 53 1312 0.88 17802 28600 37.3
## 2 2 M 165 52 19.1 65 1309 0.84 8381 40229 16.8
## 3 3 F 157 57 23.1 64 1230 0.84 19221 36057 34.0
## 4 4 F 156 53 21.8 56 1171 0.80 17472 33094 33.8
## 5 5 M 160 51 19.9 54 1681 0.98 7336 40621 14.8
## 6 6 F 153 47 20.1 52 1358 0.91 14904 30068 32.2
library(lessR)
##
## lessR 4.5 feedback: gerbing@pdx.edu
## --------------------------------------------------------------
## > d <- Read("") Read data file, many formats available, e.g., Excel
## d is the default data frame, data= in analysis routines optional
##
## Many examples of reading, writing, and manipulating data, graphics,
## testing means and proportions, regression, factor analysis,
## customization, forecasting, and aggregation to pivot tables.
## Enter: browseVignettes("lessR")
##
## View lessR updates, now including modern time series forecasting
## and many, new Plotly interactive visualizations output. Most
## visualization functions are now reorganized to three functions:
## Chart(): type="bar", "pie", "radar", "bubble", "treemap", "icicle"
## X(): type="histogram", "density", "vbs" and more
## XY(): type="scatter" for a scatterplot, or "contour", "smooth"
## Most previous function calls still work, such as:
## BarChart(), Histogram, and Plot().
## Enter: news(package="lessR"), or ?Chart, ?X, or ?XY
## There is also Flows() for Sankey flow diagrams, see ?Flows
##
## Interactive data analysis for constructing visualizations.
## Enter: interact()
Histogram(pcfat,
data = ob,
fill = "blue",
xlab = "Tỉ trọng mỡ (%)",
ylab = "Số người",
main = "Phân bố tỉ trọng mỡ")
## lessR visualizations are now unified over just three core functions:
## - Chart() for pivot tables, such as bar charts. More info: ?Chart
## - X() for a single variable x, such as histograms. More info: ?X
## - XY() for scatterplots of two variables, x and y. More info: ?XY
##
## Histogram() is deprecated, though still working for now.
## Please use X(..., type = "histogram") going forward.
## [Interactive plot from the Plotly R package (Sievert, 2020)]
## >>> Suggestions
## bin_width: set the width of each bin
## bin_start: set the start of the first bin
## bin_end: set the end of the last bin
## X(pcfat, type="density") # smoothed curve + histogram
## X(pcfat, type="vbs") # Violin/Box/Scatterplot (VBS) plot
##
## --- pcfat ---
##
## n miss mean sd min mdn max
## 1217 0 31.604786 7.182862 9.200000 32.400000 48.400000
##
##
##
## --- Outliers --- from the box plot: 10
##
## Small Large
## ----- -----
## 9.2
## 9.7
## 9.8
## 10.3
## 10.3
## 10.7
## 11.0
## 11.4
## 11.7
## 11.9
##
##
## Bin Width: 5
## Number of Bins: 9
##
## Bin Midpnt Count Prop Cumul.c Cumul.p
## -------------------------------------------------
## 5 > 10 7.5 3 0.00 3 0.00
## 10 > 15 12.5 26 0.02 29 0.02
## 15 > 20 17.5 61 0.05 90 0.07
## 20 > 25 22.5 128 0.11 218 0.18
## 25 > 30 27.5 244 0.20 462 0.38
## 30 > 35 32.5 338 0.28 800 0.66
## 35 > 40 37.5 294 0.24 1094 0.90
## 40 > 45 42.5 107 0.09 1201 0.99
## 45 > 50 47.5 16 0.01 1217 1.00
##
library(table1)
##
## Attaching package: 'table1'
## The following object is masked from 'package:lessR':
##
## label
## The following objects are masked from 'package:base':
##
## units, units<-
table1(~ pcfat | gender, data = ob)
| F (N=862) |
M (N=355) |
Overall (N=1217) |
|
|---|---|---|---|
| pcfat | |||
| Mean (SD) | 34.7 (5.19) | 24.2 (5.76) | 31.6 (7.18) |
| Median [Min, Max] | 34.7 [14.6, 48.4] | 24.6 [9.20, 39.0] | 32.4 [9.20, 48.4] |
ttest(pcfat ~ gender, data = ob)
##
## Compare pcfat across gender with levels F and M
## Grouping Variable: gender
## Response Variable: pcfat
##
##
## ------ Describe ------
##
## pcfat for gender F: n.miss = 0, n = 862, mean = 34.672, sd = 5.187
## pcfat for gender M: n.miss = 0, n = 355, mean = 24.156, sd = 5.764
##
## Mean Difference of pcfat: 10.516
##
## Weighted Average Standard Deviation: 5.362
##
##
## ------ Assumptions ------
##
## Note: These hypothesis tests can perform poorly, and the
## t-test is typically robust to violations of assumptions.
## Use as heuristic guides instead of interpreting literally.
##
## Null hypothesis, for each group, is a normal distribution of pcfat.
## Group F: Sample mean assumed normal because n > 30, so no test needed.
## Group M: Sample mean assumed normal because n > 30, so no test needed.
##
## Null hypothesis is equal variances of pcfat, homogeneous.
## Variance Ratio test: F = 33.223/26.909 = 1.235, df = 354;861, p-value = 0.016
## Levene's test, Brown-Forsythe: t = -2.232, df = 1215, p-value = 0.026
##
##
## ------ Infer ------
##
## --- Assume equal population variances of pcfat for each gender
##
## t-cutoff for 95% range of variation: tcut = 1.962
## Standard Error of Mean Difference: SE = 0.338
##
## Hypothesis Test of 0 Mean Diff: t-value = 31.101, df = 1215, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.663
## 95% Confidence Interval for Mean Difference: 9.853 to 11.180
##
##
## --- Do not assume equal population variances of pcfat for each gender
##
## t-cutoff: tcut = 1.964
## Standard Error of Mean Difference: SE = 0.353
##
## Hypothesis Test of 0 Mean Diff: t = 29.768, df = 602.015, p-value = 0.000
##
## Margin of Error for 95% Confidence Level: 0.694
## 95% Confidence Interval for Mean Difference: 9.823 to 11.210
##
##
## ------ Effect Size ------
##
## --- Assume equal population variances of pcfat for each gender
##
## Standardized Mean Difference of pcfat, Cohen's d: 1.961
##
##
## ------ Practical Importance ------
##
## Minimum Mean Difference of practical importance: mmd
## Minimum Standardized Mean Difference of practical importance: msmd
## Neither value specified, so no analysis
##
##
## ------ Graphics Smoothing Parameter ------
##
## Density bandwidth for gender F: 1.475
## Density bandwidth for gender M: 1.867
model <- lm(pcfat ~ gender, data = ob)
summary(model)
##
## Call:
## lm(formula = pcfat ~ gender, data = ob)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.0724 -3.2724 0.1484 3.6276 14.8439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.6724 0.1826 189.9 <2e-16 ***
## genderM -10.5163 0.3381 -31.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.362 on 1215 degrees of freedom
## Multiple R-squared: 0.4432, Adjusted R-squared: 0.4428
## F-statistic: 967.3 on 1 and 1215 DF, p-value: < 2.2e-16
Regression(pcfat ~ gender, data = ob, graphics = TRUE)
##
## >>> gender is not numeric. Converted to indicator variables.
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## Regression(my_formula=pcfat ~ gender, data=ob, graphics=TRUE, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: ob
##
## Response Variable: pcfat
## Predictor Variable: genderM
##
## Number of cases (rows) of data: 1217
## Number of cases retained for analysis: 1217
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 34.672413 0.182622 189.859 0.000 34.314123 35.030703
## genderM -10.516344 0.338131 -31.101 0.000 -11.179729 -9.852959
##
## Standard deviation of pcfat: 7.182861
##
## Standard deviation of residuals: 5.361759 for df=1215
## 95% range of residuals: 21.038669 = 2 * (1.962 * 5.361759)
##
## R-squared: 0.443 Adjusted R-squared: 0.443 PRESS R-squared: 0.441
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 967.297 df: 1 and 1215 p-value: 0.000
##
## -- Analysis of Variance
##
## df Sum Sq Mean Sq F-value p-value
## Model 1 27808.311497 27808.311497 967.297285 0.000
## Residuals 1215 34929.384159 28.748464
## pcfat 1216 62737.695656 51.593500
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
## pcfat genderM
## pcfat 1.00 -0.67
## genderM -0.67 1.00
##
##
## RESIDUALS AND INFLUENCE
##
## -- Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
## [sorted by Cook's Distance]
## [n_res_rows = 20, out of 1217 rows of data, or do n_res_rows="all"]
## ---------------------------------------------------------------------------
## genderM pcfat fitted resid rstdnt dffits cooks
## 210 1 9.200000 24.156069 -14.956069 -2.801192 -0.148882 0.011020
## 509 1 39.000000 24.156069 14.843931 2.780055 0.147758 0.010860
## 179 1 38.700000 24.156069 14.543931 2.723523 0.144754 0.010420
## 518 1 9.700000 24.156069 -14.456069 -2.706970 -0.143874 0.010300
## 200 1 9.800000 24.156069 -14.356069 -2.688132 -0.142873 0.010150
## 563 1 38.300000 24.156069 14.143931 2.648179 0.140749 0.009860
## 318 1 10.300000 24.156069 -13.856069 -2.593980 -0.137869 0.009460
## 972 1 10.300000 24.156069 -13.856069 -2.593980 -0.137869 0.009460
## 388 1 10.700000 24.156069 -13.456069 -2.518700 -0.133867 0.008920
## 203 1 11.000000 24.156069 -13.156069 -2.462262 -0.130868 0.008530
## 1137 0 14.600000 34.672413 -20.072413 -3.766065 -0.128347 0.008150
## 893 0 14.700000 34.672413 -19.972413 -3.747085 -0.127700 0.008070
## 688 1 11.400000 24.156069 -12.756069 -2.387042 -0.126870 0.008020
## 403 1 11.700000 24.156069 -12.456069 -2.330649 -0.123873 0.007640
## 858 1 11.900000 24.156069 -12.256069 -2.293064 -0.121875 0.007400
## 158 1 36.300000 24.156069 12.143931 2.271993 0.120755 0.007270
## 1106 1 36.300000 24.156069 12.143931 2.271993 0.120755 0.007270
## 827 1 36.000000 24.156069 11.843931 2.215637 0.117760 0.006910
## 756 1 12.400000 24.156069 -11.756069 -2.199135 -0.116883 0.006810
## 196 1 12.500000 24.156069 -11.656069 -2.180355 -0.115885 0.006690
##
##
## PREDICTION ERROR
##
## -- Data, Predicted, Standard Error of Prediction, 95% Prediction Intervals
## [sorted by lower bound of prediction interval]
## [to see all intervals add n_pred_rows="all"]
## ----------------------------------------------
##
## genderM pcfat pred s_pred pi.lwr pi.upr width
## 2 1 16.800000 24.156069 5.369306 13.621929 34.690209 21.068280
## 5 1 14.800000 24.156069 5.369306 13.621929 34.690209 21.068280
## ...
## 1209 1 26.400000 24.156069 5.369306 13.621929 34.690209 21.068280
## 1 0 37.300000 34.672413 5.364869 24.146979 45.197847 21.050869
## 3 0 34.000000 34.672413 5.364869 24.146979 45.197847 21.050869
## ...
## 1215 0 34.400000 34.672413 5.364869 24.146979 45.197847 21.050869
## 1216 0 41.300000 34.672413 5.364869 24.146979 45.197847 21.050869
## 1217 0 33.200000 34.672413 5.364869 24.146979 45.197847 21.050869
##
## ----------------------------------
## Plot 1: Distribution of Residuals
## Plot 2: Residuals vs Fitted Values
## ----------------------------------
Plot(pcfat, weight, data = ob, fit = "lm",)
## lessR visualizations are now unified over just three core functions:
## - Chart() for pivot tables, such as bar charts. More info: ?Chart
## - X() for a single variable x, such as histograms. More info: ?X
## - XY() for scatterplots of two variables, x and y. More info: ?XY
##
## Plot() is deprecated, though still working for now.
## Please use X(...) or XY(...) going forward.
## [Interactive chart from the Plotly R package (Sievert, 2020)]
##
##
## >>> Suggestions or enter: style(suggest=FALSE)
## XY(pcfat, weight, enhance=TRUE) # many options
## XY(pcfat, weight, color="red") # exterior edge color of points
## XY(pcfat, weight, MD_cut=6) # Mahalanobis distance from center > 6 is an outlier
##
##
## >>> Pearson's product-moment correlation
##
## Number of paired values with neither missing, n = 1217
## Sample Correlation of pcfat and weight: r = 0.057
##
## Hypothesis Test of 0 Correlation: t = 1.975, df = 1215, p-value = 0.049
## 95% Confidence Interval for Correlation: 0.000 to 0.112
##
##
## Line: b0 = 52.803 b1 = 0.074
## Linear Model MSE = 88.2435 Rsq = 0.003
##
#main = "Mối liên hệ giữa cân nặng và tỉ trọng mỡ",
#xlab = "Cân nặng (kg)",
#ylab = "Tỉ trọng mỡ (%)")
model <- lm(pcfat ~ weight, data = ob)
par(mfrow = c(2, 2))
plot(model)
summary(model)
##
## Call:
## lm(formula = pcfat ~ weight, data = ob)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.3122 -4.5234 0.8902 5.2695 16.9742
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.22295 1.22370 23.881 <2e-16 ***
## weight 0.04319 0.02188 1.975 0.0485 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.174 on 1215 degrees of freedom
## Multiple R-squared: 0.003199, Adjusted R-squared: 0.002378
## F-statistic: 3.899 on 1 and 1215 DF, p-value: 0.04855
Regression(pcfat ~ weight, data = ob, graphics = TRUE)
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## Regression(my_formula=pcfat ~ weight, data=ob, graphics=TRUE, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: ob
##
## Response Variable: pcfat
## Predictor Variable: weight
##
## Number of cases (rows) of data: 1217
## Number of cases retained for analysis: 1217
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 29.222947 1.223696 23.881 0.000 26.822156 31.623738
## weight 0.043193 0.021875 1.975 0.049 0.000276 0.086111
##
## Standard deviation of pcfat: 7.182861
##
## Standard deviation of residuals: 7.174315 for df=1215
## 95% range of residuals: 28.150843 = 2 * (1.962 * 7.174315)
##
## R-squared: 0.003 Adjusted R-squared: 0.002 PRESS R-squared: 0.000
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 3.899 df: 1 and 1215 p-value: 0.049
##
## -- Analysis of Variance
##
## df Sum Sq Mean Sq F-value p-value
## Model 1 200.669670 200.669670 3.898709 0.049
## Residuals 1215 62537.025985 51.470803
## pcfat 1216 62737.695656 51.593500
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
## pcfat weight
## pcfat 1.00 0.06
## weight 0.06 1.00
##
##
## RESIDUALS AND INFLUENCE
##
## -- Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
## [sorted by Cook's Distance]
## [n_res_rows = 20, out of 1217 rows of data, or do n_res_rows="all"]
## ----------------------------------------------------------------------------
## weight pcfat fitted resid rstdnt dffits cooks
## 373 85 47.400000 32.894372 14.505628 2.033775 0.194997 0.018960
## 923 95 43.500000 33.326305 10.173695 1.429871 0.179944 0.016180
## 972 42 10.300000 31.037063 -20.737063 -2.902805 -0.143205 0.010190
## 716 74 17.600000 32.419246 -14.819246 -2.072679 -0.133434 0.008880
## 858 42 11.900000 31.037063 -19.137063 -2.677456 -0.132088 0.008680
## 177 38 15.700000 30.864290 -15.164290 -2.120502 -0.126644 0.008000
## 1071 70 48.100000 32.246474 15.853526 2.216504 0.118990 0.007060
## 943 73 18.700000 32.376053 -13.676053 -1.911957 -0.117867 0.006930
## 876 34 18.800000 30.691517 -11.891517 -1.662860 -0.117617 0.006910
## 245 35 18.400000 30.734710 -12.334710 -1.724650 -0.117167 0.006850
## 762 80 22.600000 32.678406 -10.078406 -1.409997 -0.114628 0.006560
## 88 68 15.300000 32.160087 -16.860087 -2.357246 -0.114610 0.006540
## 200 47 9.800000 31.253029 -21.453029 -3.002259 -0.113942 0.006450
## 688 46 11.400000 31.209836 -19.809836 -2.771011 -0.110895 0.006120
## 184 39 17.200000 30.907483 -13.707483 -1.915842 -0.109309 0.005960
## 388 47 10.700000 31.253029 -20.553029 -2.875431 -0.109129 0.005920
## 895 65 13.600000 32.030507 -18.430507 -2.577138 -0.107125 0.005710
## 276 40 16.900000 30.950676 -14.050676 -1.963672 -0.106882 0.005700
## 528 39 17.600000 30.907483 -13.307483 -1.859774 -0.106110 0.005620
## 441 72 20.000000 32.332860 -12.332860 -1.723410 -0.101599 0.005150
##
##
## PREDICTION ERROR
##
## -- Data, Predicted, Standard Error of Prediction, 95% Prediction Intervals
## [sorted by lower bound of prediction interval]
## [to see all intervals add n_pred_rows="all"]
## ----------------------------------------------
##
## weight pcfat pred s_pred pi.lwr pi.upr width
## 876 34 18.800000 30.691517 7.192151 16.581104 44.801929 28.220825
## 55 35 27.100000 30.734710 7.190777 16.626993 44.842427 28.215435
## 245 35 18.400000 30.734710 7.190777 16.626993 44.842427 28.215435
## ...
## 1211 54 34.100000 31.555382 7.177306 17.474093 45.636670 28.162577
## 20 55 19.300000 31.598575 7.177263 17.517370 45.679779 28.162409
## 27 55 37.200000 31.598575 7.177263 17.517370 45.679779 28.162409
## ...
## 402 93 32.300000 33.239918 7.224879 19.065295 47.414541 28.349246
## 628 95 32.900000 33.326305 7.230024 19.141587 47.511022 28.369435
## 891 95 30.100000 33.326305 7.230024 19.141587 47.511022 28.369435
##
## ----------------------------------
## Plot 1: Distribution of Residuals
## Plot 2: Residuals vs Fitted Values
## ----------------------------------
model_adj <- lm(pcfat ~ weight + gender + age + height, data = ob)
par(mfrow = c(2, 2))
plot(model_adj)
summary(model_adj)
##
## Call:
## lm(formula = pcfat ~ weight + gender + age + height, data = ob)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.208 -2.543 0.019 2.582 15.706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.368722 3.505431 13.798 < 2e-16 ***
## weight 0.439169 0.015594 28.163 < 2e-16 ***
## genderM -11.483254 0.344343 -33.348 < 2e-16 ***
## age 0.056166 0.007404 7.585 6.58e-14 ***
## height -0.257013 0.023768 -10.813 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.974 on 1212 degrees of freedom
## Multiple R-squared: 0.695, Adjusted R-squared: 0.694
## F-statistic: 690.4 on 4 and 1212 DF, p-value: < 2.2e-16
Regression(pcfat ~ weight + gender + age + height, data = ob, graphics = TRUE)
##
## >>> gender is not numeric. Converted to indicator variables.
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## Regression(my_formula=pcfat ~ weight + gender + age + height, data=ob, graphics=TRUE, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: ob
##
## Response Variable: pcfat
## Predictor Variable 1: weight
## Predictor Variable 2: genderM
## Predictor Variable 3: age
## Predictor Variable 4: height
##
## Number of cases (rows) of data: 1217
## Number of cases retained for analysis: 1217
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 48.368722 3.505431 13.798 0.000 41.491335 55.246110
## weight 0.439169 0.015594 28.163 0.000 0.408576 0.469763
## genderM -11.483254 0.344343 -33.348 0.000 -12.158828 -10.807679
## age 0.056166 0.007404 7.585 0.000 0.041639 0.070693
## height -0.257013 0.023768 -10.813 0.000 -0.303644 -0.210382
##
## Standard deviation of pcfat: 7.182861
##
## Standard deviation of residuals: 3.973577 for df=1212
## 95% range of residuals: 15.591705 = 2 * (1.962 * 3.973577)
##
## R-squared: 0.695 Adjusted R-squared: 0.694 PRESS R-squared: 0.692
##
## Null hypothesis of all 0 population slope coefficients:
## F-statistic: 690.357 df: 4 and 1212 p-value: 0.000
##
## -- Analysis of Variance
##
## df Sum Sq Mean Sq F-value p-value
## weight 1 200.669670 200.669670 12.709209 0.000
## genderM 1 38576.550161 38576.550161 2443.206529 0.000
## age 1 2977.577719 2977.577719 188.581853 0.000
## height 1 1846.251961 1846.251961 116.930488 0.000
##
## Model 4 43601.049512 10900.262378 690.357020 0.000
## Residuals 1212 19136.646144 15.789312
## pcfat 1216 62737.695656 51.593500
##
##
## K-FOLD CROSS-VALIDATION
##
##
## RELATIONS AMONG THE VARIABLES
##
## pcfat weight genderM age height
## pcfat 1.00 0.06 -0.67 0.31 -0.48
## weight 0.06 1.00 0.47 -0.05 0.60
## genderM -0.67 0.47 1.00 -0.13 0.67
## age 0.31 -0.05 -0.13 1.00 -0.37
## height -0.48 0.60 0.67 -0.37 1.00
##
## Tolerance VIF
## weight 0.604 1.656
## genderM 0.530 1.888
## age 0.794 1.260
## height 0.361 2.769
##
## weight genderM age height R2adj X's
## 1 1 1 1 0.694 4
## 1 1 0 1 0.680 3
## 1 1 1 0 0.665 3
## 1 1 0 0 0.617 2
## 0 1 1 1 0.494 3
## 0 1 1 0 0.492 2
## 0 1 0 1 0.444 2
## 0 1 0 0 0.443 1
## 1 0 1 1 0.414 3
## 1 0 0 1 0.413 2
## 0 0 1 1 0.248 2
## 0 0 0 1 0.230 1
## 1 0 1 0 0.098 2
## 0 0 1 0 0.094 1
## 1 0 0 0 0.002 1
##
## [based on Thomas Lumley's leaps function from the leaps package]
##
##
## RESIDUALS AND INFLUENCE
##
## -- Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
## [sorted by Cook's Distance]
## [n_res_rows = 20, out of 1217 rows of data, or do n_res_rows="all"]
## ----------------------------------------------------------------------------------------------------------
## weight genderM age height pcfat fitted resid rstdnt dffits cooks
## 563 50 1 48 150 38.300000 22.987969 15.312031 3.892904 0.365059 0.026350
## 923 95 0 57 160 43.500000 52.169213 -8.669213 -2.212032 -0.347600 0.024090
## 1000 51 1 43 148 35.200000 23.660335 11.539665 2.929329 0.308975 0.018970
## 509 67 1 13 167 39.000000 24.118827 14.881173 3.777390 0.300716 0.017890
## 1106 49 1 67 150 36.300000 23.615951 12.684049 3.218255 0.299627 0.017820
## 562 85 0 21 167 34.800000 43.956458 -9.156458 -2.327441 -0.298503 0.017760
## 377 76 1 40 148 26.900000 34.471075 -7.571075 -1.929208 -0.291924 0.017010
## 893 53 0 18 163 14.700000 30.762588 -16.062588 -4.078127 -0.283131 0.015830
## 245 35 0 80 145 18.400000 30.966052 -12.566052 -3.186550 -0.279949 0.015560
## 49 45 1 77 156 32.100000 20.878854 11.221146 2.845738 0.278598 0.015430
## 876 34 0 76 141 18.800000 31.330271 -12.530271 -3.177361 -0.278691 0.015420
## 1008 54 0 82 165 25.500000 34.282346 -8.782346 -2.228558 -0.257750 0.013240
## 316 62 1 19 176 32.100000 19.946858 12.153142 3.079313 0.250519 0.012460
## 1137 50 0 55 158 14.600000 32.808281 -18.208281 -4.627167 -0.243679 0.011680
## 269 52 1 36 156 34.000000 21.650241 12.349759 3.128488 0.241383 0.011570
## 16 70 1 49 150 24.300000 31.827525 -7.527525 -1.909655 -0.225639 0.010160
## 179 75 1 23 168 38.700000 27.936828 10.763172 2.724895 0.222502 0.009850
## 891 95 1 41 172 30.100000 36.703152 -6.603152 -1.676741 -0.215937 0.009310
## 135 52 1 72 160 33.200000 22.644159 10.555841 2.671843 0.215079 0.009210
## 264 55 1 78 154 35.600000 25.840740 9.759260 2.470316 0.212774 0.009020
##
##
## PREDICTION ERROR
##
## -- Data, Predicted, Standard Error of Prediction, 95% Prediction Intervals
## [sorted by lower bound of prediction interval]
## [to see all intervals add n_pred_rows="all"]
## ----------------------------------------------
##
## weight genderM age height pcfat pred s_pred pi.lwr pi.upr width
## 177 38 1 55 162 15.700000 15.026941 3.996117 7.186868 22.867015 15.680148
## 186 52 1 22 175 15.400000 15.980674 3.991308 8.150033 23.811315 15.661281
## 331 45 1 50 169 17.500000 16.021208 3.993505 8.186259 23.856158 15.669899
## ...
## 734 55 0 47 158 36.900000 34.554801 3.977017 26.752200 42.357403 15.605203
## 348 52 0 48 153 33.600000 34.578523 3.975890 26.778132 42.378915 15.600783
## 164 50 0 50 150 31.900000 34.583554 3.976421 26.782122 42.384987 15.602865
## ...
## 373 85 0 61 153 47.400000 49.801272 4.007267 41.939322 57.663223 15.723901
## 923 95 0 57 160 43.500000 52.169213 4.021169 44.279988 60.058439 15.778452
##
## ----------------------------------
## Plot 1: Distribution of Residuals
## Plot 2: Residuals vs Fitted Values
## ----------------------------------
library(gapminder)
vn = subset(gapminder, country=="Vietnam")
library(ggplot2)
ggplot(data=vn, aes(x=year, y=gdpPercap)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
vn$knot = ifelse(vn$year>1992, 1, 0)
vn$diff = vn$year - 1992
vn$int = vn$diff * vn$knot
m = lm(gdpPercap ~ year + int, data=vn)
summary(m)
##
## Call:
## lm(formula = gdpPercap ~ year + int, data = vn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -106.291 -54.225 -5.144 41.823 133.329
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11328.061 3963.982 -2.858 0.0188 *
## year 6.116 2.009 3.044 0.0139 *
## int 95.389 7.244 13.168 3.48e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 81.95 on 9 degrees of freedom
## Multiple R-squared: 0.9829, Adjusted R-squared: 0.9791
## F-statistic: 259.2 on 2 and 9 DF, p-value: 1.107e-08
data("mtcars")
dim(mtcars)
## [1] 32 11
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
ggplot(data=mtcars, aes(x=hp, y=mpg)) + geom_point()
m1 = lm(mpg ~ hp, data = mtcars)
summary(m1)
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7121 -2.1122 -0.8854 1.5819 8.2360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
## hp -0.06823 0.01012 -6.742 1.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
## F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
library(ggfortify)
autoplot(m1)
## Warning: `fortify(<lm>)` was deprecated in ggplot2 4.0.0.
## ℹ Please use `broom::augment(<lm>)` instead.
## ℹ The deprecated feature was likely used in the ggfortify package.
## Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the ggfortify package.
## Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggfortify package.
## Please report the issue at <https://github.com/sinhrks/ggfortify/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
mtcars$hp2 = mtcars$hp^2
m2 = lm(mpg ~ hp + hp2, data = mtcars)
summary(m2)
##
## Call:
## lm(formula = mpg ~ hp + hp2, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5512 -1.6027 -0.6977 1.5509 8.7213
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.041e+01 2.741e+00 14.744 5.23e-15 ***
## hp -2.133e-01 3.488e-02 -6.115 1.16e-06 ***
## hp2 4.208e-04 9.844e-05 4.275 0.000189 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.077 on 29 degrees of freedom
## Multiple R-squared: 0.7561, Adjusted R-squared: 0.7393
## F-statistic: 44.95 on 2 and 29 DF, p-value: 1.301e-09
mtcars$hp3 = mtcars$hp^3
m3 = lm(mpg ~ hp + hp2 + hp3, data = mtcars)
summary(m3)
##
## Call:
## lm(formula = mpg ~ hp + hp2 + hp3, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8605 -1.3972 -0.5736 1.6461 9.0738
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.422e+01 5.961e+00 7.419 4.43e-08 ***
## hp -2.945e-01 1.178e-01 -2.500 0.0185 *
## hp2 9.115e-04 6.863e-04 1.328 0.1949
## hp3 -8.701e-07 1.204e-06 -0.722 0.4760
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.103 on 28 degrees of freedom
## Multiple R-squared: 0.7606, Adjusted R-squared: 0.7349
## F-statistic: 29.65 on 3 and 28 DF, p-value: 7.769e-09
ggplot(data=mtcars, aes(x=hp, y=mpg)) + geom_point() +
geom_smooth(method="lm", formula=y~x+I(x^2))
library(BMA)
## Loading required package: survival
## Loading required package: leaps
## Loading required package: robustbase
##
## Attaching package: 'robustbase'
## The following object is masked from 'package:survival':
##
## heart
## Loading required package: inline
## Loading required package: rrcov
## Scalable Robust Estimators with High Breakdown Point (version 1.7-7)
X <- ob[, c("gender", "height", "weight", "bmi", "age")]
Y <- ob$pcfat
bma_model <- bicreg(x = X, y = Y)
summary(bma_model)
##
## Call:
## bicreg(x = X, y = Y)
##
##
## 3 models were selected
## Best 3 models (cumulative posterior probability = 1 ):
##
## p!=0 EV SD model 1 model 2 model 3
## Intercept 100.0 5.26146 4.582901 7.958e+00 -7.928e-01 8.137e+00
## genderM 100.0 -11.25139 0.429659 -1.144e+01 -1.143e+01 -1.081e+01
## height 31.4 0.01759 0.028494 . 5.598e-02 .
## weight 39.2 0.03102 0.042611 7.921e-02 . .
## bmi 100.0 1.01265 0.111625 8.942e-01 1.089e+00 1.089e+00
## age 100.0 0.05259 0.008048 5.497e-02 5.473e-02 4.715e-02
##
## nVar 4 4 3
## r2 0.697 0.696 0.695
## BIC -1.423e+03 -1.423e+03 -1.422e+03
## post prob 0.392 0.314 0.294
m.bma = lm(pcfat ~ gender + age + weight + bmi, data = ob)
par(mfrow = c(2, 2))
plot(m.bma)
summary(m.bma)
##
## Call:
## lm(formula = pcfat ~ gender + age + weight + bmi, data = ob)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.225 -2.557 0.033 2.608 15.646
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.957732 0.852500 9.335 < 2e-16 ***
## genderM -11.444303 0.342565 -33.408 < 2e-16 ***
## age 0.054966 0.007395 7.433 1.99e-13 ***
## weight 0.079207 0.028620 2.768 0.00573 **
## bmi 0.894194 0.080297 11.136 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.963 on 1212 degrees of freedom
## Multiple R-squared: 0.6966, Adjusted R-squared: 0.6956
## F-statistic: 695.7 on 4 and 1212 DF, p-value: < 2.2e-16
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following objects are masked from 'package:lessR':
##
## recode, sp
vif(m.bma)
## gender age weight bmi
## 1.878778 1.263468 5.609651 4.663425