We carry out a descriptive analysis of a dataset containing records for 525 high-school students from a public school in Paraíba.
To analyze the data we use the fBasics package for descriptive statistics, along with other packages for graphical visualization.
setwd("C:\\Users\\Mateus\\Desktop\\P7\\Consultoria")
#install.packages("fBasics")
#install.packages("ggplot2")
#install.packages("car")
#install.packages("corrplot")  # qqPlot() comes from car; corrplot is loaded below
library(fBasics)
## Loading required package: timeDate
## Loading required package: timeSeries
library(ggplot2)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:fBasics':
##
## densityPlot
library(corrplot)
## corrplot 0.84 loaded
Importing the dataset
dados <- read.table("Dados.txt", header = TRUE)
dim(dados)
## [1] 525 13
summary(dados)
## PEMED PIMED IDADE IMC
## Min. : 43.0 Min. : 22.00 Min. :15.00 Min. :14.70
## 1st Qu.: 82.7 1st Qu.: 62.00 1st Qu.:16.00 1st Qu.:19.10
## Median : 96.3 Median : 76.00 Median :16.80 Median :20.70
## Mean :100.7 Mean : 78.31 Mean :16.82 Mean :21.57
## 3rd Qu.:115.3 3rd Qu.: 90.00 3rd Qu.:17.50 3rd Qu.:23.20
## Max. :237.3 Max. :182.00 Max. :19.90 Max. :40.00
## HRSEDCAL NMEDPAS NMEDPAD MEDCABDO
## Min. : 0.000 Min. : 86.5 Min. :48.50 Min. : 56.00
## 1st Qu.: 2.000 1st Qu.:103.0 1st Qu.:62.50 1st Qu.: 66.00
## Median : 3.000 Median :109.0 Median :66.50 Median : 69.60
## Mean : 3.173 Mean :109.9 Mean :66.87 Mean : 71.45
## 3rd Qu.: 4.000 3rd Qu.:116.5 3rd Qu.:71.50 3rd Qu.: 74.80
## Max. :12.000 Max. :143.5 Max. :93.50 Max. :116.00
## TOTAFIS HDL TG GLICEMIA
## Min. : 0.0 Min. : 20.00 Min. : 30.00 Min. :55.00
## 1st Qu.: 140.0 1st Qu.: 36.00 1st Qu.: 58.00 1st Qu.:71.00
## Median : 245.0 Median : 41.00 Median : 74.00 Median :76.00
## Mean : 331.8 Mean : 41.93 Mean : 82.73 Mean :75.95
## 3rd Qu.: 440.0 3rd Qu.: 47.00 3rd Qu.: 96.00 3rd Qu.:81.00
## Max. :2830.0 Max. :142.00 Max. :423.00 Max. :98.00
## ESCMATER
## Min. : 0.000
## 1st Qu.: 6.000
## Median : 9.000
## Mean : 8.808
## 3rd Qu.:12.000
## Max. :17.000
names(dados)
## [1] "PEMED" "PIMED" "IDADE" "IMC" "HRSEDCAL" "NMEDPAS"
## [7] "NMEDPAD" "MEDCABDO" "TOTAFIS" "HDL" "TG" "GLICEMIA"
## [13] "ESCMATER"
attach(dados)
Using the fBasics package for a descriptive analysis.
basicStats(dados)
## PEMED PIMED IDADE IMC HRSEDCAL
## nobs 525.000000 525.000000 525.000000 525.000000 525.000000
## NAs 0.000000 0.000000 0.000000 0.000000 0.000000
## Minimum 43.000000 22.000000 15.000000 14.700000 0.000000
## Maximum 237.300000 182.000000 19.900000 40.000000 12.000000
## 1. Quartile 82.700000 62.000000 16.000000 19.100000 2.000000
## 3. Quartile 115.300000 90.000000 17.500000 23.200000 4.000000
## Mean 100.673905 78.310476 16.820571 21.572952 3.172952
## Median 96.300000 76.000000 16.800000 20.700000 3.000000
## Sum 52853.800000 41113.000000 8830.800000 11325.800000 1665.800000
## SE Mean 1.224955 1.040270 0.045468 0.170325 0.078835
## LCL Mean 98.267478 76.266865 16.731250 21.238348 3.018081
## UCL Mean 103.080332 80.354087 16.909893 21.907556 3.327824
## Variance 787.770787 568.134337 1.085339 15.230603 3.262855
## Stdev 28.067255 23.835569 1.041796 3.902641 1.806337
## Skewness 1.038744 0.820519 0.529699 1.453463 1.388155
## Kurtosis 2.056601 1.267316 -0.104454 2.942668 3.205288
## NMEDPAS NMEDPAD MEDCABDO TOTAFIS
## nobs 525.000000 525.000000 525.000000 525.00000
## NAs 0.000000 0.000000 0.000000 0.00000
## Minimum 86.500000 48.500000 56.000000 0.00000
## Maximum 143.500000 93.500000 116.000000 2830.00000
## 1. Quartile 103.000000 62.500000 66.000000 140.00000
## 3. Quartile 116.500000 71.500000 74.800000 440.00000
## Mean 109.898095 66.867238 71.454667 331.80571
## Median 109.000000 66.500000 69.600000 245.00000
## Sum 57696.500000 35105.300000 37513.700000 174198.00000
## SE Mean 0.432794 0.306405 0.382907 13.92042
## LCL Mean 109.047870 66.265306 70.702445 304.45904
## UCL Mean 110.748321 67.469170 72.206889 359.15239
## Variance 98.338279 49.288963 76.974430 101733.42401
## Stdev 9.916566 7.020610 8.773507 318.95677
## Skewness 0.375215 0.286196 1.586484 2.43862
## Kurtosis -0.046822 0.182574 3.590059 10.18325
## HDL TG GLICEMIA ESCMATER
## nobs 525.000000 525.000000 525.000000 525.000000
## NAs 0.000000 0.000000 0.000000 0.000000
## Minimum 20.000000 30.000000 55.000000 0.000000
## Maximum 142.000000 423.000000 98.000000 17.000000
## 1. Quartile 36.000000 58.000000 71.000000 6.000000
## 3. Quartile 47.000000 96.000000 81.000000 12.000000
## Mean 41.925714 82.729524 75.946667 8.807619
## Median 41.000000 74.000000 76.000000 9.000000
## Sum 22011.000000 43433.000000 39872.000000 4624.000000
## SE Mean 0.417345 1.720215 0.303467 0.157308
## LCL Mean 41.105839 79.350158 75.350506 8.498587
## UCL Mean 42.745590 86.108889 76.542827 9.116651
## Variance 91.442944 1553.548840 48.348295 12.991545
## Stdev 9.562580 39.415084 6.953294 3.604379
## Skewness 2.649277 2.654582 0.165793 -0.192904
## Kurtosis 22.520426 13.425566 -0.114004 -0.340546
vari <- cor(dados);head(vari)
## PEMED PIMED IDADE IMC HRSEDCAL
## PEMED 1.000000000 0.72559574 0.01486833 0.14716324 0.003280082
## PIMED 0.725595745 1.00000000 0.01019430 0.19818722 -0.049940080
## IDADE 0.014868333 0.01019430 1.00000000 0.09506044 -0.051798833
## IMC 0.147163235 0.19818722 0.09506044 1.00000000 0.122805758
## HRSEDCAL 0.003280082 -0.04994008 -0.05179883 0.12280576 1.000000000
## NMEDPAS 0.279391148 0.35365670 0.08501958 0.34032151 0.012047742
## NMEDPAS NMEDPAD MEDCABDO TOTAFIS HDL
## PEMED 0.27939115 -0.004540354 0.2252654 0.22055515 -0.15134771
## PIMED 0.35365670 0.055875790 0.2690124 0.19953510 -0.14193394
## IDADE 0.08501958 -0.031359245 0.1046211 -0.02992841 -0.01348557
## IMC 0.34032151 0.257262415 0.9076445 0.02844188 -0.14109484
## HRSEDCAL 0.01204774 0.115513480 0.1023404 -0.04451273 0.01840038
## NMEDPAS 1.00000000 0.613765014 0.4181621 0.10962250 -0.09825295
## TG GLICEMIA ESCMATER
## PEMED 0.003873632 0.12221209 0.0707419622
## PIMED 0.034973607 0.18091543 -0.0192287506
## IDADE -0.035575964 -0.09210784 -0.1729092766
## IMC 0.289920982 0.10481776 0.0001177863
## HRSEDCAL 0.058748966 0.05780541 0.0384181761
## NMEDPAS 0.171696767 0.13709032 -0.0384472453
Generating a box plot for a better view of the variables
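The box-plot call itself does not appear in this report. A minimal sketch of one way to draw it in base R is given here, assuming the variables are standardized first so that their very different scales fit on one axis. The data frame `toy` and its values are simulated stand-ins, not the study data.

```r
# Simulated stand-in for `dados` (hypothetical values, for illustration only)
set.seed(1)
toy <- data.frame(
  IMC      = rnorm(50, mean = 21.6, sd = 3.9),
  HDL      = rnorm(50, mean = 41.9, sd = 9.6),
  GLICEMIA = rnorm(50, mean = 75.9, sd = 7.0)
)

# Standardize each column (mean 0, sd 1) so the boxes are comparable
toy_std <- as.data.frame(scale(toy))

# One box per variable; las = 2 rotates the axis labels
bp <- boxplot(toy_std, las = 2, main = "Standardized variables")
```

With the real data, `toy` would be replaced by `dados`.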
A sound statistical analysis requires identifying the probability distribution that the data plausibly follow, and the one most commonly assumed is the normal distribution. One way to check this is to draw a histogram of each variable together with the corresponding normal density.
par(mfrow = c(3,3))
histPlot(as.timeSeries(dados))
densityPlot(as.timeSeries(dados))
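The same check can be sketched in base R for a single variable, without fBasics. This is an illustration under assumptions: `v` is a simulated vector, not a column of `dados`, and the normal curve uses the sample mean and standard deviation.

```r
# Simulated stand-in for one variable (hypothetical values)
set.seed(123)
v <- rnorm(200, mean = 76, sd = 7)

# Histogram on the density scale so a density curve can be overlaid
h <- hist(v, freq = FALSE, breaks = 20,
          main = "Histogram with fitted normal density", xlab = "value")

# Normal density with the sample mean and standard deviation.
# Inside curve(), `x` is the plotting grid, so the data are kept in `v`.
m <- mean(v)
s <- sd(v)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
```

A histogram that departs clearly from the curve (long right tail, sharp peak) suggests the variable is not well described by a normal distribution.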
Based on the plots above, most of the variables do not appear to follow a normal distribution; the exceptions are GLICEMIA, NMEDPAS, and NMEDPAD.
We use Q-Q plots to examine this visually.
par(mfrow=c(2,2))
qqPlot(PEMED)
## [1] 337 377
qqPlot(PIMED)
## [1] 305 369
qqPlot(IDADE)
## [1] 282 369
qqPlot(IMC)
## [1] 385 252
qqPlot(HRSEDCAL)
## [1] 364 306
qqPlot(NMEDPAS)
## [1] 64 256
qqPlot(NMEDPAD)
## [1] 379 210
qqPlot(MEDCABDO)
## [1] 75 42
qqPlot(TOTAFIS)
## [1] 486 441
qqPlot(HDL)
## [1] 56 164
qqPlot(TG)
## [1] 213 485
qqPlot(GLICEMIA)
## [1] 74 274
par(mfrow=c(1,1))
qqPlot(ESCMATER)
## [1] 12 17
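The individual calls above could also be written as a loop. The sketch below uses base R (`qqnorm` and `qqline`) rather than `car::qqPlot`, and adds the correlation between theoretical and sample quantiles as a rough numeric summary of how straight each Q-Q plot is. The data frame `toy` is simulated and its column names are hypothetical.

```r
# Simulated stand-in: one roughly normal column and one right-skewed column
set.seed(10)
toy <- data.frame(
  normal_like = rnorm(300, mean = 110, sd = 10),
  skewed      = rexp(300, rate = 1 / 80)
)

# One Q-Q plot per column
par(mfrow = c(1, 2))
for (nm in names(toy)) {
  qqnorm(toy[[nm]], main = nm)
  qqline(toy[[nm]])
}
par(mfrow = c(1, 1))

# Correlation between theoretical and sample quantiles:
# values close to 1 indicate points close to a straight line
qq_corr <- sapply(toy, function(v) {
  q <- qqnorm(v, plot.it = FALSE)
  cor(q$x, q$y)
})
qq_corr
```

With the real data, the loop would run over `names(dados)`.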
We thus see that the variables GLICEMIA, NMEDPAS, and NMEDPAD follow, approximately, a normal distribution.
The correlation plot is given by:
corrplot(vari,method="circle")
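As a numeric complement to the plot, the strongest pairwise correlations can be listed directly from the correlation matrix. This is a sketch on simulated data; `toy` and its columns `a`, `b`, `c` are hypothetical, and with the real data the matrix `vari` would take the place of `r`.

```r
# Simulated stand-in: `b` is built to be strongly correlated with `a`
set.seed(42)
a <- rnorm(100)
toy <- data.frame(a = a, b = a + rnorm(100, sd = 0.3), c = rnorm(100))

r <- cor(toy)

# Positions of the upper triangle, so each pair appears once
idx <- which(upper.tri(r), arr.ind = TRUE)

pairs_df <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)

# Sort by absolute correlation, strongest first
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
pairs_df
```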
Note: the nortest package provides ad.test, the Anderson-Darling normality test.
Note: command for the covariances.
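The note on covariances does not name a command. In base R the covariance matrix is computed with `cov()`, and `cov2cor()` rescales it to the correlation matrix returned by `cor()`. The sketch below checks that relationship on a simulated data frame (`toy` is hypothetical, not the study data).

```r
# Simulated stand-in for `dados`
set.seed(5)
toy <- data.frame(
  u = rnorm(80, mean = 100, sd = 28),
  w = rnorm(80, mean = 78,  sd = 24),
  z = rnorm(80, mean = 21,  sd = 4)
)

S <- cov(toy)   # covariance matrix (units depend on the variables)
R <- cor(toy)   # correlation matrix (unit-free)

# Rescaling the covariance matrix reproduces the correlation matrix
same <- isTRUE(all.equal(cov2cor(S), R))
same
```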
PEMED
names(dados)
## [1] "PEMED" "PIMED" "IDADE" "IMC" "HRSEDCAL" "NMEDPAS"
## [7] "NMEDPAD" "MEDCABDO" "TOTAFIS" "HDL" "TG" "GLICEMIA"
## [13] "ESCMATER"
mod.lm <- lm(PEMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data=dados)
summary(mod.lm)
##
## Call:
## lm(formula = PEMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD +
## MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.319 -15.392 -3.154 11.919 132.494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.915719 26.835516 0.407 0.684351
## IDADE -0.425695 1.106764 -0.385 0.700671
## IMC -1.265368 0.699488 -1.809 0.071038 .
## HRSEDCAL 0.372214 0.627307 0.593 0.553207
## NMEDPAS 0.957581 0.157876 6.065 2.56e-09 ***
## NMEDPAD -0.913729 0.210209 -4.347 1.67e-05 ***
## MEDCABDO 0.915734 0.328320 2.789 0.005481 **
## TOTAFIS 0.013892 0.003573 3.888 0.000115 ***
## HDL -0.302855 0.123333 -2.456 0.014396 *
## TG -0.051642 0.030733 -1.680 0.093496 .
## GLICEMIA 0.288259 0.165993 1.737 0.083062 .
## ESCMATER 0.446438 0.318520 1.402 0.161639
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.42 on 513 degrees of freedom
## Multiple R-squared: 0.1969, Adjusted R-squared: 0.1797
## F-statistic: 11.44 on 11 and 513 DF, p-value: < 2.2e-16
# Removing the predictor with the largest p-value
mod.lm1 <- lm(PEMED ~ IMC + HRSEDCAL + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data=dados)
summary(mod.lm1)
##
## Call:
## lm(formula = PEMED ~ IMC + HRSEDCAL + NMEDPAS + NMEDPAD + MEDCABDO +
## TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.007 -15.617 -3.132 12.247 133.238
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.616437 18.958009 0.191 0.848789
## IMC -1.264080 0.698900 -1.809 0.071086 .
## HRSEDCAL 0.381736 0.626298 0.610 0.542454
## NMEDPAS 0.951391 0.156924 6.063 2.59e-09 ***
## NMEDPAD -0.905115 0.208839 -4.334 1.76e-05 ***
## MEDCABDO 0.908534 0.327514 2.774 0.005738 **
## TOTAFIS 0.013972 0.003564 3.920 0.000101 ***
## HDL -0.303869 0.123202 -2.466 0.013972 *
## TG -0.051071 0.030671 -1.665 0.096502 .
## GLICEMIA 0.294755 0.164994 1.786 0.074615 .
## ESCMATER 0.466305 0.314044 1.485 0.138199
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.4 on 514 degrees of freedom
## Multiple R-squared: 0.1967, Adjusted R-squared: 0.1811
## F-statistic: 12.59 on 10 and 514 DF, p-value: < 2.2e-16
mod.lm2 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data=dados)
summary(mod.lm2)
##
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS +
## HDL + TG + GLICEMIA + ESCMATER, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.182 -15.768 -3.251 12.198 132.480
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.819005 18.943526 0.202 0.840309
## IMC -1.240335 0.697387 -1.779 0.075904 .
## NMEDPAS 0.942461 0.156143 6.036 3.03e-09 ***
## NMEDPAD -0.889417 0.207119 -4.294 2.10e-05 ***
## MEDCABDO 0.907415 0.327309 2.772 0.005767 **
## TOTAFIS 0.013925 0.003561 3.910 0.000105 ***
## HDL -0.302036 0.123091 -2.454 0.014467 *
## TG -0.050901 0.030651 -1.661 0.097393 .
## GLICEMIA 0.299799 0.164686 1.820 0.069274 .
## ESCMATER 0.471972 0.313714 1.504 0.133075
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.38 on 515 degrees of freedom
## Multiple R-squared: 0.1961, Adjusted R-squared: 0.1821
## F-statistic: 13.96 on 9 and 515 DF, p-value: < 2.2e-16
mod.lm3 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data=dados)
summary(mod.lm3)
##
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS +
## HDL + TG + GLICEMIA, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.931 -15.891 -3.683 12.657 132.266
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.862387 18.953989 0.257 0.79764
## IMC -1.362617 0.693482 -1.965 0.04996 *
## NMEDPAS 0.920554 0.155653 5.914 6.08e-09 ***
## NMEDPAD -0.878236 0.207239 -4.238 2.67e-05 ***
## MEDCABDO 0.973064 0.324784 2.996 0.00287 **
## TOTAFIS 0.014203 0.003561 3.989 7.61e-05 ***
## HDL -0.288495 0.122911 -2.347 0.01929 *
## TG -0.050492 0.030688 -1.645 0.10051
## GLICEMIA 0.326482 0.163929 1.992 0.04694 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.41 on 516 degrees of freedom
## Multiple R-squared: 0.1926, Adjusted R-squared: 0.1801
## F-statistic: 15.38 on 8 and 516 DF, p-value: < 2.2e-16
mod.lm4 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + GLICEMIA, data=dados)
summary(mod.lm4)
##
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS +
## HDL + GLICEMIA, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.830 -15.865 -3.662 12.924 132.629
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.678006 18.953055 0.352 0.72472
## IMC -1.391634 0.694401 -2.004 0.04558 *
## NMEDPAS 0.935003 0.155661 6.007 3.57e-09 ***
## NMEDPAD -0.932204 0.204964 -4.548 6.75e-06 ***
## MEDCABDO 0.929307 0.324228 2.866 0.00432 **
## TOTAFIS 0.014332 0.003566 4.019 6.71e-05 ***
## HDL -0.251250 0.121008 -2.076 0.03836 *
## GLICEMIA 0.302468 0.163547 1.849 0.06497 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.46 on 517 degrees of freedom
## Multiple R-squared: 0.1884, Adjusted R-squared: 0.1774
## F-statistic: 17.14 on 7 and 517 DF, p-value: < 2.2e-16
mod.lm5 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL, data=dados)
summary(mod.lm5)
##
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS +
## HDL, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -61.04 -15.70 -3.41 11.69 134.07
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.584180 16.331486 1.505 0.132850
## IMC -1.458466 0.695079 -2.098 0.036365 *
## NMEDPAS 0.965939 0.155121 6.227 9.84e-10 ***
## NMEDPAD -0.956936 0.205005 -4.668 3.88e-06 ***
## MEDCABDO 0.986000 0.323528 3.048 0.002424 **
## TOTAFIS 0.013974 0.003569 3.915 0.000102 ***
## HDL -0.231477 0.120816 -1.916 0.055922 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.52 on 518 degrees of freedom
## Multiple R-squared: 0.183, Adjusted R-squared: 0.1735
## F-statistic: 19.34 on 6 and 518 DF, p-value: < 2.2e-16
mod.lmf6 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS, data=dados)
summary(mod.lmf6)
##
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS,
## data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -62.592 -15.986 -3.646 11.463 134.977
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.74726 14.41573 0.676 0.49924
## IMC -1.64006 0.69036 -2.376 0.01788 *
## NMEDPAS 0.96243 0.15551 6.189 1.23e-09 ***
## NMEDPAD -0.95586 0.20553 -4.651 4.20e-06 ***
## MEDCABDO 1.11495 0.31726 3.514 0.00048 ***
## TOTAFIS 0.01442 0.00357 4.040 6.15e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.58 on 519 degrees of freedom
## Multiple R-squared: 0.1772, Adjusted R-squared: 0.1693
## F-statistic: 22.35 on 5 and 519 DF, p-value: < 2.2e-16
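The manual sequence above drops the predictor with the largest p-value and refits until every remaining predictor is significant at the 5% level. A minimal sketch of that loop follows. It is an illustration under assumptions, not the procedure used to produce the output above: the function name `backward_by_p`, the threshold `alpha`, and the data frame `toy` are all hypothetical, and the code assumes numeric predictors so that coefficient names match term names.

```r
# Backward elimination by p-value (sketch; numeric predictors only)
backward_by_p <- function(formula, data, alpha = 0.05) {
  fit <- lm(formula, data = data)
  repeat {
    coefs <- summary(fit)$coefficients
    keep  <- rownames(coefs) != "(Intercept)"
    pvals <- setNames(coefs[keep, "Pr(>|t|)"], rownames(coefs)[keep])
    # Stop when no predictors remain or all are significant at alpha
    if (length(pvals) == 0 || max(pvals) <= alpha) break
    worst <- names(which.max(pvals))
    # Drop the least significant term and refit
    new_formula <- update(formula(fit), as.formula(paste(". ~ . -", worst)))
    fit <- lm(new_formula, data = data)
  }
  fit
}

# Simulated data: only x1 truly affects y
set.seed(2024)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 5 + 2 * toy$x1 + rnorm(n)

final <- backward_by_p(y ~ x1 + x2 + x3, data = toy)
summary(final)$coefficients
```

Selection by p-value is sensitive to the order of removal and to correlated predictors (IMC and MEDCABDO have a correlation of about 0.91 here), so the result is best read as exploratory.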
PIMED
mod.lm <- lm(PIMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data=dados)
summary(mod.lm)
##
## Call:
## lm(formula = PIMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD +
## MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -58.083 -14.768 -0.931 12.424 86.327
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.466041 22.279458 -0.784 0.433430
## IDADE -0.924230 0.918861 -1.006 0.314965
## IMC -0.706403 0.580731 -1.216 0.224390
## HRSEDCAL -0.531781 0.520804 -1.021 0.307699
## NMEDPAS 0.917379 0.131073 6.999 8.10e-12 ***
## NMEDPAD -0.686498 0.174520 -3.934 9.52e-05 ***
## MEDCABDO 0.697984 0.272579 2.561 0.010732 *
## TOTAFIS 0.010167 0.002967 3.427 0.000659 ***
## HDL -0.204687 0.102394 -1.999 0.046133 *
## TG -0.036578 0.025515 -1.434 0.152296
## GLICEMIA 0.446816 0.137811 3.242 0.001263 **
## ESCMATER -0.254201 0.264443 -0.961 0.336868
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.1 on 513 degrees of freedom
## Multiple R-squared: 0.2325, Adjusted R-squared: 0.216
## F-statistic: 14.13 on 11 and 513 DF, p-value: < 2.2e-16
# Removing the predictor with the largest p-value
mod.lm1 <- lm(PIMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data=dados)
summary(mod.lm1)
##
## Call:
## lm(formula = PIMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD +
## MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.640 -15.190 -1.036 12.123 86.658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.460969 22.058909 -0.928 0.354071
## IDADE -0.780997 0.906632 -0.861 0.389404
## IMC -0.640988 0.576687 -1.112 0.266872
## HRSEDCAL -0.543190 0.520630 -1.043 0.297285
## NMEDPAS 0.926433 0.130724 7.087 4.55e-12 ***
## NMEDPAD -0.688858 0.174490 -3.948 8.99e-05 ***
## MEDCABDO 0.661120 0.269848 2.450 0.014619 *
## TOTAFIS 0.010046 0.002964 3.390 0.000754 ***
## HDL -0.212054 0.102099 -2.077 0.038303 *
## TG -0.036594 0.025513 -1.434 0.152090
## GLICEMIA 0.435213 0.137271 3.170 0.001613 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.1 on 514 degrees of freedom
## Multiple R-squared: 0.2311, Adjusted R-squared: 0.2161
## F-statistic: 15.45 on 10 and 514 DF, p-value: < 2.2e-16
mod.lm2 <- lm(PIMED ~ IMC + HRSEDCAL + NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data=dados)
summary(mod.lm2)
##
## Call:
## lm(formula = PIMED ~ IMC + HRSEDCAL + NMEDPAS + NMEDPAD + MEDCABDO +
## TOTAFIS + HDL + TG + GLICEMIA, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.187 -15.149 -1.309 12.452 87.543
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.773178 15.737078 -2.146 0.032332 *
## IMC -0.648195 0.576482 -1.124 0.261369
## HRSEDCAL -0.523568 0.520002 -1.007 0.314476
## NMEDPAS 0.913437 0.129818 7.036 6.32e-12 ***
## NMEDPAD -0.672279 0.173382 -3.877 0.000119 ***
## MEDCABDO 0.652984 0.269615 2.422 0.015783 *
## TOTAFIS 0.010215 0.002957 3.455 0.000596 ***
## HDL -0.212881 0.102069 -2.086 0.037501 *
## TG -0.035516 0.025476 -1.394 0.163895
## GLICEMIA 0.449161 0.136279 3.296 0.001049 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.1 on 515 degrees of freedom
## Multiple R-squared: 0.23, Adjusted R-squared: 0.2165
## F-statistic: 17.09 on 9 and 515 DF, p-value: < 2.2e-16
mod.lm3 <- lm(PIMED ~ NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data=dados)
summary(mod.lm3)
##
## Call:
## lm(formula = PIMED ~ NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS +
## HDL + TG + GLICEMIA, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.286 -14.936 -1.156 12.322 87.195
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -29.255726 15.201637 -1.925 0.054839 .
## NMEDPAS 0.947570 0.127966 7.405 5.37e-13 ***
## NMEDPAD -0.715083 0.171170 -4.178 3.46e-05 ***
## MEDCABDO 0.369939 0.122011 3.032 0.002551 **
## TOTAFIS 0.010184 0.002956 3.445 0.000617 ***
## HDL -0.232759 0.101031 -2.304 0.021628 *
## TG -0.036519 0.025477 -1.433 0.152339
## GLICEMIA 0.450480 0.135937 3.314 0.000985 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.11 on 517 degrees of freedom
## Multiple R-squared: 0.2264, Adjusted R-squared: 0.2159
## F-statistic: 21.61 on 7 and 517 DF, p-value: < 2.2e-16
mod.lmf4 <- lm(PIMED ~ NMEDPAS + NMEDPAD
+ MEDCABDO + TOTAFIS + HDL + GLICEMIA, data=dados)
summary(mod.lmf4)
##
## Call:
## lm(formula = PIMED ~ NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS +
## HDL + GLICEMIA, data = dados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -55.516 -14.716 -1.275 12.215 86.441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -27.792813 15.182774 -1.831 0.067742 .
## NMEDPAS 0.958693 0.127861 7.498 2.83e-13 ***
## NMEDPAD -0.754794 0.169085 -4.464 9.88e-06 ***
## MEDCABDO 0.329498 0.118825 2.773 0.005755 **
## TOTAFIS 0.010275 0.002958 3.473 0.000557 ***
## HDL -0.206334 0.099436 -2.075 0.038476 *
## GLICEMIA 0.433368 0.135549 3.197 0.001473 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.13 on 518 degrees of freedom
## Multiple R-squared: 0.2233, Adjusted R-squared: 0.2143
## F-statistic: 24.82 on 6 and 518 DF, p-value: < 2.2e-16
Residual analysis
par(mfrow=c(2,2))
plot(mod.lmf6)
par(mfrow=c(2,2))
plot(mod.lmf4)
We now apply a normality test to check whether the residuals follow a normal distribution.
shapiro.test(mod.lmf6$residuals)
##
## Shapiro-Wilk normality test
##
## data: mod.lmf6$residuals
## W = 0.95893, p-value = 6.309e-11
shapiro.test(mod.lmf4$residuals)
##
## Shapiro-Wilk normality test
##
## data: mod.lmf4$residuals
## W = 0.97345, p-value = 3.716e-08
Normality of the residuals is one of the assumptions of regression analysis: it is the residuals, not the raw data, that must be normally distributed. Both Shapiro-Wilk p-values are far below 0.05, so normality is rejected for the residuals of both final models.
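The decision rule used above can be written out explicitly. The sketch below fits a regression on simulated data with deliberately right-skewed errors, then compares the Shapiro-Wilk p-value with a 5% threshold. All names (`x`, `y`, `fit`, `alpha`) and values are hypothetical, and `residuals()` is used as the accessor in place of `$residuals`.

```r
# Simulated regression with right-skewed (centred exponential) errors
set.seed(7)
n <- 300
x <- rnorm(n)
y <- 1 + 2 * x + (rexp(n) - 1)

fit <- lm(y ~ x)

# Shapiro-Wilk test on the residuals, not on the raw response
sw <- shapiro.test(residuals(fit))

alpha <- 0.05
reject_normality <- sw$p.value < alpha
reject_normality
```

A rejection indicates only that the residuals are not normal; it does not say which feature (skewness, heavy tails, outliers) is responsible, which is why the diagnostic plots above remain useful.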