Descriptive Analysis of the Dataset

We carry out a descriptive analysis of a dataset containing records for 525 high-school students from a public school in Paraíba, Brazil.

Consulting Activity

To analyze the data we use the fBasics package for descriptive statistics, along with other packages for graphical visualization.

setwd("C:\\Users\\Mateus\\Desktop\\P7\\Consultoria")


#install.packages("fBasics")
#install.packages("ggplot2")
#install.packages("car")       # provides qqPlot()
#install.packages("corrplot")  # corrplot() is used below

library(fBasics)
## Loading required package: timeDate
## Loading required package: timeSeries
library(ggplot2)
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:fBasics':
## 
##     densityPlot
library(corrplot)
## corrplot 0.84 loaded

Importing the dataset

dados <- read.table("Dados.txt", header = TRUE)

dim(dados)
## [1] 525  13
summary(dados)
##      PEMED           PIMED            IDADE            IMC       
##  Min.   : 43.0   Min.   : 22.00   Min.   :15.00   Min.   :14.70  
##  1st Qu.: 82.7   1st Qu.: 62.00   1st Qu.:16.00   1st Qu.:19.10  
##  Median : 96.3   Median : 76.00   Median :16.80   Median :20.70  
##  Mean   :100.7   Mean   : 78.31   Mean   :16.82   Mean   :21.57  
##  3rd Qu.:115.3   3rd Qu.: 90.00   3rd Qu.:17.50   3rd Qu.:23.20  
##  Max.   :237.3   Max.   :182.00   Max.   :19.90   Max.   :40.00  
##     HRSEDCAL         NMEDPAS         NMEDPAD         MEDCABDO     
##  Min.   : 0.000   Min.   : 86.5   Min.   :48.50   Min.   : 56.00  
##  1st Qu.: 2.000   1st Qu.:103.0   1st Qu.:62.50   1st Qu.: 66.00  
##  Median : 3.000   Median :109.0   Median :66.50   Median : 69.60  
##  Mean   : 3.173   Mean   :109.9   Mean   :66.87   Mean   : 71.45  
##  3rd Qu.: 4.000   3rd Qu.:116.5   3rd Qu.:71.50   3rd Qu.: 74.80  
##  Max.   :12.000   Max.   :143.5   Max.   :93.50   Max.   :116.00  
##     TOTAFIS            HDL               TG            GLICEMIA    
##  Min.   :   0.0   Min.   : 20.00   Min.   : 30.00   Min.   :55.00  
##  1st Qu.: 140.0   1st Qu.: 36.00   1st Qu.: 58.00   1st Qu.:71.00  
##  Median : 245.0   Median : 41.00   Median : 74.00   Median :76.00  
##  Mean   : 331.8   Mean   : 41.93   Mean   : 82.73   Mean   :75.95  
##  3rd Qu.: 440.0   3rd Qu.: 47.00   3rd Qu.: 96.00   3rd Qu.:81.00  
##  Max.   :2830.0   Max.   :142.00   Max.   :423.00   Max.   :98.00  
##     ESCMATER     
##  Min.   : 0.000  
##  1st Qu.: 6.000  
##  Median : 9.000  
##  Mean   : 8.808  
##  3rd Qu.:12.000  
##  Max.   :17.000
names(dados)
##  [1] "PEMED"    "PIMED"    "IDADE"    "IMC"      "HRSEDCAL" "NMEDPAS" 
##  [7] "NMEDPAD"  "MEDCABDO" "TOTAFIS"  "HDL"      "TG"       "GLICEMIA"
## [13] "ESCMATER"
attach(dados)

Using the fBasics package for descriptive statistics:

basicStats(dados)
##                    PEMED        PIMED       IDADE          IMC    HRSEDCAL
## nobs          525.000000   525.000000  525.000000   525.000000  525.000000
## NAs             0.000000     0.000000    0.000000     0.000000    0.000000
## Minimum        43.000000    22.000000   15.000000    14.700000    0.000000
## Maximum       237.300000   182.000000   19.900000    40.000000   12.000000
## 1. Quartile    82.700000    62.000000   16.000000    19.100000    2.000000
## 3. Quartile   115.300000    90.000000   17.500000    23.200000    4.000000
## Mean          100.673905    78.310476   16.820571    21.572952    3.172952
## Median         96.300000    76.000000   16.800000    20.700000    3.000000
## Sum         52853.800000 41113.000000 8830.800000 11325.800000 1665.800000
## SE Mean         1.224955     1.040270    0.045468     0.170325    0.078835
## LCL Mean       98.267478    76.266865   16.731250    21.238348    3.018081
## UCL Mean      103.080332    80.354087   16.909893    21.907556    3.327824
## Variance      787.770787   568.134337    1.085339    15.230603    3.262855
## Stdev          28.067255    23.835569    1.041796     3.902641    1.806337
## Skewness        1.038744     0.820519    0.529699     1.453463    1.388155
## Kurtosis        2.056601     1.267316   -0.104454     2.942668    3.205288
##                  NMEDPAS      NMEDPAD     MEDCABDO      TOTAFIS
## nobs          525.000000   525.000000   525.000000    525.00000
## NAs             0.000000     0.000000     0.000000      0.00000
## Minimum        86.500000    48.500000    56.000000      0.00000
## Maximum       143.500000    93.500000   116.000000   2830.00000
## 1. Quartile   103.000000    62.500000    66.000000    140.00000
## 3. Quartile   116.500000    71.500000    74.800000    440.00000
## Mean          109.898095    66.867238    71.454667    331.80571
## Median        109.000000    66.500000    69.600000    245.00000
## Sum         57696.500000 35105.300000 37513.700000 174198.00000
## SE Mean         0.432794     0.306405     0.382907     13.92042
## LCL Mean      109.047870    66.265306    70.702445    304.45904
## UCL Mean      110.748321    67.469170    72.206889    359.15239
## Variance       98.338279    49.288963    76.974430 101733.42401
## Stdev           9.916566     7.020610     8.773507    318.95677
## Skewness        0.375215     0.286196     1.586484      2.43862
## Kurtosis       -0.046822     0.182574     3.590059     10.18325
##                      HDL           TG     GLICEMIA    ESCMATER
## nobs          525.000000   525.000000   525.000000  525.000000
## NAs             0.000000     0.000000     0.000000    0.000000
## Minimum        20.000000    30.000000    55.000000    0.000000
## Maximum       142.000000   423.000000    98.000000   17.000000
## 1. Quartile    36.000000    58.000000    71.000000    6.000000
## 3. Quartile    47.000000    96.000000    81.000000   12.000000
## Mean           41.925714    82.729524    75.946667    8.807619
## Median         41.000000    74.000000    76.000000    9.000000
## Sum         22011.000000 43433.000000 39872.000000 4624.000000
## SE Mean         0.417345     1.720215     0.303467    0.157308
## LCL Mean       41.105839    79.350158    75.350506    8.498587
## UCL Mean       42.745590    86.108889    76.542827    9.116651
## Variance       91.442944  1553.548840    48.348295   12.991545
## Stdev           9.562580    39.415084     6.953294    3.604379
## Skewness        2.649277     2.654582     0.165793   -0.192904
## Kurtosis       22.520426    13.425566    -0.114004   -0.340546
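The SE Mean, LCL Mean and UCL Mean rows follow the usual t-based confidence interval for a mean, mean ± t(0.975, n − 1) · sd / √n. For PEMED, for example, 28.067255 / √525 ≈ 1.224955, which is the SE Mean shown. A minimal sketch that recomputes these rows, assuming basicStats' default 95% level; `mean_ci` is a hypothetical helper name:

```r
# Recomputes the "SE Mean", "LCL Mean" and "UCL Mean" rows of basicStats()
# for a single numeric vector, assuming the default 95% confidence level.
mean_ci <- function(x, conf = 0.95) {
  n  <- length(x)
  m  <- mean(x)
  se <- sd(x) / sqrt(n)                     # standard error of the mean
  tq <- qt(1 - (1 - conf) / 2, df = n - 1)  # two-sided Student t quantile
  c(mean = m, se = se, lcl = m - tq * se, ucl = m + tq * se)
}
```

Calling `mean_ci(dados$PEMED)` should reproduce the interval of roughly 98.27 to 103.08 reported for PEMED.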
vari <- cor(dados)  # Pearson correlation matrix of the 13 variables
head(vari)
##                PEMED       PIMED       IDADE        IMC     HRSEDCAL
## PEMED    1.000000000  0.72559574  0.01486833 0.14716324  0.003280082
## PIMED    0.725595745  1.00000000  0.01019430 0.19818722 -0.049940080
## IDADE    0.014868333  0.01019430  1.00000000 0.09506044 -0.051798833
## IMC      0.147163235  0.19818722  0.09506044 1.00000000  0.122805758
## HRSEDCAL 0.003280082 -0.04994008 -0.05179883 0.12280576  1.000000000
## NMEDPAS  0.279391148  0.35365670  0.08501958 0.34032151  0.012047742
##             NMEDPAS      NMEDPAD  MEDCABDO     TOTAFIS         HDL
## PEMED    0.27939115 -0.004540354 0.2252654  0.22055515 -0.15134771
## PIMED    0.35365670  0.055875790 0.2690124  0.19953510 -0.14193394
## IDADE    0.08501958 -0.031359245 0.1046211 -0.02992841 -0.01348557
## IMC      0.34032151  0.257262415 0.9076445  0.02844188 -0.14109484
## HRSEDCAL 0.01204774  0.115513480 0.1023404 -0.04451273  0.01840038
## NMEDPAS  1.00000000  0.613765014 0.4181621  0.10962250 -0.09825295
##                    TG    GLICEMIA      ESCMATER
## PEMED     0.003873632  0.12221209  0.0707419622
## PIMED     0.034973607  0.18091543 -0.0192287506
## IDADE    -0.035575964 -0.09210784 -0.1729092766
## IMC       0.289920982  0.10481776  0.0001177863
## HRSEDCAL  0.058748966  0.05780541  0.0384181761
## NMEDPAS   0.171696767  0.13709032 -0.0384472453

Plot

Generating box plots for a better view of the variables
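The box plots can be drawn in base R. A minimal sketch, where `boxplot_all` is a hypothetical helper; because the variables sit on very different scales (TOTAFIS reaches 2830 while IDADE stays between 15 and 19.9), each column is standardized with `scale()` by default so all boxes share one readable axis:

```r
# Draws side-by-side box plots of every numeric column.
# Columns are standardised (z-scores) first because the variables are on very
# different scales; pass standardise = FALSE to plot raw units instead.
boxplot_all <- function(df, standardise = TRUE) {
  vals <- if (standardise) as.data.frame(scale(df)) else df
  boxplot(vals, las = 2, cex.axis = 0.7,
          ylab = if (standardise) "z-score" else "value")
}
```

Usage: `boxplot_all(dados)`. Standardizing changes only the axis units, not the shape of each box, so skewness and outliers remain visible.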

A sound statistical analysis requires identifying the probability distribution the data plausibly follow, and the most commonly assumed one is the normal distribution. To check this, we can plot a histogram of each variable overlaid with a fitted normal density.

par(mfrow = c(3,3))
histPlot(as.timeSeries(dados))

fBasics::densityPlot(as.timeSeries(dados))  # car masks fBasics' densityPlot, so call it explicitly

Based on the plots above, most variables do not appear to be normally distributed; the exceptions are GLICEMIA, NMEDPAS and NMEDPAD. This is consistent with the basicStats output, where all three have skewness and kurtosis close to zero.

We use Q-Q plots (qqPlot() from the car package) to check this visually.

par(mfrow=c(2,2))
qqPlot(PEMED)
## [1] 337 377

qqPlot(PIMED)

## [1] 305 369
qqPlot(IDADE)

## [1] 282 369
qqPlot(IMC)

## [1] 385 252
qqPlot(HRSEDCAL)

## [1] 364 306
qqPlot(NMEDPAS)

## [1]  64 256
qqPlot(NMEDPAD)

## [1] 379 210
qqPlot(MEDCABDO)

## [1] 75 42
qqPlot(TOTAFIS)

## [1] 486 441
qqPlot(HDL)

## [1]  56 164
qqPlot(TG)

## [1] 213 485
qqPlot(GLICEMIA)

## [1]  74 274
par(mfrow=c(1,1))
qqPlot(ESCMATER)

## [1] 12 17
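The thirteen qqPlot() calls can be condensed into a loop. Each call prints the row indices of the two points it labels as most extreme (the `[1] 337 377` style output above). A minimal sketch, assuming the car package as loaded earlier; `qq_all` is a hypothetical helper that collects those indices in a named list:

```r
library(car)

# Draws one Q-Q plot per column with car::qqPlot() and collects, in a named
# list, the row indices that qqPlot() labels as the two most extreme points.
qq_all <- function(df, layout = c(2, 2)) {
  op <- par(mfrow = layout)
  on.exit(par(op))                       # restore the previous layout
  flagged <- list()
  for (v in names(df)) {
    # list(...) keeps the entry even if qqPlot() returns NULL
    flagged[v] <- list(qqPlot(df[[v]], ylab = v, main = v))
  }
  flagged
}
```

Usage: `qq_all(dados)` draws the same plots and returns, for instance, the indices flagged for PEMED under `$PEMED`.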

The Q-Q plots suggest that GLICEMIA, NMEDPAS and NMEDPAD follow an approximately normal distribution.

The correlation plot is given by:

corrplot(vari,method="circle")
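The circle plot is easier to read next to a ranked list of the strongest pairs. In the printed `head(vari)`, IMC and MEDCABDO have r ≈ 0.91 and PEMED and PIMED r ≈ 0.73. A minimal sketch that extracts such pairs from the full matrix; `top_correlations` is a hypothetical helper:

```r
# Lists the k variable pairs with the largest absolute correlation, reading only
# the upper triangle of the matrix so that each pair appears once.
top_correlations <- function(cmat, k = 5) {
  idx <- which(upper.tri(cmat), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(cmat)[idx[, "row"]],
                    var2 = colnames(cmat)[idx[, "col"]],
                    r    = cmat[idx],
                    stringsAsFactors = FALSE)
  out <- out[order(abs(out$r), decreasing = TRUE), ]
  rownames(out) <- NULL
  head(out, k)
}
```

Usage: `top_correlations(vari, k = 10)`. Ranking by absolute value keeps strong negative correlations in the list as well.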

Note: the Anderson-Darling normality test is available as ad.test() in the nortest package.
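A minimal sketch of applying that test to every variable at once, assuming nortest is installed; `ad_pvalues` is a hypothetical helper:

```r
library(nortest)

# Runs the Anderson-Darling normality test (nortest::ad.test) on every column
# and returns the p-values as a named vector.
ad_pvalues <- function(df) {
  sapply(df, function(x) ad.test(x)$p.value)
}
```

Usage: `ad_pvalues(dados)`. With n = 525, formal tests can reject normality even for small departures, so the p-values are best read together with the Q-Q plots rather than in place of them.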

Fitting models

Covariance command

PEMED

mod.lm <- lm(PEMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data=dados)
summary(mod.lm)
## 
## Call:
## lm(formula = PEMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD + 
##     MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.319 -15.392  -3.154  11.919 132.494 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.915719  26.835516   0.407 0.684351    
## IDADE       -0.425695   1.106764  -0.385 0.700671    
## IMC         -1.265368   0.699488  -1.809 0.071038 .  
## HRSEDCAL     0.372214   0.627307   0.593 0.553207    
## NMEDPAS      0.957581   0.157876   6.065 2.56e-09 ***
## NMEDPAD     -0.913729   0.210209  -4.347 1.67e-05 ***
## MEDCABDO     0.915734   0.328320   2.789 0.005481 ** 
## TOTAFIS      0.013892   0.003573   3.888 0.000115 ***
## HDL         -0.302855   0.123333  -2.456 0.014396 *  
## TG          -0.051642   0.030733  -1.680 0.093496 .  
## GLICEMIA     0.288259   0.165993   1.737 0.083062 .  
## ESCMATER     0.446438   0.318520   1.402 0.161639    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.42 on 513 degrees of freedom
## Multiple R-squared:  0.1969, Adjusted R-squared:  0.1797 
## F-statistic: 11.44 on 11 and 513 DF,  p-value: < 2.2e-16
# Removing the predictor with the largest p-value (IDADE)

mod.lm1 <- lm(PEMED ~ IMC + HRSEDCAL + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data=dados)
summary(mod.lm1)
## 
## Call:
## lm(formula = PEMED ~ IMC + HRSEDCAL + NMEDPAS + NMEDPAD + MEDCABDO + 
##     TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.007 -15.617  -3.132  12.247 133.238 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.616437  18.958009   0.191 0.848789    
## IMC         -1.264080   0.698900  -1.809 0.071086 .  
## HRSEDCAL     0.381736   0.626298   0.610 0.542454    
## NMEDPAS      0.951391   0.156924   6.063 2.59e-09 ***
## NMEDPAD     -0.905115   0.208839  -4.334 1.76e-05 ***
## MEDCABDO     0.908534   0.327514   2.774 0.005738 ** 
## TOTAFIS      0.013972   0.003564   3.920 0.000101 ***
## HDL         -0.303869   0.123202  -2.466 0.013972 *  
## TG          -0.051071   0.030671  -1.665 0.096502 .  
## GLICEMIA     0.294755   0.164994   1.786 0.074615 .  
## ESCMATER     0.466305   0.314044   1.485 0.138199    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.4 on 514 degrees of freedom
## Multiple R-squared:  0.1967, Adjusted R-squared:  0.1811 
## F-statistic: 12.59 on 10 and 514 DF,  p-value: < 2.2e-16
mod.lm2 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data=dados)
summary(mod.lm2)
## 
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS + 
##     HDL + TG + GLICEMIA + ESCMATER, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.182 -15.768  -3.251  12.198 132.480 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.819005  18.943526   0.202 0.840309    
## IMC         -1.240335   0.697387  -1.779 0.075904 .  
## NMEDPAS      0.942461   0.156143   6.036 3.03e-09 ***
## NMEDPAD     -0.889417   0.207119  -4.294 2.10e-05 ***
## MEDCABDO     0.907415   0.327309   2.772 0.005767 ** 
## TOTAFIS      0.013925   0.003561   3.910 0.000105 ***
## HDL         -0.302036   0.123091  -2.454 0.014467 *  
## TG          -0.050901   0.030651  -1.661 0.097393 .  
## GLICEMIA     0.299799   0.164686   1.820 0.069274 .  
## ESCMATER     0.471972   0.313714   1.504 0.133075    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.38 on 515 degrees of freedom
## Multiple R-squared:  0.1961, Adjusted R-squared:  0.1821 
## F-statistic: 13.96 on 9 and 515 DF,  p-value: < 2.2e-16
mod.lm3 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data=dados)
summary(mod.lm3)
## 
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS + 
##     HDL + TG + GLICEMIA, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.931 -15.891  -3.683  12.657 132.266 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.862387  18.953989   0.257  0.79764    
## IMC         -1.362617   0.693482  -1.965  0.04996 *  
## NMEDPAS      0.920554   0.155653   5.914 6.08e-09 ***
## NMEDPAD     -0.878236   0.207239  -4.238 2.67e-05 ***
## MEDCABDO     0.973064   0.324784   2.996  0.00287 ** 
## TOTAFIS      0.014203   0.003561   3.989 7.61e-05 ***
## HDL         -0.288495   0.122911  -2.347  0.01929 *  
## TG          -0.050492   0.030688  -1.645  0.10051    
## GLICEMIA     0.326482   0.163929   1.992  0.04694 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.41 on 516 degrees of freedom
## Multiple R-squared:  0.1926, Adjusted R-squared:  0.1801 
## F-statistic: 15.38 on 8 and 516 DF,  p-value: < 2.2e-16
mod.lm4 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + GLICEMIA, data=dados)
summary(mod.lm4)
## 
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS + 
##     HDL + GLICEMIA, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.830 -15.865  -3.662  12.924 132.629 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.678006  18.953055   0.352  0.72472    
## IMC         -1.391634   0.694401  -2.004  0.04558 *  
## NMEDPAS      0.935003   0.155661   6.007 3.57e-09 ***
## NMEDPAD     -0.932204   0.204964  -4.548 6.75e-06 ***
## MEDCABDO     0.929307   0.324228   2.866  0.00432 ** 
## TOTAFIS      0.014332   0.003566   4.019 6.71e-05 ***
## HDL         -0.251250   0.121008  -2.076  0.03836 *  
## GLICEMIA     0.302468   0.163547   1.849  0.06497 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.46 on 517 degrees of freedom
## Multiple R-squared:  0.1884, Adjusted R-squared:  0.1774 
## F-statistic: 17.14 on 7 and 517 DF,  p-value: < 2.2e-16
mod.lm5 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL, data=dados)
summary(mod.lm5)
## 
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS + 
##     HDL, data = dados)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -61.04 -15.70  -3.41  11.69 134.07 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 24.584180  16.331486   1.505 0.132850    
## IMC         -1.458466   0.695079  -2.098 0.036365 *  
## NMEDPAS      0.965939   0.155121   6.227 9.84e-10 ***
## NMEDPAD     -0.956936   0.205005  -4.668 3.88e-06 ***
## MEDCABDO     0.986000   0.323528   3.048 0.002424 ** 
## TOTAFIS      0.013974   0.003569   3.915 0.000102 ***
## HDL         -0.231477   0.120816  -1.916 0.055922 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.52 on 518 degrees of freedom
## Multiple R-squared:  0.183,  Adjusted R-squared:  0.1735 
## F-statistic: 19.34 on 6 and 518 DF,  p-value: < 2.2e-16
mod.lmf6 <- lm(PEMED ~ IMC + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS, data=dados)
summary(mod.lmf6)
## 
## Call:
## lm(formula = PEMED ~ IMC + NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS, 
##     data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -62.592 -15.986  -3.646  11.463 134.977 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.74726   14.41573   0.676  0.49924    
## IMC         -1.64006    0.69036  -2.376  0.01788 *  
## NMEDPAS      0.96243    0.15551   6.189 1.23e-09 ***
## NMEDPAD     -0.95586    0.20553  -4.651 4.20e-06 ***
## MEDCABDO     1.11495    0.31726   3.514  0.00048 ***
## TOTAFIS      0.01442    0.00357   4.040 6.15e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.58 on 519 degrees of freedom
## Multiple R-squared:  0.1772, Adjusted R-squared:  0.1693 
## F-statistic: 22.35 on 5 and 519 DF,  p-value: < 2.2e-16
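The sequence mod.lm → mod.lmf6 is backward elimination by p-value: refit, drop the predictor with the largest p-value, and stop once every remaining predictor has p < 0.05. A minimal sketch that automates this rule, assuming all predictors are numeric so that coefficient names match variable names; `backward_p` is a hypothetical helper:

```r
# Backward elimination: repeatedly drop the predictor with the largest p-value
# until all remaining predictors have p < alpha (or only one predictor is left).
backward_p <- function(response, predictors, data, alpha = 0.05) {
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]  # skip the intercept
    if (max(pvals) < alpha || length(predictors) == 1) return(fit)
    # Drop the single least significant predictor and refit
    predictors <- setdiff(predictors, names(which.max(pvals)))
  }
}
```

Usage: `backward_p("PEMED", setdiff(names(dados), c("PEMED", "PIMED")), dados)` starts from the same eleven predictors as mod.lm and, since each manual PEMED step removed the single largest p-value, should end at the five-predictor model mod.lmf6. R's built-in `step()` performs backward selection by AIC instead, which is a different criterion and may keep a different set of variables.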

PIMED

mod.lm <- lm(PIMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data=dados)
summary(mod.lm)
## 
## Call:
## lm(formula = PIMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD + 
##     MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA + ESCMATER, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.083 -14.768  -0.931  12.424  86.327 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.466041  22.279458  -0.784 0.433430    
## IDADE        -0.924230   0.918861  -1.006 0.314965    
## IMC          -0.706403   0.580731  -1.216 0.224390    
## HRSEDCAL     -0.531781   0.520804  -1.021 0.307699    
## NMEDPAS       0.917379   0.131073   6.999 8.10e-12 ***
## NMEDPAD      -0.686498   0.174520  -3.934 9.52e-05 ***
## MEDCABDO      0.697984   0.272579   2.561 0.010732 *  
## TOTAFIS       0.010167   0.002967   3.427 0.000659 ***
## HDL          -0.204687   0.102394  -1.999 0.046133 *  
## TG           -0.036578   0.025515  -1.434 0.152296    
## GLICEMIA      0.446816   0.137811   3.242 0.001263 ** 
## ESCMATER     -0.254201   0.264443  -0.961 0.336868    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.1 on 513 degrees of freedom
## Multiple R-squared:  0.2325, Adjusted R-squared:  0.216 
## F-statistic: 14.13 on 11 and 513 DF,  p-value: < 2.2e-16
# Removing the predictor with the largest p-value (ESCMATER)

mod.lm1 <- lm(PIMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data=dados)
summary(mod.lm1)
## 
## Call:
## lm(formula = PIMED ~ IDADE + IMC + HRSEDCAL + NMEDPAS + NMEDPAD + 
##     MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.640 -15.190  -1.036  12.123  86.658 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.460969  22.058909  -0.928 0.354071    
## IDADE        -0.780997   0.906632  -0.861 0.389404    
## IMC          -0.640988   0.576687  -1.112 0.266872    
## HRSEDCAL     -0.543190   0.520630  -1.043 0.297285    
## NMEDPAS       0.926433   0.130724   7.087 4.55e-12 ***
## NMEDPAD      -0.688858   0.174490  -3.948 8.99e-05 ***
## MEDCABDO      0.661120   0.269848   2.450 0.014619 *  
## TOTAFIS       0.010046   0.002964   3.390 0.000754 ***
## HDL          -0.212054   0.102099  -2.077 0.038303 *  
## TG           -0.036594   0.025513  -1.434 0.152090    
## GLICEMIA      0.435213   0.137271   3.170 0.001613 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.1 on 514 degrees of freedom
## Multiple R-squared:  0.2311, Adjusted R-squared:  0.2161 
## F-statistic: 15.45 on 10 and 514 DF,  p-value: < 2.2e-16
mod.lm2 <- lm(PIMED ~ IMC + HRSEDCAL + NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data=dados)
summary(mod.lm2)
## 
## Call:
## lm(formula = PIMED ~ IMC + HRSEDCAL + NMEDPAS + NMEDPAD + MEDCABDO + 
##     TOTAFIS + HDL + TG + GLICEMIA, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.187 -15.149  -1.309  12.452  87.543 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -33.773178  15.737078  -2.146 0.032332 *  
## IMC          -0.648195   0.576482  -1.124 0.261369    
## HRSEDCAL     -0.523568   0.520002  -1.007 0.314476    
## NMEDPAS       0.913437   0.129818   7.036 6.32e-12 ***
## NMEDPAD      -0.672279   0.173382  -3.877 0.000119 ***
## MEDCABDO      0.652984   0.269615   2.422 0.015783 *  
## TOTAFIS       0.010215   0.002957   3.455 0.000596 ***
## HDL          -0.212881   0.102069  -2.086 0.037501 *  
## TG           -0.035516   0.025476  -1.394 0.163895    
## GLICEMIA      0.449161   0.136279   3.296 0.001049 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.1 on 515 degrees of freedom
## Multiple R-squared:   0.23,  Adjusted R-squared:  0.2165 
## F-statistic: 17.09 on 9 and 515 DF,  p-value: < 2.2e-16
# Removing IMC and HRSEDCAL (the two largest p-values) in a single step
mod.lm3 <- lm(PIMED ~ NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + TG + GLICEMIA, data=dados)
summary(mod.lm3)
## 
## Call:
## lm(formula = PIMED ~ NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS + 
##     HDL + TG + GLICEMIA, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.286 -14.936  -1.156  12.322  87.195 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -29.255726  15.201637  -1.925 0.054839 .  
## NMEDPAS       0.947570   0.127966   7.405 5.37e-13 ***
## NMEDPAD      -0.715083   0.171170  -4.178 3.46e-05 ***
## MEDCABDO      0.369939   0.122011   3.032 0.002551 ** 
## TOTAFIS       0.010184   0.002956   3.445 0.000617 ***
## HDL          -0.232759   0.101031  -2.304 0.021628 *  
## TG           -0.036519   0.025477  -1.433 0.152339    
## GLICEMIA      0.450480   0.135937   3.314 0.000985 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.11 on 517 degrees of freedom
## Multiple R-squared:  0.2264, Adjusted R-squared:  0.2159 
## F-statistic: 21.61 on 7 and 517 DF,  p-value: < 2.2e-16
mod.lmf4 <- lm(PIMED ~ NMEDPAS + NMEDPAD 
             + MEDCABDO + TOTAFIS + HDL + GLICEMIA, data=dados)
summary(mod.lmf4)
## 
## Call:
## lm(formula = PIMED ~ NMEDPAS + NMEDPAD + MEDCABDO + TOTAFIS + 
##     HDL + GLICEMIA, data = dados)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -55.516 -14.716  -1.275  12.215  86.441 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -27.792813  15.182774  -1.831 0.067742 .  
## NMEDPAS       0.958693   0.127861   7.498 2.83e-13 ***
## NMEDPAD      -0.754794   0.169085  -4.464 9.88e-06 ***
## MEDCABDO      0.329498   0.118825   2.773 0.005755 ** 
## TOTAFIS       0.010275   0.002958   3.473 0.000557 ***
## HDL          -0.206334   0.099436  -2.075 0.038476 *  
## GLICEMIA      0.433368   0.135549   3.197 0.001473 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.13 on 518 degrees of freedom
## Multiple R-squared:  0.2233, Adjusted R-squared:  0.2143 
## F-statistic: 24.82 on 6 and 518 DF,  p-value: < 2.2e-16

Residual analysis

par(mfrow=c(2,2))
plot(mod.lmf6)

par(mfrow=c(2,2))
plot(mod.lmf4)
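The residual summaries already hint at large positive residuals: for mod.lmf6 the maximum residual is about 135, more than five times the residual standard error of 25.58. A minimal sketch that lists such observations by their standardized residuals; `large_residuals` is a hypothetical helper and the cutoff of 3 is an assumed convention:

```r
# Returns the standardised residuals whose absolute value exceeds `cutoff`,
# sorted from largest to smallest in absolute value; names are row labels.
large_residuals <- function(fit, cutoff = 3) {
  r <- rstandard(fit)                 # internally studentised residuals
  big <- r[abs(r) > cutoff]
  big[order(abs(big), decreasing = TRUE)]
}
```

Usage: `large_residuals(mod.lmf6)` and `large_residuals(mod.lmf4)`; the names of the returned values are the row numbers of the flagged students.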

Next, we apply the Shapiro-Wilk normality test to check whether the residuals follow a normal distribution.

shapiro.test(mod.lmf6$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  mod.lmf6$residuals
## W = 0.95893, p-value = 6.309e-11
shapiro.test(mod.lmf4$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  mod.lmf4$residuals
## W = 0.97345, p-value = 3.716e-08

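Normality of the residuals is one of the assumptions of regression analysis, and it is the residuals, not the raw data, that must be approximately normal. For both final models the Shapiro-Wilk test rejects normality (p = 6.309e-11 for mod.lmf6 and p = 3.716e-08 for mod.lmf4), so the residuals are not normally distributed.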