Final Exam

Problem 1.

c. \(P(X < x\text{ and } X > y )\)

The requested probability is 0.42675.

         Y > y   Y <= y
X > x     3760     1240
X <= x    3740     1260

Marginal probabilities:

\(P(Y>y)=\) 0.75.
\(P(Y\le y)=\) 0.25.
\(P(X>x)=\) 0.5.
\(P(X\le x)=\) 0.5.

Joint probabilities:

\(P(X>x\text{ and }Y>y)=\) 0.376.
\(P(X>x\text{ and }Y\le y)=\) 0.124.
\(P(X\le x\text{ and }Y>y)=\) 0.374.
\(P(X\le x\text{ and }Y\le y)=\) 0.126.

These probabilities are confirmed in the table below.

         Y > y   Y <= y
X > x    0.376    0.124
X <= x   0.374    0.126
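
The marginal and joint probabilities above follow directly from the cell counts in the first table. As a rough cross-check, here is a minimal Python sketch (the report's output appears to come from R; the dictionary layout and variable names are illustrative assumptions, not the original code):

```python
# Cell counts from the first 2-by-2 table: rows split X at x, columns split Y at y.
counts = {
    ("X>x", "Y>y"): 3760,
    ("X>x", "Y<=y"): 1240,
    ("X<=x", "Y>y"): 3740,
    ("X<=x", "Y<=y"): 1260,
}
total = sum(counts.values())

# Joint probabilities: each cell count divided by the grand total.
joint = {cell: n / total for cell, n in counts.items()}

# Marginal probabilities: sum the joint probabilities over the other variable.
p_x, p_y = {}, {}
for (x_label, y_label), p in joint.items():
    p_x[x_label] = p_x.get(x_label, 0.0) + p
    p_y[y_label] = p_y.get(y_label, 0.0) + p

for cell, p in joint.items():
    print(cell, round(p, 3))
print("P(X):", {k: round(v, 3) for k, v in p_x.items()})
print("P(Y):", {k: round(v, 3) for k, v in p_y.items()})
```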

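Further, the joint probability \(P(X>x\text{ and }Y>y)=\) 0.376 is very close to the product of the marginal probabilities \(P(X>x)=\) 0.5 and \(P(Y>y)=\) 0.75:

\(P(X>x)P(Y>y)=\) 0.375.

If X and Y are independent, the joint probability equals the product of the marginals, so this near-equality is consistent with independence.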
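
The same comparison can be made for all four cells, not only the upper-left one. The following Python sketch does this using the probabilities reported above; the variable names are assumptions made for illustration:

```python
# Marginal and joint probabilities as reported in this section.
p_x = {"X>x": 0.5, "X<=x": 0.5}
p_y = {"Y>y": 0.75, "Y<=y": 0.25}
joint = {
    ("X>x", "Y>y"): 0.376,
    ("X>x", "Y<=y"): 0.124,
    ("X<=x", "Y>y"): 0.374,
    ("X<=x", "Y<=y"): 0.126,
}

# Under independence, each joint probability equals the product of its marginals.
diffs = {}
for (x_label, y_label), p_joint in joint.items():
    product = p_x[x_label] * p_y[y_label]
    diffs[(x_label, y_label)] = p_joint - product
    print(f"{x_label:5s} {y_label:5s} joint={p_joint:.3f} product={product:.3f}")

# Largest absolute gap between a joint probability and its product of marginals.
max_abs_diff = max(abs(d) for d in diffs.values())
print("largest gap:", round(max_abs_diff, 3))
```

Every cell differs from its product of marginals by 0.001 in absolute value, which is small relative to the probabilities themselves.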


    Fisher's Exact Test for Count Data

data:  XY_matrix
p-value = 0.6608
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.932132 1.119534
sample estimates:
odds ratio 
  1.021566 

    Pearson's Chi-squared test with Yates' continuity correction

data:  tb_XY
X-squared = 0.19253, df = 1, p-value = 0.6608

Fisher's exact test is intended for 2-by-2 contingency tables and computes an exact p-value; the chi-square test relies on a large-sample approximation and also applies when the two categorical variables have more than two categories each.

Since the p-values of the two tests agree (0.6608) and are both greater than 0.05, we fail to reject the null hypothesis that X and Y are independent at the 5% significance level. This is the expected result, as the variables were generated randomly and independently.
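
The chi-square statistic reported above can be reproduced by hand from the four cell counts. The sketch below is a standard-library Python re-computation, not the R call that produced the output. It applies the Yates continuity correction and uses the identity that, for 1 degree of freedom, the upper-tail chi-square probability equals `erfc(sqrt(chi2 / 2))`. The odds ratio it prints is the simple cross-product estimate, which is slightly different from the conditional maximum-likelihood estimate that R's `fisher.test` reports.

```python
import math

# Observed counts: rows are X > x and X <= x, columns are Y > y and Y <= y.
observed = [[3760, 1240],
            [3740, 1260]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square with Yates' continuity correction:
# shrink each |observed - expected| by 0.5 (never below zero) before squaring.
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        deviation = max(abs(observed[i][j] - expected) - 0.5, 0.0)
        chi2 += deviation ** 2 / expected

# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

# Cross-product (sample) odds ratio of the 2-by-2 table.
odds_ratio = (observed[0][0] * observed[1][1]) / (observed[0][1] * observed[1][0])

print(f"X-squared = {chi2:.5f}, p-value = {p_value:.4f}")
print(f"sample odds ratio = {odds_ratio:.4f}")
```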

Problem 2.

Predict the final price, “Sale Price”, of each home.

Provide univariate descriptive statistics and appropriate plots for the training data set.
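
The summary that follows reports, for each variable, the number of non-missing values, the number missing, the number of distinct values, the mean, and selected quantiles. As a small illustration of how such per-column figures are obtained, here is a Python sketch on a made-up column (the values are hypothetical and not taken from `hp_train`, and the quantile interpolation rule may differ from the one used for the output below):

```python
import statistics

# Hypothetical column with one missing value (None); not real hp_train data.
column = [20, 60, None, 20, 50, 70]

# Drop missing values before summarising, and sort for readability.
present = sorted(v for v in column if v is not None)

summary = {
    "n": len(present),                       # non-missing observations
    "missing": len(column) - len(present),   # missing observations
    "distinct": len(set(present)),           # number of distinct values
    "mean": statistics.mean(present),
    # Quartiles by linear interpolation between order statistics.
    "quartiles": statistics.quantiles(present, n=4, method="inclusive"),
}
print(summary)
```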

hp_train 

 80  Variables      1460  Observations
--------------------------------------------------------------------------------
MSSubClass 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       15     0.94     56.9    43.19       20       20 
     .25      .50      .75      .90      .95 
      20       50       70      120      160 

lowest :  20  30  40  45  50, highest:  90 120 160 180 190
                                                                            
Value         20    30    40    45    50    60    70    75    80    85    90
Frequency    536    69     4    12   144   299    60    16    58    20    52
Proportion 0.367 0.047 0.003 0.008 0.099 0.205 0.041 0.011 0.040 0.014 0.036
                                  
Value        120   160   180   190
Frequency     87    63    10    30
Proportion 0.060 0.043 0.007 0.021
--------------------------------------------------------------------------------
MSZoning 
       n  missing distinct 
    1460        0        5 

lowest : C (all) FV      RH      RL      RM     
highest: C (all) FV      RH      RL      RM     
                                                  
Value      C (all)      FV      RH      RL      RM
Frequency       10      65      16    1151     218
Proportion   0.007   0.045   0.011   0.788   0.149
--------------------------------------------------------------------------------
LotFrontage 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1201      259      110    0.998    70.05    24.61       34       44 
     .25      .50      .75      .90      .95 
      59       69       80       96      107 

lowest :  21  24  30  32  33, highest: 160 168 174 182 313
--------------------------------------------------------------------------------
LotArea 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0     1073        1    10517     5718     3312     5000 
     .25      .50      .75      .90      .95 
    7554     9478    11602    14382    17401 

lowest :   1300   1477   1491   1526   1533, highest:  70761 115149 159000 164660 215245
--------------------------------------------------------------------------------
Street 
       n  missing distinct 
    1460        0        2 
                      
Value       Grvl  Pave
Frequency      6  1454
Proportion 0.004 0.996
--------------------------------------------------------------------------------
Alley 
       n  missing distinct 
      91     1369        2 
                      
Value       Grvl  Pave
Frequency     50    41
Proportion 0.549 0.451
--------------------------------------------------------------------------------
LotShape 
       n  missing distinct 
    1460        0        4 
                                  
Value        IR1   IR2   IR3   Reg
Frequency    484    41    10   925
Proportion 0.332 0.028 0.007 0.634
--------------------------------------------------------------------------------
LandContour 
       n  missing distinct 
    1460        0        4 
                                  
Value        Bnk   HLS   Low   Lvl
Frequency     63    50    36  1311
Proportion 0.043 0.034 0.025 0.898
--------------------------------------------------------------------------------
Utilities 
       n  missing distinct 
    1460        0        2 
                        
Value      AllPub NoSeWa
Frequency    1459      1
Proportion  0.999  0.001
--------------------------------------------------------------------------------
LotConfig 
       n  missing distinct 
    1460        0        5 

lowest : Corner  CulDSac FR2     FR3     Inside 
highest: Corner  CulDSac FR2     FR3     Inside 
                                                  
Value       Corner CulDSac     FR2     FR3  Inside
Frequency      263      94      47       4    1052
Proportion   0.180   0.064   0.032   0.003   0.721
--------------------------------------------------------------------------------
LandSlope 
       n  missing distinct 
    1460        0        3 
                            
Value        Gtl   Mod   Sev
Frequency   1382    65    13
Proportion 0.947 0.045 0.009
--------------------------------------------------------------------------------
Neighborhood 
       n  missing distinct 
    1460        0       25 

lowest : Blmngtn Blueste BrDale  BrkSide ClearCr
highest: Somerst StoneBr SWISU   Timber  Veenker
--------------------------------------------------------------------------------
Condition1 
       n  missing distinct 
    1460        0        9 

lowest : Artery Feedr  Norm   PosA   PosN  , highest: PosN   RRAe   RRAn   RRNe   RRNn  
                                                                         
Value      Artery  Feedr   Norm   PosA   PosN   RRAe   RRAn   RRNe   RRNn
Frequency      48     81   1260      8     19     11     26      2      5
Proportion  0.033  0.055  0.863  0.005  0.013  0.008  0.018  0.001  0.003
--------------------------------------------------------------------------------
Condition2 
       n  missing distinct 
    1460        0        8 

lowest : Artery Feedr  Norm   PosA   PosN  , highest: PosA   PosN   RRAe   RRAn   RRNn  
                                                                  
Value      Artery  Feedr   Norm   PosA   PosN   RRAe   RRAn   RRNn
Frequency       2      6   1445      1      2      1      1      2
Proportion  0.001  0.004  0.990  0.001  0.001  0.001  0.001  0.001
--------------------------------------------------------------------------------
BldgType 
       n  missing distinct 
    1460        0        5 

lowest : 1Fam   2fmCon Duplex Twnhs  TwnhsE, highest: 1Fam   2fmCon Duplex Twnhs  TwnhsE
                                             
Value        1Fam 2fmCon Duplex  Twnhs TwnhsE
Frequency    1220     31     52     43    114
Proportion  0.836  0.021  0.036  0.029  0.078
--------------------------------------------------------------------------------
HouseStyle 
       n  missing distinct 
    1460        0        8 

lowest : 1.5Fin 1.5Unf 1Story 2.5Fin 2.5Unf, highest: 2.5Fin 2.5Unf 2Story SFoyer SLvl  
                                                                  
Value      1.5Fin 1.5Unf 1Story 2.5Fin 2.5Unf 2Story SFoyer   SLvl
Frequency     154     14    726      8     11    445     37     65
Proportion  0.105  0.010  0.497  0.005  0.008  0.305  0.025  0.045
--------------------------------------------------------------------------------
OverallQual 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       10    0.951    6.099    1.522        4        5 
     .25      .50      .75      .90      .95 
       5        6        7        8        8 

lowest :  1  2  3  4  5, highest:  6  7  8  9 10
                                                                      
Value          1     2     3     4     5     6     7     8     9    10
Frequency      2     3    20   116   397   374   319   168    43    18
Proportion 0.001 0.002 0.014 0.079 0.272 0.256 0.218 0.115 0.029 0.012
--------------------------------------------------------------------------------
OverallCond 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        9    0.814    5.575    1.111 

lowest : 1 2 3 4 5, highest: 5 6 7 8 9
                                                                
Value          1     2     3     4     5     6     7     8     9
Frequency      1     5    25    57   821   252   205    72    22
Proportion 0.001 0.003 0.017 0.039 0.562 0.173 0.140 0.049 0.015
--------------------------------------------------------------------------------
YearBuilt 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      112        1     1971    33.88     1916     1925 
     .25      .50      .75      .90      .95 
    1954     1973     2000     2006     2007 

lowest : 1872 1875 1880 1882 1885, highest: 2006 2007 2008 2009 2010
--------------------------------------------------------------------------------
YearRemodAdd 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       61    0.997     1985    23.05     1950     1950 
     .25      .50      .75      .90      .95 
    1967     1994     2004     2006     2007 

lowest : 1950 1951 1952 1953 1954, highest: 2006 2007 2008 2009 2010
--------------------------------------------------------------------------------
RoofStyle 
       n  missing distinct 
    1460        0        6 

lowest : Flat    Gable   Gambrel Hip     Mansard
highest: Gable   Gambrel Hip     Mansard Shed   
                                                          
Value         Flat   Gable Gambrel     Hip Mansard    Shed
Frequency       13    1141      11     286       7       2
Proportion   0.009   0.782   0.008   0.196   0.005   0.001
--------------------------------------------------------------------------------
RoofMatl 
       n  missing distinct 
    1460        0        8 

lowest : ClyTile CompShg Membran Metal   Roll   
highest: Metal   Roll    Tar&Grv WdShake WdShngl
                                                                          
Value      ClyTile CompShg Membran   Metal    Roll Tar&Grv WdShake WdShngl
Frequency        1    1434       1       1       1      11       5       6
Proportion   0.001   0.982   0.001   0.001   0.001   0.008   0.003   0.004
--------------------------------------------------------------------------------
Exterior1st 
       n  missing distinct 
    1460        0       15 

lowest : AsbShng AsphShn BrkComm BrkFace CBlock 
highest: Stone   Stucco  VinylSd Wd Sdng WdShing
                                                                          
Value      AsbShng AsphShn BrkComm BrkFace  CBlock CemntBd HdBoard ImStucc
Frequency       20       1       2      50       1      61     222       1
Proportion   0.014   0.001   0.001   0.034   0.001   0.042   0.152   0.001
                                                                  
Value      MetalSd Plywood   Stone  Stucco VinylSd Wd Sdng WdShing
Frequency      220     108       2      25     515     206      26
Proportion   0.151   0.074   0.001   0.017   0.353   0.141   0.018
--------------------------------------------------------------------------------
Exterior2nd 
       n  missing distinct 
    1460        0       16 

lowest : AsbShng AsphShn Brk Cmn BrkFace CBlock 
highest: Stone   Stucco  VinylSd Wd Sdng Wd Shng
                                                                          
Value      AsbShng AsphShn Brk Cmn BrkFace  CBlock CmentBd HdBoard ImStucc
Frequency       20       3       7      25       1      60     207      10
Proportion   0.014   0.002   0.005   0.017   0.001   0.041   0.142   0.007
                                                                          
Value      MetalSd   Other Plywood   Stone  Stucco VinylSd Wd Sdng Wd Shng
Frequency      214       1     142       5      26     504     197      38
Proportion   0.147   0.001   0.097   0.003   0.018   0.345   0.135   0.026
--------------------------------------------------------------------------------
MasVnrType 
       n  missing distinct 
    1452        8        4 
                                          
Value       BrkCmn BrkFace    None   Stone
Frequency       15     445     864     128
Proportion   0.010   0.306   0.595   0.088
--------------------------------------------------------------------------------
MasVnrArea 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1452        8      327    0.791    103.7    156.9        0        0 
     .25      .50      .75      .90      .95 
       0        0      166      335      456 

lowest :    0    1   11   14   16, highest: 1115 1129 1170 1378 1600
--------------------------------------------------------------------------------
ExterQual 
       n  missing distinct 
    1460        0        4 
                                  
Value         Ex    Fa    Gd    TA
Frequency     52    14   488   906
Proportion 0.036 0.010 0.334 0.621
--------------------------------------------------------------------------------
ExterCond 
       n  missing distinct 
    1460        0        5 

lowest : Ex Fa Gd Po TA, highest: Ex Fa Gd Po TA
                                        
Value         Ex    Fa    Gd    Po    TA
Frequency      3    28   146     1  1282
Proportion 0.002 0.019 0.100 0.001 0.878
--------------------------------------------------------------------------------
Foundation 
       n  missing distinct 
    1460        0        6 

lowest : BrkTil CBlock PConc  Slab   Stone , highest: CBlock PConc  Slab   Stone  Wood  
                                                    
Value      BrkTil CBlock  PConc   Slab  Stone   Wood
Frequency     146    634    647     24      6      3
Proportion  0.100  0.434  0.443  0.016  0.004  0.002
--------------------------------------------------------------------------------
BsmtQual 
       n  missing distinct 
    1423       37        4 
                                  
Value         Ex    Fa    Gd    TA
Frequency    121    35   618   649
Proportion 0.085 0.025 0.434 0.456
--------------------------------------------------------------------------------
BsmtCond 
       n  missing distinct 
    1423       37        4 
                                  
Value         Fa    Gd    Po    TA
Frequency     45    65     2  1311
Proportion 0.032 0.046 0.001 0.921
--------------------------------------------------------------------------------
BsmtExposure 
       n  missing distinct 
    1422       38        4 
                                  
Value         Av    Gd    Mn    No
Frequency    221   134   114   953
Proportion 0.155 0.094 0.080 0.670
--------------------------------------------------------------------------------
BsmtFinType1 
       n  missing distinct 
    1423       37        6 

lowest : ALQ BLQ GLQ LwQ Rec, highest: BLQ GLQ LwQ Rec Unf
                                              
Value        ALQ   BLQ   GLQ   LwQ   Rec   Unf
Frequency    220   148   418    74   133   430
Proportion 0.155 0.104 0.294 0.052 0.093 0.302
--------------------------------------------------------------------------------
BsmtFinSF1 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      637    0.967    443.6    484.5      0.0      0.0 
     .25      .50      .75      .90      .95 
     0.0    383.5    712.2   1065.5   1274.0 

lowest :    0    2   16   20   24, highest: 1904 2096 2188 2260 5644
--------------------------------------------------------------------------------
BsmtFinType2 
       n  missing distinct 
    1422       38        6 

lowest : ALQ BLQ GLQ LwQ Rec, highest: BLQ GLQ LwQ Rec Unf
                                              
Value        ALQ   BLQ   GLQ   LwQ   Rec   Unf
Frequency     19    33    14    46    54  1256
Proportion 0.013 0.023 0.010 0.032 0.038 0.883
--------------------------------------------------------------------------------
BsmtFinSF2 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      144    0.305    46.55    86.58      0.0      0.0 
     .25      .50      .75      .90      .95 
     0.0      0.0      0.0    117.2    396.2 

lowest :    0   28   32   35   40, highest: 1080 1085 1120 1127 1474
--------------------------------------------------------------------------------
BsmtUnfSF 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      780    0.999    567.2    486.6      0.0     74.9 
     .25      .50      .75      .90      .95 
   223.0    477.5    808.0   1232.0   1468.0 

lowest :    0   14   15   23   26, highest: 2042 2046 2121 2153 2336
--------------------------------------------------------------------------------
TotalBsmtSF 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      721        1     1057    459.5    519.3    636.9 
     .25      .50      .75      .90      .95 
   795.8    991.5   1298.2   1602.2   1753.0 

lowest :    0  105  190  264  270, highest: 3094 3138 3200 3206 6110
--------------------------------------------------------------------------------
Heating 
       n  missing distinct 
    1460        0        6 

lowest : Floor GasA  GasW  Grav  OthW , highest: GasA  GasW  Grav  OthW  Wall 
                                              
Value      Floor  GasA  GasW  Grav  OthW  Wall
Frequency      1  1428    18     7     2     4
Proportion 0.001 0.978 0.012 0.005 0.001 0.003
--------------------------------------------------------------------------------
HeatingQC 
       n  missing distinct 
    1460        0        5 

lowest : Ex Fa Gd Po TA, highest: Ex Fa Gd Po TA
                                        
Value         Ex    Fa    Gd    Po    TA
Frequency    741    49   241     1   428
Proportion 0.508 0.034 0.165 0.001 0.293
--------------------------------------------------------------------------------
CentralAir 
       n  missing distinct 
    1460        0        2 
                      
Value          N     Y
Frequency     95  1365
Proportion 0.065 0.935
--------------------------------------------------------------------------------
Electrical 
       n  missing distinct 
    1459        1        5 

lowest : FuseA FuseF FuseP Mix   SBrkr, highest: FuseA FuseF FuseP Mix   SBrkr
                                        
Value      FuseA FuseF FuseP   Mix SBrkr
Frequency     94    27     3     1  1334
Proportion 0.064 0.019 0.002 0.001 0.914
--------------------------------------------------------------------------------
X1stFlrSF 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      753        1     1163    416.4    673.0    756.9 
     .25      .50      .75      .90      .95 
   882.0   1087.0   1391.2   1680.0   1831.2 

lowest :  334  372  438  480  483, highest: 2633 2898 3138 3228 4692
--------------------------------------------------------------------------------
X2ndFlrSF 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      417    0.817      347    450.2      0.0      0.0 
     .25      .50      .75      .90      .95 
     0.0      0.0    728.0    954.2   1141.0 

lowest :    0  110  167  192  208, highest: 1611 1796 1818 1872 2065
--------------------------------------------------------------------------------
LowQualFinSF 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       24    0.052    5.845    11.55        0        0 
     .25      .50      .75      .90      .95 
       0        0        0        0        0 

lowest :   0  53  80 120 144, highest: 513 514 515 528 572
--------------------------------------------------------------------------------
GrLivArea 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      861        1     1515    563.1      848      912 
     .25      .50      .75      .90      .95 
    1130     1464     1777     2158     2466 

lowest :  334  438  480  520  605, highest: 3627 4316 4476 4676 5642
--------------------------------------------------------------------------------
BsmtFullBath 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        4    0.733   0.4253   0.5085 
                                  
Value          0     1     2     3
Frequency    856   588    15     1
Proportion 0.586 0.403 0.010 0.001
--------------------------------------------------------------------------------
BsmtHalfBath 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        3    0.159  0.05753   0.1088 
                            
Value          0     1     2
Frequency   1378    80     2
Proportion 0.944 0.055 0.001
--------------------------------------------------------------------------------
FullBath 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        4    0.766    1.565   0.5521 
                                  
Value          0     1     2     3
Frequency      9   650   768    33
Proportion 0.006 0.445 0.526 0.023
--------------------------------------------------------------------------------
HalfBath 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        3    0.706   0.3829   0.4852 
                            
Value          0     1     2
Frequency    913   535    12
Proportion 0.625 0.366 0.008
--------------------------------------------------------------------------------
BedroomAbvGr 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        8    0.815    2.866    0.818 

lowest : 0 1 2 3 4, highest: 3 4 5 6 8
                                                          
Value          0     1     2     3     4     5     6     8
Frequency      6    50   358   804   213    21     7     1
Proportion 0.004 0.034 0.245 0.551 0.146 0.014 0.005 0.001
--------------------------------------------------------------------------------
KitchenAbvGr 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        4    0.133    1.047  0.09174 
                                  
Value          0     1     2     3
Frequency      1  1392    65     2
Proportion 0.001 0.953 0.045 0.001
--------------------------------------------------------------------------------
KitchenQual 
       n  missing distinct 
    1460        0        4 
                                  
Value         Ex    Fa    Gd    TA
Frequency    100    39   586   735
Proportion 0.068 0.027 0.401 0.503
--------------------------------------------------------------------------------
TotRmsAbvGrd 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       12    0.958    6.518    1.762        4        5 
     .25      .50      .75      .90      .95 
       5        6        7        9       10 

lowest :  2  3  4  5  6, highest:  9 10 11 12 14
                                                                            
Value          2     3     4     5     6     7     8     9    10    11    12
Frequency      1    17    97   275   402   329   187    75    47    18    11
Proportion 0.001 0.012 0.066 0.188 0.275 0.225 0.128 0.051 0.032 0.012 0.008
                
Value         14
Frequency      1
Proportion 0.001
--------------------------------------------------------------------------------
Functional 
       n  missing distinct 
    1460        0        7 

lowest : Maj1 Maj2 Min1 Min2 Mod , highest: Min1 Min2 Mod  Sev  Typ 
                                                    
Value       Maj1  Maj2  Min1  Min2   Mod   Sev   Typ
Frequency     14     5    31    34    15     1  1360
Proportion 0.010 0.003 0.021 0.023 0.010 0.001 0.932
--------------------------------------------------------------------------------
Fireplaces 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        4    0.806    0.613   0.6566 
                                  
Value          0     1     2     3
Frequency    690   650   115     5
Proportion 0.473 0.445 0.079 0.003
--------------------------------------------------------------------------------
FireplaceQu 
       n  missing distinct 
     770      690        5 

lowest : Ex Fa Gd Po TA, highest: Ex Fa Gd Po TA
                                        
Value         Ex    Fa    Gd    Po    TA
Frequency     24    33   380    20   313
Proportion 0.031 0.043 0.494 0.026 0.406
--------------------------------------------------------------------------------
GarageType 
       n  missing distinct 
    1379       81        6 

lowest : 2Types  Attchd  Basment BuiltIn CarPort
highest: Attchd  Basment BuiltIn CarPort Detchd 
                                                          
Value       2Types  Attchd Basment BuiltIn CarPort  Detchd
Frequency        6     870      19      88       9     387
Proportion   0.004   0.631   0.014   0.064   0.007   0.281
--------------------------------------------------------------------------------
GarageYrBlt 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1379       81       97        1     1979    27.63     1930     1945 
     .25      .50      .75      .90      .95 
    1961     1980     2002     2006     2007 

lowest : 1900 1906 1908 1910 1914, highest: 2006 2007 2008 2009 2010
--------------------------------------------------------------------------------
GarageFinish 
       n  missing distinct 
    1379       81        3 
                            
Value        Fin   RFn   Unf
Frequency    352   422   605
Proportion 0.255 0.306 0.439
--------------------------------------------------------------------------------
GarageCars 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        5    0.802    1.767   0.7609 

lowest : 0 1 2 3 4, highest: 0 1 2 3 4
                                        
Value          0     1     2     3     4
Frequency     81   369   824   181     5
Proportion 0.055 0.253 0.564 0.124 0.003
--------------------------------------------------------------------------------
GarageArea 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      441        1      473    234.9      0.0    240.0 
     .25      .50      .75      .90      .95 
   334.5    480.0    576.0    757.1    850.1 

lowest :    0  160  164  180  186, highest: 1220 1248 1356 1390 1418
--------------------------------------------------------------------------------
GarageQual 
       n  missing distinct 
    1379       81        5 

lowest : Ex Fa Gd Po TA, highest: Ex Fa Gd Po TA
                                        
Value         Ex    Fa    Gd    Po    TA
Frequency      3    48    14     3  1311
Proportion 0.002 0.035 0.010 0.002 0.951
--------------------------------------------------------------------------------
GarageCond 
       n  missing distinct 
    1379       81        5 

lowest : Ex Fa Gd Po TA, highest: Ex Fa Gd Po TA
                                        
Value         Ex    Fa    Gd    Po    TA
Frequency      2    35     9     7  1326
Proportion 0.001 0.025 0.007 0.005 0.962
--------------------------------------------------------------------------------
PavedDrive 
       n  missing distinct 
    1460        0        3 
                            
Value          N     P     Y
Frequency     90    30  1340
Proportion 0.062 0.021 0.918
--------------------------------------------------------------------------------
WoodDeckSF 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      274    0.858    94.24      125        0        0 
     .25      .50      .75      .90      .95 
       0        0      168      262      335 

lowest :   0  12  24  26  28, highest: 668 670 728 736 857
--------------------------------------------------------------------------------
OpenPorchSF 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      202    0.909    46.66    62.43        0        0 
     .25      .50      .75      .90      .95 
       0       25       68      130      175 

lowest :   0   4   8  10  11, highest: 406 418 502 523 547
--------------------------------------------------------------------------------
EnclosedPorch 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      120    0.369    21.95    39.39      0.0      0.0 
     .25      .50      .75      .90      .95 
     0.0      0.0      0.0    112.0    180.1 

lowest :   0  19  20  24  30, highest: 301 318 330 386 552
--------------------------------------------------------------------------------
X3SsnPorch 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       20    0.049     3.41    6.739        0        0 
     .25      .50      .75      .90      .95 
       0        0        0        0        0 

lowest :   0  23  96 130 140, highest: 290 304 320 407 508
                                                                            
Value          0    23    96   130   140   144   153   162   168   180   182
Frequency   1436     1     1     1     1     2     1     1     3     2     1
Proportion 0.984 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.002 0.001 0.001
                                                                
Value        196   216   238   245   290   304   320   407   508
Frequency      1     2     1     1     1     1     1     1     1
Proportion 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
--------------------------------------------------------------------------------
ScreenPorch 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       76     0.22    15.06    28.27        0        0 
     .25      .50      .75      .90      .95 
       0        0        0        0      160 

lowest :   0  40  53  60  63, highest: 385 396 410 440 480
--------------------------------------------------------------------------------
PoolArea 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        8    0.014    2.759    5.497 

lowest :   0 480 512 519 555, highest: 519 555 576 648 738
                                                          
Value          0   480   512   519   555   576   648   738
Frequency   1453     1     1     1     1     1     1     1
Proportion 0.995 0.001 0.001 0.001 0.001 0.001 0.001 0.001
--------------------------------------------------------------------------------
PoolQC 
       n  missing distinct 
       7     1453        3 
                            
Value         Ex    Fa    Gd
Frequency      2     2     3
Proportion 0.286 0.286 0.429
--------------------------------------------------------------------------------
Fence 
       n  missing distinct 
     281     1179        4 
                                  
Value      GdPrv  GdWo MnPrv  MnWw
Frequency     59    54   157    11
Proportion 0.210 0.192 0.559 0.039
--------------------------------------------------------------------------------
MiscFeature 
       n  missing distinct 
      54     1406        4 
                                  
Value       Gar2  Othr  Shed  TenC
Frequency      2     2    49     1
Proportion 0.037 0.037 0.907 0.019
--------------------------------------------------------------------------------
MiscVal 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       21    0.103    43.49    85.67        0        0 
     .25      .50      .75      .90      .95 
       0        0        0        0        0 

lowest :     0    54   350   400   450, highest:  2000  2500  3500  8300 15500
                                                                            
Value          0    50   350   400   450   500   550   600   700   800  1150
Frequency   1408     1     1    11     4    10     1     5     5     1     1
Proportion 0.964 0.001 0.001 0.008 0.003 0.007 0.001 0.003 0.003 0.001 0.001
                                                          
Value       1200  1300  1400  2000  2500  3500  8300 15500
Frequency      2     1     1     4     1     1     1     1
Proportion 0.001 0.001 0.001 0.003 0.001 0.001 0.001 0.001

For the frequency table, variable is rounded to the nearest 50
--------------------------------------------------------------------------------
MoSold 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0       12    0.985    6.322    3.041        2        3 
     .25      .50      .75      .90      .95 
       5        6        8       10       11 

lowest :  1  2  3  4  5, highest:  8  9 10 11 12
                                                                            
Value          1     2     3     4     5     6     7     8     9    10    11
Frequency     58    52   106   141   204   253   234   122    63    89    79
Proportion 0.040 0.036 0.073 0.097 0.140 0.173 0.160 0.084 0.043 0.061 0.054
                
Value         12
Frequency     59
Proportion 0.040
--------------------------------------------------------------------------------
YrSold 
       n  missing distinct     Info     Mean      Gmd 
    1460        0        5    0.955     2008    1.498 

lowest : 2006 2007 2008 2009 2010, highest: 2006 2007 2008 2009 2010
                                        
Value       2006  2007  2008  2009  2010
Frequency    314   329   304   338   175
Proportion 0.215 0.225 0.208 0.232 0.120
--------------------------------------------------------------------------------
SaleType 
       n  missing distinct 
    1460        0        9 

lowest : COD   Con   ConLD ConLI ConLw, highest: ConLw CWD   New   Oth   WD   
                                                                
Value        COD   Con ConLD ConLI ConLw   CWD   New   Oth    WD
Frequency     43     2     9     5     5     4   122     3  1267
Proportion 0.029 0.001 0.006 0.003 0.003 0.003 0.084 0.002 0.868
--------------------------------------------------------------------------------
SaleCondition 
       n  missing distinct 
    1460        0        6 

lowest : Abnorml AdjLand Alloca  Family  Normal 
highest: AdjLand Alloca  Family  Normal  Partial
                                                          
Value      Abnorml AdjLand  Alloca  Family  Normal Partial
Frequency      101       4      12      20    1198     125
Proportion   0.069   0.003   0.008   0.014   0.821   0.086
--------------------------------------------------------------------------------
SalePrice 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
    1460        0      663        1   180921    81086    88000   106475 
     .25      .50      .75      .90      .95 
  129975   163000   214000   278000   326100 

lowest :  34900  35311  37900  39300  40000, highest: 582933 611657 625000 745000 755000
--------------------------------------------------------------------------------

Provide a scatterplot matrix for at least two of the independent variables and the dependent variable.

Derive a correlation matrix for any three quantitative variables in the dataset.
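
A minimal sketch of these two steps in R. The data frame below is simulated stand-in data so that the block runs on its own; in the report the same calls would be applied to numeric columns of hp_train. The three columns named here are an assumed choice, not necessarily the ones analyzed.

```r
# Simulated stand-in for three numeric columns of hp_train.
# The column choice (GrLivArea, TotalBsmtSF, SalePrice) is an assumption.
set.seed(1)
n <- 200
area <- rnorm(n, mean = 1500, sd = 400)
demo <- data.frame(
  GrLivArea   = area,
  TotalBsmtSF = 0.5 * area + rnorm(n, sd = 200),
  SalePrice   = 100 * area + rnorm(n, sd = 40000)
)

# Scatterplot matrix: two independent variables plus the dependent variable
pairs(demo, main = "Scatterplot matrix (stand-in data)")

# Pearson correlation matrix of the same three variables
cor_mat <- cor(demo)
print(round(cor_mat, 3))
```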

Test the hypothesis that the correlation between each pair of variables is 0, and provide an 80% confidence interval for each.
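
One way to run the pairwise tests is cor.test() with conf.level = 0.80, looping over every pair of columns. This is a sketch on simulated stand-in data; the column names are assumptions, and the printed numbers describe the simulated data, not the Kaggle data.

```r
# Simulated stand-in data; replace with three numeric columns of hp_train.
set.seed(1)
n <- 200
area <- rnorm(n, mean = 1500, sd = 400)
demo <- data.frame(
  GrLivArea   = area,
  TotalBsmtSF = 0.5 * area + rnorm(n, sd = 200),
  SalePrice   = 100 * area + rnorm(n, sd = 40000)
)

# Every unordered pair of column names
var_pairs <- combn(names(demo), 2, simplify = FALSE)

# Test H0: rho = 0 for each pair and keep the 80% confidence interval
results <- do.call(rbind, lapply(var_pairs, function(p) {
  ct <- cor.test(demo[[p[1]]], demo[[p[2]]], conf.level = 0.80)
  data.frame(
    var1    = p[1],
    var2    = p[2],
    r       = unname(ct$estimate),
    lower   = ct$conf.int[1],
    upper   = ct$conf.int[2],
    p_value = ct$p.value
  )
}))
print(results)
```

A pair whose 80% interval excludes 0 is one for which the null hypothesis of zero correlation is rejected at the 20% level.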

Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?
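
As a rough guide to the familywise question, suppose the three pairwise tests are each run at \(\alpha = 0.20\) (the level matching an 80% confidence interval) and, as a simplifying assumption, are independent. The probability of at least one false rejection among \(m = 3\) tests is then

\[
\text{FWER} = 1 - (1 - \alpha)^{m} = 1 - 0.8^{3} = 0.488 .
\]

A Bonferroni adjustment would test each pair at \(\alpha / m = 0.20 / 3 \approx 0.067\) to hold the familywise rate at or below 0.20. The three tests share variables and so are not truly independent, which makes 0.488 an approximation rather than an exact figure.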

Invert your correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.) Multiply the correlation matrix by the precision matrix, and then multiply the precision matrix by the correlation matrix. Conduct LU decomposition on the matrix.
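
The steps above can be sketched in base R as follows. The correlation matrix here is hypothetical (illustrative values only), and lu_doolittle is a helper written for this sketch, not a library function. It omits pivoting, which is safe for a positive-definite correlation matrix. The prompt does not say which matrix to decompose; the correlation matrix is assumed.

```r
# Hypothetical 3 x 3 correlation matrix (illustrative values only)
R <- matrix(c(1.0, 0.6, 0.7,
              0.6, 1.0, 0.5,
              0.7, 0.5, 1.0), nrow = 3, byrow = TRUE)

P   <- solve(R)   # precision matrix (inverse of the correlation matrix)
vif <- diag(P)    # variance inflation factors sit on the diagonal

RP <- R %*% P     # correlation x precision: identity up to rounding
PR <- P %*% R     # precision x correlation: identity up to rounding

# Doolittle LU factorization without pivoting: A = L %*% U,
# with L unit lower triangular and U upper triangular.
lu_doolittle <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  U <- A
  for (k in seq_len(n - 1)) {
    for (i in (k + 1):n) {
      L[i, k] <- U[i, k] / U[k, k]          # elimination multiplier
      U[i, ]  <- U[i, ] - L[i, k] * U[k, ]  # zero out U[i, k]
    }
  }
  list(L = L, U = U)
}

lu <- lu_doolittle(R)
print(round(vif, 3))
print(round(lu$L, 3))
print(round(lu$U, 3))
```

Both products returning the identity confirms that the inverse is two-sided, and multiplying L by U should reproduce the original matrix.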

It often makes sense to fit a closed-form distribution to data. Select a variable in the Kaggle.com training dataset that is skewed to the right and, if necessary, shift it so that its minimum value is strictly above zero. Then load the MASS package and run fitdistr to fit an exponential probability density function (see https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html).

Find the optimal value of \(\lambda\) for this distribution, and then take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, \(\lambda\))). Plot a histogram and compare it with a histogram of your original variable.
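
A minimal sketch of the fit-and-simulate step. The variable x_raw is simulated stand-in data, not a column of the Kaggle training set, and the shift rule is one assumed way to keep every value strictly positive. For the exponential distribution the maximum likelihood estimate is \(\hat\lambda = 1/\bar{x}\), which is the rate fitdistr reports.

```r
library(MASS)  # provides fitdistr()

set.seed(42)
# Stand-in right-skewed variable (assumption; use a skewed hp_train column instead)
x_raw <- rgamma(1460, shape = 2, scale = 500)

# Shift only if needed so that the minimum is strictly above zero
shift <- if (min(x_raw) <= 0) 1 - min(x_raw) else 0
x <- x_raw + shift

# Fit the exponential density; the estimate is named "rate"
fit    <- fitdistr(x, densfun = "exponential")
lambda <- unname(fit$estimate["rate"])

# Draw 1000 samples from the fitted distribution
sims <- rexp(1000, rate = lambda)

# Side-by-side histograms of the data and the simulated values
op <- par(mfrow = c(1, 2))
hist(x, breaks = 30, main = "Original variable", xlab = "x")
hist(sims, breaks = 30, main = "Exponential samples", xlab = "simulated x")
par(op)
```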

For the fitted exponential distribution, find the 5th and 95th percentiles using the cumulative distribution function (CDF).
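
The percentiles follow from inverting the exponential CDF. With rate \(\lambda\),

\[
F(x) = 1 - e^{-\lambda x}
\quad\Longrightarrow\quad
x_p = -\frac{\ln(1 - p)}{\lambda},
\]

so that

\[
x_{0.05} = -\frac{\ln 0.95}{\lambda} \approx \frac{0.0513}{\lambda},
\qquad
x_{0.95} = -\frac{\ln 0.05}{\lambda} \approx \frac{2.9957}{\lambda}.
\]

As an illustration only, if the chosen variable were SalePrice with no shift (an assumption; the mean of 180921 comes from the describe output above), then \(\hat\lambda = 1/180921\) and the fitted percentiles would be roughly 9,280 and 542,000. In R the same values come from qexp(c(0.05, 0.95), rate = \(\lambda\)).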

Also generate a 95% confidence interval from the empirical data, assuming normality.

Finally, provide the empirical 5th percentile and 95th percentile of the data. Discuss.
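
The last two steps can be sketched as below. The data are simulated stand-ins, and the interval is read here as a normal-theory 95% confidence interval for the mean, \(\bar{x} \pm z_{0.975}\, s/\sqrt{n}\). That reading is an assumption; if an interval for individual values is intended, drop the division by \(\sqrt{n}\).

```r
set.seed(7)
# Stand-in right-skewed variable (assumption; use the variable chosen above)
x <- rexp(1460, rate = 1 / 180000)

n <- length(x)
m <- mean(x)
s <- sd(x)
z <- qnorm(0.975)  # about 1.96

# Normal-theory 95% confidence interval for the mean
ci_mean <- c(lower = m - z * s / sqrt(n), upper = m + z * s / sqrt(n))

# Empirical 5th and 95th percentiles of the data
emp <- quantile(x, probs = c(0.05, 0.95))

print(ci_mean)
print(emp)
```

For comparison, the describe output above reports empirical 5th and 95th percentiles of 88000 and 326100 for SalePrice.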

Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score.
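
A minimal sketch of the modeling workflow, not the model that was submitted. The training and test frames are simulated, make_homes is a hypothetical helper, the two predictors are an assumed choice, and the log transform of SalePrice is one common way to handle its right skew.

```r
set.seed(3)

# Hypothetical helper that simulates a small housing data set
make_homes <- function(n, id_start) {
  area <- rnorm(n, mean = 1500, sd = 400)
  qual <- sample(1:10, n, replace = TRUE)
  data.frame(
    Id          = seq(id_start, length.out = n),
    GrLivArea   = area,
    OverallQual = qual,
    SalePrice   = exp(10.5 + 0.0004 * area + 0.1 * qual + rnorm(n, sd = 0.2))
  )
}

train <- make_homes(300, id_start = 1)
test  <- make_homes(100, id_start = 301)
test$SalePrice <- NULL  # the competition test set has no response column

# Multiple regression on the log of the sale price
fit <- lm(log(SalePrice) ~ GrLivArea + OverallQual, data = train)
print(summary(fit))

# Back-transform predictions and assemble the two-column submission
submission <- data.frame(
  Id        = test$Id,
  SalePrice = exp(predict(fit, newdata = test))
)
# write.csv(submission, "submission.csv", row.names = FALSE)  # hypothetical file name
```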

Stephen (Scott) Jones

12/12/2019