Project Proposal

Data Description

Context and Purpose of Data:

I will explore the Eating and Health (EH) Module datasets, focusing on how parameters such as health, exercise, income, and weight interact. The data are collected by USDA’s Economic Research Service along with other cosponsors. They cover American Time Use Survey (ATUS) respondents' primary and secondary eating habits (secondary eating means eating while doing another activity); soft drink consumption; grocery shopping preferences and fast food purchases; meal preparation and food safety practices; food assistance participation; general health, height and weight, and exercise; and income. The data have been collected in several years; I will focus on the data captured for 2014.

More information about the data is available at https://www.ers.usda.gov/data-products/eating-and-health-module-atus/

Actual data source: http://www.bls.gov/tus/special.requests/ehresp_2014.zip

Data Source for Analysis: https://raw.githubusercontent.com/taus01/EatingHabit/master/ehresp_2014.dat

Content: The EH Respondent file contains information about EH respondents, including general health and body mass index. There are 11,212 observations (respondents) and 37 variables.

There are 34 integer variables and 3 numeric variables in the data set.

The complete data dictionary can be found at: http://www.bls.gov/tus/ehmintcodebk1416.pdf

Missing Values: Certain variables contain invalid entries, which I will treat as missing values. For example, the EUSTREASON variable has negative values, which are not valid.
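
The recoding described above could look roughly like the sketch below. It uses a small toy data frame in place of the real file, and the helper `neg_to_na` is a hypothetical name introduced only for illustration. It assumes that every negative value is a non-response code; that assumption should be checked against the data dictionary for each variable before applying it to all 37 columns.

```r
# Toy data standing in for a few columns of the EH Respondent file
# (values are illustrative, not copied from the real data)
toy <- data.frame(
  ERBMI      = c(33.2, 22.7, -1.0, 31.0),
  EUSTREASON = c(1L, -1L, 6L, -2L),
  EUWGT      = c(170L, 128L, -2L, 210L)
)

# Hypothetical helper: replace negative codes with NA.
# Assumption: negative values never carry a valid measurement.
neg_to_na <- function(x) {
  x[!is.na(x) & x < 0] <- NA
  x
}

# Apply the helper column by column and rebuild the data frame
toy_clean <- as.data.frame(lapply(toy, neg_to_na))

# Number of missing values per variable after recoding
colSums(is.na(toy_clean))
```

Recoding before any summary matters because statistics such as the mean of ERBMI are otherwise pulled down by the negative codes.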

Import Data

library(tibble)
library(Hmisc)

url<-"https://raw.githubusercontent.com/taus01/EatingHabit/master/ehresp_2014.dat"
eh_respdt<-read.delim(url,header=TRUE,sep=",") ### the file is comma-separated
eh_respdtt<-as_tibble(eh_respdt) ### convert the data frame to a tibble
head(eh_respdtt)
## # A tibble: 6 × 37
##      TUCASEID TULINENO EEINCOME1 ERBMI ERHHCH ERINCOME ERSPEMCH ERTPREAT
##         <dbl>    <int>     <int> <dbl>  <int>    <int>    <int>    <int>
## 1 2.01401e+13        1        -2  33.2      1       -1       -1       30
## 2 2.01401e+13        1         1  22.7      3        1       -1       45
## 3 2.01401e+13        1         2  49.4      3        5       -1       60
## 4 2.01401e+13        1        -2  -1.0      3       -1       -1        0
## 5 2.01401e+13        1         2  31.0      3        5       -1       65
## 6 2.01401e+13        1         1  30.7      3        1        1       20
## # ... with 29 more variables: ERTSEAT <int>, ETHGT <int>, ETWGT <int>,
## #   EUDIETSODA <int>, EUDRINK <int>, EUEAT <int>, EUEXERCISE <int>,
## #   EUEXFREQ <int>, EUFASTFD <int>, EUFASTFDFRQ <int>, EUFFYDAY <int>,
## #   EUFDSIT <int>, EUFINLWGT <dbl>, EUSNAP <int>, EUGENHTH <int>,
## #   EUGROSHP <int>, EUHGT <int>, EUINCLVL <int>, EUINCOME2 <int>,
## #   EUMEAT <int>, EUMILK <int>, EUPRPMEL <int>, EUSODA <int>,
## #   EUSTORES <int>, EUSTREASON <int>, EUTHERM <int>, EUWGT <int>,
## #   EUWIC <int>, EXINCOME1 <int>
dim(eh_respdtt) ### dimensions of the data
## [1] 11212    37
str(eh_respdtt)
## Classes 'tbl_df', 'tbl' and 'data.frame':    11212 obs. of  37 variables:
##  $ TUCASEID   : num  2.01e+13 2.01e+13 2.01e+13 2.01e+13 2.01e+13 ...
##  $ TULINENO   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EEINCOME1  : int  -2 1 2 -2 2 1 1 1 1 1 ...
##  $ ERBMI      : num  33.2 22.7 49.4 -1 31 30.7 33.3 27.5 25.8 28.3 ...
##  $ ERHHCH     : int  1 3 3 3 3 3 1 3 3 3 ...
##  $ ERINCOME   : int  -1 1 5 -1 5 1 1 1 1 1 ...
##  $ ERSPEMCH   : int  -1 -1 -1 -1 -1 1 5 -1 -1 5 ...
##  $ ERTPREAT   : int  30 45 60 0 65 20 30 30 117 80 ...
##  $ ERTSEAT    : int  2 14 0 0 0 10 5 5 10 0 ...
##  $ ETHGT      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ETWGT      : int  0 0 0 -1 0 0 0 0 0 0 ...
##  $ EUDIETSODA : int  -1 -1 -1 2 -1 1 -1 -1 -1 2 ...
##  $ EUDRINK    : int  2 2 1 1 1 1 1 2 2 1 ...
##  $ EUEAT      : int  1 1 2 2 2 1 1 1 1 2 ...
##  $ EUEXERCISE : int  2 2 2 2 1 1 2 1 1 2 ...
##  $ EUEXFREQ   : int  -1 -1 -1 -1 5 2 -1 3 6 -1 ...
##  $ EUFASTFD   : int  2 1 2 2 2 1 1 1 2 1 ...
##  $ EUFASTFDFRQ: int  -1 1 -1 -1 -1 3 3 1 -1 2 ...
##  $ EUFFYDAY   : int  -1 2 -1 -1 -1 1 2 2 -1 1 ...
##  $ EUFDSIT    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EUFINLWGT  : num  5202085 29396791 26009936 2728880 17527153 ...
##  $ EUSNAP     : int  1 2 2 2 1 2 2 2 2 2 ...
##  $ EUGENHTH   : int  1 2 5 2 4 3 2 2 3 1 ...
##  $ EUGROSHP   : int  1 3 2 1 1 2 3 1 1 1 ...
##  $ EUHGT      : int  60 63 62 64 69 71 65 63 70 65 ...
##  $ EUINCLVL   : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ EUINCOME2  : int  -2 -1 2 -2 2 -1 -1 -1 -1 -1 ...
##  $ EUMEAT     : int  1 1 -1 2 1 -1 1 1 1 1 ...
##  $ EUMILK     : int  2 2 -1 2 2 -1 2 2 2 2 ...
##  $ EUPRPMEL   : int  1 1 2 1 1 2 3 1 1 1 ...
##  $ EUSODA     : int  -1 -1 2 1 2 1 2 -1 -1 1 ...
##  $ EUSTORES   : int  2 1 -1 2 1 -1 2 1 1 3 ...
##  $ EUSTREASON : int  1 2 -1 6 1 -1 5 3 4 1 ...
##  $ EUTHERM    : int  2 2 -1 -1 2 -1 2 2 2 2 ...
##  $ EUWGT      : int  170 128 270 -2 210 220 200 155 180 170 ...
##  $ EUWIC      : int  1 2 2 2 1 2 2 -1 -1 -1 ...
##  $ EXINCOME1  : int  2 0 12 2 0 0 0 0 0 0 ...
describe(eh_respdtt)
## eh_respdtt 
## 
##  37  Variables      11212  Observations
## ---------------------------------------------------------------------------
## TUCASEID 
##         n   missing  distinct      Info      Mean       Gmd       .05 
##     11212         0     11212         1 2.014e+13 397798037 2.014e+13 
##       .10       .25       .50       .75       .90       .95 
## 2.014e+13 2.014e+13 2.014e+13 2.014e+13 2.014e+13 2.014e+13 
## 
## lowest : 2.014010e+13 2.014010e+13 2.014010e+13 2.014010e+13 2.014010e+13
## highest: 2.014121e+13 2.014121e+13 2.014121e+13 2.014121e+13 2.014121e+13 
##                                                               
## Value      2.014010e+13 2.014011e+13 2.014020e+13 2.014021e+13
## Frequency           107          766          786          136
## Proportion        0.010        0.068        0.070        0.012
##                                                               
## Value      2.014030e+13 2.014031e+13 2.014040e+13 2.014050e+13
## Frequency          1051            4          836          865
## Proportion        0.094        0.000        0.075        0.077
##                                                               
## Value      2.014051e+13 2.014060e+13 2.014061e+13 2.014070e+13
## Frequency            80          163          872           11
## Proportion        0.007        0.015        0.078        0.001
##                                                               
## Value      2.014071e+13 2.014081e+13 2.014091e+13 2.014101e+13
## Frequency           830          916         1056          846
## Proportion        0.074        0.082        0.094        0.075
##                                     
## Value      2.014111e+13 2.014121e+13
## Frequency           948          939
## Proportion        0.085        0.084
## ---------------------------------------------------------------------------
## TULINENO 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        1        0        1        0 
## 
## 1 (11212, 1)
## ---------------------------------------------------------------------------
## EEINCOME1 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        6    0.728    1.294   0.7169 
## 
## lowest : -3 -2 -1  1  2, highest: -2 -1  1  2  3 
## 
## -3 (140, 0.012), -2 (155, 0.014), -1 (21, 0.002), 1 (6990, 0.623), 2
## (3454, 0.308), 3 (452, 0.040)
## ---------------------------------------------------------------------------
## ERBMI 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0      375        1    26.29    8.727     -1.0     19.9 
##      .25      .50      .75      .90      .95 
##     23.0     26.5     30.4     35.4     39.2 
## 
## lowest : -1.0 13.0 13.7 13.9 14.5, highest: 60.2 61.4 66.4 68.7 73.6 
## ---------------------------------------------------------------------------
## ERHHCH 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        3    0.188    2.885    0.216 
## 
## 1 (534, 0.048), 2 (219, 0.020), 3 (10459, 0.933)
## ---------------------------------------------------------------------------
## ERINCOME 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        6    0.747    2.036    1.653 
## 
## lowest : -1  1  2  3  4, highest:  1  2  3  4  5 
## 
## -1 (280, 0.025), 1 (6990, 0.623), 2 (533, 0.048), 3 (976, 0.087), 4 (36,
## 0.003), 5 (2397, 0.214)
## ---------------------------------------------------------------------------
## ERSPEMCH 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        6    0.794    1.873    2.988 
## 
## lowest : -1  1  2  3  4, highest:  1  2  3  4  5 
## 
## -1 (5535, 0.494), 1 (232, 0.021), 2 (93, 0.008), 3 (238, 0.021), 4 (172,
## 0.015), 5 (4942, 0.441)
## ---------------------------------------------------------------------------
## ERTPREAT 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0      205    0.997    65.68    50.84        5       15 
##      .25      .50      .75      .90      .95 
##       30       60       90      125      150 
## 
## lowest :   0   1   2   3   4, highest: 365 390 466 490 508 
## ---------------------------------------------------------------------------
## ERTSEAT 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0      201    0.904    16.76    26.74        0        0 
##      .25      .50      .75      .90      .95 
##        0        3       15       30       60 
## 
## lowest :  -3  -2   0   1   2, highest: 735 765 810 844 990 
## ---------------------------------------------------------------------------
## ETHGT 
##         n   missing  distinct      Info      Mean       Gmd 
##     11212         0         4     0.064 -0.003122   0.05065 
## 
## -1 (161, 0.014), 0 (10968, 0.978), 1 (40, 0.004), 2 (43, 0.004)
## ---------------------------------------------------------------------------
## ETWGT 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        4    0.152 -0.03113    0.112 
## 
## -1 (500, 0.045), 0 (10610, 0.946), 1 (53, 0.005), 2 (49, 0.004)
## ---------------------------------------------------------------------------
## EUDIETSODA 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        6    0.608  -0.2867    1.081 
## 
## lowest : -3 -2 -1  1  2, highest: -2 -1  1  2  3 
## 
## -3 (2, 0.000), -2 (4, 0.000), -1 (8169, 0.729), 1 (1181, 0.105), 2 (1780,
## 0.159), 3 (76, 0.007)
## ---------------------------------------------------------------------------
## EUDRINK 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        4    0.663    1.326   0.4469 
## 
## -3 (1, 0.000), -2 (9, 0.001), 1 (7517, 0.670), 2 (3685, 0.329)
## ---------------------------------------------------------------------------
## EUEAT 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        4    0.747    1.432   0.5288 
## 
## -3 (2, 0.000), -2 (61, 0.005), 1 (6112, 0.545), 2 (5037, 0.449)
## ---------------------------------------------------------------------------
## EUEXERCISE 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        5    0.705    1.353   0.4982 
## 
## lowest : -3 -2 -1  1  2, highest: -3 -2 -1  1  2 
## 
## -3 (30, 0.003), -2 (8, 0.001), -1 (19, 0.002), 1 (7014, 0.626), 2 (4141,
## 0.369)
## ---------------------------------------------------------------------------
## EUEXFREQ 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0       29    0.941    2.237    3.434       -1       -1 
##      .25      .50      .75      .90      .95 
##       -1        2        4        7        7 
## 
## lowest : -3 -2 -1  1  2, highest: 25 28 30 35 38 
## ---------------------------------------------------------------------------
## EUFASTFD 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        5    0.734    1.407   0.5108 
## 
## lowest : -3 -2 -1  1  2, highest: -3 -2 -1  1  2 
## 
## -3 (11, 0.001), -2 (26, 0.002), -1 (6, 0.001), 1 (6470, 0.577), 2 (4699,
## 0.419)
## ---------------------------------------------------------------------------
## EUFASTFDFRQ 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0       20    0.913    1.133    2.499       -1       -1 
##      .25      .50      .75      .90      .95 
##       -1        1        2        4        6 
## 
## lowest : -2 -1  1  2  3, highest: 14 15 17 20 21 
##                                                                       
## Value         -2    -1     1     2     3     4     5     6     7     8
## Frequency     30  4742  2119  1779  1065   537   376   130   251    34
## Proportion 0.003 0.423 0.189 0.159 0.095 0.048 0.034 0.012 0.022 0.003
##                                                                       
## Value          9    10    11    12    13    14    15    17    20    21
## Frequency     10    66     5    18     2    25    11     4     4     4
## Proportion 0.001 0.006 0.000 0.002 0.000 0.002 0.001 0.000 0.000 0.000
## ---------------------------------------------------------------------------
## EUFFYDAY 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        5    0.866   0.5181    1.442 
## 
## lowest : -3 -2 -1  1  2, highest: -3 -2 -1  1  2 
## 
## -3 (2, 0.000), -2 (2, 0.000), -1 (4745, 0.423), 1 (2362, 0.211), 2 (4101,
## 0.366)
## ---------------------------------------------------------------------------
## EUFDSIT 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        6    0.184    1.059   0.1663 
## 
## lowest : -3 -2 -1  1  2, highest: -2 -1  1  2  3 
## 
## -3 (21, 0.002), -2 (12, 0.001), -1 (18, 0.002), 1 (10477, 0.934), 2 (548,
## 0.049), 3 (136, 0.012)
## ---------------------------------------------------------------------------
## EUFINLWGT 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0    11191        1  8206540  6819055  1887407  2324833 
##      .25      .50      .75      .90      .95 
##  3497206  6005618 10273325 16871894 21768757 
## 
## lowest :    756843.8    809689.9    824944.5    847037.5    849728.2
## highest:  77669591.6  77792880.8  81002063.2  86042323.1 103211628.8 
## ---------------------------------------------------------------------------
## EUSNAP 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        5    0.296    1.868   0.2384 
## 
## lowest : -3 -2 -1  1  2, highest: -3 -2 -1  1  2 
## 
## -3 (21, 0.002), -2 (38, 0.003), -1 (18, 0.002), 1 (1164, 0.104), 2 (9971,
## 0.889)
## ---------------------------------------------------------------------------
## EUGENHTH 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        8    0.924    2.477    1.212 
## 
## lowest : -3 -2 -1  1  2, highest:  1  2  3  4  5 
## 
## -3 (29, 0.003), -2 (36, 0.003), -1 (19, 0.002), 1 (2017, 0.180), 2 (3757,
## 0.335), 3 (3491, 0.311), 4 (1367, 0.122), 5 (496, 0.044)
## ---------------------------------------------------------------------------
## EUGROSHP 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        5    0.746    1.503    0.687 
## 
## lowest : -3 -2  1  2  3, highest: -3 -2  1  2  3 
## 
## -3 (1, 0.000), -2 (2, 0.000), 1 (6914, 0.617), 2 (2940, 0.262), 3 (1355,
## 0.121)
## ---------------------------------------------------------------------------
## EUHGT 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0       25    0.995    65.63    6.492    60.00    61.00 
##      .25      .50      .75      .90      .95 
##    63.00    66.00    70.00    72.00    73.45 
## 
## lowest : -3 -2 -1 56 57, highest: 73 74 75 76 77 
## ---------------------------------------------------------------------------
## EUINCLVL 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        2    0.436    5.177   0.2908 
## 
## 5 (9232, 0.823), 6 (1980, 0.177)
## ---------------------------------------------------------------------------
## EUINCOME2 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        6    0.768  -0.2313    1.453 
## 
## lowest : -3 -2 -1  1  2, highest: -2 -1  1  2  3 
## 
## -3 (282, 0.025), -2 (599, 0.053), -1 (6818, 0.608), 1 (1116, 0.100), 2
## (2038, 0.182), 3 (359, 0.032)
## ---------------------------------------------------------------------------
## EUMEAT 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        4    0.716   0.5293   0.9542 
## 
## -2 (10, 0.001), -1 (3089, 0.276), 1 (7182, 0.641), 2 (931, 0.083)
## ---------------------------------------------------------------------------
## EUMILK 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        5    0.621    1.158    1.212 
## 
## lowest : -3 -2 -1  1  2, highest: -3 -2 -1  1  2 
## 
## -3 (2, 0.000), -2 (1, 0.000), -1 (3090, 0.276), 1 (158, 0.014), 2 (7961,
## 0.710)
## ---------------------------------------------------------------------------
## EUPRPMEL 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        6    0.734    1.465   0.6607 
## 
## lowest : -3 -2 -1  1  2, highest: -2 -1  1  2  3 
## 
## -3 (12, 0.001), -2 (4, 0.000), -1 (10, 0.001), 1 (7011, 0.625), 2 (3061,
## 0.273), 3 (1114, 0.099)
## ---------------------------------------------------------------------------
## EUSODA 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        4    0.881   0.7385    1.365 
## 
## -2 (4, 0.000), -1 (3695, 0.330), 1 (3043, 0.271), 2 (4470, 0.399)
## ---------------------------------------------------------------------------
## EUSTORES 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        8    0.855   0.7889    1.339 
## 
## lowest : -3 -2 -1  1  2, highest:  1  2  3  4  5 
## 
## -3 (5, 0.000), -2 (58, 0.005), -1 (2941, 0.262), 1 (5549, 0.495), 2 (2058,
## 0.184), 3 (358, 0.032), 4 (37, 0.003), 5 (206, 0.018)
## ---------------------------------------------------------------------------
## EUSTREASON 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        9    0.946    1.367    2.047 
## 
## lowest : -3 -2 -1  1  2, highest:  2  3  4  5  6 
## 
## -3 (8, 0.001), -2 (65, 0.006), -1 (3008, 0.268), 1 (2648, 0.236), 2 (3047,
## 0.272), 3 (1094, 0.098), 4 (710, 0.063), 5 (172, 0.015), 6 (460, 0.041)
## ---------------------------------------------------------------------------
## EUTHERM 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        5    0.773    0.844    1.415 
## 
## lowest : -3 -2 -1  1  2, highest: -3 -2 -1  1  2 
## 
## -3 (1, 0.000), -2 (5, 0.000), -1 (4030, 0.359), 1 (846, 0.075), 2 (6330,
## 0.565)
## ---------------------------------------------------------------------------
## EUWGT 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0      227    0.999    168.2    59.58      100      120 
##      .25      .50      .75      .90      .95 
##      140      168      200      232      260 
## 
## lowest :  -5  -3  -2  -1  98, highest: 333 334 335 337 340 
## ---------------------------------------------------------------------------
## EUWIC 
##        n  missing distinct     Info     Mean      Gmd 
##    11212        0        5    0.779   0.5121    1.507 
## 
## lowest : -3 -2 -1  1  2, highest: -3 -2 -1  1  2 
## 
## -3 (12, 0.001), -2 (25, 0.002), -1 (5370, 0.479), 1 (412, 0.037), 2 (5393,
## 0.481)
## ---------------------------------------------------------------------------
## EXINCOME1 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##    11212        0       20    0.248    4.475    8.437     0.00     0.00 
##      .25      .50      .75      .90      .95 
##     0.00     0.00     0.00     0.00    72.45 
## 
## lowest : -1  0  2  3 12, highest: 83 84 85 86 87 
##                                                                       
## Value         -1     0     2     3    12    13    71    72    73    74
## Frequency     21 10197   155   140    50     3    47    38   237    30
## Proportion 0.002 0.909 0.014 0.012 0.004 0.000 0.004 0.003 0.021 0.003
##                                                                       
## Value         75    76    77    81    82    83    84    85    86    87
## Frequency    126     6     4    75    24    11    31    10     1     6
## Proportion 0.011 0.001 0.000 0.007 0.002 0.001 0.003 0.001 0.000 0.001
## ---------------------------------------------------------------------------
sum(is.na(eh_respdtt)) ### are there any NA values in the data?
## [1] 0
sapply(eh_respdtt, class) ### class of each variable
##    TUCASEID    TULINENO   EEINCOME1       ERBMI      ERHHCH    ERINCOME 
##   "numeric"   "integer"   "integer"   "numeric"   "integer"   "integer" 
##    ERSPEMCH    ERTPREAT     ERTSEAT       ETHGT       ETWGT  EUDIETSODA 
##   "integer"   "integer"   "integer"   "integer"   "integer"   "integer" 
##     EUDRINK       EUEAT  EUEXERCISE    EUEXFREQ    EUFASTFD EUFASTFDFRQ 
##   "integer"   "integer"   "integer"   "integer"   "integer"   "integer" 
##    EUFFYDAY     EUFDSIT   EUFINLWGT      EUSNAP    EUGENHTH    EUGROSHP 
##   "integer"   "integer"   "numeric"   "integer"   "integer"   "integer" 
##       EUHGT    EUINCLVL   EUINCOME2      EUMEAT      EUMILK    EUPRPMEL 
##   "integer"   "integer"   "integer"   "integer"   "integer"   "integer" 
##      EUSODA    EUSTORES  EUSTREASON     EUTHERM       EUWGT       EUWIC 
##   "integer"   "integer"   "integer"   "integer"   "integer"   "integer" 
##   EXINCOME1 
##   "integer"

Data Cleaning

I have yet to clean the data. From a preliminary summary of the data, I have observed the following issues that need attention during data cleaning:

  1. The data dictionary provided above gives the valid range of different variables. I need to check for invalid values and treat them appropriately.

  2. Some variables are coded as integers although they are categorical.

  3. There are other Eating and Health Module datasets that I need to combine with this one to get interesting facts about respondents.
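
Items 2 and 3 might be handled along the lines of the sketch below, which again uses toy data. The labels given to EUGENHTH are an assumption to be verified against the data dictionary, and the second table `other` with its column `n_eating_episodes` is entirely hypothetical; it only stands in for whichever companion file is eventually joined on TUCASEID.

```r
# Toy respondent rows; values are illustrative, not taken from the real file
resp <- data.frame(
  TUCASEID = c(1, 2, 3),
  EUGENHTH = c(1L, 5L, 3L)
)

# Item 2: turn integer codes into a labelled, ordered factor.
# The labels are an assumption and should be checked against the codebook.
resp$EUGENHTH <- factor(
  resp$EUGENHTH,
  levels  = 1:5,
  labels  = c("Excellent", "Very good", "Good", "Fair", "Poor"),
  ordered = TRUE
)

# Item 3: join a second table on the respondent identifier.
# `other` and `n_eating_episodes` are hypothetical placeholders.
other <- data.frame(
  TUCASEID          = c(1, 2, 4),
  n_eating_episodes = c(2L, 0L, 1L)
)

# Left join: keep every respondent, even those without a match in `other`
combined <- merge(resp, other, by = "TUCASEID", all.x = TRUE)
combined
```

A left join (`all.x = TRUE`) keeps respondents that have no matching row in the second table, so unmatched cases show up as NA instead of silently disappearing.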

Planned Analysis

I am planning the following analyses of the data:

  1. Effect of different eating habits on physical health
  2. Relationship between time spent in primary and secondary eating and physical health
  3. Is regular exercise related to higher income?
  4. Are respondents' shopping habits related to income?
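
As a rough illustration of how a question like item 3 could be approached, the sketch below cross-tabulates exercise against an income group on toy data. The column names `exercise` and `income` and their labels are hypothetical stand-ins for recoded versions of variables such as EUEXERCISE and ERINCOME, and the sketch ignores the survey weights (EUFINLWGT), which a real analysis would probably need to take into account.

```r
# Toy data: one row per respondent, already recoded to readable labels
toy <- data.frame(
  exercise = c("Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No"),
  income   = c("High", "High", "Low", "High", "Low", "Low", "Low", "High")
)

# Counts of respondents by income group (rows) and exercise (columns)
tab <- table(toy$income, toy$exercise)

# Row proportions: share of each income group that did or did not exercise
round(prop.table(tab, margin = 1), 2)
```

Comparing row proportions across income groups gives a first descriptive look at the relationship before any formal test is chosen.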

Note: I may revise some of these analyses and add others as the work proceeds.