Data Visualization

Introduction

This is the Week 2 assignment (Running Your First Program) for the Coursera Data Visualization course.

Reading the data and viewing the variables’ headings

# Reading the data
data1 <- read.csv("F:/Data Visualization Coursera/Data/addhealth.csv")
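The call above loads the file but does not print the variable headings. A minimal sketch of how they could be inspected is given here; `toy` and its values are invented for illustration and are not the Add Health data, so the same calls would be applied to `data1` in practice.

```r
# Toy stand-in for data1; the values are invented for illustration only.
toy <- data.frame(
  H2GI1M = c(1, 5, NA, 12),
  H1DA8  = c(10, 20, 998, 3),
  IMONTH = c(5, 6, 7, 8)
)

headings <- names(toy)  # column (variable) names
print(headings)
print(dim(toy))         # number of rows and columns
str(toy)                # type and first values of each variable
```

For a wide file, `names()` is usually the most readable of the three, since `str()` prints one line per variable.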

Frequency Tables of selected variables

library(plyr)

# Frequency table for variable 'H2GI1M' 
count(data1, 'H2GI1M')

##    H2GI1M freq
## 1       1  368
## 2       2  356
## 3       3  409
## 4       4  391
## 5       5  413
## 6       6  399
## 7       7  437
## 8       8  423
## 9       9  445
## 10     10  432
## 11     11  389
## 12     12  375
## 13     NA 1667

The frequency table shows that the cases are spread almost evenly over the 12 values of the
variable (between 356 and 445 cases each), and that there are 1667 missing cases (NAs).
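The same kind of table can also be produced without `plyr`. The sketch below swaps in base R's `table()` as an alternative to `count()` and adds percentages; the vector `x` is invented and only stands in for `data1$H2GI1M`.

```r
# Invented values standing in for data1$H2GI1M.
x <- c(1, 2, 2, 3, NA, NA, 3, 3)

freq <- table(x, useNA = "ifany")          # counts, with NA as its own level
pct  <- round(100 * prop.table(freq), 1)   # share of all rows, in percent
print(cbind(freq = freq, percent = pct))

n_missing <- sum(is.na(x))                 # direct count of missing values
print(n_missing)
```

Without `useNA`, `table()` silently drops missing values, which is why the argument is set explicitly here.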

# Frequency table for variable 'H1DA8'
count(data1, 'H1DA8')

##    H1DA8 freq
## 1      0  136
## 2      1  136
## 3      2  303
## 4      3  338
## 5      4  330
## 6      5  517
## 7      6  247
## 8      7  261
## 9      8  255
## 10     9   74
## 11    10  624
## 12    11   26
## 13    12  209
## 14    13   28
## 15    14  294
## 16    15  336
## 17    16   52
## 18    17   17
## 19    18   41
## 20    19   10
## 21    20  436
## 22    21  268
## 23    22   10
## 24    23   11
## 25    24  102
## 26    25  158
## 27    26    5
## 28    27    9
## 29    28  197
## 30    29    4
## 31    30  230
## 32    31    1
## 33    32   10
## 34    33    5
## 35    34    1
## 36    35  256
## 37    36   13
## 38    37    1
## 39    38    3
## 40    40  112
## 41    42   66
## 42    43    1
## 43    44    1
## 44    45   16
## 45    46    2
## 46    47    2
## 47    48   30
## 48    49   33
## 49    50   50
## 50    51    1
## 51    52    2
## 52    53    1
## 53    54    1
## 54    55    3
## 55    56   37
## 56    57    1
## 57    60   24
## 58    63    7
## 59    64    3
## 60    65    2
## 61    68    1
## 62    69    1
## 63    70   49
## 64    72   14
## 65    74    2
## 66    75    6
## 67    76    1
## 68    77    4
## 69    78    6
## 70    80    4
## 71    84    8
## 72    85    1
## 73    86    1
## 74    90    9
## 75    91    1
## 76    94    1
## 77    95    1
## 78    96    2
## 79    98    2
## 80    99   14
## 81   996    3
## 82   998   24

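In the frequency table above, the counts per value vary from as low as 1 to as high as 624 (at value 10).
There are no NA values, although 996 and 998 lie far outside the 0 to 99 range of the other values and may be reserved codes rather than real measurements.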
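If the values 996 and 998 in the table above turn out to be reserved codes (an assumption that should be checked against the codebook), they could be set to NA before any further summary. A minimal sketch with invented values standing in for `data1$H1DA8`:

```r
# Invented values standing in for data1$H1DA8.
hours <- c(0, 10, 20, 996, 35, 998, 99)

# ASSUMPTION: 996 and 998 are reserved codes, not real values.
reserved <- c(996, 998)
hours_clean <- ifelse(hours %in% reserved, NA, hours)

print(table(hours_clean, useNA = "ifany"))  # reserved codes now counted as NA
print(summary(hours_clean))                 # summary() reports the NA count
```

Keeping the recoded values in a new vector leaves the original column untouched, so the step is easy to undo if the assumption proves wrong.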

# Frequency table for variable 'IMONTH'
count(data1, 'IMONTH')

##    IMONTH freq
## 1       1    3
## 2       4   92
## 3       5 1593
## 4       6 1810
## 5       7 1300
## 6       8 1102
## 7       9  438
## 8      10  117
## 9      11   31
## 10     12   18

Finally, the frequency table above shows that May through August (months 5 to 8) have the highest frequencies, that no cases fall in months 2 or 3, and that there are no missing cases in the data.
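Month numbers are easier to read as names, and months with no cases are easier to spot when they appear with a count of zero. A sketch of one way to do this, using invented values in place of `data1$IMONTH`:

```r
# Invented values standing in for data1$IMONTH.
imonth <- c(5, 6, 6, 7, 8, 5, 6, 12)

# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec".
month_f <- factor(imonth, levels = 1:12, labels = month.abb)
month_freq <- table(month_f)   # months with no cases appear with count 0
print(month_freq)
```

Fixing the factor levels at 1 to 12 is what keeps the empty months in the table; a table of the raw numbers would simply omit them.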

Plotting Histograms of selected variables

# Histogram of IMONTH
hist(data1$IMONTH)

# Histogram of H1DA8
hist(data1$H1DA8)

# Histogram of H2GI1M
hist(data1$H2GI1M)

# Histogram of age
hist(data1$age)
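For variables that take only integer codes, such as a month number, the default bins chosen by `hist()` may not line up with the codes. A sketch of one bin per value follows; the data, title and axis label are invented for illustration.

```r
# Invented values standing in for data1$IMONTH.
imonth <- c(1, 4, 5, 5, 6, 6, 6, 7, 8, 12)

# One bin per month: edges at 0.5, 1.5, ..., 12.5 centre each bar on an integer.
edges <- seq(0.5, 12.5, by = 1)
h <- hist(imonth, breaks = edges, plot = FALSE)   # compute bins without drawing
print(h$counts)

# Draw the histogram with a title and axis label (hypothetical wording).
hist(imonth, breaks = edges,
     main = "Interview month", xlab = "Month (1 = January)")
```

With half-integer edges no value ever falls on a bin boundary, so each bar height matches the corresponding row of the frequency table.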

Manoj Kumar

November 21, 2015
