Exploratory Data Analysis in R. Choose an interesting dataset and use R graphics to describe the data. You may use base R graphics, or a graphics package of your choice. You should include at least one example of each of the following: histogram, boxplot, scatterplot. Do the graphics provide insight into any relationships in the data?
For this assignment I am using a dataset that I found on https://www.reddit.com/r/datasets/ a few months back and have always wanted to analyze. Now that I am familiar with R, I will use this opportunity to get some insights into precipitation levels in California. The submission includes a file named “cal_precipitation.txt”. This tab-delimited text file contains monthly and annual precipitation amounts for California from 1895 through April 2016.
ls()
## character(0)
getwd()
## [1] "C:/Users/rkothari/Documents/MSDA Brigde courses/R"
setwd("C:/Users/rkothari/Documents/MSDA Brigde courses/R")
getwd()
## [1] "C:/Users/rkothari/Documents/MSDA Brigde courses/R"
df.california = read.table("cal_precipitation.txt", header = TRUE)
df.california
## ST.CD YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV
## 42335 CA-ST 1895 11.26 3.17 3.47 1.65 1.78 0.08 0.23 0.20 1.63 0.53 1.84
## 42336 CA-ST 1896 8.88 0.86 4.80 4.77 2.84 0.19 0.74 0.87 0.73 1.81 5.00
## 42337 CA-ST 1897 3.93 7.92 5.93 0.87 0.49 0.69 0.19 0.18 0.56 2.38 2.21
## 42338 CA-ST 1898 1.97 3.93 1.38 0.92 2.42 0.59 0.12 0.34 0.69 0.86 1.96
## 42339 CA-ST 1899 5.51 1.35 6.72 0.98 1.28 0.67 0.13 0.46 0.20 4.06 5.17
## 42340 CA-ST 1900 3.32 1.72 3.25 3.35 1.77 0.54 0.33 0.12 0.66 3.49 5.99
## 42341 CA-ST 1901 6.25 6.91 1.57 2.48 1.69 0.08 0.09 0.67 1.35 1.95 2.81
## 42342 CA-ST 1902 1.74 10.11 4.48 2.45 1.19 0.13 0.32 0.32 0.13 1.87 4.80
## 42343 CA-ST 1903 6.31 2.48 6.80 2.01 0.46 0.30 0.09 0.16 0.38 0.94 6.49
## 42344 CA-ST 1904 1.82 10.67 9.88 2.59 0.52 0.12 0.42 0.74 2.35 2.70 1.48
## 42345 CA-ST 1905 4.70 5.28 6.85 1.59 2.34 0.20 0.05 0.26 0.53 0.30 3.08
## 42346 CA-ST 1906 8.43 4.71 9.70 2.30 3.56 1.19 0.35 0.69 0.41 0.22 2.72
## 42347 CA-ST 1907 7.97 4.32 11.88 1.48 1.12 1.43 0.26 0.36 0.31 2.48 0.60
## 42348 CA-ST 1908 4.82 4.76 1.85 0.87 1.92 0.31 0.31 0.34 0.99 1.99 1.88
## 42349 CA-ST 1909 16.23 8.53 3.74 0.35 0.53 0.35 0.23 0.66 0.66 1.87 5.61
## 42350 CA-ST 1910 4.93 2.64 2.75 0.83 0.34 0.14 0.50 0.08 0.87 1.22 3.05
## 42351 CA-ST 1911 13.06 4.54 6.09 2.23 0.97 0.33 0.52 0.01 0.83 0.69 1.34
## 42352 CA-ST 1912 3.41 1.22 5.96 3.87 2.01 0.67 0.45 0.20 1.47 1.50 2.92
## 42353 CA-ST 1913 5.18 2.71 2.20 1.67 1.40 1.32 1.55 0.94 0.28 0.20 4.71
## 42354 CA-ST 1914 14.39 5.03 1.23 3.30 0.80 1.00 0.30 0.13 0.61 1.82 1.00
## 42355 CA-ST 1915 6.97 9.76 2.40 2.52 4.26 0.06 0.38 0.25 0.23 0.18 2.24
## 42356 CA-ST 1916 16.44 3.89 3.72 0.71 0.83 0.32 0.63 0.49 0.76 2.41 1.72
## 42357 CA-ST 1917 2.76 6.37 2.06 2.17 1.25 0.04 0.55 0.24 0.26 0.03 2.03
## 42358 CA-ST 1918 1.41 6.04 7.72 0.79 0.61 0.53 0.37 0.39 2.46 1.63 3.08
## 42359 CA-ST 1919 2.70 7.82 4.12 1.30 0.74 0.05 0.45 0.10 1.45 0.80 1.36
## 42360 CA-ST 1920 1.09 3.38 6.32 3.05 0.49 0.84 0.25 0.71 0.58 3.05 4.95
## 42361 CA-ST 1921 7.08 2.70 3.25 0.65 2.83 0.34 0.19 0.31 0.60 1.19 1.77
## 42362 CA-ST 1922 3.85 7.23 4.24 1.07 1.45 0.46 0.36 0.58 0.16 1.66 3.25
## 42363 CA-ST 1923 4.14 1.28 0.66 4.32 0.69 0.99 0.26 0.32 1.69 1.28 1.10
## 42364 CA-ST 1924 1.69 1.64 3.43 1.35 0.14 0.06 0.08 0.19 0.28 3.71 2.95
## 42365 CA-ST 1925 1.92 6.46 3.00 3.70 1.67 1.09 0.52 0.63 1.06 1.82 2.27
## 42366 CA-ST 1926 3.47 6.23 0.65 5.93 1.07 0.29 0.27 0.26 0.10 1.44 8.91
## 42367 CA-ST 1927 3.86 10.41 3.27 2.88 0.97 0.62 0.24 0.12 0.42 2.57 4.54
## 42368 CA-ST 1928 2.26 2.79 5.97 2.05 1.02 0.21 0.07 0.05 0.15 0.80 3.06
## 42369 CA-ST 1929 2.30 2.60 3.15 3.00 0.26 1.80 0.13 0.32 0.44 0.29 0.04
## 42370 CA-ST 1930 6.03 3.85 3.75 2.09 2.41 0.10 0.12 0.30 0.94 0.67 3.13
## 42371 CA-ST 1931 3.72 3.03 1.98 2.09 1.25 1.09 0.19 0.81 0.47 1.81 3.64
## 42372 CA-ST 1932 3.86 5.90 1.59 2.12 1.62 0.47 0.15 0.07 0.39 0.58 1.48
## 42373 CA-ST 1933 7.79 1.14 3.44 1.05 2.47 0.42 0.20 0.26 0.26 1.90 0.32
## 42374 CA-ST 1934 2.81 3.75 1.21 0.79 1.00 1.08 0.16 0.41 0.55 2.77 4.44
## 42375 CA-ST 1935 6.16 3.18 4.22 6.20 0.46 0.03 0.26 0.79 0.31 1.59 1.24
## 42376 CA-ST 1936 6.04 11.64 2.22 1.94 0.84 1.21 0.58 0.51 0.30 1.71 0.18
## 42377 CA-ST 1937 4.85 8.67 6.05 2.21 0.48 1.02 0.40 0.10 0.21 1.50 4.47
## 42378 CA-ST 1938 4.37 11.37 11.55 2.51 0.74 0.58 0.53 0.25 0.58 2.09 1.62
## 42379 CA-ST 1939 3.98 3.01 3.50 0.93 1.33 0.27 0.31 0.29 2.60 1.62 0.64
## 42380 CA-ST 1940 9.24 9.81 5.00 1.65 0.77 0.21 0.04 0.05 0.79 2.42 1.66
## 42381 CA-ST 1941 6.66 8.17 5.59 4.99 1.60 0.63 0.30 0.64 0.28 1.93 2.29
## 42382 CA-ST 1942 4.60 4.62 2.08 4.35 2.47 0.14 0.07 0.44 0.14 0.65 5.18
## 42383 CA-ST 1943 10.62 3.59 6.21 2.66 0.68 0.78 0.20 0.09 0.18 1.77 1.29
## 42384 CA-ST 1944 3.79 7.13 2.49 2.65 0.97 0.84 0.21 0.07 0.18 1.50 6.90
## 42385 CA-ST 1945 1.39 7.63 5.87 0.89 2.01 0.67 0.24 0.87 0.37 4.22 4.11
## 42386 CA-ST 1946 1.97 2.79 4.54 0.54 0.84 0.09 0.78 0.22 0.42 1.94 7.07
## 42387 CA-ST 1947 1.42 2.29 3.82 1.10 0.66 1.02 0.26 0.32 0.23 3.52 0.98
## 42388 CA-ST 1948 2.75 3.16 4.91 6.04 1.52 0.87 0.18 0.14 0.48 1.02 1.46
## 42389 CA-ST 1949 3.22 4.02 5.53 0.32 1.91 0.12 0.11 0.22 0.20 0.48 2.71
## 42390 CA-ST 1950 7.50 3.72 3.89 2.15 0.95 0.41 0.28 0.13 0.63 4.55 6.81
## 42391 CA-ST 1951 5.16 3.29 1.86 2.30 1.28 0.17 0.21 0.45 0.19 2.49 4.29
## 42392 CA-ST 1952 9.72 3.53 6.76 2.02 0.57 0.84 0.69 0.13 0.65 0.15 2.83
## 42393 CA-ST 1953 6.01 0.85 3.28 2.95 2.32 0.79 0.25 0.42 0.09 1.06 3.52
## 42394 CA-ST 1954 7.80 4.26 6.14 2.04 0.23 1.01 0.31 0.44 0.28 0.34 3.44
## 42395 CA-ST 1955 4.76 2.38 1.02 3.02 1.23 0.22 0.38 0.44 0.45 0.48 3.62
## 42396 CA-ST 1956 9.12 3.86 0.58 2.59 2.09 0.27 0.52 0.05 0.33 2.94 0.16
## 42397 CA-ST 1957 5.38 4.73 4.12 2.09 3.95 0.25 0.15 0.13 1.17 3.27 2.29
## 42398 CA-ST 1958 5.41 9.83 7.77 5.26 0.97 1.19 0.54 0.55 0.92 0.45 1.38
## 42399 CA-ST 1959 5.72 7.12 1.14 0.84 0.76 0.08 0.16 0.29 2.08 0.38 0.23
## 42400 CA-ST 1960 4.78 6.47 3.83 2.01 1.30 0.04 0.27 0.07 0.37 0.86 5.41
## 42401 CA-ST 1961 2.18 2.58 4.17 1.30 1.36 0.34 0.21 0.88 0.47 0.97 3.64
## 42402 CA-ST 1962 2.87 10.62 3.87 0.67 1.29 0.24 0.21 0.37 0.50 4.90 1.58
## 42403 CA-ST 1963 4.46 4.98 4.56 5.89 1.64 0.92 0.03 0.38 1.44 2.23 5.95
## 42404 CA-ST 1964 4.84 0.46 3.06 1.36 1.70 0.85 0.27 0.28 0.26 1.09 5.16
## 42405 CA-ST 1965 5.07 1.21 1.80 5.54 0.38 0.49 0.47 1.41 0.38 0.28 9.54
## 42406 CA-ST 1966 3.43 2.54 1.74 1.03 0.37 0.20 0.18 0.24 0.43 0.26 5.98
## 42407 CA-ST 1967 7.91 0.56 6.52 6.56 0.93 0.88 0.44 0.46 0.97 0.87 3.37
## 42408 CA-ST 1968 4.44 3.33 2.95 0.77 0.87 0.36 0.65 1.06 0.12 1.66 3.37
## 42409 CA-ST 1969 16.07 10.69 2.18 2.34 0.60 0.99 0.29 0.10 0.31 1.84 1.76
## 42410 CA-ST 1970 10.35 2.62 3.25 1.20 0.28 0.87 0.27 0.34 0.07 1.15 8.34
## 42411 CA-ST 1971 3.16 1.18 3.94 1.59 2.05 0.59 0.34 0.43 0.72 0.91 3.21
## 42412 CA-ST 1972 2.42 2.32 1.54 2.10 0.61 0.81 0.06 0.30 1.08 2.05 4.88
## 42413 CA-ST 1973 7.19 6.79 4.71 0.59 0.67 0.17 0.12 0.34 0.64 2.27 8.68
## 42414 CA-ST 1974 7.31 2.11 7.04 2.53 0.30 0.17 1.36 0.34 0.05 1.99 1.39
## 42415 CA-ST 1975 2.29 6.52 7.62 3.30 0.49 0.35 0.25 0.60 0.53 3.64 1.80
## 42416 CA-ST 1976 0.58 4.98 2.15 1.89 0.48 0.23 0.57 1.33 2.63 0.59 0.98
## 42417 CA-ST 1977 2.60 1.68 2.11 0.35 3.06 0.66 0.14 1.12 1.29 0.68 2.79
## 42418 CA-ST 1978 10.25 7.90 8.18 4.80 0.50 0.27 0.21 0.38 2.28 0.21 2.65
## 42419 CA-ST 1979 6.37 6.56 4.92 1.37 1.23 0.08 0.43 0.64 0.42 3.35 2.97
## 42420 CA-ST 1980 10.34 10.53 3.99 2.12 1.43 0.44 0.43 0.07 0.32 0.86 0.89
## 42421 CA-ST 1981 5.71 2.65 5.07 1.29 1.30 0.13 0.06 0.15 0.73 3.15 7.54
## 42422 CA-ST 1982 6.71 4.17 8.08 5.08 0.35 1.12 0.36 0.57 2.18 3.57 6.62
## 42423 CA-ST 1983 7.35 8.87 11.52 4.21 0.78 0.35 0.22 2.09 1.29 1.56 8.89
## 42424 CA-ST 1984 0.35 3.08 2.30 1.65 0.58 0.75 1.14 0.98 0.40 2.10 7.81
## 42425 CA-ST 1985 1.09 2.54 4.15 0.42 0.31 0.44 0.60 0.10 1.40 1.64 5.33
## 42426 CA-ST 1986 4.95 12.57 6.20 1.28 0.95 0.13 0.38 0.26 2.45 0.74 0.98
## 42427 CA-ST 1987 3.85 3.69 4.40 0.68 1.03 0.34 0.37 0.15 0.18 1.95 2.74
## 42428 CA-ST 1988 4.84 0.79 0.91 3.29 1.32 0.77 0.23 0.52 0.27 0.11 5.58
## 42429 CA-ST 1989 1.84 2.51 6.66 1.21 1.13 0.44 0.09 0.48 1.93 2.75 1.50
## 42430 CA-ST 1990 5.03 3.32 1.90 1.36 3.00 0.37 0.53 0.46 0.62 0.59 1.04
## 42431 CA-ST 1991 1.29 2.66 12.12 0.97 1.25 0.49 0.43 0.39 0.30 2.06 1.54
## 42432 CA-ST 1992 2.39 7.30 4.58 1.18 0.32 0.98 0.90 0.47 0.14 2.89 0.72
## 42433 CA-ST 1993 12.47 8.01 3.79 1.88 1.65 1.61 0.06 0.30 0.07 1.10 1.81
## 42434 CA-ST 1994 2.14 5.53 1.74 1.96 1.59 0.14 0.07 0.13 0.58 1.12 4.87
## 42435 CA-ST 1995 15.59 1.99 12.54 3.79 2.59 1.57 0.30 0.14 0.14 0.08 0.54
## 42436 CA-ST 1996 6.82 8.47 3.76 2.86 2.61 0.31 0.36 0.15 0.40 2.15 4.92
## 42437 CA-ST 1997 11.12 0.98 0.95 1.07 0.50 1.10 0.36 0.37 1.30 1.42 4.58
## 42438 CA-ST 1998 8.97 14.08 5.24 3.02 4.68 1.33 0.30 0.22 1.13 0.89 5.43
## 42439 CA-ST 1999 5.17 6.51 2.81 2.64 0.55 0.52 0.49 0.39 0.29 0.93 2.64
## 42440 CA-ST 2000 7.41 9.33 2.36 2.27 1.34 0.45 0.10 0.38 0.62 2.85 1.04
## 42441 CA-ST 2001 4.06 5.50 2.55 2.79 0.33 0.32 0.53 0.13 0.45 0.93 5.37
## 42442 CA-ST 2002 2.84 1.95 2.93 1.29 0.92 0.17 0.17 0.04 0.24 0.16 4.82
## 42443 CA-ST 2003 2.18 3.48 3.62 5.45 1.54 0.07 0.49 0.88 0.39 0.15 2.92
## 42444 CA-ST 2004 2.78 7.07 1.31 1.18 0.61 0.17 0.19 0.32 0.39 5.84 1.95
## 42445 CA-ST 2005 7.72 4.19 4.87 2.24 3.33 0.80 0.18 0.41 0.44 1.16 2.59
## 42446 CA-ST 2006 6.03 3.75 6.92 5.70 0.91 0.31 0.41 0.12 0.13 0.65 2.67
## 42447 CA-ST 2007 1.13 5.60 1.10 1.61 0.51 0.21 0.42 0.27 0.74 1.73 1.06
## 42448 CA-ST 2008 8.35 3.92 0.91 0.39 0.88 0.04 0.21 0.14 0.07 1.01 2.83
## 42449 CA-ST 2009 1.82 5.66 3.06 0.82 1.86 0.90 0.13 0.19 0.15 2.85 1.16
## 42450 CA-ST 2010 6.71 4.49 2.81 4.31 1.45 0.44 0.23 0.12 0.24 3.74 3.40
## 42451 CA-ST 2011 1.58 4.65 8.18 1.62 1.97 1.32 0.29 0.16 0.44 2.04 2.03
## 42452 CA-ST 2012 4.03 1.60 6.16 3.40 0.34 0.54 0.32 0.45 0.13 1.33 4.31
## 42453 CA-ST 2013 1.43 0.90 1.73 0.87 1.00 0.50 0.51 0.57 1.11 0.69 1.07
## 42454 CA-ST 2014 0.88 4.88 4.23 1.60 0.56 0.09 0.47 0.67 0.92 1.25 2.27
## 42455 CA-ST 2015 0.66 3.50 1.04 1.45 1.60 0.40 1.13 0.24 0.57 1.37 2.46
## 42456 CA-ST 2016 7.67 1.68 6.09 2.09 NA NA NA NA NA NA NA
## DEC ANNUAL
## 42335 2.95 28.79
## 42336 4.27 35.77
## 42337 3.09 28.41
## 42338 1.66 16.82
## 42339 4.64 31.16
## 42340 2.33 26.87
## 42341 2.32 28.17
## 42342 4.14 31.68
## 42343 1.55 27.97
## 42344 4.00 37.31
## 42345 1.91 27.10
## 42346 8.05 42.35
## 42347 6.14 38.34
## 42348 2.07 22.13
## 42349 7.87 46.64
## 42350 2.36 19.69
## 42351 2.49 33.10
## 42352 1.84 25.54
## 42353 6.26 28.43
## 42354 4.30 33.93
## 42355 5.38 34.62
## 42356 5.04 36.96
## 42357 1.45 19.21
## 42358 2.31 27.34
## 42359 4.07 24.94
## 42360 5.70 30.39
## 42361 9.68 30.59
## 42362 7.25 31.58
## 42363 2.37 19.10
## 42364 4.53 20.06
## 42365 2.02 26.15
## 42366 3.64 32.26
## 42367 3.30 33.20
## 42368 3.80 22.24
## 42369 4.90 19.23
## 42370 0.66 24.06
## 42371 9.39 29.47
## 42372 3.38 21.62
## 42373 6.83 26.08
## 42374 3.84 22.80
## 42375 3.18 27.62
## 42376 6.94 34.10
## 42377 7.22 37.17
## 42378 3.42 39.62
## 42379 3.36 21.84
## 42380 10.52 42.16
## 42381 10.32 43.41
## 42382 4.68 29.42
## 42383 3.52 31.59
## 42384 3.28 30.02
## 42385 9.61 37.87
## 42386 3.67 24.88
## 42387 1.90 17.53
## 42388 4.82 27.35
## 42389 2.70 21.54
## 42390 5.62 36.63
## 42391 10.27 31.97
## 42392 8.06 35.94
## 42393 1.56 23.10
## 42394 4.64 30.92
## 42395 14.11 32.11
## 42396 1.43 23.93
## 42397 5.38 32.91
## 42398 1.34 35.60
## 42399 2.02 20.83
## 42400 2.50 27.90
## 42401 3.10 21.20
## 42402 2.30 29.43
## 42403 0.89 33.37
## 42404 11.32 30.65
## 42405 5.33 31.90
## 42406 9.60 26.01
## 42407 3.84 33.32
## 42408 6.47 26.04
## 42409 6.00 43.16
## 42410 7.41 36.16
## 42411 7.41 25.53
## 42412 4.03 22.20
## 42413 5.09 37.25
## 42414 3.86 28.45
## 42415 1.32 28.71
## 42416 0.66 17.08
## 42417 8.08 24.56
## 42418 2.79 40.41
## 42419 3.99 32.34
## 42420 3.25 34.67
## 42421 5.89 33.67
## 42422 6.68 45.50
## 42423 9.10 56.24
## 42424 3.70 24.85
## 42425 2.79 20.82
## 42426 1.67 32.56
## 42427 5.91 25.29
## 42428 4.22 22.87
## 42429 0.22 20.77
## 42430 1.24 19.47
## 42431 2.91 26.40
## 42432 8.47 30.35
## 42433 3.08 35.83
## 42434 3.54 23.42
## 42435 6.79 46.07
## 42436 11.86 44.66
## 42437 3.24 26.99
## 42438 2.15 47.43
## 42439 0.81 23.74
## 42440 1.06 29.23
## 42441 6.77 29.71
## 42442 9.60 25.12
## 42443 7.40 28.58
## 42444 6.11 27.91
## 42445 9.20 37.13
## 42446 4.04 31.62
## 42447 4.03 18.40
## 42448 3.82 22.56
## 42449 4.17 22.78
## 42450 10.25 38.18
## 42451 0.62 24.89
## 42452 7.73 30.35
## 42453 0.76 11.14
## 42454 7.03 24.84
## 42455 6.42 20.86
## 42456 NA NA
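Because the data only run through April 2016, the last row has NA for the remaining months and for ANNUAL. The sketch below shows one way to inspect the data compactly and set aside the incomplete year before plotting. It is an illustration, not part of the original analysis: the names `sample_txt`, `df.sample`, and `df.complete` are hypothetical, and the few rows and columns are copied from the printout above so the code runs without the input file.

```r
# A few rows and columns copied from the printout above so the sketch runs
# on its own; with the real file, df.california would be used instead.
sample_txt <- "YEAR JAN FEB MAR APR MAY ANNUAL
2013 1.43 0.90 1.73 0.87 1.00 11.14
2014 0.88 4.88 4.23 1.60 0.56 24.84
2015 0.66 3.50 1.04 1.45 1.60 20.86
2016 7.67 1.68 6.09 2.09 NA NA"
df.sample <- read.table(text = sample_txt, header = TRUE)

str(df.sample)             # column types at a glance
colSums(is.na(df.sample))  # number of missing values per column

# Keep only years with a recorded annual total (drops the partial 2016 row)
df.complete <- df.sample[!is.na(df.sample$ANNUAL), ]
```

Applied to the full data frame, the same filter would be `df.california[!is.na(df.california$ANNUAL), ]`.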
I generated several plots of the annual precipitation column of the dataset. The minimum and maximum annual totals are:
## [1] 11.14
## [1] 56.24
As the plots show, annual rainfall typically falls between about 25 and 35 inches. Every plot also shows one outlier: 1983, when annual precipitation reached 56.24 inches.
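The code that drew the plots is not shown in this text. The following is a minimal sketch of how a histogram, a boxplot, and a scatterplot of the annual totals might be produced with base R graphics. It is an assumption about the approach, not the original code: the data frame `precip` and the variable `wettest` are hypothetical names, and the values are the 1976 to 1990 annual totals copied from the table above so the example is self-contained.

```r
# Annual totals for 1976-1990, copied from the table above so the sketch is
# self-contained; with the full file these would be df.california$YEAR and
# df.california$ANNUAL.
precip <- data.frame(
  YEAR = 1976:1990,
  ANNUAL = c(17.08, 24.56, 40.41, 32.34, 34.67, 33.67, 45.50, 56.24,
             24.85, 20.82, 32.56, 25.29, 22.87, 20.77, 19.47)
)

# Histogram: distribution of the annual totals
hist(precip$ANNUAL, main = "Annual precipitation",
     xlab = "Inches", col = "lightblue")

# Boxplot: median and quartiles; points beyond 1.5 * IQR are drawn as outliers
boxplot(precip$ANNUAL, main = "Annual precipitation", ylab = "Inches")

# Scatterplot: annual total against year
plot(precip$YEAR, precip$ANNUAL, main = "Annual precipitation by year",
     xlab = "Year", ylab = "Inches", pch = 19)

# Range of the annual totals and the year of the wettest record
min(precip$ANNUAL, na.rm = TRUE)
max(precip$ANNUAL, na.rm = TRUE)
wettest <- precip$YEAR[which.max(precip$ANNUAL)]
```

On the full dataset, `na.rm = TRUE` matters because the 2016 row has no annual total; `hist()`, `boxplot()`, and `plot()` skip missing values themselves, but `min()` and `max()` would otherwise return NA.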