1. Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.

The question for analysis is whether there is a correlation between the Case-Shiller (CS) Home Price Index (National) and 3 popular Exchange-Traded-Funds (symbols: ITB, XHB, REZ) related to housing, home building and leasing residential properties. The CS index is a 3-month average of home prices across 20 US cities, available with a 2-month lag. The ETF prices are intended to reflect the market’s expectation of the future returns for the referenced assets. While ETF prices are likley to be impacted by overall market sentiment, a preliminary hypothesis could expect both the index and the ETF closing prices to be positively correlated, assuming that both are jointly dependent on the housing market. The associated data-set includes monthly values for the index and the ETF closing prices between the years 2014 to 2018, obtained from the Internet (S&P website, Yahoo Finance website.

# require(RCurl)
#sf_houses<-read.csv(text=getURL("https://raw.githubusercontent.com/Jagdish16/jagdish_r_repo/master/case_shiller_index_home_etfs.csv"), header=T)
theURL<-"https://raw.githubusercontent.com/Jagdish16/jagdish_r_repo/master/case_shiller_index_home_etfs.csv"
housing<-read.table(file=theURL, header=TRUE, fill = TRUE, sep=",",quote="\"")
print(housing)
##         Date Case_Shiller_Index ITB_Close XHB_Close REZ_Close
## 1   2/1/2014            162.575  25.78047  32.74688  41.36633
## 2   3/1/2014            163.138  23.75059  31.32268  41.27561
## 3   4/1/2014            163.441  22.95388  29.86284  43.49641
## 4   5/1/2014            163.696  23.58195  30.34450  44.39548
## 5   6/1/2014            164.067  24.33760  31.54864  44.39548
## 6   7/1/2014            164.540  21.81393  28.42053  45.07732
## 7   8/1/2014            165.221  23.64076  30.43679  46.58885
## 8   9/1/2014            165.919  22.08894  28.54595  42.93599
## 9  10/1/2014            166.666  23.66217  30.08561  48.43508
## 10 11/1/2014            167.367  25.51031  32.34710  49.57014
## 11 12/1/2014            168.090  25.44150  32.97529  49.90051
## 12  1/1/2015            168.701  24.93919  33.26696  54.40937
## 13  2/1/2015            169.224  27.20281  35.25580  52.30915
## 14  3/1/2015            169.889  27.78348  35.76028  53.06045
## 15  4/1/2015            170.380  25.53012  33.60630  50.69436
## 16  5/1/2015            170.940  26.21959  34.90782  50.78040
## 17  6/1/2015            171.471  27.03711  35.56828  48.20739
## 18  7/1/2015            172.035  28.02564  36.53482  52.09622
## 19  8/1/2015            172.947  27.34546  35.22237  49.51746
## 20  9/1/2015            173.860  25.72878  33.27799  51.47975
## 21 10/1/2015            174.847  26.82299  34.77804  52.81220
## 22 11/1/2015            175.834  28.17450  35.37162  53.90716
## 23 12/1/2015            176.648  26.73421  33.26002  55.58025
## 24  1/1/2016            177.405  24.02615  29.75089  54.23069
## 25  2/1/2016            177.817  24.18415  30.09207  53.34803
## 26  3/1/2016            178.315  26.74180  32.98722  58.23798
## 27  4/1/2016            178.907  26.29164  32.86353  56.19393
## 28  5/1/2016            179.530  27.28042  33.37123  58.27419
## 29  6/1/2016            180.093  27.33975  32.74637  60.52482
## 30  7/1/2016            180.683  28.65432  35.28005  62.09016
## 31  8/1/2016            181.881  28.74340  35.33872  59.87428
## 32  9/1/2016            182.887  27.26862  33.18750  59.25925
## 33 10/1/2016            183.847  25.53981  30.96495  57.03197
## 34 11/1/2016            184.924  27.11500  33.14874  54.32354
## 35 12/1/2016            185.905  27.22397  33.14874  56.54865
## 36  1/1/2017            187.040  28.68217  33.85790  56.75985
## 37  2/1/2017            187.643  30.21942  35.34961  60.18798
## 38  3/1/2017            188.328  31.71700  36.53709  58.82791
## 39  4/1/2017            189.057  32.09753  37.17158  59.77786
## 40  5/1/2017            189.859  32.17695  36.85698  60.63304
## 41  6/1/2017            190.577  33.69595  37.87942  61.14991
## 42  7/1/2017            191.242  33.65738  37.77567  61.75650
## 43  8/1/2017            192.730  33.87600  37.60822  61.59532
## 44  9/1/2017            193.879  36.31062  39.22366  60.28700
## 45 10/1/2017            194.969  39.54638  40.81405  60.17881
## 46 11/1/2017            196.218  42.61978  43.25145  60.87690
## 47 12/1/2017            197.449  43.48511  43.67577  59.51898
## 48  1/1/2018            198.678  42.64963  44.26730  56.78278
## 49  2/1/2018            199.733  38.10418  40.10272  52.83980
## 50  3/1/2018            200.579  39.27784  40.35002  55.57771
## 51  4/1/2018            201.330  38.26920  38.61639  56.53675
## 52  5/1/2018            201.942  38.31900  38.93357  58.99063
## 53  6/1/2018            202.465  37.99038  39.20119  61.39584
## 54  7/1/2018            202.729  37.75783  39.36177  61.77779
## 55  8/1/2018            203.835  37.59830  39.75917  64.13648
## 56  9/1/2018            204.612  35.23532  38.18946  61.80728
## 57 10/1/2018            205.622  31.11254  33.84528  62.16099
#summary(housing)
  1. Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example - if it makes sense you could sum two columns together)

Calculate the month-over-month percent changes for all 4 variables.

library(knitr)
## Warning: package 'knitr' was built under R version 3.5.2
opts_chunk$set(echo = FALSE, message = FALSE, results = 'asis')
housing$Case_Shiller_Diff = ((housing$Case_Shiller_Index - lag(housing$Case_Shiller_Index))*100/ housing$Case_Shiller_Index)
housing$Case_Shiller_Diff<-round(housing$Case_Shiller_Diff,2)
housing$ITB_Diff = ((housing$ITB_Close - lag(housing$ITB_Close))*100/ housing$ITB_Close)
housing$ITB_Diff<-round(housing$ITB_Diff,2)
housing$XHB_Diff = ((housing$XHB_Close - lag(housing$XHB_Close))*100/ housing$XHB_Close)
housing$XHB_Diff<-round(housing$XHB_Diff,2)
housing$REZ_Diff = ((housing$REZ_Close - lag(housing$REZ_Close))*100/ housing$REZ_Close)
housing$REZ_Diff<-round(housing$REZ_Diff,2)
housing$Date<-as.Date(housing$Date, format = "%m/%d/%Y")
#kable(head(housing), digits = 2, align = c(rep("l", 4), rep("c", 4), rep("r", 4)))
housing
##          Date Case_Shiller_Index ITB_Close XHB_Close REZ_Close
## 1  2014-02-01            162.575  25.78047  32.74688  41.36633
## 2  2014-03-01            163.138  23.75059  31.32268  41.27561
## 3  2014-04-01            163.441  22.95388  29.86284  43.49641
## 4  2014-05-01            163.696  23.58195  30.34450  44.39548
## 5  2014-06-01            164.067  24.33760  31.54864  44.39548
## 6  2014-07-01            164.540  21.81393  28.42053  45.07732
## 7  2014-08-01            165.221  23.64076  30.43679  46.58885
## 8  2014-09-01            165.919  22.08894  28.54595  42.93599
## 9  2014-10-01            166.666  23.66217  30.08561  48.43508
## 10 2014-11-01            167.367  25.51031  32.34710  49.57014
## 11 2014-12-01            168.090  25.44150  32.97529  49.90051
## 12 2015-01-01            168.701  24.93919  33.26696  54.40937
## 13 2015-02-01            169.224  27.20281  35.25580  52.30915
## 14 2015-03-01            169.889  27.78348  35.76028  53.06045
## 15 2015-04-01            170.380  25.53012  33.60630  50.69436
## 16 2015-05-01            170.940  26.21959  34.90782  50.78040
## 17 2015-06-01            171.471  27.03711  35.56828  48.20739
## 18 2015-07-01            172.035  28.02564  36.53482  52.09622
## 19 2015-08-01            172.947  27.34546  35.22237  49.51746
## 20 2015-09-01            173.860  25.72878  33.27799  51.47975
## 21 2015-10-01            174.847  26.82299  34.77804  52.81220
## 22 2015-11-01            175.834  28.17450  35.37162  53.90716
## 23 2015-12-01            176.648  26.73421  33.26002  55.58025
## 24 2016-01-01            177.405  24.02615  29.75089  54.23069
## 25 2016-02-01            177.817  24.18415  30.09207  53.34803
## 26 2016-03-01            178.315  26.74180  32.98722  58.23798
## 27 2016-04-01            178.907  26.29164  32.86353  56.19393
## 28 2016-05-01            179.530  27.28042  33.37123  58.27419
## 29 2016-06-01            180.093  27.33975  32.74637  60.52482
## 30 2016-07-01            180.683  28.65432  35.28005  62.09016
## 31 2016-08-01            181.881  28.74340  35.33872  59.87428
## 32 2016-09-01            182.887  27.26862  33.18750  59.25925
## 33 2016-10-01            183.847  25.53981  30.96495  57.03197
## 34 2016-11-01            184.924  27.11500  33.14874  54.32354
## 35 2016-12-01            185.905  27.22397  33.14874  56.54865
## 36 2017-01-01            187.040  28.68217  33.85790  56.75985
## 37 2017-02-01            187.643  30.21942  35.34961  60.18798
## 38 2017-03-01            188.328  31.71700  36.53709  58.82791
## 39 2017-04-01            189.057  32.09753  37.17158  59.77786
## 40 2017-05-01            189.859  32.17695  36.85698  60.63304
## 41 2017-06-01            190.577  33.69595  37.87942  61.14991
## 42 2017-07-01            191.242  33.65738  37.77567  61.75650
## 43 2017-08-01            192.730  33.87600  37.60822  61.59532
## 44 2017-09-01            193.879  36.31062  39.22366  60.28700
## 45 2017-10-01            194.969  39.54638  40.81405  60.17881
## 46 2017-11-01            196.218  42.61978  43.25145  60.87690
## 47 2017-12-01            197.449  43.48511  43.67577  59.51898
## 48 2018-01-01            198.678  42.64963  44.26730  56.78278
## 49 2018-02-01            199.733  38.10418  40.10272  52.83980
## 50 2018-03-01            200.579  39.27784  40.35002  55.57771
## 51 2018-04-01            201.330  38.26920  38.61639  56.53675
## 52 2018-05-01            201.942  38.31900  38.93357  58.99063
## 53 2018-06-01            202.465  37.99038  39.20119  61.39584
## 54 2018-07-01            202.729  37.75783  39.36177  61.77779
## 55 2018-08-01            203.835  37.59830  39.75917  64.13648
## 56 2018-09-01            204.612  35.23532  38.18946  61.80728
## 57 2018-10-01            205.622  31.11254  33.84528  62.16099
##    Case_Shiller_Diff ITB_Diff XHB_Diff REZ_Diff
## 1                  0        0        0        0
## 2                  0        0        0        0
## 3                  0        0        0        0
## 4                  0        0        0        0
## 5                  0        0        0        0
## 6                  0        0        0        0
## 7                  0        0        0        0
## 8                  0        0        0        0
## 9                  0        0        0        0
## 10                 0        0        0        0
## 11                 0        0        0        0
## 12                 0        0        0        0
## 13                 0        0        0        0
## 14                 0        0        0        0
## 15                 0        0        0        0
## 16                 0        0        0        0
## 17                 0        0        0        0
## 18                 0        0        0        0
## 19                 0        0        0        0
## 20                 0        0        0        0
## 21                 0        0        0        0
## 22                 0        0        0        0
## 23                 0        0        0        0
## 24                 0        0        0        0
## 25                 0        0        0        0
## 26                 0        0        0        0
## 27                 0        0        0        0
## 28                 0        0        0        0
## 29                 0        0        0        0
## 30                 0        0        0        0
## 31                 0        0        0        0
## 32                 0        0        0        0
## 33                 0        0        0        0
## 34                 0        0        0        0
## 35                 0        0        0        0
## 36                 0        0        0        0
## 37                 0        0        0        0
## 38                 0        0        0        0
## 39                 0        0        0        0
## 40                 0        0        0        0
## 41                 0        0        0        0
## 42                 0        0        0        0
## 43                 0        0        0        0
## 44                 0        0        0        0
## 45                 0        0        0        0
## 46                 0        0        0        0
## 47                 0        0        0        0
## 48                 0        0        0        0
## 49                 0        0        0        0
## 50                 0        0        0        0
## 51                 0        0        0        0
## 52                 0        0        0        0
## 53                 0        0        0        0
## 54                 0        0        0        0
## 55                 0        0        0        0
## 56                 0        0        0        0
## 57                 0        0        0        0
  1. Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.

  1. Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end.

The scatterplot between the percentage changes in Case Shiller Index and ITB does not show any patterns. However the scatterplot between the percentafe changes in ITB and XHB does show a positive correlation which would be expected since they’re both ETFs associated with housing assets. However the same cannot be said of ITB and REZ, and XHB and REZ which is a little unexpected. This would indicate that the underlying assets for these ETFs are not similar.

The initial conclusion would be that a deeper analysis is required into: 1) Difference in type of underlying assets for the 3 ETFs. 2) Construction methodlogy for the Case Shiller Index. 3) Additional independent variables that both the index and ETFs depend on.

  1. BONUS - place the original .csv in a github file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.