Crypto

library(tidyverse)  # provides read_csv(), %>%, mutate(), sample_n(), slice()
df <- read_csv("coin_Ethereum.csv")

## Rows: 2160 Columns: 10

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): Name, Symbol
## dbl  (7): SNo, High, Low, Open, Close, Volume, Marketcap
## dttm (1): Date

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Introduction

Cryptocurrency has been a growing sensation in the 21st century. It can be defined as electronic currency that is used to buy or sell items and services online. In the past year it has reached its highest peak, with millions of followers around the world. In this data set we look at one cryptocurrency, Ethereum, and examine how its price has changed over time. I found the data on Kaggle; the data set is available at “https://www.kaggle.com/sudalairajkumar/cryptocurrencypricehistory?select=coin_Ethereum.csv”. Let's look at the first few rows of the data.

head(df)

## # A tibble: 6 × 10
##     SNo Name     Symbol Date                 High   Low  Open Close  Volume
##   <dbl> <chr>    <chr>  <dttm>              <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1     1 Ethereum ETH    2015-08-08 23:59:59 2.80  0.715 2.79  0.753  674188
## 2     2 Ethereum ETH    2015-08-09 23:59:59 0.880 0.629 0.706 0.702  532170
## 3     3 Ethereum ETH    2015-08-10 23:59:59 0.730 0.637 0.714 0.708  405283
## 4     4 Ethereum ETH    2015-08-11 23:59:59 1.13  0.663 0.708 1.07  1463100
## 5     5 Ethereum ETH    2015-08-12 23:59:59 1.29  0.884 1.06  1.22  2150620
## 6     6 Ethereum ETH    2015-08-13 23:59:59 1.97  1.17  1.22  1.83  4068680
## # … with 1 more variable: Marketcap <dbl>

Summary of High (quantitative variable)

Mean

Here I compute the mean, median, standard deviation, and five-number summary of the column High.

mean(df$High,trim = 0, na.rm = FALSE)

## [1] 398.2586

print(" Standard Deviation: ")

## [1] " Standard Deviation: "

sd(df$High)

## [1] 628.0823

print("Five number summary: ")

## [1] "Five number summary: "

summary(df$High)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.483   14.265  205.125  398.259  396.495 4362.351
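
The summary above reports a mean (398.26) well above the median (205.13), which usually points to a right-skewed distribution. As a minimal sketch, the same comparison can be made explicitly with fivenum(). The vector `high` below is a made-up stand-in for df$High, not the real data.

```r
# Hypothetical stand-in for df$High (values chosen for illustration only)
high <- c(0.5, 14, 205, 396, 4362)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(high)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# A positive gap between mean and median hints at right skew
skew_gap <- mean(high) - median(high)
print(skew_gap)
```

On the real data, replacing `high` with df$High should give the same picture as summary(df$High), apart from small differences between hinges and quartiles.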

Graphical Representation

There are various ways of presenting data graphically in RStudio. Some of them are shown below.

Histogram

A histogram shows the distribution of a quantitative variable. Here it is drawn for High, with a density curve overlaid.

hist(df$High,
     main = "Histogram of High",
     xlab = "High",
     col = "blue",
     probability = TRUE)

# Overlay a kernel density estimate (red so it stands out against the bars)
lines(density(df$High), col = "red")

Box Plot

boxplot(df$High,
        main = "Box Plot of High",
        horizontal = TRUE,
        xlab = "High")

boxplot.stats(df$High)$out

##   [1]  974.471 1045.080 1075.390 1060.710 1153.170 1266.930 1320.980 1417.380
##   [9] 1337.300 1296.040 1432.880 1400.560 1390.590 1292.630 1090.230 1100.310
##  [17] 1093.220 1167.110 1155.680 1089.100 1023.230 1062.440 1104.660 1080.600
##  [25] 1121.980 1257.770 1256.700 1184.630 1128.660 1161.350 1035.770  991.943
##  [33]  976.595  982.933 1006.565 1153.189 1129.371 1209.429 1282.580 1273.828
##  [41] 1303.872 1347.926 1261.623 1149.240 1134.338 1244.163 1250.506 1290.054
##  [49] 1265.645 1259.450 1432.300 1405.744 1382.684 1271.688 1272.151 1395.111
##  [57] 1467.785 1376.085 1368.074 1356.289 1428.981 1402.400 1378.916 1373.846
##  [65] 1542.991 1660.910 1689.187 1756.511 1738.314 1690.037 1770.591 1815.964
##  [73] 1826.697 1806.539 1861.357 1871.604 1848.154 1833.831 1824.519 1853.668
##  [81] 1949.903 1969.547 2036.286 1974.260 1936.454 1781.409 1710.984 1670.224
##  [89] 1559.029 1524.932 1468.392 1567.695 1597.610 1650.361 1622.954 1547.878
##  [97] 1669.107 1730.924 1835.192 1868.049 1873.803 1843.819 1839.497 1937.646
## [105] 1930.780 1889.197 1817.060 1839.819 1848.646 1841.196 1874.709 1823.353
## [113] 1811.968 1725.109 1740.428 1625.912 1702.923 1732.824 1728.584 1837.188
## [121] 1860.975 1947.838 1989.055 2152.452 2144.962 2110.353 2140.985 2151.223
## [129] 2133.188 2091.516 2102.874 2196.996 2165.192 2199.719 2318.423 2449.688
## [137] 2544.267 2547.556 2497.385 2365.460 2276.777 2345.835 2467.201 2641.095
## [145] 2439.537 2367.741 2354.087 2536.337 2676.393 2757.477 2797.972 2796.055
## [153] 2951.441 2984.892 3450.038 3523.586 3541.463 3598.896 3573.290 3950.165
## [161] 3981.259 4197.473 4178.209 4362.351 4032.564 4171.017 4129.186 3878.896
## [169] 3587.766 3562.465 3437.936 2993.145 2938.205 2483.983 2384.412 2672.596
## [177] 2750.535 2911.736 2888.752 2761.363 2566.939 2472.188 2715.855 2739.738
## [185] 2801.392 2891.255 2857.166 2817.485 2743.441 2845.185 2620.846 2625.071
## [193] 2619.958 2495.415 2447.228 2547.368 2606.433 2639.229 2554.629 2457.175
## [201] 2377.195 2278.415 2275.383 2259.464 1993.160 2043.530 2032.339 2017.759
## [209] 1850.180 1979.958 2139.805 2242.239 2282.989 2274.398 2155.596 2237.567
## [217] 2384.287 2321.923 2346.295

Quantile-Quantile Plots (Q-Q plots)

qqnorm(df$High,
       main = "Q-Q Plot (Positively Skewed Distribution)",
       ylab = "Sample Quantiles (High)",
       xlab = "Theoretical Quantiles")
qqline(df$High)

Outliers

IQR(df$High, na.rm = FALSE)

## [1] 382.2293

out <- boxplot.stats(df$High)$out
out_ind <- which(df$High %in% c(out))
out_ind

##   [1]  880  881  882  883  884  885  886  887  888  889  890  891  892  893  894
##  [16]  895  896  897  898  899  900  901  902  903  904  905  906  907  908  909
##  [31]  910  911  925  926 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986
##  [46] 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
##  [61] 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
##  [76] 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031
##  [91] 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046
## [106] 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061
## [121] 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076
## [136] 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091
## [151] 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106
## [166] 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121
## [181] 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136
## [196] 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151
## [211] 2152 2153 2154 2155 2156 2157 2158 2159 2160

df[out_ind,]

## # A tibble: 219 × 10
##      SNo Name     Symbol Date                 High   Low  Open Close     Volume
##    <dbl> <chr>    <chr>  <dttm>              <dbl> <dbl> <dbl> <dbl>      <dbl>
##  1   880 Ethereum ETH    2018-01-03 23:59:59  974.  868.  886   963. 5093159936
##  2   881 Ethereum ETH    2018-01-04 23:59:59 1045.  946.  962.  981. 6502859776
##  3   882 Ethereum ETH    2018-01-05 23:59:59 1075.  956.  976.  998. 6683149824
##  4   883 Ethereum ETH    2018-01-06 23:59:59 1061.  995.  995. 1042. 4662219776
##  5   884 Ethereum ETH    2018-01-07 23:59:59 1153. 1043. 1043. 1153. 5569880064
##  6   885 Ethereum ETH    2018-01-08 23:59:59 1267. 1016. 1158. 1149. 8450970112
##  7   886 Ethereum ETH    2018-01-09 23:59:59 1321. 1145. 1146  1300. 7965459968
##  8   887 Ethereum ETH    2018-01-10 23:59:59 1417. 1227. 1300. 1256. 9214950400
##  9   888 Ethereum ETH    2018-01-11 23:59:59 1337. 1135. 1268. 1155. 7235899904
## 10   889 Ethereum ETH    2018-01-12 23:59:59 1296. 1120. 1158. 1273. 5222300160
## # … with 209 more rows, and 1 more variable: Marketcap <dbl>
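
The box plot flags a point as an outlier when it lies more than 1.5 × IQR beyond the quartiles. With the values reported above, the upper fence is roughly 396.495 + 1.5 × 382.2293 ≈ 969.84, which is consistent with the smallest flagged value, 974.471. The rule can be sketched by hand on a hypothetical vector `x` (made up for illustration). Note that boxplot.stats() works from hinges rather than quantile(), so the two approaches can differ slightly on other data.

```r
# Hypothetical data with one obviously extreme value
x <- c(10, 12, 13, 15, 18, 20, 22, 25, 300)

# Quartiles and interquartile range
q <- quantile(x, c(0.25, 0.75))
iqr <- IQR(x)

# Fences: anything outside [lower_fence, upper_fence] is flagged
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)

# Cross-check against the helper used in the report
print(boxplot.stats(x)$out)
```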

Graphical Display of multiple variables

plot(df$High, df$Low,
     main = "Scatter Plot of High vs. Low",
     xlab = "High",
     ylab = "Low",
     col = "red",
     pch = 25)

print("Correlation: The scatter plot suggests that High and Low are strongly positively related: days with a higher High also tend to have a higher Low.")

## [1] "Correlation: The scatter plot suggests that High and Low are strongly positively related: days with a higher High also tend to have a higher Low."
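
A visual impression from a scatter plot can be backed by a number. The sketch below computes a correlation coefficient with cor(), using only the first six rows printed by head(df) earlier (rounded as displayed), so the result is illustrative and not a statement about the full data set. On the real data one would pass df$High and df$Low instead.

```r
# First six High/Low values as displayed by head(df), rounded
high <- c(2.80, 0.880, 0.730, 1.13, 1.29, 1.97)
low  <- c(0.715, 0.629, 0.637, 0.663, 0.884, 1.17)

# Pearson correlation (the default) measures linear association
r <- cor(high, low)
print(r)

# Spearman correlation is rank-based and less sensitive to extreme values
rho <- cor(high, low, method = "spearman")
print(rho)
```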

Frequency Table

A frequency table lists each value and the number of times it occurs.

print("Frequency Table ")

## [1] "Frequency Table "

print(" ")

## [1] " "

head(table(df$Open))

## 
## 0.431589007377625 0.444988012313843 0.489629000425339 0.517620980739594 
##                 1                 1                 1                 1 
## 0.523277997970581 0.534116983413696 
##                 1                 1

Relative Frequency Table

A relative frequency table shows the proportion of observations that take each value, that is, each count divided by the total number of observations.

print("Relative Frequency Table")

## [1] "Relative Frequency Table"

print("")

## [1] ""

head(table(df$Open)/length(df$Open))

## 
## 0.431589007377625 0.444988012313843 0.489629000425339 0.517620980739594 
##       0.000462963       0.000462963       0.000462963       0.000462963 
## 0.523277997970581 0.534116983413696 
##       0.000462963       0.000462963
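
Because Open is a continuous price, the values shown above each occur once, so each relative frequency is 1/2160 ≈ 0.000463. Grouping values into intervals usually gives a more informative table. The following is a sketch using cut() and prop.table(); the vector `open_price` and the bin edges are assumptions chosen for illustration.

```r
# Hypothetical opening prices (not the real data)
open_price <- c(0.7, 2.8, 13.5, 95, 210, 380, 950, 2100, 4100)

# Assumed bin edges; each interval is open on the left, closed on the right
breaks <- c(0, 10, 100, 1000, 5000)
bins <- cut(open_price, breaks = breaks)

# Frequency table: number of observations per bin
freq <- table(bins)
print(freq)

# Relative frequency table: share of observations per bin
rel_freq <- prop.table(freq)
print(round(rel_freq, 3))
```

With the real data, `cut(df$Open, breaks = breaks)` would follow the same pattern, with bin edges chosen to suit the price range.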

Two way table

A two-way table displays frequencies or relative frequencies for two categorical variables. Here it is built from High and Low on a random sample of 15 rows.

smalldf <- sample_n(df,15)

#create data frame
data_R <- data.frame(df$High,df$Low)

#view data frame
head(data_R)

##    df.High   df.Low
## 1 2.798810 0.714725
## 2 0.879810 0.629191
## 3 0.729854 0.636546
## 4 1.131410 0.663235
## 5 1.289940 0.883608
## 6 1.965070 1.171990

#create two way table from data frame
two_way_table <- table(smalldf$High,smalldf$Low)
#display two way table
head(two_way_table)

##                    
##                     0.518688023090363 9.40268039703369 11.5437002182007
##   0.577395975589752                 1                0                0
##   11.8908996582031                  0                1                0
##   12.2151002883911                  0                0                1
##   14.058500289917                   0                0                0
##   49.9889984130859                  0                0                0
##   93.0567016601562                  0                0                0
##                    
##                     13.7349004745483 48.1323013305664 80.3899002075195
##   0.577395975589752                0                0                0
##   11.8908996582031                 0                0                0
##   12.2151002883911                 0                0                0
##   14.058500289917                  1                0                0
##   49.9889984130859                 0                1                0
##   93.0567016601562                 0                0                1
##                    
##                     141.636455251 151.091123923 194.159 196.351
##   0.577395975589752             0             0       0       0
##   11.8908996582031              0             0       0       0
##   12.2151002883911              0             0       0       0
##   14.058500289917               0             0       0       0
##   49.9889984130859              0             0       0       0
##   93.0567016601562              0             0       0       0
##                    
##                     311.786987304688 465.70424953 516.148010253906 646.61623951
##   0.577395975589752                0            0                0            0
##   11.8908996582031                 0            0                0            0
##   12.2151002883911                 0            0                0            0
##   14.058500289917                  0            0                0            0
##   49.9889984130859                 0            0                0            0
##   93.0567016601562                 0            0                0            0
##                    
##                     726.51190829
##   0.577395975589752            0
##   11.8908996582031             0
##   12.2151002883911             0
##   14.058500289917              0
##   49.9889984130859             0
##   93.0567016601562             0
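
A two-way table is easier to read when both variables have only a few levels. One possible approach, shown here as a sketch, is to derive two categorical variables first. The data frame `toy`, the column names `price_level` and `direction`, and the cut points are all hypothetical.

```r
# Made-up opening and closing prices
toy <- data.frame(
  Open  = c(1.0, 12.0, 200.0, 390.0, 1500.0, 2500.0),
  Close = c(1.2, 11.0, 220.0, 380.0, 1600.0, 2400.0)
)

# Categorical variable 1: price band of the opening price (assumed cut points)
toy$price_level <- cut(toy$Open,
                       breaks = c(0, 100, 1000, Inf),
                       labels = c("low", "mid", "high"))

# Categorical variable 2: did the price close above the open?
toy$direction <- ifelse(toy$Close > toy$Open, "up", "down")

# Two-way table of counts, then row-wise relative frequencies
two_way <- table(toy$price_level, toy$direction)
print(two_way)
print(prop.table(two_way, margin = 1))
```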

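Side-by-side plots for two quantitative variables

This is an example of side-by-side plots created using RStudio. The left panel shows Volume and the right panel shows Marketcap, each plotted against the row index.

par(mfrow = c(1, 2))  # 1 row, 2 columns of plots
plot(df$Volume,
     main = "Scatterplot of Volume",
     xlab = "",
     ylab = "Volume")

plot(df$Marketcap,
     main = "Scatterplot of Marketcap",
     xlab = "",
     ylab = "Marketcap")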

print("")

## [1] ""

print("Volume Summary: ")

## [1] "Volume Summary: "

summary(df$Volume)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 1.021e+05 3.825e+07 2.149e+09 7.057e+09 9.629e+09 8.448e+10

print("Marketcap Summary: ")

## [1] "Marketcap Summary: "

summary(df$Marketcap)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 3.221e+07 1.136e+09 2.070e+10 4.172e+10 4.231e+10 4.829e+11

Visualization of Data

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

barplot(df$Volume, main = "bar chart",xlab = "Volume")

scatter.smooth(df$Volume,main = "Scatter Plot")

require(ggplot2)
data <- read_csv("coin_Ethereum.csv")

## Rows: 2160 Columns: 10

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): Name, Symbol
## dbl  (7): SNo, High, Low, Open, Close, Volume, Marketcap
## dttm (1): Date

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

data <- data.matrix(data[,-1])
heatmap(data, Colv = NA, Rowv = NA, scale = "row")

print("Summary statistics: ")

## [1] "Summary statistics: "

summary(df$Open)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.432   13.758  198.425  382.880  386.265 4174.636

Decision Tree

Here I build a decision tree for the Volume variable. The intent is a tree that distinguishes between volumes of 15000 and 350000.

df <- df %>% mutate(
  genre = factor(Volume == TRUE, levels = c(TRUE, FALSE), labels = c('15000','350000'))
)
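
Note that `Volume == TRUE` compares a number with TRUE, which holds only when Volume equals 1. The smallest Volume in this data set is about 102,000, so every row falls into the same level. If the goal is a two-level label based on volume, one option is to compare against a cutoff. The sketch below uses base R; the cutoff of 350000, the data frame `toy`, and the column name `volume_class` are assumptions for illustration, not the definition used in this report.

```r
# Made-up volumes on both sides of the assumed cutoff
toy <- data.frame(Volume = c(102128, 187114, 405283, 1463100, 4068680))

cutoff <- 350000  # assumed threshold

# Two-level factor: is the volume at or above the cutoff?
toy$volume_class <- factor(
  toy$Volume >= cutoff,
  levels = c(FALSE, TRUE),
  labels = c("below_cutoff", "at_or_above_cutoff")
)

print(table(toy$volume_class))
```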

To keep the data manageable, I created a new variable called smalldf, a random sample of 150 rows, so that the tree does not have to be fit on all the data in the file.

library(rpart)
library(rpart.plot)
smalldf <- sample_n(df,150)
tree <- rpart(Volume ~., data = smalldf)
tree

## n= 150 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
## 1) root 150 1.669679e+22  7178462000  
##   2) SNo< 1633 116 1.375380e+21  2669434000  
##     4) SNo< 1301.5 87 1.656638e+20   917887400 *
##     5) SNo>=1301.5 29 1.420812e+20  7924075000 *
##   3) SNo>=1633 34 4.916548e+21 22562200000  
##     6) SNo< 1969 19 7.406434e+20 15133940000 *
##     7) SNo>=1969 15 1.799524e+21 31971340000 *

This shows how the tree looks.

rpart.plot(tree, extra = 1)

pred <- predict(tree, smalldf, type="matrix")
head(pred)

##           1           2           3           4           5           6 
## 15133938043   917887364   917887364  7924074922  7924074922   917887364

predict(tree,smalldf) %>%
  head()

##           1           2           3           4           5           6 
## 15133938043   917887364   917887364  7924074922  7924074922   917887364

Confusion Table

Here I build a confusion table for the Volume column, comparing the actual values with the tree's predictions.

confusion_table <- with(smalldf, table(Volume, pred))
head(confusion_table)

##         pred
## Volume   917887364.108073 7924074922.11199 15133938042.7234 31971339341.442
##   102128                1                0                0               0
##   187114                1                0                0               0
##   273614                1                0                0               0
##   411848                1                0                0               0
##   455976                1                0                0               0
##   474391                1                0                0               0
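
Since Volume is numeric and the tree is a regression tree, cross-tabulating raw values against predictions produces roughly one row per observation. Two common alternatives are an error measure such as RMSE, or a confusion table built after binning both actual and predicted values into classes. The sketch below shows both on made-up `actual` and `predicted` vectors with an assumed cutoff; the helper `to_class` is hypothetical.

```r
# Made-up actual and predicted volumes
actual    <- c(1e5, 5e5, 2e9, 8e9, 1.5e10, 3e10)
predicted <- c(9e8, 9e8, 6e9, 7.9e9, 1.5e10, 3.2e10)

# Root mean squared error: typical size of a prediction error
rmse <- sqrt(mean((actual - predicted)^2))
print(rmse)

# Bin both vectors into two classes using an assumed cutoff
cutoff <- 5e9
to_class <- function(v) {
  factor(ifelse(v >= cutoff, "high", "low"), levels = c("low", "high"))
}

# Confusion table: rows are actual classes, columns are predicted classes
confusion <- table(actual = to_class(actual), predicted = to_class(predicted))
print(confusion)

# Share of observations on the diagonal (correctly classified)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```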

Here I try to predict Volume, first with a tree fit on a training split and then with a full tree grown on 50 sampled data points.

library(caret)

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following object is masked from 'package:purrr':
## 
##     lift

inTrain <- createDataPartition(y = df$Volume, p = .66, list = FALSE)
df_train <- df %>% slice(inTrain)
df_test <- df %>% slice(-inTrain)

dim(df_train)

## [1] 1428   11

dim(df_test)

## [1] 732  11

tree_from_train <- rpart(Volume ~.,data = df_train)
pred_test <- predict(tree_from_train, newdata = df_test)  # predict on the held-out rows
head(pred_test)

##          1          2          3          4          5          6 
## 1058117747 1058117747 1058117747 1058117747 1058117747 1058117747
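
When evaluating on held-out data, predict() for rpart models takes the new rows through the `newdata` argument. If `newdata` is not supplied, the fitted values for the training rows are returned instead. A minimal sketch of a train/test evaluation follows, on simulated data; the data frame `sim`, its columns, and the seed are invented for illustration, and the base R split stands in for caret's createDataPartition().

```r
library(rpart)

set.seed(1)  # fixed seed so the split is reproducible

# Simulated data: y jumps from about 10 to about 100 when x passes 5
n <- 200
sim <- data.frame(x = runif(n, 0, 10))
sim$y <- ifelse(sim$x > 5, 100, 10) + rnorm(n)

# Hold out roughly a third of the rows for testing
train_idx <- sample(seq_len(n), size = round(0.66 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit on the training rows only
fit <- rpart(y ~ x, data = train)

# Predict on the held-out rows via newdata
pred_test <- predict(fit, newdata = test)

# Test-set RMSE
rmse_test <- sqrt(mean((test$y - pred_test)^2))
print(rmse_test)
```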

df_no_Volume <- subset(smalldf)
tree_full <- sample_n(df_no_Volume,50) %>% #only keeps 50 of the data points ()
  rpart(Volume ~., data = ., control = rpart.control(minsplit = 2, cp = 0))

rpart.plot(tree_full, extra = 1, roundint=FALSE,
  box.palette = list( "blue", "green")) # specify 2 colors

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in which(col == 0): NAs introduced by coercion

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

## Warning in polygon(x[, i], y[, i], col = col[i], border = border.col[i], :
## supplied color is neither numeric nor character

pred_full <- predict(tree_full)
head(pred_full)

##           1           2           3           4           5           6 
## 25817455560   363433984 23645428606     9688630  6685082868   246775008

Association

Now I am going to apply association rule mining to the Ethereum data.

tail(df)

## # A tibble: 6 × 11
##     SNo Name     Symbol Date                 High   Low  Open Close       Volume
##   <dbl> <chr>    <chr>  <dttm>              <dbl> <dbl> <dbl> <dbl>        <dbl>
## 1  2155 Ethereum ETH    2021-07-01 23:59:59 2274. 2081. 2274. 2114. 29061701793.
## 2  2156 Ethereum ETH    2021-07-02 23:59:59 2156. 2022. 2110. 2150. 31796212554.
## 3  2157 Ethereum ETH    2021-07-03 23:59:59 2238. 2118. 2151. 2226. 17433361641.
## 4  2158 Ethereum ETH    2021-07-04 23:59:59 2384. 2191. 2227. 2322. 18787107473.
## 5  2159 Ethereum ETH    2021-07-05 23:59:59 2322. 2163. 2322. 2199. 20103794829.
## 6  2160 Ethereum ETH    2021-07-06 23:59:59 2346. 2198. 2198. 2325. 20891861314.
## # … with 2 more variables: Marketcap <dbl>, genre <fct>

I loaded two additional libraries to convert the data and run the association analysis.

library(arules)

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## 
## Attaching package: 'arules'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

library(arulesViz)

I called transactions() to see which columns need conversion.

transactions(df)

## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 not logical or factor. Applying
## default discretization (see '? discretizeDF').

## transactions in sparse format with
##  2160 transactions (rows) and
##  2185 items (columns)

colnames(df)[c(1,2,3,4,5,6,7,8,9,10)]

##  [1] "SNo"       "Name"      "Symbol"    "Date"      "High"      "Low"      
##  [7] "Open"      "Close"     "Volume"    "Marketcap"

I flag the days on which the coin opened above 2 and closed above 1. Note that this overwrites the numeric Open and Close columns with logical values.

df <- df %>% mutate(
        Open = (Open > 2),
        Close = (Close >1)
)
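Because the mutate call above overwrites Open and Close, those columns are logical in every later step, including the regressions. A minimal sketch of an alternative, assuming the flags are stored in new columns instead (the names OpenAbove2 and CloseAbove1 and the toy data frame are hypothetical; the prices are copied from the first rows shown by head(df)):

```r
# Toy stand-in for df; prices copied from the first three rows of the data set.
toy <- data.frame(
  Open  = c(2.79, 0.706, 0.714),
  Close = c(0.753, 0.702, 0.708)
)

# Store the flags in new (hypothetical) columns so Open and Close stay numeric.
toy$OpenAbove2  <- toy$Open > 2
toy$CloseAbove1 <- toy$Close > 1

str(toy)
```

With this layout the numeric prices remain available for regression, while the logical columns can still be turned into items for the association analysis.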

I let R discretize the remaining columns and show what we have.

as(df,"transactions")

## Warning: Column(s) 1, 2, 3, 4, 5, 6, 9, 10 not logical or factor. Applying
## default discretization (see '? discretizeDF').

## transactions in sparse format with
##  2160 transactions (rows) and
##  2181 items (columns)
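The warning refers to the default discretization applied by arules. Judging by the three equally sized bins of 720 rows that appear later in the output, it splits each numeric column into three equal-frequency intervals. A rough base-R sketch of that idea on a made-up vector (an illustration of the technique, not the code arules runs internally):

```r
# Made-up values standing in for a numeric column such as High.
x <- c(0.5, 1.2, 3.4, 10, 25, 80, 130, 400, 900)

# Cut points at the 0, 1/3, 2/3 and 1 quantiles give three equal-frequency bins.
breaks <- unname(quantile(x, probs = seq(0, 1, length.out = 4)))

# Left-closed intervals, with the last one closed on both sides.
bins <- cut(x, breaks = breaks, include.lowest = TRUE, right = FALSE)

table(bins)
```

Each bin label then becomes one item, which is why a column such as SNo contributes three items while Date, with a unique value per row, contributes one item per day.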

trans <- transactions(df)

## Warning: Column(s) 1, 2, 3, 4, 5, 6, 9, 10 not logical or factor. Applying
## default discretization (see '? discretizeDF').

summary(trans)

## transactions as itemMatrix in sparse format with
##  2160 rows (elements/itemsets/transactions) and
##  2181 columns (items) and a density of 0.004983061 
## 
## most frequent items:
## Name=Ethereum    Symbol=ETH  genre=350000         Close          Open 
##          2160          2160          2160          2044          1991 
##       (Other) 
##         12960 
## 
## element (itemset/transaction) length distribution:
## sizes
##    9   10   11 
##  115   55 1990 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.00   11.00   11.00   10.87   11.00   11.00 
## 
## includes extended item information - examples:
##                    labels variables              levels
## 1             SNo=[1,721)       SNo             [1,721)
## 2      SNo=[721,1.44e+03)       SNo      [721,1.44e+03)
## 3 SNo=[1.44e+03,2.16e+03]       SNo [1.44e+03,2.16e+03]
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2             2
## 3             3

head(colnames(trans))

## [1] "SNo=[1,721)"              "SNo=[721,1.44e+03)"      
## [3] "SNo=[1.44e+03,2.16e+03]"  "Name=Ethereum"           
## [5] "Symbol=ETH"               "Date=2015-08-08 23:59:59"

inspect(trans[1:3])

##     items                           transactionID
## [1] {SNo=[1,721),                                
##      Name=Ethereum,                              
##      Symbol=ETH,                                 
##      Date=2015-08-08 23:59:59,                   
##      High=[0.483,130),                           
##      Low=[0.421,121),                            
##      Open,                                       
##      Volume=[1.02e+05,6.26e+08),                 
##      Marketcap=[3.22e+07,1.31e+10),              
##      genre=350000}                              1
## [2] {SNo=[1,721),                                
##      Name=Ethereum,                              
##      Symbol=ETH,                                 
##      Date=2015-08-09 23:59:59,                   
##      High=[0.483,130),                           
##      Low=[0.421,121),                            
##      Volume=[1.02e+05,6.26e+08),                 
##      Marketcap=[3.22e+07,1.31e+10),              
##      genre=350000}                              2
## [3] {SNo=[1,721),                                
##      Name=Ethereum,                              
##      Symbol=ETH,                                 
##      Date=2015-08-10 23:59:59,                   
##      High=[0.483,130),                           
##      Low=[0.421,121),                            
##      Volume=[1.02e+05,6.26e+08),                 
##      Marketcap=[3.22e+07,1.31e+10),              
##      genre=350000}                              3

image(trans)

itemFrequencyPlot(trans,topN = 10)

We can see from the graph that the Open and Close items occur almost equally often (1991 and 2044 of the 2160 days).

vertical <- as(trans, "tidLists")
as(vertical, "matrix")[1:10, 1:5]

##                              1     2     3     4     5
## SNo=[1,721)               TRUE  TRUE  TRUE  TRUE  TRUE
## SNo=[721,1.44e+03)       FALSE FALSE FALSE FALSE FALSE
## SNo=[1.44e+03,2.16e+03]  FALSE FALSE FALSE FALSE FALSE
## Name=Ethereum             TRUE  TRUE  TRUE  TRUE  TRUE
## Symbol=ETH                TRUE  TRUE  TRUE  TRUE  TRUE
## Date=2015-08-08 23:59:59  TRUE FALSE FALSE FALSE FALSE
## Date=2015-08-09 23:59:59 FALSE  TRUE FALSE FALSE FALSE
## Date=2015-08-10 23:59:59 FALSE FALSE  TRUE FALSE FALSE
## Date=2015-08-11 23:59:59 FALSE FALSE FALSE  TRUE FALSE
## Date=2015-08-12 23:59:59 FALSE FALSE FALSE FALSE  TRUE

trans

## transactions in sparse format with
##  2160 transactions (rows) and
##  2181 items (columns)

From here on I use the Apriori algorithm, starting with the frequent itemsets.

its <- apriori(trans, parameter=list(target = "frequent"))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##          NA    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen            target  ext
##      10 frequent itemsets TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 216 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[2180 item(s), 2160 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.00s].
## sorting transactions ... done [0.00s].
## writing ... [4127 set(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

its

## set of 4127 itemsets

inspect(head(its, n = 10))

##      items                           support   transIdenticalToItemsets count
## [1]  {SNo=[1,721)}                   0.3333333 0                        720  
## [2]  {High=[0.483,130)}              0.3333333 0                        720  
## [3]  {Low=[0.421,121)}               0.3333333 0                        720  
## [4]  {Volume=[1.02e+05,6.26e+08)}    0.3333333 0                        720  
## [5]  {Marketcap=[3.22e+07,1.31e+10)} 0.3333333 0                        720  
## [6]  {SNo=[721,1.44e+03)}            0.3333333 0                        720  
## [7]  {SNo=[1.44e+03,2.16e+03]}       0.3333333 0                        720  
## [8]  {Marketcap=[1.31e+10,2.9e+10)}  0.3333333 0                        720  
## [9]  {Low=[121,287)}                 0.3333333 0                        720  
## [10] {Volume=[6.95e+09,8.45e+10]}    0.3333333 0                        720
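The support column can be checked by hand: it is the itemset count divided by the number of transactions, and the minimum support count is the support threshold times the number of transactions. A short worked sketch using only numbers printed in the output above (the variable names are illustrative):

```r
n_transactions <- 2160  # rows in trans
item_count     <- 720   # count reported for {SNo=[1,721)}
min_support    <- 0.1   # support threshold shown in the parameter table

support   <- item_count / n_transactions   # fraction of days containing the item
min_count <- min_support * n_transactions  # absolute minimum support count

c(support = support, min_count = min_count)
```

The results agree with the reported support of 0.3333333 and the absolute minimum support count of 216.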

ggplot(tibble(`Itemset Size` = factor(size(its))), aes(`Itemset Size`)) + geom_bar()

rules <- apriori(trans, parameter = list(support = 0.05, confidence = 0.9))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5    0.05      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 108 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[2180 item(s), 2160 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.00s].
## writing ... [16904 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

Here I have generated association rules for the data set, using a minimum support of 0.05 and a minimum confidence of 0.9.

inspect(head(rules))

##     lhs              rhs                support   confidence coverage  lift    
## [1] {}            => {Open}             0.9217593 0.9217593  1.0000000 1.000000
## [2] {}            => {Close}            0.9462963 0.9462963  1.0000000 1.000000
## [3] {}            => {Name=Ethereum}    1.0000000 1.0000000  1.0000000 1.000000
## [4] {}            => {Symbol=ETH}       1.0000000 1.0000000  1.0000000 1.000000
## [5] {}            => {genre=350000}     1.0000000 1.0000000  1.0000000 1.000000
## [6] {SNo=[1,721)} => {High=[0.483,130)} 0.3009259 0.9027778  0.3333333 2.708333
##     count
## [1] 1991 
## [2] 2044 
## [3] 2160 
## [4] 2160 
## [5] 2160 
## [6]  650
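The quality measures for rule 6, {SNo=[1,721)} => {High=[0.483,130)}, can be reproduced from the counts in the output. This is a worked sketch with illustrative variable names; the counts of 720 for each single item come from the frequent itemsets listed earlier.

```r
n_transactions <- 2160  # rows in trans
count_both     <- 650   # days containing both the lhs and the rhs item
count_lhs      <- 720   # days containing SNo=[1,721)
count_rhs      <- 720   # days containing High=[0.483,130)

support    <- count_both / n_transactions            # how often the rule applies
coverage   <- count_lhs / n_transactions             # how often the lhs occurs
confidence <- count_both / count_lhs                 # P(rhs | lhs)
lift       <- confidence / (count_rhs / n_transactions)  # confidence relative to P(rhs)

round(c(support = support, coverage = coverage,
        confidence = confidence, lift = lift), 7)
```

A lift above 1 means the two items occur together more often than they would if they were independent. The first five rules have an empty left-hand side, so their lift is exactly 1 and they only restate how frequent each item is.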

plot(rules,jitter = 1)

The scatter plot shows each rule by its support and confidence.

plot(rules, shading = "order")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(head(rules, n = 20), method = "graph")

Linear Regression

tail(df)

## # A tibble: 6 × 11
##     SNo Name     Symbol Date                 High   Low Open  Close       Volume
##   <dbl> <chr>    <chr>  <dttm>              <dbl> <dbl> <lgl> <lgl>        <dbl>
## 1  2155 Ethereum ETH    2021-07-01 23:59:59 2274. 2081. TRUE  TRUE  29061701793.
## 2  2156 Ethereum ETH    2021-07-02 23:59:59 2156. 2022. TRUE  TRUE  31796212554.
## 3  2157 Ethereum ETH    2021-07-03 23:59:59 2238. 2118. TRUE  TRUE  17433361641.
## 4  2158 Ethereum ETH    2021-07-04 23:59:59 2384. 2191. TRUE  TRUE  18787107473.
## 5  2159 Ethereum ETH    2021-07-05 23:59:59 2322. 2163. TRUE  TRUE  20103794829.
## 6  2160 Ethereum ETH    2021-07-06 23:59:59 2346. 2198. TRUE  TRUE  20891861314.
## # … with 2 more variables: Marketcap <dbl>, genre <fct>

Let's see if Volume can predict High.

linear <- lm(High ~ Volume, df)
summary(linear)

## 
## Call:
## lm(formula = High ~ Volume, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1591.78  -179.31   -65.71   153.79  2113.33 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.547e+01  1.025e+01   7.367 2.48e-13 ***
## Volume      4.574e-08  8.023e-10  57.012  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 396.8 on 2158 degrees of freedom
## Multiple R-squared:  0.601,  Adjusted R-squared:  0.6008 
## F-statistic:  3250 on 1 and 2158 DF,  p-value: < 2.2e-16

Here I use the predict function to predict High from a given Volume.

a <- data.frame(Volume = c(674188))
predict(linear,a)

##        1 
## 75.50184
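The prediction is simply the intercept plus the slope times the new Volume. A small sketch that redoes the calculation by hand, assuming the coefficients as rounded in the summary output (so the result matches predict() only approximately):

```r
intercept <- 7.547e+01  # (Intercept) estimate, rounded as printed
slope     <- 4.574e-08  # Volume estimate, rounded as printed
volume    <- 674188     # Volume on the first day of the data set

predicted_high <- intercept + slope * volume
predicted_high
```

The slope is so small that a Volume of 674188 adds only about 0.03 to the intercept, which is why the prediction stays near 75.5 even though the actual High on that day was 2.80.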

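Next I compute the sum of squared residuals by hand and, from it, the residual standard error.

sum((predict(linear,df)-df$High)^2)

## [1] 339838018

sqrt(sum((predict(linear,df)-df$High)^2)/df.residual(linear))

## [1] 396.8353

Dividing by the residual degrees of freedom (2158) reproduces the residual standard error of 396.8 reported by summary(linear).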

I call summary() again to get all the numbers in one place.

summary(linear)

## 
## Call:
## lm(formula = High ~ Volume, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1591.78  -179.31   -65.71   153.79  2113.33 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.547e+01  1.025e+01   7.367 2.48e-13 ***
## Volume      4.574e-08  8.023e-10  57.012  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 396.8 on 2158 degrees of freedom
## Multiple R-squared:  0.601,  Adjusted R-squared:  0.6008 
## F-statistic:  3250 on 1 and 2158 DF,  p-value: < 2.2e-16

Here I plot the diagnostic graphs for the linear model to examine how well the regression fits.

plot(linear)

Regression With No Intercept

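I revisit the model and force the intercept to be zero. The intercept in the previous fit was statistically significant (p = 2.48e-13), so this is an experiment rather than a simplification suggested by the data.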

linearNoIntercept <- lm(High ~ 0 + Volume, df)
summary(linearNoIntercept)

## 
## Call:
## lm(formula = High ~ 0 + Volume, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1714.67  -131.36     9.26   220.89  2050.25 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## Volume 4.901e-08  6.768e-10    72.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 401.7 on 2159 degrees of freedom
## Multiple R-squared:  0.7083, Adjusted R-squared:  0.7082 
## F-statistic:  5242 on 1 and 2159 DF,  p-value: < 2.2e-16

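The Volume coefficient is still highly significant (p < 2e-16), and the model is easier to interpret because High is now simply proportional to Volume. The residual standard error rises slightly, from 396.8 to 401.7, and the R-squared of 0.7083 is not directly comparable with the earlier 0.601, because without an intercept R measures R-squared about zero instead of about the mean.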
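The jump in R-squared from 0.601 to 0.7083 comes from how R defines R-squared when the intercept is removed: the total sum of squares is taken about zero, not about the mean of the response. A minimal sketch on simulated data (the data and variable names are made up for illustration):

```r
set.seed(1)
x <- runif(50, 0, 10)
y <- 5 + 2 * x + rnorm(50)

fit0 <- lm(y ~ 0 + x)  # regression through the origin
sse  <- sum(resid(fit0)^2)

r2_reported   <- summary(fit0)$r.squared
r2_about_zero <- 1 - sse / sum(y^2)              # what summary() reports here
r2_about_mean <- 1 - sse / sum((y - mean(y))^2)  # comparable with intercept models

c(reported = r2_reported, about_zero = r2_about_zero, about_mean = r2_about_mean)
```

To compare models with and without an intercept, the residual standard error is a safer yardstick than R-squared.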

Multiple Regression

From this point I start doing multiple regression and check how the model changes if I add the Open variable, which is now a logical flag (TRUE when the opening price was above 2).

multiple <- lm(High ~ Volume + Open, df)
summary(multiple)

## 
## Call:
## lm(formula = High ~ Volume + Open, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1576.39  -186.16   -73.07   145.33  2121.23 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.825e-01  3.049e+01   0.032  0.97429    
## Volume      4.533e-08  8.166e-10  55.511  < 2e-16 ***
## OpenTRUE    8.394e+01  3.236e+01   2.594  0.00955 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 396.3 on 2157 degrees of freedom
## Multiple R-squared:  0.6022, Adjusted R-squared:  0.6019 
## F-statistic:  1633 on 2 and 2157 DF,  p-value: < 2.2e-16

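The p-value for OpenTRUE is low (0.00955), so the term is statistically significant. However, it raises R-squared only from 0.601 to 0.6022, so it adds very little predictive value.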

multiple2 <- lm(High ~ Volume : Open, df)
summary(multiple2)

## 
## Call:
## lm(formula = High ~ Volume:Open, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1585.97  -181.92   -67.64   151.12  2116.31 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       7.904e+01  1.048e+01   7.539 6.94e-14 ***
## Volume:OpenFALSE -3.217e-05  2.026e-05  -1.588    0.112    
## Volume:OpenTRUE   4.559e-08  8.079e-10  56.427  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 396.7 on 2157 degrees of freedom
## Multiple R-squared:  0.6015, Adjusted R-squared:  0.6011 
## F-statistic:  1628 on 2 and 2157 DF,  p-value: < 2.2e-16

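This interaction gives a different coefficient for Volume depending on whether Open is TRUE or FALSE. The p-value for Volume:OpenTRUE is very low (< 2e-16), so we can reject the hypothesis that its beta is 0. The p-value for Volume:OpenFALSE is 0.112, so that slope cannot be distinguished from 0.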

multiple3 <- lm(High ~ Volume * Open, df)
summary(multiple3)

## 
## Call:
## lm(formula = High ~ Volume * Open, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1576.39  -186.16   -73.07   145.33  2121.23 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      8.340e-01  3.961e+01   0.021   0.9832  
## Volume           1.962e-07  2.568e-05   0.008   0.9939  
## OpenTRUE         8.409e+01  4.107e+01   2.047   0.0407 *
## Volume:OpenTRUE -1.509e-07  2.568e-05  -0.006   0.9953  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 396.4 on 2156 degrees of freedom
## Multiple R-squared:  0.6022, Adjusted R-squared:  0.6017 
## F-statistic:  1088 on 3 and 2156 DF,  p-value: < 2.2e-16

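This fit is almost identical to the original multiple regression: the residuals and the R-squared of 0.6022 are the same. The Volume and Volume:OpenTRUE terms have very large standard errors and are not individually significant (p about 0.99), and only OpenTRUE remains significant at the 5% level (p = 0.0407).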
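The three formulas above differ only in which terms they include. A short sketch that lists the terms R builds from each formula, using a small helper (term_labels is a hypothetical name introduced here):

```r
# Hypothetical helper: return the model terms that a formula expands to.
term_labels <- function(f) attr(terms(f), "term.labels")

main_only   <- term_labels(High ~ Volume + Open)  # two main effects
interaction <- term_labels(High ~ Volume:Open)    # interaction term only
full        <- term_labels(High ~ Volume * Open)  # main effects plus interaction

list(main_only = main_only, interaction = interaction, full = full)
```

So `Volume * Open` is shorthand for `Volume + Open + Volume:Open`, while `Volume:Open` on its own fits separate Volume slopes for each level of Open with a single shared intercept.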

Visualizations

Here I show the data in graphs so that a reader can see the relationship visually. I have drawn three different graphs of the same data, and each one brings out something different.

ggplot(df,aes(x= High, y = Volume))+
  geom_jitter(color = "Red") +
  geom_smooth(method = lm)

## `geom_smooth()` using formula 'y ~ x'

From this graph I can say that High and Volume are related, which supports using one to predict the other. The shaded band around the line is the confidence interval for the fitted line, and many of the points fall outside it.
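One detail to keep in mind: the plot puts High on the x-axis and Volume on the y-axis, so geom_smooth fits Volume against High, which is the reverse of lm(High ~ Volume). The two directions give different lines. A small sketch on simulated data (values and names are made up) showing how the two slopes are linked through the squared correlation:

```r
set.seed(42)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

slope_y_on_x <- unname(coef(lm(y ~ x))[2])  # regress y on x
slope_x_on_y <- unname(coef(lm(x ~ y))[2])  # regress x on y
r_squared    <- cor(x, y)^2

# The product of the two slopes equals the squared correlation.
c(product = slope_y_on_x * slope_x_on_y, r_squared = r_squared)
```

Swapping the axes in the ggplot call so that Volume is on x and High is on y would make the plotted line correspond to the fitted regression.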

ggplot(df,aes(x= High, y = Volume, color = Low))+
  geom_point()+
  geom_smooth(method = lm)

## `geom_smooth()` using formula 'y ~ x'

The fitted line is the same as in the previous graph, because mapping Low to color only changes how the points are drawn and does not add Low to the fitted model.

ggplot(df,aes(x=High, y=Volume))+
  geom_point()+
  geom_smooth()

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

These graphs all show the relation between the High of the coin in a day and the Volume of the coin.

Data Analysis

This project uses a data set of the daily changes in the Ethereum coin from August 2015 to July 2021. The data set contains the variables “SNo”, “Name”, “Symbol”, “Date”, “High”, “Low”, “Open”, “Close”, “Volume”, and “Marketcap”. While sampling the data I tried to pick out the most useful parts and to avoid dirty data. All the data for this project was extracted from “https://www.kaggle.com/sudalairajkumar/cryptocurrencypricehistory?select=coin_Ethereum.csv”, and I removed some of it since the data set was very large. I used graphical representations for some of the data and tried different libraries to access a range of functions. I also applied association rules and regression to the data and tried to predict the High of a day using Volume.

Conclusion

We are at the end of this project. Doing it has taught me a lot about RStudio as well as how to read and represent data. From this data set I have learned that there have been a lot of changes in the crypto market over the past couple of years. A cryptocurrency has a lot of variables. The market cap of the coin helps to determine its open and close for the day. According to this data set there have been a lot of ups and downs in the market, especially in 2020. I have also added a decision tree to this project. I got most of the results that I wanted from this process. There were cases and functions I could not make work, so I have left them as comments. I have also used association rules on this data, and a confusion table has been created. With everything I have found, I can say that the market cap is the link that connects all the data in the crypto business. I found that the variables are connected and that, with the help of the Volume of Ethereum, you can predict the High for the day, although Volume alone explains only about 60% of its variance (R-squared of 0.601).