Problem: 1.1
##Reading Excel File
library(readxl)
Q1 <- read_excel("C:/Users/kamil/Desktop/CE605.xlsx", sheet = "one")
View(Q1)
library(e1071) #Package for skewness
##Statistical Analysis of RDEN & ACCEL
#Analysis on RDEN
print("Analysis on RDEN")
[1] "Analysis on RDEN"
RDEN <- Q1$RDEN
mean(RDEN)
[1] 61
sd(RDEN)
[1] 17.38572
skewness(RDEN)
[1] 0.6007688
stem(RDEN, scale = 1, width = 80, atom = 1e-08)
The decimal point is 1 digit(s) to the right of the |
3 | 02
4 | 00034
5 | 000233335555888
6 | 444558
7 | 022558
8 | 0
9 | 0
10 | 000
#Analysis on ACCEL
print("Analysis on ACCEL")
[1] "Analysis on ACCEL"
ACCEL <- Q1$ACCEL
mean(ACCEL)
[1] 0.3272051
sd(ACCEL)
[1] 0.1423584
skewness(ACCEL)
[1] 0.6774351
stem(ACCEL, scale = 1, width = 80, atom = 1e-08)
The decimal point is 1 digit(s) to the left of the |
1 | 2455779
2 | 022233568999
3 | 011335568
4 | 22235
5 | 2569
6 | 18
#Scatter plot and R value
print("Scatter plot and R value")
[1] "Scatter plot and R value"
plot(RDEN, ACCEL, main = "Scatter Plot",
xlab = "RDEN", ylab = "ACCEL",
pch = 19, frame = FALSE)
abline(lm(ACCEL ~ RDEN), col = "blue")
cor(RDEN, ACCEL)
[1] 0.2806802
Inferences from the scatter diagram:
1) Weak positive correlation between RDEN and ACCEL (r ≈ 0.28).
2) Coefficients of variation of about 0.29 for RDEN and 0.44 for ACCEL.
3) Positive skewness in both variables (about 0.60 and 0.68).
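The coefficient of variation referred to in the inferences is the ratio of standard deviation to mean. A minimal sketch follows. It uses only the summary values printed in the output above rather than the raw data, and the helper name `cv_from_summary` is made up for illustration.

```r
# Coefficient of variation from already-computed summary statistics.
# cv_from_summary is a hypothetical helper, not part of any package.
cv_from_summary <- function(sd_value, mean_value) {
  sd_value / mean_value
}

# Values copied from the mean() and sd() output above
cv_RDEN  <- cv_from_summary(17.38572, 61)
cv_ACCEL <- cv_from_summary(0.1423584, 0.3272051)

round(c(RDEN = cv_RDEN, ACCEL = cv_ACCEL), 3)
```

With the raw vectors available, the same quantity would be `sd(RDEN) / mean(RDEN)`.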
Problem: 1.2
library(readxl)
Q2 <- read_excel("C:/Users/kamil/Desktop/CE605.xlsx", sheet = "two")
head(Q2)
library(e1071) #Package for skewness
print("Analysis on Annual Max flow")
[1] "Analysis on Annual Max flow"
Annual_Max <- Q2$Max
mean(Annual_Max)
[1] 5407.869
sd(Annual_Max)
[1] 1749.072
hist(Annual_Max)
plot(ecdf(Annual_Max))
quantile(Annual_Max)
0% 25% 50% 75% 100%
2220 4030 5400 6810 8940
boxplot(Annual_Max)
Comments on distribution:
1) The distribution has a slight positive skewness; the mean (5407.9) is only marginally above the median (5400).
2) The maximum frequency lies between 5000 and 6000.
3) The data are spread almost equally on the lower and upper sides of the median.
Frequency-based probability of the annual maximum flow exceeding 5000 m3/s:
Probability = (No. of samples greater than 5000) / (total no. of samples)
= 36/61 ≈ 0.590
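The count-based calculation above can also be done directly in R. The sketch below assumes a strict "greater than" comparison, as in the formula; `exceedance_prob` is a hypothetical helper name and the example flows are made up, not taken from the Excel sheet.

```r
# Empirical (frequency-based) exceedance probability:
# the fraction of observations strictly greater than a threshold.
exceedance_prob <- function(x, threshold) {
  x <- x[!is.na(x)]
  mean(x > threshold)
}

# Illustrative flows only; these are NOT the values from the Excel sheet
example_flow <- c(2200, 4100, 5200, 5600, 6900, 8900)
exceedance_prob(example_flow, 5000)

# With the actual data this would be: exceedance_prob(Annual_Max, 5000)
```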
Problem: 1.6
library(readxl)
library(e1071) #Package for skewness
Q3 <- read_excel("C:/Users/kamil/Desktop/CE605.xlsx", sheet = "three")
head(Q3)
##Statistical Analysis of Maximum Load and Compressive Strength
#Analysis on Load
print("Analysis on Max Load")
[1] "Analysis on Max Load"
Load <- Q3$Load
mean(Load)
[1] 917.6875
sd(Load)
[1] 86.38805
skewness(Load)
[1] -1.914846
stem(Load, scale = 1, width = 80, atom = 1e-08)
The decimal point is 2 digit(s) to the right of the |
6 | 5
7 |
8 | 35
9 | 014445566789
10 | 0
#Analysis on Compressive Strength
print("Analysis on Compressive Strength")
[1] "Analysis on Compressive Strength"
Strength <- Q3$Strength
mean(Strength)
[1] 40.29688
sd(Strength)
[1] 4.246782
skewness(Strength)
[1] -1.375219
stem(Strength, scale = 1, width = 80, atom = 1e-08)
The decimal point is 1 digit(s) to the right of the |
2 | 9
3 | 4
3 | 789
4 | 02222333444
#Back-to-back (comparative) stem-and-leaf plot
library(aplpack)
stem.leaf.backback(Load,Strength)
______________________________________________________
1 | 2: represents 120, leaf unit: 10
Load Strength
______________________________________________________
| 0 |2333344444444444 (16)
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
1 4| 6 |
| 7 |
4 942| 8 |
(12) 988765544330| 9 |
| 10 |
______________________________________________________
n: 16 16
______________________________________________________
#Scatter plot and R value
print("Scatter plot and R value")
[1] "Scatter plot and R value"
plot(Load, Strength, main = "Scatter Plot",
xlab = "Load", ylab = "Strength",
pch = 19, frame = FALSE)
abline(lm(Strength ~ Load), col = "blue")
cor(Load, Strength)
[1] 0.8457072
Inferences:
1) Negative skewness in both Load and Strength.
2) Strong positive correlation (r ≈ 0.85).
3) One outlier.
4) The data follow a linear trend line reasonably well.
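For reference, the value returned by `cor()` is the Pearson coefficient, which can be written out from its definition. This is a sketch on made-up numbers, not the Load and Strength data; `pearson_r` is a hypothetical helper.

```r
# Pearson correlation from its definition: the sum of cross-products of
# deviations, scaled by the square root of the two sums of squares.
# pearson_r is a hypothetical helper for illustration.
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Illustrative numbers only; NOT the Load/Strength data from the sheet
example_load     <- c(650, 830, 850, 900, 940, 960, 990, 1000)
example_strength <- c(29, 34, 37, 40, 42, 43, 44, 44)

r_manual  <- pearson_r(example_load, example_strength)
r_builtin <- cor(example_load, example_strength)
all.equal(r_manual, r_builtin)
```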
Problem: 1.12
library(readxl)
Q4 <- read_excel("C:/Users/kamil/Desktop/CE605.xlsx", sheet = "four")
head(Q4)
print("Analysis on Compressive Strength")
[1] "Analysis on Compressive Strength"
Strength <- Q4$Strength
hist(Strength, breaks = seq(0, 70, 8), main = "Histogram with w=8")
hist(Strength, breaks = seq(0, 70, 2), main = "Histogram with w=2")
boxplot(Strength)
Inferences from the histogram:
1) Asymmetrical distribution of data
2) Large skewness
3) 4 outliers
The shape of the histogram and the large number of outliers support the contractor's claim.
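The outlier count can be read off programmatically instead of from the box plot. The sketch below assumes the default box-plot rule (points more than 1.5 times the hinge spread beyond the hinges) and uses made-up values rather than the strength data.

```r
# boxplot() marks points beyond 1.5 times the hinge spread as outliers;
# boxplot.stats() returns those same points without drawing anything.
# Illustrative values only; NOT the strength data from the sheet
example_strength <- c(10, 12, 11, 13, 12, 11, 50)

stats <- boxplot.stats(example_strength, coef = 1.5)
stats$out          # values flagged as outliers
length(stats$out)  # how many there are

# With the actual data: length(boxplot.stats(Strength)$out)
```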
Problem: 1.15
library(readxl)
Q5 <- read_excel("C:/Users/kamil/Desktop/CE605.xlsx", sheet = "five")
head(Q5)
Chloride <- Q5$Chloride
Phosphate <- Q5$Phosphate
cv_Cl <- sd(Chloride, na.rm=TRUE)/mean(Chloride, na.rm=TRUE)*100
cv_P <- sd(Phosphate, na.rm=TRUE)/mean(Phosphate, na.rm=TRUE)*100
plot(Chloride,Phosphate)
cor(Chloride,Phosphate)
[1] 0.02708073
Inferences from scatter plot:
1) Poor correlation (r ≈ 0.03).
There is no association useful for predictive purposes.
Problem: 1.20
library(readxl)
Q6 <- read_excel("C:/Users/kamil/Desktop/CE605.xlsx", sheet = "six")
head(Q6)
min5 <- Q6$d5
min10 <- Q6$d10
min20 <- Q6$d20
min30 <- Q6$d30
min40 <- Q6$d40
min50 <- Q6$d50
min60 <- Q6$d60
min120 <- Q6$d120
min180 <- Q6$d180
library(expss)
Q6 %>%
tab_cells(min5, min10, min20, min30, min40, min50, min60, min120, min180) %>%
tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, skewness, method = list) %>%
tab_pivot()
| | #Total | | |
| | Mean | Std. dev. | skewness |
| ------ | ------ | --------- | -------- |
| min5 | 13.5 | 5.9 | 0.4 |
| min10 | 20.4 | 9.2 | 0.4 |
| min20 | 31.9 | 16.1 | 0.5 |
| min30 | 38.7 | 20.4 | 0.6 |
| min40 | 45.5 | 26.0 | 0.8 |
| min50 | 52.5 | 31.7 | 0.8 |
| min60 | 57.9 | 36.8 | 0.8 |
| min120 | 74.8 | 46.3 | 1.0 |
| min180 | 83.7 | 46.2 | 0.9 |
Inferences from the data:
1) The mean and standard deviation increase with increasing duration.
2) The distribution becomes more positively skewed as duration increases (skewness rises from 0.4 to about 1.0).
3) There is high uncertainty for longer storms.
4) The lower variability of short bursts of rainfall suggests that rainfalls of short duration, 30 min or less, have similar physical characteristics.
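One way to compare variability across durations on a common scale is the coefficient of variation. The sketch below recomputes it from the rounded table values above, so results are approximate; the variable names are chosen here for illustration.

```r
# Means and standard deviations copied from the table above
duration_min <- c(5, 10, 20, 30, 40, 50, 60, 120, 180)
mean_value   <- c(13.5, 20.4, 31.9, 38.7, 45.5, 52.5, 57.9, 74.8, 83.7)
sd_value     <- c(5.9, 9.2, 16.1, 20.4, 26.0, 31.7, 36.8, 46.3, 46.2)

# Coefficient of variation: spread relative to the mean
cv_value <- sd_value / mean_value
names(cv_value) <- paste0("min", duration_min)
round(cv_value, 2)
```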
Problem: 1.21
library(readxl)
Q7 <- read_excel("C:/Users/kamil/Desktop/CE605.xlsx", sheet = "seven")
head(Q7)
library(expss)
Q7 = apply_labels(Q7,
Jan = "January",
Feb = "February",
Mar = "March",
Apr = "April",
May = "May",
Jun = "June",
Jul = "July",
Aug = "August",
Sep = "September",
Oct = "October",
Nov = "November",
Dec = "December"
)
Q7 %>%
tab_cells(Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>%
tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, method = list) %>%
tab_pivot()
| | #Total | |
| | Mean | Std. dev. |
| --------- | ------ | --------- |
| January | 346.3 | 4.9 |
| February | 347.9 | 5.0 |
| March | 348.8 | 5.0 |
| April | 348.2 | 4.5 |
| May | 344.9 | 4.9 |
| June | 341.0 | 4.7 |
| July | 337.4 | 5.4 |
| August | 336.0 | 5.2 |
| September | 337.8 | 4.1 |
| October | 342.0 | 4.6 |
| November | 345.5 | 5.0 |
| December | 347.2 | 4.2 |
Q7a <- read_excel("C:/Users/kamil/Desktop/CE605.xlsx", sheet = "sevena")
head(Q7a)
Q7a = apply_labels(Q7a,
zero = "1980",
one = "1981",
two = "1982",
three = "1983",
four = "1984",
five = "1985",
six = "1986",
seven = "1987",
eight = "1988"
)
Q7a %>%
tab_cells(zero, one, two, three, four, five, six, seven, eight) %>%
tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, method = list) %>%
tab_pivot()
| | #Total | |
| | Mean | Std. dev. |
| ---- | ------ | --------- |
| 1980 | 337.0 | 4.8 |
| 1981 | 338.2 | 4.8 |
| 1982 | 340.2 | 4.6 |
| 1983 | 341.3 | 4.0 |
| 1984 | 345.1 | 4.8 |
| 1985 | 345.2 | 5.1 |
| 1986 | 346.5 | 5.3 |
| 1987 | 348.7 | 4.9 |
| 1988 | 350.2 | 4.6 |
From the monthly means, one can conclude:
1) The CO2 concentration is higher during the winter and spring months, peaking in March (348.8).
2) It has its lowest value in July-August (monsoon period), with a minimum in August (336.0).
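The months with the highest and lowest mean concentration can be picked out of the monthly table directly. This is a small sketch using the rounded means printed above; the variable names are assumptions made for illustration.

```r
# Monthly means copied from the first table above
monthly_mean <- c(January = 346.3, February = 347.9, March = 348.8,
                  April = 348.2, May = 344.9, June = 341.0,
                  July = 337.4, August = 336.0, September = 337.8,
                  October = 342.0, November = 345.5, December = 347.2)

peak_month     <- names(which.max(monthly_mean))
trough_month   <- names(which.min(monthly_mean))
seasonal_range <- max(monthly_mean) - min(monthly_mean)

c(peak = peak_month, trough = trough_month)
seasonal_range
```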
Submitted by: Mohd Kamil Vakil (PhD, 18203264)