This is a research data set on wild blueberries, used to predict yield and the behaviour of bees. I found this data on Kaggle. The data set includes 251 observations and 7 variables in total, a mix of categorical and quantitative variables. clonesize, rain and fruitsize are the categorical variables, and average temperature, seeds and yield are the quantitative variables.
I took a lot of time to settle on my final data set. I wanted to work on avocado sales and had started on it, but I found a lot of dirty data. I used some techniques to sort and filter the data to clean some of it.
df <- read.csv("wildblueberry.csv")
mean(df$yield)
## [1] 5817.303
sd(df$yield)
## [1] 1490.436
summary(df$yield)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1946 4703 6031 5817 7020 8622
hist(df$yield,
main = "HIST yield",
xlab = "YIELD(KG)",
col = "purple",
probability = TRUE)
lines(density(df$yield), col = "blue")  # overlay a kernel density estimate of yield
I used the hist function to show the distribution of yield as a histogram.
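A histogram drawn with probability = TRUE is scaled so that the bar areas add up to 1, which is what lets a density curve share the same axes. The sketch below illustrates this on simulated values, not on the real data: `fake_yield` is a hypothetical stand-in, and its mean and standard deviation are simply borrowed from the numbers reported above.

```r
# Simulated stand-in for df$yield (hypothetical values, not the real data)
set.seed(1)
fake_yield <- rnorm(251, mean = 5817, sd = 1490)

# Compute the histogram without drawing it, to inspect the bar heights
h <- hist(fake_yield, plot = FALSE)

# On the density scale, height * bin width summed over all bins equals 1
total_area <- sum(h$density * diff(h$breaks))
total_area

# Draw the histogram on the density scale and overlay a kernel density estimate
hist(fake_yield, probability = TRUE,
     main = "Histogram of simulated yield",
     xlab = "Yield", col = "purple")
lines(density(fake_yield), col = "blue")
```

Without probability = TRUE the bars show counts, and a density curve drawn on top would sit almost flat along the bottom of the plot.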
boxplot(df$yield,
main = "BOX PLOT",
horizontal = FALSE,
ylab = "YIELD (KG)")  # the box is vertical, so yield is on the y axis
I used the boxplot function to draw a box plot of yield.
qqnorm(df$yield,
main = "QQplot Yield",
ylab = "YIELD (KG)")
qqline(df$yield, col = "red")
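The qqnorm function plots the sample quantiles of yield against the theoretical quantiles of a normal distribution. The qqline function adds a red reference line, so I can see how closely the points follow it.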
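To make it clearer what the Q-Q plot is showing, here is a small sketch that rebuilds the plotted points by hand and compares them with what qqnorm returns. It uses simulated values, so `fake_yield` is a hypothetical stand-in for df$yield and not the real data.

```r
# Simulated stand-in for df$yield (hypothetical values, not the real data)
set.seed(1)
fake_yield <- rnorm(251, mean = 5817, sd = 1490)

n <- length(fake_yield)

# Theoretical quantiles: standard normal quantiles at evenly spread probabilities
theoretical <- qnorm(ppoints(n))

# Sample quantiles: the observed values in increasing order
sample_q <- sort(fake_yield)

# qqnorm() computes the same pairs; plot.it = FALSE returns them without drawing
qq <- qqnorm(fake_yield, plot.it = FALSE)
same_x <- isTRUE(all.equal(sort(qq$x), theoretical))
same_y <- isTRUE(all.equal(sort(qq$y), sample_q))
c(same_x = same_x, same_y = same_y)
```

If the points lie close to the reference line, the data are roughly normal; points bending away at the ends suggest heavier or lighter tails.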
In a data set, some values can lie relatively far from the rest; those values are called outliers. I used the IQR function and the boxplot rule to check for outliers.
IQR(df$yield, na.rm = FALSE)
## [1] 2316.45
boxplot(df$yield, plot = FALSE)$out
## numeric(0)
outliers <- boxplot(df$yield, plot = FALSE)$out  # store the flagged values
outliers
## numeric(0)
The output numeric(0) is an empty vector, so the boxplot rule (values more than 1.5 times the IQR beyond the box) does not flag any yield value as an outlier.
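The same check can be worked through by hand with the numbers printed above. This is only an approximate sketch: the quartiles are the rounded values from summary(), and boxplot() actually works from hinges, which can differ slightly from the quartiles. The variable names are my own.

```r
# Rounded values taken from the summary() and IQR() output above
q1        <- 4703
q3        <- 7020
iqr       <- 2316.45
min_yield <- 1946
max_yield <- 8622

# Fences of the 1.5 * IQR rule
lower <- q1 - 1.5 * iqr   # 1228.325
upper <- q3 + 1.5 * iqr   # 10494.675

# TRUE when the smallest and largest yields both sit inside the fences
no_outliers <- min_yield >= lower && max_yield <= upper
no_outliers
```

The smallest yield (1946) is above the lower fence and the largest (8622) is below the upper fence, which agrees with the empty result from boxplot.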
plot(df$yield,df$averageT,
main = "MULTIPLE VARIABLES GRAPHICAL DISPLAY",
xlab = "YIELD(KG)",
ylab = "AVERAGE TEMP(.C)",
pch = 20)
I used the plot function to draw a scatter plot of two variables together: yield on the x axis and average temperature on the y axis.
FREQUENCY TABLES
table(df$clonesize)
##
## 20 35
## 226 25
table(df$clonesize) / length(df$clonesize)
##
## 20 35
## 0.90039841 0.09960159
two_way_table <-table(df$fruitsize,df$clonesize)
This table suggests that small bee colonies are associated with more large fruits than big colonies.
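Because the two clonesize groups are very different in size (226 observations against 25), comparing proportions inside each group can be fairer than comparing raw counts. The sketch below shows one way to do that with prop.table. The data frame `toy` and its "large" and "small" labels are invented for illustration and are not the real wildblueberry.csv values.

```r
# Hypothetical data, invented for illustration (not the real data set)
toy <- data.frame(
  fruitsize = c("large", "small", "large", "large",
                "small", "small", "large", "small"),
  clonesize = c(20, 20, 20, 20, 20, 20, 35, 35)
)

# Raw counts: rows are fruit sizes, columns are clonesize values
counts <- table(toy$fruitsize, toy$clonesize)
counts

# Column proportions: within each clonesize, the share of each fruit size
col_props <- prop.table(counts, margin = 2)
col_props
```

With margin = 2 every column adds up to 1, so the share of large fruits can be read directly for each clonesize group.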
par(mfrow=c(1,2))
plot(df$yield,
main = "scatter plot of yield(KG)",
xlab = "",
ylab = "yield")
plot(df$clonesize,
main = "Scatter plot of clonesize",
xlab = "",
ylab = "clonesize")
print("yield summary:")
## [1] "yield summary:"
summary(df$yield)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1946 4703 6031 5817 7020 8622
print("clonesize summary:")
## [1] "clonesize summary:"
summary(df$clonesize)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 20.00 20.00 21.49 20.00 35.00
barplot(table(df$clonesize),  # one bar per clonesize value, height = number of observations
main = "BAR CHART",
xlab = "clonesize")
scatter.smooth(df$clonesize,
main = "SCATTER PLOT")
data <- read.csv("wildblueberry.csv", header = TRUE)
data <- data.matrix(data[, -1])  # drop the first column and convert the rest to a numeric matrix
heatmap(t(data),
main = "HEAT MAP",
Rowv = NA,
Colv = NA,
col = heat.colors(200,alpha = 1,rev = FALSE),
scale = "row")
summary(df$seeds)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.3 32.3 36.0 35.6 39.1 46.9
This was my first time working in R Markdown, and I really liked the results, even though it took a lot of time to figure things out. I really enjoyed the project, and I felt like I could play with some data.