PROBABILITY THEORY AND INTRODUCTORY STATISTICS
SWAPNESH TIWARI
THE MEGASTORE EMPLOYEE REPORT
Date : 14 October, 2022








INTRO



In the 21st century, a job is one of the most important means for an individual to survive and thrive, and many jobs are designed so that people can work comfortably. In the USA, many employers use a system of hourly wages under which people work flexible hours whenever it suits them, which is especially convenient for students.
This data set represents the hourly wages set for each employee across a total of 20 stores. This project uses RStudio to compute statistics, analyze and interpret them, and draw meaningful observations from the data we have gathered.



SALARY SURVEYS


As discussed above, jobs are a very important aspect of life in the 21st century. Jobs generate salaries, and salaries generate massive amounts of data, from company finance departments to government finance departments. Every job needs an employee, and employees are the most important asset of any organization. Every organization tries to provide a good quality of life for its employees, and one of the most useful strategies for doing so is a salary survey, which answers the question: are employees paid a fair amount of money relative to their market value? Secondly, conducting a salary survey signals to every employee that the organization is concerned about their well-being.



CONFIDENCE INTERVALS:



Let's take an airline as a practical example. Airline XYZ has to design a seating layout for its aircraft that is not too cramped yet can accommodate more passengers. To do this, the company hires a statistician and a data analyst. The data analyst starts gathering data and uses the mean weight of a passenger to determine the size of a seat, but the population is very large and it is difficult to get the weight of every passenger, so the analyst takes a subset of the population, which we call a sample. The problem is: can the sample tell us whether a mean passenger weight of 72 kg is accurate? Measuring the entire population is impossible, so the analyst uses a strategy called a confidence interval. A confidence interval gives the analyst a range of values within which the population mean weight is likely to lie. It is affected by two factors: the sample size and the variation in the weights of the population. The more variation is present in the population, the wider the confidence interval will be, and the less variation, the narrower it will be. Similarly, the larger the sample, the more accurate and confident we can be about the estimate.
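
A minimal sketch of this idea in R (the passenger weights below are simulated purely for illustration, and the sample size of 100 is a made-up assumption):

#Simulate a hypothetical sample of 100 passenger weights (kg) and build a 95% confidence interval for the mean
set.seed(1)
weights = rnorm(100, mean = 72, sd = 10)
MeanWeight = mean(weights)
MOE = qt(0.975, df = length(weights) - 1) * sd(weights)/sqrt(length(weights))
c(Lower = MeanWeight - MOE, Upper = MeanWeight + MOE)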






Analysis





# LIBRARIES 

library(readxl)
library(tidyverse)
library(dplyr)
library(RColorBrewer)
library(dbplyr)
library(magrittr)
library(mosaic)
library(ggplot2)

#PATH TO DATA SOURCE (DATASET)

StoreSet <- read_excel("C://Users//User//OneDrive//Documents//ALY_6010//Datasets//M2Data-1.xlsx")
PetSet <- read_excel("C://Users//User//OneDrive//Documents//ALY_6010//Datasets//M2Data-1.xlsx",sheet = "pets")







FIRST TASK

#Calculating confidence intervals for the values below (task 1.1). The grand mean and grand standard deviation of all the values were first calculated in Excel and are entered here manually to compute the confidence intervals.

#Calculating values:

StoreMean = 11.81
StoreSD = 3.25 #Value rounded to 3.25 from 3.248
n = 247

#The above values were calculated in Excel; we can also calculate them in R as follows:

#Calculate the grand mean and standard deviation value using R
#Mean
Sample  = 247

#Combine all 20 store columns into a single vector of hourly wages
AllWages = c(StoreSet$`Store 1`,StoreSet$`Store 2`,StoreSet$`Store 3`,StoreSet$`Store 4`,StoreSet$`Store 5`,StoreSet$`Store 6`,StoreSet$`Store 7`,StoreSet$`Store 8`,StoreSet$`Store 9`,StoreSet$`Store 10`,StoreSet$`Store 11`,StoreSet$`Store 12`,StoreSet$`Store 13`,StoreSet$`Store 14`,StoreSet$`Store 15`,StoreSet$`Store 16`,StoreSet$`Store 17`,StoreSet$`Store 18`,StoreSet$`Store 19`,StoreSet$`Store 20`)

SumOfStores = sum(AllWages, na.rm = TRUE)
SalaryMean  = round(SumOfStores/Sample,2)

#Standard Deviation

GrandSD = sd(AllWages, na.rm = TRUE)

GrandTable = rbind(SalaryMean, GrandSD)

knitr::kable(GrandTable) %>% 
  kableExtra::kable_classic()
SalaryMean 11.810000
GrandSD 3.248082
#For the 90%, 92% and 96% confidence levels
#To do this we take the area from the far left tail up to the upper critical point. For example, for a 90% interval the remaining 0.10 is split equally between the two tails as 0.05 and 0.05, so we add 0.05 to 0.90 = 0.95, and 0.95 is the probability we pass to qnorm().

CI90 = qnorm(0.95) 
CI92 = qnorm(0.96)
CI96 = qnorm(0.98)


#Or we can use the lower-tail areas alpha/2, which are 0.05, 0.04 and 0.02 of the area, to obtain the corresponding negative critical values.

CI0.5 = qnorm(0.05)
CI0.4 = qnorm(0.04)
CI0.2 = qnorm(0.02)
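
#Sanity check (sketch): by the symmetry of the normal distribution, these lower-tail quantiles
#are just the negatives of the critical values above
qnorm(0.05) + qnorm(0.95) #essentially 0, up to floating-point error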

#Then we calculate the margin of error used to estimate the true mean; we calculate it for each confidence level using the critical z values obtained above
Error90 = CI90 * StoreSD/sqrt(n)
Error92 = CI92 * StoreSD/sqrt(n)
Error96 = CI96 * StoreSD/sqrt(n)

#Calculating width 

Width90 = 2 * Error90
Width92 = 2 * Error92
Width96 = 2 * Error96

#Now to calculate the lower and upper bounds (mean minus and plus the margin of error) for : 

#90%
LowerBound90 = StoreMean - Error90
UpperBound90 = StoreMean + Error90

#92%
LowerBound92 = StoreMean - Error92
UpperBound92 = StoreMean + Error92

#96%
LowerBound96 = StoreMean - Error96
UpperBound96 = StoreMean + Error96

#Creating a table

Objects1 = c(CI90, Error90,LowerBound90, UpperBound90, Width90)

Objects2 = c(CI92, Error92,LowerBound92,UpperBound92,Width92)

Objects3 = c(CI96, Error96,LowerBound96, UpperBound96,Width96)

Table90   = matrix(Objects1, ncol = 5,  byrow = TRUE)
Values   = c("90%")
Name     = c("Z Score ","Margin Of Error ","Lower Confidence interval ","Upper Confidence interval ", "Width ")
colnames(Table90)  = Name
rownames(Table90)  = Values

Table92    = matrix(Objects2,ncol = 5,  byrow = TRUE)
Values   = c("92%")
Name     = c("Z Score ","Margin Of Error ","Lower Confidence interval ","Upper Confidence interval ", "Width ")
colnames(Table92)  = Name
rownames(Table92)  = Values

Table96    = matrix(Objects3,ncol = 5,  byrow = TRUE)
Values   = c("96%")
Name     = c("Z Score ","Margin Of Error ","Lower Confidence interval ","Upper Confidence interval ", "Width ")
colnames(Table96)  = Name
rownames(Table96)  = Values


FinalTable = rbind(Table90,Table92, Table96)

#Creating a formatted table using the knitr::kable() function and then styling it with the kableExtra package.
knitr::kable(FinalTable) %>% 
  kableExtra::kable_material_dark()
Z Score Margin Of Error Lower Confidence interval Upper Confidence interval Width
90% 1.644854 0.3401435 11.469857 12.15014 0.6802869
92% 1.750686 0.3620288 11.447971 12.17203 0.7240577
96% 2.053749 0.4247000 11.385300 12.23470 0.8493999


According to the table above, since we picked a sample, we need to know how confident we are about the average hourly wage of employees across all stores.

  • At 90% confidence, the mean hourly wage of an employee lies between 11.47 and 12.15; this is the narrowest of the three intervals.
  • At 92% confidence, the mean hourly wage of an employee lies between 11.45 and 12.17, a slightly wider interval.
  • At 96% confidence, the mean hourly wage of an employee lies between 11.39 and 12.23, the widest of the three intervals.
  • The widths of the confidence intervals differ: the width increases as we increase the confidence level, because a higher confidence level requires a larger critical value and therefore a larger margin of error. A quick cross-check of the 90% interval is sketched below.
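
As a cross-check (a minimal sketch using the mean, standard deviation and sample size already defined above), the 90% interval can be computed in one line:

#Cross-check: 90% confidence interval computed directly from StoreMean, StoreSD and n
StoreMean + c(-1, 1) * qnorm(0.95) * StoreSD/sqrt(n)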







    FIRST TASK 1.3

    Mean1 = 11.84
    SD1 = 1.21
    n1 = 20
    
    #Mean1 and SD1 are the mean and standard deviation of the hourly wages in Store 1 (a sample of 20 employees), calculated in Excel and entered here manually
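
    #Cross-check (sketch): these values can also be computed directly from the Store 1 column,
    #assuming that column holds the same 20 wages summarized in Excel
    round(mean(StoreSet$`Store 1`, na.rm = TRUE), 2)
    round(sd(StoreSet$`Store 1`, na.rm = TRUE), 2)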
    
    #For the 90%, 92% and 96% confidence levels
    #As before, the remaining area is split equally between the two tails, so for a 90% interval we pass 0.95 to qt() together with n1 - 1 = 19 degrees of freedom.
    
    Store1CF90 = qt(0.95,19) 
    Store1CF92 = qt(0.96,19)
    Store1CF96 = qt(0.98,19)
    
    
    #Or we can use the lower-tail areas alpha/2, which are 0.05, 0.04 and 0.02 of the area, to obtain the corresponding negative critical values.
    
    Store1CI0.5 = qt(0.05,19)
    Store1CI0.4 = qt(0.04,19)
    Store1CI0.2 = qt(0.02,19)
    
    #Then we calculate the margin of error used to estimate the true mean; we calculate it for each confidence level using the critical t values obtained above
    Store1Error90 = Store1CF90 * SD1/sqrt(n1)
    Store1Error92 = Store1CF92 * SD1/sqrt(n1)
    Store1Error96 = Store1CF96 * SD1/sqrt(n1)
    
    #Calculating width 
    
    Store1Width90 = 2 * Store1Error90
    Store1Width92 = 2 * Store1Error92
    Store1Width96 = 2 * Store1Error96
    
    #Now to calculate the lower and upper bounds (mean minus and plus the margin of error) for : 
    
    #90%
    Store1LowerBound90 = Mean1 - Store1Error90
    Store1UpperBound90 = Mean1 + Store1Error90
    
    #92%
    Store1LowerBound92 = Mean1 - Store1Error92
    Store1UpperBound92 = Mean1 + Store1Error92
    
    #96%
    Store1LowerBound96 = Mean1 - Store1Error96
    Store1UpperBound96 = Mean1 + Store1Error96
    
    
    #Creating a table
    
    StoreAT1 = c(Store1CF90, Store1Error90,Store1LowerBound90, Store1UpperBound90, Store1Width90)
    
    StoreAT2 = c(Store1CF92, Store1Error92,Store1LowerBound92,Store1UpperBound92,Store1Width92)
    
    StoreAT3 = c(Store1CF96, Store1Error96,Store1LowerBound96, Store1UpperBound96,Store1Width96)
    
    ST190   = matrix(StoreAT1, ncol = 5,  byrow = TRUE)
    Values   = c("90%")
    Name     = c("T Values ","Margin Of Error ","Lower Confidence interval ","Upper Confidence interval ", "Width ")
    colnames(ST190)  = Name
    rownames(ST190)  = Values
    
    ST192    = matrix(StoreAT2,ncol = 5,  byrow = TRUE)
    Values   = c("92%")
    Name     = c("T Values ","Margin Of Error ","Lower Confidence interval ","Upper Confidence interval ", "Width ")
    colnames(ST192)  = Name
    rownames(ST192)  = Values
    
    ST196    = matrix(StoreAT3,ncol = 5,  byrow = TRUE)
    Values  = c("96%")
    Name    = c("T Values ","Margin Of Error ","Lower Confidence interval ","Upper Confidence interval ", "Width ")
    colnames(ST196)  = Name
    rownames(ST196)  = Values
    
    
    Store1Final = rbind(ST190,ST192, ST196)
    
    #Creating a formatted table using the knitr::kable() function and then styling it with the kableExtra package.
    knitr::kable(Store1Final) %>% 
      kableExtra::kable_classic_2()
    T Values Margin Of Error Lower Confidence interval Upper Confidence interval Width
    90% 1.729133 0.4678415 11.372159 12.30784 0.9356830
    92% 1.849530 0.5004167 11.339583 12.34042 1.0008333
    96% 2.204701 0.5965132 11.243487 12.43651 1.1930265




    FIRST TASK 1.3


    According to the table above, since we picked a sample of 20 employees from Store 1, we need to know how confident we are about the average hourly wage within this individual store.

  • At 90% confidence, the mean hourly wage of a Store 1 employee lies between 11.37 and 12.31; this is the narrowest of the three intervals.
  • At 92% confidence, the mean hourly wage of a Store 1 employee lies between 11.34 and 12.34, a slightly wider interval.
  • At 96% confidence, the mean hourly wage of a Store 1 employee lies between 11.24 and 12.44, the widest of the three intervals.
  • As before, the width increases with the confidence level, because the margin of error grows with the critical value.
  • Compared with the previous task, the lower and upper bounds have changed because this table describes a single store based on a sample of only 20 employees, so the t distribution is used and the intervals are wider than those for the grand mean based on all 247 employees. A quick cross-check using t.test() is sketched below.
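
    As a cross-check (a sketch that assumes the `Store 1` column contains the same 20 wages summarized above), t.test() produces this kind of interval directly from the raw data; small differences from the table come from rounding Mean1 and SD1:

    #Cross-check: 90% t-based confidence interval computed from the raw Store 1 wages
    t.test(na.omit(StoreSet$`Store 1`), conf.level = 0.90)$conf.int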







    FIRST TASK 1.4

    #Plotting a density graph using table 1 from this report, the object named FinalTable from task 1
    
    #Density Plot
    
    Denseplot = density(FinalTable, adjust = 2)
    plot(Denseplot)
    
    #To plot the values on graph we need to calculate the x from the critical values therefore x = z * sd + mean
    
    # For 92% to the right 
    X1 = CI92 * StoreSD + StoreMean
    
    #For 92% to the left that is negative
    X2 = CI0.4 * StoreSD + StoreMean
    
    abline(v=StoreMean, col = "green")
    abline(v=X1, col = "darkred")
    abline(v=X2, col = "purple")
    
           
    
    text(x = StoreMean,
         paste("Mean :", StoreMean),
         y = 0.03,
         col = "green",
         cex = 0.9,
         srt = 90,
         pos = 2)
    
    text(x = X1,
         paste("CV : 1.750686, Value :", X1),
         y = 0.04,
         col = "darkred",
         cex = 0.6,
         srt = 90,
         pos = 2)
    
    text(x = X2,
         paste("CV : -1.750686, Value :", X2),
         y = 0.04,
         col = "purple",
         cex = 0.56,
         srt = 90,
         pos = 2)

    #A short example of shading the region of the density plot beyond the 92% upper cut-off value, x = 17.49973
    Denseplot2 = density(FinalTable, adjust = 2)
    plot(Denseplot2)
    polygon(c(Denseplot2$x[Denseplot2$x >= X1 ], X1),
            c(Denseplot2$y[Denseplot2$x >= X1 ], 0),
            col = "slateblue1",
            border = 1)
    abline(v=X1, col = "darkred") #Adding vertical line
    text(x = X1,
         paste("CV : 1.750686, Value :", X1),
         y = 0.04,
         col = "darkred",
         cex = 0.6,
         srt = 90,
         pos = 2)

    #Another way of highlighting the critical value for Z using mosaic library
    
    #Z critical value to the right side
    xqnorm(0.96)

    ## [1] 1.750686
    #Z Critical value to the left side
    xqnorm(1-0.96)

    ## [1] -1.750686







    FIRST TASK 1.5

    #Five-number summary (lower whisker, Q1, median, Q3, upper whisker) of hourly wages for each of the 20 store columns, rounded to one decimal place
    
    StoreNames = paste("Store", 1:20)
    
    #Creating a vector named StoreStats that holds the 20 sets of five statistics in store order
    StoreStats = unlist(lapply(StoreNames, function(s) round(boxplot.stats(StoreSet[[s]])$stats, 1)))
    
    #Then creating a matrix to store all stats in one table, filling column-wise so that each column holds one store's five statistics
    StoreMatrix    = matrix(StoreStats, ncol = 20, byrow = FALSE)
    NewValues   = c("min", "Q1", "Q2", "Q3", "Max")
    NewName     = c("1 ","2 ","3 ","4 ", "5", "6", "7", "8", "9", "10", "11","12", "13", "14", "15", "16", "17", "18", "19", "20")
    colnames(StoreMatrix)  = NewName
    rownames(StoreMatrix)  = NewValues
    
    #Creating a dataframe to use with ggplot
    
    StoreStats2 = as.data.frame(StoreMatrix)
    
    
    #Creating multiple box plots using ggplot
    
    ggplot(data = stack(StoreStats2), aes(x = ind, y = values)) +
      
           geom_boxplot(fill = "#06D0E1", colour = "#1F3552", # Colors
                        alpha = 0.9, outlier.colour = "green", width = 0.13) +
      
      stat_summary(geom="text", fun = quantile, #For Quantiles
                   aes(label=sprintf("%1.1f", ..y..), color = factor(ind)), position = position_nudge(x=0.33), size = 2.5) +
      
        stat_summary(geom="pointrange", fun = "mean", #For mean
                   aes(label=sprintf("%1.1f", ..y..), color = "D7E106"), position = position_nudge(x=0.10), size = 0.4)+
      
      scale_y_continuous(breaks = c(5:20))+
      
           ggtitle("Boxplot from data frame ggplot2") + # Plot title
      
           theme(legend.position = "none"
                ) +
      
    labs(x= "Number of Store",
         y = "Employee Wages",
         title = "Boxplot containing employee salary information of 20 different stores",
      )




    FIRST TASK 1.5


    The graph above is a box plot of the hourly wages from 20 different stores; for each store it shows the minimum value, maximum value and quartiles 1, 2 and 3. The box for store 14 is very small, which means its data are tightly clustered around its centre, so estimates for store 14 are comparatively precise and reliable. Store 20, on the other hand, has data that are much more dispersed around its centre, so estimates for store 20 are less precise and reliable. Because the wages in store 14 vary much less than in the other stores, predictions made for store 14 are more dependable than for any other store in our observation.
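
    One simple way to support this reading numerically (a sketch, assuming the store columns are as loaded above) is to compare the spread of wages in each store:

    #Standard deviation of hourly wages per store; smaller values indicate less spread
    round(sapply(StoreSet[ , paste("Store", 1:20)], sd, na.rm = TRUE), 2)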







    SECOND TASK

    # Creating a statistical analysis of population proportion
    
    n = 204
    Yes = 100
    no = n - Yes
    PSucess = Yes/n
    PFail = no/n
    
    
    z90 = qnorm(0.95)
    z92 = qnorm(0.96)
    z96 = qnorm(0.98)
    
    
    M90 = z90*sqrt((PSucess*PFail/n))
    M92 = z92*sqrt((PSucess*PFail/n))
    M96 = z96*sqrt((PSucess*PFail/n))
    
    #Lower Limit
    L90 = PSucess - M90
    L92 = PSucess - M92
    L96 = PSucess - M96
    
    #Upper Limit
    U90 = PSucess + M90
    U92 = PSucess + M92
    U96 = PSucess + M96
    
    #Width
    W90 = 2 * M90
    W92 = 2 * M92
    W96 = 2 * M96
    
    Pet90 = c(z90, M90,L90, U90, W90)
    Pet92 = c(z92, M92,L92, U92, W92)
    Pet96 = c(z96, M96,L96, U96, W96)
    
    
    Pet290   = matrix(Pet90, ncol = 5,  byrow = TRUE)
    Values   = c("90%")
    Name     = c("Z Score ","MOE ","Lower CF ","Upper CF ", "Width ")
    colnames(Pet290)  = Name
    rownames(Pet290)  = Values
    
    
    Pet292   = matrix(Pet92, ncol = 5,  byrow = TRUE)
    Values   = c("92%")
    Name     = c("Z Score ","MOE ","Lower CF ","Upper CF ", "Width ")
    colnames(Pet292)  = Name
    rownames(Pet292)  = Values
    
    Pet296    = matrix(Pet96,ncol = 5,  byrow = TRUE)
    Values   = c("96%")
    Name     = c("Z Score ","MOE ","Lower CF ","Upper CF ", "Width ")
    colnames(Pet296)  = Name
    rownames(Pet296)  = Values
    
    
    FinalTable2 = rbind(Pet290,Pet292, Pet296)
    
    #Creating a formatted table using the knitr::kable() function and then styling it with the kableExtra package.
    knitr::kable(FinalTable2) %>% 
      kableExtra::kable_material_dark()
    Z Score MOE Lower CF Upper CF Width
    90% 1.644854 0.0575703 0.4326258 0.5477664 0.1151406
    92% 1.750686 0.0612745 0.4289216 0.5514706 0.1225490
    96% 2.053749 0.0718818 0.4183143 0.5620778 0.1437635
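
    As a cross-check (a sketch), prop.test() gives a 90% interval for the same proportion; it uses the Wilson score method, so its bounds differ slightly from the normal-approximation interval above:

    #Cross-check: 90% confidence interval for the proportion of people with pets
    prop.test(Yes, n, conf.level = 0.90, correct = FALSE)$conf.int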







    SECOND TASK 2.2

    #Creating objects to create bar plot and pie chart
    
    
    par(mfrow = c(1,2))
    
    YesNo = c(Yes,no)
    
    tableyes = matrix(YesNo, ncol = 2,  byrow = TRUE)
    Values   = c("Values")
    Name     = c("YES", "NO")
    colnames(tableyes)  = Name
    rownames(tableyes)  = Values
    
    #Bar plot
    
    PropBar = barplot(tableyes,
            main = "Bar plot - Yes & No",
            xlab = "",
            ylab = "",
            ylim = c(0,120),
            col = terrain.colors(2),
            )
    text(x = PropBar, 
         y = 110, 
         paste(round(tableyes))
         )
    
    title(ylab = "Frequencies", line = 2.5, cex.lab = 1.0, col.lab = "#0687E1")
    title(xlab = "Response of population", line = 2.6, cex.lab = 1.0, col.lab = "#991A04")
    
    box(which  = "figure", col = "green")
    
    #Pie Chart
    
    pie1 <- paste0(round(100 * YesNo/sum(YesNo), 4), "%") #Create percentage to display on pie chart
    
    pie(YesNo,
        main = "Pie Chart - Yes & No",
        labels = pie1,
        col = terrain.colors(2))
    
    legend ("bottomright", #Legend
           legend=paste(unique(sort(pie1)), c("Number of people having pets", "Number of people not having pets")),
           fill = terrain.colors(2),
           cex = 0.6)
    
    box(which  = "figure", col = "red")


    The bar plot and the pie chart show how many people in the survey have pets and how many do not, based on the yes and no answers to the questionnaire; as you can see, the bar plot plots these counts directly. Slightly more people do not have pets than have them, but the difference is not large.
    The pie chart shows the same information in percentage form: 49.02% of respondents have pets and 50.98% do not.







    CONCLUSION


    This project sheds light on two different Excel data sheets: one is a salary survey and the other records the number of people with and without pets. The tasks in this project focus on analyzing and visualizing the data needed to interpret it and generate an outcome. Starting with the confidence intervals for the salary data of the different stores and ending with the pets section, it has given me an idea of how to gather data, analyze it, interpret it, and provide outcomes and recommendations for further analysis.
    The outcome of this report is based on two data sets:
    Salary survey - in the 21st century, a salary survey is an important tool for every organization; it helps maintain financial stability and productivity in terms of profit and expenditure. In task 1 as a whole, we examined the salary survey data from 20 different stores by obtaining the grand mean and standard deviation, from which we estimated confidence intervals for the hourly wages of employees.
    Pets - the pet data were recorded as yes and no answers, which required Excel functions such as COUNTIF to count the number of individuals who said yes and the number who said no. This task was completed by calculating the sample proportion of yes and no answers and then using the confidence interval formulas to relate this small sample to the whole population.







    BIBLIOGRAPHY



    Bluman, A. (2014). Elementary Statistics: A Step by Step Approach (9th ed.). McGraw Hill.
    Kabacoff, R. I. (2015). R in Action: Data Analysis and Graphics with R. Simon and Schuster.




    APPENDIX



    An R Markdown file has been attached to this report. The name of the file is M2Project_Tiwari.rmd