WWWusage

WWWusage (see ?WWWusage) is a 2001 time series of the number of users on a server, recorded every minute for 100 minutes. The data set has 100 observations, one per minute, each giving the number of users at that minute. Over these 100 minutes the average number of users on the server is about 137, with a minimum of 83 users (marked in blue) and a maximum of 228 users (marked in red). The line plot of the number of users is shown below.
plot(WWWusage, main="Server Usage Every Minute for 100 Minutes", xlab = "Time (minutes)", ylab = "# of Users")
points(which.min(WWWusage), min(WWWusage), col="blue")
points(which.max(WWWusage), max(WWWusage), col="red")
Using the sample() command, five random samples of size 20 and five random samples of size 50 are taken. So that the data is easier to work with, each sample is stored as its own vector.
s1<- sample(WWWusage, size = 20)
s2<- sample(WWWusage, size = 20)
s3<- sample(WWWusage, size = 20)
s4<- sample(WWWusage, size = 20)
s5<- sample(WWWusage, size = 20)
v1<- sample(WWWusage, size = 50)
v2<- sample(WWWusage, size = 50)
v3<- sample(WWWusage, size = 50)
v4<- sample(WWWusage, size = 50)
v5<- sample(WWWusage, size = 50)
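Because sample() draws at random, the values reported in this document change on every run. As a minimal sketch, assuming a fixed seed is acceptable, the same ten samples could be drawn reproducibly and without repeating the call ten times. The seed value and the names samples20 and samples50 are illustrative and not part of the original analysis.

```r
# Fix the random number generator so the samples are the same on every run.
# The seed value 42 is arbitrary.
set.seed(42)

# replicate() repeats the sampling call; simplify = FALSE keeps the result
# as a list of five numeric vectors rather than a matrix.
samples20 <- replicate(5, sample(WWWusage, size = 20), simplify = FALSE)
samples50 <- replicate(5, sample(WWWusage, size = 50), simplify = FALSE)

# Each list element plays the role of s1..s5 (or v1..v5) above.
lengths(samples20)
lengths(samples50)
```

An individual sample is then available as, for example, samples20[[1]].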
Using the mean() command, the means of the five samples of size 20 are collected into a vector X with five entries. Similarly, the means of the five samples of size 50 are collected into a vector Y with five entries.
X<-(c(mean(s1),
mean(s2),
mean(s3),
mean(s4),
mean(s5)))
Y<-(c(mean(v1),
mean(v2),
mean(v3),
mean(v4),
mean(v5)))
X
## [1] 140.70 138.70 137.45 138.65 144.75
Y
## [1] 136.34 134.38 136.30 138.42 142.42
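The same five-entry vectors of means can also be computed in one step when the samples are held as the columns of a matrix. The following is only a sketch under that assumption: the names m20, m50, X20, and Y50 are hypothetical, and because the samples are redrawn here the numbers will differ from those printed above.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch repeatable

# With the default simplify = TRUE, replicate() returns a matrix with
# one sample per column: 20 x 5 and 50 x 5 respectively.
m20 <- replicate(5, sample(WWWusage, size = 20))
m50 <- replicate(5, sample(WWWusage, size = 50))

# colMeans() gives one mean per sample, matching the roles of X and Y.
X20 <- colMeans(m20)
Y50 <- colMeans(m50)

# Signed distance of each sample mean from the mean of the full series.
round(X20 - mean(WWWusage), 2)
round(Y50 - mean(WWWusage), 2)
```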
Before plotting, a vector z containing the integers 1 to 5 is created to serve as the x-values on the plot, and a vector L is created so that the average of WWWusage can be drawn as a line for comparison with the average of each sample.
z<-c(1:5)
L<-c(137.1, 137.1, 137.1, 137.1, 137.1)
Using the previously described vectors with plot(), points(), and lines(), a “Samples Vs. WWWusage Mean” plot is created.
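# ylim spans both sets of sample means and the reference line, so nothing is clipped
plot(z,X, ylim=range(X,Y,L), main="Samples Vs. WWWusage Mean", xlab="Sample #", ylab="# of Users-20/sample(bk), 50/sample(bl), Mean(red)")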
points(z,Y,col="blue")
lines(z,L,col="red")
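As an alternative to typing the reference value into a vector by hand, the line can be computed from the data and drawn with abline(), and a legend can carry the color key instead of the y-axis label. This is a sketch of that variant, not the plot used in this report: the samples are redrawn under an arbitrary seed, and the names X_new, Y_new, and overall are illustrative.

```r
set.seed(42)  # arbitrary seed for repeatability

# Five sample means for each sample size, drawn afresh for this sketch.
X_new <- replicate(5, mean(sample(WWWusage, size = 20)))
Y_new <- replicate(5, mean(sample(WWWusage, size = 50)))

# Mean of the full series, computed rather than hard-coded.
overall <- mean(WWWusage)

plot(1:5, X_new,
     ylim = range(X_new, Y_new, overall),  # keep every point and the line in view
     main = "Samples Vs. WWWusage Mean",
     xlab = "Sample #", ylab = "Mean # of Users")
points(1:5, Y_new, col = "blue")
abline(h = overall, col = "red")  # horizontal line at the overall mean

legend("topright",
       legend = c("n = 20", "n = 50", "WWWusage mean"),
       col = c("black", "blue", "red"),
       pch = c(1, 1, NA), lty = c(NA, NA, 1))
```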
From this plot, the sample means vary noticeably from one sample to the next, which suggests that the values in WWWusage are widely spread (the range runs from 83 to 228). The points representing the samples of size 50 are, on average, closer to the mean of WWWusage than those representing the samples of size 20. This is to be expected, since a random sample tends to be more indicative of the entire data set as the sample size increases. Given this spread, it is natural to look at the standard deviation.
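Five samples per size are too few to show the effect of sample size reliably. One way to check the expectation is a small simulation that repeats the sampling many times and compares how much the sample means vary for each size. The sketch below assumes 1000 repetitions and an arbitrary seed; the names means20 and means50 are hypothetical.

```r
set.seed(1)  # arbitrary seed for repeatability

# Draw 1000 samples of each size and record the mean of each sample.
means20 <- replicate(1000, mean(sample(WWWusage, size = 20)))
means50 <- replicate(1000, mean(sample(WWWusage, size = 50)))

# Spread of the sample means around the overall mean; a smaller value
# indicates that samples of that size track the overall mean more closely.
sd(means20)
sd(means50)
```

If the spread for size 50 comes out smaller than for size 20, that supports the reading of the plot above.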
As depicted above, WWWusage has a large range, so we look at the standard deviation of the same samples using the sd() command and plot these results against the standard deviation of WWWusage.
Similar to the previous vectors of entries representing means, the sd() command is used to create vector A with entries for the standard deviations of the samples of size 20. Vector B is created with entries for the standard deviations of the samples of size 50.
A<-(c(sd(s1),
sd(s2),
sd(s3),
sd(s4),
sd(s5)))
B<-(c(sd(v1),
sd(v2),
sd(v3),
sd(v4),
sd(v5)))
A
## [1] 40.49314 44.37531 45.88943 36.94701 40.36136
B
## [1] 37.83563 35.17646 42.69314 41.39363 39.43379
As before, a line of comparison is drawn on the plot, this time for the standard deviation of WWWusage rather than the mean. For this purpose, vector S is created.
S<-c(39.999, 39.999, 39.999, 39.999, 39.999)
Using the previously described vectors with plot(), points(), and lines(), a “Samples Vs. WWWusage Standard Deviation” plot is created.
# ylim spans both sets of sample standard deviations and the reference line
plot(z,A, ylim=range(A,B,S), main="Samples Vs. WWWusage Standard Deviation", xlab="Sample #", ylab="# of Users-20/sample(bk), 50/sample(bl), SD(red)")
points(z,B,col="blue")
lines(z,S,col="red")
This plot depicts more consistency between samples than was seen in the plot of the mean values. The consistency in the standard deviation of the samples tells us that there are not just a few minutes with a high number of server visitors and a few minutes with a low number. Rather, the plot tells us that although the number of visitors is not consistent, the variation in the number of visitors on the server is fairly consistent.
In practical terms, data such as this may be used to improve servers by understanding how often a server is visited. We see from this data that it may be difficult to predict these numbers, since the range of the data is large, although the standard deviation is fairly consistent. A larger data set with actual times of day (rather than 100 minutes of data that could have been taken at any time) may be beneficial if similar data is used to answer real-life questions. A larger data set may not solve the issue of the large standard deviation, but with more data, patterns at a more macroscopic level could be found between different times of the day, week, month, or even year.