An Analysis of a Server’s Internet Usage

Using the WWWusage dataset

The following analysis uses the WWWusage dataset, a time series covering 100 minutes that records how many users were connected to the Internet through a server during each minute. The source of the data is J. Durbin and S.J. Koopman’s book Time Series Analysis by State Space Methods.

Data Analysis

Because this is a fairly straightforward dataset, let’s start with some simple analysis. The number below is the mean of the data, that is, the average number of users on this server per minute.

mean(WWWusage)
## [1] 137.08

The output below shows the smallest number of users who were on the server in any one minute.

min(WWWusage)
## [1] 83

Next, the output shows the largest number of users who were on the server in any one minute.

max(WWWusage)
## [1] 228

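Finally, the output below shows the sum of the per-minute counts over the whole 100 minutes. Because a user who stays connected for several minutes is counted once in each of those minutes, this total is best read as user-minutes rather than as a count of distinct people.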

sum(WWWusage)
## [1] 13708

To look at the amount of spread within the data, we can compute the standard deviation of WWWusage, shown below. This gives a better idea of how spread out the data is, and the next section represents that spread visually.

sd(WWWusage)
## [1] 39.99941

For roughly bell-shaped data, about 68% of the values fall within one standard deviation of the mean. In this dataset, 65% of the data points fall within 39.999 of our mean of 137.08, that is, between about 97 and 177 users. This matches up with the visuals in the next section.
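The share of data points within one standard deviation can be checked directly rather than inferred from the standard deviation alone. A minimal sketch, where `within_one_sd` is a name introduced here for illustration:

```r
# Flag each minute whose count lies within one standard deviation of the mean
within_one_sd <- abs(WWWusage - mean(WWWusage)) <= sd(WWWusage)

# The mean of a logical vector is the proportion of TRUE values
mean(within_one_sd)
## [1] 0.65
```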

Visuals

An easier way to interpret this data is through visuals. Several graphics can be created to help understand it. For the first two graphs below, WWWusage was divided into four time segments of 25 minutes each, and the graphs display the summed per-minute user counts for each 25-minute period. Increments of 25 minutes were chosen so that the 100 minutes break into even fourths for easier analysis.
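The four totals used in the bar graph and pie chart can be derived from the data instead of being typed in by hand. A minimal sketch, assuming the blocks are minutes 1 to 25, 26 to 50, 51 to 75, and 76 to 100; `block` and `block_totals` are names introduced here for illustration:

```r
# Assign each of the 100 minutes to one of four consecutive 25-minute blocks
block <- rep(1:4, each = 25)

# Sum the per-minute counts within each block
block_totals <- tapply(as.numeric(WWWusage), block, sum)
block_totals
##    1    2    3    4 
## 2823 3707 2967 4211
```

Computing the totals this way keeps the graphs in step with the data if a different block length is chosen later.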

Bar Graph
Users<-c(2823,3707,2967,4211)
barplot(Users,ylim=c(0,4500), xlim = c(0,4.5), main="Internet Users over 100 minutes \n in 25 Minute Increments", ylab="Number of Internet Users",names.arg = c("1 to 25 Minutes","26 to 50 Minutes","51 to 75 Minutes","76 to 100 Minutes"),col="lightblue")
text(.7,3000,"2823")
text(1.9,3850,"3707")
text(3.1,3150,"2967")
text(4.3,4400,"4211")

As you can see in the graph above, the busiest time for the internet server was the last 25-minute increment, with a total count of 4211. The slowest time was the first 25 minutes, with a total count of only 2823. Having the graph in light blue is just for fun.
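The value labels on the bars were positioned with hand-picked coordinates. One alternative, sketched here with shortened axis labels, is to reuse the bar midpoints that `barplot()` returns; `bar_mids` is a name introduced for illustration:

```r
Users <- c(2823, 3707, 2967, 4211)

# barplot() invisibly returns the x-coordinate of each bar's midpoint
bar_mids <- barplot(Users, ylim = c(0, 4500), col = "lightblue",
                    names.arg = c("1-25", "26-50", "51-75", "76-100"),
                    xlab = "Minutes", ylab = "Number of Internet Users")

# Place each total just above its bar (pos = 3 means "above")
text(bar_mids, Users, labels = Users, pos = 3)
```

With this approach the labels stay centered over their bars even if the number of bars or their widths change.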

Pie Chart

Let’s look at these 25-minute increments in a different way. A pie chart shows what percentage of the total each increment accounts for, which is helpful for seeing how a specific increment compares to the whole. A bar graph is good for visualizing comparisons between increments, but it makes it harder to see how an increment compares to the dataset as a whole.

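# Labels for the four 25-minute increments
lbls <- c("1 to 25 Minutes", "26 to 50 Minutes", "51 to 75 Minutes", "76 to 100 Minutes")
# Each increment's share of the total, rounded to a whole percent
pct <- round(Users / sum(Users) * 100)
# Append the percentage to each label, e.g. "1 to 25 Minutes 21%"
lbls <- paste0(lbls, " ", pct, "%")
pie(Users, labels = lbls, main = "Internet Users over 100 minutes \n in 25 Minute Increments")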

As you can see, the last 25 minutes account for 31% of the total. The second 25 minutes account for 27%, followed by the third increment with 22% and, finally, the first 25-minute increment with 21%.
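The four percentages quoted above add up to 101 rather than 100, which is an effect of rounding each share to a whole number. A short sketch of the arithmetic, with `shares` as an illustrative name:

```r
Users <- c(2823, 3707, 2967, 4211)

# Unrounded shares of the total, in percent
shares <- Users / sum(Users) * 100

# One decimal place keeps the labels closer to the true shares
round(shares, 1)
## [1] 20.6 27.0 21.6 30.7

# Whole-number rounding, as in the pie chart labels, sums to 101 here
sum(round(shares))
## [1] 101
```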

Line Chart

We can get a better picture of the traffic at the server by looking at a line chart of the data.

plot(WWWusage,type="o",col="red",xlab="Minute",ylab="Number of Visitors",main="Number of Visitors to the Server \n by Each Minute")

As you can see, the data fluctuates in the same way as the bar graph of 25-minute increments. However, we get a much more exact picture by looking at it minute by minute. You can see each of the busiest times and how they line up with the larger increments. The peak of the traffic came in the last 20 or so minutes, which is consistent with the last increment being the largest in the bar graph above.
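To make the link between the line chart and the 25-minute increments easier to see, the block boundaries can be drawn on the plot and the peak minute located directly. This is only a sketch; `peak_minute` is a name introduced here for illustration:

```r
plot(WWWusage, type = "o", col = "red",
     xlab = "Minute", ylab = "Number of Visitors")

# Dashed lines between minutes 25|26, 50|51 and 75|76 mark the block edges
abline(v = c(25.5, 50.5, 75.5), lty = 2)

# Minute with the highest count
peak_minute <- which.max(WWWusage)
peak_minute
## [1] 97
```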

Histogram

Perhaps we would like to look at just how many busy minutes there were. The histogram below shows how many minutes fell into each range of visitor counts.

hist(WWWusage, breaks=4,labels = TRUE,col="violet",ylim=c(0,50))

As you can see above, there were 27 separate minutes with 50 to 100 visitors, 41 minutes with 100 to 150 visitors, 23 minutes with 150 to 200 visitors, and only 9 minutes with over 200 visitors. By default each bin includes its upper bound, so a minute with exactly 150 visitors is counted in the 100 to 150 bin. As the line chart shows, all 9 of these high-traffic minutes were in the last 25-minute increment, which is why that increment has such a large share of the pie chart.
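The bin counts read off the histogram, and the claim about where the busiest minutes fall, can both be confirmed without drawing anything. A minimal sketch, with `h` and `busy_minutes` as illustrative names:

```r
# plot = FALSE returns the histogram object without drawing it
h <- hist(WWWusage, breaks = 4, plot = FALSE)
h$breaks   # bin edges chosen by hist()
## [1]  50 100 150 200 250
h$counts   # number of minutes in each bin
## [1] 27 41 23  9

# Minutes in which the count exceeded 200
busy_minutes <- which(WWWusage > 200)
range(busy_minutes)
## [1]  92 100
```

Note that `breaks = 4` is treated by `hist()` as a suggestion, so the edges it actually picks are worth checking when the exact bins matter.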

Conclusion

A dataset can and should be analyzed in many different ways. In the first section, the raw numbers are summarized to give information about the dataset: the mean, maximum, minimum, total, and standard deviation are all useful. However, the visuals in the second section are needed to communicate that information more clearly. As stated above, each visual is useful in a different context and for different information. Although these graphs look simple, the R code needed to make them useful gets fairly long. These are just some of the ways to analyze the simple dataset WWWusage.