R_Final

Hello! As a new OpenIntro employee, I was assigned the task of reorganizing the company’s website data from our parent company. Our company is deciding whether the openintro website generates enough traffic and ROI to keep the site online. The web domain costs 25k annually to retain, plus an extra 3k for upkeep.

To gauge site activity, our company looks for high consumer interaction. We defined this as populated comment sections, overall views over 50,000, and a high average number of likes.

My first task was renaming our data frame columns, as the file had the wrong column names for sector, views, likes, and comments.

setwd("C:/Users/walki/Documents/")
d<-read.csv("datasets.csv")
colnames(d)<-c('Sector','WebURL','Title','Views','Likes','Comments')
print(d[1,])

##   Sector  WebURL                            Title Views Likes Comments NA NA NA
## 1    AER Affairs Fair's Extramarital Affairs Data   601     9        2  0  2  0
##   NA                                                                 NA
## 1  7 https://vincentarelbundock.github.io/Rdatasets/csv/AER/Affairs.csv
##                                                                    NA
## 1 https://vincentarelbundock.github.io/Rdatasets/doc/AER/Affairs.html
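The `NA` headers in the printout appear because the file has more columns than the six names supplied. A minimal sketch of one way to avoid that, assuming only the first six columns need new names; the toy data frame `demo` and its values are made up for illustration and are not the real file:

```r
# Toy data frame with eight columns (hypothetical stand-in for datasets.csv)
demo <- data.frame(a = "AER", b = "Affairs", c = "Some title",
                   d = 601, e = 9, f = 2, g = 0, h = 7,
                   stringsAsFactors = FALSE)

new_names <- c("Sector", "WebURL", "Title", "Views", "Likes", "Comments")

# Rename only the first six columns; the remaining names are left untouched
names(demo)[seq_along(new_names)] <- new_names

print(names(demo))
```

With this approach the extra columns keep their original names instead of becoming `NA`.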

Now I will extract my sector’s information from the master CSV. My boss noted that we do not need the trailing columns in our copy, so I keep only the first six.

openIntro<-subset(d,Sector=='openintro')
openIntro<-openIntro[,1:6]
write.csv(openIntro,file="openIntro_data.csv",row.names = FALSE)
print(openIntro[1,1:6])

##         Sector      WebURL                                      Title Views
## 1062 openintro absenteeism Absenteeism from school in New South Wales   146
##      Likes Comments
## 1062     5        3

For our analytics team, we need to calculate our KPI and insert it in the data frame. Our KPI measures engagement, computed here as likes per view.

KPI<-openIntro$Likes/openIntro$Views
openIntro$KPI<-KPI
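As a small worked example of the likes-per-view calculation, here is a sketch on a toy data frame. The first row reuses the likes and views printed earlier for row 1062; the other rows and the zero-view guard are assumptions added for illustration, not part of the original analysis:

```r
# Toy rows: the first matches row 1062 above, the rest are hypothetical
posts <- data.frame(Likes = c(5, 3, 0), Views = c(146, 0, 40))

# Likes per view; rows with zero views get NA instead of Inf or NaN
posts$KPI <- ifelse(posts$Views > 0, posts$Likes / posts$Views, NA_real_)

print(posts)
```

The guard is not needed for the openintro subset if every post has at least one view, but it keeps a stray zero from distorting later summaries.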

Now we can search by KPI in our data frame. Let’s check the new column and see which journal entry had the largest engagement.

Top.Post<-openIntro$Title[which(openIntro$KPI==max(openIntro$KPI))]
print(openIntro[1,])

##         Sector      WebURL                                      Title Views
## 1062 openintro absenteeism Absenteeism from school in New South Wales   146
##      Likes Comments        KPI
## 1062     5        3 0.03424658

print(Top.Post)

## [1] "Findings on n-3 Fatty Acid Supplement Health Benefits"
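The lookup above compares every KPI against the maximum, so it returns all tied posts. A hedged sketch of the difference between that and `which.max()`, using a made-up data frame `toy` rather than the real data:

```r
# Hypothetical posts; two of them tie for the highest KPI
toy <- data.frame(Title = c("Post A", "Post B", "Post C"),
                  KPI   = c(0.02, 0.30, 0.30),
                  stringsAsFactors = FALSE)

# which.max() returns the index of the first maximum only
top_first <- toy$Title[which.max(toy$KPI)]

# Comparing against max() keeps every tied row
top_all <- toy$Title[toy$KPI == max(toy$KPI)]

print(top_first)
print(top_all)
```

Which form is preferable depends on whether ties should be reported or a single top post is enough.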

The data frame is now corrected. Let’s print the summary for openintro’s web journals.

summary(openIntro)

##     Sector             WebURL             Title               Views          
##  Length:206         Length:206         Length:206         Min.   :      2.0  
##  Class :character   Class :character   Class :character   1st Qu.:     59.5  
##  Mode  :character   Mode  :character   Mode  :character   Median :    198.0  
##                                                           Mean   :  10182.9  
##                                                           3rd Qu.:   1000.0  
##                                                           Max.   :1414593.0  
##      Likes            Comments           KPI           
##  Min.   :  1.000   Min.   : 0.000   Min.   : 0.000004  
##  1st Qu.:  2.000   1st Qu.: 0.000   1st Qu.: 0.003727  
##  Median :  3.000   Median : 0.500   Median : 0.016097  
##  Mean   :  7.248   Mean   : 1.437   Mean   : 0.187771  
##  3rd Qu.:  7.750   3rd Qu.: 2.000   3rd Qu.: 0.073642  
##  Max.   :123.000   Max.   :46.000   Max.   :24.000000

From our summary, the overall mean is about 10,183 views. That mean is pulled up by a few very popular posts: the maximum was 1,414,593 views, while the median is only 198. So the site can reach a large audience, but rarely does. For clarity, let’s concentrate our search on posts with up to 10,900 views, close to that mean.
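To illustrate how a single very popular post can separate the mean from the median, here is a minimal sketch. The `views` values are invented for the example and are not taken from the data set:

```r
# Hypothetical view counts with one extreme outlier
views <- c(50, 120, 200, 300, 1000, 1400000)

mean_views   <- mean(views)    # dominated by the outlier
median_views <- median(views)  # robust to the outlier

print(c(mean = mean_views, median = median_views))
```

When the two statistics differ this much, the median is usually the better description of a typical post.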

library(ggplot2)
ggplot(openIntro, aes(x=Views, y=Comments,color=Comments>0)) + geom_point()+xlim(0,10900)+ylim(0,46)

## Warning: Removed 11 rows containing missing values (geom_point).

From this scatter plot, we can see that about half of openintro’s web journals received no comments, which agrees with the median of 0.5 comments in the summary. There seems to be a slight trend of higher views getting more comments; however, it is too small to show potential. To confirm, I made a test data frame grouping comments to see more detail. I originally planned to group by tens, but found that this was not workable with box plots, so the groups below are coarser.

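temp<-openIntro
temp$G<-NA
# Group label 5: posts with 0 to 4 comments
temp$G[temp$Comments<5]<-5
# Group label 10: posts with 5 to 10 comments
temp$G[temp$Comments>4]<-10
# Group label 50: posts with more than 10 comments (overrides label 10)
temp$G[temp$Comments>10]<-50
ggplot(temp, aes(x = G, y = Comments,group=G,fill=G)) + geom_boxplot()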

In this box plot, posts are heavily concentrated in the 0 to 4 and 5 to 10 comment groups (labels 5 and 10) compared to the group above 10 comments (label 50). There is little variety in comment counts, and posts rarely receive more than 10 comments.
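The same three groups can also be built in one step with base R’s `cut()`. This is only a sketch on an invented `comments` vector; the labels are my own choice and not part of the original analysis:

```r
# Hypothetical comment counts covering all three groups
comments <- c(0, 2, 4, 5, 10, 11, 46)

# Right-closed intervals: (-Inf, 4], (4, 10], (10, Inf)
group <- cut(comments,
             breaks = c(-Inf, 4, 10, Inf),
             labels = c("0-4", "5-10", "11+"))

print(table(group))
```

Because `cut()` returns a factor, the groups plot as discrete categories with readable labels instead of the numeric codes 5, 10, and 50.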

For likes, the website pulls at most 123 likes on a post, the maximum in the summary. It appears that the typical post stays below 25 likes. Likes are essential for our KPI, so these statistics are troubling.

ggplot(openIntro, aes(x=Views,y=Likes,color=Comments>0)) + geom_point()+xlim(0,50000)

## Warning: Removed 4 rows containing missing values (geom_point).

This raises a concern about views, as our comments are not in their target range. To compare against our view target, we have to see how frequently posts reached the targeted level.

temp<-subset(openIntro,Views<=100000)
ggplot(temp, aes(x=Views))+geom_histogram(color="red",fill='pink')

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(temp, aes(x=Views))+geom_histogram(color="red",fill='pink',binwidth = 1000)+geom_vline(aes(xintercept=mean(Views)),color="red", linetype="dashed", size=1)

Unfortunately, views on our domain do not meet the targeted average of 50k. Among posts with at most 100,000 views, the average is below 5,000.
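One way to put a number on how often the view target is reached is the share of posts above it. A minimal sketch, assuming the 50,000-view target stated earlier; the `views` values are hypothetical and the variable names are mine:

```r
# Hypothetical view counts for five posts
views  <- c(146, 601, 1000, 52000, 75000)
target <- 50000

# Count and share of posts with views over the target
n_on_target     <- sum(views > target)
share_on_target <- mean(views > target)

print(n_on_target)
print(share_on_target)
```

Running the same two lines on `openIntro$Views` would give the real share for the site.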

Once the few posts above 100,000 views are set aside, the average post generates fewer than 5k views, and our KPI rates are low. The parent company needs its target range met at minimum to cover operational costs. One suggestion is revamping the content on this website, but that decision is up to the analytics department after further analysis. In conclusion, the openintro website is not generating enough traffic to justify the cost to our parent company.

Vyanna Hill

1/12/2022