Phase 1: Clean up data from the Franklin College Division of Natural Science Facebook Page.
Phase 2: Explore data and find the most interesting relationships.
Phase 3: Use simulation-based statistical inference to dive deeper into two of the most interesting relationships found during Phase 2.
1.) Do posts that are posted in the morning get more positive activity than those posted at night?
The means of the two subsets of posts differed greatly because the distribution of activity is positively skewed, while the medians were relatively close. Further investigation of the outlier posts in the night subset would help determine whether there is truly a difference in activity between morning and night posts.
A morning post was defined as one with a posting time from 3am to 11am; posts made during the remaining hours of the day were classified as night. We want to see whether there is a relationship between time of day and Activity. The Activity variable already represents positive activity: the sum of all likes, shares, comments, and other clicks. Hide clicks are left out because they are considered negative activity. A difference in means will be used to determine whether the time of day a post was made has an effect on the average amount of positive activity the post receives.
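The definitions above can be sketched in a few lines of Python. This is only an illustration, not the code used for this report: the `Post` fields and function names are hypothetical, and treating 11am as an exclusive upper bound (so that 3:00 through 10:59 counts as morning) is an assumption, since the boundary is not spelled out.

```python
from dataclasses import dataclass

MORNING_START_HOUR = 3   # 3am, from the definition in the text
MORNING_END_HOUR = 11    # 11am; treated as exclusive here (an assumption)


@dataclass
class Post:
    """Hypothetical record for one Facebook post; field names are illustrative."""
    hour: int          # hour of day the post was made, 0-23
    likes: int
    shares: int
    comments: int
    other_clicks: int
    hide_clicks: int   # negative activity, excluded from the sum


def time_of_day(hour: int) -> str:
    """Label a posting hour as 'morning' or 'night'."""
    if MORNING_START_HOUR <= hour < MORNING_END_HOUR:
        return "morning"
    return "night"


def positive_activity(post: Post) -> int:
    """Sum of likes, shares, comments, and other clicks; hide clicks are left out."""
    return post.likes + post.shares + post.comments + post.other_clicks
```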
## [1] 0.2598
First, set the significance level \(\alpha\) to 0.05, against which the p-value will be compared.
\(H_o: \mu_{m} = \mu_{n}\)
\(H_a: \mu_{m} \neq \mu_{n}\)
The mean activity for posts at night is higher than both the mean and the median for posts in the morning. The night mean is much higher because the distribution of activity for night posts is skewed right, so the outliers raise the mean. The test statistic, against which the simulated differences will be compared, is the observed difference between the two sample means. The sample means were found to be 60.2222222 for the morning and 147.3809524 for the night.
\(\bar x_{m} = 60.2222\); \(\bar x_{n} = 147.381\)
Compute test statistic.
\(\bar x_{diff} = \bar x_{m} - \bar x_{n} = -87.1587\)
A randomization simulation will be used to compute the p-value. The test is two-tailed, so both tails of the randomization distribution must be counted toward the p-value.
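One way such a two-tailed randomization test could be carried out is sketched below in Python. This is a minimal sketch under stated assumptions, not the code behind the reported result: the function name, the number of simulations (`n_sims`), and the seed are all hypothetical choices. Group labels are shuffled by pooling the two samples, reshuffling, and splitting again at the original group size; a simulated difference counts as extreme when its absolute value is at least as large as the observed one.

```python
import random
from statistics import mean


def randomization_test_means(group_a, group_b, n_sims=10_000, seed=0):
    """Two-sided randomization test for a difference in means.

    Returns (observed_difference, p_value), where the difference is
    mean(group_a) - mean(group_b). n_sims and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_sims):
        # Re-assign observations to the two groups at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Two-tailed: count differences at least as far from 0 in either direction.
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_sims
```

Calling `randomization_test_means(morning_activity, night_activity)` on two lists of activity counts (hypothetical variable names) would return the observed difference and an approximate p-value; the exact p-value varies slightly with the seed and the number of simulations.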
P-value:
The p-value was calculated to be 0.2598, which is greater than alpha. Thus we fail to reject the null hypothesis: there is not enough evidence to conclude that the time of day has an effect on the amount of activity a post gets.
2.) Is there a relationship between the number of shares and positive activity?
The xyplot of shares and activity showed signs of a linear relationship, which motivated further measurements (such as \(R^2\)). The relationship between number of shares and positive activity is one between two quantitative variables.
##
## Call:
## lm(formula = y_values ~ x_values)
##
## Coefficients:
## (Intercept) x_values
## 15.53 46.63
## [1] 0.6854408
## [1] 0.0017
To determine whether there is evidence of a positive linear relationship between the number of shares and the positive activity of posts, we will perform a correlation test. First, set the significance level \(\alpha\) to 0.05, against which the p-value will be compared. The null and alternative hypotheses are stated below, where \(\rho\) represents the correlation coefficient.
\(H_o: \rho = 0\)
\(H_a: \rho > 0\)
Compute the test statistic R. The correlation coefficient was computed to be 0.6854408.
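As a side note, for a simple linear regression with one predictor, the coefficient of determination equals the square of the correlation coefficient. Taking the reported value of 0.6854408 as the correlation coefficient, this would give:
\(R^2 = (0.6854408)^2 \approx 0.4698\)
Under that reading, roughly 47% of the variability in positive activity would be accounted for by the linear relationship with shares.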
A randomization simulation will be used to compute the p-value. The test is one-sided, so only the upper tail of the randomization distribution counts toward the p-value.
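A one-sided randomization test for correlation could look roughly like the Python sketch below. It is an illustration only, not the code that produced the reported p-value: the function names, `n_sims`, and the seed are hypothetical. The idea is to shuffle the activity values relative to the share counts, which breaks any real association, and to count how often the shuffled correlation is at least as large as the observed one.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def randomization_test_correlation(xs, ys, n_sims=10_000, seed=0):
    """One-sided (upper-tail) randomization test of H0: rho = 0 vs Ha: rho > 0.

    Returns (observed_r, p_value). n_sims and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)

    extreme = 0
    for _ in range(n_sims):
        # Shuffling y relative to x simulates the null hypothesis of no association.
        rng.shuffle(shuffled)
        # Upper tail only: count correlations at least as large as the observed one.
        if pearson_r(xs, shuffled) >= observed:
            extreme += 1
    return observed, extreme / n_sims
```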
P-value:
The p-value was calculated to be 0.0017, which is less than alpha. Thus we reject the null hypothesis: there is statistically significant evidence of a positive linear correlation between the number of shares a post gets and its positive activity.
Regression line: \(\hat{y} = 15.53 + 46.63x\)
This equation predicts positive activity, \(\hat{y}\), from the number of shares, \(x\).
The y-intercept of 15.53 tells us that posts that receive 0 shares are predicted to receive 15.53 positive clicks. The slope tells us that for every additional share, the predicted number of positive clicks increases by 46.63.
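As a worked illustration, consider a hypothetical post with 2 shares (the value 2 is chosen only as an example and is not taken from the data). Plugging into the fitted line gives:
\(\hat{y} = 15.53 + 46.63(2) = 108.79\)
So such a post would be predicted to receive about 109 positive clicks.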
Idea: Other Franklin College pages should share each other's posts. This could also encourage the followers of those other pages to follow the Science page, which could lead to more activity.