Hello, my name is Jordan La Mar, and today I’ll be doing some web-scraping comparisons on the breakout horror game “Outlast”. If you’ve followed my previous articles, you’ll be familiar with my most recent one, “Scary Games”, in which I conducted a sentiment analysis on tweets containing, relating to, or mentioning Outlast and compared them directly with the similar horror series “Five Nights at Freddy’s.” Today I’ll be doing further analysis on Outlast with similar visuals, but instead of comparing it to another game, I’ll compare it to itself, using data scraped from a different site: the popular game platform Steam. For those unfamiliar with Outlast and its history, I’ll include a brief paragraph below, and then we can get on with the data dictionaries and the visualizations.
Outlast is an extremely theatrical, psychological, and unnerving horror game made by the company Red Barrels. The game was first released on PC in 2013 and on all other consoles by mid-2014. It is the first installment of a three-game series, soon to be a quartet by the end of this year, I believe. Ever since I discovered YouTube as a preteen, along with the walkthrough gamers who ruled YouTube at the time, I have been in love with the horror gaming genre, and specifically with this game for its critical scares, chase gameplay, and gripping story. The game itself is incredibly controversial, even judging only by the pictures displayed on its Steam page. With its fourth installment coming this year, I wanted to research the thoughts, feelings, and potential joy the game still brings gamers nearly a decade after its initial release.
| DD 1 | Data Dictionary of scraped Tweets: outlast_counts |
|---|---|
| Word | The specific word(s) used within the scraped tweets |
| Created_at | The date the overall tweet was published |
| Favorite_Count | Number of favorites the individual tweet received |
| Location | Location of tweet when user published |
| Display_text | Character width of the entire tweet |
| n | The count of word(s) scraped from tweet |
| Sentiment | The positive or negative description of specific word |

| DD 2 | Data Dictionary of scraped Reviews from Steam: outlast_steam |
|---|---|
| Word | The specific word(s) used within the scraped reviews |
| Username | The ‘title’ of the account user who left review |
| Comment_date | The date the review was published |
| User_time | The amount of time user has spent (on record/Steam) playing Outlast |
| Product_user | The numerical amount of games the specific user owns or has stored in their steam library |
| Helpful_count | Similar to tweet ‘favorites’, the count of how many other users found the individual review ‘helpful’ |
| n | The count of word(s) scraped from review |
| Sentiment | The positive or negative description of specific word |
For the first visualization, I wanted to compare the overall sentiment results from the two website scrapes. I used the Bing sentiment lexicon for the Twitter data, but the NRC lexicon for the Steam data, as it produced a much larger result.
The first two graphs display the positive/negative sentiment analysis from the Twitter scrape. The “Simplified” graph shows less ‘mass’ of information, although I felt the “Full” graph was still worth including. The simplified graph displays all word sentiments with a count above 1, ending with the words ‘trophy’, ‘nightmare’, and ‘hurt’. I believe ‘trophy’ appears because of the trophy award system for accomplishing certain actions within the games. Moving on to the Steam NRC sentiment, the results that stood out and that I thought were worth commenting on were the increased sentiment behind “horror”, “hope” being attached to anticipation, and simply the appearance of “masterpiece”. I was genuinely happy to see “masterpiece” attached to a game that I agree deserves that adjective.
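For readers curious how word-level sentiment tagging of this kind works, here is a minimal Python sketch. It is not the R code behind the charts above: the tiny lexicon, its labels, and the sample texts are all made up for illustration, and the real Bing and NRC lexicons are far larger.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for Bing/NRC (labels are illustrative only).
TOY_LEXICON = {
    "nightmare": "negative",
    "hurt": "negative",
    "trophy": "positive",
    "masterpiece": "positive",
}


def sentiment_counts(texts, lexicon):
    """Count how often each lexicon word appears; keys are (word, sentiment)."""
    counts = Counter()
    for text in texts:
        # Lowercase and split into simple word tokens.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in lexicon:
                counts[(word, lexicon[word])] += 1
    return counts


# Made-up sample texts, not real tweets or reviews.
sample_texts = [
    "Outlast is a nightmare and a masterpiece",
    "Got the trophy, but that chase hurt",
    "Total nightmare fuel",
]

counts = sentiment_counts(sample_texts, TOY_LEXICON)
for (word, sentiment), n in counts.most_common():
    print(word, sentiment, n)
```

The resulting word, sentiment, and n triples mirror the Word, Sentiment, and n columns in the data dictionaries, which is the shape a sentiment bar chart is drawn from.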
The next visualization comes from my interest in whether the character count of an individual tweet has an effect on, or relationship with, the number of favorites it received.
As you can see, the column graph is in fact left-skewed. For the most part, the number of favorites does vary with character count, most notably the single tweet of around 60 characters that received around 25 favorites. Yet once tweets pass 200 characters, the chance of receiving close to or above 30 favorites clearly increases. This doesn’t mean you can type the letter ‘x’ 200 times and expect to receive 30 likes. Then again, Twitter is a very weird yet intriguing place, so you honestly never know: sometimes favorites come from a liking of the actual substance of the text, and other times they come just because.
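One way to put a number on a relationship like this is a Pearson correlation between character width and favorites. The Python sketch below is only an illustration and was not part of the original analysis; the `widths` and `favorites` values are hypothetical, not the scraped data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not the scraped tweets).
widths = [60, 110, 150, 205, 240, 280]
favorites = [25, 3, 8, 28, 31, 30]

print(round(pearson_r(widths, favorites), 3))
```

A value near +1 would support the idea that longer tweets tend to collect more favorites, a value near 0 would suggest no linear relationship, and neither would say anything about why.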
Here are simply some statistics I found interesting, and hopefully you do as well!
## [1] 23.06061
## [1] 7.478788
I found this last question very interesting because the game is said to be completable in 5 hours, yet for gamers who take the long “COMPLETION” route it can take up to 10.5! That means it is possible that all of the scraped reviewers actually finished the game, though potentially not all of them completed it 100%. Maybe the fear just got to be too much, which is not surprising really.
Arriving at our second-to-last visual, I built a timeline of word sentiment from Steam, and surprisingly today seemed to have the most activity! I’m unsure whether, and honestly doubt that, conversation about Outlast will grow as the fourth installment draws nearer. Still, as a very avid lover of the horror genre of games, I’m simply satisfied with the continuation of its legacy, period.
In this last graph, I display the total character count of all scraped tweets, hence the 20,000+ character total. As we can see, negative sentiment outweighs positive. I would still like to believe that the Outlast community, and newcomers in general, have a more ‘positive’ view of the game that changed all horror games.
Finally, I conducted a predictive analysis in JMP, the BAIS-related program, on the Twitter data set outlast_counts. I added images and further descriptions for the dependent variable, “favorite_count.”
## [1] "Please Check Pdf-Image Attachments on Canvas OR Slack Page"
Due to R complications (or mainly my own limited R skills), I had issues including the two analysis images collected from the JMP results. Regardless, I’ll touch on them here.
After running a linear regression on the outlast_counts (Twitter) data set, it came down to three variables affecting “favorite_count”: created_at, location, and display_text_width. The ANOVA p-value is below .05, indicating that the model as a whole is significant; however, the p-value for “created_at” is above .05. As taught by the great Prof. Laker, I would normally remove that variable, as I did with all the others. Yet after moving on to the model’s “summary of fit” values, RMSE and RSquared Adj, we see an interesting occurrence. The three-variable model’s values are:
RMSE = 1.2781
RSquared Adj. = 0.9486
As I mentioned in “Three Variables”, the ‘outlier’ variable “created_at” had a high p-value of 0.0774. However, with that variable removed, the two-variable model’s summary of fit comes out to:
RMSE = 1.2947
RSquared Adj. = 0.9472
This shows that the three-variable model, despite created_at’s p-value, was the slightly better model overall: it has a smaller root mean squared error and a greater adjusted “proportion of variance” explained. If the outlast_counts analysis were actually to be turned in to a supervisor, I would keep the three-variable model for that reason.
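To make the comparison above concrete, here is a small Python sketch of the two fit statistics and the decision rule. It is not JMP output or JMP code. The helper names are hypothetical, and the RMSE uses residual degrees of freedom, which is an assumption about how the reported value was computed. Only the four summary-of-fit numbers come from the analysis above.

```python
import math


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes plain R-squared for each extra predictor."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def rmse(residuals, n_predictors):
    """Root mean squared error using residual degrees of freedom (assumed convention)."""
    dof = len(residuals) - n_predictors - 1
    return math.sqrt(sum(e * e for e in residuals) / dof)


def better_model(models):
    """Return the model name with both the lowest RMSE and the highest adjusted
    R-squared, or None when the two criteria disagree."""
    by_rmse = min(models, key=lambda name: models[name]["rmse"])
    by_adj_r2 = max(models, key=lambda name: models[name]["adj_r2"])
    return by_rmse if by_rmse == by_adj_r2 else None


# Summary-of-fit values reported above for the two candidate models.
models = {
    "three-variable": {"rmse": 1.2781, "adj_r2": 0.9486},
    "two-variable": {"rmse": 1.2947, "adj_r2": 0.9472},
}

print(better_model(models))
```

Because adjusted R-squared already penalizes the extra term, the three-variable model coming out ahead on both measures is what justifies keeping created_at here, even though the margin is small.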
Prof. Asay, I just want to quickly say thank you for all the help and time you’ve been willing to give me in understanding this complicated R language, and for the conversations we’ve had about moving forward with the class. I hope you truly enjoy this article and, despite your dislike of horror games, find the conclusions interesting.