R Markdown

INTRODUCTION:

I drew data from two datasets accessed on Kaggle.com: “20,000 Board Games Dataset” and “Top 5000 Board Games.” Both datasets rank board games by average rating and include information about each game, such as minimum age, description, and year published. I merged them into one large dataset that includes the following variables: objectid, name, yearpublished, sortindex, minplayers, maxplayers, minplaytime, maxplaytime, minage, average, news, blogs, weblink, podcast, boardgamecategory, boardgamemechanic. I also created a separate dataset containing only the description of each board game, which allowed me to perform text analysis on common words and on positive and negative sentiment. Overall, my project aims to explore how factors such as category, mechanic, and year published, to name a few, impact the popularity of a board game.

## Rows: 19981 Columns: 52
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (16): name, min_community, max_community, playerage, label, boardgamedes...
## dbl (36): objectid, yearpublished, sortindex, minplayers, maxplayers, minpla...
## Rows: 4927 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (8): bgg_url, names, image_url, thumb_url, mechanic, category, designer...
## dbl (17): rank, game_id, min_players, max_players, avg_time, min_time, max_t...
## Joining, by = c("objectid", "name", "yearpublished", "minplayers", "maxplayers", "minplaytime", "maxplaytime", "minage", "average", "avgweight", "boardgamedesigner", "boardgamepublisher", "boardgamecategory", "boardgamemechanic")
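The merge step can be sketched as follows. This is only an illustration under stated assumptions: the two small tables are made-up stand-ins for the Kaggle files, and `full_join()` is a guess at the join type, since the output above shows the join columns but not which join was used.

```r
library(dplyr)

# Toy stand-ins for the two Kaggle files; all values are made up.
games_20k <- tibble(
  objectid = c(1, 2, 3),
  name = c("Game A", "Game B", "Game C"),
  yearpublished = c(2005, 2012, 2017),
  average = c(6.1, 7.4, 7.9)
)
games_5k <- tibble(
  objectid = c(2, 3, 4),
  name = c("Game B", "Game C", "Game D"),
  yearpublished = c(2012, 2017, 2019),
  average = c(7.4, 7.9, 8.2)
)

# With no `by` argument, dplyr joins on every column name the two tables
# share and prints a "Joining ..." message listing those columns.
# full_join() keeps games that appear in either table (assumed join type).
merged <- full_join(games_20k, games_5k)

nrow(merged)
```

Joining on many columns at once means a game only matches when every shared column agrees, so a small difference in a value such as `average` would produce two separate rows for the same game.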

Board Game Category vs. Average Rating:

Just like movies, music, etc., board games also have their own categories! Here, I picked the most common categories in the dataset to analyze. Each boxplot compares the average rating of board games that fall into a given category with the average rating of board games that do not. For adventure board games, the average rating is higher for games in the category than for games outside it. For card games, the pattern is reversed: games that are not card games have a higher average rating. For political games, the average rating is higher for games in the category. For economic games, the average rating also seems to be higher for games in the category. However, for the city building and exploration categories, there does not seem to be much of a relationship between average rating and category.
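A comparison like this can be set up by flagging each game as inside or outside a category. The sketch below assumes that `boardgamecategory` holds all of a game's categories in one comma-separated string; the rows are made up for illustration.

```r
library(dplyr)

# Made-up rows; the real column layout is an assumption.
games <- tibble(
  name = c("Game A", "Game B", "Game C", "Game D"),
  average = c(7.5, 6.0, 7.0, 6.5),
  boardgamecategory = c(
    "Adventure, Fantasy",
    "Card Game",
    "Adventure, Exploration",
    "Card Game, Economic"
  )
)

# Flag games whose category string mentions "Adventure", then summarise
# the ratings of the two groups side by side.
category_summary <- games %>%
  mutate(is_adventure = grepl("Adventure", boardgamecategory, fixed = TRUE)) %>%
  group_by(is_adventure) %>%
  summarise(n_games = n(), median_rating = median(average))

category_summary
```

The same `is_adventure` flag could go on the x-axis of a `ggplot2::geom_boxplot()` to draw the two boxes.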

Board Game Mechanic vs. Average Rating:

Different board games require different skills to play. The board game mechanic variable records which types of skills a specific board game requires. I chose to analyze the five most common mechanics in the dataset. Each boxplot compares the average rating of board games that include a given mechanic with the average rating of board games that do not. Overall, whether or not a board game includes a specific mechanic does not seem to have much of an impact on the average rating. The one exception is dice rolling: for games that include it, the average rating is slightly higher than for games that do not.
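One way to find the most common mechanics is to split each game's mechanic list into one row per mechanic and count. This is a sketch, not the code behind the plots: it assumes `boardgamemechanic` is a comma-separated string, and the rows are made up.

```r
library(dplyr)
library(tidyr)

# Made-up rows; the comma-separated layout is an assumption.
games <- tibble(
  name = c("Game A", "Game B", "Game C"),
  boardgamemechanic = c(
    "Dice Rolling, Hand Management",
    "Dice Rolling",
    "Hand Management, Set Collection, Dice Rolling"
  )
)

# One row per (game, mechanic) pair, then count games per mechanic.
mechanic_counts <- games %>%
  separate_rows(boardgamemechanic, sep = ",\\s*") %>%
  count(boardgamemechanic, sort = TRUE, name = "n_games")

# Keep the five most frequent mechanics.
head(mechanic_counts, 5)
```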

Year Published vs. Average Rating:

## `geom_smooth()` using formula 'y ~ x'

There appears to be a positive linear relationship between the year that a game was published and the average rating that it received. Board games published from approximately 2010 onward seem to have received higher ratings than board games published before then.
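The direction of this trend can be checked with a simple linear model. The sketch below uses made-up data and assumes a straight-line fit; the `geom_smooth()` message above shows the formula `y ~ x` but not which smoothing method drew the line.

```r
# Made-up data with a mild upward trend, for illustration only.
games <- data.frame(
  yearpublished = c(1995, 2000, 2005, 2010, 2015, 2020),
  average = c(5.8, 6.0, 6.3, 6.6, 7.0, 7.3)
)

# Fit average rating as a linear function of publication year.
fit <- lm(average ~ yearpublished, data = games)

# The slope is the estimated change in average rating per year;
# a positive value means newer games tend to be rated higher.
slope <- coef(fit)[["yearpublished"]]
slope
```

In a plot, `geom_smooth(method = "lm")` would draw this same fitted line over the scatter of points.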

Are Board Game Descriptions More Positive or Negative?

## Joining, by = "word"
## # A tibble: 2 × 2
##   sentiment     n
##   <chr>     <int>
## 1 negative  26055
## 2 positive  31730
## Joining, by = "word"

We can see that there are 26,055 negative words and 31,730 positive words, meaning that board game descriptions tend to be more positive. The top ten positive words are victory, wins, win, best, like, great, gain, available, right, and powerful. The top ten negative words are die, attack, opponent, enemy, dungeon, dark, lost, lose, evil, and monster.
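Counts like these can be produced by splitting descriptions into words and matching them against a sentiment lexicon. The sketch below is an assumption about the approach, not the original code: it uses the Bing lexicon that ships with `tidytext`, and the two descriptions are made up.

```r
library(dplyr)
library(tidytext)

# Made-up descriptions; the real data has one description per game.
descriptions <- tibble(
  objectid = c(1, 2),
  description = c(
    "Win a great victory over your enemy.",
    "Attack the evil monster in the dark dungeon."
  )
)

# One row per word, keep only words found in the lexicon,
# then count how many are labelled positive or negative.
sentiment_counts <- descriptions %>%
  unnest_tokens(word, description) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment)

sentiment_counts
```

A word-level lexicon ignores context, so a word such as "die" is scored as negative even when a description uses it to mean a single dice.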

Top 10 Words Used in Board Game Descriptions:

## Joining, by = "word"

Here are the 10 words that are most commonly seen in board game descriptions.
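A ranking like this can be built by counting words after dropping very common ones. The sketch below assumes stop words such as "the" and "and" were removed using the `stop_words` table from `tidytext`; the descriptions are made up.

```r
library(dplyr)
library(tidytext)

# Made-up descriptions, for illustration only.
descriptions <- tibble(
  description = c(
    "Players roll dice and move pieces.",
    "Players draw cards; players score points.",
    "Roll dice to score points."
  )
)

# One row per lower-cased word, drop stop words,
# count occurrences, and keep the ten most frequent.
top_words <- descriptions %>%
  unnest_tokens(word, description) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE) %>%
  slice_head(n = 10)

top_words
```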