Team info

  • Group name: SEJ
  • Group members: Joshua Kim, Stanley Dunwell, Elizabeth Do

Purpose

State your research question, a description of the variables you’ll use, and your data sources (please include website links if possible).

We will try to understand what affects the number of followers on Instagram for the most popular accounts. For example, does the number of posts relate to the number of followers? What kind of account draws the most followers?

Our data is from Data.world, which derives its data from Iconosquare. The data was collected on December 26, 2016.

The variables we will be using are brand, categories_1, media_posted, and num.

  • Brand is an identification variable which indicates who/what is behind the account.
  • Categories_1 is a categorical explanatory/predictor variable which divides the accounts into celebrities, fashion, media, and sport.
  • Media_posted is a numerical explanatory/predictor variable which shows the number of Instagram posts for each account.
  • Num is the numerical outcome variable which indicates how many followers each account has (values such as 105.4 in the data preview appear to be in millions).

  1. Load all necessary packages.
  2. Load the dataset, run the clean_names() function from the janitor package, then select() only the variables you are going to use.
brand                 categories_1   num     media_posted
Selena Gomez          celebrities    105.4   1200
Taylor Swift          celebrities     95.2    958
Ariana Grande         celebrities     92.3   2800
Beyonce               celebrities     90.6   1400
Kim Kardashian West   celebrities     89.3   3600
Cristiano Ronaldo     celebrities     85.1   1600
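
The two steps above could be sketched roughly as follows. This is a minimal illustration, not the group's actual code: the raw column headers ("Brand", "Categories 1", "Num", "Media Posted"), the extra "Rank" column, and the file name in the comment are all assumptions made for the example, and the six preview rows are typed in by hand so the sketch runs without the original file.

```r
# Step 1: load the packages named in the steps (dplyr supplies select()).
library(dplyr)
library(janitor)

# Step 2: in practice the data would be read from a file, for example
#   raw <- read.csv("instagram_accounts.csv", check.names = FALSE)  # hypothetical file name
# Here the six preview rows are entered directly. The raw headers and the
# "Rank" column are assumed for illustration only.
raw <- data.frame(
  "Brand" = c("Selena Gomez", "Taylor Swift", "Ariana Grande",
              "Beyonce", "Kim Kardashian West", "Cristiano Ronaldo"),
  "Categories 1" = rep("celebrities", 6),
  "Num" = c(105.4, 95.2, 92.3, 90.6, 89.3, 85.1),
  "Media Posted" = c(1200, 958, 2800, 1400, 3600, 1600),
  "Rank" = 1:6,
  check.names = FALSE,
  stringsAsFactors = FALSE
)

instagram <- raw %>%
  clean_names() %>%                                # "Categories 1" becomes categories_1, etc.
  select(brand, categories_1, num, media_posted)   # keep only the variables used

head(instagram)
```

clean_names() converts the headers to lowercase snake_case, which is why the selected names match the variable list given in the Purpose section.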

Create EDA visualizations

Create “exploratory data analysis” visualizations of your data. These are preliminary and can change for the submission; the only requirement is that your visualizations use each of the measurement variables included in your dataset, to test whether they work.
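
As one possible starting point, the sketch below draws two preliminary plots with ggplot2: a scatterplot of media_posted against num, colored by categories_1, and a boxplot of num by categories_1. The choice of ggplot2, the plot types, and the axis labels are assumptions for illustration. Only the six preview rows are used here, all of which are celebrities, so the category comparison only becomes informative with the full dataset.

```r
library(ggplot2)

# The six preview rows, entered by hand so the sketch runs on its own.
instagram <- data.frame(
  brand = c("Selena Gomez", "Taylor Swift", "Ariana Grande",
            "Beyonce", "Kim Kardashian West", "Cristiano Ronaldo"),
  categories_1 = rep("celebrities", 6),
  num = c(105.4, 95.2, 92.3, 90.6, 89.3, 85.1),
  media_posted = c(1200, 958, 2800, 1400, 3600, 1600),
  stringsAsFactors = FALSE
)

# Does the number of posts relate to the number of followers?
posts_vs_followers <- ggplot(
  instagram,
  aes(x = media_posted, y = num, color = categories_1)
) +
  geom_point() +
  labs(x = "Posts (media_posted)", y = "Followers (num)", color = "Category")

# What kind of account draws the most followers?
followers_by_category <- ggplot(instagram, aes(x = categories_1, y = num)) +
  geom_boxplot() +
  labs(x = "Category (categories_1)", y = "Followers (num)")

print(posts_vs_followers)
print(followers_by_category)
```

Together the two plots use all three measurement variables (categories_1, media_posted, and num); brand is left out because it only identifies the account.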