Team info

Group name: SEJ
Group members: Joshua Kim, Stanley Dunwell, Elizabeth Do

Purpose

State your research question, a description of the variables you’ll use, and your data sources (please include website links if possible).

We will investigate which factors are associated with the number of followers of the most popular Instagram accounts. For example, is the number of posts related to the number of followers? Which kind of account draws the most followers?

Our data comes from Data.world, which derives its data from Iconosquare. The data was collected on December 26, 2016.

The variables we will be using are brand, categories_1, media_posted, and num.

Brand is an identification variable that indicates who or what is behind the account.
Categories_1 is a categorical explanatory/predictor variable that divides the accounts into celebrities, fashion, media, and sport.
Media_posted is a numerical explanatory/predictor variable that gives the number of Instagram posts for each account.
Num is the numerical outcome variable that indicates how many followers each account has.

Load all necessary packages
Load the dataset, run the clean_names() function from the janitor package, then select() only the variables you are going to use.

brand	categories_1	num	media_posted
Selena Gomez	celebrities	105.4	1200
Taylor Swift	celebrities	95.2	958
Ariana Grande	celebrities	92.3	2800
Beyonce	celebrities	90.6	1400
Kim Kardashian West	celebrities	89.3	3600
Cristiano Ronaldo	celebrities	85.1	1600
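
The clean-and-select step that yields a preview like the one above can be sketched as follows. This is a minimal illustration, not our exact code: the raw column names (for example "Categories 1") are assumed for demonstration, and the six preview rows stand in for the full file, which would normally be read with something like readr::read_csv().

```r
library(dplyr)
library(janitor)
library(tibble)

# Stand-in for the raw dataset: the six rows shown in the preview.
# The raw column names below are ASSUMED; the real file may differ.
raw <- tribble(
  ~Brand,                ~`Categories 1`, ~Num,  ~`Media Posted`,
  "Selena Gomez",        "celebrities",   105.4, 1200,
  "Taylor Swift",        "celebrities",    95.2,  958,
  "Ariana Grande",       "celebrities",    92.3, 2800,
  "Beyonce",             "celebrities",    90.6, 1400,
  "Kim Kardashian West", "celebrities",    89.3, 3600,
  "Cristiano Ronaldo",   "celebrities",    85.1, 1600
)

# clean_names() converts column names to snake_case;
# select() keeps only the four variables used in the analysis.
insta <- raw %>%
  clean_names() %>%
  select(brand, categories_1, num, media_posted)

head(insta)
```

With the real file, only the first step changes: `raw` would come from reading the downloaded CSV rather than from tribble().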

Create EDA visualizations

Create “exploratory data analysis” visualizations of your data. At this point these are preliminary and can change for the submission; the only requirement is that your visualizations use each of the measurement variables included in your dataset, to test whether they work.
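
As a rough sketch of what such preliminary plots might look like, the code below draws a scatterplot of followers against posts and a boxplot of followers by category with ggplot2. It assumes a data frame named insta (a hypothetical name) holding the four selected variables; here it is rebuilt from the six preview rows only, so the category plot shows a single group until the full dataset is used.

```r
library(ggplot2)
library(tibble)

# Six preview rows only; the full dataset also contains the
# fashion, media, and sport categories.
insta <- tribble(
  ~brand,                ~categories_1, ~num,  ~media_posted,
  "Selena Gomez",        "celebrities", 105.4, 1200,
  "Taylor Swift",        "celebrities",  95.2,  958,
  "Ariana Grande",       "celebrities",  92.3, 2800,
  "Beyonce",             "celebrities",  90.6, 1400,
  "Kim Kardashian West", "celebrities",  89.3, 3600,
  "Cristiano Ronaldo",   "celebrities",  85.1, 1600
)

# Numerical predictor vs. outcome, colored by the categorical predictor
p_scatter <- ggplot(insta, aes(x = media_posted, y = num, color = categories_1)) +
  geom_point() +
  labs(x = "Posts (media_posted)", y = "Followers (num)", color = "Category")

# Distribution of the outcome within each category
p_box <- ggplot(insta, aes(x = categories_1, y = num)) +
  geom_boxplot() +
  labs(x = "Category (categories_1)", y = "Followers (num)")

print(p_scatter)
print(p_box)
```

Together the two plots use every measurement variable: media_posted and num in the scatterplot, and categories_1 in both.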

Term Project Proposal

Monday, March 5, 2018
