Morgan State University

Department of Information Science & Systems

Fall 2024

INSS 615: Data Wrangling for Visualization

Name: Enter your full name here

Due: October 29, 2024 (Tuesday) Before Class

This homework assesses students’ understanding of data slicing, descriptive statistics, handling missing values and outliers, and basic visualizations. The exercises involve analyzing a fake consumer dataset that includes variables such as age, income, and purchase frequency, with some missing values. Students will use Base R to clean, analyze, and visualize the data. Load the provided fake_data_with_missing.csv dataset as a data frame and answer the 25 questions. Each question is worth 4 points (100 points in total).

Before answering the questions, load the dataset.

# Load the dataset (modify the path to match the location of your fake_data_with_missing.csv)
data <- read.csv("C:/Users/user/Downloads/fake_data_with_missing.csv")
head(data, n = 5)
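
After loading, it can help to confirm the column types and see where values are missing before starting the questions. The following is a minimal sketch on a small toy data frame; the name `toy` and all of its values are made up for illustration, and only the column names are taken from the questions.

```r
# Toy stand-in for the homework data (hypothetical values, not the real dataset)
toy <- data.frame(
  Age = c(25, 47, NA, 61),
  Gender = c("Female", "Male", "Female", NA),
  Income = c(52000, NA, 61000, 75000),
  stringsAsFactors = FALSE
)

# Column types and a preview of the values
str(toy)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Note that read.csv reads a blank field in a numeric column as NA, but a blank field in a text column as an empty string. If the CSV stores missing text values as blanks (an assumption to check against the file), passing na.strings = c("", "NA") to read.csv converts them to NA as well.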

Questions

  1. Extract all rows where Gender is “Female”. Display the first 5 rows.

Solution:

  2. Extract the Income and Purchase_Amount columns for people older than 40. Display the first 5 rows.

Solution:

  3. Get all rows where Product_Category is “Electronics” and Income is greater than 60,000. Display the first 5 rows.

Solution:
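
One pitfall when filtering on a column that contains missing values: a comparison against NA yields NA, and indexing rows with NA returns a row filled with NA. The sketch below shows the effect on a toy data frame (hypothetical values, not the homework dataset) and one way to avoid it with which().

```r
# Toy data (hypothetical values); the third Income is missing
toy <- data.frame(
  Product_Category = c("Electronics", "Clothing", "Electronics", "Electronics"),
  Income = c(72000, 88000, NA, 41000),
  stringsAsFactors = FALSE
)

# Row-wise condition; it is NA wherever the result depends on a missing Income
keep <- toy$Product_Category == "Electronics" & toy$Income > 60000

# Indexing with the raw logical vector keeps an all-NA row for each NA
with_na <- toy[keep, ]

# which() returns only the positions that are TRUE, so NA rows are dropped
without_na <- toy[which(keep), ]

print(with_na)
print(without_na)
```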

  4. Select the Age and Gender columns for the first 200 rows. Display the first 5 rows.

Solution:

  5. Get rows where Education is missing. Display the first 5 rows.

Solution:

  6. Calculate the mean and median of the Income column, ignoring missing values.

Solution:

  7. Find the standard deviation of the Purchase_Amount column.

Solution:

  8. Get the summary statistics of the Days_Since_Last_Purchase column.

Solution:

  9. Find the frequency count of Education levels.

Solution:

  10. Create a frequency table for Product_Category.

Solution:

  11. Determine the proportion of Gender in the dataset, ignoring missing values.

Solution:
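
As an illustration of how proportions can be computed while ignoring missing values, the sketch below uses a toy vector (hypothetical values, not the homework dataset). It relies on table() excluding NA by default, so prop.table() divides by the number of non-missing entries.

```r
# Toy vector (hypothetical values) with one missing entry
gender <- c("Female", "Male", "Female", NA, "Male", "Female")

# table() drops NA by default (useNA = "no")
counts <- table(gender)

# Convert counts to proportions of the non-missing entries
props <- prop.table(counts)
print(props)
```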

  12. Calculate the total Purchase_Amount for all individuals who have made a purchase within the last 100 days.

Solution:

  13. Add 1000 to the Income of all individuals over the age of 50. Display the first 5 rows.

Solution:

  14. Create a new column Discounted_Purchase that contains 80% of the original Purchase_Amount. Display the first 5 rows.

Solution:

  15. Count the number of missing values in the Income column.

Solution:

  16. Replace missing Income values with the mean of the non-missing Income values. Then count the missing values in the Income column to show that the count is now 0.

Solution:
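
Mean imputation can be sketched on a toy vector as follows (hypothetical values, not the homework dataset; the names `income_mean` and `income_filled` are illustrative). The mean is computed with na.rm = TRUE so that the missing entries do not turn the result into NA.

```r
# Toy vector (hypothetical values) with two missing entries
income <- c(50000, NA, 70000, NA, 60000)

# Mean of the non-missing values only
income_mean <- mean(income, na.rm = TRUE)

# Work on a copy and overwrite just the missing positions
income_filled <- income
income_filled[is.na(income_filled)] <- income_mean

# Verify that no missing values remain
print(sum(is.na(income_filled)))
```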

  17. Remove all rows with missing values in the Purchase_Amount column. Then count the missing values in the Purchase_Amount column to show that the count is now 0.

Solution:

  18. Identify any outliers in the Income column using the interquartile range (IQR) method.

Solution:
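
The IQR method is commonly stated as: flag any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below applies that rule to a toy vector (hypothetical values, not the homework dataset), assuming the conventional 1.5 multiplier and R's default quantile type.

```r
# Toy vector (hypothetical values) with one unusually large entry
income <- c(32000, 42000, 45000, 47000, 50000, 52000, 55000, 58000, 250000)

# First and third quartiles, ignoring missing values
q1 <- quantile(income, 0.25, na.rm = TRUE)[[1]]
q3 <- quantile(income, 0.75, na.rm = TRUE)[[1]]
iqr <- q3 - q1

# Lower and upper fences of the 1.5 * IQR rule
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Flag values outside the fences; missing values are never flagged
is_outlier <- !is.na(income) & (income < lower | income > upper)
outliers <- income[is_outlier]
print(outliers)
```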

  19. Replace Income outliers that are above the upper whisker with the 95th percentile of the Income values.

Solution:
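
Capping high outliers can be sketched as below on a toy vector (hypothetical values, not the homework dataset). The sketch assumes that "upper whisker" means the upper fence Q3 + 1.5 × IQR; the names `p95`, `too_high`, and `income_capped` are illustrative.

```r
# Toy vector (hypothetical values) with one unusually large entry
income <- c(32000, 42000, 45000, 47000, 50000, 52000, 55000, 58000, 250000)

# Upper fence of the 1.5 * IQR rule (assumed meaning of "upper whisker")
q <- quantile(income, c(0.25, 0.75), na.rm = TRUE)
upper <- q[[2]] + 1.5 * (q[[2]] - q[[1]])

# 95th percentile of the observed values
p95 <- quantile(income, 0.95, na.rm = TRUE)[[1]]

# Work on a copy and overwrite only the values above the fence
income_capped <- income
too_high <- !is.na(income_capped) & income_capped > upper
income_capped[too_high] <- p95
print(income_capped)
```

In a sample this small, the 95th percentile can itself lie above the fence, so the capped value may still look extreme; with a larger dataset the effect is usually milder.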

  20. Create a new column Income_No_Outliers by removing outliers from the Income column.

Solution:

  21. Compute the correlation between Income and Purchase_Amount, ignoring missing values.

Solution:
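
By default, cor() returns NA if either input contains a missing value. The sketch below, on toy vectors (hypothetical values, not the homework dataset), shows one way to restrict the calculation to pairs where both values are present by passing use = "complete.obs".

```r
# Toy vectors (hypothetical values); each has one missing entry
income <- c(40000, 50000, NA, 70000, 80000)
purchase <- c(200, 250, 300, NA, 400)

# Default behaviour: any missing value makes the result NA
r_default <- cor(income, purchase)

# Use only the pairs in which both values are observed
r_complete <- cor(income, purchase, use = "complete.obs")

print(r_default)
print(r_complete)
```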

  22. Compute the correlation between Age and Days_Since_Last_Purchase.

Solution:

  23. Create a histogram for the Income column.

Solution:

  24. Create a bar plot showing the count of individuals in each Education level.

Solution:

  25. Create a scatter plot of Income vs Purchase_Amount, highlighting missing values with a different color.

Solution:
