Morgan State University
Department of Information Science & Systems
Fall 2024
INSS 615: Data Wrangling for Visualization
Name: Enter your Full-name here
Due: October 29, 2024 (Tuesday) Before Class
This homework is designed to assess students’ understanding of data
slicing, descriptive statistics, handling missing values and outliers,
and basic visualizations. The exercises involve analyzing a fake
consumer dataset, including variables such as age, income, and purchase
frequency, with some missing values. Students will use Base
R to clean, analyze, and visualize the data. Load the provided
fake_data_with_missing.csv dataset as a dataframe and answer the 25
questions, with each question carrying 4 points.
Before answering the questions, load the dataset.
# Load the dataset (modify the path to match the location of your fake_data_with_missing.csv)
data <- read.csv("C:/Users/user/Downloads/fake_data_with_missing.csv")
head(data, n=5)
Questions
- Extract all rows where Gender is “Female”. Display the first 5
rows.
Solution:
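As a hedged illustration of row filtering in base R, the sketch below uses a small made-up data frame rather than the homework dataset. The name `toy` and its values are assumptions for illustration only.

```r
# Toy data frame (hypothetical values, not the homework dataset)
toy <- data.frame(
  Gender = c("Female", "Male", NA, "Female"),
  Age    = c(34, 51, 27, 45),
  stringsAsFactors = FALSE
)

# A bare logical comparison gives NA where Gender is missing, and indexing
# with NA produces an all-NA row. which() keeps only the TRUE positions.
females <- toy[which(toy$Gender == "Female"), ]
head(females, n = 5)
```

The same pattern should carry over to the real data frame, provided the column is named `Gender` and stores the label exactly as "Female".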
- Extract the Income and Purchase_Amount columns for people older than
40. Display the first 5 rows.
Solution:
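One possible way to combine a row condition with a column selection is `subset()`. This is a minimal sketch on invented values; the data frame `toy` is hypothetical.

```r
# Toy data frame (hypothetical values, not the homework dataset)
toy <- data.frame(
  Age             = c(25, 47, 62, NA),
  Income          = c(40000, 72000, 58000, 61000),
  Purchase_Amount = c(120, 340, 90, 200)
)

# subset() drops rows where the condition is FALSE or NA,
# and select= keeps only the named columns.
older <- subset(toy, Age > 40, select = c(Income, Purchase_Amount))
head(older, n = 5)
```

Bracket indexing such as `toy[which(toy$Age > 40), c("Income", "Purchase_Amount")]` would be an equivalent alternative.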
- Get all rows where Product_Category is “Electronics” and Income is
greater than 60,000. Display the first 5 rows.
Solution:
- Select the Age and Gender columns for the first 200 rows. Display
the first 5 rows.
Solution:
- Get rows where Education is missing. Display the first 5 rows.
Solution:
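The sketch below shows one way to pick out rows with a missing value, using made-up data. Whether blank strings should also count as missing is an assumption to check against the actual file, since `read.csv()` by default reads empty character fields as "" rather than NA.

```r
# Toy data frame (hypothetical values, not the homework dataset)
toy <- data.frame(
  Education = c("Bachelor", NA, "", "Master"),
  Age       = c(30, 41, 29, 55),
  stringsAsFactors = FALSE
)

# is.na() flags true NA values only
missing_edu <- toy[is.na(toy$Education), ]
head(missing_edu, n = 5)

# Optional broader rule (an assumption): also treat empty strings as missing
missing_or_blank <- toy[is.na(toy$Education) | toy$Education == "", ]
```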
- Calculate the mean and median of the Income column, ignoring missing
values.
Solution:
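To illustrate why missing values need explicit handling in summary statistics, here is a small sketch on an invented vector. The values are assumptions, not taken from the dataset.

```r
# Hypothetical income values with two missing entries
income <- c(52000, NA, 61000, 48000, NA, 75000)

# Without na.rm, a single NA makes the result NA
naive_mean <- mean(income)

# na.rm = TRUE drops missing values before computing
mean_income   <- mean(income, na.rm = TRUE)
median_income <- median(income, na.rm = TRUE)

mean_income    # 59000
median_income  # 56500
```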
- Find the standard deviation of the Purchase_Amount column.
Solution:
- Get the summary statistics of the Days_Since_Last_Purchase
column.
Solution:
- Find the frequency count of Education levels.
Solution:
- Create a frequency table for Product_Category.
Solution:
- Determine the proportion of Gender in the dataset, ignoring missing
values.
Solution:
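Proportions can be derived from a frequency table. The following is a minimal sketch on a made-up vector; the values are illustrative only.

```r
# Hypothetical gender labels with one missing entry
gender <- c("Female", "Male", NA, "Female", "Male", "Female")

# table() excludes NA by default (useNA = "no"),
# so the proportions are based on non-missing values only.
counts <- table(gender)
props  <- prop.table(counts)

counts
props
```

If the missing share is also of interest, `table(gender, useNA = "ifany")` would include it as its own category.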
- Calculate the total Purchase_Amount for all individuals who have
made a purchase within the last 100 days.
Solution:
- Add 1000 to the Income of all individuals over the age of 50.
Display the first 5 rows.
Solution:
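A conditional update can be sketched as follows, using invented values. The data frame `toy` is hypothetical, and `which()` is used so that rows with a missing Age are left untouched.

```r
# Toy data frame (hypothetical values, not the homework dataset)
toy <- data.frame(
  Age    = c(45, 52, NA, 67),
  Income = c(50000, 60000, 55000, NA)
)

# Row positions where Age is known to be over 50 (NA ages are skipped)
over_50 <- which(toy$Age > 50)

# Update Income only at those positions; a missing Income stays NA
toy$Income[over_50] <- toy$Income[over_50] + 1000
head(toy, n = 5)
```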
- Create a new column Discounted_Purchase that contains 80% of the
original Purchase_Amount. Display the first 5 rows.
Solution:
- Count the number of missing values in the Income column.
Solution:
- Replace missing Income values with the mean of the non-missing
Income values. Then count the missing values in the Income column to
show that the count is now 0.
Solution:
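Mean imputation can be sketched on a small invented vector as below. The values are assumptions for illustration, and the counts before and after are stored so the change can be verified.

```r
# Hypothetical income values with two missing entries
income <- c(52000, NA, 61000, 48000, NA, 75000)

n_missing_before <- sum(is.na(income))

# Fill each missing entry with the mean of the observed values
income[is.na(income)] <- mean(income, na.rm = TRUE)

n_missing_after <- sum(is.na(income))

n_missing_before  # 2
n_missing_after   # 0
```

Note that mean imputation keeps the column mean unchanged but shrinks its variance, which may matter for later questions on spread and correlation.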
- Remove all rows with missing values in the Purchase_Amount column.
Then count the missing values in the Purchase_Amount column to show
that the count is now 0.
Solution:
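Dropping rows based on one column can be sketched like this, again on made-up data. The `ID` column and the name `cleaned` are hypothetical.

```r
# Toy data frame (hypothetical values, not the homework dataset)
toy <- data.frame(
  ID              = 1:4,
  Purchase_Amount = c(100, NA, 250, NA)
)

# Keep only rows where Purchase_Amount is present
cleaned <- toy[!is.na(toy$Purchase_Amount), ]

# Verify that no missing values remain in that column
sum(is.na(cleaned$Purchase_Amount))
```

Unlike `na.omit(toy)`, this approach removes rows only when the chosen column is missing, so gaps in other columns do not cause additional rows to be dropped.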
- Identify any outliers in the Income column using the interquartile
range (IQR) method.
Solution:
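The IQR method is commonly applied with fences at 1.5 times the IQR below the first quartile and above the third quartile. The sketch below assumes that conventional multiplier and uses invented values; the variable names are illustrative.

```r
# Hypothetical income values with one extreme entry
income <- c(31000, 42000, 45000, 47000, 50000, 52000, 55000, 58000, 200000)

# Quartiles (default quantile type 7), ignoring missing values
q   <- quantile(income, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- q[[2]] - q[[1]]

# Conventional 1.5 * IQR fences (the multiplier is an assumption)
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

# Values outside the fences are flagged as outliers
outliers <- income[!is.na(income) & (income < lower | income > upper)]
outliers
```

Base R also offers `IQR(income, na.rm = TRUE)` and `boxplot.stats(income)$out`, which could be used to cross-check the result.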
- Replace Income outliers that are above the upper whisker with the
95th percentile of the Income values.
Solution:
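Capping high values at a percentile can be sketched as below. This example assumes the upper whisker limit is the fence Q3 + 1.5 * IQR and computes the 95th percentile before any replacement. The data and the name `capped` are made up.

```r
# Hypothetical income values with one extreme entry
income <- c(31000, 42000, 45000, 47000, 50000, 52000, 55000, 58000, 200000)

# Upper fence from the quartiles (assumed 1.5 * IQR rule)
q     <- quantile(income, probs = c(0.25, 0.75), na.rm = TRUE)
upper <- q[[2]] + 1.5 * (q[[2]] - q[[1]])

# 95th percentile of the original values
p95 <- quantile(income, probs = 0.95, na.rm = TRUE)[[1]]

# Replace only the values above the fence; which() skips any NA entries
capped <- income
capped[which(capped > upper)] <- p95
capped
```

With very small samples the 95th percentile can itself lie above the fence, as it does here, so the capped value may still look extreme.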
- Create a new column Income_No_Outliers by removing outliers from the
Income column.
Solution:
- Compute the correlation between Income and Purchase_Amount, ignoring
missing values.
Solution:
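For correlation with missing data, `cor()` accepts a `use` argument. The following is a minimal sketch on invented vectors, not the homework columns.

```r
# Hypothetical paired values, each with one missing entry
income   <- c(40000, 52000, NA, 61000, 75000)
purchase <- c(120, 180, 150, NA, 300)

# The default (use = "everything") returns NA if any value is missing
r_default <- cor(income, purchase)

# "complete.obs" keeps only the pairs where both values are present
r <- cor(income, purchase, use = "complete.obs")
r
```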
- Compute the correlation between Age and
Days_Since_Last_Purchase.
Solution:
- Create a histogram for the Income column.
Solution:
- Create a bar plot showing the count of individuals in each Education
level.
Solution:
- Create a scatter plot of Income vs Purchase_Amount, highlighting
missing values with a different color.
Solution:
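A point with a missing coordinate cannot be drawn as is, so "highlighting missing values" needs an interpretation. The sketch below takes one possible reading, as an assumption: missing coordinates are filled with the column mean for plotting only, and those points are colored differently. The data, colors, and names are all illustrative.

```r
# Toy data frame (hypothetical values, not the homework dataset)
toy <- data.frame(
  Income          = c(40000, 52000, NA, 61000, 75000, 48000),
  Purchase_Amount = c(120, 180, 150, NA, 300, 160)
)

# Flag rows where either coordinate is missing
has_missing <- is.na(toy$Income) | is.na(toy$Purchase_Amount)

# For plotting only: fill missing coordinates with the column mean
x <- ifelse(is.na(toy$Income),
            mean(toy$Income, na.rm = TRUE), toy$Income)
y <- ifelse(is.na(toy$Purchase_Amount),
            mean(toy$Purchase_Amount, na.rm = TRUE), toy$Purchase_Amount)

# One color for fully observed rows, another for rows that had a gap
point_col <- ifelse(has_missing, "red", "blue")

plot(x, y, col = point_col, pch = 19,
     xlab = "Income", ylab = "Purchase_Amount",
     main = "Income vs Purchase_Amount")
legend("topleft", legend = c("Observed", "Had a missing value (mean-filled)"),
       col = c("blue", "red"), pch = 19)
```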