About

My name is Cindy Lopez, and I’m aspiring to a career in data science or data analytics. This space is where I document how I’ve been educating myself, whether by learning programming languages, practicing data analysis on data sets, or reading up on statistics. Here I’ll post projects I’m working on, books I’m reading, and data sets I’m practicing with.

I’m looking for internships or programs to begin working in data analytics, data science, or just data entry jobs. Feel free to contact me at .

Education

Amherst College: BA in Mathematics (3.9 GPA)
Graduated in May 2020 in the midst of the COVID-19 pandemic

Whitney M. Young Magnet High School: Graduated June 2016

Udemy Courses

I have completed the following courses on the Udemy learning site:

Work Experience

CDW

IT Asset Management Analyst August 2019 - June 2020

  • Maintained asset database routinely by updating Software & SaaS maintenance dates, SKUs and inconsistencies
  • Built tutorial documents on newly implemented processes that instructed coworkers and the ITAM team on their use
  • Validated Software and SaaS contract information on contract database and discarded outdated information
  • Entered new contracts into ONIT contract management system to keep track of purchased products
  • Communicated with product vendors and contract owners to ensure SaaS and Software information was up to date
  • Assisted ITAM team members daily with tasks to guarantee they met their deadlines

IT Asset Management Intern June 2019 - August 2019

  • Analyzed the use of a ServiceNow subscription service and presented its benefits to the ITAM team
  • Interviewed subject matter experts regarding a website technology and presented a proposal for its integration into CDW along with 18 interns to several senior managers and the CIO
  • Updated inventory information for over 600 software, subscription and SaaS products

Deloitte

IT Hardware Asset Intern September 2015 - August 2016

  • Managed the unboxing of large shipments of new laptops and added their information to the office inventory
  • Physically transported inventory items to and from various departments such as IT Help Desk or Repair room
  • Created over a hundred newly imaged machines for new employees
  • Provided laptop computers for IT help desk daily and removed broken machines
  • Assisted in the daily wiping of company hard drives to maintain data governance policies

Amherst College Math Fellow

Math Fellow September 2016 - August 2019

  • Served as a teaching assistant and held biweekly two-hour office hours for Calculus I to support students’ learning and understanding
  • Motivated clear logical thinking to help students develop problem-solving and analytical skills
  • Showed empathy for student needs and welcomed constructive feedback
  • Communicated areas where students had difficulty to course professor

Math Grader September 2016 - August 2019

  • Graded weekly coursework for Calculus I, II and Linear Algebra and provided feedback for classes of 25 students each
  • Communicated with course professors regarding assignments and how to provide clear, detailed explanations for grading

Mathnasium

Research Experience: Clare Boothe Luce Scholar

Math and Statistics Researcher Summer 2018

Photo of Amherst College Summer Science Poster Fair - shown is Lisa Cenek (left), Yuuna Klindziuk (middle), and Cindy Lopez

Valentine Dining Catering Services

Catering Assistant September 2017 - May 2018

  • Assisted in the preparation and management of catered events
  • Worked with a group of colleagues to set up event spaces and ensure the smooth delivery of food for catered guests
  • Managed the safe retrieval of fragile dishware, such as porcelain bowls and plates
  • Distributed dishware for the use of the consumers during their mealtimes

R

To study and practice R, I have been reading Rafael A. Irizarry’s Introduction to Data Science: Data Analysis and Prediction Algorithms with R to educate myself on some common data analysis methods.

EDA Project – Playing with Palmer Penguins


The data set I work with comes from the palmerpenguins website, provided by Dr. Kristen Gorman, Dr. Allison Horst, and Dr. Alison Hill.

The data set includes 8 variables, detailing 3 species of penguins on 3 different islands. I have eliminated NAs from the data.
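As a minimal sketch of this loading and cleaning step (not necessarily the exact code behind this page), the snippet below assumes the palmerpenguins, dplyr and tidyr packages are installed. The name penguins_clean is made up for illustration.

```r
library(palmerpenguins)  # provides the `penguins` data frame
library(dplyr)
library(tidyr)

# Assumed cleaning step: drop every row that has an NA in any column
penguins_clean <- penguins %>%
  drop_na()

# Preview the first six rows of the cleaned data
head(penguins_clean)
```

Dropping all incomplete rows is the simplest choice; a more selective alternative would be to drop NAs only in the columns used for a given plot or statistic.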

## # A tibble: 6 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_~ body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge~           39.1          18.7              181        3750 male 
## 2 Adelie  Torge~           39.5          17.4              186        3800 fema~
## 3 Adelie  Torge~           40.3          18                195        3250 fema~
## 4 Adelie  Torge~           36.7          19.3              193        3450 fema~
## 5 Adelie  Torge~           39.3          20.6              190        3650 male 
## 6 Adelie  Torge~           38.9          17.8              181        3625 fema~
## # ... with 1 more variable: year <int>

The table below summarizes how many penguins of each species and sex we are working with. The largest subset is the Adelie penguins, which number more than double the Chinstraps. It also appears that the male to female ratio is almost 1:1 for all three species.

## # A tibble: 6 x 3
## # Groups:   species [3]
##   species   sex    total
##   <fct>     <fct>  <int>
## 1 Adelie    female    73
## 2 Adelie    male      73
## 3 Chinstrap female    34
## 4 Chinstrap male      34
## 5 Gentoo    female    58
## 6 Gentoo    male      61
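A count table like the one above can be produced with a grouped summary. This is a sketch under the assumption that the data comes from the palmerpenguins package with NAs dropped; species_sex_counts is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

# Count penguins per species and sex after dropping incomplete rows.
# .groups = "drop_last" keeps the result grouped by species only.
species_sex_counts <- penguins %>%
  drop_na() %>%
  group_by(species, sex) %>%
  summarise(total = n(), .groups = "drop_last")

species_sex_counts
```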

Let’s take a look at the distribution of the variables body_mass_g, bill_depth_mm, bill_length_mm and flipper_length_mm. I’ve arranged the boxplots for each variable in ascending order from left to right according to the median.
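One way to order boxplots by median is base R's reorder() inside the plot aesthetics. The sketch below does this for body_mass_g only and assumes ggplot2 is installed; the grouping by species with a fill by sex is my assumption about the layout, and mass_boxplot is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)
library(ggplot2)

penguins_clean <- penguins %>%
  drop_na()

# reorder() sorts the species factor by the median of body_mass_g,
# so boxes run from smallest to largest median, left to right
mass_boxplot <- ggplot(
  penguins_clean,
  aes(x = reorder(species, body_mass_g, FUN = median),
      y = body_mass_g,
      fill = sex)
) +
  geom_boxplot() +
  labs(x = "Species (ordered by median body mass)", y = "Body mass (g)")

mass_boxplot
```

The same pattern applies to the other three measurements by swapping the variable passed to reorder() and to y.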

We can make the following observations from above:

  • bill_length_mm has quite a few noted outliers
  • Except for bill_depth_mm, Adelie penguins have the smallest measurements. (Although note that male Adelie have larger body mass than male and female Chinstraps)
  • Gentoo penguins have the highest body_mass_g and flipper_length_mm with relatively high bill_length_mm, while having the smallest bill_depth_mm.

Could there be an inverse relationship between bill_depth_mm and the other variables?

A quick regression analysis shows the following:

There appears to be a positive correlation between bill_depth_mm and the other variables within each species. The table below shows the strength of these relationships.

## # A tibble: 3 x 4
##   species   depth_length depth_mass depth_flipper
##   <fct>            <dbl>      <dbl>         <dbl>
## 1 Adelie           0.386      0.580         0.311
## 2 Chinstrap        0.654      0.604         0.580
## 3 Gentoo           0.654      0.723         0.711
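The correlation table above can be sketched with cor() inside a grouped summarise(). I am assuming Pearson correlations (the default for cor()) on the NA-free data; species_cors is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

# Pearson correlation of bill depth with each other measurement, per species
species_cors <- penguins %>%
  drop_na() %>%
  group_by(species) %>%
  summarise(
    depth_length  = cor(bill_depth_mm, bill_length_mm),
    depth_mass    = cor(bill_depth_mm, body_mass_g),
    depth_flipper = cor(bill_depth_mm, flipper_length_mm)
  )

species_cors
```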

Include p-values above?
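If p-values are wanted, base R's cor.test() returns one alongside the correlation estimate. The sketch below checks a single pair (Adelie bill depth against bill length) and assumes the same NA-free data; adelie and depth_length_test are illustrative names.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

adelie <- penguins %>%
  drop_na() %>%
  filter(species == "Adelie")

# Pearson correlation test: the null hypothesis is a true correlation of 0
depth_length_test <- cor.test(adelie$bill_depth_mm, adelie$bill_length_mm)

depth_length_test$estimate  # the correlation coefficient
depth_length_test$p.value   # the p-value for the test
```

A small p-value only says the correlation is unlikely to be exactly zero; it does not make a weak correlation strong, so the two numbers are best read together.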

It looks like Gentoo penguins show the strongest correlations while Adelie penguins show the weakest correlations. What is causing this?

First, a better look into bill_depth_mm shows us:

Notice how Adelie penguins make up the largest subset and have the weakest correlations. We’ll zero in on them for now to investigate why the relationships between bill_depth_mm and the other variables are so weak.

Before we begin, notice how bill_depth_mm for the Adelie penguins changes across the years.

It appears that female Adelie bill depths decrease over the years, and male Adelie bill depths do too, albeit less noticeably. Is this data tracking the same penguins? Do Adelie bill depths drop as Adelies get older? As a side note, just for comparison, observe the bill depths of Chinstrap and Gentoo penguins:

## Picking joint bandwidth of 0.239
## Picking joint bandwidth of 0.262

## Picking joint bandwidth of 0.402
## Picking joint bandwidth of 0.402

Chinstrap penguins’ bill depths also decrease slightly across the years, but Gentoo penguins’ bill depths increase over the years. I’m no penguin expert, but perhaps this is because of biological differences between the species.

Back to focusing on the Adelie, let’s check if any outliers in the data led to weak correlations by removing them and checking the relationship between the variables again.

Since we’re comparing bill_depth_mm to all the other variables, let’s remove its outliers from the data.

To ensure these are outliers, I’ll use Tukey’s definition of outliers to exclude any data points outside of the acceptable range and plot this range below. Below is how I calculated and plotted the graphs.

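# Packages assumed by the code below: tidyverse (dplyr, ggplot2) and ggthemes (theme_clean)
library(tidyverse)
library(ggthemes)

# Tukey fences (q1 - 1.5*IQR and q3 + 1.5*IQR) for Adelie bill depth, by sex,
# plus the point three standard deviations above the mean
tukey_ranges <- penguins %>% filter(species == "Adelie") %>%
  group_by(sex) %>% 
  summarise(mean= mean(bill_depth_mm), 
            sd = sd(bill_depth_mm), 
            q1 = quantile(bill_depth_mm, 0.25), 
            q3 = quantile(bill_depth_mm, 0.75), 
            IQR = IQR(bill_depth_mm), 
            min = q1 - 1.5*IQR, max = q3 + 1.5*IQR, 
            three_sds_out = mean + 3*sd) %>% 
  select(sex, min, mean, max, three_sds_out)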

female_Adelie <- penguins %>% filter(species == "Adelie", sex == "female") %>% 
  ggplot(aes( x = bill_depth_mm)) +
  geom_histogram(binwidth = 0.3, fill = "orange")+
  xlab("")+
  xlim(15, 23)+
  ggtitle("Female Adelie bill depth")+
  geom_vline(xintercept = tukey_ranges$min[1], color = "blue", size = 1)+
  geom_vline(xintercept = tukey_ranges$max[1], color = "blue", size = 1 )+
  geom_vline(xintercept = tukey_ranges$three_sds_out[1], color = "red", size = 1) +
  geom_text(data = data.frame(c("min range", "max range", "3 sds out"), 
                              x = c(15, 20, 21), 
                              y = c(10, 10, 10)), aes(x, y, label = c("min", "max", "3 sds out"))) +
  theme_clean()
  

male_Adelie <- penguins %>% filter(species == "Adelie", sex == "male") %>% 
  ggplot(aes( x = bill_depth_mm)) +
  geom_histogram(binwidth = 0.3, fill = "blue")+
  xlab("Bill depth (mm)")+
  xlim(15, 23)+
  ggtitle("Male Adelie bill depth") +
  geom_vline(xintercept = tukey_ranges$min[2], color = "blue", size = 1)+
  geom_vline(xintercept = tukey_ranges$max[2], color = "blue", size = 1)+
  geom_vline(xintercept = tukey_ranges$three_sds_out[2], color = "red", size = 1) +
  geom_text(data = data.frame(c("min range", "max range", "3 sds out"), 
                              x = c(17, 21, 22), 
                              y = c(10, 10, 10)), aes(x, y, label = c("min", "max", "3 sds out")))+
  theme_clean() 

tukey_ranges
## # A tibble: 2 x 5
##   sex      min  mean   max three_sds_out
##   <fct>  <dbl> <dbl> <dbl>         <dbl>
## 1 female  15.0  17.6  20.2          20.5
## 2 male    16.8  19.1  21.3          22.1
## Warning: Removed 2 rows containing missing values (geom_bar).

## Warning: Removed 2 rows containing missing values (geom_bar).

Anything outside of the blue min and max lines is an outlier. For extra measure, I added a red line marking three standard deviations above the mean to emphasize how far out one of the female Adelie data points was. I will remove occurrences outside of the ranges and update our penguins table.
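The removal step could look like the sketch below, which keeps only Adelie rows whose bill_depth_mm falls inside the Tukey fences for their own sex. It assumes the NA-free palmerpenguins data; adelie and adelie_trimmed are illustrative names, and this may differ from the exact code used for this page.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

adelie <- penguins %>%
  drop_na() %>%
  filter(species == "Adelie")

# A grouped filter: quantile() and IQR() are evaluated within each sex,
# so every penguin is compared against the fences of its own group
adelie_trimmed <- adelie %>%
  group_by(sex) %>%
  filter(
    bill_depth_mm >= quantile(bill_depth_mm, 0.25) - 1.5 * IQR(bill_depth_mm),
    bill_depth_mm <= quantile(bill_depth_mm, 0.75) + 1.5 * IQR(bill_depth_mm)
  ) %>%
  ungroup()

# Number of rows removed as outliers
nrow(adelie) - nrow(adelie_trimmed)
```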

Now let’s see if this has affected the correlations for Adelie.

## # A tibble: 1 x 3
##   depth_length depth_mass depth_flipper
##          <dbl>      <dbl>         <dbl>
## 1        0.386      0.580         0.311
## # A tibble: 1 x 3
##   depth_length depth_mass depth_flipper
##          <dbl>      <dbl>         <dbl>
## 1        0.355      0.579         0.310

The correlations have actually fallen! Perhaps it’s because I removed the outliers individually by sex but am computing the correlation for the Adelie as a whole. I could compute the correlations by sex as well, but that would be too tedious.

Since the Adelie population as a whole has no outliers for bill_depth_mm, I’ll restore the data to what it was before removing the outliers.

Out of curiosity, I will check to see if Adelies have outliers as a whole for the rest of their variables.
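A check like this can be sketched with a small helper that counts values outside the Tukey fences, applied to each measurement with across(). The helper count_tukey_outliers is hypothetical (not from any package), and the sketch assumes dplyr 1.0 or later for across().

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

# Hypothetical helper: number of values outside q1 - 1.5*IQR and q3 + 1.5*IQR
count_tukey_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence)
}

adelie <- penguins %>%
  drop_na() %>%
  filter(species == "Adelie")

# One row with the outlier count for each measurement, Adelie as a whole
outlier_counts <- adelie %>%
  summarise(across(
    c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
    count_tukey_outliers
  ))

outlier_counts
```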

Only flipper_length_mm has occurrences outside of its range. I’ll remove those and compare the original correlations to the correlations we get after removing the flipper outliers.

## # A tibble: 1 x 3
##   depth_length depth_mass depth_flipper
##          <dbl>      <dbl>         <dbl>
## 1        0.386      0.580         0.311
## # A tibble: 1 x 3
##   depth_length depth_mass depth_flipper
##          <dbl>      <dbl>         <dbl>
## 1        0.396      0.587         0.342

The correlations did go up, but they’re still too low to indicate a strong relationship. It makes me wonder whether the correlations would be stronger if I calculated them per sex as well.

## # A tibble: 6 x 5
## # Groups:   species [3]
##   species   sex    depth_length depth_flipper depth_mass
##   <fct>     <fct>         <dbl>         <dbl>      <dbl>
## 1 Adelie    female       0.157          0.113      0.414
## 2 Adelie    male        -0.0145         0.239      0.159
## 3 Chinstrap female       0.256          0.135      0.391
## 4 Chinstrap male         0.446          0.421      0.345
## 5 Gentoo    female       0.430          0.308      0.372
## 6 Gentoo    male         0.307          0.471      0.253
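The per-sex table above follows the same pattern as the per-species correlations, with sex added to the grouping. As before, this is a sketch assuming Pearson correlations on the NA-free palmerpenguins data; species_sex_cors is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

# Correlation of bill depth with each other measurement, per species and sex
species_sex_cors <- penguins %>%
  drop_na() %>%
  group_by(species, sex) %>%
  summarise(
    depth_length  = cor(bill_depth_mm, bill_length_mm),
    depth_flipper = cor(bill_depth_mm, flipper_length_mm),
    depth_mass    = cor(bill_depth_mm, body_mass_g),
    .groups = "drop_last"
  )

species_sex_cors
```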

They are very much not. Based on this, I believe I can conclude that bill_depth_mm is not strongly correlated with any of the other variables for the Adelie species.

Python

I started using Python in a cryptography class taught by Professor Nathan Pflueger at Amherst in the spring of 2020. The code I wrote for the course was usually short programs in Jupyter notebooks meant to demonstrate encryption languages and key signatures.

Currently, I am reading Zed A. Shaw’s “Learn Python 3 the Hard Way”, following the exercises in the book and using Atom and the command line to run files. I recently wrote a short choose-your-path program, meant to run on the command line, and shared the code on GitHub.

My goal is to also get a certification from w3schools.com in Python.

I’m also practicing manipulating and plotting data with Python’s matplotlib and pandas, using the palmer penguins data set described in the R section. I do this in a Jupyter notebook.

I’m working on showing a preview of that Jupyter notebook here.

SQL

I am currently reading Practical SQL: A Beginner’s Guide to Storytelling with Data written by Anthony DeBarros and working through its guided lessons on pgAdmin.

In addition, I have been using w3schools.com to teach myself and am an active participant on HackerRank.

Excel

Tableau

I’ve only recently started using Tableau, particularly thanks to Stanford Computer Science Professor Widom’s free online courses for data science. Utilizing a data set for world soccer players, I created this Tableau dashboard to try my hand at using the program.

Other Skills

  • Languages: Bilingual in Spanish and English.
  • Can type up to 70 WPM
  • Diligent and detail-oriented, with strong time management, and a team player
  • Strong interpersonal and communication skills
  • Practiced foil fencing for 4 years and was Foil Captain and President of the Amherst College Fencing Club
  • Enjoy hula hooping for hours non-stop while watching movies and TV shows