Research Question

The goal of this project is to answer the research question: Which are the most valued data science skills?

To answer that question, we used survey data from the Kaggle ML and Data Science Survey, 2017.

While the answer to the question is by definition subjective, the Kaggle Survey was “an industry-wide survey to establish a comprehensive view of the state of data science and machine learning,” and with over 16,000 responses it provides a good starting point for exploring the views of professionals in the field and what they value.

Importing data

The survey results were stored in two files:

multiple choice items
free-response items

We chose to focus on the multiple-choice data only for statistical analysis. Kaggle provides each file in CSV format. We downloaded the multiple-choice survey results as a CSV file and placed it in our GitHub repo.
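As a rough illustration of the import step, the sketch below reads survey rows with Python's standard csv module. The helper name load_survey and the two sample rows are made up for illustration; only the column names come from the survey. For the real file you would pass an opened CSV file instead of the in-memory sample.

```python
import csv
import io


def load_survey(source):
    """Read survey rows from an open text source into a list of dicts."""
    return list(csv.DictReader(source))


# Tiny in-memory stand-in for the real CSV file (rows are invented).
sample = io.StringIO(
    "GenderSelect,Age,Country\n"
    "Male,25,United States\n"
    "Female,31,India\n"
)
rows = load_survey(sample)
print(len(rows), "rows;", len(rows[0]), "columns")
```

Each row becomes a dictionary keyed by column name, which is enough for the counting and averaging steps used later in the analysis.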

A Day in a Data Scientist’s Life

We start by exploring the resources that Kaggle survey respondents use for learning data science. What are the different data science activities they do, what are the different learning platforms they use, and how do they feel about the usefulness of those platforms?

Insights into the demographics: how respondents are distributed across different countries, along with some interesting facts about country-wise gender distribution.

A Day in a Data Scientist’s Life

Variables and their definition

To begin with, we focused on respondents’ demographics to understand their age groups and gender.

Analyzing the variables GenderSelect and Age shows that, of 16,716 global Kaggle respondents, 13,610 are male and 2,778 are female. Male respondents therefore outnumber female respondents almost five to one (~4.9 times). Also, from the plot below it is evident that respondents’ age peaks at 25 for both males and females, whereas the median age is about 30.
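The ratio quoted above can be checked directly from the reported counts. This is only a worked calculation on the three numbers given in the text; the variable names are ours.

```python
# Counts reported in the text.
total, males, females = 16716, 13610, 2778

ratio = males / females            # male respondents per female respondent
other = total - males - females    # respondents in neither of the two groups

print(f"male-to-female ratio: {ratio:.2f}")
print(f"male share: {males / total:.1%}, female share: {females / total:.1%}")
print(f"respondents not counted as male or female: {other}")
```

The ratio comes out at about 4.9, and a few hundred respondents fall outside the two groups compared here.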

Since we are trying to determine what the most important data science skills are, it is important to understand what a data scientist does. What are the different activities a data scientist performs on a daily basis, and how much time does each activity typically take?

A Day in a Data Scientist’s Life

Let’s take a peek at a day in the life of a Data Scientist and try to figure out what a data scientist does.

The day typically starts with a question or business problem and involves the following activities/tasks:

GatheringData
FindingInsights
ModelBuilding
Visualizing
Production

A Day in a Data Scientist’s Life

Manipulating data

Kaggle captured respondents’ data about the time they spend on different activities. To analyze this question we looked at the attributes TimeGatheringData, TimeModelBuilding, TimeProduction, TimeVisualizing, and TimeFindingInsights.

To determine the usefulness of learning platforms, we tidied the data for the 18 learning platform attributes present in the data set and performed the analysis on the data in long format. We also manipulated the data to extract users’ sentiments/remarks on platform usefulness.
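The wide-to-long tidying step might look roughly like the sketch below. It assumes each respondent is a dictionary keyed by column name and that the platform columns share the prefix LearningPlatformUsefulness, as in the survey; the function name to_long, the Usefulness key, and the sample rows are ours.

```python
PREFIX = "LearningPlatformUsefulness"


def to_long(rows):
    """Turn one row per respondent into one row per (respondent, platform)."""
    long_rows = []
    for lid, row in enumerate(rows, start=1):
        for column, answer in row.items():
            # Keep only platform columns that the respondent actually answered.
            if column.startswith(PREFIX) and answer:
                long_rows.append({
                    "lid": lid,
                    "LearningPlatform": column[len(PREFIX):],
                    "Usefulness": answer,
                })
    return long_rows


# Invented sample: two respondents, two platform columns.
wide = [
    {"Country": "United States",
     "LearningPlatformUsefulnessKaggle": "Somewhat useful",
     "LearningPlatformUsefulnessBlogs": ""},
    {"Country": "United States",
     "LearningPlatformUsefulnessKaggle": "",
     "LearningPlatformUsefulnessBlogs": "Very useful"},
]
long_rows = to_long(wide)
for r in long_rows:
    print(r)
```

Dropping blank answers during the reshape keeps the long table limited to platforms a respondent actually rated.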

A Day in a Data Scientist’s Life

Exploratory Data Analysis (EDA)

After analyzing data for US respondents, it appears that data acquisition, or gathering data, is the main activity, at 37.75%. This is where a data scientist spends most of their time. Model building ranks 2nd, at 19.23%, followed by time spent finding insights (14.51%) and visualizing data (13.75%). Only 10.23% of their total time appears to be taken by production activities.

DSActivity	mean_precent
TimeGatheringData	37.75491
TimeModelBuilding	19.23263
TimeFindingInsights	14.50524
TimeVisualizing	13.74509
TimeProduction	10.23198
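A table like the one above can be produced by averaging each time column over the respondents who answered it. The sketch below assumes the values are percentages stored as text, as they would be when read from a CSV; the function name mean_percent and the sample numbers are invented.

```python
from statistics import mean

TIME_COLUMNS = [
    "TimeGatheringData", "TimeModelBuilding", "TimeProduction",
    "TimeVisualizing", "TimeFindingInsights",
]


def mean_percent(rows):
    """Average each time column, skipping blanks, sorted highest first."""
    result = {}
    for col in TIME_COLUMNS:
        values = [float(r[col]) for r in rows if r.get(col) not in (None, "")]
        if values:
            result[col] = mean(values)
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


# Invented sample: two respondents.
sample_rows = [
    {"TimeGatheringData": "50", "TimeModelBuilding": "20", "TimeProduction": "10",
     "TimeVisualizing": "10", "TimeFindingInsights": "10"},
    {"TimeGatheringData": "30", "TimeModelBuilding": "30", "TimeProduction": "10",
     "TimeVisualizing": "20", "TimeFindingInsights": ""},
]
activity_means = mean_percent(sample_rows)
for activity, value in activity_means.items():
    print(activity, value)
```

Because blanks are skipped per column, each mean is taken over a possibly different number of respondents, which is one reason such means need not sum to 100.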

Whether one is employed full-time, employed part-time, or a student, it’s worth exploring how people use different learning platforms and how they feel about them. We made use of the learning platform attributes captured in the dataset, which include Kaggle itself as a learning platform.

lid	Country	EmploymentStatus	LPlatform	LP_count	LearningPlatform
1	United States	Not employed, but looking for work	LearningPlatformUsefulnessKaggle	Somewhat useful	Kaggle
3	United States	Independent contractor, freelancer, or self-employed	LearningPlatformUsefulnessBlogs	Very useful	Blogs
3	United States	Independent contractor, freelancer, or self-employed	LearningPlatformUsefulnessCollege	Very useful	College
3	United States	Independent contractor, freelancer, or self-employed	LearningPlatformUsefulnessConferences	Very useful	Conferences
3	United States	Independent contractor, freelancer, or self-employed	LearningPlatformUsefulnessFriends	Very useful	Friends
3	United States	Independent contractor, freelancer, or self-employed	LearningPlatformUsefulnessDocumentation	Very useful	Documentation

After analyzing respondents’ views on the different learning platforms, it appears that learners benefited most from personal projects, as the majority of responses rate projects as very useful. Online courses rank 2nd, followed by StackOverflow and Kaggle. Blogs, textbooks, and college also appear to be very useful, whereas newsletters, podcasts, and trade books rank low.

What do Data Scientists Want to Learn?

Next, we examine what survey takers of various educational backgrounds are excited to learn. Because technology and, by extension, data science keep evolving, professionals need to stay relevant in their field and to be passionate in that pursuit. Understanding what working professionals want to learn could give us insight into which skills are most valued in the field.

Does a survey taker’s formal education have any relationship to the Machine Learning/Data Science method they are most excited about learning in the next year?

What do Data Scientists Want to Learn?

Variables and their definition

To do the analysis, we concentrate on two columns in the dataset:

FormalEducation: Which level of formal education have you attained?
MLMethodNextYearSelect: Which Machine Learning/Data Science method are you most excited about learning in the next year?

These questions were asked to all participants.

What do Data Scientists Want to Learn?

Exploratory Data Analysis (EDA)

First we plot the distribution of formal education in the dataset.

The data set predominantly contains respondents with Master’s degrees, followed by those with Bachelor’s and then doctoral degrees.

Now let’s look at the different Machine Learning/Data Science methods in the dataset.

Machine Learning/Data Science
Random Forests
Deep learning
Neural Nets
Text Mining
Genetic & Evolutionary Algorithms
Link Analysis
Rule Induction
Regression
Proprietary Algorithms
I don’t plan on learning a new ML/DS method
Ensemble Methods (e.g. boosting, bagging)
Factor Analysis
Social Network Analysis
Monte Carlo Methods
Time Series Analysis
Other
Bayesian Methods
Survival Analysis
MARS
Anomaly Detection
Cluster Analysis
Decision Trees
Association Rules
Uplift Modeling
Support Vector Machines (SVM)

Now we can plot the distribution of Machine Learning/Data Science methods with formal education.

What do Data Scientists Want to Learn?

Our results revealed that Deep Learning is the top Machine Learning/Data Science method among the Kaggle survey takers regardless of their formal education. Interestingly, 40% of respondents with a bachelor’s degree and 40% of respondents with a master’s degree stated that Deep Learning is the technique they are most excited about learning in the next year. Similarly, 39% of respondents with a high school degree named Deep Learning as the Machine Learning/Data Science method they most want to learn.
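Percentages like these are shares within each education group. A minimal sketch of that calculation, assuming one dictionary per respondent with the two columns named above; the function name and sample rows are invented.

```python
from collections import Counter, defaultdict


def method_share_by_education(rows):
    """Percentage of each education group choosing each method."""
    counts = defaultdict(Counter)
    for r in rows:
        edu = r.get("FormalEducation")
        method = r.get("MLMethodNextYearSelect")
        if edu and method:  # skip respondents missing either answer
            counts[edu][method] += 1
    shares = {}
    for edu, counter in counts.items():
        total = sum(counter.values())
        shares[edu] = {m: 100 * n / total for m, n in counter.most_common()}
    return shares


# Invented sample rows.
sample_rows = [
    {"FormalEducation": "Master's degree", "MLMethodNextYearSelect": "Deep learning"},
    {"FormalEducation": "Master's degree", "MLMethodNextYearSelect": "Deep learning"},
    {"FormalEducation": "Master's degree", "MLMethodNextYearSelect": "Neural Nets"},
    {"FormalEducation": "Bachelor's degree", "MLMethodNextYearSelect": "Deep learning"},
    {"FormalEducation": "Bachelor's degree", "MLMethodNextYearSelect": ""},
]
shares = method_share_by_education(sample_rows)
for edu, dist in shares.items():
    print(edu, dist)
```

Normalizing within each group is what makes a 40% share among bachelor's holders comparable to a 40% share among master's holders, even though the groups differ in size.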

Following Deep Learning, Neural Nets emerged as the second most popular Machine Learning/Data Science method that Data Scientists want to learn next year. Intriguingly, college dropouts have the highest percentage choosing Neural Nets.

Time Series Analysis was the third most popular Machine Learning/Data Science method for most education groups. High school graduates are an exception: their third choice is Genetic & Evolutionary Algorithms.

Among doctoral survey takers, Bayesian Methods is the third preference. This method appears among the top choices only for PhDs.

The results suggest a clear trend: Deep Learning is the Machine Learning/Data Science method that data scientists most want to learn. As to the overall research question of which data science skills are valued the most, this insight suggests that aspiring data scientists should consider learning Deep Learning.

Data Science Methods

What are the most frequently used data science (DS) methods by those writing code in DS professions? Do those relate to formal educational attainment?

The Kaggle dataset provides several variables for assessing what the most valuable data science skills may be. In the previous section, we examined which data science methods learners are most excited about. In this section, we look at which data science methods are used most frequently and whether that has any relationship to educational attainment, a potential indicator of whether certain methods require advanced academic training.

Data Science Methods

Variables and their definition

The following variables correspond to questions asking survey respondents how often they use each of these data science methods. Response options were: Rarely, Sometimes, Often, Most of the time.

WorkMethodsFrequencyA/B
WorkMethodsFrequencyAssociationRules
WorkMethodsFrequencyBayesian
WorkMethodsFrequencyCNNs
WorkMethodsFrequencyCollaborativeFiltering
WorkMethodsFrequencyCross-Validation
WorkMethodsFrequencyDataVisualization
WorkMethodsFrequencyDecisionTrees
WorkMethodsFrequencyEnsembleMethods
WorkMethodsFrequencyEvolutionaryApproaches
WorkMethodsFrequencyGANs
WorkMethodsFrequencyGBM
WorkMethodsFrequencyHMMs
WorkMethodsFrequencyKNN
WorkMethodsFrequencyLiftAnalysis
WorkMethodsFrequencyLogisticRegression
WorkMethodsFrequencyMLN
WorkMethodsFrequencyNaiveBayes
WorkMethodsFrequencyNLP
WorkMethodsFrequencyNeuralNetworks
WorkMethodsFrequencyPCA
WorkMethodsFrequencyPrescriptiveModeling
WorkMethodsFrequencyRandomForests
WorkMethodsFrequencyRecommenderSystems
WorkMethodsFrequencyRNNs
WorkMethodsFrequencySegmentation
WorkMethodsFrequencySimulation
WorkMethodsFrequencySVMs
WorkMethodsFrequencyTextAnalysis
WorkMethodsFrequencyTimeSeriesAnalysis

One additional variable is used in this analysis:

FormalEducation

Data Science Methods

Manipulating data

To answer the question of which methods are most popular among code writers, several transformations are needed. First, we filter the dataset down to only those classified as code writers: those employed in some capacity working in data science and writing code as part of their job duties. Additionally, we include only participants who endorsed at least one data science method on the question “At work, which data science methods do you use? (Select all that apply)”, stored in the variable WorkMethodsSelect.

Once filtered, the endorsed data science methods were aggregated and their frequencies plotted (see Exploratory Data Analysis below). The five most frequently endorsed data science methods were then selected and each given a frequency score representing how often those who endorse the method actually use it.

The final transformation performed on the data was grouping by formal education level and then identifying the most frequently endorsed data science methods for each group. This can help identify whether those writing certain types of code and using certain data analyses potentially benefit from pursuing advanced education, a valuable insight for prospective data science students.
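The aggregation step can be sketched as splitting each respondent's multi-select answer and counting endorsements. This assumes the selected methods are stored as one comma-separated string in WorkMethodsSelect and that the code-writer filter has already been applied upstream; the function name and sample rows are invented.

```python
from collections import Counter


def count_methods(rows, column="WorkMethodsSelect", sep=","):
    """Count how many respondents endorsed each method."""
    counts = Counter()
    respondents = 0
    for r in rows:
        raw = (r.get(column) or "").strip()
        if not raw:
            continue  # respondent endorsed nothing: excluded from the analysis
        respondents += 1
        # A set guards against a method being counted twice for one person.
        counts.update({part.strip() for part in raw.split(sep) if part.strip()})
    return counts, respondents


# Invented sample rows.
sample_rows = [
    {"WorkMethodsSelect": "Data Visualization,Logistic Regression"},
    {"WorkMethodsSelect": "Data Visualization, Decision Trees"},
    {"WorkMethodsSelect": ""},
]
counts, respondents = count_methods(sample_rows)
for method, freq in counts.most_common():
    print(method, freq, f"{freq / respondents:.0%}")
```

Dividing each count by the number of respondents who endorsed at least one method gives the share of respondents using that method.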

Data Science Methods

Exploratory Data Analysis (EDA)

Following manipulation of the Kaggle data set, we created plots to address the research questions above. First, here is a look at how often each data science method was endorsed by a total of 7,773 respondents. Nearly two-thirds of these respondents endorsed the first-place skill, data visualization. Over half endorsed logistic regression, and just shy of half endorsed cross-validation and decision trees.

Options	Freq
Data Visualization	5022
Logistic Regression	4291
Cross-Validation	3868
Decision Trees	3695
Random Forests	3454
Time Series Analysis	3153
Neural Networks	2811
PCA and Dimensionality Reduction	2789
kNN and Other Clustering	2624
Text Analytics	2405
Ensemble Methods	2056
Segmentation	2050
SVMs	1973
Natural Language Processing	1949
A/B Testing	1936
Bayesian Techniques	1913
Naive Bayes	1902
Gradient Boosted Machines	1557
CNNs	1417
Simulation	1398
Recommender Systems	1158
Association Rules	1146
RNNs	891
Prescriptive Modeling	851
Collaborative Filtering	793
Lift Analysis	650
Evolutionary Approaches	436
HMMs	419
Other	391
Markov Logic Networks	255
GANs	244

Data Science Methods

The following plot graphically displays the frequency of endorsements for the data science methods asked about.

Data Science Methods

In this plot we show the “Frequency Score” for the top five most endorsed data science methods. It’s important to break this down further than endorsement, as the above table and plot only consider whether a respondent uses a data science method at all. Just because a method is endorsed doesn’t mean that individuals use it frequently; it may be a rare but essential method in data science. To get a more fine-grained understanding of how commonly one uses a given data science method on the job, the Kaggle survey followed up each endorsed method by asking respondents whether they use it Rarely, Sometimes, Often, or Most of the time. We converted these to numeric values (Rarely = 1, Sometimes = 2, Often = 3, and Most of the time = 4) in order to average the categorical responses and graph the resulting score.
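The scoring described above can be sketched as a lookup followed by an average. The mapping comes from the text; the function name and the sample answers are invented, and blank answers are assumed to mean the respondent did not endorse the method.

```python
from statistics import mean

# Numeric coding stated in the text.
SCORES = {"Rarely": 1, "Sometimes": 2, "Often": 3, "Most of the time": 4}


def frequency_score(answers):
    """Average numeric score over respondents who gave a valid answer."""
    scored = [SCORES[a] for a in answers if a in SCORES]
    return mean(scored) if scored else None


# Invented answers for one method, e.g. WorkMethodsFrequencyDataVisualization.
answers = ["Often", "Most of the time", "", "Sometimes", "Often"]
score = frequency_score(answers)
print(score)
```

Averaging treats the four response options as equally spaced, which is a simplification: the gap between Rarely and Sometimes need not match the gap between Often and Most of the time.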

Of the top five data science methods endorsed, data visualization was the skill indicated to be used the most frequently.

Data Science Methods

The below plots show the frequency of methods endorsed for each formal education level assessed by Kaggle.

We see that in the majority of educational attainment brackets, data visualization remains the most frequently endorsed data science method.

Data Science Methods

The same information is also provided in tabular format:

Selections	Freq	RelativeFreq	Degree
Data Visualization	1236	0.0944232	Bachelor’s Education
Logistic Regression	989	0.0755539	Bachelor’s Education
Decision Trees	847	0.0647059	Bachelor’s Education
Data Visualization	1129	0.0756348	Doctoral Education
Cross-Validation	1046	0.0700744	Doctoral Education
Logistic Regression	1031	0.0690695	Doctoral Education
Neural Networks	24	0.0808081	High School Education
Data Visualization	23	0.0774411	High School Education
Text Analytics	18	0.0606061	High School Education
Data Visualization	2331	0.0835873	Master’s Education
Logistic Regression	2022	0.0725069	Master’s Education
Cross-Validation	1821	0.0652992	Master’s Education
Data Visualization	150	0.0897129	Professional Education
Logistic Regression	121	0.0723684	Professional Education
Decision Trees	119	0.0711722	Professional Education
Data Visualization	137	0.1008837	Some Post Secondary Education
Logistic Regression	97	0.0714286	Some Post Secondary Education
Decision Trees	87	0.0640648	Some Post Secondary Education
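A table of this shape can be built by counting endorsements within each education group and keeping the top few. In this sketch the relative frequency is a method's count divided by all endorsements in its group; that reading of RelativeFreq is our assumption, as are the function name, the comma-separated storage of WorkMethodsSelect, and the sample rows.

```python
from collections import Counter, defaultdict


def top_methods_by_degree(rows, n=3):
    """Return (method, freq, relative_freq, degree) for the top n per degree."""
    by_degree = defaultdict(Counter)
    for r in rows:
        degree = r.get("FormalEducation")
        raw = r.get("WorkMethodsSelect") or ""
        methods = [m.strip() for m in raw.split(",") if m.strip()]
        if degree and methods:
            by_degree[degree].update(methods)
    table = []
    for degree, counts in sorted(by_degree.items()):
        total = sum(counts.values())  # all endorsements within this group
        for method, freq in counts.most_common(n):
            table.append((method, freq, freq / total, degree))
    return table


# Invented sample rows.
sample_rows = [
    {"FormalEducation": "Master's degree",
     "WorkMethodsSelect": "Data Visualization,Logistic Regression"},
    {"FormalEducation": "Master's degree",
     "WorkMethodsSelect": "Data Visualization,Cross-Validation"},
    {"FormalEducation": "Bachelor's degree",
     "WorkMethodsSelect": "Decision Trees"},
]
table = top_methods_by_degree(sample_rows, n=1)
for row in table:
    print(row)
```

With this denominator the relative frequencies are comparable across groups of very different sizes, such as the small high school group and the large master's group.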

Data Science Methods

The research question of which data science skills are the most important can be interpreted and answered in many ways. One way to explore this deceptively complex question is to analyze which data science methods code writers report using on the job. This analysis did just that, and further explored the top five most endorsed data science methods by looking at how frequently those who endorsed them actually use them on the job.

The bottom line of this analysis is that data visualization, logistic regression, cross-validation, decision trees, and random forests are not only frequently endorsed methods; they are also used often, rather than being essential methods that are used only in small ways. Across data science code writers these methods are popular, and individual code writers use them frequently.

The second goal of this analysis was to understand how formal educational attainment relates to data science methods used on the job. When looking at the plots of each educational level and the table coalescing all of that data, it does not seem like data science methods used by code writers differ given the educational level. Data visualization remains the most frequently endorsed data science method for the majority of educational groups. This has important implications for students of data science in understanding that certain popular job functions are not only performed by those with advanced degrees. This speaks to how crucial skills like data visualization and the other frequently endorsed and commonly used methods are to data science as a whole.

‘Learners’ vs. Employed Data Scientists

Is there a difference between what ‘Learners’ think are the important skills to learn…

…and what employed Data Scientists say are the skills and tools they use?

‘Learners’ vs. Employed Data Scientists

Variables and Manipulating Data - Likert Scales

105 variables in 5 Likert scale categories

Learning Platform Usefulness - learners and data scientists
- Not Useful, Somewhat useful, Very useful
Job Skill Importance - learners
- Unnecessary, Nice to have, Necessary
Work Tools Frequency - data scientists
- Rarely, Sometimes, Often, Most of the time
Work Methods Frequency - data scientists
- Rarely, Sometimes, Often, Most of the time

Plus a few basic demographic fields

‘Learners’ vs. Employed Data Scientists

Employed Data Scientists - Demographics

30% go by “Data Scientist”
20% go by “Scientist/Researcher” or “Software Developer/Software Engineer”

CurrentJobTitleSelect	total	percent
Data Scientist	644	30.51
Scientist/Researcher	225	10.66
Software Developer/Software Engineer	212	10.04
Data Analyst	185	8.76
Other	177	8.38
Researcher	138	6.54
Machine Learning Engineer	102	4.83
Engineer	73	3.46
Statistician	71	3.36
NA	71	3.36
Business Analyst	59	2.79
Computer Scientist	47	2.23
Predictive Modeler	41	1.94
Programmer	21	0.99
DBA/Database Engineer	19	0.90
Operations Research Practitioner	16	0.76
Data Miner	10	0.47

‘Learners’ vs. Employed Data Scientists

‘Learners’ - Demographics

About 73% are college students

StudentStatus	total	percent
Yes	113	73.38
No	41	26.62

About 56% are “focused on learning mostly data science skills” regardless of academic status.

StudentStatus	LearningDataScience	total	percent
Yes	Yes, I’m focused on learning mostly data science skills	63	55.75
Yes	Yes, but data science is a small part of what I’m focused on learning	50	44.25
No	Yes, I’m focused on learning mostly data science skills	23	56.10
No	Yes, but data science is a small part of what I’m focused on learning	18	43.90

‘Learners’ vs. Employed Data Scientists

Learning Platform Usefulness - ‘Learners’

‘Learners’ - Top 3 ways to learn data science:

Courses
Projects
College

‘Learners’ vs. Employed Data Scientists

Learning Platform Usefulness - Employed Data Scientists

Projects and Courses belong in the top 3, but College is in 5th place

Much greater importance on Friends…

46.5% say Friends are “Very useful”
- 97.5% say Friends are useful

‘Learners’ vs. Employed Data Scientists

Job Skills Importance to ‘Learners’

63.6% say Python is “Necessary”
39% say Data Visualization is “Necessary”
35% say R is “Necessary”

‘Learners’ vs. Employed Data Scientists

Work Tools Frequency

75.4% use Python either “Often” or “Most of the time”
63.6% use R either “Often” or “Most of the time”

‘Learners’ vs. Employed Data Scientists

Work Methods Frequency

Data Visualization is at the top of the list

0% use it “Rarely”
8.5% use it “Sometimes”

Only 38.8% of ‘Learners’ said Visualizations were a “Necessary” job skill to learn!

R vs. Python

The most frequently used programming languages are R and Python. But do those who use R recommend R or Python? And do those who use Python recommend R or Python? In other words, do survey takers feel that others should first and foremost study the language they themselves have taken up, or do they, with the benefit of their experience, suggest the language they did not learn?

R vs. Python

Thus the following questions were explored:

What is the distribution of Kaggle survey takers across the following groups, based on the programming languages they used in the past year:

R Only
Python only
Both Python and R
Neither Python nor R

What is the distribution of programming language recommendations within each of the following groups, based on the programming languages Kaggle survey takers used in the past year:

Using R Only
Using Python only
Using Both Python and R
Using Neither Python nor R

R vs. Python

Variables and their definition

There are two variables used in this section of the analysis:

LanguageRecommendationSelect: What programming language would you recommend a new data scientist learn first? (Select one option) - Selected Choice
WorkToolsSelect: For work, which data science/analytics tools, technologies, and languages have you used in the past year? (Select all that apply) - Selected Choice

R vs. Python

Manipulating data

The major task in this part of the analysis was to create a tidy data structure. This can be accomplished using select function calls with the variables required for the analysis. Because respondents could choose every option that applied to them, the languages for each respondent were captured as a single string rather than as one column per language.
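Turning that multi-select string into one of the four groups might look like the following sketch. It assumes WorkToolsSelect holds a comma-separated list of tool names in which the languages appear exactly as "R" and "Python"; the function name language_group is ours.

```python
def language_group(work_tools):
    """Classify a WorkToolsSelect string into one of four usage groups."""
    # Exact matching on whole entries avoids counting a tool whose name
    # merely starts with the letter R as a use of R.
    tools = {t.strip() for t in (work_tools or "").split(",")}
    uses_r = "R" in tools
    uses_python = "Python" in tools
    if uses_r and uses_python:
        return "Both Python and R"
    if uses_python:
        return "Python only"
    if uses_r:
        return "R Only"
    return "Neither Python nor R"


for answer in ["R,SQL", "Python,TensorFlow,R", "SAS,RapidMiner", "Python"]:
    print(answer, "->", language_group(answer))
```

Splitting into whole entries before testing membership is the important design choice here, since a plain substring search for "R" would match almost every answer.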

R vs. Python

Exploratory Data Analysis (EDA)

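The full dataset has 16,716 rows and 229 columns. The subset used in this section has 7,955 rows and 229 columns.

Let’s examine the graph of LanguageRecommendationSelect.

The working data for this comparison has 7,955 rows and 5 columns.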

R vs. Python

Results of Exploring R vs. Python:

We found that a little under half of the survey takers (N=3540, 44.5%) reported using both R and Python. The take-home message for aspiring data scientists is that the largest single group of Kaggle survey takers uses both languages, so both are widely used. Among the remaining respondents, a small portion (N=714, 8.98%) use neither Python nor R. The rest use either R or Python: 2533 (31.84%) indicated using only Python, while only 1168 (14.68%) reported using only R.
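The group shares can be recomputed from the four counts reported above. This is only a check on the arithmetic; the dictionary keys reuse the group labels from the text.

```python
# Counts reported in the text.
counts = {
    "Both Python and R": 3540,
    "Python only": 2533,
    "R Only": 1168,
    "Neither Python nor R": 714,
}
total = sum(counts.values())
shares = {group: round(100 * n / total, 2) for group, n in counts.items()}
python_to_r = counts["Python only"] / counts["R Only"]

print("total respondents:", total)
for group, pct in shares.items():
    print(f"{group}: {pct}%")
print(f"Python-only users per R-only user: {python_to_r:.2f}")
```

The four groups add up to the 7,955 respondents in this part of the analysis, and there are a little over two Python-only users for every R-only user.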

The story of this contentious debate on R vs Python gets more interesting when comparing the languages respondents use with the languages they recommend. Specifically, it is plausible to assume that Python users will recommend Python while R users will recommend R. We explore this hypothesis by comparing how R users split their recommendations between R and Python with how Python users split theirs.

Our results revealed that 72.17% of the Python users recommended Python while 53.77% of R users recommended R. This result is not surprising: there are more Python-only users than R-only users in this sample, so it makes sense to see differences in their recommendations, since a different proportion of each group knows only the one language. What is surprising, however, is the degree of difference in their recommendations for the other language: 15.92% of the R users recommend Python, whereas only 1.42% of the Python users recommend R.

However, these results should be interpreted carefully because some survey takers did not make any recommendation. For instance, 18.87% of Python users did not respond to this question. Similarly, 17.55% of R users did not give an opinion on which language to recommend. This is a sizable portion of the sample, and if these users were to make recommendations, it’s possible that more Python users would be recommending R.
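One way to see how much non-response matters is to recompute each share among only those who answered. This is an illustrative calculation on the percentages quoted above, not a step from the original analysis; the function name is ours.

```python
def share_among_answered(share, no_answer):
    """Rescale a percentage of all users to a percentage of those who answered."""
    return 100 * share / (100 - no_answer)


# Percentages quoted in the text.
python_users = {"recommend Python": 72.17, "recommend R": 1.42, "no answer": 18.87}
r_users = {"recommend R": 53.77, "recommend Python": 15.92, "no answer": 17.55}

py_for_py = share_among_answered(python_users["recommend Python"], python_users["no answer"])
py_for_r = share_among_answered(python_users["recommend R"], python_users["no answer"])
r_for_r = share_among_answered(r_users["recommend R"], r_users["no answer"])
r_for_py = share_among_answered(r_users["recommend Python"], r_users["no answer"])

print(f"Python users who answered: {py_for_py:.1f}% Python, {py_for_r:.1f}% R")
print(f"R users who answered: {r_for_r:.1f}% R, {r_for_py:.1f}% Python")
```

Among those who answered, roughly 89% of Python users recommend Python and roughly 65% of R users recommend R, so the asymmetry between the two groups remains after setting non-response aside.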

Since nearly half the sample (44.5%) consists of respondents who use both R and Python, their recommendation is particularly valuable, as they have experience with both languages. Of this subset, 51.72% recommend Python while 25.65% recommend R. In other words, about a quarter of those who use both languages recommend R, roughly half as many as recommend Python.

Salary Comparison for Python vs. R

Finally, true to the word “value,” pay has to be considered. The compensation survey takers receive for their work in either R or Python needs to be quantified to discover which language earns a data scientist more overall.

Salary Comparison for Python vs. R

Contributing Variables

Three variables were used:

WorkToolsSelect: a “select all that apply” variable with a list of various data science tools, technologies, and languages
CompensationAmount: a numerical value to indicate their annual pay
CompensationCurrency: a character string that indicated currency of annual pay

There was also the variable “id” that was created for the purpose of this report, acting as a way to identify each individual survey taker, and the variable “work_tools” which was a derivative of WorkToolsSelect, breaking the lists down into their individual components.

Salary Comparison for Python vs. R

Exploration and Review of Compensations

	Minimum	1st Quartile	Median	Mean	3rd Quartile	Maximum	Standard Deviation
Python	$0.00	$53,000.00	$100,000.00	$112,826.14	$145,000.00	$2,000,000.00	$122,425.21
R	$0.00	$58,000.00	$87,000.00	$98,177.64	$130,000.00	$550,000.00	$67,487.91
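Summary rows like these can be computed with Python's statistics module. The sketch assumes the compensation values for one language have already been converted to a single currency (the survey records amounts together with CompensationCurrency); the function name and sample values are invented, and the quartile method is our choice.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Minimum, quartiles, mean, maximum and sample standard deviation."""
    # "inclusive" interpolates between the observed minimum and maximum.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "Minimum": min(values),
        "1st Quartile": q1,
        "Median": median(values),
        "Mean": mean(values),
        "3rd Quartile": q3,
        "Maximum": max(values),
        "Standard Deviation": stdev(values),
    }


# Invented annual pay values for one language group.
pay = [0, 50000, 100000, 150000, 200000]
summary = summarize(pay)
for name, value in summary.items():
    print(f"{name}: ${value:,.2f}")
```

When a group contains a few very large values, as the Python maximum in the table suggests, the median and quartiles are usually a steadier basis for comparison than the mean.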

Conclusion

On the Machine Learning/Data Science methods that Data Scientists are most excited about learning in the next year

Deep Learning is the top Machine Learning/Data Science method in all categories of formal education, followed by Neural Nets.
Most groups want to learn Time Series Analysis as their third Machine Learning/Data Science method. High school graduates want to learn Genetic & Evolutionary Algorithms as their third choice.
Among doctoral survey takers, Bayesian Methods is the third preference.

On Data Science Methods Used on the Job

Data Visualization is a remarkably popular data science method. It is the most endorsed by nearly all education attainment levels.
Cross validation, random forests, logistic regression, and decision trees are also heavily endorsed.
These are not rare-but-essential tasks: many of those writing code use data visualization, and they use it quite frequently.
The data suggest that data science methods do not differ much between formal educational attainment groups.

‘Learners’ vs. Employed Data Scientists

Both ‘Learners’ and employed Data Scientists place Courses and Projects in the top three ways to learn Data Science; College is in the top three for ‘Learners’ but in 5th place for employed Data Scientists
‘Learners’ place much less importance on Friends for learning than employed Data Scientists do
‘Learners’ place a higher importance on Python vs. R as compared with employed Data Scientists
Data Visualization is the top used skill for working Data Scientists even though learners put relatively little importance on it as a Job Skill

On Data Science Activities

Gathering Data is the main activity where data scientists spend most of their time followed by model building.
Personal projects and Online Courses appear to be very useful learning platforms.

On the contentious debate on R vs Python as the most valued Data Science Skills

Just under half of the sample (44.5%) uses both R and Python
R-only and Python-only users are in roughly a 1:2 ratio
More R users recommended Python than Python users recommended R
Users of both languages recommend Python more often than they recommend R
R users are more likely to have a higher base salary, but Python users have the greater potential for wage growth
