For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing. For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation. For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing. Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
I chose the data from Project 3, which is stored in a local MySQL database, and I will migrate all of it to MongoDB.
# Loading the required libraries
library(DBI)
library(RMySQL)
library(knitr)
library(mongolite)
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.3.2
## v tibble 2.1.1 v dplyr 0.8.0.1
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts -------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
# Connect to the local MySQL server that holds the Project 3 data
database <- dbDriver("MySQL")
connection <- dbConnect(database, user = "root", password = "password", host = "127.0.0.1", port = 3306, dbname = "data")
dbListTables(connection)
## [1] "final"
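The connection above writes the root password directly into the script, which makes the report awkward to share or rerun on another machine. The following is a minimal sketch of one alternative, assuming the credentials are exported as environment variables. The variable names (`MYSQL_USER`, `MYSQL_PASSWORD`, and so on) and the helpers `mysql_args()` and `connect_mysql()` are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: collect MySQL connection settings from environment
# variables, falling back to the host, port, and database used in this report.
mysql_args <- function() {
  list(
    user     = Sys.getenv("MYSQL_USER", unset = "root"),
    password = Sys.getenv("MYSQL_PASSWORD", unset = ""),
    host     = Sys.getenv("MYSQL_HOST", unset = "127.0.0.1"),
    port     = as.integer(Sys.getenv("MYSQL_PORT", unset = "3306")),
    dbname   = Sys.getenv("MYSQL_DBNAME", unset = "data")
  )
}

# Hypothetical helper: open the connection only when called, so this file
# can be sourced on a machine without a running MySQL server.
connect_mysql <- function(args = mysql_args()) {
  do.call(DBI::dbConnect, c(list(RMySQL::MySQL()), args))
}

# Show the non-secret settings that would be used.
str(mysql_args()[c("host", "port", "dbname")])
```

With a server available, `connection <- connect_mysql()` would stand in for the `dbConnect()` call, and the rest of the report would run unchanged.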
# Store the query as a string
datascience_skills <- "SELECT * FROM final"
# Fetch the whole table from MySQL into a data frame
datascience_table <- dbGetQuery(connection, datascience_skills)
head(datascience_table)
## jobTitle company
## 1 Data Scientist - Entry Level Numerdox
## 2 Data Scientist, Engineering Google
## 3 data scientist, Insights & Analytics - Seattle, WA Starbucks
## 4 Entry Level Data Scientist IBM
## 5 Entry Level - Associate Data Scientist IBM
## 6 Data Scientist IBM
## location
## 1 Sacramento, CA
## 2 San Francisco, CA
## 3 Seattle, WA
## 4 <NA>
## 5 <NA>
## 6 <NA>
## link
## 1 https://www.indeed.com/rc/clk?jk=21091c4562d65586&fccid=dd616958bd9ddc12&vjs=3
## 2 https://www.indeed.com/rc/clk?jk=ae416685deba77b5&fccid=a5b4499d9e91a5c6&vjs=3
## 3 https://www.indeed.com/rc/clk?jk=e7460ad46e4dac06&fccid=a88e611ddef97571&vjs=3
## 4 https://www.indeed.com/rc/clk?jk=267e7cf7de2959c7&fccid=de71a49b535e21cb&vjs=3
## 5 https://www.indeed.com/rc/clk?jk=1e686166d30fd118&fccid=de71a49b535e21cb&vjs=3
## 6 https://www.indeed.com/rc/clk?jk=dce651a13a33e641&fccid=de71a49b535e21cb&vjs=3
## jobDescription
## 1 As a Data Scientist you will be working on consulting side of our business. You will be responsible for analyzing large, complex datasets and identify meaningful patterns that lead to actionable recommendations. You will be performing thorough testing and validation of models, and support various aspects of the business with data analytics.Ability to do statistical modeling, build predictive models and leverage machine learning algorithms.This position will combine the typical Data Scientist math and analytical skills, with research, advanced business, communication, and presentation skills.Primary job location is in Sacramento, but work-from-home option is available.QualificationsBachelors, MS or PhD in a relevant field (Computer Science, Engineering, Statistics, Physics, Applied Math)Experience in R and/or Python is preferred
## 2 Note: By applying to this position your application is automatically submitted to the following locations: Mountain View, CA, USA; San Bruno, CA, USA; Seattle, WA, USA; San Francisco, CA, USAMinimum qualifications:Master's degree in a quantitative discipline (e.g., Statistics, Operations Research, Bioinformatics, Economics, Computational Biology, Computer Science, Mathematics, Physics, Electrical Engineering, Industrial Engineering) or equivalent practical experience.2 years of work experience in data analysis related field.Experience with statistical software (e.g., R, Python, MATLAB, pandas) and database languages (e.g., SQL)Preferred qualifications:PhD degree in a quantitative discipline.4 years of relevant work experience, including expertise with statistical data analysis such as linear models, multivariate analysis, stochastic models, sampling methods.Applied experience with machine learning on large datasets.Experience articulating and translating business questions and using statistical techniques to arrive at an answer using available data.Demonstrated leadership and self-direction. Willingness to both teach others and learn new techniques.Demonstrated skills in selecting the right statistical tools given a data analysis problem. Effective written and verbal communication skills.About the jobAs a Data Scientist, you will evaluate and improve Google's products. You will collaborate with a multi-disciplinary team of engineers and analysts on a wide range of problems. This position will bring scientific rigor and statistical methods to the challenges of product creation, development and improvement with an appreciation for the behaviors of the end user.Google is and always will be an engineering company. We hire people with a broad set of technical skills who are ready to take on some of technology's greatest challenges and make an impact on millions, if not billions, of users. 
At Google, data scientists not only revolutionize search, they routinely work on massive scalability and storage solutions, large-scale applications and entirely new platforms for developers around the world. From Google Ads to Chrome, Android to YouTube, Social to Local, Google engineers are changing the world one technological achievement after another.ResponsibilitiesWork with large, complex data sets. Solve difficult, non-routine analysis problems, applying advanced analytical methods as needed. Conduct analysis that includes data gathering and requirements specification, processing, analysis, ongoing deliverables, and presentations.Build and prototype analysis pipelines iteratively to provide insights at scale. Develop comprehensive knowledge of Google data structures and metrics, advocating for changes where needed for product development.Interact cross-functionally, making business recommendations (e.g., cost-benefit, forecasting, experiment analysis) with effective presentations of findings at multiple levels of stakeholders through visual displays of quantitative information.Research and develop analysis, forecasting, and optimization methods to improve the quality of Google's user facing products.At Google, we don’t just accept difference—we celebrate it, we support it, and we thrive on it for the benefit of our employees, our products and our community. Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing this form.
## 3 Now Brewing – data scientist! #tobeapartnerFrom the beginning, Starbucks set out to be a different kind of company. One that not onlycelebrated coffee and the rich tradition, but that also brought a feeling of connection. We are known for developing extraordinary leaders who share this passion and are guided by their service to others.Are you a data scientist who thinks like a business owner? Are you passionate about solving business problems by designing optimal experiments and deploying solutions at scale using 1st, 2nd, and 3rd party customer data in a next generation analytics infrastructure to help Starbucks optimize digital engagements with our 75M customers?If you answered “yes” to the questions above, please come and join our Data Science team. You will help optimize app-level product recommendations, digital menu board content, voice-order recommendations, drive-thru personalized menu board content, 1-1 and in-store marketing via personalized content on order labels and POS systems.As a data scientist, your responsibilities and job functions will include…Data Preparation:Under direction of more senior data scientists, execute extraction and synthesis of data from Azure: data lake storage (ADLS), blob storage, SQL DW, SQL Server; and legacy systems: Oracle and its companion data lake storageUnder direction of more senior data scientists, process 1st 2nd, and 3rd party customer data in next generation privacy compliant infrastructureBusiness Understanding & Provide Optimal Solutions:Has minimal understanding of Starbucks business, and business acument in general. 
Requires mentoring and guidanceMachine Learning and Data Product Dev and Deployment:Under direction of more senior data scientists, contribute to AI and Machine Learning models in batch, real-timeDevelop data pipelines and scalable Restful APIs to create and enable analytical applicationsStatistics and Model Development and Deployment:Leverage the latest cloud technologies, existing and immerging statistical and machine learning techniques to identify data patterns and trends to solve business problemsWith support from more senior data and decision scientists, build and deploy customer segments to facilitate optimal marketing targeting within channelsVia a "feature factory" approach, build large numbers of weak learners in a Customer 360 frameworkInsights Operationalization:Under the guidance of more senior data scientists, create clear and concise packaging and presentation of data products and insights to business stakeholders, leaders and the broader analytics communityData Science Evangelism:With support from more senior data scientists, establish and foster close collaboration between data and decision scientists, engineers, business and leadership teams to align on technical roadmaps for innovationEstablish brand and team as subject matter experts and trusted advisors for Analytics across departmentsProject & Work Management:Actively participate in an Agile team structure designed to create a bias for action (fail fast/often)Utilize Wiki and GitHub to share standards and codePresent code to team for review compared to Best PracticeActively participate in Agile-related meetings: stand-ups, sprint planning, retrospectives, showcasesParticipate in Microsoft Azure and other trainingsWe’d love to hear from people with:Education: Min BS/BA with concentration in quantitative discipline - Stats, Math, Comp Sci, Engineering, Econ or similar discipline1+ years’ professional data science experienceDemonstrated experience with one programming language such as Scala, 
Java, C++, C#Demonstrated experience with scripting languages such as Unix Shell (ksh, csh, bash, sh), PowerShell, ARMDemonstrated experience with cloud tech for data and analytics solutions (Azure, AWS)Demonstrated experience building and deploying AI / Machine Learning solutions, at scaleDemonstrated familiarity with Web application security, SSL OAuthDemonstrated self-sufficiency with R or Python (or equivalent)Retail, customer loyalty, or eCommerce experience preferredJoin us and be part of something bigger. Apply today!Starbucks and its brands are an equal opportunity employer of all qualified individuals, including minorities, women, veterans and individuals with disabilities. Starbucks will consider for employment qualified applicants with criminal histories in a manner consistent with all federal, state, and local ordinances.
## 4 IntroductionAt IBM we have an amazing opportunity to transform the world with cognitive technology. By using the vast amounts of information available today to identify new patterns and make new discoveries, we are helping cities become smarter, hospitals transform patient care, financial institutions minimize risk, and pharmaceuticals find cures for rare diseases.Data scientists work with enterprise leaders and key decision makers to solve business problems by preparing, analyzing, and understanding data to deliver insight, predict emerging trends, and provide recommendations to optimize results. Data scientists use a variety of data (structured, unstructured, IoT streaming), analytics, AI tools, and programming languages often using a cloud infrastructure to handle the volume and veracity of data streams.Armed with data, modeling expertise, and analytic results, the data scientist communicates conclusions and recommendations to stakeholders in an organization's leadership structure. Business acumen is an important skill for data scientists to effectively communicate their findings to business leaders, data scientists need strong consulting, communication, visualization, and storytelling skills.Your Role and ResponsibilitiesSTART DATES FOR THIS POSITION ARE IN 2020Data Scientists are in demand across IBM's growth areas. If hired, you will be matched to a team based on business demand, location and fit. 
Join the forward-thinking teams at IBM solving some of the world’s most complex problems – there is no better place to grow your career!What You’ll Do as an Entry-Level Data Scientist:You will implement and validate predictive models as well as create and maintain statistical models with a focus on big data.You will be exposed to and incorporate a variety of statistical and machine learning techniques such as logistic regression, experimental design, generalized linear models, mixed modeling, CHAID/decision trees, neural networks and ensemble models.You’ll communicate with internal and external clients to understand business needs and provide analytical solutions.You will use leading edge tools such as COGNOS, Watson Studio and Watson Machine Learning.You’ll work in an Agile, collaborative environment, partnering with other scientists, engineers, and database administrators of all backgrounds and disciplines to bring analytical rigor and statistical methods to the challenges of predicting behaviors.Who You Are:You are great at solving problems, debugging, troubleshooting, and designing & implementing solutions to complex technical issues.You thrive on teamwork and have excellent verbal and written communication skills.You have strong technical and analytical abilities, a knack for driving impact and growth, and some experience with programming/scripting in a language such as Java or Python.You have a basic understanding of statistical programming in a language such as R, SAS, or Python.You have an interest in, understanding of, or experience with Design Thinking and Agile Development MethodologiesRequired Professional and Technical ExpertiseBasic understanding of statistical programming in a language such as R, SAS, or Python.Experience with programming/scripting in a language such as Java or Python.Knowledge of statistical concepts such as regression, time series, mixed model, Bayesian, clustering, etc., to analyze data and provide insights.Preferred Professional 
and Technical ExpertiseAdvanced knowledge of statistical concepts such as regression, time series, mixed model, Bayesian methods, clustering, etc., to analyze data and provide insights.About Business UnitNo matter where you work in IBM, you are making an impact. As an Early Professional with IBM, you will be taking on a key role with one of our industry-leading business units to work on the technology that is solving our most challenging problems and changing the way the world thinks.Your Life @ IBMWhat matters to you when you’re looking for your next career challenge?Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.Impact. Inclusion. Infinite Experiences. Do your best work ever.About IBMIBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.Location StatementWe consider qualified applicants with criminal histories, consistent with applicable law.Being You @ IBMIBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. 
All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
## 5 IntroductionAs a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.Your Role and ResponsibilitiesImplement and validate predictive and prescriptive models, create and maintain statistical models with a focus on big data.Incorporate a variety of statistical and machine learning techniques in your projects.Write programs to cleanse and integrate data in an efficient and reusable manner.Use leading edge and open-source tools such as Python, R, and TensorFlow, combined with IBM tools and our AI application suites.Work in an Agile, collaborative environment, partnering with other scientists, engineers, consultants and database administrators of all backgrounds and disciplines to bring analytical rigor and statistical methods to the challenges of predicting behaviors.Communicate with internal and external clients to understand and define business needs and appropriate modelling techniques to provide analytical solutions.Evaluate modelling results and communicate the results to technical and non-technical audiences.Required Professional and Technical ExpertiseAbility to look at things differently, debug, troubleshoot, design and implement solutions to complex technical issues.Strong technical and analytical abilities, a knack for driving impact and growth, and experience with a programming/scripting in a language such as Java or Python.Basic understanding of statistical programming in a language such as R, Python, or SAS, SPSS, MATLAB.Basic understanding of Cloud (AWS, Azure, etc.)Excellent verbal and written communication skills.Work or internship experience using data 
science tools in a corporate environment.Interest in, understanding of, or experience with Design Thinking and Agile Development MethodologiesWillingness to travel up to 100% of the time.Majors:Computer Science, Data Science, Applied Mathematics, Statistics, Cognitive Science, Artificial Intelligence, Business Intelligence, Operations Research, EngineeringLocations:Cambridge, MAChicago, ILDallas, TXHouston, TXNew York, NYSan Francisco, CAWashington DCPreferred Professional and Technical ExpertiseNAAbout Business UnitIBM Services is a team of business, strategy and technology consultants that design, build, and run foundational systems and services that is the backbone of the world's economy. IBM Services partners with the world's leading companies in over 170 countries to build smarter businesses by reimagining and reinventing through technology, with its outcome-focused methodologies, industry-leading portfolio and world class research and operations expertise leading to results-driven innovation and enduring excellence.Your Life @ IBMWhat matters to you when you’re looking for your next career challenge?Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.Impact. Inclusion. Infinite Experiences. Do your best work ever.About IBMIBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. 
Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.Location StatementWe consider qualified applicants with criminal histories, consistent with applicable law.Being You @ IBMIBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
## 6 IntroductionClient Technical Specialists (CTP) are the technical experts and advisors to clients, IBM sales teams and/or IBM Business Partners. As a CTP you understand the client's business requirements, technical requirements and/or competitive landscape. You apply your business insights, build and maintain client relationships, incorporate hardware, software and services into client-valued solutions and ensure client readiness for the implementation of technical solutions. This is an opportunity to shape the future for both IBM and its clients. Start your journey now!Your Role and ResponsibilitiesYour Role & Responsibilities:As part of an entrepreneurial team, you will be the Subject Matter Expert on Data and AI Statistical models and how they apply to business problems. Led by a solution architect, you will advise on and help implement models in Machine Learning, Optimization, Neural Networks, and Artificial Intelligence such as Natural Language, Transfer Learning, Deep Learning and other quantitative approaches. You will also leverage deep skills and best practices to provide expertise and leadership to help design IBM Data Science and AI solutions that will help our clients drive technology benefits and business outcomes across industries. You will work with cutting edge technologies such as Watson, as well as Open Source approaches such as Python and Jupyter notebooks as well as with a passionate team of people who are driving the innovation and digital transformation to cross-industry enterprise clients with the adoption of IBM Data Science and AI. 
An ideal candidate will be familiar with Design Thinking, Statistics, building Supervised and Unsupervised machine learning models, and data cleansing techniques using various utilities and programming techniques.Key Responsibilities:Run and statistically evaluate statistical models such as Machine Learning, Optimization, Neural Networks and Artificial IntelligencePartner with Scrum Masters, Product Owners, Solution architects and peer data scientist and development data engineers to create solutions to meet business and technical opportunitiesUnderstand and communicate technical advantages and tradeoffs between Data ModelsExplore and develop new technical skills and industry practices while absorbing professional knowledge quickly and using demonstrated interpersonal skills to be an effective ambassador for IBM Data Science and AIUse exceptional communication skills and with input from product management, development, and architecture thought leaders, work to deliver high quality end-to-end Solutions at Scale in response to the identified business requirements from our clients; ensure the results are statistically validcldpakatRequired Technical and Professional ExpertiseTechnical degree in Computer Science or another field relevant to Data Science1+ years of experience working with Machine Learning, Optimization, Neural Networks and/or Artificial Intelligence1+ years of experience with a Data Science programming language such as Python or RDeep understanding of StatisticsFluent in EnglishAbility to Travel 75% and conduct Client Facing/Technical SolutionsPreferred Technical and Professional ExpertiseExperience with Jupyter NotebooksDeep understanding of Statistical Machine Learning Models with Python or RSupporting Relevant business domain knowledge such as Finance or Health CareAbout Business UnitDigitization is accelerating the ongoing evolution of business, and clouds - public, private, and hybrid - enable companies to extend their existing infrastructure and 
integrate across systems. IBM Cloud provides the security, control, and visibility that our clients have come to expect. We are working to provide the right tools and environment to combine all of our client’s data, no matter where it resides, to respond to changing market dynamics.Your Life @ IBMWhat matters to you when you’re looking for your next career challenge?Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.Impact. Inclusion. Infinite Experiences. Do your best work ever.About IBMIBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.Location StatementIBM will not be providing visa sponsorship for this position now or in the future. Therefore, in order to be considered for this position, you must have the ability to work without a need for current or future visa sponsorship.We consider qualified applicants with criminal histories, consistent with applicable law.Being You @ IBMIBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. 
All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
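# Open the "datascience_table" collection; with no other arguments, mongolite uses the "test" database on mongodb://localhost
data_mongo <- mongo("datascience_table")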
data_mongo
## <Mongo collection> 'datascience_table'
## $aggregate(pipeline = "{}", options = "{\"allowDiskUse\":true}", handler = NULL, pagesize = 1000, iterate = FALSE)
## $count(query = "{}")
## $disconnect(gc = TRUE)
## $distinct(key, query = "{}")
## $drop()
## $export(con = stdout(), bson = FALSE, query = "{}", fields = "{}", sort = "{\"_id\":1}")
## $find(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0, handler = NULL, pagesize = 1000)
## $import(con, bson = FALSE)
## $index(add = NULL, remove = NULL)
## $info()
## $insert(data, pagesize = 1000, stop_on_error = TRUE, ...)
## $iterate(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0)
## $mapreduce(map, reduce, query = "{}", sort = "{}", limit = 0, out = NULL, scope = NULL)
## $remove(query, just_one = FALSE)
## $rename(name, db = NULL)
## $replace(query, update = "{}", upsert = FALSE)
## $run(command = "{\"ping\": 1}", simplify = TRUE)
## $update(query, update = "{\"$set\":{}}", filters = NULL, upsert = FALSE, multiple = FALSE)
# Insert each row of the data frame as one MongoDB document
data_mongo$insert(datascience_table)
## List of 5
## $ nInserted : num 510
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
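Each time the insert above runs, it appends another 510 documents to the collection, so knitting the report twice would leave duplicates. The following is a minimal sketch of a rerunnable load, assuming it is acceptable to drop and rebuild the collection on every run. `reload_collection()` and `fake_collection()` are hypothetical helpers. The first uses only the `$count()`, `$drop()`, and `$insert()` methods listed in the collection printout, and the second is a stand-in so the sketch can run without a MongoDB server.

```r
# Hypothetical helper: empty the collection before loading, then confirm
# that the number of stored documents matches the number of source rows.
reload_collection <- function(coll, df) {
  if (coll$count() > 0) coll$drop()
  coll$insert(df)
  stopifnot(coll$count() == nrow(df))
  invisible(coll)
}

# Stand-in for a mongolite collection (demonstration only). It tracks a
# document count and nothing else. With a live connection, pass
# `data_mongo` to reload_collection() instead.
fake_collection <- function() {
  n <- 0
  list(
    count  = function() n,
    drop   = function() { n <<- 0 },
    insert = function(df) { n <<- n + nrow(df) }
  )
}

# Toy data frame with two of the columns used in this report.
jobs <- data.frame(jobTitle = c("Data Scientist", "Analyst"),
                   company  = c("IBM", "Google"))

coll <- fake_collection()
reload_collection(coll, jobs)
reload_collection(coll, jobs)  # the second run replaces rather than appends
coll$count()
```

In the report itself, the equivalent call would be `reload_collection(data_mongo, datascience_table)`. The count check inside the helper doubles as a basic test that every row reached MongoDB.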
# Read the documents back from MongoDB to confirm the migration
datascience_df <- data_mongo$find('{}')
datascience_df %>%
  head()
## jobTitle company
## 1 Data Scientist - Entry Level Numerdox
## 2 Data Scientist, Engineering Google
## 3 data scientist, Insights & Analytics - Seattle, WA Starbucks
## 4 Entry Level Data Scientist IBM
## 5 Entry Level - Associate Data Scientist IBM
## 6 Data Scientist IBM
## location
## 1 Sacramento, CA
## 2 San Francisco, CA
## 3 Seattle, WA
## 4 <NA>
## 5 <NA>
## 6 <NA>
## link
## 1 https://www.indeed.com/rc/clk?jk=21091c4562d65586&fccid=dd616958bd9ddc12&vjs=3
## 2 https://www.indeed.com/rc/clk?jk=ae416685deba77b5&fccid=a5b4499d9e91a5c6&vjs=3
## 3 https://www.indeed.com/rc/clk?jk=e7460ad46e4dac06&fccid=a88e611ddef97571&vjs=3
## 4 https://www.indeed.com/rc/clk?jk=267e7cf7de2959c7&fccid=de71a49b535e21cb&vjs=3
## 5 https://www.indeed.com/rc/clk?jk=1e686166d30fd118&fccid=de71a49b535e21cb&vjs=3
## 6 https://www.indeed.com/rc/clk?jk=dce651a13a33e641&fccid=de71a49b535e21cb&vjs=3
## jobDescription
## 1 As a Data Scientist you will be working on consulting side of our business. You will be responsible for analyzing large, complex datasets and identify meaningful patterns that lead to actionable recommendations. You will be performing thorough testing and validation of models, and support various aspects of the business with data analytics.Ability to do statistical modeling, build predictive models and leverage machine learning algorithms.This position will combine the typical Data Scientist math and analytical skills, with research, advanced business, communication, and presentation skills.Primary job location is in Sacramento, but work-from-home option is available.QualificationsBachelors, MS or PhD in a relevant field (Computer Science, Engineering, Statistics, Physics, Applied Math)Experience in R and/or Python is preferred
## 2 Note: By applying to this position your application is automatically submitted to the following locations: Mountain View, CA, USA; San Bruno, CA, USA; Seattle, WA, USA; San Francisco, CA, USAMinimum qualifications:Master's degree in a quantitative discipline (e.g., Statistics, Operations Research, Bioinformatics, Economics, Computational Biology, Computer Science, Mathematics, Physics, Electrical Engineering, Industrial Engineering) or equivalent practical experience.2 years of work experience in data analysis related field.Experience with statistical software (e.g., R, Python, MATLAB, pandas) and database languages (e.g., SQL)Preferred qualifications:PhD degree in a quantitative discipline.4 years of relevant work experience, including expertise with statistical data analysis such as linear models, multivariate analysis, stochastic models, sampling methods.Applied experience with machine learning on large datasets.Experience articulating and translating business questions and using statistical techniques to arrive at an answer using available data.Demonstrated leadership and self-direction. Willingness to both teach others and learn new techniques.Demonstrated skills in selecting the right statistical tools given a data analysis problem. Effective written and verbal communication skills.About the jobAs a Data Scientist, you will evaluate and improve Google's products. You will collaborate with a multi-disciplinary team of engineers and analysts on a wide range of problems. This position will bring scientific rigor and statistical methods to the challenges of product creation, development and improvement with an appreciation for the behaviors of the end user.Google is and always will be an engineering company. We hire people with a broad set of technical skills who are ready to take on some of technology's greatest challenges and make an impact on millions, if not billions, of users. 
At Google, data scientists not only revolutionize search, they routinely work on massive scalability and storage solutions, large-scale applications and entirely new platforms for developers around the world. From Google Ads to Chrome, Android to YouTube, Social to Local, Google engineers are changing the world one technological achievement after another.ResponsibilitiesWork with large, complex data sets. Solve difficult, non-routine analysis problems, applying advanced analytical methods as needed. Conduct analysis that includes data gathering and requirements specification, processing, analysis, ongoing deliverables, and presentations.Build and prototype analysis pipelines iteratively to provide insights at scale. Develop comprehensive knowledge of Google data structures and metrics, advocating for changes where needed for product development.Interact cross-functionally, making business recommendations (e.g., cost-benefit, forecasting, experiment analysis) with effective presentations of findings at multiple levels of stakeholders through visual displays of quantitative information.Research and develop analysis, forecasting, and optimization methods to improve the quality of Google's user facing products.At Google, we don’t just accept difference—we celebrate it, we support it, and we thrive on it for the benefit of our employees, our products and our community. Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing this form.
## 3 Now Brewing – data scientist! #tobeapartnerFrom the beginning, Starbucks set out to be a different kind of company. One that not onlycelebrated coffee and the rich tradition, but that also brought a feeling of connection. We are known for developing extraordinary leaders who share this passion and are guided by their service to others.Are you a data scientist who thinks like a business owner? Are you passionate about solving business problems by designing optimal experiments and deploying solutions at scale using 1st, 2nd, and 3rd party customer data in a next generation analytics infrastructure to help Starbucks optimize digital engagements with our 75M customers?If you answered “yes” to the questions above, please come and join our Data Science team. You will help optimize app-level product recommendations, digital menu board content, voice-order recommendations, drive-thru personalized menu board content, 1-1 and in-store marketing via personalized content on order labels and POS systems.As a data scientist, your responsibilities and job functions will include…Data Preparation:Under direction of more senior data scientists, execute extraction and synthesis of data from Azure: data lake storage (ADLS), blob storage, SQL DW, SQL Server; and legacy systems: Oracle and its companion data lake storageUnder direction of more senior data scientists, process 1st 2nd, and 3rd party customer data in next generation privacy compliant infrastructureBusiness Understanding & Provide Optimal Solutions:Has minimal understanding of Starbucks business, and business acument in general. 
Requires mentoring and guidanceMachine Learning and Data Product Dev and Deployment:Under direction of more senior data scientists, contribute to AI and Machine Learning models in batch, real-timeDevelop data pipelines and scalable Restful APIs to create and enable analytical applicationsStatistics and Model Development and Deployment:Leverage the latest cloud technologies, existing and immerging statistical and machine learning techniques to identify data patterns and trends to solve business problemsWith support from more senior data and decision scientists, build and deploy customer segments to facilitate optimal marketing targeting within channelsVia a "feature factory" approach, build large numbers of weak learners in a Customer 360 frameworkInsights Operationalization:Under the guidance of more senior data scientists, create clear and concise packaging and presentation of data products and insights to business stakeholders, leaders and the broader analytics communityData Science Evangelism:With support from more senior data scientists, establish and foster close collaboration between data and decision scientists, engineers, business and leadership teams to align on technical roadmaps for innovationEstablish brand and team as subject matter experts and trusted advisors for Analytics across departmentsProject & Work Management:Actively participate in an Agile team structure designed to create a bias for action (fail fast/often)Utilize Wiki and GitHub to share standards and codePresent code to team for review compared to Best PracticeActively participate in Agile-related meetings: stand-ups, sprint planning, retrospectives, showcasesParticipate in Microsoft Azure and other trainingsWe’d love to hear from people with:Education: Min BS/BA with concentration in quantitative discipline - Stats, Math, Comp Sci, Engineering, Econ or similar discipline1+ years’ professional data science experienceDemonstrated experience with one programming language such as Scala, 
Java, C++, C#Demonstrated experience with scripting languages such as Unix Shell (ksh, csh, bash, sh), PowerShell, ARMDemonstrated experience with cloud tech for data and analytics solutions (Azure, AWS)Demonstrated experience building and deploying AI / Machine Learning solutions, at scaleDemonstrated familiarity with Web application security, SSL OAuthDemonstrated self-sufficiency with R or Python (or equivalent)Retail, customer loyalty, or eCommerce experience preferredJoin us and be part of something bigger. Apply today!Starbucks and its brands are an equal opportunity employer of all qualified individuals, including minorities, women, veterans and individuals with disabilities. Starbucks will consider for employment qualified applicants with criminal histories in a manner consistent with all federal, state, and local ordinances.
## 4 IntroductionAt IBM we have an amazing opportunity to transform the world with cognitive technology. By using the vast amounts of information available today to identify new patterns and make new discoveries, we are helping cities become smarter, hospitals transform patient care, financial institutions minimize risk, and pharmaceuticals find cures for rare diseases.Data scientists work with enterprise leaders and key decision makers to solve business problems by preparing, analyzing, and understanding data to deliver insight, predict emerging trends, and provide recommendations to optimize results. Data scientists use a variety of data (structured, unstructured, IoT streaming), analytics, AI tools, and programming languages often using a cloud infrastructure to handle the volume and veracity of data streams.Armed with data, modeling expertise, and analytic results, the data scientist communicates conclusions and recommendations to stakeholders in an organization's leadership structure. Business acumen is an important skill for data scientists to effectively communicate their findings to business leaders, data scientists need strong consulting, communication, visualization, and storytelling skills.Your Role and ResponsibilitiesSTART DATES FOR THIS POSITION ARE IN 2020Data Scientists are in demand across IBM's growth areas. If hired, you will be matched to a team based on business demand, location and fit. 
Join the forward-thinking teams at IBM solving some of the world’s most complex problems – there is no better place to grow your career!What You’ll Do as an Entry-Level Data Scientist:You will implement and validate predictive models as well as create and maintain statistical models with a focus on big data.You will be exposed to and incorporate a variety of statistical and machine learning techniques such as logistic regression, experimental design, generalized linear models, mixed modeling, CHAID/decision trees, neural networks and ensemble models.You’ll communicate with internal and external clients to understand business needs and provide analytical solutions.You will use leading edge tools such as COGNOS, Watson Studio and Watson Machine Learning.You’ll work in an Agile, collaborative environment, partnering with other scientists, engineers, and database administrators of all backgrounds and disciplines to bring analytical rigor and statistical methods to the challenges of predicting behaviors.Who You Are:You are great at solving problems, debugging, troubleshooting, and designing & implementing solutions to complex technical issues.You thrive on teamwork and have excellent verbal and written communication skills.You have strong technical and analytical abilities, a knack for driving impact and growth, and some experience with programming/scripting in a language such as Java or Python.You have a basic understanding of statistical programming in a language such as R, SAS, or Python.You have an interest in, understanding of, or experience with Design Thinking and Agile Development MethodologiesRequired Professional and Technical ExpertiseBasic understanding of statistical programming in a language such as R, SAS, or Python.Experience with programming/scripting in a language such as Java or Python.Knowledge of statistical concepts such as regression, time series, mixed model, Bayesian, clustering, etc., to analyze data and provide insights.Preferred Professional 
and Technical ExpertiseAdvanced knowledge of statistical concepts such as regression, time series, mixed model, Bayesian methods, clustering, etc., to analyze data and provide insights.About Business UnitNo matter where you work in IBM, you are making an impact. As an Early Professional with IBM, you will be taking on a key role with one of our industry-leading business units to work on the technology that is solving our most challenging problems and changing the way the world thinks.Your Life @ IBMWhat matters to you when you’re looking for your next career challenge?Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.Impact. Inclusion. Infinite Experiences. Do your best work ever.About IBMIBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.Location StatementWe consider qualified applicants with criminal histories, consistent with applicable law.Being You @ IBMIBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. 
All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
## 5 IntroductionAs a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.Your Role and ResponsibilitiesImplement and validate predictive and prescriptive models, create and maintain statistical models with a focus on big data.Incorporate a variety of statistical and machine learning techniques in your projects.Write programs to cleanse and integrate data in an efficient and reusable manner.Use leading edge and open-source tools such as Python, R, and TensorFlow, combined with IBM tools and our AI application suites.Work in an Agile, collaborative environment, partnering with other scientists, engineers, consultants and database administrators of all backgrounds and disciplines to bring analytical rigor and statistical methods to the challenges of predicting behaviors.Communicate with internal and external clients to understand and define business needs and appropriate modelling techniques to provide analytical solutions.Evaluate modelling results and communicate the results to technical and non-technical audiences.Required Professional and Technical ExpertiseAbility to look at things differently, debug, troubleshoot, design and implement solutions to complex technical issues.Strong technical and analytical abilities, a knack for driving impact and growth, and experience with a programming/scripting in a language such as Java or Python.Basic understanding of statistical programming in a language such as R, Python, or SAS, SPSS, MATLAB.Basic understanding of Cloud (AWS, Azure, etc.)Excellent verbal and written communication skills.Work or internship experience using data 
science tools in a corporate environment.Interest in, understanding of, or experience with Design Thinking and Agile Development MethodologiesWillingness to travel up to 100% of the time.Majors:Computer Science, Data Science, Applied Mathematics, Statistics, Cognitive Science, Artificial Intelligence, Business Intelligence, Operations Research, EngineeringLocations:Cambridge, MAChicago, ILDallas, TXHouston, TXNew York, NYSan Francisco, CAWashington DCPreferred Professional and Technical ExpertiseNAAbout Business UnitIBM Services is a team of business, strategy and technology consultants that design, build, and run foundational systems and services that is the backbone of the world's economy. IBM Services partners with the world's leading companies in over 170 countries to build smarter businesses by reimagining and reinventing through technology, with its outcome-focused methodologies, industry-leading portfolio and world class research and operations expertise leading to results-driven innovation and enduring excellence.Your Life @ IBMWhat matters to you when you’re looking for your next career challenge?Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.Impact. Inclusion. Infinite Experiences. Do your best work ever.About IBMIBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. 
Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.Location StatementWe consider qualified applicants with criminal histories, consistent with applicable law.Being You @ IBMIBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
## 6 IntroductionClient Technical Specialists (CTP) are the technical experts and advisors to clients, IBM sales teams and/or IBM Business Partners. As a CTP you understand the client's business requirements, technical requirements and/or competitive landscape. You apply your business insights, build and maintain client relationships, incorporate hardware, software and services into client-valued solutions and ensure client readiness for the implementation of technical solutions. This is an opportunity to shape the future for both IBM and its clients. Start your journey now!Your Role and ResponsibilitiesYour Role & Responsibilities:As part of an entrepreneurial team, you will be the Subject Matter Expert on Data and AI Statistical models and how they apply to business problems. Led by a solution architect, you will advise on and help implement models in Machine Learning, Optimization, Neural Networks, and Artificial Intelligence such as Natural Language, Transfer Learning, Deep Learning and other quantitative approaches. You will also leverage deep skills and best practices to provide expertise and leadership to help design IBM Data Science and AI solutions that will help our clients drive technology benefits and business outcomes across industries. You will work with cutting edge technologies such as Watson, as well as Open Source approaches such as Python and Jupyter notebooks as well as with a passionate team of people who are driving the innovation and digital transformation to cross-industry enterprise clients with the adoption of IBM Data Science and AI. 
An ideal candidate will be familiar with Design Thinking, Statistics, building Supervised and Unsupervised machine learning models, and data cleansing techniques using various utilities and programming techniques.Key Responsibilities:Run and statistically evaluate statistical models such as Machine Learning, Optimization, Neural Networks and Artificial IntelligencePartner with Scrum Masters, Product Owners, Solution architects and peer data scientist and development data engineers to create solutions to meet business and technical opportunitiesUnderstand and communicate technical advantages and tradeoffs between Data ModelsExplore and develop new technical skills and industry practices while absorbing professional knowledge quickly and using demonstrated interpersonal skills to be an effective ambassador for IBM Data Science and AIUse exceptional communication skills and with input from product management, development, and architecture thought leaders, work to deliver high quality end-to-end Solutions at Scale in response to the identified business requirements from our clients; ensure the results are statistically validcldpakatRequired Technical and Professional ExpertiseTechnical degree in Computer Science or another field relevant to Data Science1+ years of experience working with Machine Learning, Optimization, Neural Networks and/or Artificial Intelligence1+ years of experience with a Data Science programming language such as Python or RDeep understanding of StatisticsFluent in EnglishAbility to Travel 75% and conduct Client Facing/Technical SolutionsPreferred Technical and Professional ExpertiseExperience with Jupyter NotebooksDeep understanding of Statistical Machine Learning Models with Python or RSupporting Relevant business domain knowledge such as Finance or Health CareAbout Business UnitDigitization is accelerating the ongoing evolution of business, and clouds - public, private, and hybrid - enable companies to extend their existing infrastructure and 
integrate across systems. IBM Cloud provides the security, control, and visibility that our clients have come to expect. We are working to provide the right tools and environment to combine all of our client’s data, no matter where it resides, to respond to changing market dynamics.Your Life @ IBMWhat matters to you when you’re looking for your next career challenge?Maybe you want to get involved in work that really changes the world? What about somewhere with incredible and diverse career and development opportunities – where you can truly discover your passion? Are you looking for a culture of openness, collaboration and trust – where everyone has a voice? What about all of these? If so, then IBM could be your next career challenge. Join us, not to do something better, but to attempt things you never thought possible.Impact. Inclusion. Infinite Experiences. Do your best work ever.About IBMIBM’s greatest invention is the IBMer. We believe that progress is made through progressive thinking, progressive leadership, progressive policy and progressive action. IBMers believe that the application of intelligence, reason and science can improve business, society and the human condition. Restlessly reinventing since 1911, we are the largest technology and consulting employer in the world, with more than 380,000 IBMers serving clients in 170 countries.Location StatementIBM will not be providing visa sponsorship for this position now or in the future. Therefore, in order to be considered for this position, you must have the ability to work without a need for current or future visa sponsorship.We consider qualified applicants with criminal histories, consistent with applicable law.Being You @ IBMIBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. 
All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
knitr::include_graphics('mongoDB.png')
knitr::include_graphics('mongoDB2.png')
I took the screenshots above at the command prompt while checking whether the data had been migrated to MongoDB. The first screenshot lists the databases: there are four. The “test” database was created during the migration, and the “datascience_table” collection was written into it. To confirm the contents, I ran “db.datascience_table.find()”, which returned the migrated records, showing that the data were moved from MySQL to MongoDB.
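The same check can also be done from R instead of the mongo shell. This is a minimal sketch, assuming a MongoDB server on localhost and the “test” database and “datascience_table” collection described above. The helper name check_migration and the default URL are assumptions, not part of the original workflow.

```r
# Hypothetical helper: compare the source data frame with the MongoDB
# collection it was copied into. Assumes a MongoDB server is reachable at `url`.
check_migration <- function(source_df,
                            collection = "datascience_table",
                            db = "test",
                            url = "mongodb://localhost") {
  con <- mongolite::mongo(collection = collection, db = db, url = url)
  on.exit(con$disconnect(), add = TRUE)

  n_mongo <- con$count()          # number of documents in the collection
  preview <- con$find(limit = 3)  # first few documents, returned as a data frame

  list(
    rows_in_source = nrow(source_df),
    docs_in_mongo  = n_mongo,
    counts_match   = nrow(source_df) == n_mongo,
    preview        = preview
  )
}

# Usage (needs a running MongoDB server, so it is left commented out):
# result <- check_migration(datascience_table)
# result$counts_match
```

Comparing row and document counts is only a coarse check; it does not prove that every field was copied correctly.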
dbDisconnect(connection)
## [1] TRUE
For this assignment, I used the data science skills data that I scraped from several websites for Project 3. I installed MongoDB 4.2 Community Edition and connected to both MongoDB and MySQL. I added the MongoDB path to the environment variables and then opened the database from the command prompt, as shown in the screenshots above. To migrate the data, I connected R to the MySQL database, created a database in MongoDB, and moved all the data into it. I used the “mongolite” library to access MongoDB from R.
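The migration steps described above can be wrapped in one function so that they are easy to re-run. This is a sketch under stated assumptions, not the exact code used for this report: the helper name migrate_table, the default URL, and the choice to empty the collection before inserting are all illustrative.

```r
# Hypothetical helper: copy one MySQL table into a MongoDB collection.
# Connection details are placeholders; adjust them to your own setup.
migrate_table <- function(mysql_con, table,
                          collection = table,
                          db = "test",
                          url = "mongodb://localhost") {
  # Read the whole table from MySQL into a data frame
  query <- paste("SELECT * FROM", DBI::dbQuoteIdentifier(mysql_con, table))
  df <- DBI::dbGetQuery(mysql_con, query)

  mongo_con <- mongolite::mongo(collection = collection, db = db, url = url)
  on.exit(mongo_con$disconnect(), add = TRUE)

  # Start from an empty collection so that re-running does not duplicate rows
  if (mongo_con$count() > 0) mongo_con$drop()

  mongo_con$insert(df)  # each data frame row becomes one document
  mongo_con$count()     # return the number of documents now stored
}

# Usage (needs live MySQL and MongoDB servers, so it is left commented out):
# migrate_table(connection, "final", collection = "datascience_table")
```

Dropping the collection first keeps the process reproducible, at the cost of discarding anything already stored under that name.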
Both relational and NoSQL databases have their own advantages and disadvantages. A relational database stores data in a consistent, structured way: each table has a fixed set of columns, and each row supplies values for them. Relational databases have been developed over a long time and are easy to use for data analysis, but it is hard to keep big data in a relational database because its performance suffers.
On the other hand, a NoSQL database can store almost any type of data, but it does not enforce the same consistency. Its maintenance is easier than that of a relational database. Performance is often better as well, because NoSQL does not need separate caching infrastructure the way a relational database may; instead, it supports caching in system memory.
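The flexibility trade-off can be seen in a small example. This sketch uses base R only. The “skills” field is made up for illustration and is not a column of the migrated table.

```r
# Two hypothetical job-posting records. The "skills" field is invented
# for illustration; jobTitle and company follow the table shown earlier.
doc_a <- list(jobTitle = "Data Scientist - Entry Level", company = "Numerdox")
doc_b <- list(jobTitle = "Data Scientist, Engineering", company = "Google",
              skills = c("R", "Python", "SQL"))  # extra field holding an array

# A document store can keep both records in one collection as they are.
# A relational table needs one fixed set of columns, so the records must
# be padded to a common shape (or the skills moved to a separate table).
all_fields   <- union(names(doc_a), names(doc_b))
missing_in_a <- setdiff(all_fields, names(doc_a))

all_fields    # "jobTitle" "company" "skills"
missing_in_a  # "skills"
```

The same freedom is what weakens consistency: nothing stops a third record from spelling the field differently or omitting it altogether.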