Import data
# Read the CSV file
data <- read.csv("01_module4/data/languages.csv")
# The CSV file is large enough that compiling the document was very slow,
# so head() is used to show only the first 6 rows
head(data)
## pldb_id title description type appeared
## 1 java Java <NA> pl 1995
## 2 javascript JavaScript <NA> pl 1995
## 3 c C <NA> pl 1972
## 4 python Python <NA> pl 1991
## 5 sql SQL <NA> queryLanguage 1974
## 6 cpp C++ <NA> pl 1985
## creators website
## 1 James Gosling https://oracle.com/java/
## 2 Brendan Eich <NA>
## 3 Dennis Ritchie <NA>
## 4 Guido van Rossum https://www.python.org/
## 5 Donald D. Chamberlin and Raymond F. Boyce <NA>
## 6 Bjarne Stroustrup http://isocpp.org/
## domain_name domain_name_registered
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 python.org 1995
## 5 <NA> NA
## 6 isocpp.org 2012
## reference isbndb book_count
## 1 <NA> 400 401
## 2 https://www.w3schools.com/js/js_reserved.asp 349 351
## 3 http://www.c4learn.com/c-programming/c-keywords/ 78 78
## 4 https://www.programiz.com/python-programming/keyword-list 339 342
## 5 <NA> 177 182
## 6 <NA> 128 128
## semantic_scholar language_rank github_repo github_repo_stars
## 1 37 0 <NA> NA
## 2 48 1 <NA> NA
## 3 19 2 <NA> NA
## 4 52 3 <NA> NA
## 5 37 4 <NA> NA
## 6 6 6 <NA> NA
## github_repo_forks github_repo_updated github_repo_subscribers
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## github_repo_created github_repo_description github_repo_issues
## 1 NA <NA> NA
## 2 NA <NA> NA
## 3 NA <NA> NA
## 4 NA <NA> NA
## 5 NA <NA> NA
## 6 NA <NA> NA
## github_repo_first_commit github_language github_language_tm_scope
## 1 NA Java source.java
## 2 NA JavaScript source.js
## 3 NA C source.c
## 4 NA Python source.python
## 5 NA SQL source.sql
## 6 NA C++ source.c++
## github_language_type github_language_ace_mode
## 1 programming java
## 2 programming javascript
## 3 programming c_cpp
## 4 programming python
## 5 data sql
## 6 programming c_cpp
## github_language_file_extensions
## 1 java jav
## 2 js _js bones cjs es es6 frag gs jake javascript jsb jscad jsfl jslib jsm jspre jss jsx mjs njs pac sjs ssjs xsjs xsjslib
## 3 c cats h idc
## 4 py cgi fcgi gyp gypi lmi py3 pyde pyi pyp pyt pyw rpy smk spec tac wsgi xpy
## 5 sql cql ddl inc mysql prc tab udf viw
## 6 cpp c++ cc cp cxx h h++ hh hpp hxx inc inl ino ipp ixx re tcc tpp
## github_language_repos
## 1 11529980
## 2 16046489
## 3 2160271
## 4 9300725
## 5 1222
## 6 2161625
## wikipedia
## 1 https://en.wikipedia.org/wiki/Java_(programming_language)
## 2 https://en.wikipedia.org/wiki/JavaScript
## 3 https://en.wikipedia.org/wiki/C_(programming_language)
## 4 https://en.wikipedia.org/wiki/Python_(programming_language)
## 5 https://en.wikipedia.org/wiki/SQL
## 6 https://en.wikipedia.org/wiki/C++
## wikipedia_daily_page_views wikipedia_backlinks_count
## 1 5242 11543
## 2 4264 8982
## 3 6268 10585
## 4 7204 6849
## 5 3084 4159
## 6 4307 10943
## wikipedia_summary
## 1 Java is a general-purpose computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere" (WORA), meaning that compiled Java code can run on all platforms that support Java without the need for recompilation. Java applications are typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of computer architecture. As of 2016, Java is one of the most popular programming languages in use, particularly for client-server web applications, with a reported 9 million developers. Java was originally developed by James Gosling at Sun Microsystems (which has since been acquired by Oracle Corporation) and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++, but it has fewer low-level facilities than either of them. The original and reference implementation Java compilers, virtual machines, and class libraries were originally released by Sun under proprietary licenses. As of May 2007, in compliance with the specifications of the Java Community Process, Sun relicensed most of its Java technologies under the GNU General Public License. Others have also developed alternative implementations of these Sun technologies, such as the GNU Compiler for Java (bytecode compiler), GNU Classpath (standard libraries), and IcedTea-Web (browser plugin for applets). The latest version is Java 9, released on September 21, 2017, and is one of the two versions currently supported for free by Oracle. Versions earlier than Java 8 are supported by companies on a commercial basis; e.g. by Oracle back to Java 6 as of October 2017 (while they still "highly recommend that you uninstall" pre-Java 8 from at least Windows computers).
## 2 JavaScript (), often abbreviated as JS, is a high-level, dynamic, weakly typed, prototype-based, multi-paradigm, and interpreted programming language. Alongside HTML and CSS, JavaScript is one of the three core technologies of World Wide Web content production. It is used to make webpages interactive and provide online programs, including video games. The majority of websites employ it, and all modern web browsers support it without the need for plug-ins by means of a built-in JavaScript engine. Each of the many JavaScript engines represent a different implementation of JavaScript, all based on the ECMAScript specification, with some engines not supporting the spec fully, and with many engines supporting additional features beyond ECMA. As a multi-paradigm language, JavaScript supports event-driven, functional, and imperative (including object-oriented and prototype-based) programming styles. It has an API for working with text, arrays, dates, regular expressions, and basic manipulation of the DOM, but the language itself does not include any I/O, such as networking, storage, or graphics facilities, relying for these upon the host environment in which it is embedded. Initially only implemented client-side in web browsers, JavaScript engines are now embedded in many other types of host software, including server-side in web servers and databases, and in non-web programs such as word processors and PDF software, and in runtime environments that make JavaScript available for writing mobile and desktop applications, including desktop widgets. Although there are strong outward similarities between JavaScript and Java, including language name, syntax, and respective standard libraries, the two languages are distinct and differ greatly in design; JavaScript was influenced by programming languages such as Self and Scheme.
## 3 C (, as in the letter c) is a general-purpose, imperative computer programming language, supporting structured programming, lexical variable scope and recursion, while a static type system prevents many unintended operations. By design, C provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, including operating systems, as well as various application software for computers ranging from supercomputers to embedded systems. C was originally developed by Dennis Ritchie between 1969 and 1973 at Bell Labs, and used to re-implement the Unix operating system. It has since become one of the most widely used programming languages of all time, with C compilers from various vendors available for the majority of existing computer architectures and operating systems. C has been standardized by the American National Standards Institute (ANSI) since 1989 (see ANSI C) and subsequently by the International Organization for Standardization (ISO). C is an imperative procedural language. It was designed to be compiled using a relatively straightforward compiler, to provide low-level access to memory, to provide language constructs that map efficiently to machine instructions, and to require minimal run-time support. Despite its low-level capabilities, the language was designed to encourage cross-platform programming. A standards-compliant and portably written C program can be compiled for a very wide variety of computer platforms and operating systems with few changes to its source code. The language has become available on a very wide range of platforms, from embedded microcontrollers to supercomputers.
## 4 Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991. An interpreted language, Python has a design philosophy that emphasizes code readability (notably using whitespace indentation to delimit code blocks rather than curly brackets or keywords), and a syntax that allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java. It provides constructs that enable clear programming on both small and large scales. Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library. Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation.
## 5 SQL ( ( listen) ESS-kew-EL or ( listen) SEE-kwəl or SKWEEL, Structured Query Language) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). In comparison to older read/write APIs like ISAM or VSAM, SQL offers two main advantages: first, it introduced the concept of accessing many records with one single command; and second, it eliminates the need to specify how to reach a record, e.g. with or without an index. Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition language, data manipulation language, and data control language. The scope of SQL includes data insert, query, update and delete, schema creation and modification, and data access control. Although SQL is often described as, and to a great extent is, a declarative language (4GL), it also includes procedural elements. SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks". Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been revised to include a larger set of features. Despite the existence of such standards, most SQL code is not completely portable among different database systems without adjustments.
## 6 C++ ( pronounced cee plus plus) is a general-purpose programming language. It has imperative, object-oriented and generic programming features, while also providing facilities for low-level memory manipulation. It was designed with a bias toward system programming and embedded, resource-constrained and large systems, with performance, efficiency and flexibility of use as its design highlights. C++ has also been found useful in many other contexts, with key strengths being software infrastructure and resource-constrained applications, including desktop applications, servers (e.g. e-commerce, web search or SQL servers), and performance-critical applications (e.g. telephone switches or space probes). C++ is a compiled language, with implementations of it available on many platforms. Many vendors provide C++ compilers, including the Free Software Foundation, Microsoft, Intel, and IBM. C++ is standardized by the International Organization for Standardization (ISO), with the latest standard version ratified and published by ISO in December 2014 as ISO/IEC 14882:2014 (informally known as C++14). The C++ programming language was initially standardized in 1998 as ISO/IEC 14882:1998, which was then amended by the C++03, ISO/IEC 14882:2003, standard. The current C++14 standard supersedes these and C++11, with new features and an enlarged standard library. Before the initial standardization in 1998, C++ was developed by Bjarne Stroustrup at Bell Labs since 1979, as an extension of the C language as he wanted an efficient and flexible language similar to C, which also provided high-level features for program organization. The C++17 standard is due in July 2017, with the draft largely implemented by some compilers already, and C++20 is the next planned standard thereafter. Many other programming languages have been influenced by C++, including C#, D, Java, and newer versions of C.
## wikipedia_page_id wikipedia_appeared wikipedia_created
## 1 15881 1995 2001
## 2 9845 1995 2001
## 3 6021 2011 2001
## 4 23862 1991 2001
## 5 29004 1986 2001
## 6 72038 1998 2001
## wikipedia_revision_count
## 1 7818
## 2 6131
## 3 7316
## 4 6342
## 5 4153
## 6 1487
## wikipedia_related
## 1 javascript pizza ada csharp eiffel mesa modula-3 oberon objective-c ucsd-pascal object-pascal beanshell chapel clojure ecmascript fantom gambas groovy hack jsharp kotlin php python scala seed7 vala java-bytecode jvm c oak linux solaris arm eclipse-editor html http mime java-server-pages motif-software android xml java-ee-version-history
## 2 java lua scheme perl self c python awk hypertalk actionscript coffeescript dart livescript objective-j opa perl-6 qml typescript json ecmascript html regex pdf tcl c-- vbscript jscript jquery npm-pm mongodb sql max unity-engine google-apps-script objective-c applescript visual-studio-editor asmjs processing oberon smalltalk scala racket llvmir fantom haxe clojure kotlin squeak wasm
## 3 cyclone unified-parallel-c split-c cilk b bcpl cpl algol-68 assembly-language pl-i ampl awk c-- csharp objective-c d go java javascript julia limbo lpc perl php pike processing python rust seed7 vala verilog unix algol swift multics unicode fortran pascal mathematica matlab ch smalltalk
## 4 jython micropython stackless-python cython abc algol-68 c dylan haskell icon java lisp modula-3 perl boo cobra coffeescript d f-sharp falcon genie go groovy javascript julia nim ruby swift setl unix unicode standard-ml pascal regex csharp common-lisp scheme objective-c numpy mime http sagemath llvmir jvm java-bytecode cil pyrex mercurial python-for-s60 qt django scipy matplotlib gdb freebsd ecmascript ocaml tcl erlang pandas
## 5 sql-92 datalog linq powershell c sql-psm sqlpl transact-sql mysql pl-sql ada postgresql plpgsql java perl python tcl javascript xml xquery dot-ql isbl quel mumps isbn doi
## 6 ada algol-68 c clu ml simula python csharp chapel d java lua perl php rust nim sql bcpl unix assembly-language regex
## features_has_comments features_has_semantic_indentation
## 1 TRUE FALSE
## 2 TRUE FALSE
## 3 TRUE FALSE
## 4 TRUE TRUE
## 5 TRUE FALSE
## 6 NA NA
## features_has_line_comments line_comment_token last_activity number_of_users
## 1 TRUE // 2022 5550123
## 2 TRUE // 2022 5962666
## 3 TRUE // 2022 3793768
## 4 TRUE # 2022 2818037
## 5 TRUE -- 2022 7179119
## 6 NA // 2022 4128238
## number_of_jobs origin_community
## 1 85206 Sun Microsystems
## 2 63993 Netscape
## 3 59919 Bell Labs
## 4 46976 Centrum Wiskunde & Informatica
## 5 219617 IBM
## 6 61098 Bell Labs
## central_package_repository_count file_type is_open_source
## 1 NA text NA
## 2 NA text NA
## 3 0 text NA
## 4 NA text NA
## 5 0 text NA
## 6 0 text NA
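The head() preview above still wraps across many lines because the table has dozens of columns, and read.csv() still parses the whole file before head() trims it. A minimal sketch of a lighter preview follows, assuming it is acceptable to read only a few rows and print only a few columns. The temporary file and its four rows are hypothetical stand-ins for languages.csv so the example runs on its own; the column names are taken from the output above.

```r
# Hypothetical stand-in for languages.csv: a tiny CSV written to a temp file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "pldb_id,title,type,appeared",
  "java,Java,pl,1995",
  "javascript,JavaScript,pl,1995",
  "c,C,pl,1972",
  "python,Python,pl,1991"
), tmp)

# nrows caps how many data rows read.csv() parses,
# so a large file does not have to be loaded in full
preview <- read.csv(tmp, nrows = 3)

# Keep only a few columns so the printed preview fits on one screen
preview_small <- preview[, c("pldb_id", "title", "appeared")]
print(preview_small)

# dim() and names() describe the shape without printing any cell values
dim(preview)
names(preview)
```

With the real file, the same calls would take the path "01_module4/data/languages.csv" in place of tmp. Note that nrows only limits what is read; any later analysis would need the full data set loaded without it.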