class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## Introduction
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]
.date[
### 07/09/2022
]

---
#Introduction
About me

- Assistant professor at Copenhagen University (05/2021-now)
- Research Scientist (10/2020 – 04/2021) at Cologne University, Germany
- Research Officer (10/2018-09/2020) at Catholic University of Louvain, Belgium
- Post-doc (03/2017 – 12/2017) at Vienna University of Economics and Business, Vienna Institute of Demography, IIASA, Austria
- PhD at University of Hong Kong (2013-2017)
- Research interests: family formation and dissolution, fertility, ageing, population health, population policies
- My office hours: Monday and Friday 10:30-11:30 am, Room 16.2.65

---
#Introduction
About you

- Name
- Year and department

---
#Introduction
About what you know

[I would like to know what you already know about R and statistics](https://docs.google.com/forms/d/e/1FAIpQLSfMQZ4_HvpoVYgA5y6-hwx5afJFHCPXijJZ4N2-bjqaUVddCQ/viewform?usp=sf_link)

---
#About the course

- Main content
  - Intro to R
  - OLS & logistic regression
  - Panel data management
  - Panel data analysis I: fixed effect
  - Panel data analysis II: difference in difference
- Learning approach
  - Lecture for approx. 45 min
  - In-class exercise: work in groups for 45 min
  - Bring your laptop to every class
  - Fagcafe after class, Friday 13:00-17:00

---
#Assessment

- Portfolio of four assignments: select one of the three topics
  - A1: 3 pages for introduction, theories, and simple regression
    - Deadline is Oct. 14, 23:59
  - A2: 2 pages describing what fixed effects are and applying fixed effects
    - Deadline is Nov. 25, 23:59
  - A3: 2 pages describing what random effects are and applying random effects
    - Deadline is Dec. 2, 23:59
  - A4: 2 pages describing what DID is and applying DID
    - Deadline is Dec. 
16, 23:59
- A final combined product: max 10 pages for all
- Individual or group work: 10 pages, plus 5 pages per extra person
- Final exam: 5. January 2023, uploaded in Digital Exam before 12:00 (noon, Danish Time)

---
#Three topics to choose

- How does first childbearing impact women’s subjective wellbeing?
- How does entry into a partnership impact people’s subjective wellbeing?
- How does divorce impact people’s subjective wellbeing?

---
#Access to the Pairfam data

- [What is pairfam](https://www.pairfam.de/en/)
- [What you need to know about pairfam: see Tutorial 8-12](https://www.pairfam.de/en/documentation/video-tutorials/)
- How to access pairfam
  - Fill in the form: go to Absalon → this course → files → Data → Form to be filled → send to support@pairfam.de

---
#What is R

If you're mainly interested in using regression to analyze a specific survey, it is

1. ...a rather tedious stats programming language.
2. ...basically a calculator:

```r
1+1
```

```
## [1] 2
```

```r
x <- 1
x+1
```

```
## [1] 2
```

---
#What is R

---
#What is R

If you enjoy coding and working with several data sources at once, it is

1. ... a great platform for (social) data science of
2. ... networks, text, geo, pictures...
3. ... an all in one solution. → I coded these slides in R!
4. ... a huge, dynamic and cooperative community.
[why you should use R](https://www.youtube.com/watch?v=9kYUGMg_14s&list=PLbh5g6tUnVSRwojrTf-ZBTXeCs-oYh9DV&index=94&t=3s)

---
#What you will learn in R

(Social) Data Science

.backgrnote[*Source*: Grolemund and Wickham (2017)]

<img src="https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png" width="70%" style="display: block; margin: auto;" >

---
background-image: url(https://mareds.github.io/r_course/img_site/Tidyverse_packages.png)
background-size: contain
background-position: center
class: clear

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/R_logo.svg/724px-R_logo.svg.png)
background-size: contain
background-position: center
class: clear

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/RStudio_logo_flat.svg/1280px-RStudio_logo_flat.svg.png)
background-size: contain
background-position: center
class: clear

--

#[download R studio](https://www.rstudio.com/products/rstudio/download/#download)

---
#Interface in R

<img src="https://d33wubrfki0l68.cloudfront.net/8a64bb047429d7ae0e2acae35c40e421e6439bf6/80e5d/diagrams/rstudio-editor.png", width="550px" height="550px" style="position:absolute; top:120px;">

---
# Set R Studio preference

Set your preference

- Tools -> Global options

<img src="https://d33wubrfki0l68.cloudfront.net/7fa44a5471d40025344176ede4169c5ad3159482/1577f/screenshots/rstudio-workspace.png", width="500px" height="500px" style="position:absolute; right:10px; top:110px;">

- General
- Pane Layout
- Appearance

---
# R Studio workflow

1. Use **Projects** to manage all files (scripts, data, figures and tables) belonging to the same project.
2. Use the editor to write **R scripts**, so you can reproduce your results.
<img src="https://d33wubrfki0l68.cloudfront.net/8a64bb047429d7ae0e2acae35c40e421e6439bf6/80e5d/diagrams/rstudio-editor.png", width="450px" height="450px" style="position:absolute; right:200px; top:220px;">

---
#Set up your "IntRo" project

- Click on the Project button.
- "New Directory"
- "New Project"
- Finally, choose a name for the folder and, under "Browse", where you want that folder to be located.
- Done! Now you should find an empty folder under the path where you wanted it to be set up.
- In the future, start RStudio by double clicking on the project icon in your folder.

---
#Set up your "IntRo" project

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project.PNG?raw=true", width="550px" height="450px" style="position:absolute; right:200px; top:150px;">

--

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project2.PNG?raw=true", width="550px" height="450px" style="position:absolute; right:200px; top:150px;">

--

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project3.PNG?raw=true", width="550px" height="450px" style="position:absolute; right:200px; top:150px;">

--

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project4.PNG?raw=true", width="550px" height="450px" style="position:absolute; right:200px; top:150px;">

--

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project5.PNG?raw=true", width="600px" height="450px" style="position:absolute; right:200px; top:150px;">

---
#R Scripts

Create an R script

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project7.PNG?raw=true", width="550px" height="450px" style="position:absolute; right:200px; top:200px;">

--

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project8.PNG?raw=true", width="550px" height="450px" style="position:absolute; right:200px; top:200px;">

--

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project9.PNG?raw=true", width="550px" height="450px" style="position:absolute; 
right:200px; top:200px;">

---
#R scripts

OK, you are good to go!

Pro tip: Use the R Script to take notes during the lecture! "# take notes"

```r
#take notes
x <- 1
x+1
```

```
## [1] 2
```

---
#First steps with R

- Creating objects
  - R is an object-oriented programming language. We create objects with the assignment operator.
  - The assignment operator may be used in both directions: <-, or ->. This can help you write more legible code.
  - For Windows, alt + '-' for ' <- '
  - For Mac, Option + '-' for ' <- '
- Run code
  - For Windows, ctrl + enter
  - For Mac, cmd + enter

```r
# Create a new object "test_object" and assign to it the result of 5 + 2
test_object <- 5 + 2
# Print the result: type test_object in the "Console" and see what you get
```

---
#First steps with R

- Creating objects
  - Objects can contain all kinds of information.

```r
# text can also be an object
text_object <- "Yes, you can also assign text"
# Print the result: type text_object in the "Console" and see what you get
```

---
#First steps with R

Compare the two sets of code

```r
test_object <- 5+2
test_object + 3
test_object
```

```r
test_object <- test_object + 3
test_object
```

--

If you don't use the assignment operator (<-), R will simply print the result of an operation without changing that object (i.e., without any further consequences).

--

If you actually want to create or modify objects, you need to use the assignment operator and explicitly assign the result of your operation to the object you want to change.

---
#First steps with R

Assigning and printing together

The short-hand version for assigning and printing an object right away uses surrounding **()**:

```r
(test_object <- 5 + 2)
```

---
#First steps with R

Use functions: function_name(argument1 = value1, argument2 = value2, ...)

Example 1: The **seq()** function, for instance, produces a sequence of numbers and has (among others) the arguments from and to.
```r
(yet_another_object <- seq(from = 1, to = 5))
```

```
## [1] 1 2 3 4 5
```

```r
#or a shorter code
(yet_another_object <- seq(1, 5))
```

```
## [1] 1 2 3 4 5
```

---
#First steps with R

Example 2: The class() function tells you what class/type of object you are dealing with:

--

```r
class(yet_another_object) # "yet_another_object" contains whole numbers, so it is:
```

```
## [1] "integer"
```

--

```r
class(text_object) # "text_object" contains a sentence, so it is:
```

```
## [1] "character"
```

--

```r
class(seq)
```

```
## [1] "function"
```

--

```r
#Try to run
alphabet <- seq(from = "a", to = "z")
```

--

A function will throw an error message if you feed it an object of a class it cannot handle.

---
#First steps with R

Arithmetic operators

--

```r
2 + 3
```

```
## [1] 5
```

--

```r
2 * 3
```

```
## [1] 6
```

--

```r
2 / 3
```

```
## [1] 0.6666667
```

--

```r
2^3
```

```
## [1] 8
```

--

```r
sqrt(yet_another_object) #square root as a function
```

```
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
```

---
#First steps with R

Logical operator

and = **&**; or = **|**; not = **!**; belongs to = **%in%**

<img src="https://github.com/fancycmn/slidesimage2/blob/main/R%20project10.PNG?raw=true", width="600px" height="280px" style="position:absolute; right:200px; top:350px;">

--

1 y & !x ; 2 x&y; 3 x & !y; 4 x; 5 !(x&y); 6 y; 7 x|y;

---
#Logical operator in action

```r
x <- 2
x==3
```

```
## [1] FALSE
```

--

```r
x!=3
```

```
## [1] TRUE
```

--

```r
x > 5 & x > 6 # greater than 5 and greater than 6
```

```
## [1] FALSE
```

--

```r
x > 5 | x > 1 # greater than 5 or greater than 1
```

```
## [1] TRUE
```

--

```r
x > (5 & 6) | (1 & 2) # caution: (5 & 6) and (1 & 2) are both TRUE, so this is (x > TRUE) | TRUE
```

```
## [1] TRUE
```

---
#Getting help in R

R has a steep learning curve and is, at times, simply frustrating.

The documentation of a specific function can be accessed by typing **?<Name of function>**

```r
?seq
?sqrt
```

---
#Take home

1. R is an object-oriented programming language. 
Objects are basically containers and they can contain anything: data, functions, other objects, results, figures, and more ... Objects can be called by their name, which needs to start with a letter. If you just type the name of an object, R prints the object.
2. Functions do stuff with objects, which we assign as values to a function's arguments. What exactly a function does depends on the class of an object. Functions themselves are objects too.
3. Important code
   - "?": View a function's help file.
   - "#": defines a comment in your code. Comment as much as possible!
   - "<-": is the assignment operator of R.
   - +, -, *, /, ^ : Arithmetic operators
   - & "and", | "or", ! "not" : Logical (Boolean) operators
   - class(): tells you the class of an object

**Reference**: Grolemund, G. and H. Wickham (2017). R for Data Science. O'Reilly.

---
class: center, middle

#[Exercise](https://merlin-intro-r.netlify.app/exercises/exercise1.html)
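---
#Take home in code

A minimal sketch that runs through the take-home points in one script. The object names (`ages`, `mean_age`, `older`) and the values are hypothetical, chosen only for illustration; `c()` and `mean()` are base R functions not covered on the earlier slides.

```r
# Hypothetical example values, not taken from the course data
ages <- c(23, 31, 47)            # "<-" assigns a numeric vector to the name "ages"
class(ages)                      # class() reports the class of the object
mean_age <- mean(ages)           # a function does something with an object
class(mean)                      # functions are objects too
older <- ages > 30 & ages < 50   # logical operators compare element by element
older                            # typing the name prints the object
ages %in% c(23, 47)              # %in% checks membership, element by element
```

Note that each side of `&` is a complete comparison (`ages > 30`, `ages < 50`); writing `ages > 30 & 50` would not test both conditions.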