class: center, middle, inverse, title-slide

.title[
# Advanced quantitative data analysis
]
.subtitle[
## Introduction
]
.author[
### Mengni Chen
]
.institute[
### Department of Sociology, University of Copenhagen
]
.date[
### 04/09/2024
]

---
#Introduction

About me

- Assistant professor at the University of Copenhagen (05/2021-now)
- Teaching: population and society, family sociology, advanced quantitative data analysis
- Research: family formation and dissolution, fertility, ageing, population health, population policies
- Research experience: Cologne University, Germany; Catholic University of Louvain, Belgium; Wittgenstein Centre, Austria
- PhD at the University of Hong Kong (2013-2017)
- My office hour: every Tuesday 10:30-11:30 am, Room 16.1.48, Building 16

---
#Introduction

About you

- Name
- Year and department

---
#Introduction

About what you know

[I would like to know what you already know about R and statistics](https://docs.google.com/forms/d/e/1FAIpQLSdrKGGUIC_RO21DPaBciM0DIiKGMlJPL36iaMZCXCXyp9e6tw/viewform)

---
#About the course

- Main content
  - Intro to R
  - Cross-sectional data analysis: OLS, OB-decomposition
  - Panel data management
  - Panel data analysis I: fixed effect
  - Panel data analysis II: difference in difference
- Learning approach
  - Lecture
  - Small in-class exercise
  - Bring your laptop to every class
  - Fagcafe after class, Wednesday 15:00-17:00, Room CSS 2-1-12.
  - Julie Olea Brønnum Cordes and Yanzhen Luo are our TAs.

---
#Assessment

- Portfolio of 3 assignments: select one of the three topics
  - P1: 3 pages for introduction, theories, and OLS regression
      - Deadline is Oct.27, 23:59.
  - P2: 3 pages describing what fixed effects are and applying fixed effects
      - Deadline is Dec.01, 23:59.
  - P3: 3 pages describing what DID is and applying DID
      - Deadline is Dec.15, 23:59.
  - Please submit your portfolio even if it is not 100% complete.
- A final combined product for exam: max 10 pages for all
  - Individual or group work: 10 + 5 pages per extra person
- Final exam: 13 January 2025, uploaded in Digital Exam before 12:00 (noon, Danish Time)
- Re-exam: 10 February 2025, written exam with new questions

---
#Feedback

- Portfolio 1: all get feedback from the teacher
- Portfolio 2: a random 50% get feedback from the teacher + peer feedback
- Portfolio 3: the remaining 50% get feedback from the teacher + peer feedback
- Overall, everyone gets feedback from the teacher twice + peer feedback twice

---
#Three topics to choose

- Q1: How does first childbearing affect women’s subjective wellbeing?
- Q2: How does entry into a partnership affect people’s subjective wellbeing?
- Q3: How does partnership break-up affect people’s subjective wellbeing?
- **Choose one topic for all 3 of your portfolios**
- The aim of the 3 portfolios is to answer the chosen question using 3 different methods.

---
#How to measure subjective wellbeing

You can choose one of the following measurements as your outcome variable:

- Subjective satisfaction with work
- Subjective satisfaction with leisure activities
- Subjective satisfaction with family
- General life satisfaction

---
#Access to the Pairfam data

- [What is pairfam](https://www.pairfam.de/en/)
- [What you need to know about pairfam: see Tutorial 8-12](https://www.pairfam.de/en/documentation/video-tutorials/)
- How to get access to pairfam
  - Fill in the form: go to absalon → this course → files → Data → **TeachingVersion_pairfam13.0_Cov-19_en.pdf** → send to support@pairfam.de

---
#How to use ChatGPT in the course

- How many of you are using ChatGPT?
- What do you use it for?
- Is it ok to use?
  - yes: check code, ask questions, get some ideas
  - no: just copying code without understanding what you are copying
  - be mindful and critical

---
#Other info

- Datacamp is a platform where you can find after-class exercises and instruction.
- Absalon setup
  - Data: go to absalon → this course → files → Data
      - Women data: can be used to answer Q1
      - 50%data: can be used to answer Q2 and Q3
      - Codebook Anchor_en, pairfam Wave 1 2008-09
      - Variables, pairfam Waves 1-13.xlsx
  - Reading: go to absalon → this course → files → Reading
      - Example reading
      - Week40-41, Week-43-45, Week46-47, Week48-49
  - Assignment examples: portfolio examples from previous cohorts

---
#What is R

If you're mainly interested in using regression to analyze a specific survey, it is

1. ...a rather tedious stats programming language.
2. ...basically a calculator:

```r
1+1
```

```
## [1] 2
```

```r
x <- 1
x+1
```

```
## [1] 2
```

---
#What is R

If you enjoy coding and working with several data sources at once, it is

1. ... a great platform for (social) data science of
2. ... networks, text, geo, pictures...
3. ... an all-in-one solution. → I coded these slides in R!
4. ... a huge, dynamic and cooperative community.
5. ... a much higher chance of getting a job in the Danish labor market!

[why you should use R](https://www.youtube.com/watch?v=9kYUGMg_14s&list=PLbh5g6tUnVSRwojrTf-ZBTXeCs-oYh9DV&index=94&t=3s)

---
#What you will learn in R

(Social) Data Science

.backgrnote[*Source*: Grolemund and Wickham (2017)]

<img src="https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png" width="70%" style="display: block; margin: auto;" >

---
#Install R and RStudio

- [First, download R and install](https://www.rstudio.com/products/rstudio/download/#download)
- [Second, download RStudio and install](https://www.rstudio.com/products/rstudio/download/#download)
- Video on how to install them in [Windows](https://www.youtube.com/watch?v=_2sewGCA0y4) and [Mac](https://www.youtube.com/watch?v=LanBozXJjOk)
- If you don't manage to install R and RStudio, you can use [RStudio Cloud](https://posit.cloud/)
- [Video on how to use RStudio Cloud](https://www.youtube.com/watch?v=uK1Va_UWQFc)

---
#Interface in R

<img src="https://d33wubrfki0l68.cloudfront.net/8a64bb047429d7ae0e2acae35c40e421e6439bf6/80e5d/diagrams/rstudio-editor.png" width="550px" height="550px" style="position:absolute; top:120px;">

---
# R Studio workflow

1. Use **Projects** to manage all files (scripts, data, figures and tables) belonging to the same project.
2. Use **R scripts** to write down your code, so you can reproduce your results by running the code in the R script.

<img src="https://d33wubrfki0l68.cloudfront.net/8a64bb047429d7ae0e2acae35c40e421e6439bf6/80e5d/diagrams/rstudio-editor.png" width="450px" height="450px" style="position:absolute; right:200px; top:220px;">

---
#Set up your own Rproject and Rscript

- Click on the Project button.
- "New Directory"
- "New Project"
- Finally, choose a name for the folder and, under "Browse", where you want that folder to be located.
- Done! Now you should find an empty folder under the path where you wanted it to be set up.
- In the future, start RStudio by double-clicking on the project icon in your folder.
- You can follow the procedures to create your R project and R script on [this website, click here](https://rpubs.com/fancycmn/1213843).
- [Video on creating a project: 3'38-7'32](https://www.youtube.com/watch?v=wqOme7xsZvs)
- [Video on opening an existing project: 9'35-10'29](https://www.youtube.com/watch?v=wqOme7xsZvs)

---
#R scripts (wait and see)

OK, you are good to go!

Pro tip: Use the R Script to take notes during the lecture!

<span style="color:blue">**# take notes**</span>

```r
#take notes
x <- 1
x+1
```

```
## [1] 2
```

---
#First steps with R

- Creating objects
  - R is an object-oriented programming language. We create objects with the assignment operator.
  - The assignment operator may be used in both directions: <-, or ->. This can help you write more legible code.
  - Shorthand in Windows, press <span style="color:blue">**alt**</span> and <span style="color:blue">**-**</span> together
  - Shorthand in Mac, press <span style="color:blue">**Option**</span> and <span style="color:blue">**-**</span> together
- Run code
  - For Windows, press <span style="color:blue">**Ctrl**</span> and <span style="color:blue">**enter**</span> together
  - For Mac, press <span style="color:blue">**cmd**</span> and <span style="color:blue">**enter**</span> together

---
#First steps with R

- Creating objects
  - Objects can contain all kinds of information.

```r
# Create a new object "test_object" and assign to it the result of 5 + 2
test_object <- 5 + 2
# print the result: type test_object in the "Console" and see what you get
test_object
```

```
## [1] 7
```

```r
# text can also be an object
text_object <- "I love this course so much!"
# print the result: type text_object in the "Console" and see what you get
text_object
```

```
## [1] "I love this course so much!"
```

---
#First steps with R

Compare the two sets of code

```r
test_object1 <- 5+2
test_object1 + 3
test_object1
```

```r
test_object2 <- 5+2
test_object2 <- test_object2 + 3
test_object2
```

--

If you don't use the assignment operator (<-), R will simply print the result of an operation without changing that object (i.e., without any further consequences).

--

If you actually want to create or modify objects, you need to use the assignment operator and explicitly assign the result of your operation to the object you want to change.

---
#First steps with R

Use functions: function_name(argument1 = value1, argument2 = value2, ...)

Example 1: The **seq()** function, for instance, produces a sequence of numbers and has (among others) the arguments from and to.

```r
(yet_another_object <- seq(from = 1, to = 5))
```

```
## [1] 1 2 3 4 5
```

```r
#or a shorter code
(yet_another_object <- seq(1, 5))
```

```
## [1] 1 2 3 4 5
```

---
#First steps with R

Example 2: The class() function tells you what class/type of object you are dealing with:

--

```r
class(yet_another_object) # "yet_another_object" contains whole numbers, so it is:
```

```
## [1] "integer"
```

--

```r
class(text_object) # "text_object" contains a sentence, so it is:
```

```
## [1] "character"
```

--

```r
class(seq)
```

```
## [1] "function"
```

--

```r
#Try to run
alphabet <- seq(from = "a", to = "z")
```

--

Functions will throw an error message if you feed them an object of a class they cannot handle.
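
A minimal sketch of what happens when seq() is given text, assuming you want to look at the error without stopping your script. The object name `msg` and the use of the built-in constant `letters` are illustrative additions, not part of the original slides.

```r
# Capture the error message instead of letting the script stop.
# suppressWarnings() hides the coercion warning that R raises before the error.
msg <- tryCatch(
  suppressWarnings(seq(from = "a", to = "z")),
  error = function(e) conditionMessage(e)
)
msg # the text of the error message, stored as a character object

# One alternative that does work (an assumption about what was intended):
# the built-in constant `letters` holds the 26 lower-case letters.
alphabet <- letters
class(alphabet)
length(alphabet)
```

The exact wording of the error message can differ between R versions, so read it in your own console rather than relying on a fixed text.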
---
#First steps with R

Arithmetic operators: plus=<span style="color:blue">+</span>, multiply=<span style="color:blue">*</span>, power=<span style="color:blue">^</span>, square root=<span style="color:blue">sqrt()</span>

--

```r
2 + 3
```

```
## [1] 5
```

--

```r
2 * 3
```

```
## [1] 6
```

```r
2^3
```

```
## [1] 8
```

--

```r
sqrt(yet_another_object) #square root as a function
```

```
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
```

---
#First steps with R

Logical operators

- and = <span style="color:blue">**&**</span>;
- or = <span style="color:blue">**|**</span>;
- not = <span style="color:blue">**!**</span>;
- belongs to = <span style="color:blue">**%in%**</span>;
- equal = <span style="color:blue">**==**</span>;
- not equal = <span style="color:blue">**!=**</span>;

---
#Logical operators in action

```r
x <- 2
x==3
```

```
## [1] FALSE
```

--

```r
x!=3
```

```
## [1] TRUE
```

--

```r
x > 5 & x > 6 # greater than 5 and greater than 6 (repeat x in each comparison)
```

```
## [1] FALSE
```

--

```r
x > 5 | x > 1 # greater than 5 or greater than 1
```

```
## [1] TRUE
```

--

```r
(x > 5 & x > 6) | (x > 1 & x > 2) # more complex comparison
```

```
## [1] FALSE
```

---
#Getting help in R

R has a steep learning curve and is, at times, simply frustrating.

The documentation of a specific function can be accessed by typing **?<Name of function>**

```r
?seq
?sqrt
```

---
#Take home

1. R is an object-oriented programming language. Objects are basically containers and they can contain anything: data, functions, other objects, results, figures, and more ... Objects can be called by their name, which needs to start with a letter. If you just type the name of an object, R prints the object.
2. Functions do stuff with objects, which we assign as values to a function's arguments. What exactly a function does depends on the class of an object. Functions themselves are objects too.
3. Important code
    - "?": View a function's help file.
    - "#": defines a comment in your code. Comment as much as possible!
    - "<-": is the assignment operator of R.
    - +, -, *, ^ : Arithmetic operators
    - & means "and", | means "or", ! means "not", == tests whether two values are equal
    - class(): tells you the class of an object

**Reference**: Grolemund, G. and H. Wickham (2017). R for Data Science. O'Reilly.

---
# Set R Studio preference (optional)

Set your preference

- Tools -> Global options

<img src="https://d33wubrfki0l68.cloudfront.net/7fa44a5471d40025344176ede4169c5ad3159482/1577f/screenshots/rstudio-workspace.png" width="500px" height="500px" style="position:absolute; right:10px; top:110px;">

- General
- Pane Layout
- Appearance

---
class: center, middle

#[Exercise: 1-6 necessary; 7-9 optional](https://rpubs.com/fancycmn/1214111)