Instructor Information | Course Information |
---|---|
Anthony Howell, PhD | Course Meeting Time: Mondays, 3:10-5:00pm |
Office: #322 School of Economics Bldg. | Course Meeting Location: Room 508, Third Teaching Building (三教) |
Email: tonyjhowell@pku.edu.cn | Office Hours: By Appt. |
In 2016, Glassdoor named ‘Data Scientist’ the best job of the year, based on current job trends across thousands of professions. Hal Varian, the chief economist at Google, has said that the sexiest job of the next 10 years will be statistician. At the same time, there is a major global skills deficit in the tools required to perform in-depth data analysis. The McKinsey Global Institute, for instance, estimates that the “United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills.”
This course emphasizes basic statistical concepts and methods while giving students hands-on experience with data science projects. During the class, students will learn data visualization and analysis techniques using the R statistical software, work with multiple datasets, and be exposed to topics in applied economic analysis. After completing the course, students will be able to: (1) critically read economic research reports; (2) use statistical methods in their own work; (3) write professional reports and reproducible research; and (4) pursue further coursework in statistics/econometrics.
Participation
The participation grade, which accounts for 15% of your final grade, is based on in-class attendance and participation in lab work. There will be 7 in-class labs during the semester, and students must choose 5 of them to submit for evaluation by the instructor. NOTE: You may submit a lab for evaluation only if you attended the class in which it was assigned.
Participation points are calculated as follows:
# of Submitted Labs | Points |
---|---|
0 | 0 |
1 | 3 |
2 | 6 |
3 | 9 |
4 | 12 |
5 | 15 |
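The table follows a simple rule: each submitted lab is worth 3 points, and only five labs count toward the 15-point maximum. A minimal sketch of that rule is below, written in Python purely for illustration (the course itself uses R). The function name `participation_points` is hypothetical, and the cap assumes that labs submitted beyond the required five earn no extra credit, which the syllabus implies but does not state outright.

```python
def participation_points(submitted_labs: int) -> int:
    """Participation points for a number of evaluated labs (hypothetical helper).

    Assumes 3 points per lab, with at most 5 labs counted (15 points maximum).
    """
    if submitted_labs < 0:
        raise ValueError("submitted_labs must be non-negative")
    return 3 * min(submitted_labs, 5)


# Reproduce the rows of the participation table (0 to 5 labs).
for n in range(6):
    print(n, participation_points(n))
```

Because the maximum is 15 points, participation points translate one-for-one into percentage points of the final grade.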
Homework
There will be 3 homework assignments during the semester, which together account for 15% of your final grade. Each assignment has 5 problems worth 1 point each, and a problem may include several parts.
Each homework assignment is submitted as a single R Markdown (Rmd) file, that is, R code chunks interleaved with captions and explanatory narrative. Your score for each assignment is determined by the rubric below.
Homework rubric (Total: 5 points)
Correctness: Each problem will be worth 1 point. Deductions will be made at the discretion of the grader.
Knitting: -1 deduction if the Rmd file you submit does not knit correctly (i.e., if errors occur and no HTML file is produced when the grader attempts to knit your Rmd file). If your Rmd file fails to knit, the grader will contact you and give you 24 hours to resubmit your homework. You will need to trace the source of the error(s) and correct them.
Style: Coding style is very important. With the exception of Homework 1, you will receive a deduction of up to 1 point if you do not adhere to good coding style.
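Put together, the rubric amounts to a small scoring function. The sketch below is in Python for illustration only, and the helper `homework_score` and its arguments are hypothetical. It assumes two things the rubric does not spell out: that the 1-point knitting deduction still applies after a resubmission, and that a homework score never falls below zero.

```python
def homework_score(problem_scores, knits, style_deduction, homework_number):
    """Sketch of the 5-point homework rubric (hypothetical helper).

    problem_scores: points earned on each of the 5 problems (0 to 1 each,
        after any deductions made at the grader's discretion).
    knits: True if the submitted Rmd file knits to HTML without errors.
    style_deduction: grader's coding-style deduction, between 0 and 1.
    homework_number: 1, 2 or 3; Homework 1 is exempt from style deductions.
    """
    if len(problem_scores) != 5:
        raise ValueError("each homework has exactly 5 problems")
    if any(not 0 <= s <= 1 for s in problem_scores):
        raise ValueError("each problem is worth at most 1 point")
    if not 0 <= style_deduction <= 1:
        raise ValueError("the style deduction is at most 1 point")

    score = sum(problem_scores)          # correctness: up to 5 points
    if not knits:
        score -= 1                       # knitting failure: -1 point
    if homework_number != 1:
        score -= style_deduction         # style is not graded on Homework 1
    return max(score, 0)                 # assumption: no negative scores


# Example: half credit on one problem, a file that fails to knit,
# and a 0.5-point style deduction on Homework 2.
print(homework_score([1, 1, 1, 0.5, 1], False, 0.5, 2))
```

In this example the score is 4.5 - 1 - 0.5 = 3 points.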
All completed assignments are to be emailed to the course email address by Sunday 11:59pm on the dates indicated in the Course Outline (below).
Quizzes
Two quizzes will be given in the second half of the term; together they account for 20% of your final grade. The quizzes will be based largely on the previous course labs and may be cumulative. Their purpose is to assess your understanding of concepts that are central to the class.
Final Research Report/Presentation
The final project, which accounts for 50% of your final grade, asks you to explore a broad, data-driven policy question. The instructor will provide access to various social, economic, or environmental datasets for students to explore and analyze. The project is intended to give students the complete experience of going from a study question and a rich dataset to a full statistical report.
Students will be expected to:
While students may work in small groups to decide on appropriate statistical methodology and graphical/tabular summaries, each student will be required to produce and submit their own code and final report.
Summary of Grade Distribution
Activity | Grade Contribution |
---|---|
1. Participation | 15% |
2. Assignments/Homework | 15% |
3. Quizzes | 20% |
4. Final Research Report/Presentation | 50% |
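The final grade is a weighted sum of the four components. Note that participation is scored out of 15 points and homework out of 15 points (3 assignments × 5 points), so both already map directly onto their 15% shares. The Python sketch below illustrates the arithmetic only; the input format (the share of each component's points earned, from 0 to 1) and all example scores are hypothetical.

```python
# Weights from the grade distribution table, in percent of the final grade.
WEIGHTS = {
    "participation": 15,
    "homework": 15,
    "quizzes": 20,
    "final_project": 50,
}


def final_grade(fractions):
    """Combine component scores into a final grade out of 100.

    fractions maps each component to the share of its points earned,
    from 0.0 to 1.0 (an assumed input format, for illustration only).
    """
    if set(fractions) != set(WEIGHTS):
        raise ValueError("need a score for every component")
    return sum(WEIGHTS[k] * fractions[k] for k in WEIGHTS)


# Hypothetical student: 12/15 participation points, 13/15 homework points,
# 80% on the quizzes and 90% on the final project.
grade = final_grade({
    "participation": 12 / 15,
    "homework": 13 / 15,
    "quizzes": 0.80,
    "final_project": 0.90,
})
print(round(grade, 1))
```

For this hypothetical student the components contribute 12 + 13 + 16 + 45 = 86 points out of 100.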
A WeChat group will be created to serve as a discussion forum for the class, to facilitate interaction among students and promote broader participation. Students are expected to conduct themselves respectfully and to post comments and replies only on course-related topics. Rather than emailing the instructor or TA directly, please use the WeChat class group to ask general questions about specific problems with R, programming issues, and/or homework. Your question will probably help other classmates. You can also paste small snippets of code to clarify an idea. Students are encouraged to answer each other’s questions.
Class attendance is encouraged. As the grade distribution shows, 35% of your grade (participation through lab work, 15%, plus quizzes, 20%) depends on attending class. Consistently missing classes will make it impossible to obtain a high grade and may even cause you to fail the class. If you have a planned and excusable absence, please notify the instructor beforehand.
You are encouraged to discuss lab and homework problems with your fellow students. However, the course collaboration policy requires that you complete the work on your own: every line of text and every line of code that you submit must be written by you personally.
Submissions that fail to properly acknowledge help from other students or non-class sources will receive no credit. Copied work will receive no credit. Any and all violations will be reported to school administration.
To participate in class, students are expected to bring their own laptops. Please see the instructor if you do not have access to a laptop.
I value students’ opinions regarding my teaching effectiveness and the content, pace, and level of difficulty of the course. I will take student feedback into consideration to make this course as exciting and engaging as possible. You can also leave anonymous feedback in the form of a note in my departmental mailbox.
Week | Date | Topic | Labs/Assignments |
---|---|---|---|
Section I: Data Visualization and Mapping | | | |
1 | 9/17 | Course introduction and R basics | Lab 1 |
2 | 9/24 | Holiday - No Class | |
3 | 10/1 | Holiday - No Class | |
4 | 10/8 | Data manipulation and programming basics | Lab 2 |
5 | 10/15 | The tidyverse and ggplot2 | Lab 3 |
6 | 10/22 | Mapping with the World Bank API | Hw 1: Due 10/29 |
Section II: Statistical and Spatial Modeling | | | |
7 | 10/29 | Regression with programming | Lab 4 |
8 | 11/5 | Causality and identification | Lab 5 |
9 | 11/12 | Spatial Analysis and Moran’s I | Hw 2: Due 11/19 |
10 | 11/19 | Review | Quiz 1 |
Section III: Network Analysis | | | |
11 | 11/26 | Creating and handling network data | Lab 6 |
12 | 12/3 | Topological properties and visualizing networks | Lab 7 |
13 | 12/10 | Statistical modeling of networks | Hw 3: Due 12/17 |
14 | 12/17 | Review | Quiz 2 |
15 | 12/24 | Student Presentations I | |
16 | 12/31 | Student Presentations II | Final Written Project: Due 1/3 |