Instructor: Yifan Peng (yip4002@med.cornell.edu)
TAs: Braja Patra (bgp4001@med.cornell.edu), Daniel Sanky (das4014@med.cornell.edu)
Office Hours: 5:00-6:00 pm on Mondays by appointment (Location: TBD)
Grading: Letter grade
Course Aims and Outcomes
This course gives students an understanding of natural language processing and its applications in health. Students will learn about sources of text data, linguistic structures, and the range of methods available for processing text. Hands-on experience with the Python programming language and toolkit will provide practical skills for managing text data and solving a variety of problems in the health domain.
Format and Procedures
The course is 14 weeks long and follows this progression of topics: Python review, regular expressions and automata, text normalization, n-grams, text classification, sequence labeling, parsing, word vectors, introduction to deep learning, convolutional and recurrent neural networks, and transformer-based methods. Each topic is addressed in a module lasting 1-2 weeks. In parallel, students will work on an individual project and give a final presentation in the last week.
Prerequisites
- Python: Prior exposure to programming and Python is highly recommended. We will provide a tutorial on Python in the first two weeks.
- Basic Probability and Statistics: You should know the basics of probability and descriptive statistics, such as mean and standard deviation.
- College Calculus, Linear Algebra: You should understand matrix/vector notation and operations.
Reference Texts
The following texts are useful, but none are required.
- Natural Language Processing with Python
- Foundations of Statistical Natural Language Processing
- Speech and Language Processing (3rd ed. draft)
- Natural Language Processing
If you are not very familiar with Python
If you are interested in Deep Learning
Tentative Course Schedule Overview
Dates | Topic | Event | Deadline |
---|---|---|---|
01/05 | Introduction | ||
01/12 | Regular expression | Assignment 1 | |
01/19 | Text preprocessing | ||
01/26 | n-gram | Assignment 2, Final project | Assignment 1 |
02/02 | Text classification | ||
02/09 | Part-of-speech tagging | ||
02/16 | Parsing | Assignment 3 | Assignment 2 |
02/23 | Intro to deep learning: neural network | ||
03/02 | Guest lecture: Hao Liu, Columbia Univ | Literature review | |
03/09 | Word embeddings | Assignment 4 | Assignment 3 |
03/16 | Guest lecture: Qingyu Chen, NIH/NLM | ||
03/23 | CNN, RNN | ||
03/30 | Guest lecture: Imon Banerjee, Mayo Clinic | | Assignment 4 |
04/06 | Final project presentation | | Final project paper |