
Instructor: Yifan Peng
TA: Braja Patra, Daniel Sanky
Office Hours: 5:00-6:00 pm on Mondays by appointment (Location: TBD)
Grading: Letter grade

Course Aims and Outcomes

This course provides students with an understanding of the field of natural language processing and its applications in health. Students will acquire knowledge of sources of text data, linguistic structures, and the range of methods available for processing text. Hands-on experience with the Python programming language and its toolkits will provide practical skills for managing text data and solving a variety of problems in the health domain.

Format and Procedures

The course is 14 weeks in length. It follows this progression of topics: Python review, regular expressions and automata, text normalization, n-grams, text classification, sequence labeling, parsing, word vectors, introduction to deep learning, convolutional and recurrent neural networks, and transformer-based methods. Each topic is addressed in a module lasting 1-2 weeks. Students will work on an individual project in parallel with these activities and give a final presentation in the last week.


Reference Texts

The following texts are useful, but none are required.

If you are not very familiar with Python

If you are interested in Deep Learning

Tentative Course Schedule Overview

| Dates | Topic | Event | Deadline |
| 01/05 | Introduction | | |
| 01/12 | Regular expression | Assignment 1 | |
| 01/19 | Text preprocessing | | |
| 01/26 | n-gram | Assignment 2, Final project | Assignment 1 |
| 02/02 | Text classification | | |
| 02/09 | Part-of-speech tagging | | |
| 02/16 | Parsing | Assignment 3 | Assignment 2 |
| 02/23 | Intro to deep learning: neural network | | |
| 03/02 | Guest lecture: Hao Liu, Columbia Univ | | Literature review |
| 03/09 | Word embeddings | Assignment 4 | Assignment 3 |
| 03/16 | Guest lecture: Qingyu Chen, NIH/NLM | | |
| 03/23 | CNN, RNN | | |
| 03/30 | Guest lecture: Imon Banerjee, Mayo Clinic | | Assignment 4 |
| 04/06 | Final project presentation | | Final project paper |