Instructor: Yifan Peng (yip4002@med.cornell.edu)
Time: Mondays, 5:00-8:00 pm Eastern Time, Jan. 12, 2026 - April 13, 2026
Location: TBD
TAs: Haotian Ma (ham7026@med.cornell.edu), Geoffrey Martin (ghm4002@med.cornell.edu)
Office Hours: TBD
Grading: Letter grade

Course Aims and Outcomes

This course introduces students to the field of natural language processing and its applications in health. Students will learn about sources of text data, linguistic structures, and the range of methods available for processing text. Hands-on experience with the Python programming language and related toolkits will build practical skills for managing text data and solving a variety of problems in the health domain.

Format and Procedures

The course progresses through the following topics: text preprocessing and regular expressions, n-grams, text classification, sequence labeling, parsing, word vectors, convolutional and recurrent neural networks, and transformer-based methods. Each topic is covered in a module lasting 1-2 weeks. Students will complete individual assignments alongside these modules and participate in a team project.

Prerequisites

Reference Texts

The following texts are useful, but none are required.

If you are not very familiar with Python

If you are interested in Deep Learning

Tentative Course Schedule Overview

| Date | Week | Topics | Readings or pre-work due before class | Assignments due |
| --- | --- | --- | --- | --- |
| 1/12 | 1 | Introduction | | |
| 1/19 | 2 | Martin Luther King, Jr. Day - no classes | | |
| 1/26 | 3 | Text preprocessing and regular expression | Homework 1 | |
| 2/2 | 4 | n-gram | | |
| 2/9 | 5 | Text classification | Homework 2 | Homework 1 |
| 2/16 | 6 | Presidents’ Day - no classes | | |
| 2/23 | 7 | Part-of-speech tagging and parsing | Homework 3 | Homework 2 |
| 3/2 | 8 | Word vector | | |
| 3/9 | 9 | Intro to deep learning | Homework 4 | Homework 3 |
| 3/16 | 10 | CNN, RNN, and Transformer | | |
| 3/23 | 11 | Large Language Model | Homework 5 | Homework 4 |
| 3/30 | 12 | Multimodal large language models | | |
| 4/6 | 13 | Final project presentation | | Homework 5 |
| 4/13 | | Final Exams | | Final project manuscript |