CS 886: Topics in Language Models (Fall 24)

Instructor: Yuntian Deng

Course Schedule: Wednesdays, 12:00 pm - 2:50 pm

Location: DC 2585

Enrollment Limit: 40 students

Instructor Email: yuntian@uwaterloo.ca

Office Hours: Schedule via Calendly

Note: This is a provisional version of the syllabus. Expect changes over time.


Course Description

This graduate seminar focuses on recent advances in language models. In each session, students will present and discuss recent papers in the field. The course emphasizes critical analysis, enabling students to understand the strengths, limitations, and emerging trends in language modeling.


Grading

All deliverables are due by 11:59pm Eastern Time on the respective due date. Late submissions will only be considered with prior approval from the instructor.

Task                       Due Date                  Weight
Class Participation        Throughout the semester   20%
Two Presentations          As assigned               40%
Project: Team Formation    Sep 25                    -
Project: Proposal          Oct 11                    10%
Project: Progress Report   Nov 01                    10%
Project: Presentation      Nov 27                    10%
Project: Final Report      Dec 05                    10%

Intended Learning Outcomes

  • Keep up-to-date with the current progress in language models.
  • Critically analyze research publications.
  • Develop and present original research ideas related to language models.

Prerequisites and Recommended Materials

  • Familiarity with probability theory
  • Understanding of numerical optimization (e.g., backpropagation)
  • Proficiency in Python, PyTorch, and tensor programming
  • Ability to independently read and understand research papers for active participation in discussions

Below is a series of exercises from my PhD advisor Professor Alexander Rush that I highly recommend:


Coursework Overview

Presentations: Students will present one or more papers during the course. Presentations are crucial and should not be missed, as they significantly contribute to the class dynamics. If unforeseen circumstances arise, inform the instructor as soon as possible.

Final Project: Students will work on a final project that allows them to explore a topic related to language models in depth. Projects can be done individually or in groups of up to 3 members. Larger groups require a justification and are subject to instructor approval.

  • Original Contribution: The primary aim of the project is to make an original contribution to the field. Ideally, the project should be of a quality that could potentially lead to a publication. This could involve, but is not limited to:
    • Novel Research: Developing new methods, models, or insights in the field of language models.
    • Reproducibility Test: Attempting to replicate and possibly extend the results from a published paper, providing detailed analysis and commentary on the findings.
    • Negative Results: Investigating a hypothesis or method that did not yield expected results, with thorough documentation of the process and analysis of why it failed.
    • Survey: Conducting a comprehensive and critical survey of a specific area within language models, identifying gaps or trends that could inform future research. Even for surveys, the work should go beyond summarization to include thoughtful analysis and synthesis of the literature.
  • Project Timeline:
    • Team Formation: By Sep 25, students form teams of up to three members and sign up at bit.ly/cs886signup (Sheet 2).
    • Proposal Submission: By Oct 11, students submit a project proposal outlining their research question, hypothesis, and the planned approach.
    • Progress Report: By Nov 1, students submit a progress report detailing the work completed so far, challenges encountered, and any adjustments to the initial plan.
    • Final Presentation: On Nov 27, students present their project findings to the class, providing a clear overview of their approach, results, and conclusions.
    • Final Report: By Dec 5, students submit a final project report written in the style of a machine learning conference paper. The report should follow the ICML template and include the motivation, research question, hypothesis, approach, results, and discussion.

Participation in every class is expected since discussion is crucial to the seminar format.

Syllabus and Presentation Sign-Up

Sign-up Instructions
  • Sign up for two presentation slots at bit.ly/cs886signup (Sheet 1) to receive full credit.
  • Each section allows up to three students to sign up.
  • Use the embedded Google Sheet below (or the link above) to sign up for presentations by adding a comment with your name in the desired cell (only the first comment in each cell will be considered valid).
Presentation Preparation Guidelines
  • Time Management: Each group has 80 minutes (discussions included). Please make sure to stay within this time limit.
  • Participation: Ensure that every member of the group participates in the presentation. During the presentation, please clearly state your name so each presenter can be marked individually. You are free to choose your presentation format (whether focusing on depth or breadth), but make sure to effectively teach the audience about the topic. Expect questions and discussions during the presentation, and be prepared to engage with the class.
  • Collaboration: You can find your teammates' contact information in the shared email thread. Please collaborate and decide on the paper(s) you would like to present. While you are encouraged to choose from the list provided, you are also welcome to select other relevant papers as long as they align with the course topics.
  • Slides: Please work together to create a cohesive set of slides. Collaboration-friendly platforms such as Google Slides or Overleaf (with LaTeX Beamer) are recommended, but feel free to use any tool you're comfortable with. Please meet offline or virtually to ensure smooth coordination and preparation.
Presentation Grading Rubric

The presentations will be graded on an individual basis (per presenter) according to the following criteria:

  • Relevance to Topic:
    • 4 - Excellent: Fully adheres to the assigned topic and effectively teaches the audience about the topic.
    • 3 - Good: Mostly adheres to the topic with minor tangents; still effectively teaches core aspects.
    • 2 - Satisfactory: Partially adheres to the topic but misses key elements or strays too far.
    • 1 - Needs Improvement: Significantly deviates from the assigned topic; does not teach the relevant material effectively.
  • Content:
    • 4 - Excellent: Thorough understanding, covering key points with insights and analysis.
    • 3 - Good: Good understanding, covers most key points.
    • 2 - Satisfactory: Basic understanding, covers some key points but misses important points.
    • 1 - Needs Improvement: Limited understanding or significant gaps in coverage of key points.
  • Clarity, Organization, and Visuals:
    • 4 - Excellent: Clear and well-structured presentation with effective use of visuals that support the content.
    • 3 - Good: Mostly clear; could improve in structure or visual support.
    • 2 - Satisfactory: Somewhat unclear or disorganized.
    • 1 - Needs Improvement: Unclear, disorganized, or difficult to follow.
  • Engagement and Interaction:
    • 4 - Excellent: Actively engages the audience, encourages questions and discussion, responds thoughtfully.
    • 3 - Good: Engages the audience but doesn't fully encourage interaction; handles questions fairly well.
    • 2 - Satisfactory: Limited engagement with the audience, weak handling of questions or discussions.
    • 1 - Needs Improvement: No meaningful audience engagement; struggles to respond to questions or avoids interaction.
  • Collaboration:
    • 4 - Excellent: Seamless teamwork with all members contributing equally; smooth transitions between presenters.
    • 3 - Good: Good teamwork but with some minor imbalances in contribution or transitions between presenters.
    • 2 - Satisfactory: Uneven contribution from members; transitions are awkward or lacking in coordination.
    • 1 - Needs Improvement: Poor teamwork; significant imbalance in contributions or disjointed transitions between presenters.
  • Timeliness:
    • 4 - Excellent: Stays within the allocated time and pacing is excellent throughout.
    • 3 - Good: Mostly adheres to the time limit but with minor deviations; pacing is generally good.
    • 2 - Satisfactory: Significantly over or under the time limit; pacing is inconsistent.
    • 1 - Needs Improvement: Fails to adhere to the time limit; pacing is problematic throughout.
Grading Breakdown
  • Relevance to Topic: 20%
  • Content Depth: 30%
  • Clarity & Visuals: 20%
  • Engagement & Interaction: 15%
  • Collaboration: 10%
  • Timeliness: 5%
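
For students who want to estimate how the weights above interact with the 1-4 rubric levels, here is a minimal Python sketch. It rests on an assumption the syllabus does not state: that each criterion's level is divided by the top level (4) and the results are combined as a weighted sum on a 0-100 scale. The function name `presentation_score` is hypothetical, and the instructor's actual calculation may differ.

```python
# Weights copied from the Grading Breakdown (percent of the presentation grade).
WEIGHTS = {
    "Relevance to Topic": 20,
    "Content Depth": 30,
    "Clarity & Visuals": 20,
    "Engagement & Interaction": 15,
    "Collaboration": 10,
    "Timeliness": 5,
}

TOP_LEVEL = 4  # "4 - Excellent" is the highest rubric level


def presentation_score(levels):
    """Return a 0-100 score from per-criterion rubric levels (1-4).

    ASSUMPTION: levels are normalized by the top level and combined as a
    weighted sum. The syllabus does not specify this formula.
    """
    if set(levels) != set(WEIGHTS):
        raise ValueError("expected exactly one level for each criterion")
    for name, level in levels.items():
        if level not in (1, 2, 3, 4):
            raise ValueError(f"{name}: level must be 1, 2, 3, or 4")
    return sum(WEIGHTS[name] * levels[name] / TOP_LEVEL for name in WEIGHTS)


# Hypothetical example: strong content delivery, but over the time limit.
example = {
    "Relevance to Topic": 4,
    "Content Depth": 3,
    "Clarity & Visuals": 4,
    "Engagement & Interaction": 3,
    "Collaboration": 4,
    "Timeliness": 2,
}
print(presentation_score(example))  # 86.25
```

Under this assumed formula, Content Depth carries the most weight: dropping it from level 4 to level 3 costs 7.5 points, while the same drop in Timeliness costs 1.25.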

Note: Make sure that all members contribute and that each presenter introduces themselves clearly at the start of their part.

Google Sheet Sign-up:
Date    Section  Section Topic                 Recommended Papers  Presenters
Sep 4   1        Introduction Lecture
        2        Course Logistics
Sep 11  3        Seq2Seq Learning                                  Gurjot Singh, Delara Forghani, Luke Rivard
        4        Optimization                                      Songcheng Cai, Amber Wang, Ruoxi Ning
Sep 18  5        Variational Autoencoders                          Achint Soni, Andy Zheng, Wentao Zhang
        6        Knowledge Distillation                            Hala Sheta, Peter Pan, Gurjot Singh
Sep 25  7        Transformer
        8        Pretraining
Oct 2   9        Non-Autoregressive Models
        10       Evaluation 1
Oct 9   11       Interpretability
        12       Mechanistic Interpretability
Oct 23  13       Knowledge Editing
        14       Adversarial Robustness
Oct 30  15       Prompting
        16       Prompt Engineering
Nov 6   17       Alignment
        18       Evaluation 2
Nov 13  19       Training Scalability
        20       Inference/Model Scalability
Nov 20  21       Reasoning
        22       Multimodality
Nov 27  23       Project Presentations
        24       Project Presentations
Student Responsibilities:
  • Presentation: Collaborate with your co-presenters, choose a subset of papers, make slides, and lead your selected section.
  • Attendance: Attendance for your presentation day is mandatory. If an emergency arises, notify the instructor as soon as possible to make alternative arrangements.

Course Policies

Academic Integrity: All submitted work must be original. Plagiarism is not permitted.

Attendance: Regular attendance and active participation are expected. Absences should be communicated in advance, and unexcused absences may impact the participation grade.

Accommodations: Students requiring accommodations should reach out early in the semester to discuss necessary arrangements.


Acknowledgment

This syllabus was adapted from the syllabus of CS187 at Harvard, developed by my PhD co-advisor, Professor Stuart Shieber. The grading structure was adapted from the website of Professor Pengyu Nie's course, CS 846.