Course code:



M - Intermediate

Class size limit:


Meets the following requirements:

  • QR - Quantitative Reasoning
  • HS - Human Studies

Typically offered:

Upon occasion

Computational text analysis (CTA) is an emerging field that uses computation to analyze texts. CTA draws on the fields of computer science, machine learning, computational linguistics, and literary theory. Machine learning and statistics make it possible to explore how language is used in particular contexts: how frequently different words appear, the sentiment of a word or text, and nuances in the ways words are associated with one another.

We will use CTA to engage in “Distant Reading”, a term coined by literary theorist Franco Moretti. Distant Reading stands in contrast to the more familiar “Close Reading”: a deep engagement with a particular text or a passage from a text. Distant Reading engages not with a particular text, but with a large corpus of texts: e.g., all novels published in English in the 20th century, all articles written in The New York Times and The Washington Post in the last decade, or the lyrics of all top-100 pop songs from the 1980s. Computational techniques applied to large collections of texts allow one to ask broad questions about structural and linguistic change over time and to look for patterns of language use that would not be evident from analysis of one or even several individual texts. Distant Reading, and computational text analysis more generally, is not intended to replace close reading, but to complement it.

We will use CTA to explore how power structures and systems, such as race, gender, and colonialism, manifest themselves in bodies of text. For example, CTA has been used to investigate Islamophobia, analyze race in US novels, explore settler colonialism in the Americas, and investigate shifts in anti-Asian sentiment in the US brought on by the COVID crisis.

Students who successfully complete this course will: 1) gain a conceptual understanding of various CTA techniques, including word frequency analysis, topic modeling, and sentiment analysis; 2) learn how to apply these techniques, both by using existing software and by writing their own code; 3) gain experience asking questions about power structures and systems (race, gender, colonialism) and how those structures manifest themselves in corpora of text; 4) learn how these questions of power can be explored using algorithmic methods; and 5) gain experience critiquing algorithmic methods through the lenses of race, gender, and colonialism.

Classes will be a mixture of lecture, group exercises, discussion, and live coding. Readings will include case studies and selections from literary theorists. Evaluation will be based on participation in discussion and in-class activities, several short coding/analysis exercises, several short reflection assignments, and a group project on a topic of the students’ choosing.


Either an introductory coding class (in any language) or a college-level course that introduced critical and/or literary theory. Students unsure about their background are encouraged to reach out to the instructors.
