Congratulations to iSchool Professor and Textual Studies faculty member Melanie Walsh, who has received a Digital Humanities Advancement Grant from the National Endowment for the Humanities!
Walsh is a co-project director with Matthew Wilkens and David Mimno from the Department of Information Science at Cornell University. Their project, “BERT for Humanists,” will develop case studies and professional development workshops on the use of BERT (Bidirectional Encoder Representations from Transformers) for humanities scholars and students interested in large-scale text analysis.
We asked Walsh to tell us a little more about BERT, and how AI and machine reading are, and will continue to be, pertinent to literary and humanistic study. She kindly provided us with the following response.
Large language models (LLMs) like Google’s BERT and OpenAI’s GPT-3 can now generate text, answer questions, summarize documents, and translate between languages—both human and programming—with levels of accuracy and quality that have never been seen before. Most recently, in November 2022, OpenAI released a chatbot called ChatGPT, built on the slightly revised GPT-3.5 model, which launched the impressive capabilities and concerning limitations of LLMs into the public eye like no model had before.
The BERT for Humanists project, which received an NEH Level I Digital Humanities Advancement Grant in 2021 and an NEH Level III Digital Humanities Advancement Grant in 2023, seeks to make LLMs accessible to humanities scholars so that they can better use, understand, and critique them. The project explores how these technologies, which have revolutionized the field of natural language processing (NLP), might be applied to humanistic text collections and enable scholars to answer humanistic research questions. For example, LLMs can potentially be used to trace how literary genres change over time, analyze how fictional characters interact with each other, or identify and track migration patterns from the locations mentioned in historical documents.
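To make one of these possibilities concrete: models like BERT represent each passage of text as a vector (an “embedding”), and passages with similar meanings receive nearby vectors. The toy sketch below, with made-up three-dimensional vectors standing in for real BERT embeddings and hypothetical passage names, shows how cosine similarity can rank passages by semantic closeness, the basic operation behind tasks like tracing genre across a corpus.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for three passages; in practice these would be
# high-dimensional vectors produced by a model such as BERT.
passages = {
    "gothic_novel_excerpt": [0.9, 0.1, 0.3],
    "another_gothic_excerpt": [0.8, 0.2, 0.4],
    "railway_timetable": [0.1, 0.9, 0.0],
}

# Rank the other passages by similarity to the first excerpt.
query = passages["gothic_novel_excerpt"]
ranked = sorted(
    (name for name in passages if name != "gothic_novel_excerpt"),
    key=lambda name: cosine_similarity(query, passages[name]),
    reverse=True,
)
print(ranked)  # the second gothic excerpt ranks above the timetable
```

In a real workflow the vectors would come from a pretrained model (for example, via the Hugging Face `transformers` library) rather than being written by hand, but the comparison step is the same.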
However, there are serious barriers to humanities scholars adopting these technologies in their work, as well as other challenges and concerns. The texts that we study in the humanities are often trickier and more complex than the texts used and studied in NLP contexts — humanistic texts are typically longer, more archaic, and more ambiguous — and NLP tools are not typically designed with humanities scholars’ skillsets or use cases in mind. In addition, many ethical, social, and legal questions have been raised about these models, such as their well-documented biases and their potential to harm living people. For example, LLMs are “trained” on billions of texts and images scraped from the web, including the works of living artists and writers, who are not notified, credited, or compensated for their work. These technologies may consequently harm, exploit, and displace living artists. The BERT for Humanists project thus seeks to bridge the technical gap between LLMs and the humanities, but also to inform and empower humanities scholars so that they can appropriately critique these models and fully understand their flaws and limitations.