Towards a Bibliography for AI Systems
Abstract: To date, much bibliographic study of digital textuality has focused on the materiality of computation—forensic studies of the hard drive's textual inscriptions, for example, or vertical analysis of the layered software and hardware of a single ebook. Less clear, however, is how such approaches might apply to the texts generated by a large language model (LLM) such as ChatGPT. While the data for LLMs exists somewhere, that existence is far more diffuse, abstract, and distributed than that of a discrete digital document or bound set of files. Is the textual data undergirding an LLM inscribed in any single place to which we could point, in the way that bibliographers point to a particular witness of a particular state of a particular edition of a particular book, or does this new medium require a more capacious bibliographical approach? The "black box" constructed by the corporate owners of popular LLMs further obfuscates bibliographical investigations, forcing researchers to speculate about material realities through indirect clues rather than through direct experience of material substrates.
In this talk, Cordell argues that a bibliography for AI systems must bring together two related traditions: the sociological school of bibliography and book history, which forefronts the linked technological, social, economic, and artistic contexts through which books come into being, and the growing set of approaches gathered under the mantle of "data archaeology," which seek to outline the similarly linked contexts through which datasets are created, distributed, and accessed. Such a bibliography triangulates between materiality, infrastructure, and labor to describe the systems of digitization, archiving, extraction, and representation that lead from digitized and born-digital sources to the strings of text output by LLMs in response to user prompts. Just as D.F. McKenzie argued bibliographers should "describe not only the technical but the social processes of texts' transmission," AI systems require a bibliography that studies the diffuse and often obscured sociology of textual data.
Time & Place
08.06.2023 | 16:00 c.t. - 18:00
Hörsaal D (Lecture Hall D), Henry-Ford-Bau (HFB)
Garystr. 35-37