Generative-AI Summarization

Eric Lease Morgan, © University of Notre Dame
Converted into TEI-conformant markup by Eric Lease Morgan

emorgan@nd.edu

Available through the Distant Reader.

This document is distributed under a GNU Public License.


This is the original publication of this posting.

Date: 2024-06-27
Keywords: libraries and librarianship; large-language models (LLMs); summarization
Revision: 2024-06-27, Eric Lease Morgan, initial TEI encoding

Ann Blair's book Too Much to Know overflows with techniques that scholars before the modern age used to deal with information overload. [1] One of the more frequently used techniques is summarization. With the advent of generative-AI, it is almost trivial to create more-than-plausible summaries of documents.

The linked Python script is an example. Given the path to a plain text file, the script loads a configured large-language model, vectorizes the file, compares the two, and outputs a three-sentence summary. I enhanced the script to work in batch, and I have used the technique to summarize collections of items:

  * each chapter in each book written by Jane Austen
  * 250 journal articles on the topic of rheumatoid arthritis
  * another 250 journal articles on the topic of climate change
  * 130 articles on the topic of cataloging
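The batch workflow described above can be sketched roughly as follows. This is not the linked script; it is a minimal illustration that assumes the collection is a directory of `*.txt` files. The names `summarize_directory` and `first_sentences` are hypothetical, and `first_sentences` is only a stand-in that returns the first three sentences of a file. It is not generative-AI; in practice one would pass in a function that calls a configured large-language model.

```python
import re
from pathlib import Path
from typing import Callable, Dict


def first_sentences(text: str, count: int = 3) -> str:
    """Placeholder summarizer (an assumption, NOT generative-AI).

    Returns the first `count` sentences so the batch loop can run
    without a model; swap in a real model call in practice.
    """
    normalized = " ".join(text.split())  # collapse runs of whitespace
    sentences = re.split(r"(?<=[.!?])\s+", normalized)
    return " ".join(sentences[:count])


def summarize_directory(
    directory: str,
    summarize: Callable[[str], str] = first_sentences,
) -> Dict[str, str]:
    """Summarize every *.txt file in `directory`, keyed by file name."""
    summaries: Dict[str, str] = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        summaries[path.name] = summarize(text)
    return summaries
```

Calling `summarize_directory("some-collection")` returns a dictionary mapping each file name to its summary, which can then be printed or saved as a list. Passing the summarizer in as a parameter keeps the batch loop independent of whichever model is configured.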

For any given document there is no single 100% correct summary; everybody will summarize a document differently. That said, the results of this automated process look pretty good to me. Moreover, each list of summaries addresses difficult-to-answer questions such as:

  * how can Jane Austen's works be characterized?
  * what is rheumatoid arthritis and what are some of its treatments?
  * how is climate change being manifested across the globe?
  * how has the practice of cataloging changed over time?

The lists of summaries may themselves be deemed information overload, and one might consider summarizing the summaries. Such an exercise is left to the reader.
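One possible way to approach that exercise is to reduce the list in rounds: group the summaries, summarize each group, and repeat until a single summary remains. The sketch below is only an illustration under that assumption. The name `summarize_summaries` and the `group_size` parameter are hypothetical, and `summarize` is any function that takes text and returns a shorter text, such as a call to a large-language model.

```python
from typing import Callable, List


def summarize_summaries(
    summaries: List[str],
    summarize: Callable[[str], str],
    group_size: int = 10,
) -> str:
    """Reduce many summaries to one by summarizing them in groups."""
    if not summaries:
        raise ValueError("no summaries to reduce")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    current = list(summaries)
    # Each round shrinks the list, so the loop always terminates.
    while len(current) > 1:
        groups = [
            current[i:i + group_size]
            for i in range(0, len(current), group_size)
        ]
        current = [summarize("\n".join(group)) for group in groups]
    return current[0]
```

Grouping keeps each request to the summarizer small, which matters when a model accepts only a limited amount of input at once. A list holding a single summary is returned unchanged.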

I believe libraries and librarians ought to learn how to exploit generative-AI for summarization purposes. Just as the migration from printed cards to MARC transformed how libraries hosted catalogs, migrating from hand-crafted summaries to computed summaries will transform how information overload is managed.

[1] Blair, Ann. 2010. Too Much to Know: Managing Scholarly Information before the Modern Age. New Haven, Conn.: Yale University Press.