AI and Climate Change

Eric Lease Morgan

Converted into TEI-conformant markup by Eric Lease Morgan.

Eric Lease Morgan, © University of Notre Dame

emorgan@nd.edu

Available through the Distant Reader at .

This document is distributed under a GNU Public License.

I have recently applied a bit of generative-AI to a set of journal articles on the topic of climate change, and this blog posting outlines how I did the work. TL;DNR: 1) create a data set of journal articles, 2) index them using a technique called RAG, 3) provide a Web-based interface to query the index.

This is the original posting of this item.

Date: 2024-05-06. Keywords: libraries and librarianship; chatbots; retrieval-augmented generation. Revision history: 2024-05-06, Eric Lease Morgan, initial TEI encoding.

Create a data set

I first used a tool of my own design -- Index to the Distant Reader -- to identify a set of scholarly journal articles on the topic of climate change. Each article in the set ought to include the phrase "climate change", be a journal article garnered from the Directory of Open Access Journals, include the word "climate" as a computed keyword, and include the word "climate" in the title. You can see the result of such a query at the Index.
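
The Index's own query syntax isn't reproduced here, so the following is only a minimal Python sketch of what those four criteria amount to when applied to records already in memory. The `Article` class, its fields, and both function names are hypothetical, and the title test assumes that "the word" means a whole-word, case-insensitive match.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Article:
    """Hypothetical record shape; the Index's real schema is not described here."""
    title: str
    source: str                  # e.g. "Directory of Open Access Journals"
    full_text: str
    keywords: list[str] = field(default_factory=list)  # computed keywords

def matches_climate_query(article: Article) -> bool:
    """True only when an article meets all four criteria described in the text."""
    return (
        "climate change" in article.full_text.lower()              # the phrase
        and article.source == "Directory of Open Access Journals"  # the source
        and any(k.lower() == "climate" for k in article.keywords)  # computed keyword
        and re.search(r"\bclimate\b", article.title, re.IGNORECASE) is not None  # title
    )

def select_articles(articles: list[Article]) -> list[Article]:
    """Keep only the articles that satisfy every criterion."""
    return [a for a in articles if matches_climate_query(a)]

if __name__ == "__main__":
    doaj = "Directory of Open Access Journals"
    sample = [
        Article("Climate and Coastal Cities", doaj,
                "Climate change threatens low-lying coasts.", ["climate", "coast"]),
        Article("Urban Heat Islands", doaj,
                "Climate change amplifies urban heat.", ["climate", "heat"]),
    ]
    print([a.title for a in select_articles(sample)])  # ['Climate and Coastal Cities']
```

Because the criteria are joined with "and", each one narrows the set further; the second sample article is dropped only because its title lacks the word "climate".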

I then submitted the result to another tool of my own design, the Distant Reader. The result is a data set -- affectionately called a "study carrel". Given a set of texts, the Reader programmatically curates the collection, to the best of its ability, and outputs sets of files describing the collection. For example, it creates rudimentary bibliographies (txt, html, json). It also creates a simple summary page as well as a browsable interface to the whole. Heck, you can even download the carrel.
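
As an illustration of what a rudimentary bibliography in three formats might involve, here is a minimal Python sketch that renders the same records as plain text, HTML, and JSON. It is not the Distant Reader's code; the `BibEntry` fields and the layout of each format are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from html import escape

@dataclass
class BibEntry:
    """Hypothetical bibliographic fields; the Reader's real output may differ."""
    identifier: str
    author: str
    title: str
    date: str

def to_txt(entries: list[BibEntry]) -> str:
    """One indented block of fields per item, separated by blank lines."""
    blocks = [
        f"{e.identifier}\n  author: {e.author}\n  title: {e.title}\n  date: {e.date}"
        for e in entries
    ]
    return "\n\n".join(blocks) + "\n"

def to_html(entries: list[BibEntry]) -> str:
    """A bare HTML list; values are escaped so titles like 'A & B' stay valid."""
    items = "\n".join(
        f"  <li>{escape(e.author)}. <cite>{escape(e.title)}</cite> ({escape(e.date)})</li>"
        for e in entries
    )
    return f"<ul>\n{items}\n</ul>\n"

def to_json(entries: list[BibEntry]) -> str:
    """A JSON array with one object per item."""
    return json.dumps([asdict(e) for e in entries], indent=2)

if __name__ == "__main__":
    carrel = [BibEntry("article-001", "Doe, Jane", "Heat & Health", "2021")]
    print(to_txt(carrel))
    print(to_html(carrel))
    print(to_json(carrel))
```

Keeping one in-memory record type and several small renderers means each output format stays consistent with the others.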

I now have a computable collection.

Index

Using a technique called RAG (retrieval-augmented generation), one can vectorize ("index") content and then make it searchable. With the support of a generous Amazon Web Services sponsorship ("Thanks Luke and Brian!"), I wrote such an indexing application. The script loops through each document in the carrel, vectorizes it, and caches the result.
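
The indexing loop described above (for each document: vectorize, then cache) can be sketched in standard-library Python. This is only a sketch under stated assumptions: the carrel is treated as a directory of plain-text files, the cache is a JSON file, and the hypothetical `vectorize` function is a hashed bag-of-words stand-in for the embedding model a real RAG pipeline would call. Real pipelines also usually split long documents into smaller chunks before vectorizing them.

```python
import hashlib
import json
import math
import re
import tempfile
from pathlib import Path

DIM = 1024  # hypothetical vector length

def vectorize(text: str, dim: int = DIM) -> list[float]:
    """Stand-in for an embedding model: a hashed, L2-normalized bag of words."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z]+", text.lower()):  # ASCII words only, for simplicity
        # A stable digest (not the built-in hash(), which is salted per process)
        # keeps buckets identical across runs; this matters because vectors are cached.
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def build_index(carrel_dir: Path, cache_file: Path) -> dict[str, list[float]]:
    """Vectorize every .txt file in carrel_dir, reusing cache_file if it exists."""
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    index = {}
    for doc in sorted(carrel_dir.glob("*.txt")):
        text = doc.read_text(encoding="utf-8", errors="ignore")
        index[doc.name] = vectorize(text)
    cache_file.write_text(json.dumps(index), encoding="utf-8")
    return index

if __name__ == "__main__":
    # Demo on a throwaway two-document "carrel".
    with tempfile.TemporaryDirectory() as tmp:
        carrel = Path(tmp)
        (carrel / "a.txt").write_text("Climate change and sea level rise.", encoding="utf-8")
        (carrel / "b.txt").write_text("Drought, heat, and agriculture.", encoding="utf-8")
        index = build_index(carrel, carrel / "index.json")
        print(sorted(index))  # ['a.txt', 'b.txt']
```

Caching pays off because vectorizing is the expensive step. A second call with the same cache file returns the stored vectors without touching the documents again.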

Finally, by exploiting a rudimentary chat interface, one can query the index, get a human-readable result, and get a list of the items from whence the result was generated. (See the interface and chat.py. [9]) This interface is temporary, and if this link does not work, then drop me a line and I'll see if I can get it back up and running. The quality of the results is subject to the computing adage "garbage in, garbage out". The original content needs to be balanced. The indexing needs to be thorough. The queries applied to the index need to be thoughtful; querying a collection about climate change for the definition of love is only asking for trouble.
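
Querying works the other way around: vectorize the question with the same function used at indexing time, rank the cached vectors by cosine similarity, and hand the best-matching texts to a language model along with the question. The sketch below assumes an index shaped like `{filename: vector}` and takes the model as a plain callable `llm`, because the actual model and prompt behind chat.py are not described here; `cosine`, `retrieve`, `build_prompt`, and `answer` are hypothetical names.

```python
import math
from typing import Callable

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Names of the k cached documents most similar to the query vector."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: dict[str, str]) -> str:
    """Wrap retrieved passages, labelled by source name, around the question."""
    context = "\n\n".join(f"[{name}]\n{text}" for name, text in passages.items())
    return (
        "Answer the question using only the context below, "
        "and name the bracketed sources you relied on.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question: str, query_vec: list[float], index: dict[str, list[float]],
           texts: dict[str, str], llm: Callable[[str], str],
           k: int = 3) -> tuple[str, list[str]]:
    """Return the model's reply plus the documents it was given."""
    sources = retrieve(query_vec, index, k)
    prompt = build_prompt(question, {name: texts[name] for name in sources})
    return llm(prompt), sources

if __name__ == "__main__":
    toy_index = {"a.txt": [1.0, 0.0], "b.txt": [0.0, 1.0], "c.txt": [0.9, 0.1]}
    toy_texts = {"a.txt": "Sea levels are rising.", "b.txt": "Crop yields vary.",
                 "c.txt": "Coastal flooding increases."}

    def stub_llm(prompt: str) -> str:
        # Placeholder: a real deployment would send the prompt to a model here.
        return f"(model reply to a {len(prompt)}-character prompt)"

    reply, sources = answer("What happens to coasts?", [1.0, 0.0],
                            toy_index, toy_texts, stub_llm, k=2)
    print(reply)
    print(sources)  # ['a.txt', 'c.txt']
```

Returning the source names alongside the reply is what lets the interface show the items from which a result was generated.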

Conclusion

Generative-AI is a thing. It behooves us here in Library Land to know how to exploit the technology. Like any other technology, it can be used well, poorly, or nefariously. Only after experimentation and learning will we be able to use generative-AI effectively. More specifically, libraries too can curate content, use RAG to index it, and provide interfaces to the index. Such services would be an additional tool in our toolbox and would supplement the learning process.