<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN"
"http://www.infomotions.com/alex/dtd/tei2.dtd" [
<!ENTITY % TEI.XML         'INCLUDE' >
<!ENTITY % TEI.prose       'INCLUDE' >
<!ENTITY % TEI.linking     'INCLUDE' >
<!ENTITY % TEI.figures     'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!ATTLIST xptr   url CDATA #IMPLIED >
<!ATTLIST xref   url CDATA #IMPLIED >
<!ATTLIST figure url CDATA #IMPLIED >
]> 
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>AI and Climate Change</title> 
        <author>Eric Lease Morgan</author>
        <respStmt>
          <resp>converted into TEI-conformant markup by</resp>
          <name>Eric Lease Morgan</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>Eric Lease Morgan, &#169; University of Notre Dame</publisher>
        <address>
        	<addrLine>emorgan@nd.edu</addrLine>
        </address>
        <distributor>Available through the Distant Reader at <xptr url='https://distantreader.org/blog/ai-and-climate-change/' />.</distributor>
        <idno type='reader'>62</idno>
        <availability status='free'>
          <p>This document is distributed under a GNU Public License.</p>
        </availability>
      </publicationStmt>
      <notesStmt>
       <note type='abstract'>I have recently applied a bit of generative-AI to a set of journal articles on the topic of climate change, and this blog posting outlines how I did the work. TL;DNR: 1) create a data set of journal articles, 2) index them using a technique called RAG, 3) provide a Web-based interface to query the index.
</note>
      </notesStmt>
      <sourceDesc>
        <p>This is the original posting of this item.</p>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <creation>
        <date>2024-05-06</date>
      </creation>
      <textClass>
        <keywords>
          <list><item>libraries and librarianship</item><item>chatbots</item><item>Retrieval-augmented generation</item></list>
        </keywords>
      </textClass>
    </profileDesc>
    <revisionDesc>
      <change>
<date>2024-05-06</date>
<respStmt>
<name>Eric Lease Morgan</name>
</respStmt>
<item>initial TEI encoding</item>
</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <front>
    </front>
    <body>
      <div1>

<p>I have recently applied a bit of generative-AI to a set of journal articles on the topic of climate change, and this blog posting outlines how I did the work. TL;DNR: 1) create a data set of journal articles, 2) index them using a technique called RAG, 3) provide a Web-based interface to query the index.</p>


<div2><head>Create a data set</head>


<p>
I first used a tool of my own design -- Index to the Distant Reader -- to identify a set of scholarly journal articles on the topic of climate change. Each article in the set ought to include the phrase "climate change", be a journal article garnered from the Directory of Open Access Journals, include the word "climate" as a computed keyword, and include the word "climate" in the title. You can see the <xref url='http://index.distantreader.org/biblios?version=2.0&amp;operation=searchRetrieve&amp;maximumRecords=512&amp;recordSchema=marcxml&amp;facetLimit=32&amp;stylesheet=style-searchRetrieve.xsl&amp;query=%28%28%28%22climate+change%22%29+and+koha.ccode%3D%22DOAJ%22%29+and+dc.title%3Dclimate%29+and+dc.subject%3Dclimate'>result of such a query</xref>
 at the Index.
</p>
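
<p>
The link above is an ordinary search URL, and it can be rebuilt programmatically. The following is a minimal sketch, not the Index's own code. The parameter names, values, and query syntax are copied from the link itself; the function name build_search_url and its arguments are hypothetical and exist only for illustration.
</p>

<p><![CDATA[
```python
from urllib.parse import urlencode

# Base address taken from the link in the paragraph above.
BASE = "http://index.distantreader.org/biblios"


def build_search_url(phrase, collection, word, maximum=512):
    """Return a search URL for records matching all four criteria.

    build_search_url is a hypothetical helper, not part of the Index.
    """
    # Full-text phrase, collection code, title word, and keyword word,
    # joined left to right with "and", mirroring the original link.
    query = (
        f'((("{phrase}") and koha.ccode="{collection}")'
        f' and dc.title={word}) and dc.subject={word}'
    )
    parameters = {
        "version": "2.0",
        "operation": "searchRetrieve",
        "maximumRecords": str(maximum),
        "recordSchema": "marcxml",
        "facetLimit": "32",
        "stylesheet": "style-searchRetrieve.xsl",
        "query": query,
    }
    # urlencode percent-encodes the query and turns spaces into "+".
    return BASE + "?" + urlencode(parameters)


url = build_search_url("climate change", "DOAJ", "climate")
print(url)
```
]]></p>

<p>
Changing the phrase or the word is then enough to assemble a comparable collection on another topic, assuming the Index continues to accept the same parameters.
</p>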

<p>
I then submitted the result to another tool of my own design, the Distant Reader. The result is a data set -- affectionately called a "study carrel". Given a set of texts, the Reader programmatically curates the collection, to the best of its ability, and outputs sets of files describing the collection. For example, it creates rudimentary bibliographies (<xref url='http://carrels.distantreader.org/search-climate_change-reader/index.txt'>txt</xref>, <xref url='http://carrels.distantreader.org/search-climate_change-reader/index.xhtml'>html</xref>, <xref url='http://carrels.distantreader.org/search-climate_change-reader/index.json'>json</xref>). It also creates a simple <xref url='http://carrels.distantreader.org/search-climate_change-reader/index.htm'>summary page</xref> as well as a <xref url='http://carrels.distantreader.org/search-climate_change-reader/index.xml'>browsable interface</xref> to the whole. Heck, you can even <xref url='http://carrels.distantreader.org/search-climate_change-reader/index.zip'>download the carrel</xref>.
</p>

<p>
I now have a computable collection.
</p>
</div2>


<div2><head>Index</head>


<p>
Using a technique called RAG (retrieval-augmented generation), one can vectorize ("index") content and then make it searchable. With the support of a generous Amazon Web Services sponsorship ("Thanks Luke and Brian!"), I wrote such an <xref url='./bin/index.py'>indexing application</xref>. The script loops through each document in the carrel, vectorizes it, and caches the result.
</p>
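
<p>
To make the loop concrete, here is a minimal sketch of the vectorize-and-cache pattern. It is not the contents of index.py. As a loudly labeled assumption, it swaps in a toy word-hashing function for a real embedding model, and the names vectorize, index_carrel, and DIMENSIONS, as well as the JSON cache format, are all hypothetical.
</p>

<p><![CDATA[
```python
import hashlib
import json
import math
import re
from pathlib import Path

DIMENSIONS = 64  # hypothetical vector size; real embedding models use far more


def vectorize(text):
    """Toy stand-in for an embedding model: hash each word into a bucket."""
    vector = [0.0] * DIMENSIONS
    for word in re.findall(r"[a-z]+", text.lower()):
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % DIMENSIONS] += 1.0
    # Scale to unit length so vectors can later be compared by dot product.
    length = math.sqrt(sum(value * value for value in vector)) or 1.0
    return [value / length for value in vector]


def index_carrel(directory, cache):
    """Vectorize every plain-text file in a directory and cache the result."""
    index = {}
    for path in sorted(Path(directory).glob("*.txt")):
        index[path.name] = vectorize(path.read_text(encoding="utf-8"))
    # The cache is a single JSON file mapping file names to vectors.
    Path(cache).write_text(json.dumps(index), encoding="utf-8")
    return index
```
]]></p>

<p>
Caching matters because vectorizing is the expensive step; once the vectors are saved, queries can be answered without reprocessing the collection.
</p>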

</div2>


<div2><head>Search</head>


<p>
Finally, by exploiting a rudimentary chat interface, one can query the index, get a human-readable result, and get a list of the items from which the result was generated. (See the <xref url='https://5c0af9ffadb4b3d2ba.gradio.live'>interface</xref> and <xref url='./bin/chat.py'>chat.py</xref>.) This interface is temporary, and if this link does not work, then drop me a line and I'll see if I can get it back up and running. The quality of the results is subject to the computing adage "garbage in, garbage out". The original content needs to be balanced. The indexing needs to be thorough. The queries applied to the index need to be thoughtful; querying the collection for the definition of love is only asking for trouble.
</p>
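
<p>
The retrieval half of such a query, finding the items from which an answer would be generated, can be sketched as a similarity ranking. This is an illustration under stated assumptions and not the contents of chat.py: the function names cosine and search, the file names, and the three-dimensional vectors are all hypothetical, and the step where a language model writes the human-readable answer is left out entirely.
</p>

<p><![CDATA[
```python
import math


def cosine(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def search(index, query_vector, k=3):
    """Return the k items most similar to the query, best first."""
    scored = [(cosine(vector, query_vector), name) for name, vector in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Hypothetical three-dimensional vectors standing in for cached embeddings.
index = {
    "drought.txt": [0.9, 0.1, 0.0],
    "sea-level.txt": [0.1, 0.9, 0.0],
    "policy.txt": [0.0, 0.2, 0.9],
}

# The query vector would normally come from vectorizing the question.
sources = search(index, [1.0, 0.2, 0.0], k=2)
for name, score in sources:
    print(name, round(score, 3))
```
]]></p>

<p>
In a complete system, the text of the top-ranked items would be handed to a language model along with the question, which is why an unbalanced collection or a thin index shows up directly in the answers.
</p>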

</div2>


<div2><head>Conclusion</head>


<p>
Generative-AI is a thing. It behooves us here in Library Land to know how to exploit the technology. Like any other technology, it can be used well, poorly, or nefariously. Only after experimentation and learning will we be able to use generative-AI effectively. More specifically, libraries too can curate content, use RAG to index it, and provide interfaces to the index. Such would be an additional tool in our toolbox and supplement the learning process.
</p>

</div2>
</div1>

    </body>
    <back>
    </back>
  </text>
</TEI.2>
