<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN"
"http://www.infomotions.com/alex/dtd/tei2.dtd" [
<!ENTITY % TEI.XML         'INCLUDE' >
<!ENTITY % TEI.prose       'INCLUDE' >
<!ENTITY % TEI.linking     'INCLUDE' >
<!ENTITY % TEI.figures     'INCLUDE' >
<!ENTITY % TEI.names.dates 'INCLUDE' >
<!ATTLIST xptr   url CDATA #IMPLIED >
<!ATTLIST xref   url CDATA #IMPLIED >
<!ATTLIST figure url CDATA #IMPLIED >
]> 
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Whenever you have a hammer, everything begins to look like a nail: Hypernyms</title> 
        <author>Eric Lease Morgan</author>
        <respStmt>
          <resp>converted into TEI-conformant markup by</resp>
          <name>Eric Lease Morgan</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>Eric Lease Morgan, &#169; University of Notre Dame</publisher>
        <address>
        	<addrLine>emorgan@nd.edu</addrLine>
        </address>
        <distributor>Available through the Distant Reader at <xptr url='https://distantreader.org/blog/hypernyms/' />.</distributor>
        <idno type='reader'>43</idno>
        <availability status='free'>
          <p>This document is distributed under a GNU Public License.</p>
        </availability>
      </publicationStmt>
      <notesStmt>
       <note type='abstract'>Whenever you have a hammer, everything begins to look like a nail, and my newest hammer outputs network graphs describing sets of words and their associated hypernyms.</note>
      </notesStmt>
      <sourceDesc>
        <p>This is the original source of this publication.</p>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <creation>
        <date>2023-11-06</date>
      </creation>
      <textClass>
        <keywords>
          <list><item>hypernyms</item><item>hacks</item></list>
        </keywords>
      </textClass>
    </profileDesc>
    <revisionDesc>
      <change>
<date>2023-11-06</date>
<respStmt>
<name>Eric Lease Morgan</name>
</respStmt>
<item>initial TEI encoding</item>
</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <front>
    </front>
    <body>
      <div1>
<p>
Whenever you have a hammer, everything begins to look like a nail, and my newest hammer outputs network graphs describing sets of words and their associated hypernyms.
</p>

<p>
As a librarian, I am always interested in the concept of aboutness; very often I ask myself, "What is this item or corpus about?" Traditionally, librarians read content, identify themes, peruse a controlled vocabulary for authorized themes, and make assignments accordingly. A more modern technique might be to calculate statistically significant words using an algorithm like Term-Frequency Inverse Document Frequency (TFIDF). Another approach might be to apply topic modeling to a corpus, observe the resulting topics, and identify aboutness terms. I have used all of these techniques, many times.
</p>
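<p>
To make the TFIDF idea concrete, here is a minimal sketch of the computation in plain Python. It is illustrative only; in practice one might reach for a library such as scikit-learn, and the two tiny "documents" below are invented for the example:
</p>

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for every word in each document.

    docs is a list of already-tokenized documents (lists of words)."""
    n = len(docs)
    # document frequency: in how many documents does each word appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n / df[word])
            for word, count in tf.items()
        })
    return weights

docs = [
    "achilles spoke to the achaeans".split(),
    "ulysses spoke to the suitors".split(),
]
scores = tfidf(docs)
# "achilles" appears in only one document, so it outscores words
# shared by both documents, such as "spoke" (whose weight is zero here)
```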

<p>
My latest technique is to apply the concept of hypernyms to sets of words. For all intents and purposes, hypernyms are broader terms of given terms. For example, given the terms "January" and "February", a broader term might be "month". Similarly, given the terms "France" and "Germany", a broader term might be "country". Luckily, the venerable tool called WordNet implements (models) the concept of hypernyms. Given two WordNet things (called "synsets"), it is possible to compute their closest hypernym. I can repeat this process for a given lexicon (a set of words of interest), and output the result as a network graph. Thus, my new hammer was born, hypernyms.py. Given a lexicon, hypernyms.py: 1) identifies a synset for each word, 2) compares the resulting synset with every other synset, 3) finds the closest hypernym, and 4) outputs a graph modeling language file where nodes are the lexicon words or hypernyms, and edges are weighted with floating point numbers denoting the distances between the nodes.
</p>
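<p>
The steps above can be sketched in a few lines of Python. This is not hypernyms.py itself: where the real script consults WordNet synsets, the toy taxonomy and the simple step-counting distance below are stand-ins invented for illustration, but the pairwise comparison (steps 2 through 4) is the same shape:
</p>

```python
# A toy version of the hypernyms.py idea: instead of WordNet synsets,
# use a tiny hand-made taxonomy mapping each word to its hypernym.
# (Both the taxonomy and the distance measure are illustrative.)
TAXONOMY = {
    "january":  "month",    "february": "month",
    "france":   "country",  "germany":  "country",
    "month":    "measure",  "country":  "region",
    "measure":  "entity",   "region":   "entity",
}

def ancestors(word):
    """Return the chain of hypernyms above a word, nearest first."""
    chain = []
    while word in TAXONOMY:
        word = TAXONOMY[word]
        chain.append(word)
    return chain

def closest_hypernym(a, b):
    """Find the nearest shared hypernym of two words, plus the
    number of steps separating the words through that hypernym."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for i, hypernym in enumerate(chain_a):
        if hypernym in chain_b:
            return hypernym, (i + 1) + (chain_b.index(hypernym) + 1)
    return None, None

def edges(lexicon):
    """Steps 2-4: compare every pair of words, keep the closest
    hypernym, and emit (word, hypernym, distance) edges."""
    result = set()
    for i, a in enumerate(lexicon):
        for b in lexicon[i + 1:]:
            hypernym, distance = closest_hypernym(a, b)
            if hypernym:
                result.add((a, hypernym, distance))
                result.add((b, hypernym, distance))
    return result
```

Running `edges(["january", "february", "france"])` links "january" and "february" to "month" at distance 2, while "france" only meets the calendar words at the much more distant "entity".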

<p>
As an example, I applied this technique to the results of topic modeling. As you may or may not know, topic modeling is an unsupervised machine learning process used to enumerate latent themes in a corpus. The results of topic modeling are lists of themes, where each theme is a list of words. These words appear close to each other in the given corpus, and are therefore considered topics. For example, if I topic model Homer's Iliad and Odyssey, then the resulting themes/topics might be listed like this:
</p>

<p rend='pre'>        themes  weights                                           features
         house  1.08442       house men ulysses father see home took made 
       trojans  0.40099  trojans spear hector achaeans fight jove ships...
      achilles  0.18074  achilles peleus priam hector city women body r...
     agamemnon  0.11047  agamemnon ships achaeans atreus nestor king jo...
           sea  0.10438         sea ship men circe wind island water cave 
        horses  0.10101  horses menelaus diomed agamemnon nestor tydeus...
       ulysses  0.09161  ulysses telemachus suitors penelope house euma...
      alcinous  0.03744  alcinous phaeacians clothes ulysses stranger t...
</p>

<p>
The student, researcher, or scholar might then take all of these words as their lexicon, compare them to each other to identify hypernyms, output the result as a network graph, and visualize it. In the following visualization, the original keywords are in black, and the computed hypernyms are in red. Thus, another way to interpret the topic model is to say Homer's Iliad and Odyssey are about mythical beings, groups, persons, etc.:
</p>
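<p>
The graph itself is serialized as a graph modeling language (GML) file, which tools like Gephi can open directly. A minimal writer might look like the following sketch; the `value` attribute as the edge-weight key is an assumption borrowed from common GML usage, not necessarily what hypernyms.py emits:
</p>

```python
def to_gml(edges):
    """Serialize (source, target, weight) triples as a minimal GML
    (Graph Modelling Language) string. Node ids are assigned in
    order of first appearance."""
    nodes = {}
    for source, target, _ in edges:
        for name in (source, target):
            nodes.setdefault(name, len(nodes))
    lines = ["graph ["]
    for name, node_id in nodes.items():
        lines += ["  node [", f"    id {node_id}", f'    label "{name}"', "  ]"]
    for source, target, weight in edges:
        lines += ["  edge [",
                  f"    source {nodes[source]}",
                  f"    target {nodes[target]}",
                  f"    value {weight}",
                  "  ]"]
    lines.append("]")
    return "\n".join(lines)

gml = to_gml([("january", "month", 0.5), ("february", "month", 0.5)])
```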

<p rend='center'><figure url='./hypernyms.png'/><lb/>visualizing relationships between keywords (in black) and hypernyms (in red)</p>


<p>
Applying a technique called modularity to the graph, one can enumerate clusters of nodes, and visualizing such things brings the beings, groups, and persons together:
</p>

<p rend='center'><figure url='./modularities.png'/><lb/>visualizing mathematical modularities (clusters) of keywords and hypernyms</p>
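<p>
For the curious, modularity has a simple definition: the fraction of a graph's edges that fall inside the proposed clusters, minus the fraction expected if edges were rewired at random. Tools like Gephi compute it for you, but Newman's formula for an unweighted, undirected graph can be sketched directly; the two-triangle example graph below is invented for illustration:
</p>

```python
def modularity(edges, communities):
    """Newman's modularity Q: the fraction of edges inside the
    communities minus the fraction expected at random, given the
    nodes' degrees. Higher Q means a better clustering."""
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    q = 0.0
    for community in communities:
        members = set(community)
        # edges with both endpoints inside this community
        inside = sum(1 for a, b in edges if a in members and b in members)
        # sum of degrees of the community's nodes
        total_degree = sum(degree.get(node, 0) for node in members)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge: splitting at the
# bridge scores higher than lumping all six nodes together.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
lumped = modularity(edges, [{0, 1, 2, 3, 4, 5}])
# split is about 0.357, lumped is 0.0
```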

<p>
In summary, modeling text in terms of hypernyms is a quick and easy way to grasp the aboutness of a text. The script -- hypernyms.py -- can be applied to any list of words. Such words might be the N most frequent words in a text, a set of computed keywords, a list of named entities, etc. Fun with data science applied to words. The script, a sample lexicon, a graph modeling language file, a Gephi file, and a couple of images are all included in the attached file -- <xref url='./hypernyms.zip'>hypernyms.zip</xref>. Enjoy.
</p>

</div1>

    </body>
    <back>
    </back>
  </text>
</TEI.2>
