Short JWNL tutorial

WordNet is an invaluable resource for NLP research. John Didion has developed JWNL, a Java library for accessing WordNet data programmatically. To access WordNet (WN) from Java, the following steps are necessary:

  1. Download WordNet.
  2. Add a dependency on JWNL to your project or download the library.
  3. Configure properties.xml so that JWNL knows where to find WordNet and which version is used.
  4. Create a Dictionary instance for querying WordNet.

Configuration

The configuration is stored in an XML file that tells JWNL where WordNet can be found. If you use a standard WN distribution, the path should end in dict, that is, it should point at the dict directory inside the WordNet installation, which holds the index and data files.

On GitHub, you will find two prepared properties files:

  • properties_min.xml uses only a minimum of the possible settings
  • properties.xml includes a rule-based morphological stemmer that allows you to query for inflected forms, e.g., houses, runs, dogs
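
Since JWNL relies on the configured path pointing at the dict directory, it can help to check that path before initializing the library. The following is a minimal sketch, not part of JWNL or of the tutorial code: the class name WordNetPathCheck, the method looksLikeWordNetDict, and the default path are made up for illustration. It assumes a standard WordNet distribution, whose dict directory contains the files index.noun and data.noun.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class WordNetPathCheck {

    /**
     * Returns true if the directory is named "dict" and contains the noun
     * index and data files of a standard WordNet distribution.
     * (Hypothetical helper, not part of JWNL.)
     */
    static boolean looksLikeWordNetDict(Path dir) {
        if (dir == null || !Files.isDirectory(dir)) {
            return false;
        }
        Path name = dir.getFileName();
        if (name == null || !name.toString().equals("dict")) {
            return false;
        }
        return Files.isRegularFile(dir.resolve("index.noun"))
                && Files.isRegularFile(dir.resolve("data.noun"));
    }

    public static void main(String[] args) {
        // Hypothetical default; pass your own path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "WordNet-3.0/dict");
        System.out.println(dir + " looks like a WordNet dict directory: "
                + looksLikeWordNetDict(dir));
    }
}
```

If the check fails, the dictionary path in the properties file is the first thing to inspect.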

Boilerplate code

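A singleton instance of Dictionary is used to query WordNet with JWNL. Setting it up takes two calls: JWNL.initialize(...) reads the properties file from an input stream, and Dictionary.getInstance() then returns the dictionary.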

Afterwards, you can query the dictionary for a lemma of your choice (try house, houses, dog). For each lemma, you also specify which of the four part-of-speech classes you are looking for: POS.ADJECTIVE, POS.ADVERB, POS.NOUN, or POS.VERB. For house you would choose POS.NOUN or POS.VERB. The whole process looks rather clumsy, so I listed the steps below:

  1. Lookup: Is the lemma in the dictionary?
    final IndexWord indexWord = dictionary.lookupIndexWord(pos, lemma);

    • If the lookup fails, indexWord is null.
  2. What different senses may the lemma have?
    final Synset[] senses = indexWord.getSenses();
  3. For each sense, we may get a short description of the sense, called the gloss.
    final String gloss = synset.getGloss();
  4. What other lemmas are in a synset?
    final Word[] words = synset.getWords();

    • For each word, we may get its lemma and its POS: word.getLemma(); and word.getPOS().getKey();
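
The steps above chain together as: look up, stop on a failed lookup, then walk the senses and the words of each sense. The sketch below mirrors only that control flow. It deliberately uses small stand-in types (LookupFlowSketch, Sense, and a map-based lexicon are hypothetical and exist only so the example runs without WordNet installed); with JWNL you would substitute the calls listed in the steps. The sample data is abbreviated from the house/v output of the tutorial program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LookupFlowSketch {

    /** Stand-in for a synset: member lemmas plus a gloss. Not a JWNL type. */
    static final class Sense {
        final List<String> lemmas;
        final String gloss;

        Sense(String gloss, String... lemmas) {
            this.gloss = gloss;
            this.lemmas = Arrays.asList(lemmas);
        }
    }

    /** Stand-in lexicon keyed by "lemma/posKey", e.g. "house/v". */
    private final Map<String, List<Sense>> entries = new HashMap<>();

    void add(String lemma, String posKey, Sense... senses) {
        entries.put(lemma + "/" + posKey, Arrays.asList(senses));
    }

    /** Steps 1 to 4: look up, bail out on a miss, then walk senses and words. */
    List<String> describe(String lemma, String posKey) {
        List<String> lines = new ArrayList<>();
        List<Sense> senses = entries.get(lemma + "/" + posKey); // step 1: lookup
        if (senses == null) {
            return lines; // failed lookup: nothing to report
        }
        int n = 1;
        for (Sense sense : senses) { // step 2: each sense of the lemma
            List<String> members = new ArrayList<>();
            for (String member : sense.lemmas) { // step 4: words in the synset
                members.add(member + "/" + posKey);
            }
            // step 3: the gloss describes the sense
            lines.add(n++ + " Lemmas: " + members + " (Gloss: " + sense.gloss + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        LookupFlowSketch lexicon = new LookupFlowSketch();
        lexicon.add("house", "v",
                new Sense("contain or cover", "house"),
                new Sense("provide housing for", "house", "put_up", "domiciliate"));
        lexicon.describe("house", "v").forEach(System.out::println);
    }
}
```

The early return on a failed lookup corresponds to the null check on indexWord in step 1; without it, step 2 would fail on a lemma that is not in the dictionary.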

Where to get it

The code for this tutorial is available on GitHub. You need to copy the template properties file(s) in src/main/resources before you can run the code. Given a lemma and a part-of-speech, the program returns the list of synsets that contain the lemma. For house/v the output looks like this:

Aug 23, 2013 9:13:40 AM net.didion.jwnl.dictionary.Dictionary doLog
INFO: Installing dictionary net.didion.jwnl.dictionary.FileBackedDictionary@6791d8c1
 1 Lemmas: [house/v] (Gloss: contain or cover; “This box houses the gears”)
 2 Lemmas: [house/v, put_up/v, domiciliate/v] (Gloss: provide housing for; “The immigrants were housed in a new development outside the town”)
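
The output writes each entry as lemma/key, for example house/v. The tutorial does not say how the program receives its input, so the following is only a sketch under the assumption that a query is given in that same notation. The class LemmaQuery is hypothetical, and the accepted one-letter keys (n, v, a, r for noun, verb, adjective, adverb) are assumed to match the keys that word.getPOS().getKey() returns.

```java
final class LemmaQuery {
    final String lemma;
    final char posKey;

    private LemmaQuery(String lemma, char posKey) {
        this.lemma = lemma;
        this.posKey = posKey;
    }

    /** Parses "lemma/k" where k is one of n, v, a, r (case-insensitive). */
    static LemmaQuery parse(String text) {
        int slash = text.lastIndexOf('/');
        // The slash must follow a non-empty lemma and precede exactly one character.
        if (slash <= 0 || slash != text.length() - 2) {
            throw new IllegalArgumentException("expected lemma/pos, got: " + text);
        }
        char key = Character.toLowerCase(text.charAt(slash + 1));
        if ("nvar".indexOf(key) < 0) {
            throw new IllegalArgumentException("unknown part-of-speech key: " + key);
        }
        return new LemmaQuery(text.substring(0, slash), key);
    }

    public static void main(String[] args) {
        LemmaQuery query = parse(args.length > 0 ? args[0] : "house/v");
        System.out.println("lemma=" + query.lemma + ", pos key=" + query.posKey);
    }
}
```

Mapping the parsed key to one of the POS constants is left out here, because that step depends on the JWNL classes.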

The Maven dependencies for JWNL and the necessary logging are declared in the project's pom.xml.

Links

  • [1] JWNL Sourceforge site
  • [2] JWNL Sourceforge Wiki with much more information
  • [3] WordNet 3.0 download

4 comments

    1. Thanks, Kirstie! I updated the pom.xml to use the following dependency:

      <dependency>
        <groupId>net.sf.jwordnet</groupId>
        <artifactId>jwnl</artifactId>
        <version>1.4_rc3</version>
      </dependency>
