ULAPI 8.0
Ultralingua Application Programming Interface (ULAPI)

Introduction

The Ultralingua Application Programming Interface (ULAPI) is a collection of C++ classes and functions that enable programmers to use Ultralingua's multi-lingual linguistic tools in their own applications. ULAPI forms the core of the Ultralingua dictionary applications for iPhone, iPad, Mac OS X, Windows, Windows Mobile, Palm's webOS, and Symbian, as well as the Ultralingua online dictionary.

This document gives a technical overview of the main ULAPI classes. If you have questions or suggestions, please contact us at dev@ultralingua.com. We are eager to make ULAPI as useful as possible for you, so please keep those ideas coming.

Getting started: sample programs

To get started using ULAPI, we recommend that you read this page and then investigate the sample programs. The samples/basics/basics.cpp sample will introduce you to a few utility classes, but the samples/stemming/stemming.cpp sample is the best place to see a first typical use of ULAPI's core services.

The quickest way to get the samples running is from a Unix prompt with g++ installed, using the "make" and "./getstems" commands described below.

The "make" command will build source/lib/libulapi.a, followed by sample programs in all the subdirectories of samples/. You will have received a collection of .uld and .ulc files along with the ULAPI source code, so the "./getstems" command will refer to wherever those files are located. The getstems operation should show you that "thought" is both a noun and a form of the verb "to think".

Most of the samples/ subdirectories include two applications. One, named something-demo.cpp (where "something" refers to the tool in question, as in "stemming-demo.cpp" or "conjugation-demo.cpp"), runs without command-line arguments and demonstrates examples of that tool. The other, named getsomething.cpp, allows you to experiment with the tool by specifying languages and words as command-line arguments. If you just type "./getsomething" after building a particular sample, you'll see a usage statement. Note also that the something-demo.cpp samples expect .uld and .ulc files to be located in the samples/data/ directory.

Getting started: using ULFactory

Each program that uses ULAPI should start by creating an instance of ULFactory and telling the ULFactory object about the data files it will be using (e.g. english-french.uld, english.ulc, french.ulc). This process is demonstrated in the initializeFactory function in the stemming.cpp sample. When you call:

     ULFactory *factory = ULFactory::createFactory(factoryID);

factoryID is either the string given to you by Ultralingua specifically for your use, or "default" or NULL if you received no such string. It is possible to use ULAPI without ULFactory, but to do so requires more intimate knowledge of the initialization requirements of each ULAPI tool. ULFactory takes care of the nitty-gritty details. We at Ultralingua never write a program without ULFactory, since we like to be spared those details, too.

Once you have a ULFactory object in hand, you can use it to obtain tools for using ULAPI's services. For example, you can get a ULStemmer object for Spanish by calling factory->getStemmer(ULLanguage::Spanish). Again, see samples/stemming/stemming.cpp for details.

ULFactory keeps pointers to all the tools it instantiates for you. Thus, to free the memory used by ULAPI, all you need to do is delete the pointer you received from ULFactory::createFactory.
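The ownership rule described above can be illustrated with a small standalone sketch. It does not use ULAPI at all: Tool, ToolFactory, and getTool are hypothetical names invented for this illustration, and ULFactory's real internals may differ. The sketch only shows the general idea of a factory that hands out non-owning pointers and releases everything when the factory itself is deleted.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a ULAPI tool such as ULStemmer.
class Tool {
public:
    explicit Tool(const std::string &language) : language_(language) { ++liveCount; }
    ~Tool() { --liveCount; }
    Tool(const Tool &) = delete;
    Tool &operator=(const Tool &) = delete;

    const std::string &language() const { return language_; }

    static int liveCount; // number of Tool objects currently alive

private:
    std::string language_;
};

int Tool::liveCount = 0;

// Hypothetical factory that owns every tool it hands out.
class ToolFactory {
public:
    // Returns a non-owning pointer; the factory keeps ownership, so the
    // caller must never delete the returned tool.
    Tool *getTool(const std::string &language) {
        std::unique_ptr<Tool> &slot = tools_[language];
        if (!slot) {
            slot.reset(new Tool(language)); // created on first request only
        }
        return slot.get();
    }

private:
    // Destroying the factory destroys this map, which destroys every tool.
    std::map<std::string, std::unique_ptr<Tool>> tools_;
};
```

With this design, asking twice for the same language returns the same object, and a single delete of the factory pointer frees every tool it created.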

If you need to reduce ULAPI's memory footprint before you're done using ULAPI, you can call factory->freeInessentialMemory(). This function closes files, deletes caches, and so on. Accessing ULAPI services will take a little longer immediately after a call to freeInessentialMemory, but in low-memory situations, this can be a valuable tool.
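The trade-off described above (less memory now, slower access right afterwards) can be sketched with a standalone example. CachingLookup and its members are hypothetical names, not ULAPI classes, and the "expensive" computation is a placeholder. The point is only that dropping a cache is safe because the cache is rebuilt lazily on the next request.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical tool that caches the results of a slow operation.
class CachingLookup {
public:
    CachingLookup() : slowLookups_(0) {}

    // Returns the word's length, pretending that computing it is expensive.
    int lookup(const std::string &word) {
        std::map<std::string, int>::const_iterator it = cache_.find(word);
        if (it != cache_.end()) {
            return it->second; // fast path: answer served from the cache
        }
        ++slowLookups_;        // slow path: recompute and remember the answer
        int result = static_cast<int>(word.size());
        cache_[word] = result;
        return result;
    }

    // Drops everything that can be rebuilt later on demand.
    void freeInessentialMemory() { cache_.clear(); }

    int slowLookups() const { return slowLookups_; }
    std::size_t cachedEntries() const { return cache_.size(); }

private:
    std::map<std::string, int> cache_;
    int slowLookups_;
};
```

After a call to freeInessentialMemory in this sketch, the first lookup of each word takes the slow path again, and later lookups return to the fast path.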

General information

Strings and containers

ULAPI includes a string class (ULString), a collection of containers (ULList, ULPair, ULHashTable, and ULVector), and iterators to go with them (ULStringIterator, ULStringReverseIterator, ULListIterator, ULListConstIterator, ULHashTableIterator). Since all of these have equivalents in the C++ Standard Template Library (STL), you might reasonably ask why we have written our own. In fact, ULAPI's earliest versions used STL, but we eventually replaced STL. Here are the main reasons why:

The basics.cpp sample application demonstrates simple uses of ULString, ULList, ULListIterator, and ULListConstIterator. The rest of the containers and iterators have straightforward interfaces.

ULAPI "enhanced enums"

In earlier versions of ULAPI, one of the main maintenance problems was keeping track of all the places you needed to make changes when an enumerated type needed adjustment. Add Turkish to the language list? Then you have to add an item to the ULLanguage enum, find all the lists of strings that need adjustment, change several conversion functions, etc.

For ULAPI 8, we researched the many possible ways to improve the capabilities of enumerated types in C++. No solution was ideal, but we decided to go with the approach you will find in ULError, ULLanguage, and several other ULAPI classes. This approach allows us to keep the enumerated constants themselves, associated character strings, static methods, and non-static methods all in one .h/.cpp file pair.

Besides its maintenance benefits, one major benefit of our approach is the ability to associate methods (including a variety of constructors) with enumerated type objects. For example:

ULLanguage frenchLanguage(ULLanguage::French);
cout << frenchLanguage.getTwoLetterISOCode() << endl; // "fr"
ULLanguage germanLanguage("deu");
cout << germanLanguage.getDisplayString() << endl; // "German"

The biggest disadvantage to our system is that the enumerated constants (e.g. ULLanguage::Polish, ULTense::ENGFirstPersonPlural, etc.) cannot be used as cases in switch statements. Fortunately, if/else-if statements work just fine.

Note that the enumerated constants themselves are references to static variables, which in turn contain unique integer ID variables. Thus, performing a comparison like this:

ULLanguage language(ULLanguage::French);
...
if (language == ULLanguage::Spanish) {
    ...
}

involves one non-virtual method call followed by an integer comparison.
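To make the pattern more concrete, here is a minimal standalone sketch of an "enhanced enum" along the lines described above. It is not ULAPI's ULLanguage: the class name Language, the integer IDs, the string table, and the greetingFor helper are all assumptions made for illustration. The real implementation is in kernel/ullanguage.h. The sketch shows constants that are references to static objects, strings and methods kept next to them, comparison as a single integer test, and if/else-if used where a switch is not available.

```cpp
#include <cstring>
#include <string>

// Hypothetical miniature of the "enhanced enum" idea (not ULAPI code).
class Language {
public:
    // The "enumerated constants": references to static objects. Because they
    // are objects rather than integral constants, they cannot be case labels.
    static const Language &None;
    static const Language &French;
    static const Language &German;
    static const Language &Spanish;

    Language() : id_(0) {}

    // Construct from a three-letter code such as "deu"; unknown codes give None.
    explicit Language(const char *threeLetterCode) : id_(0) {
        for (int i = 0; i < rowCount(); ++i) {
            if (std::strcmp(rows()[i].threeLetter, threeLetterCode) == 0) {
                id_ = rows()[i].id;
            }
        }
    }

    // Non-virtual call that boils down to one integer comparison.
    bool operator==(const Language &other) const { return id_ == other.id_; }
    bool operator!=(const Language &other) const { return id_ != other.id_; }

    std::string getTwoLetterISOCode() const { return row().twoLetter; }
    std::string getDisplayString() const { return row().display; }

private:
    struct Row { int id; const char *twoLetter; const char *threeLetter; const char *display; };

    explicit Language(int id) : id_(id) {}

    // Constants and their strings live together in one table.
    static const Row *rows() {
        static const Row table[] = {
            {0, "", "", "None"},
            {1, "fr", "fra", "French"},
            {2, "de", "deu", "German"},
            {3, "es", "spa", "Spanish"},
        };
        return table;
    }
    static int rowCount() { return 4; }
    const Row &row() const { return rows()[id_]; }

    static const Language noneInstance_, frenchInstance_, germanInstance_, spanishInstance_;
    int id_;
};

const Language Language::noneInstance_(0);
const Language Language::frenchInstance_(1);
const Language Language::germanInstance_(2);
const Language Language::spanishInstance_(3);
const Language &Language::None = Language::noneInstance_;
const Language &Language::French = Language::frenchInstance_;
const Language &Language::German = Language::germanInstance_;
const Language &Language::Spanish = Language::spanishInstance_;

// if/else-if takes the place of a switch statement.
std::string greetingFor(const Language &language) {
    if (language == Language::French) {
        return "bonjour";
    } else if (language == Language::German) {
        return "guten Tag";
    } else if (language == Language::Spanish) {
        return "hola";
    }
    return "";
}
```

In this sketch, adding a language means adding one table row, one static instance, and one reference, all in the same class.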

If you want to know more about how ULAPI's enhanced enums work, we recommend that you look at kernel/ullanguage.h or kernel/ulerror.h.

The enhanced enum classes at this writing are: ULError, ULLanguage, ULPerson, ULPartOfSpeechCategory, ULTense, ULForestType, and ULWordFeature.

Working with Ultralingua dictionary data

ULAPI services that require bilingual or monolingual dictionary data include ULDefiner (for looking up words), ULStemmer (to help the morphological analysis algorithms identify which strings are words and which are not), etc. These worker classes in turn use ULDictionaryDataSource objects to get access to the dictionary data.

ULDefiner provides the essential dictionary look-up services you are likely to need. Its core services are provided by its implementation of interfaces ULDictionary::begin, ULDictionary::end, and ULDictionary::find. Each of these methods builds a ULDictionaryIterator object pointing to an entry in the dictionary in question, and iterable based on a specified index.

For example, suppose you are working with a French-English dictionary, and you call:


ULDictionaryIterator iterator;
ULDefiner *definer = factory->getDefiner(ULLanguage::French, ULLanguage::English);
// Check for non-NULL definer goes here...