ULAPI
8.0
The Ultralingua Application Programming Interface (ULAPI) is a collection of C++ classes and functions that enable programmers to use Ultralingua's multi-lingual linguistic tools in their own applications. ULAPI forms the core of the Ultralingua dictionary applications for iPhone, iPad, Mac OS X, Windows, Windows Mobile, Palm's webOS, and Symbian, as well as the Ultralingua online dictionary.
This document gives a technical overview of the main ULAPI classes. If you have questions or suggestions, please contact us at dev@ultralingua.com. We are eager to make ULAPI as useful as possible for you, so please keep those ideas coming.
To get started using ULAPI, we recommend that you read this page and then investigate the sample programs. The samples/basics/basics.cpp sample will introduce you to a few utility classes, but the samples/stemming/stemming.cpp sample is the best place to see a typical first use of ULAPI's core services.
The quickest way to get the samples running is from a Unix shell prompt with g++ installed: run make, then run the getstems sample.
The "make" command will build source/lib/libulapi.a, followed by the sample programs in all the subdirectories of samples/. You will have received a collection of .uld and .ulc files along with the ULAPI source code, so the path in your "./getstems" command should be wherever those files are located. The getstems operation should show you that "thought" is both a noun and a form of the verb "to think".
Most of the samples/ subdirectories include two applications. The first, named something-demo.cpp (where "something" refers to the tool in question, as in "stemming-demo.cpp" or "conjugation-demo.cpp"), runs without command-line arguments and demonstrates examples of that tool. The second, named getsomething.cpp, lets you experiment with the tool by specifying languages and words as command-line arguments. If you just type "./getsomething" after building a particular sample, you'll see a usage statement. Note also that the something-demo.cpp samples expect the .uld and .ulc files to be located in the samples/data/ directory.
Each program that uses ULAPI should start by creating an instance of ULFactory and telling that ULFactory object about the data files it will be using (e.g. english-french.uld, english.ulc, french.ulc). This process is demonstrated in the initializeFactory function in the stemming.cpp sample. When you call:
ULFactory *factory = ULFactory::createFactory(factoryID);
factoryID is either the string given to you by Ultralingua specifically for your use, or "default" or NULL if you received no such string. It is possible to use ULAPI without ULFactory, but to do so requires more intimate knowledge of the initialization requirements of each ULAPI tool. ULFactory takes care of the nitty-gritty details. We at Ultralingua never write a program without ULFactory, since we like to be spared those details, too.
Once you have a ULFactory object in hand, you can use it to obtain tools for using ULAPI's services. For example, you can get a ULStemmer object for Spanish by calling factory->getStemmer(ULLanguage::Spanish). Again, see samples/stemming/stemming.cpp for details.
ULFactory keeps pointers to all the tools it instantiates for you. Thus, to free the memory used by ULAPI, all you need to do is delete the pointer you received from ULFactory::createFactory.
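The ownership rule above can be illustrated with a small, self-contained sketch. ToolFactory and Stemmer are hypothetical stand-ins, not ULAPI classes, and keying tools by a language-code string is an assumption made for brevity. The point is only that the factory caches each tool it creates and deletes all of them in its destructor, so callers never delete tools themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a tool such as a stemmer (not the ULAPI class).
class Stemmer {
public:
    explicit Stemmer(const std::string& language) : language_(language) {}
    const std::string& language() const { return language_; }
private:
    std::string language_;
};

// Hypothetical factory: hands out tools on request and keeps ownership of them.
class ToolFactory {
public:
    ToolFactory() {}

    // Deleting the factory frees every tool it ever created.
    ~ToolFactory() {
        for (std::map<std::string, Stemmer*>::iterator it = stemmers_.begin();
             it != stemmers_.end(); ++it) {
            delete it->second;
        }
    }

    // Returns the cached tool for this language, creating it on first use.
    // The caller must NOT delete the returned pointer.
    Stemmer* getStemmer(const std::string& language) {
        assert(!language.empty());  // an empty code is a caller bug here
        std::map<std::string, Stemmer*>::iterator it = stemmers_.find(language);
        if (it != stemmers_.end()) {
            return it->second;
        }
        Stemmer* stemmer = new Stemmer(language);
        stemmers_[language] = stemmer;
        return stemmer;
    }

    std::size_t toolCount() const { return stemmers_.size(); }

private:
    // Non-copyable (C++98 style): copying would lead to double deletion.
    ToolFactory(const ToolFactory&);
    ToolFactory& operator=(const ToolFactory&);

    std::map<std::string, Stemmer*> stemmers_;
};
```

Because getStemmer returns the cached object on repeat calls, asking twice for the same language costs only a map lookup, and a single delete of the factory releases everything.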
If you need to reduce ULAPI's memory footprint before you're done using ULAPI, you can call factory->freeInessentialMemory(). This function closes files, deletes caches, and so on. Accessing ULAPI services will take a little longer immediately after a call to freeInessentialMemory, but in low-memory situations, this can be a valuable tool.
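The following sketch shows one plausible shape for memory that is "inessential" in this sense: a cache whose contents can always be rebuilt. CachedLookup and its members are hypothetical, and slowLookup is a placeholder for real file access; this is not how ULAPI implements freeInessentialMemory.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical lookup tool with a rebuildable cache.
class CachedLookup {
public:
    CachedLookup() : slowLookups_(0) {}

    std::string lookup(const std::string& word) {
        assert(!word.empty());  // precondition for this sketch
        std::map<std::string, std::string>::const_iterator it = cache_.find(word);
        if (it != cache_.end()) {
            return it->second;                  // fast path: cache hit
        }
        std::string result = slowLookup(word);  // slow path, e.g. reading a file
        cache_[word] = result;
        return result;
    }

    // Releases the cache. Later lookups still return correct results,
    // but each word has to go through the slow path once more.
    void freeInessentialMemory() {
        cache_.clear();  // destroys every cached node
    }

    int slowLookupCount() const { return slowLookups_; }
    std::size_t cacheSize() const { return cache_.size(); }

private:
    std::string slowLookup(const std::string& word) {
        ++slowLookups_;
        return "<entry for " + word + ">";  // placeholder for real data access
    }

    std::map<std::string, std::string> cache_;
    int slowLookups_;
};
```

The trade-off matches the one described above: memory drops immediately, and the cost is paid later as slower first accesses.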
ULAPI is intended to be a cross-platform tool. Therefore, we have written it in ANSI C++ and tried to avoid obscure language features that might give you trouble with some compilers. ULAPI compiles and runs without change on many development systems, including Xcode, Visual Studio (6.0 through 2010), and g++.
The ULAPI C++ framework can be accessed from Objective-C/Cocoa applications on Mac OS X via Objective-C++, which allows C++ code to be used within Objective-C classes. For more information on accessing C++ frameworks using Objective-C++, see Apple's conceptual guide to Objective-C.
Because some important platforms (notably iOS and Mac OS X) discourage the use of C++ exceptions, ULAPI does not use them. Instead, it uses an assertion macro called UL_ASSERT to report illegal states, and error codes of type ULError to report failed operations (such as a file that cannot be opened).
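A minimal sketch of the two mechanisms, assuming nothing about ULAPI's real definitions: MY_ASSERT, ErrorCode, and readFirstLine are hypothetical names. The assertion catches programming errors, while the returned error code reports a failure that the caller is expected to handle.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical assertion macro in the spirit of UL_ASSERT: it flags illegal
// program states (bugs) during development. Here it simply forwards to assert().
#define MY_ASSERT(condition) assert(condition)

// Hypothetical error codes for operations that can fail at run time.
enum ErrorCode {
    kNoError = 0,
    kFileNotFound,
    kInvalidArgument
};

// Failures are reported through the return value instead of by throwing.
// On success, *firstLine receives the first line of the file.
ErrorCode readFirstLine(const char* path, std::string* firstLine) {
    MY_ASSERT(firstLine != NULL);  // passing NULL here is a programming bug
    if (path == NULL || path[0] == '\0') {
        return kInvalidArgument;
    }
    std::FILE* file = std::fopen(path, "r");
    if (file == NULL) {
        return kFileNotFound;  // an expected, recoverable failure
    }
    firstLine->clear();
    int c;
    while ((c = std::fgetc(file)) != EOF && c != '\n') {
        firstLine->push_back(static_cast<char>(c));
    }
    std::fclose(file);
    return kNoError;
}
```

A caller checks the result explicitly, for example `if (readFirstLine(path, &line) != kNoError) { /* report and recover */ }`, which keeps control flow visible on platforms where exceptions are discouraged.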
ULAPI includes a string class (ULString), a collection of containers (ULList, ULPair, ULHashTable, and ULVector), and iterators to go with them (ULStringIterator, ULStringReverseIterator, ULListIterator, ULListConstIterator, ULHashTableIterator). Since all of these have equivalents in the C++ Standard Template Library (STL), you might reasonably ask why we have written our own. In fact, ULAPI's earliest versions used STL, but we eventually replaced STL. Here are the main reasons why:
The STL string class does not have built-in support for the Unicode Collation Algorithm (UCA), upon which ULAPI depends for linguistically sensible look-up and sorting of dictionary data. We wrote ULString to provide a portable interface for UCA-supportive strings. For most platforms, ULString is essentially a wrapper for IBM's International Components for Unicode (ICU) library. Where ICU is unavailable, we will provide alternate implementations of ULString.
STL has no hash table class. The hash_map template is a very common extension to STL, but it's not available on all platforms.
We occasionally need very tight control of the performance of our containers under some idiosyncratic operations. Writing our own containers gave us that control.
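To see the first of these problems concretely, the sketch below sorts UTF-8 words with std::string's built-in comparison, which orders raw bytes (compared as unsigned char). byteSort is a hypothetical helper; the resulting order is exactly the kind a UCA-aware string class is meant to replace.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sorts UTF-8 words with std::string's default operator<, which compares
// raw bytes. No collation algorithm is involved.
std::vector<std::string> byteSort(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());
    return words;
}
```

For the input "zebra", "éclair", "Zurich", "apple", this yields Zurich, apple, zebra, éclair: every capital letter sorts before every lowercase letter, and "é" (whose UTF-8 lead byte is 0xC3) sorts after "z" (0x7A). A dictionary user would instead expect something like apple, éclair, zebra, Zurich.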
The basics.cpp sample application demonstrates simple uses of ULString, ULList, ULListIterator, and ULListConstIterator. The rest of the containers and iterators have straightforward interfaces.
In earlier versions of ULAPI, one of the main maintenance problems was keeping track of all the places you needed to make changes when an enumerated type needed adjustment. Add Turkish to the language list? Then you have to add an item to the ULLanguage enum, find all the lists of strings that need adjustment, change several conversion functions, etc.
For ULAPI 8, we researched the many possible ways to improve the capabilities of enumerated types in C++. No solution was ideal, but we decided to go with the approach you will find in ULError, ULLanguage, and several other ULAPI classes. This approach allows us to keep the enumerated constants themselves, associated character strings, static methods, and non-static methods all in one .h/.cpp file pair.
Besides easing maintenance, our approach lets us associate methods (including a variety of constructors) with enumerated type objects. For example:
ULLanguage frenchLanguage(ULLanguage::French);
cout << frenchLanguage.getTwoLetterISOCode() << endl; // "fr"

ULLanguage germanLanguage("deu");
cout << germanLanguage.getDisplayString() << endl; // "German"
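The sketch below is a simplified, hypothetical reconstruction of the idea, not the contents of kernel/ullanguage.h: a Language class whose constants are static objects carrying an integer ID, ISO codes, and a display name, plus a constructor that accepts an ISO code. ULAPI's constants are references to static variables; this sketch uses the static objects directly to stay short.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical "enhanced enum": constants, their strings, and methods
// all live in one class.
class Language {
public:
    static const Language French;
    static const Language German;
    static const Language Spanish;
    static const Language Unknown;

    // Construct from a two- or three-letter ISO code; unknown codes yield Unknown.
    explicit Language(const char* isoCode) { *this = fromCode(isoCode); }

    int getID() const { return id_; }
    const char* getTwoLetterISOCode() const { return twoLetter_; }
    const char* getDisplayString() const { return display_; }

    // Equality is decided by the integer ID alone.
    bool operator==(const Language& other) const { return id_ == other.id_; }
    bool operator!=(const Language& other) const { return id_ != other.id_; }

private:
    Language(int id, const char* two, const char* three, const char* display)
        : id_(id), twoLetter_(two), threeLetter_(three), display_(display) {}

    static const Language& fromCode(const char* code) {
        const Language* all[] = { &French, &German, &Spanish };
        for (int i = 0; i < 3; ++i) {
            if (std::strcmp(code, all[i]->twoLetter_) == 0 ||
                std::strcmp(code, all[i]->threeLetter_) == 0) {
                return *all[i];
            }
        }
        return Unknown;
    }

    int id_;
    const char* twoLetter_;
    const char* threeLetter_;
    const char* display_;
};

// The single table of constants: adding a language means editing only this
// class and these definitions.
const Language Language::Unknown(0, "", "", "Unknown");
const Language Language::French(1, "fr", "fra", "French");
const Language Language::German(2, "de", "deu", "German");
const Language Language::Spanish(3, "es", "spa", "Spanish");
```

With this shape, `Language french(Language::French)` uses the implicit copy constructor and `Language german("deu")` goes through the ISO-code constructor, mirroring the two calls shown above.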
The biggest disadvantage to our system is that the enumerated constants (e.g. ULLanguage::Polish, ULTense::ENGFirstPersonPlural, etc.) cannot be used as cases in switch statements. Fortunately, if/else-if statements work just fine.
Note that the enumerated constants themselves are references to static variables, which in turn contain unique integer ID variables. Thus, performing a comparison like this:
ULLanguage frenchLanguage(ULLanguage::French);
...
if (frenchLanguage == ULLanguage::Spanish) { ... }
involves one non-virtual method call followed by an integer comparison.
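Both the cost of a comparison and the switch limitation mentioned earlier can be seen in a stripped-down hypothetical class; Tense and describe are illustrative names only. Here operator== is defined inside the class, so it is a non-virtual (and implicitly inline) call wrapping one integer comparison. Because the constants are objects rather than integral constant expressions, they cannot serve as case labels, and an if/else-if chain takes the place of a switch.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal enhanced enum: each constant is a static object
// holding a unique integer ID.
class Tense {
public:
    static const Tense Present;
    static const Tense Past;
    static const Tense Future;

    // Non-virtual call plus one integer comparison.
    bool operator==(const Tense& other) const { return id_ == other.id_; }

private:
    explicit Tense(int id) : id_(id) {}
    int id_;
};

const Tense Tense::Present(1);
const Tense Tense::Past(2);
const Tense Tense::Future(3);

// switch (tense) { case Tense::Past: ... } would not compile: case labels
// must be integral constant expressions, and Tense::Past is an object.
std::string describe(const Tense& tense) {
    if (tense == Tense::Present) {
        return "present";
    } else if (tense == Tense::Past) {
        return "past";
    } else if (tense == Tense::Future) {
        return "future";
    }
    return "unknown";
}
```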
If you want to know more about how ULAPI's enhanced enums work, we recommend that you look at kernel/ullanguage.h or kernel/ulerror.h.
The enhanced enum classes at this writing are: ULError, ULLanguage, ULPerson, ULPartOfSpeechCategory, ULTense, ULForestType, and ULWordFeature.
ULAPI services that require bilingual or monolingual dictionary data include ULDefiner (for looking up words), ULStemmer (to help the morphological analysis algorithms identify which strings are words and which are not), etc. These worker classes in turn use ULDictionaryDataSource objects to get access to the dictionary data.
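The division of labor between worker classes and data sources can be sketched as an abstract interface with interchangeable implementations. DataSource, MemoryDataSource, and NaiveStemmer are hypothetical, and the "strip a trailing s" rule is a toy stand-in for real morphological analysis. What the sketch shows is only that the worker asks its data source which candidate strings are words.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical interface standing in for a dictionary data source: worker
// classes ask it questions instead of reading data files themselves.
class DataSource {
public:
    virtual ~DataSource() {}
    virtual bool containsHeadword(const std::string& word) const = 0;
};

// Hypothetical in-memory implementation, handy for tests.
class MemoryDataSource : public DataSource {
public:
    void add(const std::string& word) { words_.insert(word); }
    bool containsHeadword(const std::string& word) const {
        return words_.count(word) > 0;
    }
private:
    std::set<std::string> words_;
};

// Hypothetical worker: proposes candidate stems and lets the data source
// decide which of them are real words.
class NaiveStemmer {
public:
    explicit NaiveStemmer(const DataSource* source) : source_(source) {
        assert(source_ != 0);
    }

    // Returns a stem that is a dictionary word, or "" if none is found.
    std::string stem(const std::string& word) const {
        if (source_->containsHeadword(word)) {
            return word;
        }
        if (word.size() > 1 && word[word.size() - 1] == 's') {
            std::string candidate = word.substr(0, word.size() - 1);
            if (source_->containsHeadword(candidate)) {
                return candidate;
            }
        }
        return "";
    }

private:
    const DataSource* source_;  // not owned by the stemmer
};
```

Swapping MemoryDataSource for a file-backed implementation would leave NaiveStemmer unchanged, which is the benefit of routing data access through a separate interface.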
ULDefiner provides the essential dictionary look-up services you are likely to need. Its core services come from its implementations of the ULDictionary interface methods ULDictionary::begin, ULDictionary::end, and ULDictionary::find. Each of these methods builds a ULDictionaryIterator object that points to an entry in the dictionary in question and can be advanced through the dictionary according to a specified index.
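The begin/end/find pattern resembles a search over a sorted index. In the hypothetical TinyDictionary below, find returns an iterator to the first entry whose headword is not less than the key, so the caller can check for an exact match or keep iterating through neighboring entries. The sketch assumes a single index ordered by plain byte comparison; ULDictionaryIterator's actual indexes and ordering are not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical dictionary entry.
struct Entry {
    std::string headword;
    std::string translation;
};

// Orders an entry against a search key (used by std::lower_bound).
inline bool headwordLess(const Entry& entry, const std::string& key) {
    return entry.headword < key;
}

// Hypothetical dictionary with one sorted index.
class TinyDictionary {
public:
    typedef std::vector<Entry>::const_iterator Iterator;

    // Inserts while keeping entries sorted by headword.
    void add(const std::string& headword, const std::string& translation) {
        Entry entry = { headword, translation };
        std::vector<Entry>::iterator pos =
            std::lower_bound(entries_.begin(), entries_.end(), headword, headwordLess);
        entries_.insert(pos, entry);
    }

    Iterator begin() const { return entries_.begin(); }
    Iterator end() const { return entries_.end(); }

    // First entry whose headword is >= key, or end() if there is none.
    Iterator find(const std::string& key) const {
        return std::lower_bound(entries_.begin(), entries_.end(), key, headwordLess);
    }

private:
    std::vector<Entry> entries_;
};
```

Returning the nearest entry rather than only exact matches lets a caller implement prefix browsing: find("pen") lands on "penser", and incrementing the iterator walks forward alphabetically.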
For example, suppose you are working with a French-English dictionary, and you call:
ULDictionaryIterator iterator;
ULDefiner *definer = factory->getDefiner(ULLanguage::French, ULLanguage::English);
// Check for non-NULL definer goes here...