ULAPI  8.0
ULStemmer Class Reference

A worker that computes the stems of words.

#include <ulstemmer.h>

Public Member Functions

 ULStemmer ()
 
 ULStemmer (const ULStemmer &other)
 
virtual ~ULStemmer ()
 
ULStemmer & operator= (const ULStemmer &other)
 
void clear ()
 
ULDissector * getDissector ()
 
void setDissector (ULDissector *newDissector)
 
const ULLanguage & getLanguage () const
 
virtual bool isServiceAvailable (const ULServiceDescriptor &service)
 
virtual void getAvailableServices (ULList< ULServiceDescriptor > &services)
 
virtual void setCancelOperation (bool shouldCancel)
 
ULError getAllStems (const ULString &surfaceForm, ULList< ULDerivation > &stemList)
 
ULError getStems (const ULString &surfaceForm, ULList< ULDerivation > &stemList)
 
ULError getStems (const ULString &surfaceForm, const ULPartOfSpeechCategory &category, ULList< ULDerivation > &stemList)
 
ULError getFrequencies (const ULString &surfaceForm, ULList< ULFrequency > &frequencyList)
 
Public Member Functions inherited from ULWorker
 ULWorker ()
 
virtual ~ULWorker ()
 
virtual bool shouldCancelOperation () const
 

Detailed Description

A worker that computes the stems of words. For example, the French word "couvent" is both a singular noun and a third-person plural form of the verb "couver". Thus, a French ULStemmer object would identify both "couvent n.m." and "couver v." as stems for "couvent".
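The following minimal C++ sketch models the idea that one surface form can have several stems. It does not use ULAPI: `ToyDerivation`, `toyLexicon`, and `toyStems` are hypothetical stand-ins, and the hard-coded lexicon replaces the ULDissector and language data a real ULStemmer relies on.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for ULDerivation: (surface form, root, part of speech).
// These field names are illustrative only, not ULAPI's actual members.
struct ToyDerivation {
    std::string surface;
    std::string root;
    std::string partOfSpeech;
};

// A toy French lexicon keyed by surface form. A real ULStemmer derives this
// information from its ULDissector and language data, not from a fixed map.
const std::multimap<std::string, ToyDerivation>& toyLexicon() {
    static const std::multimap<std::string, ToyDerivation> lexicon = {
        {"couvent", {"couvent", "couvent", "n.m."}},
        {"couvent", {"couvent", "couver", "v."}},
    };
    return lexicon;
}

// Returns every stem recorded for surfaceForm. An empty result plays the
// role that ULError::NoMatch plays in the documented API.
std::vector<ToyDerivation> toyStems(const std::string& surfaceForm) {
    std::vector<ToyDerivation> stems;
    auto range = toyLexicon().equal_range(surfaceForm);
    for (auto it = range.first; it != range.second; ++it) {
        stems.push_back(it->second);
    }
    return stems;
}
```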

Constructor & Destructor Documentation

ULStemmer::ULStemmer ( )

Default constructor.

ULStemmer::ULStemmer ( const ULStemmer & other )

Copy constructor.

ULStemmer::~ULStemmer ( )  [virtual]

Destructor.

Member Function Documentation

void ULStemmer::clear ( )

Sets this stemmer to its default state.

ULError ULStemmer::getAllStems ( const ULString & surfaceForm,
ULList< ULDerivation > &  stemList 
)

Computes the full list of stems for the specified surface form. For example, if the surface form is "thought", then the stem list will include (thought, thought, noun), (thought, think, verb, past participle), (thought, think, verb, past tense, first person singular), (thought, think, verb, past tense, second person singular), etc.

This list can get long, since some surface forms will play many roles for the same root word (as "thought" does for the verb "think"). To get the list of distinct root words without the repetition (e.g. only one stem for (thought, think, verb)), use getStems instead of getAllStems.

Returns
ULError::NoError if the stemming operation completes successfully, ULError::NoMatch if there are no suitable stems, or some other ULError value otherwise.
Parameters
[in] surfaceForm  The word whose stems are sought.
[out] stemList  The list of stems.
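As a sketch of how the full list relates to the distinct-root list returned by getStems, the code below collapses hypothetical `ToyReading` entries to one per (root, part of speech) pair. The type and the `distinctRoots` function are assumptions made for illustration; ULAPI's actual ULDerivation members and deduplication logic are not shown in this reference.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry returned by getAllStems: a root and
// part of speech plus the inflectional features of a single reading.
struct ToyReading {
    std::string surface;
    std::string root;
    std::string partOfSpeech;
    std::string features;  // e.g. "past tense, first person singular"
};

// Collapses the full list to one entry per distinct (root, part of speech)
// pair, keeping first-seen order and dropping the inflectional features.
// This models the documented difference between getAllStems and getStems;
// it is not ULAPI's implementation.
std::vector<ToyReading> distinctRoots(const std::vector<ToyReading>& all) {
    std::set<std::pair<std::string, std::string>> seen;
    std::vector<ToyReading> roots;
    for (const ToyReading& reading : all) {
        if (seen.insert(std::make_pair(reading.root, reading.partOfSpeech)).second) {
            ToyReading root = reading;
            root.features.clear();  // the documented getStems example shows only (surface, root, POS)
            roots.push_back(root);
        }
    }
    return roots;
}
```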

void ULStemmer::getAvailableServices ( ULList< ULServiceDescriptor > & serviceList )  [virtual]
Parameters
[out] serviceList  Used to return a list of all the services this ULWorker can provide.

Implements ULWorker.

ULDissector * ULStemmer::getDissector ( )
Returns
a pointer to the ULDissector used by this stemmer to perform stemming operations.

ULError ULStemmer::getFrequencies ( const ULString & surfaceForm,
ULList< ULFrequency > &  frequencyList 
)

ULLanguageDataSource objects may contain frequency data of the form (word, root, part-of-speech, count). These data come from manually tagged corpora similar to the American National Corpus or the Penn Treebank.

This method returns a list of frequency objects corresponding to the specified word. For example, the word "chairs" might yield ("chairs", "chair", verb, 21), ("chairs", "chair", noun, 623), and ("chairs", "chair", unknown, 2).

The method performs its search in a case-insensitive and accent-insensitive way.

Returns
ULError::NoMatch if there are no frequency records corresponding to the specified word, ULError::DataSourceOpenFailed if there was a problem with the data source, or ULError::NoError otherwise.
Parameters
[in] surfaceForm  The word whose frequencies are sought.
[out] frequencyList  The corresponding frequencies, sorted in decreasing order of frequency.
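A minimal sketch of the documented lookup behavior, assuming a hypothetical `ToyFrequency` record type in place of ULFrequency: matching ignores case and results are sorted in decreasing order of count. Only ASCII case folding is shown; the accent-insensitive part of ULAPI's search would need Unicode normalization and is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for ULFrequency: (word, root, part of speech, count).
struct ToyFrequency {
    std::string word;
    std::string root;
    std::string partOfSpeech;
    int count;
};

// ASCII-only case folding; accent folding is deliberately not attempted.
std::string foldCase(const std::string& s) {
    std::string out = s;
    for (char& c : out) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}

// Returns the records whose word matches surfaceForm (ignoring ASCII case),
// sorted in decreasing order of count. An empty result corresponds to
// ULError::NoMatch in the documented API.
std::vector<ToyFrequency> frequenciesFor(const std::vector<ToyFrequency>& records,
                                         const std::string& surfaceForm) {
    const std::string key = foldCase(surfaceForm);
    std::vector<ToyFrequency> matches;
    for (const ToyFrequency& r : records) {
        if (foldCase(r.word) == key) {
            matches.push_back(r);
        }
    }
    std::stable_sort(matches.begin(), matches.end(),
                     [](const ToyFrequency& a, const ToyFrequency& b) {
                         return a.count > b.count;
                     });
    return matches;
}
```

std::stable_sort keeps records with equal counts in their original order; this reference does not say how ULAPI orders ties.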
const ULLanguage & ULStemmer::getLanguage ( ) const
Returns
the language for which this stemmer can compute stems.

ULError ULStemmer::getStems ( const ULString & surfaceForm,
ULList< ULDerivation > &  stemList 
)

Computes the list of root words for the specified surface form. For example, if the surface form is "thought", then the stem list will consist of (thought, thought, noun) and (thought, think, verb). Note that unlike getAllStems, this method returns exactly one ULDerivation object per root word, and is thus usually more useful and easier to use than getAllStems.

Returns
ULError::NoError if the stemming operation completes successfully, ULError::NoMatch if there are no suitable stems, or some other ULError value otherwise.
Parameters
[in] surfaceForm  The word whose stems are sought.
[out] stemList  The list of stems.
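The return values above distinguish an ordinary empty result (ULError::NoMatch) from real failures. The sketch below shows one way a caller might branch on that distinction; `ToyError` and `describe` are hypothetical, and the enum mirrors only the ULError values named in this reference, not the real ULError type.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the ULError values named in this reference.
// The real ULError type may define many more values.
enum class ToyError { NoError, NoMatch, DataSourceOpenFailed };

// Treats NoMatch as a normal "no stems" outcome and every other value
// except NoError as a genuine error that the caller should report.
std::string describe(ToyError result) {
    switch (result) {
        case ToyError::NoError:
            return "stems found";
        case ToyError::NoMatch:
            return "no stems";
        default:
            return "error";
    }
}
```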

ULError ULStemmer::getStems ( const ULString & surfaceForm,
const ULPartOfSpeechCategory & category,
ULList< ULDerivation > &  stemList 
)

Computes the list of root words for the specified surface form, restricted to the specified part of speech category. For example, if the surface form is "thought" and the category is verb, then the stem list will consist of only (thought, think, verb). Note that unlike getAllStems, this method returns exactly one ULDerivation object per root word.

Note that the part of speech category restriction applies to the surface form, not necessarily to the root. For example, the surface form "baker" is a noun, but it can be stemmed to the root "bake", which is a verb. If we call getStems("baker", ULPartOfSpeechCategory::Verb, stemList), we will get no results, since "baker" is not a verb. If we call getStems("baker", ULPartOfSpeechCategory::Noun, stemList), however, we will get the verb "bake" as our stem.

Returns
ULError::NoError if the stemming operation completes successfully, ULError::NoMatch if there are no suitable stems, or some other ULError value otherwise.
Parameters
[in] surfaceForm  The word whose stems are sought.
[in] category  The desired part of speech category.
[out] stemList  The list of stems.
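To illustrate that the category applies to the surface form rather than the root, here is a sketch with a hypothetical `PosDerivation` type that records both parts of speech. `stemsForCategory` is an assumption for illustration, not a ULAPI function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical derivation carrying the part of speech of both the surface
// form and the root, so the two can differ ("baker" is a noun, "bake" a verb).
struct PosDerivation {
    std::string surface;
    std::string surfacePos;
    std::string root;
    std::string rootPos;
};

// Filters on the surface form's category, not the root's, matching the
// documented behavior of getStems(surfaceForm, category, stemList).
std::vector<PosDerivation> stemsForCategory(const std::vector<PosDerivation>& all,
                                            const std::string& category) {
    std::vector<PosDerivation> result;
    for (const PosDerivation& d : all) {
        if (d.surfacePos == category) {
            result.push_back(d);
        }
    }
    return result;
}
```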

bool ULStemmer::isServiceAvailable ( const ULServiceDescriptor & service )  [virtual]
Returns
true if the specified service can be performed by this ULWorker, and false otherwise.
Parameters
[in] service  The desired service.

Implements ULWorker.

ULStemmer & ULStemmer::operator= ( const ULStemmer & other )

Assignment operator.

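void ULStemmer::setCancelOperation ( bool shouldCancel )  [virtual]

Sets the boolean attribute that signals cancellation of a long-running operation.

Parameters
[in] shouldCancel  Set to true to request that the current long-running operation be cancelled.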

Reimplemented from ULWorker.
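The cancellation flag is cooperative: setting it does not stop anything by itself, and the long-running operation must poll it. The sketch below assumes a hypothetical `ToyWorker` whose method names mirror setCancelOperation and shouldCancelOperation; whether ULWorker uses std::atomic or polls at this granularity is not stated in this reference.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical worker illustrating cooperative cancellation.
class ToyWorker {
public:
    void setCancelOperation(bool shouldCancel) { cancel_.store(shouldCancel); }
    bool shouldCancelOperation() const { return cancel_.load(); }

    // Processes words until done or cancelled; returns how many were processed.
    std::size_t processAll(const std::vector<std::string>& words) {
        std::size_t processed = 0;
        while (processed < words.size() && !shouldCancelOperation()) {
            // Real work on words[processed] (e.g. stemming it) would go here.
            ++processed;
        }
        return processed;
    }

private:
    std::atomic<bool> cancel_{false};  // safe to set from another thread
};
```

In practice setCancelOperation would typically be called from another thread (for example, a UI thread) while processAll runs; std::atomic<bool> keeps that cross-thread write and read well defined.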

void ULStemmer::setDissector ( ULDissector * newDissector )

Sets the ULDissector to be used by this stemmer to perform stemming operations. This ULStemmer does not take responsibility for deleting the dissector; that must happen elsewhere (typically ULFactory takes care of it if your application uses ULFactory to instantiate data sources and workers).

Parameters
[in] newDissector  A pointer to the desired dissector.
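Because setDissector does not transfer ownership, whoever created the dissector must keep it alive at least as long as the stemmer uses it. The sketch models that rule with hypothetical `ToyDissector`, `ToyStemmer`, and `ToyFactory` types; ToyFactory only stands in for the role the text assigns to ULFactory and is not its API.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for ULDissector; only the ownership rule is modeled.
struct ToyDissector {
    std::string language;
};

// Hypothetical stand-in for ULStemmer's dissector handling.
class ToyStemmer {
public:
    // Stores the pointer without taking ownership; never deletes it.
    void setDissector(ToyDissector* newDissector) { dissector_ = newDissector; }
    ToyDissector* getDissector() { return dissector_; }

private:
    ToyDissector* dissector_ = nullptr;  // non-owning
};

// Owns the dissector and must outlive every stemmer that points at it.
struct ToyFactory {
    std::unique_ptr<ToyDissector> dissector{new ToyDissector()};
};
```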

