ULLanguageDataSource is the abstract parent for classes that interface with single-language data stored somewhere like a .ulc file or a database.
#include <ullanguagedatasource.h>
Member functions:
virtual ~ULLanguageDataSource ()
virtual ULError attach (const ULString &dataSourceIdentifier)=0
virtual ULError detach ()=0
virtual ULError load ()=0
virtual ULError close ()=0
virtual ULString getDataSourceIdentifier ()=0
virtual ULDataSourceVersion getVersion ()=0
virtual const ULLanguage & getLanguage ()=0
virtual ULError getWords (const ULString &root, ULList< ULDerivation > &wordList, bool filterResults=false)=0
virtual ULError getVerbs (const ULString &infinitive, ULList< ULDerivation > &verbList, bool filterResults=false)=0
virtual ULError getNouns (const ULString &text, ULList< ULDerivation > &nounList)=0
virtual ULError getVerbModel (uluint32 verbClassID, ULVerbModel &model)=0
virtual ULError getMatchingRoots (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &rootList)=0
virtual ULError getMatchingNouns (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &nounList)=0
virtual ULError getMatchingInfinitives (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &infinitiveList)=0
virtual ULError getMatchingInfinitives (const ULString &prefix, uluint32 maxMatches, ULList< ULDerivation > &infinitiveList)=0
virtual ULError getVerbFormTypes (const ULDerivation &verb, ULList< ULPartOfSpeech > &verbFormTypes)=0
virtual ULError getTenses (const ULDerivation &v, ULList< ULTense > &tenseList, bool includeParticiples=false)=0
virtual ULError getTensesForClass (uluint32 classID, ULList< ULTense > &tenseList, bool includeParticiples=false)=0
virtual ULError getAllTenses (ULList< ULTense > &tenseList, bool includeParticiples=false)=0
virtual ULError getPersons (const ULDerivation &v, ULTense tense, ULList< ULPerson > &personList)=0
virtual ULError getPersonsForClass (const ULDerivation &v, uluint32 classID, ULTense tense, ULList< ULPerson > &personList)=0
virtual ULError getAllTaggingRules (ULList< ULTaggingRule > &ruleList)=0
virtual ULError getFeatureNameList (ULList< ULString > &featureNameList)=0
virtual ULError getInflectionRules (const ULDerivation &derivation, const ULPartOfSpeech &targetPartOfSpeech, ULList< ULInflectionRule > &ruleList)=0
virtual ULError getInflectionRulesForDissection (const ULDerivation &derivation, ULList< ULInflectionRule > &ruleList)=0
virtual ULError getSuccessors (const ULInflectionRule &rule, ULList< ULInflectionRule > &successorList)=0
virtual ULError getPredecessors (const ULInflectionRule &rule, ULList< ULInflectionRule > &predecessorList)=0
virtual bool hasStopWord (const ULString &word)=0
virtual ULError getClosedClassWordForPartOfSpeech (const ULPartOfSpeech &partOfSpeech, ULList< ULDerivation > &wordList)=0
virtual ULError getFrequencies (const ULString &word, ULList< ULFrequency > &frequencyList)=0

Member functions inherited from ULDataSource:
virtual ~ULDataSource ()

Member functions inherited from ULLockable:
ULLockable ()
ULLockable (const ULLockable &lockable)
virtual ~ULLockable ()
const ULLockable & operator= (const ULLockable &lockable)
void clear ()
ULLock * getLock ()
void setLock (ULLock *newLock)
|
Warning: If you find yourself thinking about directly using one of the subclasses of this class, you should reconsider. It is much easier to use ULAPI's data sources correctly by working with a ULFactory and the associated higher-level tools such as ULConjugator or ULStemmer, which take care of the initialization and manipulation of the data sources for you.
virtual ULLanguageDataSource::~ULLanguageDataSource ( ) [inline, virtual]
virtual ULError ULLanguageDataSource::attach (const ULString &dataSourceIdentifier) [pure virtual]
Causes this ULDataSource object to be associated with the specified data source, and reads enough information from that data source to determine its language(s), etc.
The exact behavior of attach depends on the nature of the data source. If the data source is a file, attach reads header information from the file and then closes the file to save memory until the data source is actually needed. If the data source is a remote database, attach might instead open a connection, collect header information, and then close the connection.
- Returns
- ULError::NoError if the attachment is successful, or some other ULError value if not.
- Parameters
-
[in] | dataSourceIdentifier | A string describing the data source (e.g. a file name, a database connection string, a URL, etc.). |
Implements ULDataSource.
virtual ULError ULLanguageDataSource::close ( ) [pure virtual]
Frees dynamically allocated memory associated with this data source while keeping it attached to the file, database, etc. to which it was previously attached. Also closes any relevant files, database connections, etc.
- Returns
- ULError::NoError if the memory freeing was successful.
Implements ULDataSource.
virtual ULError ULLanguageDataSource::detach ( ) [pure virtual]
Releases the connection between this ULDataSource object and the data source specified in the previous open() or attach() call, closing any relevant files or network connections and freeing memory in the process.
- Returns
- ULError::NoError if the detachment is successful, or some other ULError value if not.
Implements ULDataSource.
virtual ULError ULLanguageDataSource::getAllTaggingRules (ULList< ULTaggingRule > &ruleList) [pure virtual]
Retrieves all the part-of-speech tagging rules used in this data source's language.
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[out] | ruleList | The list of rules. |
virtual ULError ULLanguageDataSource::getAllTenses (ULList< ULTense > &tenseList, bool includeParticiples = false) [pure virtual]
Finds all the tenses available for the language associated with this data source.
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[out] | tenseList | The tenses. |
[in] | includeParticiples | True if the tense list should include participles (e.g. infinitive, past participle, present participle, gerund...). |
virtual ULError ULLanguageDataSource::getClosedClassWordForPartOfSpeech (const ULPartOfSpeech &partOfSpeech, ULList< ULDerivation > &wordList) [pure virtual]
ULLanguageDataSource objects may contain a list of closed-class words with corresponding parts of speech. Typically, a data source will include articles, conjunctions, pronouns, and prepositions.
This method retrieves ULDerivation objects for every closed-class word whose part of speech satisfies the category and features in the partOfSpeech parameter. If you want all the closed-class words, set partOfSpeech's category to ULPartOfSpeechCategory::Any, without any features.
- Returns
- ULError::NoError or ULError::NoMatch, depending on whether there are any matching words or not.
- Parameters
-
[in] | partOfSpeech | the part of speech (category and features) for which we want the matching closed-class words. |
[out] | wordList | the matching closed-class words. |
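The matching rule above can be pictured with the following C++ sketch. It is not ULAPI code: Category, PartOfSpeech, ClosedClassWord, satisfies and closedClassWordsFor are hypothetical stand-ins. The reading of "satisfies" as "same category (or Any), and every queried feature present on the word" is an assumption; the exact semantics are not spelled out here.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for ULPartOfSpeechCategory and ULPartOfSpeech.
enum class Category { Any, Article, Conjunction, Pronoun, Preposition };

struct PartOfSpeech {
    Category category;
    std::set<std::string> features;  // e.g. {"feminine", "singular"}
};

struct ClosedClassWord {
    std::string text;
    PartOfSpeech partOfSpeech;
};

// Assumed rule: the category must match unless the query asks for Any,
// and every feature in the query must also be present on the candidate.
bool satisfies(const PartOfSpeech& query, const PartOfSpeech& candidate) {
    if (query.category != Category::Any && query.category != candidate.category)
        return false;
    // std::includes works here because std::set keeps its elements sorted.
    return std::includes(candidate.features.begin(), candidate.features.end(),
                         query.features.begin(), query.features.end());
}

// Returns the text of every closed-class word whose part of speech satisfies the query.
std::vector<std::string> closedClassWordsFor(const std::vector<ClosedClassWord>& words,
                                             const PartOfSpeech& query) {
    std::vector<std::string> result;
    for (const ClosedClassWord& w : words)
        if (satisfies(query, w.partOfSpeech)) result.push_back(w.text);
    return result;
}
```

With a query of {Category::Any, {}} every word passes, which mirrors the "all closed-class words" case described above.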
virtual ULString ULLanguageDataSource::getDataSourceIdentifier ( ) [pure virtual]
- Returns
- the data source identifier for the data source attached to this ULDataSource object, or the empty string if no data source is attached.
Implements ULDataSource.
virtual ULError ULLanguageDataSource::getFeatureNameList (ULList< ULString > &featureNameList) [pure virtual]
Retrieves the list of all feature names stored in this data source. These feature names will typically include some that refer to global features represented by subclasses of ULEnum (e.g. "pastparticiple"), and others that refer to features used only internally in the language data source.
- Returns
- ULError::NoMatch if there are no feature names in this data source, or ULError::NoError otherwise.
- Parameters
-
[out] | featureNameList | the desired feature names, or the empty list if an error occurs. |
virtual ULError ULLanguageDataSource::getFrequencies (const ULString &word, ULList< ULFrequency > &frequencyList) [pure virtual]
ULLanguageDataSource objects may contain frequency data of the form (word, root, part-of-speech, count). These data come from manually tagged corpora similar to the American National Corpus or the Penn Treebank.
This method returns a list of frequency objects corresponding to the specified word. For example, the word "chairs" might yield ("chairs", "chair", noun, 623), ("chairs", "chair", verb, 21), and ("chairs", "chair", unknown, 2).
The method performs its search in a case-insensitive and accent-insensitive way.
- Returns
- ULError::NoMatch if there are no frequency records corresponding to the specified word, ULError::DataSourceOpenFailed if there was a problem with the data source, or ULError::NoError otherwise.
- Parameters
-
[in] | word | the word whose frequencies are sought |
[out] | frequencyList | the corresponding frequencies, sorted in decreasing order of frequency |
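A minimal C++ sketch of the lookup and ordering described for getFrequencies. It assumes a hypothetical Frequency struct in place of ULFrequency and an in-memory table in place of a real data source. Case folding here is ASCII-only; genuine accent-insensitive matching would need Unicode-aware normalization, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for ULFrequency: (word, root, part of speech, count).
struct Frequency {
    std::string word;
    std::string root;
    std::string partOfSpeech;  // e.g. "noun", "verb", "unknown"
    unsigned count;
};

// ASCII-only case folding; accent folding would need Unicode normalization.
std::string foldCase(const std::string& s) {
    std::string out(s);
    for (char& c : out) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Collects the records for `word` (case-insensitively) in decreasing order of count.
// An empty result plays the role of ULError::NoMatch in this sketch.
std::vector<Frequency> getFrequencies(const std::vector<Frequency>& table,
                                      const std::string& word) {
    const std::string key = foldCase(word);
    std::vector<Frequency> result;
    for (const Frequency& f : table)
        if (foldCase(f.word) == key) result.push_back(f);
    std::stable_sort(result.begin(), result.end(),
                     [](const Frequency& a, const Frequency& b) { return a.count > b.count; });
    return result;
}
```

The stable sort keeps records with equal counts in their stored order. The counts used when exercising this sketch are the illustrative ones from the "chairs" example above.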
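virtual ULError ULLanguageDataSource::getInflectionRules (const ULDerivation &derivation, const ULPartOfSpeech &targetPartOfSpeech, ULList< ULInflectionRule > &ruleList) [pure virtual]
Retrieves the inflection rules in this data source that might contribute to a successful inflection from the specified derivation to the specified target part of speech. This method is an essential part of the inflection process coordinated by ULInflector.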
- Returns
- ULError::DataSourceOpenFailed; ULError::NoMatch if there are no appropriate inflection rules in this data source; or ULError::NoError.
- Parameters
-
[in] | derivation | the derivation so far, on which we wish to build. This derivation may simply consist of a root word and part of speech, or it may already have some inflection rules to which we're hoping to add. |
[in] | targetPartOfSpeech | the part of speech towards which the current inflection is being directed. |
[out] | ruleList | the desired inflection rules (if any), or the empty list if an error occurs. |
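virtual ULError ULLanguageDataSource::getInflectionRulesForDissection (const ULDerivation &derivation, ULList< ULInflectionRule > &ruleList) [pure virtual]
Retrieves the inflection rules in this data source that might contribute to a successful dissection by being inserted at the front of the specified derivation. This method is an essential part of the dissection/stemming process coordinated by ULDissector.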
- Returns
- ULError::DataSourceOpenFailed; ULError::NoMatch if there are no appropriate inflection rules in this data source; or ULError::NoError.
- Parameters
-
[in] | derivation | the derivation so far, on which we wish to build. This derivation may simply consist of a root word and part of speech, or it may already have some inflection rules in front of which we're hoping to add a rule. |
[out] | ruleList | the desired inflection rules (if any), or the empty list if an error occurs. |
virtual const ULLanguage& ULLanguageDataSource::getLanguage ( ) [pure virtual]
- Returns
- the language for which this data source provides data.
virtual ULError ULLanguageDataSource::getMatchingInfinitives (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &infinitiveList) [pure virtual]
Gets the list of verb infinitives contained in this data source that match (accent- and case-insensitively) the specified prefix. For example, the prefix "spri" might (depending on the specific English data source) yield an infinitive list of "spring", "sprinkle", and "sprint".
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | prefix | The prefix to match. |
[in] | maxMatches | if non-zero, this is the maximum number of matches to return; if zero, the method returns all matches (which can be a very long list if, for example, prefix is one letter) |
[out] | infinitiveList | The list of matching infinitives. |
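The prefix search, including its maxMatches convention (zero means no limit), can be sketched in C++ as a binary search over a sorted, case-folded key list. matchPrefix and foldCase are hypothetical helpers, not ULAPI functions. Only ASCII case is folded here; a real data source would also fold accents.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// ASCII-only case folding (a real data source would also fold accents).
std::string foldCase(const std::string& s) {
    std::string out(s);
    for (char& c : out) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Returns the keys that start with `prefix`. `sortedKeys` must already be
// case-folded and sorted. maxMatches == 0 means "return every match".
std::vector<std::string> matchPrefix(const std::vector<std::string>& sortedKeys,
                                     const std::string& prefix,
                                     std::uint32_t maxMatches) {
    const std::string key = foldCase(prefix);
    std::vector<std::string> result;
    // All keys sharing the prefix form one contiguous run starting here.
    auto it = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), key);
    for (; it != sortedKeys.end(); ++it) {
        if (it->compare(0, key.size(), key) != 0) break;  // past the run
        if (maxMatches != 0 && result.size() >= maxMatches) break;
        result.push_back(*it);
    }
    return result;
}
```

Because the matches for a prefix are contiguous in sorted order, the loop can stop at the first key that no longer starts with the prefix. This keeps one-letter prefixes with maxMatches set cheap.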
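virtual ULError ULLanguageDataSource::getMatchingNouns (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &nounList) [pure virtual]
Gets the list of nouns contained in this data source that match (accent- and case-insensitively) the specified prefix.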
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | prefix | The prefix to match. |
[in] | maxMatches | if non-zero, this is the maximum number of matches to return; if zero, the method returns all matches (which can be a very long list if, for example, prefix is one letter) |
[out] | nounList | The list of matching nouns. |
virtual ULError ULLanguageDataSource::getMatchingRoots (const ULString &prefix, uluint32 maxMatches, ULList< ULString > &rootList) [pure virtual]
Gets the list of root words contained in this data source that match (accent- and case-insensitively) the specified prefix. For example, the prefix "spri" might (depending on the specific English data source) yield a root list of "spring", "sprinkle", and "sprint" among verbs, and "springy" etc. among adjectives. For languages with simple noun and adjective inflection structures, this method typically returns only verbs. For languages like Russian, German, and Latin, it typically returns verbs, nouns, and adjectives.
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | prefix | The prefix to match. |
[in] | maxMatches | if non-zero, this is the maximum number of matches to return; if zero, the method returns all matches (which can be a very long list if, for example, prefix is one letter) |
[out] | rootList | The list of matching root words. |
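virtual ULError ULLanguageDataSource::getNouns (const ULString &text, ULList< ULDerivation > &nounList) [pure virtual]
Finds all the nouns in this data source whose root (typically singular) forms match (in an accent- and case-insensitive way) the specified text.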
- Returns
- ULError::NoError, ULError::NoMatch, or any of the error codes related to failure to open or attach to a data source.
- Parameters
-
[in] | text | the search string. |
[out] | nounList | the matching nouns. |
virtual ULError ULLanguageDataSource::getPersons (const ULDerivation &v, ULTense tense, ULList< ULPerson > &personList) [pure virtual]
Finds all the persons available for the specified verb in the specified tense. For most verb and tense combinations, the list of persons is the same. Occasionally there are irregular or defective verbs that have a different collection of persons. For example, the French verb "apparoir" ("to be evident") only takes the third person singular in the present tense.
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | v | The verb whose persons are desired. |
[in] | tense | The tense for which the persons are desired. |
[out] | personList | The list of persons. |
virtual ULError ULLanguageDataSource::getPersonsForClass (const ULDerivation &v, uluint32 classID, ULTense tense, ULList< ULPerson > &personList) [pure virtual]
Finds all the persons available for the specified verb in the specified tense, temporarily assuming that the verb belongs to the specified verb model class. For most verb and tense combinations, the list of persons is the same. Occasionally there are irregular or defective verbs that have a different collection of persons. For example, the French verb "apparoir" ("to be evident") only takes the third person singular in the present tense.
In most cases, the classID parameter is redundant, because it is equal to v.getClassID(). But during the development of new conjugators, Ultralingua's data editors need to be able to try out different verb classes for each new verb to help them classify the verb correctly. In general, if you find yourself using this method, you should switch to getPersons, which does not have the classID parameter.
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | v | The verb whose persons are desired. |
[in] | classID | The verb model class ID in which this verb is to be interpreted. |
[in] | tense | The tense for which the persons are desired. |
[out] | personList | The list of persons. |
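The relationship between getPersons and getPersonsForClass can be illustrated with a small C++ sketch. Here, person lists are looked up by (verb class, tense), and getPersons simply forwards the verb's own class ID. PersonTable, Verb, Tense, Person and the class IDs are hypothetical. Keying persons only on the class is a simplifying assumption; a real data source may also consult the individual verb.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ULTense, ULPerson and a verb that carries a class ID.
enum class Tense { Present, Imperfect };
enum class Person { FirstSingular, SecondSingular, ThirdSingular,
                    FirstPlural, SecondPlural, ThirdPlural };

struct Verb {
    std::string infinitive;
    std::uint32_t classID;
};

class PersonTable {
public:
    // Override the person list for one (class, tense) pair, e.g. a defective class.
    void setPersons(std::uint32_t classID, Tense tense, std::vector<Person> persons) {
        overrides_[{classID, tense}] = std::move(persons);
    }
    // Interpret the verb as if it belonged to `classID` (what a data editor might try out).
    // The verb itself is ignored in this simplified sketch.
    std::vector<Person> getPersonsForClass(const Verb& /*v*/, std::uint32_t classID,
                                           Tense tense) const {
        auto it = overrides_.find({classID, tense});
        if (it != overrides_.end()) return it->second;
        return allPersons();
    }
    // The normal case: the class ID is simply the verb's own.
    std::vector<Person> getPersons(const Verb& v, Tense tense) const {
        return getPersonsForClass(v, v.classID, tense);
    }
private:
    static std::vector<Person> allPersons() {
        return {Person::FirstSingular, Person::SecondSingular, Person::ThirdSingular,
                Person::FirstPlural, Person::SecondPlural, Person::ThirdPlural};
    }
    std::map<std::pair<std::uint32_t, Tense>, std::vector<Person>> overrides_;
};
```

Routing getPersons through getPersonsForClass makes the redundancy noted above explicit. The classID argument only matters when it differs from the verb's own class.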
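virtual ULError ULLanguageDataSource::getPredecessors (const ULInflectionRule &rule, ULList< ULInflectionRule > &predecessorList) [pure virtual]
Retrieves the list of predecessor rules for the specified inflection rule.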
- Returns
- ULError::DataSourceOpenFailed; ULError::NoMatch if there are no appropriate inflection rules in this data source; or ULError::NoError.
- Parameters
-
[in] | rule | the inflection rule whose predecessors are desired. |
[out] | predecessorList | the desired inflection rules (if any), or the empty list if an error occurs. |
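virtual ULError ULLanguageDataSource::getSuccessors (const ULInflectionRule &rule, ULList< ULInflectionRule > &successorList) [pure virtual]
Retrieves the list of successor rules for the specified inflection rule.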
- Returns
- ULError::DataSourceOpenFailed; ULError::NoMatch if there are no appropriate inflection rules in this data source; or ULError::NoError.
- Parameters
-
[in] | rule | the inflection rule whose successors are desired. |
[out] | successorList | the desired inflection rules (if any), or the empty list if an error occurs. |
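Successor and predecessor queries amount to following the edges of a graph of inflection rules in one direction or the other. The following C++ sketch stores both directions so that each query is a single lookup. RuleGraph, RuleID and the rule names are hypothetical. The idea that inflection walks successors forward while dissection walks predecessors backward (adding rules at the front of a derivation) is an assumption. It is suggested by, but not stated in, the descriptions of getInflectionRules and getInflectionRulesForDissection.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in: an inflection rule identified only by its name.
using RuleID = std::string;

// Successor and predecessor links are stored in both directions,
// so either query is a single map lookup.
class RuleGraph {
public:
    void link(const RuleID& rule, const RuleID& successor) {
        successors_[rule].push_back(successor);
        predecessors_[successor].push_back(rule);
    }
    // An empty result plays the role of ULError::NoMatch in this sketch.
    std::vector<RuleID> getSuccessors(const RuleID& rule) const {
        return lookup(successors_, rule);
    }
    std::vector<RuleID> getPredecessors(const RuleID& rule) const {
        return lookup(predecessors_, rule);
    }
private:
    static std::vector<RuleID> lookup(const std::map<RuleID, std::vector<RuleID>>& m,
                                      const RuleID& key) {
        auto it = m.find(key);
        return it == m.end() ? std::vector<RuleID>{} : it->second;
    }
    std::map<RuleID, std::vector<RuleID>> successors_;
    std::map<RuleID, std::vector<RuleID>> predecessors_;
};
```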
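virtual ULError ULLanguageDataSource::getTenses (const ULDerivation &v, ULList< ULTense > &tenseList, bool includeParticiples = false) [pure virtual]
Finds all the tenses available for the specified verb. For most verbs in a given language, the list of tenses is the same. Occasionally there are irregular or defective verbs that have a different collection of tenses.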
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | v | The verb whose tenses are desired. |
[out] | tenseList | The tenses. |
[in] | includeParticiples | True if the tense list should include participles (e.g. infinitive, past participle, present participle, gerund...). |
virtual ULError ULLanguageDataSource::getTensesForClass (uluint32 classID, ULList< ULTense > &tenseList, bool includeParticiples = false) [pure virtual]
Finds all the tenses available for the specified verb class. For most verb classes in a given language, the list of tenses is the same. Occasionally there are irregular or defective verb classes that have a different collection of tenses.
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | classID | The class ID whose tenses are desired. |
[out] | tenseList | The tenses. |
[in] | includeParticiples | True if the tense list should include participles (e.g. infinitive, past participle, present participle, gerund...). |
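The includeParticiples flag shared by getTenses, getTensesForClass and getAllTenses can be read as a filter over one underlying tense list. This C++ sketch assumes a hypothetical Tense struct with an isParticiple marker and a hypothetical per-class TenseTable; neither is part of ULAPI.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ULTense; `isParticiple` marks forms such as the
// infinitive, participles and the gerund.
struct Tense {
    std::string name;
    bool isParticiple;
};

class TenseTable {
public:
    void setTenses(std::uint32_t classID, std::vector<Tense> tenses) {
        byClass_[classID] = std::move(tenses);
    }
    // Participle-like forms are dropped unless includeParticiples is true.
    // An empty result plays the role of ULError::NoMatch in this sketch.
    std::vector<Tense> getTensesForClass(std::uint32_t classID,
                                         bool includeParticiples = false) const {
        std::vector<Tense> result;
        auto it = byClass_.find(classID);
        if (it == byClass_.end()) return result;
        for (const Tense& t : it->second)
            if (includeParticiples || !t.isParticiple) result.push_back(t);
        return result;
    }
private:
    std::map<std::uint32_t, std::vector<Tense>> byClass_;
};
```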
virtual ULError ULLanguageDataSource::getVerbFormTypes (const ULDerivation &verb, ULList< ULPartOfSpeech > &verbFormTypes) [pure virtual]
Finds the combinations of tense, number, person, and any other relevant features that are used to specify a particular conjugated form for the specified verb. Though each type is stored in a ULPartOfSpeech object, that object should be read as representing something like "future perfect tense, first person plural".
For most verbs in a given language, the list of allowed verb form types is the same. But some verbs are irregular or "defective" and have fewer or different types of conjugated forms. For example, depending on which grammar you consult, the verb "snow" may not have first person forms ("I snow" doesn't really make sense).
The list of form types is sorted by the language's canonical tense ordering, then by number, then by person.
This method should be used instead of the deprecated getTenses and getPersons methods.
- Returns
- ULError::NoError, ULError::NoMatch, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | verb | the verb whose form types are desired. |
[out] | verbFormTypes | the admissible form types for the given verb. |
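The ordering rule (canonical tense order, then number, then person) maps naturally onto a lexicographic tuple comparison. In this C++ sketch, VerbFormType, its integer encodings of number and person, and the tenseOrder argument are hypothetical stand-ins for ULPartOfSpeech and the language's canonical tense list. Tenses missing from that list are assumed to sort last.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a verb form type: tense + number + person.
struct VerbFormType {
    std::string tense;  // e.g. "present", "future perfect"
    int number;         // 0 = singular, 1 = plural
    int person;         // 1, 2 or 3
};

// Sorts by canonical tense order, then number, then person.
// `tenseOrder` lists the tenses in canonical order; unknown tenses sort last.
void sortFormTypes(std::vector<VerbFormType>& types,
                   const std::vector<std::string>& tenseOrder) {
    std::map<std::string, std::size_t> rank;
    for (std::size_t i = 0; i < tenseOrder.size(); ++i) rank[tenseOrder[i]] = i;
    auto rankOf = [&](const std::string& t) {
        auto it = rank.find(t);
        return it == rank.end() ? tenseOrder.size() : it->second;
    };
    std::stable_sort(types.begin(), types.end(),
                     [&](const VerbFormType& a, const VerbFormType& b) {
                         return std::make_tuple(rankOf(a.tense), a.number, a.person) <
                                std::make_tuple(rankOf(b.tense), b.number, b.person);
                     });
}
```

Comparing tuples avoids hand-written chains of if statements. Adding another sort key, such as a gender feature, would only mean extending both tuples.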
virtual ULError ULLanguageDataSource::getVerbModel (uluint32 verbClassID, ULVerbModel &model) [pure virtual]
Gets the verb model associated with the specified ID.
- Returns
- ULError::NoError if the operation succeeds, ULError::InvalidID if the specified verb class ID is invalid, or an error associated with the failure to open or attach to a data source.
- Parameters
-
[in] | verbClassID | The ID of the desired verb model. |
[out] | model | The verb model. |
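virtual ULError ULLanguageDataSource::getVerbs (const ULString &infinitive, ULList< ULDerivation > &verbList, bool filterResults = false) [pure virtual]
Finds all the verbs in this data source whose infinitives match (in an accent- and case-insensitive way) the specified infinitive.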
- Returns
- ULError::NoError, ULError::NoMatch, or any of the error codes related to failure to open or attach to a data source.
- Parameters
-
[in] | infinitive | the infinitive of the desired verbs. |
[out] | verbList | the matching verbs. |
virtual ULDataSourceVersion ULLanguageDataSource::getVersion ( ) [pure virtual]
- Returns
- the ULDataSourceVersion associated with this data source.
Implements ULDataSource.
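virtual ULError ULLanguageDataSource::getWords (const ULString &root, ULList< ULDerivation > &wordList, bool filterResults = false) [pure virtual]
Finds all the words in this data source whose roots (infinitive for verbs, singular for nouns and adjectives, etc.) match the specified root in an accent- and case-insensitive way.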
- Returns
- ULError::NoError, ULError::NoMatch, or any of the error codes related to failure to open or attach to a data source.
- Parameters
-
[in] | root | The root form of the desired words. |
[out] | wordList | The matching words. |
virtual bool ULLanguageDataSource::hasStopWord (const ULString &word) [pure virtual]
ULLanguageDataSource objects may contain a list of "stop words": words that are very common and should be ignored in some search contexts. These words tend to come from closed linguistic classes such as articles, pronouns, and prepositions.
- Returns
- true if this language data source's stop word list includes the specified word.
- Parameters
-
[in] | word | the word we're testing. |
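A stop-word check is typically a set lookup. This C++ sketch uses a hypothetical StopWordList class and a hypothetical contentWords helper to show the "ignore in search" use case. Case-insensitive matching is an assumption here, since the description does not say, and only ASCII case is folded.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// ASCII-only case folding; matching case-insensitively is an assumption of this sketch.
std::string foldCase(const std::string& s) {
    std::string out(s);
    for (char& c : out) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

class StopWordList {
public:
    explicit StopWordList(const std::vector<std::string>& words) {
        for (const std::string& w : words) words_.insert(foldCase(w));
    }
    bool hasStopWord(const std::string& word) const {
        return words_.count(foldCase(word)) != 0;
    }
private:
    std::unordered_set<std::string> words_;
};

// Drops stop words from a whitespace-separated query before it is searched.
std::vector<std::string> contentWords(const StopWordList& stops, const std::string& query) {
    std::istringstream in(query);
    std::vector<std::string> kept;
    for (std::string token; in >> token;)
        if (!stops.hasStopWord(token)) kept.push_back(token);
    return kept;
}
```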
virtual ULError ULLanguageDataSource::load ( ) [pure virtual]
Performs one-time opening and loading operations. Normally, such operations are performed lazily, when the data source is first queried. If you would prefer to control when loading happens, call this method.
- Returns
- ULError::NoError if the loading was successful.
Implements ULDataSource.
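The attach, load, close and detach entries together describe a lazy lifecycle:

- attach reads only header information.
- Data is loaded on the first query, or on an explicit load call.
- close frees the data while keeping the attachment.
- detach drops the attachment altogether.

The C++ sketch below models that lifecycle with a hypothetical SketchDataSource class and a reduced Error enum standing in for ULError. The hard-coded "header" value and word list are placeholders for real file or database reads, not ULAPI behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for ULError.
enum class Error { NoError, NotAttached, DataSourceOpenFailed };

// Sketch of the attach/load/close/detach lifecycle; not the ULAPI implementation.
class SketchDataSource {
public:
    // attach: remember the identifier and read only "header" data (the language).
    Error attach(const std::string& identifier) {
        if (identifier.empty()) return Error::DataSourceOpenFailed;
        identifier_ = identifier;
        language_ = "fr";  // placeholder for a value read from the header
        return Error::NoError;
    }
    // load: one-time loading of the full data; safe to call repeatedly.
    Error load() {
        if (identifier_.empty()) return Error::NotAttached;
        if (!loaded_) {
            words_ = {"parler", "finir", "apparoir"};  // placeholder for a full read
            loaded_ = true;
        }
        return Error::NoError;
    }
    // close: free the loaded data but stay attached, so a later query reloads it.
    Error close() {
        std::vector<std::string>().swap(words_);  // actually release the memory
        loaded_ = false;
        return Error::NoError;
    }
    // detach: close, then forget which data source we were attached to.
    Error detach() {
        close();
        identifier_.clear();
        language_.clear();
        return Error::NoError;
    }
    // A query that loads lazily if load() has not been called yet.
    Error countWords(std::size_t& count) {
        Error e = load();
        if (e != Error::NoError) return e;
        count = words_.size();
        return Error::NoError;
    }
    bool isLoaded() const { return loaded_; }
    const std::string& getLanguage() const { return language_; }
    // Empty when nothing is attached, matching getDataSourceIdentifier's description.
    const std::string& getDataSourceIdentifier() const { return identifier_; }
private:
    std::string identifier_;
    std::string language_;
    std::vector<std::string> words_;
    bool loaded_ = false;
};
```

Calling load explicitly only moves the cost of the first query to a moment of your choosing. A later close followed by another query simply pays that cost again.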
The documentation for this class was generated from the following file: