| Full text search |
The Noematics engine searches the whole indexed work, as defined by the administrator before indexing.
Every word is indexed. |
| Access control and Security |
The Noematics search engine supports both basic authentication and integrated authentication (NTLM). This ensures that users see in their search results only the documents they have the right to access. Several methods of controlling users' access to search results are available. |
| Number of user keywords |
Unlimited |
| All the forms of words: inflected forms |
Linguists use the term "inflected forms" for the grammatical forms of a word, i.e. masculine/feminine (plus neuter in some languages), singular/plural, and verb conjugations.
When the "inflected forms" option is activated (as it is by default), a Noematics search yields the same results whichever form of a given keyword the user entered.
For example, a request with the word horse gives the same complete set of results as one with horses. Likewise for ox and oxen, etc.
This functionality applies to nouns, adjectives and verbs. It is active by default, but may be disabled by the user.
|
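A minimal Python sketch of how the inflected-forms option can behave. It assumes a lookup table that maps each form to its word family. The `LEXICON` table and the functions `family`, `expand` and `matches` are hypothetical illustrations, not the Noematics implementation or its linguistic database.

```python
# Toy lexicon (hypothetical): each inflected form maps to its word family.
LEXICON = {
    "horse": "horse", "horses": "horse",
    "ox": "ox", "oxen": "ox",
    "race": "race", "races": "race", "raced": "race", "racing": "race",
}


def family(word: str) -> str:
    """Return the word family of `word`; unknown words form their own family."""
    w = word.lower()
    return LEXICON.get(w, w)


def expand(keyword: str) -> set[str]:
    """Return every known form in the keyword's family, plus the keyword itself."""
    fam = family(keyword)
    forms = {form for form, lemma in LEXICON.items() if lemma == fam}
    return forms | {keyword.lower()}


def matches(doc_words: list[str], keyword: str, inflected: bool = True) -> bool:
    """True if the document contains the keyword, or any of its forms when enabled."""
    targets = expand(keyword) if inflected else {keyword.lower()}
    return any(w.lower() in targets for w in doc_words)


print(matches(["Two", "oxen", "pulled"], "ox"))                   # True
print(matches(["Two", "oxen", "pulled"], "ox", inflected=False))  # False
```

With the option disabled (`inflected=False`), only the exact form entered matches.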
| Boolean Operations (summary) |
The user-friendly interface spares Noematics users the chore of manipulating Boolean operators. Because this is a tedious and often error-prone task, the engine itself translates the user's request into the proper Boolean operators (AND, NOT, OR). See also: Boolean operations (for connoisseurs). |
| Nearness / distance between words |
Among the documents containing the keywords entered by the user, this option retains only those in which the first and the last keyword are separated by at most n words (so-called banal words are counted; see below).
The idea is that keywords appearing close together in a document are a good clue that the document is relevant to the set of concepts those keywords represent.
|
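The nearness test can be sketched as follows. The sketch assumes the document is already split into a list of words, banal words included as the text specifies. It also assumes that "separated by at most n words" means at most n intervening words. `within_distance` is a hypothetical name.

```python
def within_distance(tokens: list[str], first: str, last: str, n: int) -> bool:
    """True if some occurrence of `first` and some occurrence of `last` are
    separated by at most `n` intervening words. Every token counts,
    banal words such as "the" included."""
    toks = [t.lower() for t in tokens]
    pos_first = [i for i, t in enumerate(toks) if t == first.lower()]
    pos_last = [i for i, t in enumerate(toks) if t == last.lower()]
    # Number of words strictly between positions i and j is abs(j - i) - 1.
    return any(abs(j - i) - 1 <= n for i in pos_first for j in pos_last)


tokens = "the horse won the big race".split()
print(within_distance(tokens, "horse", "race", 3))  # True: won, the, big
print(within_distance(tokens, "horse", "race", 2))  # False
```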
| Expressions and phrases |
This option retrieves only the documents containing the exact phrase entered by the user. |
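Exact-phrase matching amounts to finding the phrase's words as one contiguous, ordered run in the document. Below is a minimal sketch that assumes pre-tokenized text and case-insensitive comparison. `contains_phrase` is a hypothetical helper.

```python
def contains_phrase(tokens: list[str], phrase: str) -> bool:
    """True if the words of `phrase` occur consecutively, in order, in `tokens`.
    Comparison is case-insensitive (an assumption of this sketch)."""
    words = phrase.lower().split()
    toks = [t.lower() for t in tokens]
    k = len(words)
    return k > 0 and any(toks[i:i + k] == words for i in range(len(toks) - k + 1))


doc = "I bet on the horse race today".split()
print(contains_phrase(doc, "horse race"))  # True
print(contains_phrase(doc, "race horse"))  # False: same words, wrong order
```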
| Excluded words |
This option removes from the result list all documents containing one of the excluded words (or one of its forms if «inflected forms» is active). |
| Language handling |
The Noematics engine currently handles French and English.
The French linguistic database contains some 290,000 forms belonging to more than 62,000 word families (nouns, adjectives, verbs, and other grammatical categories).
The English linguistic database contains some 148,000 forms belonging to more than 80,000 word families. Databases for other European languages may be developed by Noematics on request.
|
| "Banal" or "noise" words |
Banal words are usually pronouns, conjunctions, etc., which do not explicitly designate objects, concepts or actions. The Noematics indexing engine indexes them, but they are ignored in search requests. For each language, Noematics has a list of about 300 banal words, which the administrator is free to customize by adding, editing or removing entries. |
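A minimal sketch of skipping banal words in a request. Documents keep these words in the index, so the filtering happens only on the query side. The word list below is a hypothetical ten-word excerpt, not the actual Noematics list, and `query_terms` is a hypothetical name.

```python
# Hypothetical excerpt of an English banal-word list; the real list has
# about 300 entries per language and can be edited by the administrator.
BANAL_WORDS = {"the", "a", "an", "of", "in", "on", "to", "it", "he", "she"}


def query_terms(request: str, banal: set[str] = BANAL_WORDS) -> list[str]:
    """Drop banal words from the user's request before searching.
    Only the request is filtered; indexed documents keep their banal words."""
    return [w for w in request.lower().split() if w not in banal]


print(query_terms("The horse of the year"))  # ['horse', 'year']
```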
| Special and accented characters |
The Noematics search engine takes words exactly as the user enters them, including special and accented characters. But it also lets the user leave out accents and still obtain, in most cases, identical results. |
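One common way to make accents optional is to compare words after stripping diacritics through Unicode decomposition. The sketch below does this with Python's standard `unicodedata` module. The text does not say whether Noematics works this way, and `fold_accents` and `same_word` are hypothetical names. Folding also shows why results are identical only "in most cases": French *ou* ("or") and *où* ("where") fold to the same string.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Remove diacritics: decompose (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_word(query_word: str, indexed_word: str) -> bool:
    """Case- and accent-insensitive comparison."""
    return fold_accents(query_word.lower()) == fold_accents(indexed_word.lower())


print(fold_accents("élève"))      # eleve
print(same_word("cafe", "Café"))  # True
print(same_word("ou", "où"))      # True, although these are different French words
```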
| "Ligature" characters |
For French and some words of Latin origin in English: these special characters are formed by joining two vowels: æ, Æ, œ and Œ. They may appear in documents; they are taken into account during indexing and stored as two vowels. The user may enter them either as two vowels or as a "ligature" (if he/she can type it on his/her keyboard): the results of the request will be the same. |
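Below is a sketch of storing ligatures as two vowels. The same rewrite is applied to indexed text and to user requests, so both spellings end up identical. An explicit table is needed because Unicode normalization (NFD/NFKD) leaves æ and œ intact. `expand_ligatures` is a hypothetical name.

```python
# Hypothetical mapping table: each ligature is rewritten as its two vowels.
_LIGATURES = str.maketrans({"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"})


def expand_ligatures(text: str) -> str:
    """Replace æ, Æ, œ and Œ with two separate vowels."""
    return text.translate(_LIGATURES)


# Applied to both the indexed text and the request, both spellings meet:
print(expand_ligatures("cœur") == expand_ligatures("coeur"))  # True
print(expand_ligatures("Æsop"))                              # AEsop
```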
| Mixing characters |
The user may enter into his/her request any combination of printable characters.
The wildcard '*' may be used to replace a sequence of characters, but only at the end of a word. |
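Restricting '*' to the end of a word turns it into a prefix search. A prefix search can be answered with a single range scan over a sorted vocabulary. The minimal sketch below uses the standard `bisect` module. The sorted-vocabulary layout and the name `expand_wildcard` are assumptions, not documented Noematics internals.

```python
from bisect import bisect_left


def expand_wildcard(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return the words of a *sorted* vocabulary matched by `pattern`.
    '*' may appear only as the last character, making it a prefix search."""
    if "*" in pattern[:-1]:
        raise ValueError("'*' is only allowed at the end of a word")
    if not pattern.endswith("*"):
        return [pattern] if pattern in vocabulary else []
    prefix = pattern[:-1]
    # All words starting with `prefix` form one contiguous run in sorted order.
    start = bisect_left(vocabulary, prefix)
    end = start
    while end < len(vocabulary) and vocabulary[end].startswith(prefix):
        end += 1
    return vocabulary[start:end]


vocab = sorted(["horse", "horses", "hose", "house", "horn"])
print(expand_wildcard("hors*", vocab))  # ['horse', 'horses']
```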
| Word breakers (delimiters) |
The Noematics search engine considers as a word any string of characters bounded on the left and right by a "word breaker" (delimiter), or by the beginning or end of a document. Noematics uses more than 50 word-breaker characters, of which only 10 to 15 are in common use. It can thus index texts with varied punctuation, from the most common to the rarest. |
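A tokenizer following this definition splits text on runs of delimiter characters. The delimiter set below is a hypothetical subset; the actual Noematics list of 50+ characters is not given here. `tokenize` is likewise a hypothetical name.

```python
import re

# Hypothetical subset of word breakers; Noematics uses more than 50.
WORD_BREAKERS = " \t\r\n.,;:!?\"'()[]{}<>/\\|-_=+&%$#@~^`«»"
_BREAK_RUN = re.compile("[" + re.escape(WORD_BREAKERS) + "]+")


def tokenize(text: str) -> list[str]:
    """Split text into words: maximal runs of characters that are not word
    breakers. The start and end of the text also bound a word."""
    return [w for w in _BREAK_RUN.split(text) if w]


print(tokenize("Horse-race: «the» winner, (2024)!"))
# ['Horse', 'race', 'the', 'winner', '2024']
```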
| Fast search through numeric coding of words |
The Noematics engine uses a numeric coding of words that allows for excellent response times. This coding is also at the root of the engine's linguistic effectiveness. Of course, it is transparent to the user. |
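The text does not describe the actual coding scheme. The sketch below illustrates only the general idea. It gives each distinct word an integer code and builds an inverted index keyed by those codes, so lookups work on integers rather than strings. `WordCoder` and `build_index` are hypothetical names.

```python
class WordCoder:
    """Assigns each distinct word a stable integer code (hypothetical scheme)."""

    def __init__(self) -> None:
        self.code_of: dict[str, int] = {}
        self.word_of: list[str] = []

    def encode(self, word: str) -> int:
        code = self.code_of.get(word)
        if code is None:  # first time this word is seen: give it the next code
            code = len(self.word_of)
            self.code_of[word] = code
            self.word_of.append(word)
        return code

    def decode(self, code: int) -> str:
        return self.word_of[code]


def build_index(docs: dict[str, list[str]], coder: WordCoder) -> dict[int, set[str]]:
    """Inverted index keyed by integer codes: code -> ids of documents containing it."""
    index: dict[int, set[str]] = {}
    for doc_id, words in docs.items():
        for w in words:
            index.setdefault(coder.encode(w.lower()), set()).add(doc_id)
    return index


coder = WordCoder()
index = build_index({"d1": ["Horse", "race"], "d2": ["horse"]}, coder)
print(sorted(index[coder.encode("horse")]))  # ['d1', 'd2']
```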
| Ambiguity reduction |
Noematics resolves some of the ambiguities between words during indexing, by prioritizing the grammatical categories most frequently used in queries. |
| Exhaustiveness |
The Noematics search engine is exhaustive on any given work, thanks to its full-text indexing, its extended linguistic databases, and its treatment of new words. If a request on a keyword returns no results, that simply means the word is not present in the work. |
| Boolean operations (for connoisseurs) |
Any Boolean operation is defined with reference to the "elementary document" set at indexing time (usually a file) and to the property "contains a given word", which makes it a complicated issue for anyone not familiar with formal logic.
The words you enter in the keyword text area are linked together by the operator AND. For example, you want the list of all documents that each contain horse AND race.
But the inflected forms of each of these words are linked by the operator INCLUSIVE OR. For example, you request the documents each containing (horse OR horses OR both forms) AND (race OR races OR both forms).
The words you enter in the "exclusion" text box are linked together by the operator INCLUSIVE OR with respect to the elementary document. For example: exclude the documents each containing butcher OR butchers OR slaughterhouse OR slaughterhouses OR any combination of these forms.
Altogether, the same request may be formulated as follows:
"Find the set of documents each containing (horse OR horses) AND (race OR races) minus the set of documents containing (butcher OR butchers OR slaughterhouse OR slaughterhouses)."
This may yield different boolean expressions depending upon the selection method you choose.
|
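The request above can be evaluated with plain set operations over an inverted index that maps each word form to the set of documents containing it. This is a minimal sketch that assumes such an index. `docs_with_any` and `evaluate` are hypothetical names, and the sample index contents are invented for illustration.

```python
def docs_with_any(index: dict[str, set[str]], forms: list[str]) -> set[str]:
    """Documents containing at least one of `forms` (inclusive OR)."""
    result: set[str] = set()
    for form in forms:
        result |= index.get(form, set())
    return result


def evaluate(index: dict[str, set[str]],
             keyword_groups: list[list[str]],
             excluded_forms: list[str]) -> set[str]:
    """AND across keyword groups, OR within each group (the forms of one
    keyword), minus every document containing any excluded form."""
    if not keyword_groups:
        return set()
    hits = set.intersection(*(docs_with_any(index, g) for g in keyword_groups))
    return hits - docs_with_any(index, excluded_forms)


# Invented sample index: word form -> ids of documents containing it.
index = {
    "horse": {"d1", "d2", "d3"}, "horses": {"d4"},
    "race": {"d1", "d3"}, "races": {"d4"},
    "butcher": {"d3"},
}
result = evaluate(index,
                  [["horse", "horses"], ["race", "races"]],
                  ["butcher", "butchers", "slaughterhouse", "slaughterhouses"])
print(sorted(result))  # ['d1', 'd4']
```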
| Display of results |
The Noematics search engine first presents to the user the list of documents matching the search criteria (keywords) and options. The layout of this list and the choice of displayed elements are entirely customizable through the administration module. |
| Ranking |
The Noematics engine sorts the list of documents matching the user's query by decreasing relevance. For each document, the calculated relevance index (or rank) takes into account several factors, in particular:
the relative weight of each document within the indexed corpus and relative to its language;
the number of occurrences of the keywords in the document and in the corpus. |
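The exact Noematics formula is not given. The sketch below therefore substitutes the standard TF-IDF scheme, which combines the two occurrence factors listed above. Keywords occurring often in a document raise its score, and keywords occurring in few documents of the corpus weigh more. It does not model the per-document or per-language weighting, and `rank` is a hypothetical name.

```python
import math


def rank(docs: dict[str, dict[str, int]], keywords: list[str]) -> list[tuple[str, float]]:
    """Sort documents by a TF-IDF score, highest first.
    `docs` maps a document id to its word counts."""
    n_docs = len(docs)
    # Corpus-level factor: in how many documents each keyword occurs.
    doc_freq = {k: sum(1 for counts in docs.values() if counts.get(k, 0) > 0)
                for k in keywords}
    scored = []
    for doc_id, counts in docs.items():
        score = 0.0
        for k in keywords:
            tf = counts.get(k, 0)  # document-level factor: occurrences here
            if tf:
                # Smoothed IDF: rarer keywords weigh more; always positive.
                score += tf * math.log(1 + n_docs / doc_freq[k])
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {"d1": {"horse": 3, "race": 1}, "d2": {"horse": 1}, "d3": {"cow": 2}}
print([doc_id for doc_id, _ in rank(docs, ["horse", "race"])])  # ['d1', 'd2', 'd3']
```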
| Highlight searched words found |
By clicking on one of the listed documents, the user displays the selected document. The keywords present in the document are highlighted in color. This functionality applies to HTML, text and ASP documents. |
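Below is a minimal sketch of keyword highlighting for plain text. It escapes the text as HTML and wraps each whole-word keyword occurrence in a colored `<span>`. The function name `highlight` and the inline-style markup are assumptions. To highlight inflected forms too, pass every form of each keyword. An HTML source document would also require skipping markup, which this plain-text sketch does not do.

```python
import html
import re


def highlight(text: str, keywords: list[str], color: str = "yellow") -> str:
    """Return `text` as HTML, wrapping each whole-word keyword occurrence in a
    colored <span>. Matching is case-insensitive; all text is HTML-escaped."""
    wanted = {k.lower() for k in keywords}
    out = []
    # The capturing group makes re.split keep the words in the result list.
    for part in re.split(r"(\w+)", text):
        safe = html.escape(part)
        if part.lower() in wanted:
            safe = f'<span style="background-color:{color}">{safe}</span>'
        out.append(safe)
    return "".join(out)


print(highlight("The horse & the race", ["horse", "race"]))
```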