Information Retrieval with ArangoSearch (2024)

ArangoSearch is ArangoDB’s built-in search engine for full-text, complex data structures, and more

ArangoSearch provides information retrieval features, natively integrated into ArangoDB’s query language and with support for all data models. It is primarily a full-text search engine, a much more powerful alternative to the full-text index type. It can index nested fields from multiple collections, optionally with transformations such as text normalization and tokenization applied, rank query results by relevance, and more.

Example Use Cases

  • Perform federated full-text searches over product descriptions for a web shop, with the product documents stored in various collections.
  • Find information in a research database using stemmed phrases, case- and accent-insensitive, with irrelevant terms removed from the search index (stop word filtering), ranked by relevance based on term frequency (TF-IDF).
  • Query a movie dataset for titles with words in a particular order (optionally with wildcards), and sort the results by best matching (BM25) but favor movies with a longer duration.

Getting Started with ArangoSearch

ArangoSearch introduces the concept of Views, which can be seen as virtual collections. There are two types of Views:

  • arangosearch Views: Each View of the arangosearch type represents an inverted index to provide fast full-text searching over one or multiple linked collections and holds the configuration for the search capabilities, such as the attributes to index. It can cover multiple or even all attributes of the documents in the linked collections.

    See arangosearch Views Reference for details.

  • search-alias Views: Views of the search-alias type reference one or more inverted indexes. Inverted indexes are defined on the collection level and can be used stand-alone for filtering, but adding them to a search-alias View enables you to search over multiple collections at once, called “federated search”, and offers you the same capabilities for ranking search results by relevance and search highlighting as arangosearch Views. Each inverted index can index multiple or even all attributes of the documents of the collection it is defined for.

    See search-alias Views Reference for details.

Views are not updated synchronously as the source collections change in order to minimize the performance impact. They are eventually consistent, with a configurable consolidation policy.

The input values can be processed by so-called Analyzers which can normalize strings, tokenize text into words and more, enabling different possibilities to search for values later on.

Search results can be sorted by their similarity ranking to return the best matches first using popular scoring algorithms (Okapi BM25, TF-IDF), user-defined relevance boosting and dynamic score calculation.


Views can be managed in the web interface, via an HTTP API and through a JavaScript API.

Views can be queried with AQL using the SEARCH operation. It takes a search expression composed of the fields to search, the search terms, logical and comparison operators, as well as ArangoSearch functions.

Create your first arangosearch View

  1. Create a test collection (e.g. food) and insert a few documents so that you have something to index and search for:
    • { "name": "avocado", "type": "fruit" } (yes, it is a fruit)
    • { "name": "carrot", "type": "vegetable" }
    • { "name": "chili pepper", "type": "vegetable" }
    • { "name": "tomato", "type": ["fruit", "vegetable"] }
  2. In the VIEWS section of the web interface, click the Add View card.
  3. Enter a name (e.g. food_view) for the View, click Create, and click the card of the newly created View.
  4. Enter the name of the collection in the Links fields, then click the underlined name to access the link properties and tick the Include All Fields checkbox. In the editor on the right-hand side, you can see the View definition in JSON format, including the following setting:
    "links": { "food": { "includeAllFields": true }},
  5. Click Save view. The View indexes all attributes (fields) of the documents in the food collection from now on (with some delay). The attribute values get processed by the default identity Analyzer, which means that they get indexed unaltered. Note that arrays (["fruit", "vegetable"] in the example) are automatically expanded in arangosearch Views by default, indexing the individual elements of the array ("fruit" and "vegetable").
  6. In the QUERIES section, try the following query:
    FOR doc IN food_view RETURN doc
    The View is used like a collection and simply iterated over to return all (indexed) documents. You should see the documents stored in food as the result.
  7. Now add a search expression. Unlike with regular collections where you would use FILTER, a SEARCH operation is needed to utilize the View index:
    FOR doc IN food_view SEARCH doc.name == "avocado" RETURN doc
    In this basic example, the ArangoSearch expression looks identical to a FILTER expression, but this is not always the case. You can also combine both, with FILTERs after SEARCH, in which case the filter criteria are applied to the search results as a post-processing step.

Note that if you link a collection to a View and execute a query against this View while it is still being indexed, you may not get complete results. In the case where a View is still being built and simultaneously used in a query, the query includes a warning message (code 1240) informing you that the arangosearch View building is in progress and results can be incomplete.
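The web interface steps above can also be scripted. The following is a minimal sketch for arangosh, assuming the `db` object that arangosh provides and its `_create()`, `_createView()`, and `_query()` methods. The function names `setUpFoodView` and `findAvocado` are hypothetical; the collection and View names are the ones from the walkthrough.

```javascript
// Sketch: script the walkthrough for an arangosearch View.
// `db` is the database object as available in arangosh. It is passed in
// explicitly so that the functions have no hidden dependencies.
function setUpFoodView(db) {
  const food = db._create("food");
  const docs = [
    { name: "avocado", type: "fruit" },
    { name: "carrot", type: "vegetable" },
    { name: "chili pepper", type: "vegetable" },
    { name: "tomato", type: ["fruit", "vegetable"] },
  ];
  for (const doc of docs) {
    food.insert(doc);
  }
  // Link the collection and index all attributes with the default
  // identity Analyzer, as in step 4.
  return db._createView("food_view", "arangosearch", {
    links: { food: { includeAllFields: true } },
  });
}

// Same query as in step 7.
function findAvocado(db) {
  return db
    ._query('FOR doc IN food_view SEARCH doc.name == "avocado" RETURN doc')
    .toArray();
}
```

In arangosh, you would call `setUpFoodView(db)` once and then `findAvocado(db)`. If the View is still being built, the result may be incomplete, as described above.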

Create your first search-alias View

  1. Create a test collection (e.g. food) and insert a few documents so that you have something to index and search for. You may use the web interface for this:
    • { "name": "avocado", "type": ["fruit"] } (yes, it is a fruit)
    • { "name": "carrot", "type": ["vegetable"] }
    • { "name": "chili pepper", "type": ["vegetable"] }
    • { "name": "tomato", "type": ["fruit", "vegetable"] }
  2. In the COLLECTIONS section of the web interface, click the food collection.
  3. Go to the Indexes tab and click Add Index.
  4. Select Inverted index as the Type.
  5. In the Fields panel, enter name into Fields and confirm. Then also add type[*] as a field. The [*] is needed to index the individual elements of the type array. Note that all type attributes of the example documents are arrays, even if they only contain a single element. If you use [*] for expanding arrays, only array elements are indexed, whereas primitive values like the string "fruit" would be ignored by the inverted index (but see the searchField options regarding exceptions).
  6. In the General panel, give the index a Name like inv-idx-name-type to make it easier for you to identify the index.
  7. Click Create. The inverted index indexes the specified attributes (fields) of the documents in the food collection from now on (with some delay). The attribute values get processed by the default identity Analyzer, which means that they get indexed unaltered.
  8. In the VIEWS section, click the Add View card.
  9. Enter a name for the View (e.g. food_view) and select search-alias as the Type.
  10. Select the food collection as Collection and select the inverted index you created for the collection as Index.
  11. Click Create. The View uses the inverted index for searching and adds functionality like ranking results and searching across multiple collections at once.
  12. In the QUERIES section, try the following query:
    FOR doc IN food_view RETURN doc
    The View is used like a collection and simply iterated over to return all (indexed) documents. You should see the documents stored in food as the result.
  13. Now add a search expression. Unlike with regular collections where you would use FILTER, a SEARCH operation is needed to utilize the View index:
    FOR doc IN food_view SEARCH doc.name == "avocado" RETURN doc
    In this basic example, the ArangoSearch expression looks identical to a FILTER expression, but this is not always the case. You can also combine both, with FILTERs after SEARCH, in which case the filter criteria are applied to the search results as a post-processing step.
  14. You can also use the inverted index as a stand-alone index as demonstrated below, by iterating over the collection (not the View) with an index hint to utilize the inverted index together with the FILTER operation:
    FOR doc IN food OPTIONS { indexHint: "inv-idx-name-type", forceIndexHint: true } FILTER doc.name == "avocado" RETURN doc
    Note that you can’t rank results and search across multiple collections using a stand-alone inverted index, but you can if you add inverted indexes to a search-alias View and search the View with the SEARCH operation.
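The index and View from the steps above can also be created from arangosh. This is a minimal sketch, assuming the food collection already exists and that `db._collection()`, `ensureIndex()`, and `db._createView()` are available as in arangosh. The function name `setUpFoodAliasView` and its `viewName` parameter are hypothetical.

```javascript
// Sketch: script the walkthrough for a search-alias View.
// `db` is the arangosh database object, passed in explicitly.
function setUpFoodAliasView(db, viewName) {
  // Inverted index on name and on the individual elements of the type array.
  db._collection("food").ensureIndex({
    type: "inverted",
    name: "inv-idx-name-type",
    fields: ["name", "type[*]"],
  });
  // The View references the inverted index by collection and index name.
  return db._createView(viewName, "search-alias", {
    indexes: [{ collection: "food", index: "inv-idx-name-type" }],
  });
}
```

A call such as `setUpFoodAliasView(db, "food_view")` corresponds to steps 2 to 11.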

Understanding the Analyzer context

arangosearch Views allow you to index the same field with multiple Analyzers. This makes it necessary to select the right one in your query by setting the Analyzer context with the ANALYZER() function.

If you use search-alias Views, the Analyzers are inferred from the definitions of the inverted indexes. This is possible because every field can only be indexed with a single Analyzer. Don’t specify the Analyzer context with the ANALYZER() function in search-alias queries to avoid errors.

We did not specify an Analyzer explicitly in the example above, but it worked regardless. That is because the identity Analyzer is used by default in both View definitions and AQL queries. The Analyzer chosen in a query needs to match one of the Analyzers that a field was indexed with as per the arangosearch View definition, and this happened to be the case. We can rewrite the query to be more explicit about the Analyzer context:

FOR doc IN food_view SEARCH ANALYZER(doc.name == "avocado", "identity") RETURN doc

ANALYZER(… , "identity") matches the Analyzer defined in the View "analyzers": [ "identity" ]. The latter defines how fields are transformed at index time, whereas the former selects which index to use at query time.

To use a different Analyzer, such as the built-in text_en Analyzer, you would change the View definition to "analyzers": [ "text_en", "identity" ] (or just "analyzers": [ "text_en" ] if you don’t need the identity Analyzer at all) as well as adjust the query to use ANALYZER(… , "text_en").

If a field is not indexed with the Analyzer requested in the query, then you will get an empty result back. Make sure that the fields are indexed correctly and that you set the Analyzer context.

You can test if a field is indexed with a particular Analyzer with one of the variants of the EXISTS() function, for example, as shown below:

RETURN LENGTH( FOR doc IN food_view SEARCH EXISTS(doc.name, "analyzer", "identity") LIMIT 1 RETURN true) > 0

If you use an arangosearch View, you need to change the "storeValues" property in the View definition from "none" to "id" for the function to work. For search-alias Views, this feature is always enabled.

Basic search expressions

ArangoSearch supports a variety of logical operators and comparison operators to filter Views. A basic one is the equality comparison operator:

doc.name == "avocado"

The inversion (inequality) is also allowed:

doc.name != "avocado"

You can also test against multiple values with the IN operator:

doc.name IN ["avocado", "carrot"]

The same can be expressed with a logical OR for multiple conditions:

doc.name == "avocado" OR doc.name == "carrot"

Similarly, AND can be used to require that multiple conditions must be true:

doc.name == "avocado" AND doc.type == "fruit"

An interesting case is the tomato document with its two array elements as type: ["fruit", "vegetable"]. The View definition defaults to "trackListPositions": false, which means that the array elements get indexed individually as if the attribute had both string values at the same time (requiring array expansion using type[*] or "searchField": true in case of the inverted index for the search-alias View), matching the following conditions:

doc.type == "fruit" AND doc.type == "vegetable"

The same can be expressed with ALL == and ALL IN. Note that the attribute reference and the search conditions are swapped for this:

["fruit", "vegetable"] ALL == doc.type

To find fruits which are not vegetables at the same time, the latter can be excluded with NOT:

doc.type == "fruit" AND NOT doc.type == "vegetable"

For a complete list of operators supported in ArangoSearch expressions, see the AQL SEARCH operation.
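The array behavior described above can be pictured with a small toy model in plain JavaScript. It is not how ArangoSearch is implemented; it only assumes that each attribute value ends up as a set of indexed terms and that `==` asks whether a term is in that set. The helper names `terms` and `matchingNames` are made up for this sketch.

```javascript
// Toy model (not ArangoSearch's implementation): every attribute value is
// indexed as a set of terms, and `==` asks whether the set contains a term.
const foods = [
  { name: "avocado", type: "fruit" },
  { name: "carrot", type: "vegetable" },
  { name: "chili pepper", type: "vegetable" },
  { name: "tomato", type: ["fruit", "vegetable"] },
];

// Arrays are expanded into their elements; other values become a one-element set.
function terms(value) {
  return new Set(Array.isArray(value) ? value : [value]);
}

function matchingNames(predicate) {
  return foods.filter((doc) => predicate(terms(doc.type))).map((doc) => doc.name);
}

// doc.type == "fruit" AND doc.type == "vegetable"
const both = matchingNames((t) => t.has("fruit") && t.has("vegetable"));
// doc.type == "fruit" AND NOT doc.type == "vegetable"
const fruitOnly = matchingNames((t) => t.has("fruit") && !t.has("vegetable"));
// ["fruit", "vegetable"] ALL == doc.type
const allOf = matchingNames((t) => ["fruit", "vegetable"].every((x) => t.has(x)));

console.log(both, fruitOnly, allOf); // [ 'tomato' ] [ 'avocado' ] [ 'tomato' ]
```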

Searching for tokens from full-text

So far we searched for full matches of name and/or type. Strings could contain more than just a single term however. It could be multiple words, sentences, or paragraphs. For such text, we need a way to search for individual tokens, usually the words that it is comprised of. This is where Text Analyzers come in. A Text Analyzer tokenizes an entire string into individual tokens that are then stored in an inverted index.

There are a few pre-configured text Analyzers, but you can also add your own as needed. For now, let us use the built-in text_en Analyzer for tokenizing English text.
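To make the idea of tokens in an inverted index more concrete, here is a toy sketch in plain JavaScript. The functions `identity`, `toyText`, `buildIndex`, and `lookup` are invented for illustration and are not ArangoDB APIs. `toyText` only lowercases, strips accents, and splits on non-alphanumeric characters, whereas the real text_en Analyzer also stems words, among other things.

```javascript
// Toy stand-ins for two Analyzers. The real text_en Analyzer does more
// (for example stemming), which this sketch leaves out.
function identity(value) {
  return [value];
}

function toyText(value) {
  return value
    .normalize("NFD")                 // split accented characters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Build an inverted index: token -> set of document names that contain it.
function buildIndex(names, analyzer) {
  const index = new Map();
  for (const name of names) {
    for (const token of analyzer(name)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(name);
    }
  }
  return index;
}

function lookup(index, term) {
  return [...(index.get(term) || [])];
}

const foodNames = ["avocado", "carrot", "chili pepper", "tomato"];
const identityIndex = buildIndex(foodNames, identity);
const textIndex = buildIndex(foodNames, toyText);

console.log(lookup(identityIndex, "pepper"));         // []
console.log(lookup(textIndex, "pepper"));             // [ 'chili pepper' ]
console.log(lookup(textIndex, "PéPPêR"));             // []
console.log(lookup(textIndex, toyText("PéPPêR")[0])); // [ 'chili pepper' ]
```

The last two lines show why a search term has to go through the same transformation as the indexed values: the raw term is not a key in the index, but its normalized form is.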

arangosearch View:

  1. In the VIEWS section of the web interface, click the card of the previously created food_view of type arangosearch.

  2. In the Links panel, click the underlined name of the food collection. Enter name into Fields and confirm.

  3. Click the underlined name of the field and select the Analyzers text_en and identity.

    Alternatively, use the editor on the right-hand side to replace "fields": {}, with the code below:

    "fields": { "name": { "analyzers": ["text_en", "identity"] }},
  4. Click Save view.

  5. After a few seconds, the name attribute has been indexed with the text_en Analyzer in addition to the identity Analyzer.

  6. Run the query below, which sets text_en as the context Analyzer and searches for the word pepper:

    FOR doc IN food_view SEARCH ANALYZER(doc.name == "pepper", "text_en") RETURN doc.name
  7. It matches chili pepper because the Analyzer tokenized it into chili and pepper and the latter matches the search criterion. Compare that to the identity Analyzer:

    FOR doc IN food_view SEARCH ANALYZER(doc.name == "pepper", "identity") RETURN doc.name

    It does not match because chili pepper is indexed as a single token that does not match the search criterion.

  8. Switch back to the text_en Analyzer but with a different search term:

    FOR doc IN food_view SEARCH ANALYZER(doc.name == "PéPPêR", "text_en") RETURN doc.name

    This will not match anything, even though this particular Analyzer converts characters to lowercase and accented characters to their base characters. The problem is that this transformation is applied to the document attribute when it gets indexed, but we haven’t applied it to the search term.

  9. If we apply the same transformation then we get a match:

    FOR doc IN food_view SEARCH ANALYZER(doc.name == TOKENS("PéPPêR", "text_en")[0], "text_en") RETURN doc.name

    Note that the TOKENS() function returns an array. We pick the first element with [0], which is the normalized search term "pepper".

search-alias View:

  1. Collection indexes cannot be changed once created. Therefore, you need to create a new inverted index to index a field differently. In the COLLECTIONS section of the web interface, go to the Indexes tab and click Add Index.
  2. Select Inverted index as the Type.
  3. In the Fields panel, enter name into Fields and confirm.
  4. Click the underlined name field and select text_en as Analyzer. Note that every field can only be indexed with a single Analyzer in inverted indexes and search-alias Views.
  5. In the General panel, give the index a Name like inv-idx-name-en to make it easier for you to identify the index.
  6. Click Create. The inverted index indexes the name attribute of the documents with the text_en Analyzer, which splits strings into tokens so that you can search for individual words.
  7. In the VIEWS section, click the Add View card.
  8. Enter a name for the View (e.g. food_view_fulltext) and select search-alias as the Type.
  9. Select the food collection as Collection and select the inv-idx-name-en inverted index as Index.
  10. Click Create. After a few seconds, the name attribute has been indexed with the text_en Analyzer.
  11. Run the query below, which searches for the word pepper:
    FOR doc IN food_view_fulltext SEARCH doc.name == "pepper" RETURN doc.name
    It matches chili pepper because the Analyzer tokenized it into chili and pepper and the latter matches the search criterion.
  12. Try a different search term:
    FOR doc IN food_view_fulltext SEARCH doc.name == "PéPPêR" RETURN doc.name
    This does not match anything, even though the text_en Analyzer converts characters to lowercase and accented characters to their base characters. The problem is that this transformation is applied to the document attribute when it gets indexed, but we haven’t applied it to the search term.
  13. If we apply the same transformation then we get a match:
    FOR doc IN food_view_fulltext SEARCH doc.name == TOKENS("PéPPêR", "text_en")[0] RETURN doc.name
    Note that the TOKENS() function returns an array. We pick the first element with [0], which is the normalized search term "pepper".

Search expressions with ArangoSearch functions

Basic operators are not enough for complex query needs. Additional search functionality is provided via ArangoSearch functions that can be composed with basic operators and other functions to form search expressions.

ArangoSearch AQL functions take either an expression or a reference (of an attribute path or the document emitted by a View) as the first argument.

BOOST(<expression>, …)
STARTS_WITH(doc.attribute, …)
TFIDF(doc, …)

If an attribute path expression is needed, then you have to reference a document object emitted by a View, e.g. the doc variable of FOR doc IN viewName, and then specify which attribute you want to test for, as an unquoted string literal. For example, doc.attr or doc.deeply.nested.attr, but not "doc.attr". You can also use the bracket notation doc["attr"].

FOR doc IN viewName SEARCH STARTS_WITH(doc.deeply.nested["attr"], "avoca") RETURN doc

If a reference to the document emitted by the View is required, like for scoring functions, then you need to pass the raw variable.

FOR doc IN viewName SEARCH ... SORT BM25(doc) DESC ...

If an expression is expected, it means that search conditions can be expressed in AQL syntax. They are typically function calls to ArangoSearch filter functions, possibly nested and/or using logical operators for multiple conditions.

BOOST(STARTS_WITH(doc.name, "chi"), 2.5) OR STARTS_WITH(doc.name, "tom")

You should make sure that search terms can match the indexed values by processing the search terms with the same Analyzers as the indexed document attributes. This is especially important for full-text search and any form of normalization, where there is little chance that an unprocessed search term happens to match the processed, indexed values.

If you use arangosearch Views, the default Analyzer that is used for searching is "identity". Because a field can be indexed with multiple Analyzers, you need to set the Analyzer context in queries against arangosearch Views to select the Analyzer of the indexed data. Otherwise, the identity Analyzer is used.

If you use search-alias Views, the Analyzers are inferred from the definitions of the inverted indexes, and you don’t need to and should not set the Analyzer context with the ANALYZER() function. You should still transform search terms using the same Analyzer as for the indexed values.

While some ArangoSearch functions accept an Analyzer argument, it is sometimes necessary to wrap search (sub-)expressions with an ANALYZER() call to set the correct Analyzer in the query so that it matches one of the Analyzers with which the field has been indexed. This only applies to queries against arangosearch Views.

It can be easier and cleaner to use ANALYZER() even if you exclusively use functions that take an Analyzer argument and leave that argument out:

// Analyzer specified in each function call
PHRASE(doc.name, "chili pepper", "text_en") OR PHRASE(doc.name, "tomato", "text_en")

// Analyzer specified using ANALYZER()
ANALYZER(PHRASE(doc.name, "chili pepper") OR PHRASE(doc.name, "tomato"), "text_en")

The PHRASE() function applies the text_en Analyzer to the search terms in both cases. chili pepper gets tokenized into chili and pepper and these tokens are then searched in this order. Searching for pepper chili would not match.
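The ordering requirement of a phrase search can be illustrated with a toy matcher in plain JavaScript. This is only a sketch of the idea, not the PHRASE() implementation. The helpers `tokenize` and `matchesPhrase` are hypothetical, and tokenization is reduced to lowercasing and splitting on whitespace.

```javascript
// Toy phrase match: the tokens of the phrase must occur in the text
// at consecutive positions and in the same order.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

function matchesPhrase(text, phrase) {
  const tokens = tokenize(text);
  const wanted = tokenize(phrase);
  if (wanted.length === 0) return false;
  for (let start = 0; start + wanted.length <= tokens.length; start++) {
    if (wanted.every((token, i) => tokens[start + i] === token)) {
      return true;
    }
  }
  return false;
}

console.log(matchesPhrase("chili pepper", "chili pepper")); // true
console.log(matchesPhrase("chili pepper", "pepper chili")); // false
console.log(matchesPhrase("chili pepper", "pepper"));       // true
```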

Certain expressions do not require any ArangoSearch functions, such as basic comparisons. However, the Analyzer used for searching will be "identity" unless ANALYZER() is used to set a different one.

// The "identity" Analyzer will be used by default
SEARCH doc.name == "avocado"

// Same as before but being explicit
SEARCH ANALYZER(doc.name == "avocado", "identity")

// Use the "text_en" Analyzer for searching instead
SEARCH ANALYZER(doc.name == "avocado", "text_en")

Ranking results by relevance

Finding matches is one thing, but especially if there are a lot of results then the most relevant documents should be listed first. ArangoSearch implements scoring functions that can be used to rank documents by relevance. The popular ranking schemes Okapi BM25 and TF-IDF are available.

Here is an example that sorts results from high to low BM25 score and also returns the score:

// search-alias View
FOR doc IN food_view SEARCH doc.type == "vegetable" SORT BM25(doc) DESC RETURN { name: doc.name, type: doc.type, score: BM25(doc) }

// arangosearch View
FOR doc IN food_view SEARCH ANALYZER(doc.type == "vegetable", "identity") SORT BM25(doc) DESC RETURN { name: doc.name, type: doc.type, score: BM25(doc) }

As you can see, the variable emitted by the View in the FOR … IN loop is passed to the BM25() function.

name          type                   score
tomato        ["fruit","vegetable"]  0.43373921513557434
carrot        vegetable              0.38845786452293396
chili pepper  vegetable              0.38845786452293396

The TFIDF() function works the same:

// search-alias View
FOR doc IN food_view SEARCH doc.type == "vegetable" SORT TFIDF(doc) DESC RETURN { name: doc.name, type: doc.type, score: TFIDF(doc) }

// arangosearch View
FOR doc IN food_view SEARCH ANALYZER(doc.type == "vegetable", "identity") SORT TFIDF(doc) DESC RETURN { name: doc.name, type: doc.type, score: TFIDF(doc) }

It returns different scores:

name          type                   score
tomato        ["fruit","vegetable"]  1.2231435775756836
carrot        vegetable              1.2231435775756836
chili pepper  vegetable              1.2231435775756836

The scores will change whenever you insert, modify or remove documents, because the ranking takes factors like how often a term occurs overall and within a single document into account. For example, if you insert a hundred more fruit documents (INSERT { type: "fruit" } INTO food) then the TF-IDF score for vegetables will become 1.4054651260375977.
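Because the scores depend on the current data, it can be handy to re-run a ranking query from a script. The following is a minimal sketch for arangosh, assuming `db._query()` accepts a query string plus an object of bind parameters. The function name `rankByBm25` and the bind parameter name `type` are chosen for this example only.

```javascript
// Sketch: run the BM25-sorted search from JavaScript with the searched
// type passed as a bind parameter. `db` is the arangosh database object.
function rankByBm25(db, type) {
  const query = `
    FOR doc IN food_view
      SEARCH doc.type == @type
      SORT BM25(doc) DESC
      RETURN { name: doc.name, type: doc.type, score: BM25(doc) }`;
  return db._query(query, { type }).toArray();
}
```

Calling `rankByBm25(db, "vegetable")` returns an array of objects with name, type, and score, sorted from the highest to the lowest score.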

You can adjust the ranking in two different ways:

  • Boost sub-expressions to favor a condition over another with the BOOST() function
  • Calculate a custom score with an expression, optionally taking BM25() and TFIDF() into account

Have a look at the Ranking Examples for that.

Indexing complex JSON documents

Working with sub-attributes

As with regular indexes, there is no limitation to top-level attributes. Any document attribute at any depth can be indexed. However, with ArangoSearch it is possible to index all document attributes or particular attributes including their sub-attributes without having to modify the View definition as new sub-attributes are added. This is possible with arangosearch Views as well as with inverted indexes if you use them through search-alias Views.

search-alias View:

You need to create an inverted index and enable the Include All Fields feature to index all document attributes, then add the index to a search-alias View. No matter what attributes you add to your documents, they will automatically get indexed.

You can also add Fields, click their underlined names, and enable Include All Fields for specific attributes and their sub-attributes:

... "fields": [ { "name": "value", "includeAllFields": true } ],...

This will index the attribute value and its sub-attributes. Consider the following example document:

{ "value": { "nested": { "deep": "apple pie" } }}

The View will automatically index apple pie, and it can then be queried like this:

FOR doc IN food_view SEARCH doc.value.nested.deep == "apple pie" RETURN doc

arangosearch View:

We already used the Include All Fields feature to index all document attributes above when we modified the View definition to this:

{ "links": { "food": { "includeAllFields": true } }, ...}

No matter what attributes you add to your documents, they will automatically get indexed. To do this for certain attribute paths only, you can enable the Include All Fields option for specific attributes only, and include a list of Analyzers to process the values with:

{ "links": { "food": { "fields": { "value": { "includeAllFields": true, "analyzers": ["identity", "text_en"] } } } }}

This will index the attribute value and its sub-attributes. Consider the following example document:

{ "value": { "nested": { "deep": "apple pie" } }}

The View will automatically index apple pie, processed with the identity and text_en Analyzers, and it can then be queried like this:

FOR doc IN food_view SEARCH ANALYZER(doc.value.nested.deep == "apple pie", "identity") RETURN doc
FOR doc IN food_view SEARCH ANALYZER(doc.value.nested.deep IN TOKENS("pie", "text_en"), "text_en") RETURN doc

Using includeAllFields for a lot of attributes in combination with complex Analyzers may significantly slow down the indexing process.
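What indexing "all fields" means for nested documents can be pictured as collecting every attribute path in a document. The following toy sketch in plain JavaScript does just that. The function `attributePaths` is hypothetical and says nothing about how ArangoSearch walks documents internally; arrays are treated as leaf values here for simplicity.

```javascript
// Toy model of "Include All Fields": walk a document and list every
// attribute path that leads to a non-object value.
// Arrays are treated as leaves in this sketch.
function attributePaths(value, prefix = "") {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return [[prefix, value]];
  }
  return Object.entries(value).flatMap(([key, child]) =>
    attributePaths(child, prefix ? `${prefix}.${key}` : key)
  );
}

const nestedDoc = { value: { nested: { deep: "apple pie" } } };
console.log(attributePaths(nestedDoc)); // [ [ 'value.nested.deep', 'apple pie' ] ]
```

A newly added sub-attribute simply shows up as one more path, which mirrors why the View definition does not need to change.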

Indexing and querying arrays

With arangosearch Views, the elements of arrays are indexed individually by default, as if the source attribute had each element as value at the same time (like a disjunctive superposition of their values). This is controlled by the View setting trackListPositions that defaults to false.

With search-alias Views, you can get the same behavior by enabling the searchField option globally or for specific fields in their inverted indexes, or you can explicitly expand certain array attributes by appending [*] to the field name.

Consider the following document:

{ "value": { "nested": { "deep": [ 1, 2, 3 ] } }}

A View that is configured to index the field value including sub-fields will index the individual numbers under the path value.nested.deep, which you can query for like:

FOR doc IN viewName SEARCH doc.value.nested.deep == 2 RETURN doc

This is different to FILTER operations, where you would use an array comparison operator to find an element in the array:

FOR doc IN collection FILTER doc.value.nested.deep ANY == 2 RETURN doc

You can set trackListPositions to true if you want to query for a value at a specific array index (requires searchField to be true for search-alias Views):

SEARCH doc.value.nested.deep[1] == 2

With trackListPositions enabled there will be no match for the document anymore if the specification of an array index is left out in the expression:

SEARCH doc.value.nested.deep == 2

Conversely, there will be no match if an array index is specified but trackListPositions is disabled.
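The effect of trackListPositions on matching can be modeled with a small sketch in plain JavaScript. It assumes, purely for illustration, that array elements are stored under the plain attribute path when the option is off and under a path with the position appended when it is on. The helpers `indexArray` and `matches` are made up and do not reflect the actual index layout.

```javascript
// Toy model of how array elements could be keyed in the index.
// trackListPositions: false -> every element is stored under the plain path.
// trackListPositions: true  -> every element is stored under path[position].
function indexArray(path, elements, trackListPositions) {
  const index = new Map();
  elements.forEach((element, position) => {
    const key = trackListPositions ? `${path}[${position}]` : path;
    if (!index.has(key)) index.set(key, new Set());
    index.get(key).add(element);
  });
  return index;
}

function matches(index, path, value) {
  return index.has(path) && index.get(path).has(value);
}

const untracked = indexArray("value.nested.deep", [1, 2, 3], false);
const tracked = indexArray("value.nested.deep", [1, 2, 3], true);

console.log(matches(untracked, "value.nested.deep", 2));    // true
console.log(matches(untracked, "value.nested.deep[1]", 2)); // false
console.log(matches(tracked, "value.nested.deep[1]", 2));   // true
console.log(matches(tracked, "value.nested.deep", 2));      // false
```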

String tokens are also indexed individually, but only some Analyzer types return multiple tokens. If the Analyzer does, then comparison tests are done per token/word. For example, given the field text is analyzed with "text_en" and contains the string "a quick brown fox jumps over the lazy dog", the following expression will be true:

// search-alias View
doc.text == 'fox'

// arangosearch View
ANALYZER(doc.text == 'fox', "text_en")

Note that the "text_en" Analyzer stems the words, so this is also true:

// search-alias View
doc.text == 'jump'

// arangosearch View
ANALYZER(doc.text == 'jump', "text_en")

So a comparison will actually test if a word is contained in the text. With trackListPositions: false, this means for arrays that the test is whether the word is contained in any element of the array. For example, given:

{"text": [ "a quick", "brown fox", "jumps over the", "lazy dog" ] }

… the following will be true:

// search-alias View
doc.text == 'jump'

// arangosearch View
ANALYZER(doc.text == 'jump', "text_en")

With trackListPositions: true you would need to specify the index of the array element "jumps over the" for the expression to be true:

// search-alias View
doc.text[2] == 'jump'

// arangosearch View
ANALYZER(doc.text[2] == 'jump', "text_en")

Arrays of strings are handled similarly. Each array element is treated like a token (or possibly multiple tokens if a tokenizing Analyzer is used and therefore applied to each element).

Dealing with eventual consistency

Regular indexes are immediately consistent. If you have a collection with a persistent index on an attribute text and update the value of the attribute for instance, then this modification is reflected in the index immediately. View indexes (and inverted indexes), on the other hand, are eventually consistent. Document changes are not reflected instantly, but only in near real-time. This is mainly for performance reasons.

If you run a search query shortly after a CRUD operation, then the results may be slightly stale, e.g. not include a newly inserted document:

db._query(`INSERT { text: "cheese cake" } INTO collection`);
db._query(`FOR doc IN viewName SEARCH doc.text == "cheese cake" RETURN doc`);
// May not find the new document

Re-running the search query a bit later will include the new document, however.

There is an internal option to wait for the View to update and thus include changes just made to documents:

db._query(`INSERT { text: "pop tart" } INTO collection`);
db._query(`FOR doc IN viewName SEARCH doc.text == "pop tart" OPTIONS { waitForSync: true } RETURN doc`);

This is not necessary if you use a single server deployment and populate a collection with documents before creating a View.

SEARCH … OPTIONS { waitForSync: true } is intended to be used in unit tests to block search queries until the View has caught up with the underlying collections. It is designed to make this use case easier. It should not be used for other purposes and especially not in production, as it can stall queries.

Do not use SEARCH … OPTIONS { waitForSync: true } in transactions. View index changes cannot be rolled back if transactions get aborted. It will lead to permanent inconsistencies between the linked collections and the View.
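Since re-running a search a bit later picks up recent changes, one alternative to waitForSync is a small retry loop. The sketch below is plain JavaScript. The function `searchWithRetry` and its `runQuery` and `wait` callbacks are hypothetical and supplied by the caller, for example a function wrapping `db._query(...).toArray()` and a sleep helper of your environment. It only covers the case where an expected document is not visible yet.

```javascript
// Sketch: re-run a search until it returns something or the attempts run out.
// `runQuery` returns an array of results; `wait` is called between attempts
// with the number of the attempt that just failed.
function searchWithRetry(runQuery, wait, maxAttempts = 5) {
  let results = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    results = runQuery();
    if (results.length > 0) break;
    if (attempt < maxAttempts) wait(attempt);
  }
  return results;
}
```

If all attempts return nothing, the function gives back an empty array, so the caller can decide whether that is an error or simply means there is no match.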

How to go from here

To learn more, check out the different search examples:

  • Exact value matching: Search for values as stored in documents (full strings, numbers, booleans).
  • Range queries: Match values that are above, below, or between a minimum and a maximum value. This is primarily for numeric values.
  • Prefix matching: Search for strings that start with certain strings. A common use case for this is implementing auto-complete functionality.
  • Case-sensitivity and diacritics: Strings can be normalized so that it does not matter whether characters are upper or lower case, and character accents can be ignored for a better search experience. This can be combined with other types of search.
  • Wildcard search: Search for partial matches in strings (ends with, contains, and more).
  • Full-text token search: Full-text can be tokenized into words that can then be searched individually, regardless of their original order, also in combination with prefix search. Array values are also indexed as separate tokens.
  • Phrase and proximity search: Search tokenized full-text with the tokens in a certain order, such as partial or full sentences, optionally with wildcard tokens for a proximity search.
  • Faceted search: Combine aggregation with search queries to retrieve how often values occur overall.
  • Fuzzy search: Match strings even if they are not exactly the same as the search terms. By allowing some fuzziness, you can compensate for typos and match similar tokens that could be relevant too.
  • Geospatial search: You can use ArangoSearch for geographic search queries to find nearby locations, places within a certain area, and more. Unlike the regular geo index, it can be combined with other types of search queries.
  • Search highlighting: Retrieve the positions of matches within strings, to highlight what was found in search results (Enterprise Edition only).
  • Nested search: Match arrays of objects with all the conditions met by a single sub-object, and define for how many of the elements this must be true (Enterprise Edition only).
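
To make the difference between token search, phrase search, and prefix matching from the list above more concrete, here is a minimal conceptual sketch in Python. It is not how ArangoSearch is implemented and does not use any ArangoDB API; the tiny inverted index, the function names, and the sample movie titles are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build a toy inverted index: token -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def token_search(index, query):
    """Documents that contain all query tokens, in any order."""
    postings = [set(index.get(t, {})) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def phrase_search(index, query):
    """Documents that contain the query tokens consecutively and in order."""
    tokens = tokenize(query)
    hits = set()
    for doc_id in token_search(index, query):
        starts = index[tokens[0]][doc_id]
        if any(
            all(start + i in index[t][doc_id] for i, t in enumerate(tokens))
            for start in starts
        ):
            hits.add(doc_id)
    return hits


def prefix_search(index, prefix):
    """Documents that contain at least one token starting with the prefix."""
    prefix = prefix.lower()
    return {
        doc_id
        for token, posting in index.items()
        if token.startswith(prefix)
        for doc_id in posting
    }


# Hypothetical sample data
docs = {
    "m1": "The Matrix Reloaded",
    "m2": "Reloaded: the matrix documentary",
    "m3": "Matrimony",
}
index = build_index(docs)

print(sorted(token_search(index, "matrix reloaded")))   # ['m1', 'm2']
print(sorted(phrase_search(index, "matrix reloaded")))  # ['m1']
print(sorted(prefix_search(index, "matr")))             # ['m1', 'm2', 'm3']
```

Token search matches both titles because order is ignored, while phrase search only matches the title in which the tokens appear next to each other in the given order.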

For relevance and performance tuning, as well as the reference documentation, see:

  • Ranking: Sort search results by relevance, fine-tune the importance of certain search conditions, and calculate a custom relevance score.
  • Performance: Give the View index a primary sort order to benefit common search queries that you will run, and store often-used attributes directly in the View index for fast access.
  • Views Reference: You can find all View properties and options that are available for the respective type in the arangosearch Views Reference and search-alias Views Reference documentation.
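
As a rough illustration of what ranking by relevance means, the following Python sketch computes BM25 scores for a handful of documents. It uses a common textbook formulation of BM25 with the frequently used parameters k = 1.2 and b = 0.75; the exact formula used by ArangoSearch may differ, and the function name and sample documents are assumptions. In AQL, ranking is requested with scoring functions such as BM25() or TFIDF() in a SORT operation rather than computed by hand.

```python
import math
from collections import Counter


def bm25_scores(docs, query_terms, k=1.2, b=0.75):
    """Score every document against the query terms (textbook BM25)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    scores = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Number of documents that contain the term at least once
            doc_freq = sum(1 for t in tokenized.values() if term in t)
            if doc_freq == 0:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            freq = term_freq[term]
            # Length normalization: long documents are penalized via b
            norm = freq + k * (1 - b + b * len(tokens) / avg_len)
            score += idf * freq * (k + 1) / norm
        scores[doc_id] = score
    return scores


# Hypothetical sample data
docs = {
    "a": "dinosaur park dinosaur",
    "b": "a long documentary about a park with one dinosaur",
    "c": "space opera",
}
scores = bm25_scores(docs, ["dinosaur"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['a', 'b', 'c']
```

The short document that mentions the term twice ranks first, the longer document with a single mention second, and the document without the term gets a score of zero.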

If you are interested in more technical details, have a look at:

  • ArangoSearch Tutorial: The tutorial includes sections about the View concept, Analysis, and the ranking model.
  • ArangoSearch architecture overview: A description of ArangoSearch’s design, its inverted index, and some implementation details.
  • The IResearch library that provides the searching and ranking capabilities.

FAQs

What is the difference between search and filter in ArangoDB?

The SEARCH operation guarantees to use View indexes for an efficient execution plan. If you use the FILTER keyword for Views, no indexes are utilized and the filtering is performed as a post-processing step.

What is ArangoSearch?

ArangoSearch is a C++ based full-text search engine including similarity ranking capabilities natively integrated into ArangoDB. ArangoSearch allows users to combine two information retrieval techniques: boolean and generalized ranking retrieval.

What is the TOKENS() function in ArangoDB?

TOKENS() splits an input string into tokens using a given Analyzer. It is not an ArangoSearch function but a regular string function, so it can also be used outside of SEARCH operations. For this reason, it requires the Analyzer name to be passed as an argument in all cases, even if it is wrapped in an ANALYZER() call.

What is the difference between filtered search and faceted search?

Filters and facets are used to refine search results but operate differently. Filters limit the results based on predetermined criteria, while facets display the options for each category, allowing users to select multiple options for more specific results.

What is the difference between _id and _key in ArangoDB?

Both are system attributes. All documents contain special top-level attributes that start with an underscore, known as system attributes. The document key is stored as a string in the _key attribute and identifies a document within its collection. The document identifier is stored as a string in the _id attribute and consists of the collection name and the document key, separated by a slash.
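
A small Python sketch of that relationship, assuming a hypothetical collection named users and a hypothetical key 12345 (the helper function is not part of any ArangoDB driver):

```python
def split_document_id(doc_id):
    """Split an _id of the form '<collection>/<key>' into its two parts."""
    collection, separator, key = doc_id.partition("/")
    if not separator or not collection or not key:
        raise ValueError(f"not a valid document identifier: {doc_id!r}")
    return collection, key


# Hypothetical document as it might be returned by a query
doc = {"_id": "users/12345", "_key": "12345", "name": "Example"}

collection, key = split_document_id(doc["_id"])
print(collection)          # users
print(key == doc["_key"])  # True
```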

How do you normalize tokens?

Token normalization is the process of canonicalizing tokens so that matches occur despite superficial differences in the character sequences of the tokens. The most standard way to normalize is to implicitly create equivalence classes, which are normally named after one member of the set.
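
The idea can be sketched in a few lines of Python using only the standard library. This is a conceptual illustration of case and accent normalization, not the code ArangoDB runs; in ArangoDB, normalization is configured through Analyzers. The function name and sample words are assumptions.

```python
import unicodedata


def normalize_token(token):
    """Map a token to a case- and accent-insensitive canonical form."""
    # casefold() is a more aggressive lower() meant for caseless matching
    decomposed = unicodedata.normalize("NFD", token.casefold())
    # After NFD decomposition, accents are separate combining characters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# All three spellings fall into the same equivalence class
print(normalize_token("Café"))  # cafe
print(normalize_token("CAFE"))  # cafe
print(normalize_token("cafe"))  # cafe
```

Applying the same normalization to both the indexed values and the search terms is what makes the match succeed regardless of case or accents.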

Author: Tuan Roob DDS
