Search algorithm in EPiServer

Versions: n/a, FAQ number: 35, Old FAQ number: 969

Q: How does the search algorithm work in EPiServer?

A: The search algorithm is based on the widely used p-norm extended Boolean model with ranked retrieval (Salton, Fox, Wu, 1983). Every EPiServer page that is saved and published is analyzed to calculate its word frequencies. The results are stored in an index table, which is later used to answer search queries.
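The indexing step can be pictured with a minimal Python sketch. It is only an illustration: the tokenizer, the `build_index` function and the in-memory dictionary are assumptions made for this example, not EPiServer's actual code or table layout.

```python
import re
from collections import Counter

def build_index(pages):
    """Map each word to {page_id: number of occurrences on that page}.

    `pages` is a hypothetical {page_id: text} mapping standing in for
    the published pages; the real index lives in a database table.
    """
    index = {}
    for page_id, text in pages.items():
        # Naive tokenizer (assumption): lowercase runs of word characters.
        counts = Counter(re.findall(r"\w+", text.lower()))
        for word, n in counts.items():
            index.setdefault(word, {})[page_id] = n
    return index

pages = {
    1: "Search pages in EPiServer",
    2: "EPiServer search index, search again",
}
index = build_index(pages)
```

Here `index["search"]` holds the per-page counts for the word "search", which is the kind of frequency data the weights are later derived from.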

After the search index has been built, you can make queries with operators such as +, - and wildcards. In the backend, the queries are transformed into rank-based queries whose criteria depend on the chosen operators. Each word on each page that matches the search has a weight, a value indicating the importance of the word on that particular page.
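For readers unfamiliar with the p-norm model, the sketch below shows its standard OR and AND similarity formulas for terms with equal query weights. How EPiServer maps +, - and wildcards onto such operators, and which value of p it uses, is not stated here; the function names and the default `p=2.0` are assumptions for illustration.

```python
def p_norm_or(weights, p=2.0):
    """p-norm OR: high if at least one term weight is high."""
    if not weights:
        raise ValueError("at least one term weight is required")
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)

def p_norm_and(weights, p=2.0):
    """p-norm AND: high only if all term weights are high."""
    if not weights:
        raise ValueError("at least one term weight is required")
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)

# Term weights of two query words on one page (made-up values).
page_weights = [0.8, 0.2]
or_score = p_norm_or(page_weights)
and_score = p_norm_and(page_weights)
```

With p = 1 both operators reduce to the plain average of the weights; as p grows, they approach strict Boolean OR (maximum) and AND (minimum). This is what lets the model rank partial matches instead of rejecting them.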

A weight is calculated like this: the number of occurrences of a word on a page is compared with the total number of occurrences of that word across all pages. The result is a so-called weight, a value between 0.0 and 1.0. The closer to 1.0 the weight is, the more relevant the word is on that page in the context of the current query. The weights are adjusted to reduce the importance of words that are used very frequently on the same page. If a word occurs 10 000 times on a page, its weight on that page will be lower than on a page where it occurs only a few times. Even though the word has a lower weight on the page where it occurs most frequently, that page will still get a higher rank, but not as high as it would without the weight adjustment.
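The exact formula is not given in this FAQ. The sketch below is one possible reading, using logarithmic damping as an assumed adjustment: each additional occurrence counts for less, so a page with very many occurrences still scores higher, but by a much smaller margin than with a plain ratio. The function names and the choice of `log1p` are hypothetical.

```python
import math

def raw_weight(count_on_page, total_count):
    """Unadjusted share of all occurrences that fall on this page."""
    return count_on_page / total_count

def damped_weight(count_on_page, total_count):
    """Assumed adjustment: compare logarithms instead of raw counts.

    The result stays between 0.0 and 1.0 as long as
    0 <= count_on_page <= total_count and total_count > 0.
    """
    return math.log1p(count_on_page) / math.log1p(total_count)

# A word occurs 10 000 times on page A and 10 times on page B.
total = 10_010
raw_a, raw_b = raw_weight(10_000, total), raw_weight(10, total)
damped_a, damped_b = damped_weight(10_000, total), damped_weight(10, total)
```

Under these assumptions page A keeps the higher score, but its lead over page B shrinks from a factor of 1000 to a factor of less than 4.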

For each query, the search engine automatically calculates the optimal rank, which is the rank of the page that best matches the search criteria. The optimal rank is used as the upper bound to normalize all ranking values in the search result, so that every value is between 0.0 and 1.0. To make these values comparable to results from Microsoft Index Server(*), which uses an interval between 0 and 1000, every ranking in EPiServer is multiplied by 1000 before the result is returned. Finally, the search page that displays the result divides every ranking by 10 to show the value as a percentage.

(*) Microsoft Index Server is an optional component used to search documents and files located on file servers. In comparison, the EPiServer search engine searches only EPiServer pages in the database.
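The normalization and scaling steps described above can be sketched in a few lines of Python. The function names and the sample rank values are made up for this example; only the arithmetic (divide by the best rank, multiply by 1000, divide by 10 for display) comes from the description.

```python
def normalize_ranks(raw_ranks):
    """Scale raw ranks so the best-matching page gets 1000.

    `raw_ranks` is a hypothetical {page: raw rank} mapping for one query.
    """
    if not raw_ranks:
        return {}
    optimal = max(raw_ranks.values())  # rank of the best match
    if optimal == 0:
        return {page: 0.0 for page in raw_ranks}
    return {page: rank / optimal * 1000 for page, rank in raw_ranks.items()}

def to_percent(rank):
    """Convert a 0-1000 rank to the percentage shown on the search page."""
    return rank / 10

raw = {"start": 1.5, "news": 0.75, "contact": 0.375}
ranks = normalize_ranks(raw)
percent = {page: to_percent(r) for page, r in ranks.items()}
```

Because the best match always becomes 1000, the displayed percentages are relative to the top hit of that query and are not comparable between different queries.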
