Mini:Proximity based ranking

Proximity based ranking is a method for searching content and ranking search results that offers an alternative to the more common keyword based methods. It assigns a set of criteria to each content item and ranks the items by the proximity of those criteria to the criteria given in the search request.

Proximity based vs content based search engines

The most common search method in use today is based on searching the actual content of content items. Although search engines do half of the work in advance by crawling the web and building indices of searchable content, the results are still based mainly on finding certain words, phrases, etc. within the content itself. This method, while quite successful in the Web 1.0 realm, is not well suited to the rapidly changing, user contribution based realm of Web 2.0.

In proximity based search engines, the search is done not by looking into the content but by looking at a criteria-based description of it. This means that the content does not need to be searchable, or at least not directly searchable. The search criteria are based on a small set of descriptive indicators, which are simply fields that can be assigned certain values. Each content item is assigned one or more sets of values, and when users use the search engine, they specify the set of values they want to search by. The search engine returns a list of content items ordered by the proximity of their assigned criteria to the entered search criteria. In this way, users see not only exact matches but also proximal matches, which helps them discover more content.
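The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the method of any particular engine: the names `overall_proximity`, `rank` and `linear`, the sample indicators `hour` and `price`, and the choice to average the per-indicator proximities are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a criteria set maps indicator names to numeric values.
Criteria = Dict[str, float]
ProximityFn = Callable[[float, float], float]


def linear(span: float) -> ProximityFn:
    """Build a proximity function that falls linearly from 1 to 0 over `span`."""
    return lambda a, b: max(0.0, 1.0 - abs(a - b) / span)


def overall_proximity(assigned: Criteria, query: Criteria,
                      functions: Dict[str, ProximityFn]) -> float:
    """Average the per-indicator proximities (averaging is an assumption)."""
    scores = [functions[name](assigned[name], query[name]) for name in query]
    return sum(scores) / len(scores) if scores else 0.0


def rank(items: Dict[str, Criteria], query: Criteria,
         functions: Dict[str, ProximityFn]) -> List[Tuple[str, float]]:
    """Return (item name, proximity) pairs, closest match first."""
    scored = [(name, overall_proximity(criteria, query, functions))
              for name, criteria in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data only: three items described by two indicators.
functions = {"hour": linear(12.0), "price": linear(500.0)}
items = {
    "far": {"hour": 2.0, "price": 600.0},
    "near": {"hour": 13.0, "price": 250.0},
    "exact": {"hour": 14.0, "price": 200.0},
}
query = {"hour": 14.0, "price": 200.0}
ranked = rank(items, query, functions)
```

Here `ranked` lists the exact match first, followed by the proximal matches in decreasing order of proximity, so nothing is filtered out; items are only reordered.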

Proximity functions

The proximity of assigned criteria to the search criteria is calculated with proximity functions. A proximity function takes two values from a certain set and returns a value between 0 and 1. A value of 1 means that the two input values are identical, and 0 means that they are completely different. Any value in between represents a certain degree of proximity. A simple example of a proximity function for real numbers is:

[formula not preserved]

Of course, in most cases, more sophisticated proximity functions are needed in order to calculate the proximity of descriptive indicators.
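One function that satisfies the definition above, offered as an assumption rather than as the article's original formula, is p(x, y) = 1 / (1 + |x - y| / s), where s is a hypothetical scale parameter that controls how quickly proximity falls off. A minimal sketch:

```python
def real_proximity(x: float, y: float, scale: float = 1.0) -> float:
    """Return 1.0 for identical inputs and approach 0.0 as they move apart.

    `scale` is a hypothetical tuning parameter: the distance at which the
    proximity drops to one half.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return 1.0 / (1.0 + abs(x - y) / scale)
```

Note that this function only approaches 0 for very distant values and never reaches it exactly. If a hard cut-off is wanted, a function that clamps to 0 beyond some maximum distance would be a closer fit.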

Criteria profiles

In many cases, the proximity of two values of a descriptive indicator cannot be calculated directly from the values themselves. Think, for example, of an airline ticket ordering system. It is easy to see why a proximity based search engine suits such a system: the users type in the requested flight criteria, such as destination, time of departure, etc., and the system returns a list of suitable flights. Now, if we want to calculate the proximity of two airports, we have to compare several pieces of information about them, such as their geographical distance from one another and the country each of them is in (although the Tel Aviv and Beirut airports are not that far from one another, they cannot be considered good alternatives), etc. In such cases, the search engine can simply use the indicator values as indexes into profile tables that hold the appropriate information. More complex applications may involve external database queries, etc.
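A profile table of this kind might be sketched as below. Everything here is illustrative: the `AirportProfile` fields, the `NOT_INTERCHANGEABLE` rule table, the 100 km scale and the way distance is turned into a score are assumptions, and the coordinates are rounded approximations rather than authoritative data.

```python
import math
from typing import Dict, FrozenSet, NamedTuple, Set


class AirportProfile(NamedTuple):
    latitude: float   # degrees
    longitude: float  # degrees
    country: str


# Approximate, illustrative values; a real system would load these from a database.
PROFILES: Dict[str, AirportProfile] = {
    "TLV": AirportProfile(32.01, 34.89, "IL"),
    "HFA": AirportProfile(32.81, 35.04, "IL"),
    "BEY": AirportProfile(33.82, 35.49, "LB"),
}

# Hypothetical rule table: country pairs whose airports are never alternatives.
NOT_INTERCHANGEABLE: Set[FrozenSet[str]] = {frozenset({"IL", "LB"})}


def distance_km(a: AirportProfile, b: AirportProfile) -> float:
    """Great-circle distance by the haversine formula (Earth radius ~6371 km)."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (a.latitude, a.longitude, b.latitude, b.longitude))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def airport_proximity(code_a: str, code_b: str, scale_km: float = 100.0) -> float:
    """Look up both profiles, then combine the country rule with distance."""
    a, b = PROFILES[code_a], PROFILES[code_b]
    if frozenset({a.country, b.country}) in NOT_INTERCHANGEABLE:
        return 0.0  # close on the map, but not a usable alternative
    return 1.0 / (1.0 + distance_km(a, b) / scale_km)
```

The indicator value stored with each flight is just the airport code; the information needed to compare two codes lives in the profile table, so it can be corrected or extended without touching the content items.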

Criteria assignment

In principle, there are two ways to assign criteria to searchable content items: direct and indirect. In an effective proximity based search engine, both methods are used together. Direct assignment extracts the criteria from the content items themselves; that is, it uses descriptive indicator values that are attached to content items in a form the search engine can read. Thus, direct assignment can be used only to find content items that were intentionally prepared for use with specific search engines that can recognize their criteria data.

Indirect assignment, on the other hand, is done indirectly by the search engine's users. To understand how indirect criteria assignment works, let us look again at the example of an airline ticket ordering system. Let's assume that you want to order tickets for a certain flight, and your preferred departure time is in the afternoon. The system asks you for a departure time range, and you enter 12:00 - 16:00. You then enter the other needed criteria and run the search. The search engine will present exact matches first, but it will also present approximate matches, such as a flight that departs at 11:45, or a flight that lands at a destination near the one you requested. Now let's assume that, for some reason, you choose to order tickets for a flight from one of the approximate matches. As a result, the system may assign your search criteria to the selected flight, so that the next time someone enters the same criteria, the flight you chose will appear as an exact match.
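The indirect assignment step in this example could be sketched as follows. The names `ContentItem`, `best_proximity` and `record_selection` are hypothetical, the departure range is represented by two assumed fields (`earliest` and `latest`, in hours), and scoring an item by its closest criteria set is one possible design rather than a prescribed one.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Criteria = Dict[str, float]


@dataclass
class ContentItem:
    name: str
    # An item may carry several criteria sets, assigned directly or indirectly.
    criteria_sets: List[Criteria] = field(default_factory=list)


def criteria_proximity(assigned: Criteria, query: Criteria) -> float:
    """Average of 1 / (1 + |difference|) over the queried fields (an assumption)."""
    scores = [1.0 / (1.0 + abs(assigned[k] - query[k])) for k in query]
    return sum(scores) / len(scores) if scores else 0.0


def best_proximity(item: ContentItem, query: Criteria) -> float:
    """Score an item by the closest of its assigned criteria sets."""
    return max((criteria_proximity(c, query) for c in item.criteria_sets),
               default=0.0)


def record_selection(item: ContentItem, query: Criteria) -> None:
    """Indirect assignment: attach the user's search criteria to the chosen item."""
    if query not in item.criteria_sets:
        item.criteria_sets.append(dict(query))


# A flight departing at 11:45, directly assigned its own departure time.
flight = ContentItem("11:45 flight", [{"earliest": 11.75, "latest": 11.75}])
query = {"earliest": 12.0, "latest": 16.0}

before = best_proximity(flight, query)  # approximate match, below 1.0
record_selection(flight, query)         # the user books this flight
after = best_proximity(flight, query)   # now an exact match
```

Copying the query with `dict(query)` keeps the stored criteria independent of the caller's dictionary, and the membership check avoids storing the same criteria set twice.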

Another case where both direct and indirect criteria assignment occur is when users are allowed to add searchable content to the search engine's database. Again, we may distinguish between direct content contribution (and thus direct criteria assignment), where users act with the intent of adding pointers to new items, and indirect contribution, where users perform a certain search and do not find what they were looking for, or do not find a specific item that they think should have been in the results. In such a case, users may add a pointer to a new item without having to specify its descriptive criteria, because the criteria are the same as those they specified in their search request.

Combining proximity based ranking with other ranking methods

In the airline ticket example above, it may seem to make sense to rank a flight that was indirectly assigned an exact match to the original search criteria lower than flights that were directly assigned these criteria (and really match them). However, this is not always the case. In some cases, an approximate match is really the best match, and we would like it to rank higher than other exact matches. Because of this kind of ambiguity, proximity based ranking is usually combined with other ranking methods, such as popularity-aware ranking or rating marks.
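One simple way to combine the two signals, given here only as an assumed illustration, is a weighted blend of the proximity score with a popularity score that has been normalised to the same 0 to 1 range. The function name `combined_score` and the default weight of 0.7 are hypothetical.

```python
def combined_score(proximity: float, popularity: float,
                   weight: float = 0.7) -> float:
    """Blend proximity and popularity; both inputs are expected in [0, 1].

    `weight` is the share given to proximity; the remainder goes to popularity.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * proximity + (1.0 - weight) * popularity


# An exact but rarely chosen match versus an approximate but popular one.
exact_unpopular = combined_score(proximity=1.0, popularity=0.1)
approximate_popular = combined_score(proximity=0.8, popularity=0.9)
```

With these illustrative numbers the approximate match scores higher than the exact one, which is the situation the paragraph above describes. Setting `weight` to 1.0 falls back to pure proximity based ranking.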

Retrieved from "http://search.wikia.com/wiki/Mini:Proximity_based_ranking"

This page was last modified 07:49, 7 January 2008. GFDL