CommunityRanking
The novel contribution of CommunityRanking algorithms is live feedback from searches occurring in the community right now. In a manner of speaking, we would be searching not only documents but also other searchers' search histories. A search client on the user's machine could also correlate the quality of a search result with the time the user spends on the page.
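As a rough illustration of the time-on-page signal, the sketch below counts, for each query and result, the fraction of clicks that lasted at least a fixed dwell time. The `Click` record, the `satisfaction_rates` function and the 30-second cut-off are hypothetical choices made for this example, not part of any existing search client.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: 30 seconds is an illustrative "satisfied click" cut-off,
# not a value taken from the CommunityRanking design.
SATISFIED_DWELL_SECONDS = 30.0


@dataclass(frozen=True)
class Click:
    """Hypothetical record reported by a search client for one result click."""
    query: str
    url: str
    dwell_seconds: float  # time on the result page before returning to the results


def satisfaction_rates(clicks):
    """Return {(query, url): fraction of clicks whose dwell time meets the cut-off}."""
    totals = defaultdict(int)
    satisfied = defaultdict(int)
    for c in clicks:
        key = (c.query, c.url)
        totals[key] += 1
        if c.dwell_seconds >= SATISFIED_DWELL_SECONDS:
            satisfied[key] += 1
    return {key: satisfied[key] / totals[key] for key in totals}
```

A result that users open and immediately leave scores close to 0, while one they stay on scores close to 1; the ranker could then treat these rates as one implicit feedback signal among others.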
Realizing the vision of CommunityRanking will involve collecting as much feedback from users as possible. This can seem a daunting task, since giving feedback on a search result can be tedious and error-prone. Furthermore, solving the technical problems of applying that feedback to an index that covers most of the visible internet will require knowledge of modern algorithms and programming techniques.
What do we want from the community?
It is not so much what *we* want as what the users want. If users can provide feedback that influences the relevance of the returned results, we improve the overall efficiency of search on the internet. Ideally, every search would be met with the question "what do you really want?". If users could answer this question beforehand, search would be better, but sadly no search user I have seen so far possesses this "oracle" quality, so we have to help speed up the process.
Basically, we want all the feedback we can get from the community without bothering the community too much. To achieve this, we need to make educated guesses: we need to apply inference to what little data users are willing to share.
We need:
- assessment of the relevance of returned results
- inference of user feedback (Forum:Idea_for_semi-automatic_semantic_tagging)
- inference on community search history (solved in a way that handles the related privacy issues)
- a way to identify the authority of given knowledge sources
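One possible way to use community search history while limiting privacy exposure is to share only aggregate counts and to suppress rare query/result pairs, since a pair produced by a single user can identify that user. The sketch below assumes client logs arrive as `(user_id, query, url)` tuples; `aggregate_history` and the threshold of five distinct users are illustrative assumptions. A threshold like this reduces privacy risk but is not a formal privacy guarantee.

```python
from collections import defaultdict

# Hypothetical threshold: a (query, url) pair is only released to the ranker
# once at least this many distinct users have produced it.
MIN_DISTINCT_USERS = 5


def aggregate_history(events, min_users=MIN_DISTINCT_USERS):
    """Aggregate (user_id, query, url) events into anonymous counts.

    Returns {(normalized_query, url): distinct_user_count}. User ids are
    dropped, and pairs seen by fewer than `min_users` distinct users are
    suppressed entirely.
    """
    users_per_pair = defaultdict(set)
    for user_id, query, url in events:
        # Normalize queries so trivial variations are counted together.
        users_per_pair[(query.strip().lower(), url)].add(user_id)
    return {
        pair: len(users)
        for pair, users in users_per_pair.items()
        if len(users) >= min_users
    }
```

Counting distinct users rather than raw events also keeps a single very active searcher from pushing a pair over the threshold alone.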
How do we get it?
Here at the Wikia Search project, the obvious choice is to let users supply relevance assessments with wiki technology. For example, users could manually submit relevance judgments through a wiki page with extended functionality.
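If users submit yes/no relevance judgments through such a wiki page, a single vote should not immediately decide a result's score. A minimal sketch, assuming judgments arrive as `(query, url, is_relevant)` tuples, is a Bayesian average that starts every result at a neutral prior; `smoothed_relevance`, `PRIOR_MEAN` and `PRIOR_WEIGHT` are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict

# Assumed prior: with no judgments a result is treated as neutral (0.5), and
# PRIOR_WEIGHT pseudo-judgments pull sparsely judged results toward that value.
PRIOR_MEAN = 0.5
PRIOR_WEIGHT = 3.0


def smoothed_relevance(judgments, prior_mean=PRIOR_MEAN, prior_weight=PRIOR_WEIGHT):
    """Aggregate (query, url, is_relevant) judgments from a hypothetical wiki form.

    Returns {(query, url): smoothed fraction of 'relevant' votes}.
    """
    relevant = defaultdict(int)
    total = defaultdict(int)
    for query, url, is_relevant in judgments:
        key = (query, url)
        total[key] += 1
        relevant[key] += int(is_relevant)
    return {
        key: (relevant[key] + prior_mean * prior_weight) / (total[key] + prior_weight)
        for key in total
    }
```

With these defaults, one positive vote gives (1 + 1.5) / (1 + 3) = 0.625, so a result needs several agreeing judgments before its score moves far from 0.5.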
The Atlas protocol should help the user supply relevance information through the Broker, and let the Collector discern results based on information made available in the Factory.
What will we supply?
What has already been done?
Several successful search techniques and frameworks are readily available. In the Wikia Search project, we set out to build on and improve these existing technologies.
- Lucene, Nutch (search engine technologies), Hadoop (distributed processing)
- MySQL, PostgreSQL (database systems)
Challenges
Authority detection
When allowing immediate feedback during searches, we need a fast and simple way to establish the authority, or likely correctness, of the feedback given. That is, at any point in time the search engine should be able to evaluate how useful a given piece of feedback is. The algorithm used for this should respond to changes in user behaviour.
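One way to meet both requirements (fast evaluation and sensitivity to changing behaviour) is to keep an exponential moving average per user. The sketch below assumes that "correctness" is measured by whether a user's feedback later agrees with the community consensus, which the text above does not specify; `Authority`, `weighted_vote` and the smoothing factor `ALPHA = 0.1` are illustrative assumptions, not a settled design.

```python
from dataclasses import dataclass

# Assumed smoothing factor: each new observation contributes 10% of the score,
# so older evidence fades geometrically and the score tracks behaviour changes.
ALPHA = 0.1


@dataclass
class Authority:
    """Hypothetical per-user authority score in [0, 1]."""
    score: float = 0.5  # neutral starting authority for an unknown user

    def update(self, agreed_with_consensus: bool, alpha: float = ALPHA) -> float:
        """Exponential moving average of how often this user's feedback
        agrees with the eventual community consensus."""
        outcome = 1.0 if agreed_with_consensus else 0.0
        self.score = (1.0 - alpha) * self.score + alpha * outcome
        return self.score


def weighted_vote(votes):
    """votes: iterable of (authority_score, is_relevant) pairs.

    Returns the authority-weighted fraction of 'relevant' votes,
    or None when there is no weight to decide with.
    """
    votes = list(votes)
    total = sum(weight for weight, _ in votes)
    if total == 0:
        return None
    return sum(weight for weight, relevant in votes if relevant) / total
```

Each update is O(1), and because older observations are discounted by a factor of 0.9 per new observation, a user whose feedback quality drops loses most of their authority within a few dozen judgments.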
Existing protocols
- http://www.loc.gov/standards/sru/specs/search-retrieve.html (existing Search/Retrieve protocol, SRU)
- http://www.sitemaps.org/ (existing format for describing a site's pages to crawlers)
- http://www.opensearch.org/Home (existing protocol for sharing search results)