Forum:Cleaning the web
Today’s web consists mostly of duplicate information. Original content has barely doubled since 1997, while the number of sites has grown a dozen times over. The crowd just resells free information with ads. That is why search results are trashed. If we imagine the Internet as a big hard drive, then it is long overdue for a cleanup. The cleanup would include a crap identifier (antispy, anti-advertisement), a duplicate object finder, and a defragmenter (re-indexing of the structure). Rudimentary objects will not sustain and delete themselves automatically.
Would a search option for sorting results by date, clustering them (or showing them as a tree) by possible original source, and showing the percentage of ads be appropriate?
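To make the idea concrete, here is a rough sketch of clustering by possible original source. It is only an illustration under stated assumptions: the `Page` record, its `first_seen` and `ad_ratio` fields, the three-word shingles, and the 0.5 similarity threshold are all made up for this example and do not describe how any existing search engine works. Pages whose texts overlap strongly are grouped together, and the earliest page in each group is treated as the probable original.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    text: str
    first_seen: date  # hypothetical: date the crawler first saw the page
    ad_ratio: float   # hypothetical: share of the page taken up by ads, 0.0 to 1.0


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_source(pages, threshold=0.5):
    """Group near-duplicate pages; the first page of each cluster is the earliest.

    Pages are visited oldest first, so a later copy is attached to the
    cluster whose earliest page it resembles (assumed threshold: 0.5).
    """
    clusters = []
    for page in sorted(pages, key=lambda p: p.first_seen):
        sig = shingles(page.text)
        for cluster in clusters:
            if jaccard(sig, shingles(cluster[0].text)) >= threshold:
                cluster.append(page)
                break
        else:
            clusters.append([page])
    return clusters
```

Each cluster could then be shown as a small tree with the earliest page as the root and the copies beneath it, each with its `ad_ratio` next to it. Comparing every page against every cluster is far too slow for the whole web, so a real system would need some form of indexing on top of this.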
Furthermore, none of the existing search engines has an option for sorting results by a parameter such as time, size, or metatags. Yes, it takes too much server resources, but it is the way. Metatag clouds are good. However, imagine your desktop search presenting the top 30 last-accessed files in random order with a cloud of parameters. Would you find a one-year-old file on your own computer that way? —The preceding unsigned comment was added by DimitriRU (talk • contribs).
- Yeah, I would find the file—even without desktop search—because I know the files on my HDD; at least because of the strict use of special folders. ;-)
- Anyway, there is no way to clean out all the crap; you would have to maintain a blacklist and a whitelist of all websites continuously. Even with the number of users that Google has, that would still be a Sisyphean task. I think we should maintain only a whitelist for this purpose, as Jimbo already suggested.
- Also, when you want to search by parameters like time, size, etc., keep in mind that the age or size of a website mostly doesn't make sense as a criterion. For images it does make sense, and therefore there are search engines that can handle it. Meta information is also partially used by the search engines. E.g., Google can restrict searches to movies, stocks, weather, author, location, etc. When a feature is missing, it's mostly because it doesn't make sense to gather this information automatically, either because of too much noise or because too few websites use the feature.
- See also: GoogleGuide: Advanced Operators Reference
- — MovGP0 10:54, 16 January 2007 (UTC)