GrubNG Protocol

Server functions

  • Check the robots.txt of each site to be crawled.
  • Prepare a workunit of URLs for the client (currently 250 URLs per workunit). A workunit cannot contain more than one link to the same page.
  • Check the .arc.gz files uploaded by the client: the number of links, their order, and the correctness of the ARC file.
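The first two server duties can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not GrubNG's server code: robots.txt text is passed in through a hypothetical `robots_txt_by_host` mapping instead of being downloaded, identical URL strings are treated as "the same page", and the function names are illustrative. Parsing uses the standard library's `urllib.robotparser`.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "GrubNG 20080128"  # crawler name given in this article
WORKUNIT_SIZE = 250  # current number of URLs per workunit


def allowed_by_robots(urls, robots_txt_by_host):
    """Keep the URLs whose host's robots.txt allows USER_AGENT to fetch them.

    robots_txt_by_host is a hypothetical {host: robots.txt text} mapping that
    stands in for downloading http://<host>/robots.txt; a missing host is
    treated as having an empty robots.txt (everything allowed).
    """
    parsers = {}
    allowed = []
    for url in urls:
        host = urlsplit(url).netloc
        if host not in parsers:
            parser = RobotFileParser()
            parser.parse(robots_txt_by_host.get(host, "").splitlines())
            parsers[host] = parser
        if parsers[host].can_fetch(USER_AGENT, url):
            allowed.append(url)
    return allowed


def make_workunits(urls, size=WORKUNIT_SIZE):
    """Split URLs into workunits of at most `size` entries.

    Duplicates are dropped across the whole batch first (keeping first-seen
    order), which is stricter than the per-workunit rule but guarantees that
    no workunit lists the same page twice.
    """
    unique = list(dict.fromkeys(urls))
    return [unique[i:i + size] for i in range(0, len(unique), size)]
```

A server loop would then call `make_workunits(allowed_by_robots(candidate_urls, robots))` and hand out one list per client request.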

More information about workunits can be found in the article Grub Workunit.
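The upload check from the server list can be sketched as below, assuming the client sends one gzip-compressed ARC v1 file in which every record is a header line (`URL IP-address archive-date content-type length`) followed by `length` bytes of content, with the `filedesc://` version record first. The function names and the exact rule (same URLs, same count, same order as the workunit) are assumptions drawn from the description above, not the actual GrubNG validator.

```python
import gzip


def read_arc_urls(arc_gz: bytes) -> list[str]:
    """Return the page URLs recorded in a gzipped ARC v1 file, in file order.

    Assumed layout: each record is a header line
    "URL IP-address archive-date content-type length" followed by `length`
    bytes of content; records may be separated by blank lines. The
    filedesc:// version record is parsed but not returned.
    """
    data = gzip.decompress(arc_gz)  # OSError/EOFError if not valid gzip
    urls = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 1] == b"\n":  # blank separator line
            pos += 1
            continue
        end = data.index(b"\n", pos)  # ValueError if the header is cut off
        fields = data[pos:end].decode("latin-1").split()
        if len(fields) != 5:
            raise ValueError(f"malformed ARC header: {fields!r}")
        url, length = fields[0], int(fields[4])
        if length < 0 or end + 1 + length > len(data):
            raise ValueError(f"bad length in record for {url}")
        pos = end + 1 + length
        if not url.startswith("filedesc://"):
            urls.append(url)
    return urls


def check_upload(arc_gz: bytes, workunit: list[str]) -> bool:
    """Hypothetical server check: the upload must hold exactly the workunit's
    URLs, in the same order, inside a well-formed ARC file."""
    try:
        return read_arc_urls(arc_gz) == workunit
    except (OSError, EOFError, ValueError):
        return False
```

Comparing the whole URL list at once covers both the "amount" and the "order" checks, and any decompression or framing error is treated as an incorrect ARC file.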


Client functions

  • Download a prepared workunit from the server.
  • Crawl the given URLs and create an .arc file. The client does not follow any links on crawled pages and does not follow redirects.
  • After the crawl, compress the .arc file and upload it to the server.
  • There are a few differences between the available clients: the C client requires workunits to be downloaded from the server manually (for example with wget), while the C# client can run several crawlers simultaneously.
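Writing and compressing the .arc file can be sketched like this. It assumes the Internet Archive ARC v1 layout (a `filedesc://` version record, then one header line plus content per page) and compresses the whole file as a single gzip stream; many ARC tools instead compress each record as its own gzip member, and this article does not say which GrubNG clients do. The organization field `GrubNG`, the `0.0.0.0` address on the version record, the default file name and the function names are hypothetical, and the pages are passed in already fetched.

```python
import gzip
import time

# ARC v1 version block; "GrubNG" as the organization field is an assumption.
VERSION_BLOCK = b"1 0 GrubNG\nURL IP-address Archive-date Content-type Archive-length\n"


def arc_record(url: str, ip: str, content_type: str, body: bytes, when: float) -> bytes:
    """One ARC v1 record: header line, raw content, newline separator."""
    date = time.strftime("%Y%m%d%H%M%S", time.gmtime(when))  # archive-date, UTC
    header = f"{url} {ip} {date} {content_type} {len(body)}\n".encode("latin-1")
    return header + body + b"\n"


def build_arc_gz(pages, filename="workunit.arc", when=None) -> bytes:
    """Build a gzip-compressed ARC file from already fetched pages.

    pages: iterable of (url, ip, content_type, body) tuples in workunit order;
    content_type must have no parameters or spaces (e.g. "text/html").
    filename is a hypothetical default.
    """
    when = time.time() if when is None else when
    records = [arc_record(f"filedesc://{filename}", "0.0.0.0", "text/plain", VERSION_BLOCK, when)]
    for url, ip, content_type, body in pages:
        records.append(arc_record(url, ip, content_type, body, when))
    return gzip.compress(b"".join(records))
```

Keeping the records in workunit order matters because, per the server functions above, the server checks the order of the links in the uploaded file.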

Information about the available clients can be found in the article Grub Clients.


Additional information

Current User-Agent name of the crawler: GrubNG 20080128
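A single fetch that sends this User-Agent and, as the client functions require, does not follow redirects could be written with Python's `urllib.request` as below. The `NoRedirect` handler and the `make_request`/`fetch` helpers are hypothetical names. Returning `None` from `redirect_request` makes urllib raise `HTTPError` for the 3xx response, which the sketch returns unchanged instead of following.

```python
import urllib.error
import urllib.request

USER_AGENT = "GrubNG 20080128"  # User-Agent name given in this article


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline every redirect; urllib then raises HTTPError for the 3xx response."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


# build_opener() replaces the default redirect handler with this subclass.
OPENER = urllib.request.build_opener(NoRedirect)


def make_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies the crawler by its User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, timeout: float = 30.0) -> tuple[int, str, bytes]:
    """Fetch one workunit URL; a redirect is recorded as-is, never followed."""
    try:
        with OPENER.open(make_request(url), timeout=timeout) as resp:
            return resp.status, resp.headers.get_content_type(), resp.read()
    except urllib.error.HTTPError as err:
        # 3xx (and 4xx/5xx) responses arrive here; keep status, type and body.
        return err.code, err.headers.get_content_type(), err.read()
```

Because only the given URL is requested and the page body is not parsed, the sketch also never follows links found on crawled pages.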

Retrieved from "http://search.wikia.com/wiki/GrubNG_Protocol"

This page was last modified 14:25, 29 March 2008. GFDL