GrubNG Protocol
Server functions
- Check robots.txt on the sites to be crawled.
- Prepare a workunit of URLs for the client (currently 250 URLs per workunit). A workunit cannot contain more than one link to the same page.
- Check the arc.gz files uploaded by the client: the number of links, their order, and the correctness of the ARC file.
More information about workunits can be found in the article Grub Workunit.
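
The workunit rules above can be illustrated with a short Python sketch. It is not the Grub server code: the function names `build_workunits` and `check_upload` are hypothetical, "same page" is assumed to mean an identical URL string, and duplicates are removed across the whole input, which is stricter than the per-workunit rule. Validation of the ARC file contents is not covered.

```python
WORKUNIT_SIZE = 250  # URLs per workunit, the current value given above


def build_workunits(urls, size=WORKUNIT_SIZE):
    """Split URLs into workunits of at most `size` URLs, with no repeats.

    Assumption: two links point to the same page only if the URL
    strings are identical.
    """
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    # Slice the de-duplicated list into consecutive chunks.
    return [unique[i:i + size] for i in range(0, len(unique), size)]


def check_upload(workunit, uploaded_urls):
    """Return a list of problems found in the URLs reported by a client.

    Only the number and order of links are checked here.
    """
    problems = []
    if len(uploaded_urls) != len(workunit):
        problems.append("wrong number of links")
    elif list(uploaded_urls) != list(workunit):
        problems.append("links differ or are out of order")
    return problems
```

An empty list returned by `check_upload` means that the upload matches the workunit in both count and order.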
Client functions
- Download a prepared workunit from the server.
- Crawl the given URLs and create an .arc file. The client does not follow links on crawled pages and does not follow redirects.
- After the crawl, compress the .arc file and send it to the server.
- There are a few differences between the available clients. The C client requires workunits to be downloaded from the server manually (for example with wget). The C# client can run several crawlers simultaneously.
Information about the available clients can be found in the article Grub Clients.
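
Two of the client steps, fetching a URL without following redirects and compressing the result, can be sketched in Python as follows. This is an illustration only, not code from any Grub client: `NoRedirect`, `fetch`, and `compress_file` are hypothetical names, the 30-second timeout is an arbitrary choice, and writing the ARC records and uploading the file are left out.

```python
import gzip
import shutil
import urllib.error
import urllib.request

USER_AGENT = "GrubNG 20080128"  # crawler User-Agent named in this article


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def fetch(url, timeout=30):
    """Fetch one URL without following redirects; return (status, body)."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # Unfollowed redirects and 4xx/5xx responses are reported here.
        return err.code, err.read()


def compress_file(path):
    """Gzip `path` into `path + '.gz'` and return the new file name."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

With this handler a redirect is recorded as its own 3xx response, so the target of the redirect is never requested.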
Additional information
Current User-Agent name of the crawler: GrubNG 20080128
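
As an illustration of how this User-Agent name relates to the robots.txt check listed under the server functions, the following Python sketch evaluates robots.txt rules with the standard `urllib.robotparser` module. The sample rules, the example.com URLs, and the helper names `parse_robots` and `may_crawl` are assumptions made for the example; the source does not describe how the Grub server actually evaluates robots.txt.

```python
import urllib.robotparser

USER_AGENT = "GrubNG 20080128"

# Hypothetical robots.txt content, used only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: GrubNG
Disallow: /private/

User-agent: *
Disallow:
"""


def parse_robots(text):
    """Parse robots.txt text that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_crawl(parser, url, user_agent=USER_AGENT):
    """Return True if the rules allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


rules = parse_robots(SAMPLE_ROBOTS_TXT)
allowed = may_crawl(rules, "http://example.com/index.html")
blocked = may_crawl(rules, "http://example.com/private/page.html")
```

A URL for which `may_crawl` returns False would be left out of the workunit.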
