Hactar Go Bot

Hactar Go Bot is a crawler that indexes SGF files on the internet. Crawling is done incrementally and continuously. Recent games typically appear in search results within a few weeks, provided the hosting site is itself updated frequently.

The bot uses both hard-coded rules and multiple heuristics to determine the types of SGF files. Sometimes these types are identified incorrectly. In such cases, feedback would be appreciated.
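
As an illustration of how hard-coded rules and heuristics can be combined, here is a minimal sketch. It is not the bot's actual classifier: the category names ("game", "problem", "not-go", "unknown"), the 30-move threshold, and the function name classify_sgf are all assumptions made for this example. Only the SGF property names (GM, AB, AW, PB, PW, B, W) come from the SGF format itself.

```python
import re

# Matches a property identifier with its first value, e.g. GM[1] or PB[Some Name].
# Escaped characters such as "\]" inside a value are skipped over.
PROP = re.compile(r"([A-Z]{1,2})\[((?:\\.|[^\]\\])*)\]")

MIN_GAME_MOVES = 30  # hypothetical threshold, chosen only for this sketch


def classify_sgf(text):
    """Return a rough, hypothetical type label for the SGF text."""
    props = {}
    moves = 0
    for ident, value in PROP.findall(text):
        if ident in ("B", "W"):
            moves += 1  # B[..] and W[..] are move properties
        else:
            props.setdefault(ident, value)

    # Hard-coded rule: GM[1] means Go; a missing GM is treated as Go here.
    if props.get("GM", "1") != "1":
        return "not-go"

    # Heuristic: setup stones, few moves and no player names suggest a problem.
    has_setup = "AB" in props or "AW" in props
    has_players = "PB" in props or "PW" in props
    if has_setup and not has_players and moves < MIN_GAME_MOVES:
        return "problem"

    # Heuristic: a long move sequence suggests a game record.
    if moves >= MIN_GAME_MOVES:
        return "game"
    return "unknown"
```

A classifier of this kind will mislabel some files, for example a handicap game that was abandoned early, which is why reports of wrong types are useful.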

Implementation

Crawling is implemented using the Apache Nutch crawler, configured to be very polite to all web sites. The bot strictly respects robots.txt. Re-crawl frequency is adjusted adaptively, depending on how often a site is updated.
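
The two behaviours described above can be sketched in a few lines of Python. This is only an illustration and not Nutch's code or configuration: the user-agent string "HactarGoBot", the function names, and all interval bounds and factors are assumptions chosen for the example. The robots.txt check uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "HactarGoBot"  # hypothetical user-agent string


def allowed(robots_txt, url, agent=USER_AGENT):
    """Return True if the given robots.txt content permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def next_interval(current, changed,
                  min_s=3600.0, max_s=30 * 86400.0,
                  dec=0.5, inc=1.5):
    """Adapt the re-crawl interval (in seconds) to how often a page changes.

    The interval shrinks when the last fetch found changes and grows when
    it did not, and is clamped to [min_s, max_s]. All default values are
    assumptions for this sketch.
    """
    factor = dec if changed else inc
    return max(min_s, min(max_s, current * factor))
```

For example, with a robots.txt that contains "Disallow: /private/" for all agents, allowed() returns False for any URL under /private/. Starting from a one-day interval, next_interval(86400, True) halves the wait, while next_interval(86400, False) lengthens it by half.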

Indexing and search are custom Hactar implementations.

Feedback

Any feedback on crawling or on the content found is welcome!