A buddy of mine pointed me to a white paper by Zoltan Gyongyi, Hector Garcia-Molina, & Jan Pedersen about a concept called TrustRank (PDF). Human editors help search engines combat search engine spam, but reviewing all content by hand is impractical.
TrustRank places a core vote of trust on a seed set of hand-reviewed sites to help search engines tell pages that would be considered useful apart from pages that would be considered spam. That trust is passed on to other sites through links from the seed sites, weakening along the way.
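The propagation step can be sketched as a PageRank-style iteration whose random jumps land only on the seed set. This is a minimal sketch, assuming the link graph is a Python dict mapping each page to the list of pages it links to (every link target must also be a key) and that the seed set is non-empty; the function name `trustrank` and the defaults of 0.85 damping and 20 iterations are illustrative choices, not code from the paper.

```python
def trustrank(graph, seeds, alpha=0.85, iterations=20):
    """Propagate trust outward from a hand-reviewed seed set.

    graph: dict mapping each page to the list of pages it links to.
           Every link target must also appear as a key.
    seeds: non-empty set of pages that human editors judged to be good.
    """
    pages = list(graph)
    # Static distribution: all initial trust sits on the good seeds.
    d = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(d)
    for _ in range(iterations):
        # Each round, a (1 - alpha) share of trust is re-injected at the seeds...
        nxt = {p: (1 - alpha) * d[p] for p in pages}
        for page, outlinks in graph.items():
            if not outlinks:
                continue  # pages with no outlinks pass nothing on
            # ...and every page splits alpha * its trust evenly over its outlinks.
            share = alpha * trust[page] / len(outlinks)
            for target in outlinks:
                nxt[target] += share
        trust = nxt
    return trust


if __name__ == "__main__":
    web = {
        "seed": ["a", "b"],
        "a": ["c"],
        "b": [],
        "c": [],
        "spam": ["seed"],  # a bad page linking to a good one
    }
    scores = trustrank(web, seeds={"seed"})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In that toy graph the `spam` page links to the seed yet ends with a trust score of 0.0, because trust only flows forward along links out of trusted pages, and scores shrink with each hop away from the seed.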
TrustRank can be used to:
* automatically boost pages that have a high probability of being good, and demote pages that have a high probability of being bad.
* help search engines identify which pages are good candidates for quality review.
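Once trust scores exist, both uses can be sketched as simple thresholding. This is a hypothetical illustration: the function name, the threshold values, the bucket names, and the choice to route middling scores to editors are assumptions made for the example, not rules from the paper.

```python
def triage(trust, good_threshold, bad_threshold):
    """Bucket pages by trust score (thresholds are hypothetical tuning knobs).

    Returns page lists for boosting, demoting, and manual review,
    each ordered from highest to lowest trust.
    """
    buckets = {"boost": [], "demote": [], "review": []}
    for page, score in sorted(trust.items(), key=lambda kv: kv[1], reverse=True):
        if score >= good_threshold:
            buckets["boost"].append(page)    # likely good: rank higher
        elif score <= bad_threshold:
            buckets["demote"].append(page)   # likely bad: rank lower
        else:
            buckets["review"].append(page)   # unclear: send to editors
    return buckets
```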
Some common ideas that TrustRank is based upon:
* Good pages rarely link to bad ones. Bad pages often link to good ones in an attempt to improve hub scores.
* The care with which people add links to a page is often inversely proportional to the number of links on the page.
* Trust score is attenuated as it passes from site to site.
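The last two ideas can be made concrete by following trust down a single chain of links. This minimal sketch assumes each hop multiplies trust by a damping factor `beta` (0.85 here, an illustrative value) and divides it evenly among the current page's outlinks; the function name and parameters are hypothetical.

```python
def trust_along_path(seed_trust, outlink_counts, beta=0.85):
    """Trust that reaches the end of one link chain starting at a seed.

    outlink_counts[i] is the number of outlinks on the i-th page of the chain;
    the chain follows one of those outlinks at each step.
    """
    trust = seed_trust
    for n in outlink_counts:
        # Dampening: lose a fixed fraction of trust per hop.
        # Splitting: a page with many links vouches less for each one.
        trust = beta * trust / n
    return trust
```

With a seed trust of 1.0, a link from a page with two outlinks passes on roughly 0.425, while a link buried among a hundred outlinks passes on roughly 0.0085.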
To select seed sites, they looked for sites that link to many other sites. DMOZ clones and other similar sites turned up as many useless seed candidates.
Sites not listed in any of the major directories were removed from the seed set. Of the remaining sites, only those backed by government, educational, or corporate bodies were accepted as seed sites.
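One way to rank sites by how much they link out is to compute PageRank on the reversed link graph (inverse PageRank) and then filter the top of that list. This is a minimal sketch under assumptions: the dict-of-lists graph format, the `limit` cutoff, and the two predicates `in_major_directory` and `trusted_owner` are hypothetical stand-ins for the directory check and the ownership review described above.

```python
def inverse_pagerank(graph, alpha=0.85, iterations=20):
    """PageRank on the reversed link graph: pages that link out a lot score high.

    graph: dict mapping each page to the list of pages it links to.
    """
    pages = list(graph)
    reversed_links = {p: [] for p in pages}
    for page, outlinks in graph.items():
        for target in outlinks:
            reversed_links[target].append(page)  # flip every link
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        nxt = {p: (1 - alpha) / n for p in pages}
        for page, links in reversed_links.items():
            if links:
                share = alpha * score[page] / len(links)
                for target in links:
                    nxt[target] += share
        score = nxt
    return score


def pick_seeds(graph, in_major_directory, trusted_owner, limit):
    """Take the top `limit` inverse-PageRank candidates, then keep only
    those that pass both (hypothetical) review predicates."""
    scores = inverse_pagerank(graph)
    candidates = sorted(graph, key=scores.get, reverse=True)[:limit]
    return [s for s in candidates if in_major_directory(s) and trusted_owner(s)]
```

Inverse PageRank alone cannot tell a genuine hub from a DMOZ clone that copies its links, since both score the same; the filtering step is what removes the clone.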
When deciding which sites to review, the main priority is identifying high-PageRank spam sites, both because they are more likely to show up in search results and because closely monitoring the long tail would be too expensive.
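That prioritization can be sketched as a fixed review budget spent on the highest-PageRank pages not yet checked. This is a hypothetical sketch: the function name and the `budget` and `reviewed` parameters are illustrative, and the PageRank scores are assumed to be precomputed and passed in as a dict.

```python
import heapq


def review_queue(pagerank, budget, reviewed=frozenset()):
    """Return up to `budget` unreviewed pages, highest PageRank first.

    Low-PageRank pages in the tail are simply never reached, since
    monitoring all of them closely would cost too much.
    """
    unreviewed = (p for p in pagerank if p not in reviewed)
    return heapq.nlargest(budget, unreviewed, key=pagerank.get)
```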
TrustRank can be bolted onto PageRank to significantly improve search relevancy.
Aaron Wall is the author of SEO Book.