Everything About Google News Technology

Amit Agarwal put together a great overview of Google News Technology on his Digital Inspiration blog. Here are some snippets:

The first internal Google News demo appeared in Dec 2001. It used only 100 sources. Today, the Google News service scans 4,500 different websites in real time, determines which news stories are related and then groups them based on importance. There aren’t any journalists working on the service; Google News is managed entirely by computer programs. Google News includes articles that have appeared within the past 30 days. Krishna Bharat, the Google Principal Scientist, is the brain behind Google News.

Amit also looks into the Google Paper:

Below are the important points from the Google paper on Combating Web Spam with TrustRank.

How is this (TrustRank) different from applying a weighting to PageRank?

It attempts to detect clusters of pages that have few inbound links, while also propagating “trust” scores to all other sites by using their linking structure. For sites that have many inbound links (high scoring in PageRank), the authors claim this modification tends to classify spam and reputable sites differently.
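
To make the idea of propagating trust through the link structure more concrete, here is a minimal sketch in Python. It is not the paper's exact algorithm, only a simplified illustration that assumes trust starts at a hand-picked seed set and flows along outbound links, much like a PageRank whose random jumps land only on seed pages. The function name trust_rank, the damping value of 0.85, the iteration count and the tiny example graph are all assumptions made for illustration.

```python
def trust_rank(links, seeds, damping=0.85, iterations=20):
    """Propagate trust from seed pages along outbound links.

    links: dict mapping each page to the list of pages it links to.
    seeds: set of hand-picked trusted pages (hypothetical input).
    Simplified sketch: trust held by pages with no outbound links
    is simply dropped rather than redistributed.
    """
    seeds = set(seeds)
    pages = set(links) | seeds
    for targets in links.values():
        pages.update(targets)

    # Seed pages share the initial trust equally; all others start at zero.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)

    for _ in range(iterations):
        # The "jump" part of each round goes only to the seed pages.
        nxt = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # A page splits its damped trust evenly among its outbound links.
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical graph: a university seed links toward a news site and a blog,
# while two spam pages link only to each other.
example_links = {
    "university": ["news"],
    "news": ["blog"],
    "blog": [],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = trust_rank(example_links, seeds={"university"})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph the two spam pages link to each other freely, but because no trusted page links to them, no trust ever reaches them. That is the intuition behind treating spam and reputable sites differently even when both have inbound links.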

Will the owners of the pages / sites deemed to fall within the set of trusted seed sites get any money for all their hard work (i.e. hand-maintaining pages of links)?

No. However, they will get better search engine visibility, which is quite valuable.

What if such an owner decides to link to a page of commercial or spam links – will they get any money from the owner of the linked site?

The paper suggests using only highly reputable organizations with long-term stability for the seed pages, such as government organizations, universities and very well-known companies.

