
Microsoft Research Takes On Google’s PageRank

Arnold Zafra

07/25/08

7 Comments

“The more visits of the page made by the users and the longer time periods spent by the users on the page, the more likely the page is important. We can leverage hundreds of millions of users’ implicit voting on page importance.” So claim Microsoft researchers, working with academic collaborators in Asia, in a research paper titled “BrowseRank: Letting Web Users Vote for Page Importance.”
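To make the idea concrete, here is a simplified Python sketch of BrowseRank-style scoring. It assumes a hypothetical browsing log made of sessions, each a list of (page, seconds_spent) pairs. In the spirit of the paper, which models browsing as a continuous-time Markov process, it estimates how often users arrive at each page from observed page-to-page clicks and then weights that by the page’s mean time spent. The random-jump damping factor is borrowed from PageRank purely to guarantee convergence; it is an assumption here, not the paper’s exact model.

```python
from collections import defaultdict


def browse_rank(sessions, damping=0.85, iterations=100):
    """Simplified BrowseRank-style scores from browsing sessions.

    sessions: list of sessions; each session is a list of
    (page, seconds_spent) tuples in visit order (an assumed log format).
    """
    transitions = defaultdict(lambda: defaultdict(int))
    total_time = defaultdict(float)
    visits = defaultdict(int)

    for session in sessions:
        for i, (page, seconds) in enumerate(session):
            visits[page] += 1
            total_time[page] += seconds
            if i + 1 < len(session):
                # Count the observed click from this page to the next one.
                transitions[page][session[i + 1][0]] += 1

    pages = sorted(visits)
    n = len(pages)

    # Step 1: stationary distribution of the page-to-page click chain.
    # The random jump (1 - damping) is a PageRank-style assumption that
    # guarantees convergence; it is not taken from the paper.
    pi = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_pi = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = transitions.get(p, {})
            total = sum(out.values())
            if total == 0:
                # Page where sessions ended: spread its mass over all pages.
                for q in pages:
                    new_pi[q] += damping * pi[p] / n
            else:
                for q, count in out.items():
                    new_pi[q] += damping * pi[p] * count / total
        pi = new_pi

    # Step 2: weight each page by its mean time spent, then renormalize.
    weighted = {p: pi[p] * total_time[p] / visits[p] for p in pages}
    norm = sum(weighted.values())
    return {p: w / norm for p, w in weighted.items()}


log = [
    [("home", 5), ("article", 300)],
    [("home", 5), ("ad", 2)],
    [("home", 4), ("article", 240)],
]
scores = browse_rank(log)
print(max(scores, key=scores.get))  # prints: article
```

In this toy log the article wins because users both reach it more often than the ad and stay on it far longer than on the home page.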

The paper points out several problems with PageRank as it proposes a better way of valuing websites, which it calls BrowseRank. Chief among its criticisms is that PageRank can be gamed through link farms: networks of sites created or paid to link to a particular website in order to raise its PageRank and, with it, its search engine ranking. The paper also notes that PageRank doesn’t consider how much time users actually spend on a particular site.
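The link-farm weakness is easy to see with a textbook PageRank power iteration. The Python sketch below uses a made-up four-page graph and ten hypothetical farm pages: it computes PageRank with the customary 0.85 damping factor, then adds pages whose only purpose is to link to one target. The target’s score rises sharply even though nothing about its content has changed.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


# A small honest web: "target" links out, but nobody links to it.
web = {"a": ["b"], "b": ["c"], "c": ["a"], "target": ["a"]}

# The same web plus a hypothetical link farm of ten pages pointing at "target".
farmed = dict(web)
for i in range(10):
    farmed[f"farm{i}"] = ["target"]

before = pagerank(web)["target"]
after = pagerank(farmed)["target"]
print(f"target before farm: {before:.4f}, after farm: {after:.4f}")
# prints: target before farm: 0.0375, after farm: 0.1018
```

Nothing in this calculation looks at what users do on the target page, which is exactly the gap BrowseRank tries to fill.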

The report then argues that:

“Experimental results show that BrowseRank can achieve better performance than existing methods, including PageRank…in important page finding, spam page fighting, and relevance ranking.”

The research is still ongoing, and the researchers hope to elaborate further on BrowseRank as a method of measuring a website’s importance. But what the paper failed to mention is the fact that Google’s PageRank is not only dependent on links to websites but also on some other signals which its algorithm used to determine the position of websites on search engine results page.

For all its flaws and shortcomings, Google’s PageRank is still an industry-accepted yardstick of a website’s importance and value, and it would take more than user-behavior analysis to disprove its importance and accuracy. In fact, we’re already anticipating another round of Google PageRank updates in the coming days.

7 Comments

  • gfigg says:

    Very interesting, although this has been tried before. DirectHit had a search engine built entirely on clickstream data (acquired by Ask.com in 2000). They got the data from ISPs in those days. The end result is really not that much better than PageRank.

    We at Me.dium, on the other hand (http://me.dium.com/search), are processing our users’ clickstream data in real time to create a different lens based on what’s going on now. For example, search for John Edwards on Google or Live and you get johnedwards.com and wiki/johnedwards. Do the same search on Me.dium and you learn that today people care about his love child, pictures of his mistress, etc.

    The difference is real-time (what people are browsing now) vs. historical (what they browsed in the past). Social vs. Old School. Check it out and let us know your thoughts. http://me.dium.com/search.

  • wolf says:

    The FAROO P2P Search Engine has been doing something very similar for some time already.
    http://www.faroo.com/english/technology/architecture.html

    FAROO’s “If users spend a long time on a page, visit it often, put it to bookmarks or print it out, this page goes up in ranking.”
    http://altsearchengines.com/2007/10/02/great-debate-peer-to-peer-p2p-search-part-i/
    sounds very familiar to Microsoft’s
    “The more visits of the page made by the users and the longer time periods spent by the users on the page, the more likely the page is important.”
    http://research.microsoft.com/users/tyliu/files/fp032-Liu.pdf
    doesn’t it?

    A very significant difference, though, is that FAROO preserves the user’s privacy because it calculates PeerRank in a decentralized manner, while Microsoft would collect every user’s clickstream on a central server.

    It’s great to see that the Microsoft research paper confirms that attention-based ranking is able to outperform PageRank both for relevancy and for spam suppression.

  • Oded says:

    There is no doubt that PR is easily manipulated. However, in today’s SEO savvy world, I believe people are much more aware of what PR really is.

    Basically, PR is just an indicator of the amount/quality of inbound links, and it is only one indicator out of many. PR by itself is meaningless. I mean, you cannot compare a PR6 site and a PR3 site and claim “ah, the PR6 site is much better.” Get a lot of high-PR links and you’ll get high PR… but that’s about it. It doesn’t indicate any ranking in the SERP. You can have a PR6 homepage with no ranking while a PR3 homepage ranks. Plus, considering live.com’s performance… well, I wouldn’t consider Microsoft much of an authority on search technologies :)

  • Liz says:

    I agree with the comments made by Oded.
    There are a number of factors involved.

  • paul says:

    But how can they distinguish meaningful from meaningless time spent on a page? Even if the user leaves their seat, the browser will keep counting the time…

  • Frank McCown says:

    “Google’s PageRank is not only dependent on links to websites but also on some other signals which its algorithm used to determine the position of websites on search engine results page.”

    I think you might be misunderstanding what PageRank is. The Microsoft researchers are talking about the PageRank algorithm that was introduced by Page and Brin in a research paper. PageRank is *solely* reliant on incoming and outgoing links. Google uses PageRank as only one factor when determining which pages are relevant to a query. When two pages that are, for all practical purposes, equal in content both match a query, the page with the higher PageRank will appear before the other in the SERP.

    MS would not use just their BrowseRank in a ranking algorithm… they would also use a PageRank-like measure that accounts for the web graph.

  • Martin says:

    @Paul: I don’t think it really matters whether the time some users spend on a page is “quality time”.
    Because the result is an average, it should only be influenced marginally.
    However, there should be a limit on the score a page can get from time spent, since the average time spent on certain kinds of pages is very different from that on other kinds (for example, a lexicon with short explanations versus a university site with linked diploma theses). That, by the way, would address both problems.
    It is kind of interesting, though, that the concept is pretty much the opposite of the TF-IDF concept.
    It is of course a good thing to optimize the static ranking methods, but I think there is a lot more to achieve on the dynamic side.
    I would look for better search options, e.g. a topic selection (or search), then performing the actual search and ranking the results against those criteria. I am not talking about semantic search; that is probably a long way from working for arbitrary topics/criteria. Just some heuristics using what we already have.
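Martin’s suggestion, capping how much a page can gain from time spent and judging dwell time only against pages of the same kind, also speaks to Paul’s idle-browser question. A minimal Python sketch of that idea follows; the category labels, the 600-second idle cap, and the 3x ceiling are all hypothetical choices for illustration and are not part of BrowseRank.

```python
from collections import defaultdict
from statistics import median

# Hypothetical parameters; neither appears in the BrowseRank paper.
IDLE_CAP_SECONDS = 600  # longer visits are treated as an idle browser
MAX_RATIO = 3.0         # ceiling on how far a visit can exceed its category norm


def normalized_dwell(visits, cap=IDLE_CAP_SECONDS, max_ratio=MAX_RATIO):
    """Mean dwell-time score per page, relative to the page's category.

    visits: list of (page, category, seconds) tuples (an assumed format).
    """
    capped = [(page, cat, min(seconds, cap)) for page, cat, seconds in visits]

    # Typical (median) capped dwell time for each kind of page;
    # floor it at 1 second to avoid dividing by zero.
    by_category = defaultdict(list)
    for _, cat, seconds in capped:
        by_category[cat].append(seconds)
    typical = {cat: max(median(times), 1.0) for cat, times in by_category.items()}

    totals = defaultdict(float)
    counts = defaultdict(int)
    for page, cat, seconds in capped:
        # Compare each visit with its own category, then cap the ratio.
        totals[page] += min(seconds / typical[cat], max_ratio)
        counts[page] += 1
    return {page: totals[page] / counts[page] for page in totals}


visits = [
    ("glossary/a", "lexicon", 20),
    ("glossary/b", "lexicon", 40),
    ("thesis/x", "university", 1200),  # probably an idle tab; capped to 600
    ("thesis/y", "university", 600),
]
scores = normalized_dwell(visits)
```

With these inputs, the 40-second glossary visit scores higher than the thesis page left open for 20 minutes: the long visit is first capped as idle time and then compared only with other university pages.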
