MSN Search Updates Results Based on RankNet
Besides the news yesterday that MSN Local has launched, the people at MSN Web Search slipped in an update to their search results with an algorithm based on what MSN calls RankNet. The search results seem more relevant to the query, and MSN feels that RankNet “has improved [their] relevance and most importantly gives [them] a platform they can move forward on.” The new ranking technology is based on a neural net, which Microsoft discussed in a research paper led by Chris Burges titled Learning to Rank using Gradient Descent.
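For readers curious what "learning to rank using gradient descent" can look like in practice, here is a minimal sketch of the pairwise idea the paper's title points to: the model is shown pairs of results where one should rank above the other, and its weights are nudged until the preferred result scores higher. This is only an illustration under assumptions. The two-feature linear scorer, the toy data, and the names `score`, `pairwise_loss`, and `train` are all hypothetical (a real ranker would use a neural net and far more features), and nothing here reflects MSN's production system.

```python
import math

def score(weights, features):
    # Hypothetical linear scorer standing in for a neural net.
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, better, worse):
    # Cross-entropy on the score difference: -log(sigmoid(diff)).
    diff = score(weights, better) - score(weights, worse)
    return math.log1p(math.exp(-diff))

def train(pairs, n_features, lr=0.1, epochs=200):
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = score(weights, better) - score(weights, worse)
            # Derivative of the loss with respect to diff.
            grad_scale = -1.0 / (1.0 + math.exp(diff))
            for k in range(n_features):
                weights[k] -= lr * grad_scale * (better[k] - worse[k])
    return weights

# Toy (better, worse) pairs; the feature values are made up.
pairs = [
    ([1.0, 0.2], [0.3, 0.9]),
    ([0.8, 0.1], [0.2, 0.7]),
    ([0.9, 0.5], [0.4, 0.6]),
]
weights = train(pairs, n_features=2)
```

After training, `score(weights, better)` exceeds `score(weights, worse)` for each training pair, which is all the pairwise objective asks for.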
Apparently, using RankNet, MSN has over time judged which sites are quality sites and has slowly pushed those to the top of the rankings. In a diagram, MSN shows its results from early May, late May, and now June, and how they have changed to show more quality and authority sites in the top MSN results. Additionally, on the MSN Search Blog, MSN Search General Manager Ken Moss uses Japanese examples (in Romaji) to display the new results: “Aided by our new ranker, we were able to produce particularly big relevance gains in Japan. The queries tsutaya (A Multimedia Entertainment Shopping Company), win2000, and kat-tun (a music band) all showed noticeable improvement. The feedback we’ve received so far has been very positive so we think we are on the right track.”
Sagoii desu na dakedo doshite Nihongo no tatueba tsukaimasuka? (That’s great, but why use Japanese examples?)
Hmmm... why the Japanese examples, Ken? Well, it seems that some of the search technology patents recently filed by Microsoft came from Microsoft researchers in Japan. We’re looking into this further, and in our searches we have identified two other Microsoft patents dealing with RankNet and neural net technology.
The first patent identified is “Method for scanning, analyzing and handling various kinds of digital information content,” which mentions neural nets in the abstract:
Computer-implemented methods are described for, first, characterizing a specific category of information content–pornography, for example–and then accurately identifying instances of that category of content within a real-time media stream, such as a web page, e-mail or other digital dataset. This content-recognition technology enables a new class of highly scalable applications to manage such content, including filtering, classifying, prioritizing, tracking, etc. An illustrative application of the invention is a software product for use in conjunction with web-browser client software for screening access to web pages that contain pornography or other potentially harmful or offensive content. A target attribute set of regular expression, such as natural language words and/or phrases, is formed by statistical analysis of a number of samples of datasets characterized as “containing,” and another set of samples characterized as “not containing,” the selected category of information content. This list of expressions is refined by applying correlation analysis to the samples or “training data.” Neural-network feed-forward techniques are then applied, again using a substantial training dataset, for adaptively assigning relative weights to each of the expressions in the target attribute set, thereby forming a weighted list that is highly predictive of the information content category of interest.
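To make the abstract's last step a little more concrete, here is a rough sketch of how relative weights might be learned for a list of expressions from labeled samples. It assumes a single-layer feed-forward network (effectively logistic regression) trained by gradient descent, and it skips the statistical and correlation analysis the abstract describes for choosing the expressions. The expression list, the sample texts, the harmless "promotional" category, and the names `features`, `predict`, and `train` are all hypothetical and not taken from the patent.

```python
import math
import re

# Hypothetical target attribute set; a real one would be derived from data.
EXPRESSIONS = [r"\bfree\b", r"\bwinner\b", r"\bmeeting\b"]
PATTERNS = [re.compile(p, re.IGNORECASE) for p in EXPRESSIONS]

def features(text):
    # 1.0 if the expression occurs in the text, otherwise 0.0.
    return [1.0 if p.search(text) else 0.0 for p in PATTERNS]

def predict(weights, bias, text):
    # Single-layer feed-forward pass with a sigmoid output.
    z = bias + sum(w * x for w, x in zip(weights, features(text)))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, lr=0.5, epochs=300):
    weights = [0.0] * len(PATTERNS)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = features(text)
            error = predict(weights, bias, text) - label
            bias -= lr * error
            for k, xk in enumerate(x):
                weights[k] -= lr * error * xk
    return weights, bias

# Made-up training data: 1 = "containing" the category, 0 = "not containing".
samples = [
    ("You are a winner, claim your free prize", 1),
    ("Free entry for every winner", 1),
    ("Agenda for the project meeting", 0),
    ("Meeting notes are attached", 0),
]
weights, bias = train(samples)
```

The learned `weights` play the role of the weighted list: expressions seen in "containing" samples end up with positive weights, and those seen only in "not containing" samples end up negative.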
And Chris Burges, mentioned in the MSN Search Blog post and lead author of the Learning to Rank using Gradient Descent paper, was one of the co-authors of this patent application, which describes a neural network: “System and method for identifying content and managing information corresponding to objects in a signal.” The abstract:
An “interactive signal analyzer” provides a framework for sampling one or more signals, such as, for example, one or more channels across the entire FM radio spectrum in one or more geographic regions, to identify objects of interest within the signal content and associate attributes with that content. The interactive signal analyzer uses a signal fingerprint extraction algorithm, i.e., a “fingerprint engine,” for deriving traces from segments of one or more signals. These traces are referred to as “fingerprints” since they are used to uniquely identify the signal segments from which they are derived. These fingerprints are then used for comparison to a database of fingerprints of known objects of interest. Information describing the identified content and associated object attributes is then provided in an interactive user database for viewing and interacting with information resulting from the comparison of the fingerprints to the database.
Special thanks to Bill Slawski for pointing out some of these patents.