Surf Canyon deploys technology that helps disambiguate search queries, essentially assisting the user in digging deeper into results by suggesting links that better match what the user is looking for. The approach is unique in that it makes search a continuous process in which the machine assists the user through an ongoing feedback loop. The engine integrates well with Google, Yahoo! and MSN Live results.
Mark Cramer, CEO of Surf Canyon, shared his thoughts on the use of user feedback for better results and on how Surf Canyon works.
1. Can you provide us some information on how you got started on Surf Canyon and what technological concepts it is grounded in?
[Mark] The original concept was born out of some frustration I was having with search back in March 2006. In short, I read an article in a magazine on a plane and, when I got home, wanted to find it online. Unfortunately I couldn’t remember the title, the author or even which magazine it was. After spending hours plugging in key concepts from what I had read, I gave up trying to find it. It was a few weeks after that that it occurred to me that the search engine should have provided more assistance during the process. After all of my queries, reformulations and clicks, I felt that the search engine should have had a better idea of my intent and, as a result, reduced the burden. I thus developed the idea of “real-time implicit personalization”.
At the time, of course, I didn’t know what to call it, but the principle was to observe user behavior signals as the search is taking place in order to build a real-time model of the user’s intent. This model can then be used to calculate the “instantaneous” relevancies of documents in the result set, which are then exploited immediately, bringing forward pertinent information while pushing back that which is deemed less relevant. In essence, it’s an application that sits atop a search engine and works with the user as the search is being conducted.
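To make the idea concrete, here is a deliberately simplified sketch of implicit re-ranking, not Surf Canyon’s actual algorithm: each click folds the clicked result’s terms into an intent vector, and the remaining results are re-scored by cosine similarity against that vector. All class and function names here are hypothetical illustrations.

```python
from collections import Counter
import math

def tokenize(text):
    """Split text into lowercase terms (toy tokenizer)."""
    return [w.lower() for w in text.split()]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ImplicitReranker:
    """Toy model of real-time implicit personalization:
    clicked snippets update the intent vector, and the
    remaining results are re-ordered against it."""

    def __init__(self, query: str):
        # The query seeds the initial intent model.
        self.intent = Counter(tokenize(query))

    def record_click(self, snippet: str):
        # Implicit feedback: a click adds the result's terms to the model.
        self.intent.update(tokenize(snippet))

    def rerank(self, results):
        # Bring forward results most similar to the current intent model.
        return sorted(results,
                      key=lambda r: cosine(Counter(tokenize(r)), self.intent),
                      reverse=True)

# An ambiguous query is disambiguated by a single click:
rr = ImplicitReranker("jaguar")
rr.record_click("Jaguar cars: luxury sedan reviews and prices")
ranked = rr.rerank(["jaguar habitat in the rainforest",
                    "new jaguar sedan road test"])
# The automotive result now outranks the wildlife one.
```

The real system clearly combines far richer signals and calculations, as Mark describes below, but the loop — observe, update the intent model, re-score immediately — is the essence of the approach.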
Today we employ all sorts of technological concepts, from vector analysis to linear algebra, clustering and semantic matching. It’s the appropriate combination of all of these things, along with our proprietary technologies and algorithms, that makes the task difficult.
2. Surf Canyon takes the middle path to personalization, with logs maintained per session. If a major search engine does this, it gets construed as a privacy breach. Where do you draw the line?
[Mark] We don’t have access to any information that isn’t already available to the major search engines. When the user is logged in, Google’s personalization, as you know, maintains a record of their entire search and click history. What we’re doing is taking a sub-set of that information and using it differently for the user’s benefit. Ultimately, however, there’s no technical reason preventing us from pushing the entire application to the client, giving the user complete control over all personal information beyond the queries they submit. We believe that this would be very appealing from a privacy perspective, and we’ll be looking into this more as we move forward.
3. What is the next milestone for Surf Canyon? Would you be looking at verticals or a different approach to viewing data?
[Mark] Right now we’re focused on continuing to optimize the solution that’s been in development for the last two years. In the same way that PageRank, and its accompanying relevancy calculations, is a complex set of algorithms that are constantly being adjusted and perfected, the value of our technology is directly related to the relevance of the results we recommend. We’re tackling a very difficult problem, so in the near term we’ll be focused on improving what we’re doing already.
That being said, we’ll be looking to apply our solution to search products outside of Google, Yahoo! and MSN Live. This will most likely include vertical search engines.
4. Does Surf Canyon extend to audio and media-related content as well?
[Mark] In general the technique can be applied to any search that produces a ‘large’ list of potentially relevant results in response to a user input. Non-text content would need to be handled differently, but we’ll get there eventually.
5. Do you see this technique going more mainstream, with users getting comfortable with the fact that their input is critical to making results relevant?
[Mark] There’s no doubt in our minds. Put simply, it is becoming increasingly difficult, if not mathematically impossible, to access the tens of billions of items of content on the internet using two- and three-word queries. Furthermore, entering more keywords, in addition to adding difficulty and work for the user, can often have the unintended consequence of eliminating potentially relevant information. Precisely expressing intent, no matter how many keywords are used, is, for many reasons, an often impossible task. Our approach is a powerful and elegant solution that enables users to locate relevant information without the onus of having to explicitly express intent.
Thank you, Mark, for taking the time to answer these questions.