For those of you who like to follow social media sites such as Digg, an easy analysis tool may be of some use. Yahoo Pipes lets you quickly put together a set of tools to organize a web feed’s items. In this example, I’m going to sort the Digg homepage RSS feed by the submitter of each story.
To do that, we need to manipulate some of the content of the Digg feed using the Yahoo Pipes Regex (regular expressions) module. Everything else we need is already in the feed.
Regular expression patterns:
I’m not going to get into an elaborate discussion of regexes. Instead, I’ll just list what I’ve used in the screencast video. (If you’re familiar with regexes already, bear with me.)
- ^ – caret – match the beginning of a string.
- $ – dollar – match the end of a string.
- .* – dot star – match any sequence of characters.
- ^.*$ – match the entire string.
- (^.*$) – match the entire string and capture it as group 1, which the replacement refers to as $1.
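If you'd like to try these patterns outside of Pipes, here is a minimal sketch using Python's `re` module. It is only an approximation: the Pipes Regex module may differ from Python in its details, and where Pipes writes `$1`, Python reads the captured text back with `group(1)`. The sample title is the one used later in this post.

```python
import re

title = "Paris’ Sob Story"

# ^ anchors the match at the beginning of the string.
starts_with_paris = re.search(r"^Paris", title) is not None

# $ anchors the match at the end of the string.
ends_with_story = re.search(r"Story$", title) is not None

# .* matches any sequence of characters (newlines excluded by default in Python).
any_run = re.search(r".*", title).group(0)

# ^.*$ matches the entire (single-line) string.
whole = re.search(r"^.*$", title).group(0)

# Parentheses capture the match as group 1 (what Pipes calls $1).
captured = re.search(r"(^.*$)", title).group(1)

print(starts_with_paris, ends_with_story)
print(captured)
```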
Digg feed variables used:
The Digg home page RSS feed has a number of fields/variables that we can access in Yahoo Pipes. In this example, I’ve only used one:
Within Yahoo Pipes, to access it, we place braces (curly brackets) around it:
These are the steps I take in the video below.
- Grab the Digg home page feed.
- Insert the Digg username (of the story submitter) at the beginning of each item.title value, surrounded by square brackets.
- Do the same with the item.y:title field. (This is probably redundant, but it’s not a big deal.)
- Replace the item.description fields with nothing – i.e., an empty string. For our analysis, getting rid of the description reduces visual clutter in the results. It’s just easier to see only the title and submitter.
- Sort the resulting manipulated feed by the item.title.
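For readers who think better in code, the steps above can be roughly sketched in plain Python. This is not how Pipes works internally, just the same idea on a hand-made list. The dictionary keys (`title`, `description`, `submitter`) and the `ExampleUser` entries are made up for illustration; a real feed would have to be fetched and parsed first, and the y:title step is left out for brevity. A plain string format stands in for the Regex module here.

```python
# Hypothetical stand-ins for feed items; keys and the ExampleUser
# entries are invented for this sketch, not real Digg field names.
items = [
    {"title": "Paris’ Sob Story", "description": "Some summary text",
     "submitter": "RainbowPhoenix"},
    {"title": "Second Made-Up Headline", "description": "More summary text",
     "submitter": "ExampleUser"},
    {"title": "A Made-Up Headline", "description": "Even more text",
     "submitter": "ExampleUser"},
]


def by_submitter(feed_items):
    """Prefix each title with [submitter], blank the description, sort by title."""
    result = []
    for item in feed_items:
        changed = dict(item)  # copy, so the original items stay untouched
        changed["title"] = f"[{item['submitter']}] {item['title']}"
        changed["description"] = ""  # drop the description to cut clutter
        result.append(changed)
    # Sorting on the new title groups stories by submitter, because the
    # bracketed username now comes first in every title.
    return sorted(result, key=lambda it: it["title"])


for story in by_submitter(items):
    print(story["title"])
```

Note that Python's default string sort is case-sensitive (uppercase sorts before lowercase); I haven't checked whether the Pipes Sort behaves the same way.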
What we’re doing is taking a story title such as
Paris’ Sob Story
and turning it into
[RainbowPhoenix] Paris’ Sob Story
for each home page story. The string in the square brackets is the name of the Digg member who submitted the article. So ^.*$ matches “Paris’ Sob Story”, and the parentheses assign this string to $1. Thus the Regex replace rule (^.*$) for item.title takes the very same title and inserts the current Digg username in square brackets in front of it.
Other than getting rid of the story description, this is all we’re really doing, followed by a sort on the title values.
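The same replace rule can be approximated in Python with `re.sub`. Treat this as a sketch rather than a description of what Pipes does under the hood: the function name `prefix_with_submitter` is my own, and Python spells the `$1` back-reference as `\g<1>` in the replacement text.

```python
import re


def prefix_with_submitter(title, username):
    """Rewrite 'Title' as '[username] Title' using the (^.*$) rule."""
    # Backslashes are special in re.sub replacement text, so double any
    # that appear in the username to keep it literal.
    safe_name = username.replace("\\", "\\\\")
    # (^.*$) captures the whole title as group 1; \g<1> puts it back
    # after the bracketed username.
    return re.sub(r"(^.*$)", f"[{safe_name}] \\g<1>", title, count=1)


new_title = prefix_with_submitter("Paris’ Sob Story", "RainbowPhoenix")
print(new_title)
```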
Yahoo Pipes modules used:
- Fetch Feed
Here’s a SplashCast screencast showing the process of creating the Pipe. (Apologies for the choppy narration, as I had to use an earlier voiceover due to upload problems.)
You can take my Digg by Submitter pipe, clone and tweak it to your heart’s content. Or wait for the next one. In the next part of this mini-series, we’ll sort the Digg homepage by category (and prove an Apple bias for the home page).