Originally published on: Fri, 01 May 2009 02:33:14 +0000
Since the original publication of this post, I have moved to a different blog host. I have left some of the original post intact, but have corrected links to other posts to reflect their new addresses.
My approach deviated from the more sensible route of using a programming language with an XML parser (or, better yet, an RSS parser) available, but curiosity got the better of me: I wondered how concisely I could write an AWK script to sift out the simple data I needed.
Line one sets the AWK field separator to the regular expression [<>], which causes the angle-bracket characters that surround every XML tag to act as field delimiters.
The rule on the next line states that if the second field ($2) is either "title" or "link", AWK should print the second and third fields ($2 and $3 respectively), separated by a space.
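The two lines described above can be sketched as follows. This is a reconstruction from the description, not the original script: it wraps the AWK program in a shell function (the name extract_titles_links is hypothetical) and assumes each <title> and <link> element sits on a single line of the feed.

```shell
#!/bin/sh
# Reconstruction of the two-line extractor described above (not the
# original script). Assumes one <title> or <link> element per line.
extract_titles_links() {
    awk '
        BEGIN { FS = "[<>]" }                           # < and > become field delimiters
        $2 == "title" || $2 == "link" { print $2, $3 }  # tag name, then its text
    ' "$@"
}

# Hypothetical feed fragment. "  <title>Example Blog</title>" splits into
# $1 = "  " (any indentation), $2 = "title", $3 = "Example Blog",
# $4 = "/title", $5 = "".
extract_titles_links <<'EOF'
<channel>
  <title>Example Blog</title>
  <link>http://example.com/</link>
  <item>
    <title>Second Post</title>
    <link>http://example.com/second</link>
  </item>
</channel>
EOF
```

Because any indentation lands in $1, testing $2 rather than $1 keeps the rule working whether or not the feed is pretty-printed.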
The output of the above script using my current RSS feed is as follows:
Note that because the XML markup in my RSS file contains no CDATA sections, and because I avoid XML entities, I can get away with a script like the one above to extract the title and link of each post I've submitted, in order from the newest post to the oldest.
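To see why the data has to be this "pure", the sketch below feeds the same [<>] split two hypothetical title lines: one wrapped in a CDATA section and one containing an XML entity. The function name naive_extract is made up for illustration.

```shell
#!/bin/sh
# Shows where the naive [<>] split breaks down. Both input lines are
# hypothetical examples, not taken from a real feed.
naive_extract() {
    awk '
        BEGIN { FS = "[<>]" }
        $2 == "title" || $2 == "link" { print $2, $3 }
    '
}

printf '%s\n' \
    '<title><![CDATA[Tips & Tricks]]></title>' \
    '<title>Tips &amp; Tricks</title>' | naive_extract
```

The CDATA line prints only "title" followed by an empty field: the "<" that opens "<![CDATA[" immediately follows the ">" of "<title>", so $3 is empty and the text lands in $4. The entity line prints "Tips &amp; Tricks" with the entity left undecoded.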
In a future post, we'll build a more sensible RSS feed processor that isn't dependent on the coincidental purity of the data in the title and link tags.
My immediate need was a script that would generate a web page listing my posts, with a link to each. The short AWK script above gets us most of the way there.
The title and link of the blog itself end up appearing twice at the beginning of the feed, so my second script (rssparse.awk) sets a flag variable called first to ensure that this title and link are output only once.
The new script also stores each title when it is encountered, then outputs the title together with its link when the corresponding link line is encountered.
The special AWK rules BEGIN and END set the initial value of the flag and generate the HTML header and footer.
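Here is one way rssparse.awk might be structured, following the description above. It is a sketch, not the original: the HTML markup, the heading-plus-list layout, and the way first is used (the first title/link pair becomes the page heading, and later pairs that repeat the blog's link are skipped) are my assumptions. The AWK program is wrapped in a hypothetical shell function named rssparse; the text between the single quotes could equally be saved as rssparse.awk and run with awk -f.

```shell
#!/bin/sh
# Sketch of rssparse.awk wrapped in a shell function (not the original).
# Assumptions: one element per line, and the blog's own title/link pair
# comes first in the feed.
rssparse() {
    awk '
    BEGIN {
        FS = "[<>]"
        first = 1                   # no title/link pair output yet
        print "<html><body>"        # HTML header
    }
    $2 == "title" { title = $3 }    # remember the most recent title
    $2 == "link" {
        if (first) {
            # First pair: the blog itself, shown once as a heading.
            bloglink = $3
            printf "<h1><a href=\"%s\">%s</a></h1>\n", $3, title
            print "<ul>"
            first = 0
        } else if ($3 != bloglink) {
            # Later pairs become list items; a repeat of the blog link is skipped.
            printf "<li><a href=\"%s\">%s</a></li>\n", $3, title
        }
    }
    END {
        if (!first) print "</ul>"   # close the list only if it was opened
        print "</body></html>"      # HTML footer
    }
    ' "$@"
}

# Hypothetical input in which the blog title and link appear twice.
rssparse <<'EOF'
<title>Example Blog</title>
<link>http://example.com/</link>
<title>Example Blog</title>
<link>http://example.com/</link>
<title>First Post</title>
<link>http://example.com/first</link>
EOF
```

Printing on the link line rather than the title line means each list item is emitted only once both halves of the pair are known.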
To automate the process of downloading the RSS data, I use wget, the command-line HTTP retrieval utility.
Here are the two files that comprise my brief little system to download my RSS feed and turn it into simple HTML:
my_feed.bat
rssparse.awk
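my_feed.bat is a Windows batch file. For a Unix-like shell, a rough equivalent of the download-and-convert steps might look like the sketch below. The function names, the feed.xml file name, and passing the feed URL as the first argument are all assumptions; only wget's real -q (quiet) and -O (output file) options are used.

```shell
#!/bin/sh
# Hypothetical POSIX-shell stand-in for my_feed.bat:
#   1. download the RSS feed with wget,
#   2. run an AWK program (such as rssparse.awk) over it,
#   3. write the resulting HTML to a file.

# $1: feed file, $2: AWK program file, $3: output HTML file.
feed_to_html() {
    awk -f "$2" "$1" > "$3"
}

# $1: feed URL, $2: local file to save it as.
fetch_feed() {
    wget -q -O "$2" "$1"
}

# Run the full pipeline only when a feed URL is given on the command line.
if [ $# -ge 1 ]; then
    fetch_feed "$1" feed.xml &&
        feed_to_html feed.xml rssparse.awk bloglist.htm
fi
```

Saved as, say, my_feed.sh next to rssparse.awk, it would be run as `sh my_feed.sh <feed URL>`. Chaining the two steps with && keeps a failed download from overwriting bloglist.htm.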
The resulting simple HTML bloglist.htm is as follows:
In the coming weeks I will probably write a similar AWK script, based on this one, that builds a small block of three or four random links to posts on this blog for inclusion in every new post.
Unless otherwise noted, all code and text entries are Copyright ©2009 by James K. Lawless