Soup is a port of the Python Beautiful Soup HTML parser.
Load it:
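Gofer new
    smalltalkhubUser: 'PharoExtras' project: 'Soup';
    package: 'ConfigurationOfSoup';
    load.
"ConfigurationOfSoup only exists after Gofer has loaded it, so look it up by name to avoid an undeclared-variable warning when both statements are evaluated together."
((Smalltalk at: #ConfigurationOfSoup) project version: #stable) load.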
Soup can be used to parse RSS.
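| s |
"Fetch this blog's feed over HTTP with Zinc and parse the response body with Soup."
s := Soup fromString: (ZnEasy get: 'http://samadhiweb.com/blog/rss.xml') contents.
"Print the title of every item in the channel."
((s findTag: 'channel') findAllChildTags: 'item') do: [ :ele |
    Transcript show: (ele findTag: 'title') text; cr ].
Transcript flush.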
The above code prints the title of each item in the feed to the Transcript.
Tested on Pharo versions 2.0 and 3.0 beta.
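To try Soup without fetching the live feed, here is a minimal sketch that parses a small hand-written RSS fragment (made up for illustration) with the same findTag: and findAllChildTags: calls, collecting the item titles instead of printing them. It assumes Soup handles these tags the same way it handles the real feed.

```smalltalk
| s titles |
"A tiny, made-up RSS fragment: a channel with its own title and two items."
s := Soup fromString: '<rss><channel><title>Example</title><item><title>First</title></item><item><title>Second</title></item></channel></rss>'.
"Collect the text of each item's own title; the channel title is not an item, so it is skipped."
titles := ((s findTag: 'channel') findAllChildTags: 'item')
    collect: [ :ele | (ele findTag: 'title') text ].
Transcript show: titles printString; cr.
```

Collecting into a collection rather than writing straight to the Transcript makes the result easy to inspect or reuse.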
Tags: parsing, RSS
This blog began life as a set of static pages, generated by a home-grown content management system written in Smalltalk, imaginatively called SmallCMS1.
I've now rewritten SmallCMS1 to serve content dynamically and to support tag links, like this one: SQLite.
Each blog post page now has forward and backward navigational links just above the blog post title.
Rendering code now uses Seaside. More than a year ago, I blogged about using Seaside to render static HTML. Seaside now has a cleaner way to do that, or maybe that earlier post got it wrong. Anyhow, here's how SmallCMS1 uses Seaside's HTML rendering engine:
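"Build a complete HTML document as a String. The root block receives the document root, used for <head> content such as the title; the render block receives the canvas for the page body."
^ WAHtmlCanvas builder
    fullDocument: true;
    rootBlock: [ :root | self renderSiteRootOn: root ];
    render: [ :html | self renderContentOn: html ].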
Similarly, RSS is rendered thusly:
^ RRRssRenderCanvas builder
    fullDocument: true;
    render: [ :rss | self renderRssOn: rss ].
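The HTML builder can also be tried on its own in a workspace. The sketch below is a hypothetical stand-in for SmallCMS1's renderSiteRootOn: and renderContentOn: methods (the page title and text are made up): the root block sets the document title and the render block writes a heading and a paragraph into the body. It assumes a Seaside 3.x image.

```smalltalk
| page |
page := WAHtmlCanvas builder
    fullDocument: true;
    rootBlock: [ :root |
        "Hypothetical stand-in for renderSiteRootOn:, sets the <title> in <head>."
        root title: 'Example page' ];
    render: [ :html |
        "Hypothetical stand-in for renderContentOn:, writes the page body."
        html heading: 'Hello'.
        html paragraph: 'Rendered to a String by the builder.' ].
Transcript show: page; cr.
```

Because the builder simply returns a String, the same rendering code can serve a page dynamically or write it out as a static file.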
Tags: content management, RSS, Seaside
Back to Seaside: it also provides an API for generating RSS, and that is how this blog's feed is generated:
| doc ctx root rss |
String streamContents: [ :stream |
    "Set up an XML document on the stream and open the RSS root on it."
    doc := WAXmlDocument new initializeWithStream: stream codec: nil.
    ctx := WARenderContext new document: doc.
    root := RRRssRoot new openOn: doc.
    "Create the RSS canvas on the render context; all the rss messages below go to it."
    rss := RRRssRenderCanvas context: ctx.
    rss title: self siteTitle.
    rss description: self siteSlogan.
    "Emit one item per blog post."
    self blog do: [ :ea |
        rss item: [
            rss title: ea title.
            rss author: self siteAuthor.
            rss link: self baseUrl, ea blogUrlPath.
            rss guid: self baseUrl, ea blogUrlPath.
            rss publicationDate: ea timestamp printHttpFormat.
            rss description: ea outputHtml ] ].
    root closeOn: doc ]
Tags: RSS, Seaside