Parsing RSS with Pharo Smalltalk

6 April 2014

Soup is a port of the Python Beautiful Soup HTML parser.

Load it:

Gofer new 
    smalltalkhubUser: 'PharoExtras' project: 'Soup';
    package: 'ConfigurationOfSoup';
(ConfigurationOfSoup project version: #stable) load.

Soup can be used to parse RSS.

| s |
s := Soup fromString: (ZnEasy get: '') contents.
((s findTag: 'channel') findAllChildTags: 'item') do: [ :ele |
    Transcript show: (ele findTag: 'title') text; cr ].
Transcript flush.

The above code produces the following output:

Tested on Pharo versions 2.0 and 3.0 beta.

Now Has Dynamic Content

6 April 2012

This blog began life as a set of static pages, generated by a home-grown content management system written in Smalltalk, imaginatively called SmallCMS1.

I've now rewritten SmallCMS1 to serve content dynamically, to support tag linking, like this: SQLite.

Each blog post page now has forward and backward navigational links just above the blog post title.

Rendering code now uses Seaside. More than a year ago, I blogged on that. Seaside now has a cleaner way to render static HTML, or maybe that previous blog post got it wrong. Anyhow, here's how SmallCMS1 uses Seaside's HTML rendering engine:

^ WAHtmlCanvas builder
    fullDocument: true;
    rootBlock: [ :root | self renderSiteRootOn: root ];
    render: [ :html | self renderContentOn: html ].

Similarly, RSS is rendered thusly:

^ RRRssRenderCanvas builder
    fullDocument: true;
    render: [ :rss | self renderRssOn: rss ]).

Static RSS Rendering with Seaside

7 February 2011

Back to Seaside, it also provides an API to generate RSS. Indeed, that is how this blog's feed is generated:

| doc ctx root rss |
String streamContents: [ :stream |
    doc := WAXmlDocument new initializeWithStream: stream codec: nil.
    ctx := WARenderContext new document: doc.
    root := RRRssRoot new openOn: doc.
    rss title: self siteTitle.
    rss description: self siteSlogan.
    self blog do: [ :ea |
        rss item: [
            rss title: ea title.
            rss author: self siteAuthor.
            rss link: self baseUrl, ea blogUrlPath.
            rss guid: self baseUrl, ea blogUrlPath.
            rss publicationDate: ea timestamp printHttpFormat.
            rss description: ea outputHtml ]].
    root closeOn: doc ]