Parsing RSS with Pharo Smalltalk

06 Apr 2014

Soup is a port of the Python Beautiful Soup HTML parser.

Load it:

Gofer new
    smalltalkhubUser: 'PharoExtras' project: 'Soup';
    package: 'ConfigurationOfSoup';
    load.
(ConfigurationOfSoup project version: #stable) load.

Soup can be used to parse RSS.

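| s |
"Fetch the feed and parse it. The feed URL is left blank here; supply your own."
s := Soup fromString: (ZnEasy get: '') contents.
"Print the title of every item in the channel."
((s findTag: 'channel') findAllChildTags: 'item') do: [ :ele |
    Transcript show: (ele findTag: 'title') text; cr ].
Transcript flush.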

The above code prints the title of each feed item to the Transcript, one per line.

Tested on Pharo versions 2.0 and 3.0 beta.
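
To try the same calls without a network fetch, here is a minimal sketch that parses an inline feed. The feed text is made up for illustration, and the snippet assumes Soup treats the unknown channel and item tags as ordinary nested tags; it uses only the Soup messages shown above and has not been run against the Pharo versions listed.

```smalltalk
| feed soup titles |
"A made-up two-item feed, just enough structure for the example."
feed := '<rss version="2.0"><channel><title>Example feed</title>
<item><title>First post</title></item>
<item><title>Second post</title></item>
</channel></rss>'.
soup := Soup fromString: feed.
"Collect the item titles instead of printing them straight away."
titles := ((soup findTag: 'channel') findAllChildTags: 'item')
    collect: [ :item | (item findTag: 'title') text ].
titles do: [ :each | Transcript show: each; cr ].
Transcript flush.
```

If the assumption holds, titles should end up holding the two item titles and not the channel title, since findAllChildTags: is asked only for item children.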

Tags: parsing, RSS

Now Has Dynamic Content

06 Apr 2012

This blog began life as a set of static pages, generated by a home-grown content management system written in Smalltalk, imaginatively called SmallCMS1.

I've now rewritten SmallCMS1 to serve content dynamically, which lets it support tag links like this one: SQLite.

Each blog post page now has forward and backward navigational links just above the blog post title.

Rendering code now uses Seaside. More than a year ago, I blogged about rendering static HTML with Seaside. Either Seaside now has a cleaner way to do it, or that previous blog post got it wrong. Anyhow, here's how SmallCMS1 uses Seaside's HTML rendering engine:

^ WAHtmlCanvas builder
    fullDocument: true;
    rootBlock: [ :root | self renderSiteRootOn: root ];
    render: [ :html | self renderContentOn: html ].
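
As a rough illustration of what such a render block can contain, the sketch below renders previous and next links above a post title. It is not SmallCMS1's actual code: the URLs and the title are placeholders, and it renders only a fragment, leaving out fullDocument: and rootBlock:.

```smalltalk
| fragment |
fragment := WAHtmlCanvas builder
    render: [ :html |
        "Placeholder URLs; a real site would compute these from the neighbouring posts."
        html anchor url: '/blog/previous-post'; with: 'Previous'.
        html text: ' | '.
        html anchor url: '/blog/next-post'; with: 'Next'.
        "The post title follows the navigation links."
        html heading level: 1; with: 'Post title' ].
Transcript show: fragment; cr; flush.
```

The builder returns the rendered markup as a String, which is what makes this style convenient for producing pages outside a running Seaside session.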

Similarly, RSS is rendered thusly:

^ RRRssRenderCanvas builder
    fullDocument: true;
    render: [ :rss | self renderRssOn: rss ].

Tags: content management, RSS, Seaside

Static RSS Rendering with Seaside

07 Feb 2011

Back to Seaside, it also provides an API to generate RSS. Indeed, that is how this blog's feed is generated:

| doc ctx root rss |
String streamContents: [ :stream |
    doc := WAXmlDocument new initializeWithStream: stream codec: nil.
    ctx := WARenderContext new document: doc.
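    root := RRRssRoot new openOn: doc.
    "Assumption: the RSS canvas is created on the render context, as Seaside's own builder does."
    rss := RRRssRenderCanvas context: ctx.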
    rss title: self siteTitle.
    rss description: self siteSlogan.
    self blog do: [ :ea |
        rss item: [
            rss title: ea title.
            rss author: self siteAuthor.
            rss link: self baseUrl, ea blogUrlPath.
            rss guid: self baseUrl, ea blogUrlPath.
            rss publicationDate: ea timestamp printHttpFormat.
            rss description: ea outputHtml ]].
    root closeOn: doc ]

Tags: RSS, Seaside