Soup is a port of the Python Beautiful Soup HTML parser.
Gofer new
    smalltalkhubUser: 'PharoExtras' project: 'Soup';
    package: 'ConfigurationOfSoup';
    load.
(ConfigurationOfSoup project version: #stable) load.
Soup can be used to parse RSS.
| s |
s := Soup fromString: (ZnEasy get: 'http://samadhiweb.com/blog/rss.xml') contents.
((s findTag: 'channel') findAllChildTags: 'item') do: [ :ele |
    Transcript show: (ele findTag: 'title') text; cr ].
Transcript flush.
The above code produces the following output:
Tested on Pharo versions 2.0 and 3.0 beta.
Tags: parsing, RSS
This blog began life as a set of static pages, generated by a home-grown content management system written in Smalltalk, imaginatively called SmallCMS1.
I've now rewritten SmallCMS1 to serve content dynamically, to support tag linking, like this: SQLite.
Each blog post page now has forward and backward navigational links just above the blog post title.
Rendering code now uses Seaside. More than a year ago, I blogged about rendering static HTML with Seaside. Seaside now has a cleaner way to do that, or maybe that previous blog post got it wrong. Anyhow, here's how SmallCMS1 uses Seaside's HTML rendering engine:
^ WAHtmlCanvas builder
    fullDocument: true;
    rootBlock: [ :root | self renderSiteRootOn: root ];
    render: [ :html | self renderContentOn: html ].
Similarly, RSS is rendered thusly:
^ RRRssRenderCanvas builder
    fullDocument: true;
    render: [ :rss | self renderRssOn: rss ].
Tags: content management, RSS, Seaside
Back to Seaside, it also provides an API to generate RSS. Indeed, that is how this blog's feed is generated:
| doc ctx root rss |
String streamContents: [ :stream |
    doc := WAXmlDocument new initializeWithStream: stream codec: nil.
    ctx := WARenderContext new document: doc.
    root := RRRssRoot new openOn: doc.
    "Assumption: the canvas is created from ctx at this point; the snippet as published never assigns rss."
    rss := RRRssRenderCanvas context: ctx.
    rss title: self siteTitle.
    rss description: self siteSlogan.
    self blog do: [ :ea |
        rss item: [
            rss title: ea title.
            rss author: self siteAuthor.
            rss link: self baseUrl, ea blogUrlPath.
            rss guid: self baseUrl, ea blogUrlPath.
            rss publicationDate: ea timestamp printHttpFormat.
            rss description: ea outputHtml ] ].
    root closeOn: doc ]
Tags: RSS, Seaside