Digging in the dirt: PDF Report and the Law of Demeter

I'm finishing a small project which uses Christian Haider's pdf4smalltalk to build report output using a Seaside influenced coding style. A report with a header, text and footer would be coded as...

| report |
report := PRwReport new.
report portrait.
report page: [:page |
page header string: 'This is a header'.
page text string: self someText.
page footer string: 'This is a footer'].
report saveAndShowAs: 'TestText.pdf'.

The tool supports the usual report output options, like fonts, alignment, tables, images, bullets and so on.

I've built a couple of other Smalltalk report frameworks over the years. One used Crystal Reports for the layout with configurable data gathering, and another (much better tool) that used Totally Object's Visibility (I did a presentation on that one at Smalltalk Solutions 2004). Both of those used a data + layout spec model, which, with the benefit of hindsight, was not be best choice. It was a challenge to keep the code and layout in sync. Maintenance was painful. For PDF Report I opted for Seaside's 'paint the content on a canvas' pattern. It is working nicely (I'll be presenting the details at Smalltalk Industry Conference in Biloxi).

Here's the part that got me thinking about how nice objects are and the Law of Demeter... when building a report output, you have to deal with coordinating the size and position of layout components on the page. Do you give the responsibility to the page, or do you have the layout objects find their own place? I opted for a 'builder': it knows how much space is available on a page and which layout objects need to be processed.

The interesting part was in deciding how much the builder needed to know about each layout. The first few iterations were rudimentary: each layout had a calculated height (word wrapped text with a selected font) and the builder would output as much as would fit on one page, then trigger a page break and continue on to the next page.

But that did not work with tables, since each row could have some cells that spanned pages. The builder could not blindly trigger a page break on a tall cell, since the next cell would be on the previous page. The table, row and cell had to communicate layout information to the builder, with the cell width and height dependent on neighbouring cells. And, to make things especially interesting, tables can be nested and cells can span rows and columns, like this...

...and this...

Each time I added a new layout mix to the SUnit tests I had to rethink what each object knew. After several iterations a pattern emerged: the less the builder knew about the layout objects, the better. And as the builder got dumber, its code got simpler and new layout mixes just worked. A tricky part was in sequencing the layout calculation for nested objects: a cell's height is dependent on the row's height, but the row's height is the maximum of it's cell heights.

Once the calculation sequence was correct, each layout object was able to answer it's layout values: position, margin and padding. The builder could ask if a layout could fit in the remaining space on a page without knowing what the layout objects was (text, table, bullet, image or line) and could create a new physical page without knowing how a layout object would be split. Now each refactoring cycle starts with me asking myself: how can I reduce what the builder needs to know? The latest version is much cleaner than the first. It's nice to apply well known object design rules and see real results.

Still a lot of work to do, but I'm looking forward to showing it at the conference. And, if it's good enough, it will be added to the VW public store (long term plans are to port to other Smalltalk dialects).

Simple things should be simple. Complex things should be possible.

Digging in the dirt

Sunday, 8 January 2012

PDF Report and the Law of Demeter

No comments:

Post a Comment