XStatic Design Notes

This is the Ruby version of Staticizer. It parses ERB files very quickly, using a method similar to the one in the Mote gem.

See the m4/C version in ~/work/staticizer-m4/Readme for details.

Pros & Cons: Staticizer vs. SSI

While researching a simpler way to build websites, I explored the plethora of static site generators. Most are over-engineered. Others seem easy from the user’s perspective but are complex under the hood. I also wanted something with minimal dependencies, preferably built on existing, standard tools.

So I decided to build my own using make(1) and m4(1). It worked and is super fast. The problem is that the m4 language is difficult to understand. Plus, it was not as flexible or extensible as something built with an interpreted language like Ruby. Awk could be a possibility, but I never explored it.

Then I decided to reimplement the make/m4 system using rake/Ruby. It’s very nice and easy to use, with a simple and straightforward implementation. It would also be very easy to add features, even though I haven’t done so yet.

The one problem with both of these systems is dependency tracking. An HTML page is constructed from the content (markdown, html, erb), which is encapsulated in a layout. Optionally, the content and layout may include templates. The problem arises when a template is updated: how do the content and layouts that depend on it get rebuilt? I know make and rake are supposed to handle this, but the dependencies have to be assigned by hand, which leaves lots of room for human error.

Not satisfied, I kept looking for an answer and was reminded of server-side includes (SSI). I had used them 20 years ago, but then jumped on the Ruby on Rails bandwagon and forgot about them. After revisiting SSI, I think it may satisfy my requirements. It inherently solves the dependency issue by compiling the page “parts” on the fly. For example, if a template is updated, any page content or layout that depends on it will simply include the new version on the next request. That’s it!

But SSI is not without problems. It assembles HTML, not markdown, and some of my content is written in markdown. So a preprocessing step would still be required, which diminishes its advantage over the make/m4 and rake/Ruby systems. In short, SSI avoids the dependency problem but still needs the kind of preprocessing that m4 and Ruby already do.

A notable issue with SSI is the web server configuration. It’s not as friendly or accessible as a makefile. For example, encapsulating a set of pages in a layout would be specified in the web server routing rules (e.g., location directive in nginx).

But the biggest problem with SSI is that, unlike the other systems, variables are only in-scope at the point of definition or inclusion. For example, a title in head cannot be set by a variable defined at the page-content level. The m4 processor solves this by ordering buffers. Ruby solves this with binding and runtime evaluation. The only way I see to solve this in SSI is to parse the include files and generate a “meta” file that would be included at the top of the document.
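The binding approach can be illustrated with a minimal sketch using the standard ERB library. The PageContext class and the render_page helper are hypothetical names made up for this note, not XStatic’s actual code. The content is rendered first, so a variable it sets is visible when the layout is rendered against the same binding.

```ruby
require 'erb'

# Hypothetical context object: instance variables set while rendering
# the content remain visible when the layout is rendered afterwards.
class PageContext
  attr_accessor :body

  def get_binding
    binding
  end
end

# Render the content first, then the layout, against the same binding.
def render_page(content_src, layout_src)
  context = PageContext.new
  b = context.get_binding
  context.body = ERB.new(content_src).result(b)
  ERB.new(layout_src).result(b)
end

content = "<% @title = 'Design Notes' %><p>Hello</p>"
layout  = "<html><head><title><%= @title %></title></head>" \
          "<body><%= body %></body></html>"
puts render_page(content, layout)
```

The order of evaluation is what matters here: the title is defined at the page-content level, yet it ends up in the head because the layout is evaluated last.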

SSI also lacks the programming power of a language like Ruby. This power can be put to good use, for example by auto-generating figure numbers and references within a technical document. This could probably be solved with SSI variables, but I think it would get ugly.
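As a rough illustration of the figure-numbering idea, here is a small sketch. FigureRegistry and its methods are hypothetical names invented for this note. Numbers are handed out in order of first use, so a reference that appears before its caption would take the next free number.

```ruby
# Hypothetical helper: assigns figure numbers in order of first use and
# returns the same number whenever a label is used again.
class FigureRegistry
  def initialize
    @numbers = {}
  end

  # Number for a label, assigned the first time the label is seen.
  def number(label)
    @numbers[label] ||= @numbers.size + 1
  end

  # Text for the figure caption itself.
  def caption(label, text)
    "Figure #{number(label)}: #{text}"
  end

  # Text for a cross-reference elsewhere in the document.
  def ref(label)
    "Figure #{number(label)}"
  end
end

figures = FigureRegistry.new
puts figures.caption(:pipeline, 'Page processing steps')
puts "See #{figures.ref(:pipeline)} for details."
```

An instance of such a registry could be made available to the Erb binding so that pages call caption and ref directly.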

Statically generated HTML pages provide maximum portability: any web server can serve them. SSI is much less portable. The SSI directives are somewhat similar between nginx and Apache, but other servers, like OpenBSD’s httpd, don’t support them. The bigger problem is that every web server configuration is unique.

A very powerful feature of SSI is the inclusion of a request. This provides a static page with dynamic elements. A drawback is client-side caching: if the page were cached, the dynamic effect would be lost. I guess this is where client-side JavaScript is the answer.

A possibly minor disadvantage of SSI compared to static pages is performance. The web server is parsing (and perhaps requesting virtual includes) and compiling complete pages on the fly. nginx does provide some caching solutions, but that is just more complexity.

An example SSI configuration with good presentation and content separation is <www.nginx.com/resources/wiki/start/topics/examples/dynamic_ssi/>.

A cool benefit of the rake system is that it scans the page sources looking for templates to add as dependencies, and it complains if a template doesn’t actually exist.
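A sketch of how such a scan might look follows. The include syntax matched here (a call like template "name" inside an Erb tag), the ".erb" template extension, and both helper names are assumptions made for illustration; the real system may use a different syntax.

```ruby
# Assumed (hypothetical) include syntax: <%= template "name" %>
TEMPLATE_CALL = /<%=?\s*template\s*\(?\s*["']([^"']+)["']/

# Names of the templates referenced by a page or layout source.
def template_names(source)
  source.scan(TEMPLATE_CALL).flatten.uniq
end

# Map the referenced names to file paths, raising if one is missing.
# The existence check is injectable so the helper can be tried without
# touching the file system.
def template_dependencies(source, template_dir, exists: File.method(:exist?))
  template_names(source).map do |name|
    path = File.join(template_dir, "#{name}.erb")
    raise ArgumentError, "missing template: #{path}" unless exists.call(path)

    path
  end
end

page = %(<%= template "nav" %>\n# Title\n<%= template("footer") %>)
p template_names(page)
```

In a Rakefile, the returned paths could be appended to the prerequisites of the file task for the rendered page, so that touching a template makes the page stale.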

Idea: could the include directive virtually request a markdown page, which will get compiled on the fly? That would be kind of cool.

Page & Template Processing

The problem with variables in XML comments (<!-- -->) and processing instructions (<? ?>) in markdown is that kramdown will output them. Erb processing cannot use such instructions. Alternatively, when using the Erb processing instructions (<% %>), kramdown entity encodes the tags.

The immediate solution is to process with Erb first, then kramdown. The only problem with this is that Erb may end up processing all of the markdown for just a few variables at the top, or worse, for no reason at all. This could be optimized later by evaluating only the beginning of the file, where the code block must be, and then passing the rest of the file on to kramdown. A flag set in the preamble could indicate whether the rest of the file should be processed by Erb. This could improve performance but would require more logic.
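The preamble optimization might look something like the following sketch. It assumes the preamble is a single code block at the very top of the file, that the block does not itself contain the closing tag inside a string, and that the flag is an instance variable named @erb. The Page class and the expand helper are likewise hypothetical. For simplicity it works on a string already in memory rather than reading the file in parts.

```ruby
require 'erb'

# Assumption: the preamble is one <% ... %> code block at the top of
# the file (not an output, comment, or escaped tag).
PREAMBLE = /\A\s*<%(?![=#%])(.*?)%>[ \t]*\n?/m

# Minimal evaluation context; the preamble sets instance variables on it.
class Page
  def get_binding
    binding
  end
end

# Evaluate only the preamble, then run Erb over the rest of the source
# only if the preamble set the (hypothetical) @erb flag.
def expand(source, page = Page.new)
  match = PREAMBLE.match(source)
  return source unless match

  b = page.get_binding
  b.eval(match[1])
  rest = match.post_match
  page.instance_variable_get(:@erb) ? ERB.new(rest).result(b) : rest
end

puts expand("<% @title = 'Notes'; @erb = true %>\n# <%= @title %>\n")
```

The result of expand would then be handed to kramdown. Pages that never set the flag skip the second Erb pass entirely.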

When processing markdown content, first run Erb to evaluate all processing instructions. Erb processing instructions may evaluate to HTML or to additional markdown, either of which kramdown can manage. Then kramdown converts the result to HTML. Finally, the page renders with a layout.
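The three steps can be sketched as below. The helper name, the RenderContext class, and the @body variable are assumptions for illustration. The markdown converter is passed in as a callable so the sketch runs with only the standard library; the stand-in converter used here is not a real markdown implementation.

```ruby
require 'erb'

# Hypothetical context shared by the content and the layout.
class RenderContext
  def get_binding
    binding
  end
end

# Render a markdown page: Erb first, then markdown, then the layout.
# `markdown` is any callable that turns markdown text into HTML.
def render_markdown_page(source, layout, markdown:)
  context = RenderContext.new
  b = context.get_binding
  expanded = ERB.new(source).result(b)         # 1. evaluate Erb
  html = markdown.call(expanded)               # 2. markdown to HTML
  context.instance_variable_set(:@body, html)
  ERB.new(layout).result(b)                    # 3. wrap in the layout
end

# Stand-in converter for demonstration only. With the kramdown gem
# installed, the converter could instead be:
#   require 'kramdown'
#   ->(text) { Kramdown::Document.new(text).to_html }
fake_markdown = ->(text) { "<p>#{text.strip}</p>" }

source = "<% @title = 'Hi' %>Total: <%= 1 + 2 %>\n"
layout = "<title><%= @title %></title><%= @body %>"
puts render_markdown_page(source, layout, markdown: fake_markdown)
```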

Indexing & Search

At some point, need to add a search feature. Start at <en.wikipedia.org/wiki/Search_engine_indexing>. Should probably index or parse the final HTML files. It’s tempting to parse the markdown files directly because they are simple, but this would complicate the search engine because then it would also have to be adapted to search HTML, Erb, etc. Plus, HTML has additional semantics.

Been thinking about how to do this, but never got serious about it. While researching email clients, I ran across [Notmuch](notmuchmail.org) which is an email search utility used by Mutt and others. Notmuch uses Xapian under the hood. The port mail/mu is a searcher that also uses Xapian.

[Xapian](xapian.org) is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It is written in C++ and has many bindings, including Lua, Ruby, and Tcl.

Changed Resources

What if a style sheet or script changes? How will the layouts or pages that depend on them get updated?

Changes in CSS, JavaScript, or other externally linked resources are not really in XStatic’s domain. They are the concern of HTML linking and HTTP caching, and XStatic is unaware of the relationship between a page and its linked resources. However, XStatic can help overcome long cache times when new versions of such resources need to be served.

One solution is to add a version parameter to the resource’s URL, or to update the existing one. Changing the URL in the source triggers a rebuild of the page, which updates the Last-Modified and ETag headers sent by the HTTP server. With a proper cache configuration, the client will then request the new version of the resource.
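One way to avoid editing the version by hand is to derive it from the resource’s content. The helper below is a hypothetical sketch, not an existing XStatic function, and the parameter name v is an arbitrary choice.

```ruby
require 'digest/md5'

# Hypothetical helper: append a content-derived version parameter to a
# resource URL, so the URL changes whenever the resource's bytes change.
def versioned_url(url, content)
  version = Digest::MD5.hexdigest(content)[0, 8]
  separator = url.include?('?') ? '&' : '?'
  "#{url}#{separator}v=#{version}"
end

puts versioned_url('/css/site.css', 'body { margin: 0 }')
```

Note that with a computed version the page source itself no longer changes, so the page would only pick up the new URL if the resource were also listed as a dependency of the page.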

Path Name Conflicts Within the HTML Directory

Page names may conflict with directory names, with asset names, or with another page that has the same name but a different extension. A rendered page is stored under an extensionless filename in the HTML site directory. For example, if there is a directory named “book”, then a page named “book.md” within the same directory will conflict.

content/
    book.md
    book/
        chapter1
        chapter2

The page “book.md” and the directory “book/” will have the same path name under the HTML site directory. To resolve this, move “book.md” to an index file under the “book/” directory.

content/
    book/
        index.md
        chapter1
        chapter2

Likewise, the pages “foobar.md” and “foobar.erb” within the same directory will conflict (i.e., they both render as the HTML file “foobar”). Because all markdown is preprocessed with Erb first, “foobar.md” alone will suffice.
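Both kinds of conflict could be detected before rendering. The following sketch assumes that directories are listed with a trailing slash and that the page extensions to strip are .md, .erb, and .html. The path_conflicts helper is a hypothetical name.

```ruby
# Group content paths by the extensionless path they would render to,
# and keep only the groups with more than one member.
# Assumptions: directories end with "/"; page extensions are .md, .erb,
# and .html.
def path_conflicts(paths)
  paths
    .group_by { |p| p.end_with?('/') ? p.chomp('/') : p.sub(/\.(md|erb|html)\z/, '') }
    .select { |_, members| members.size > 1 }
end

paths = %w[book.md book/ book/chapter1 foobar.md foobar.erb about.md]
path_conflicts(paths).each do |target, members|
  puts "#{target}: #{members.join(', ')}"
end
```

A check like this could run as a rake prerequisite and abort the build with a message naming the conflicting sources.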