I've spent a few hours on version 2 of The Corner Office this weekend, and I've made some good progress. The reader-facing stuff is quite close to complete, with a brand new look and feel and a few more 'bloggy' features to complement what's in the current Domino-based design. What's needed now is the back-end 'admin' functionality, and I'd be lying if I said it was the kind of work I'm relishing. By and large it's just forms & views and displaying and validating and persisting, and that's never thrilling.
That brings me to something of a crossroads in this blog's design. When I started work on the blog, I decided to use a PostgreSQL backend, and so my data layer is basically good old JDBC code. I've come across two Java blog projects recently, Blojsom and Pebble, and what's intrigued me about both of them is that they don't use a database at all - everything's stored on the file system.
There are a few things about this that appeal to me. In a memory-lean hosted environment, avoiding the overhead of a database server is worth something, even if it means more heavy lifting to access, categorise, index and store raw data. (Another way to look at it is that for a small blog, the meg-or-few saved by not running an RDBMS is available for caching.) The blog data is also more accessible: a lot of 'admin' work can be done via SSH, vi and rsync, which means I can leave out or postpone building some of the web-based admin functionality. Above all, it presents a few interesting problems to solve, which will be useful for some pet projects I want to get going with as soon as the blog is done.
The one question I haven't answered for myself is performance. File system versus database? It seems to be something of a religious war, with a fair amount of argument in both directions, even when it comes down to pure performance. The intuitive argument (which I tend to side with) is that file system access is faster, but the problem is exactly that - many comments or answers on mailing lists might be based on the 'obvious' answer that a file system must be faster, because, it's just obvious, dammit. But when you dive into issues like index structures, data storage formats, number of system calls, and the like, it's not that cut-and-dried, especially when you're going beyond simple 'find file x' scenarios. Getting to the data you need is roughly a wash (ultimately, a file system requires index tree traversal, much the same way an RDBMS does). Beyond that, you have to weigh up two paths. On the database side, data records are decoded in C code, serialised, sent up and down a TCP/IP stack, and re-parsed by JDBC. On the file system side, it's straight-up file I/O, but then you have to worry about XML parsing in slow(er) Java, etc. I suppose the only way to know for sure is to benchmark both, but that's not top of my priority list right now. It's safe to say, though, that the performance differences are going to be slight enough for a quiet blog like mine that I can afford to worry more about other issues like simplicity, memory and functionality.
It would be nice to know for certain, though.
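For what it's worth, a rough way to get numbers would be a small timing harness along these lines. This is only a sketch: `ReadTimer` and `medianNanos` are made-up names, the sample entry is invented, and only the file read is timed here. A JDBC lookup could be passed in as a second task in the same way. It ignores a lot that a serious benchmark would have to control for (JIT warm-up beyond a few iterations, the OS page cache, connection pooling), so treat any result as a hint at best.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of any blog codebase.
class ReadTimer {
    // Calls the task `warmup` times untimed, then `runs` times timed,
    // and returns the median elapsed time in nanoseconds.
    static long medianNanos(Callable<?> task, int warmup, int runs) throws Exception {
        for (int i = 0; i < warmup; i++) {
            task.call();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) throws Exception {
        // Invented sample entry, written to a temp file for the test.
        Path entry = Files.createTempFile("entry", ".xml");
        Files.write(entry, "<entry><title>Hello</title></entry>".getBytes("UTF-8"));

        long fileNs = medianNanos(() -> Files.readAllBytes(entry), 100, 1000);
        System.out.println("file read median: " + fileNs + " ns");

        Files.delete(entry);
    }
}
```

The median is used rather than the mean so that a single slow run (a GC pause, say) doesn't skew the figure.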
Thankfully, the blog design is nicely tiered, and so if I do move away from database storage, it's just a case of implementing a new data layer, and if it turns out to be a problem, I can always move back.
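To illustrate the kind of seam that makes this swap cheap, here is a minimal sketch. None of these names come from the actual blog code: `EntryStore`, `FileEntryStore` and the one-text-file-per-entry layout are assumptions for the sake of the example. A JDBC-backed class would implement the same interface, and the rest of the application would never know the difference.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical storage contract: the upper tiers code against this only.
interface EntryStore {
    void save(String slug, String body) throws IOException;
    Optional<String> load(String slug) throws IOException;
}

// File-backed variant: one UTF-8 text file per entry under a base directory.
class FileEntryStore implements EntryStore {
    private final Path baseDir;

    FileEntryStore(Path baseDir) {
        this.baseDir = baseDir;
    }

    private Path pathFor(String slug) {
        // Reject anything that could escape the base directory.
        if (!slug.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("bad slug: " + slug);
        }
        return baseDir.resolve(slug + ".txt");
    }

    @Override
    public void save(String slug, String body) throws IOException {
        Files.createDirectories(baseDir);
        Files.write(pathFor(slug), body.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public Optional<String> load(String slug) throws IOException {
        Path p = pathFor(slug);
        if (!Files.exists(p)) {
            return Optional.empty();
        }
        return Optional.of(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
    }
}

class EntryStoreDemo {
    public static void main(String[] args) throws IOException {
        EntryStore store = new FileEntryStore(Files.createTempDirectory("blog"));
        store.save("hello-world", "First post");
        System.out.println(store.load("hello-world").orElse("(missing)"));
    }
}
```

Moving back to the database would then be a matter of constructing a different `EntryStore` in one place.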
File under: thee_blog, techie : {2005.07.23 23:34}