Object relational persistence
Posted: August 21st, 2002 | 3 Comments »
There are many spam-filtering systems being discussed at the moment. Some are popular. Some are new and interesting. Some are well-intentioned but harmfully flawed.
And some are, frankly, brilliant.
I have a couple of reservations, though: there’s still a blacklist underneath, which may be prone to the same problems that hit Prof. Felten (and all the previous victims of MAPS, ORBS etc.). And what’s with all the patents? Are they there as a vital part of the legal mechanism, or simply to stop others jumping in on the business model? Talking of which, does anyone else have the little nagging worry that a single company could end up holding email to ransom? Such is the problem of a protocol that relies on being proprietary.
Incidentally, the piece linked above is the first of a series of articles by Danny that he’s writing in order to learn how to write like a journo again because he needs the money to support a pregnant wife who needs a job or she’ll just sit around and irritate people. Given that he already proves he’s one of the best writers on the net on a weekly basis, justice demands that he doesn’t go hungry.
Danny and I were discussing spam filtering on the way to Dorkbot SF last week. He gave some convincing arguments against the particulars of the SpamAssassin approach, especially the way that it screws up HTML mail; while most of us consider HTML mail to be a bad thing, messing with the contents of mail is worse. (There’s also a nasty bug that screws up whitelisting, but I can’t remember the full details.) One of the biggest problems is that despite having a wicked-nifty genetic algorithm for determining rule scores, this algorithm is run over mailboxes belonging to the developers, and so is tuned to the kind of email they receive (very little HTML mail, apparently), which is not necessarily the same as yer average user. Paul Graham’s system solves this problem by training its filters, Bayesian-style, on a per-user basis; the trouble with this is that it requires a fair degree of integration with the user’s mail system.
For some reason I’ve always wanted to play with object-oriented persistence into relational tables. This is when you code in an OO style and your objects are automatically persisted into RDBMS tables without you having to write SQL – the framework converts your object structure back and forth. I wrote a simple class set for h2g2 but I’ve never used any proper frameworks for it. Tangram is a popular system for Perl which Jo recommended; I haven’t really had an excuse to use it yet. There are various systems for Java, since OR is well-suited to J2EE Entity beans, and Hibernate looks really good: very feature-rich, and the documentation definitely talks the talk.
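To make the "converts your object structure back and forth" bit concrete, here is a minimal sketch of the idea in Python with SQLite. It is not how Tangram or Hibernate actually work; the `TinyMapper` class, the `Entry` class and its fields are all made up for illustration, and it assumes one flat class per table with an integer `id` field.

```python
import sqlite3
from dataclasses import dataclass, fields, astuple


@dataclass
class Entry:
    # Hypothetical domain object; the field names are illustrative only.
    id: int
    title: str
    author: str


class TinyMapper:
    """Hypothetical toy mapper: one flat dataclass <-> one table."""

    SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}

    def __init__(self, conn, cls):
        self.conn = conn
        self.cls = cls
        self.table = cls.__name__.lower()
        self.columns = [f.name for f in fields(cls)]
        # Derive the schema from the class definition, one column per field.
        col_defs = ", ".join(
            f"{f.name} {self.SQL_TYPES[f.type]}" for f in fields(cls)
        )
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            f"({col_defs}, PRIMARY KEY (id))"
        )

    def save(self, obj):
        # Object -> row: the application code never writes this SQL itself.
        col_list = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} ({col_list}) VALUES ({marks})",
            astuple(obj),
        )

    def load(self, obj_id):
        # Row -> object, or None when no row has that id.
        col_list = ", ".join(self.columns)
        row = self.conn.execute(
            f"SELECT {col_list} FROM {self.table} WHERE id = ?", (obj_id,)
        ).fetchone()
        return self.cls(*row) if row else None


conn = sqlite3.connect(":memory:")
entries = TinyMapper(conn, Entry)
entries.save(Entry(1, "Towels", "Ford"))
loaded = entries.load(1)
```

The calling code only ever touches `Entry` objects and `save`/`load`. A real framework also has to deal with relationships, inheritance, and schema changes, which this sketch ignores completely.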
Our experience with this stuff is that it’s horribly horribly shit!
Basically, once you’re using an RDBMS as a place to store serialised objects, you’re better off with something like a raw DBM file. If you’re not really *using* that complex SQL data management layer between the DBM file and your application, then getting rid of it can produce large speed improvements and easier coding.
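For comparison, a rough sketch of the raw DBM approach using Python's standard `dbm` module, with each object serialised to JSON under its id. The `Entry` class and its fields are hypothetical, and JSON is just one possible serialisation.

```python
import dbm
import json
import os
import tempfile
from dataclasses import dataclass, asdict


@dataclass
class Entry:
    # Hypothetical domain object; the field names are illustrative only.
    id: int
    title: str
    author: str


path = os.path.join(tempfile.mkdtemp(), "entries")

# Store: each object becomes one opaque value, keyed by its id.
with dbm.open(path, "c") as db:
    for entry in (
        Entry(1, "Towels", "Ford"),
        Entry(2, "Tea", "Arthur"),
        Entry(3, "Drinks", "Ford"),
    ):
        db[str(entry.id)] = json.dumps(asdict(entry))

with dbm.open(path, "r") as db:
    # Lookup by key is direct and cheap.
    first = Entry(**json.loads(db["1"]))
    # Anything else means deserialising every value and filtering in code.
    everything = [Entry(**json.loads(db[key])) for key in db.keys()]
    by_ford = [e for e in everything if e.author == "Ford"]
```

Fetching by key needs no SQL layer at all, which is where the speed and simplicity come from. The cost shows up in the last two lines: there is no query engine, so every question other than "give me the object with this id" is a full scan.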
But this is about more than just using an RDBMS as a big storage vat for opaquely-serialised objects, which is useless. The point is to store this stuff in a database that’s good at querying, and in a way that translates to a proper schema rather than serialising. While these frameworks let you avoid SQL if you want to, they don’t mean that nobody can use it. Your heavy DB coders and admins can write (or use) big SQL report tools, while your application coders don’t have to touch that stuff.
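A small sketch of what that buys you, assuming a framework has already mapped some objects onto a real table (the `entry` table and its columns here are hypothetical, and the rows are inserted by hand to stand in for the framework):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for what the framework would have created and populated:
# one column per object attribute, not a serialised blob.
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
)
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [(1, "Towels", "Ford"), (2, "Tea", "Arthur"), (3, "Drinks", "Ford")],
)

# The DB people can now run ordinary SQL reports against the same data,
# without going anywhere near the application's classes.
report = conn.execute(
    "SELECT author, COUNT(*) FROM entry GROUP BY author ORDER BY author"
).fetchall()
```

Because the attributes live in real columns, grouping, joining and indexing all work as usual; none of that is possible when the row is just an opaque serialised object.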
Are you talking about the JAWS type thing where the objects are turned into ‘real’ SQL rows? Sorry, I thought you meant serialisation.
Yes, there is more utility in that approach, but there are impedance matching problems… that’s what container managed entity beans in EJB try to do, but keeping schema and class in synch can be a problem, as can inheritance; I’ve not seen it done elegantly yet.
I’d rather just ditch SQL 🙂 SQL is the bane of my life!