Fun with filters

Posted: November 6th, 2002

“You’re not evil. You’re just really dirty.”

As Mozilla continues its evolution from browser to platform, more interesting side projects are popping up daily. They range from small-but-useful browser add-ons like MozBlog to complete new desktop environments like OEone’s HomeBase via fascinating saucer-crash spin-offs like XP Server. (Oh, and this is probably a good place to mention Phoenix for the Mac, which is a good thing because it means there’s now a XUL runtime for OS X which isn’t dog slow.)

So off we go, we proud, evangelical Mozillians, to run around mozdev.org, slurping up XPI files as fast as our connections will carry them. But when half of them turn out to be unstable shite that reduces our browser to a mess of buggy widgets, what then? We search in vain for some kind of uninstaller, but there isn’t one. Most of the projects don’t even have proper Preferences panels, let alone a (usually unconnected) “Uninstall” button. Unless we decide to brave the horrific mess of subdirectories and cryptic XML and JavaScript files to find the right wires to cut, the only resort is a full wipe and reinstall. Ouch.

For some reason, the current version of the XPI API, despite tons of useful functions, has absolutely nothing for undoing those functions. Apparently early versions of Mozilla had some kind of package uninstaller but it never worked properly.

This was going to be an entry bemoaning the lack of an uninstaller framework, but it turns out that help is on the way: see this Bugzilla bug proposing a nice ‘n’ easy uninstaller panel in the Preferences. Of course, all the Mozdev projects will have to rewrite their installers to work with it, but they’re already doing that for Phoenix and every other new browser that comes along anyway…

(Note: This trick requires you to be using either Windows 2000 or XP, and to have not already uninstalled Winamp3 and returned to version 2, sneering in disgust)

Before you start, get yourself a decent skin. There. Doesn’t that feel better?
Start playing a track. Choose something fun and jumpy that Winamp will like. (We don’t care about what you like.)
Open the AVS window. (This is the visualisation window that does the pretty patterns. Look for it in the Thinger window. You want the icon that says avs in big letters.)
Admire the pretty patterns for a moment.
Double-click in the AVS window to bring up the Editor window.
Choose Display from the Settings menu.
Check the Overlay mode and Set desktop to color checkboxes on the right.
Aaaand… wheeeeeee!
Show it off to everyone in the vicinity.
Now see how long you can carry on working with that running.

This is the bit where I’m meant to whinge about Winamp3’s size, slowness, bugginess, horrific default skin and the fact that it takes five times longer to load than Winamp 2.0. Fortunately, I’ve been distracted by the continually-increasing fabness of the AVS. However, I will say this: If premature optimisation is the root of all evil then Winamp3 can look forward to an unhindered ascent to heaven (where it will doubtless be given a huge, oddly-shaped halo textured with a picture of Jennifer Love Hewitt).

Look Around You! In particular, look at the Periodic Table.

Following on from being an unwitting accomplice in the “Welsh drivers using cooking oil instead of diesel” controversy, British supermarket chain Asda has started running its fleet of lorries on reprocessed chicken fat:

Asda produces more than 50m litres of used cooking oil and 138,000 of waste frying fat every year from its canteens, restaurants and rotisseries. The gunge was a disposal headache rather than a potential money-earner until an unexpected phone call last spring.

“We were approached by a biodiesel firm, which cleans up waste cooking oil, adds a bit of methanol and sells it as a much cheaper alternative to diesel,” said Rachel Fellows of Asda yesterday. “We were only too happy to do business with them.

“But then we thought: hang on, isn’t there something we can do here for ourselves?”

Biodiesel, while being a combustion fuel, is not only considerably cheaper than normal diesel but releases only 40% of the emissions, as well as being “simple to use, biodegradable, nontoxic, and essentially free of sulfur and aromatics.” (see the FAQ) Plus, “Biodiesel is the only alternative fuel to have fully completed the health effects testing requirements of the 1990 Clean Air Act Amendments.”, which may be of interest to Californians.

One of the hottest topics this past week has been the formation of Mitch Kapor’s OSAF and its big project, Chandler, a kind of souped-up do-everything PIM. The term that’s being bandied about at the moment is “Outlook on steroids” but, as the product page says, Outlook is not the right comparison model here. The feature summary looks like a standard-issue email client until you hit the bullet points at the bottom:

structure data how you like it, view it that way, change your mind at any time

automatic recognition of names, places, dates, and etc.; automatic categorization of items

Those features go way beyond what most PIMs offer today, yet they come from a 15-year-old DOS program. Chandler’s true daddy is the PIM that Kapor delivered to the market in the 80’s: Lotus Agenda.

Prompted by others’ reminiscence on haddock this time last year (“I’ve been playing with Lotus Agenda via dosemu, and it’s fucking fantastic,” said Nick. “That program damn near got me organised,” said Danny) I did some snooping and Googling, determined to find out what it was that made the program so useful. (I didn’t have a PC to play with until ’94, so Agenda passed me by completely)

A good starting point is Michael Stocker’s Agenda links site. Off it, we find Walter Rowe’s quick tour through the magic of Agenda:

What does Agenda do? With it, you can weed through a mountain of information and arrange it into categories. Suppose you keep track of phone calls, are writing a proposal, and need to maintain a daily calendar. If you type “Call Bob and tell him to send the proposal notes by Friday,” Agenda is clever enough to read this sentence and assign this item to the categories: phone calls, proposal, even Friday. When you ask the computer what you have to do Friday, it will remind you to call Bob and insist that you ask him about the proposal.

The automatic recognition of facts (dates, people, places etc.) within free text is obviously incredibly useful and not all that hard, so it’s odd that few PIMs have done it since. It’s the magic icing on top of Agenda’s most prominent feature, in fact the core of its design philosophy: the flexibility of data formatting and categorisation. If you want to just enter stuff in free, flat text, you do that. Agenda will help you sort it.

Lotus Agenda is the only available database in the market that allows the keying of data to precede the creation of database tables. It may appear dull, since more thought has been given on its internal design than on its physical appearance, but Agenda is an excellent tool for sorting piles of information into meaningful categories. With this program, users can keep track of their activities, writings, research, notes, expenses and even other programs. Agenda can accurately read dates in practically any wording, from ‘next week from Friday’ to ‘6/30/93,’ and can create a separate category for items that have not been classified under any particular category.

Much heavier detail about the structure of Agenda is available in this document from Agenda’s creators. The most notable part is their list of key design requirements for the program:

The user must be able to easily enter, edit, and manipulate free textual items without concern for the underlying structure of the database.

The user must not be required to specify the structure of the data in advance and must be able to modify the database structure as it evolves without losing data or reorganizing the database.

The user must be able to define reports in idiosyncratic formats. Through these reports, the user must be able to create and modify both database structure and content.

The reports mentioned above are referred to in Agenda as views, which work in a similar way to views in RDBMSes. From James Fallows’s lovesong to Agenda for The Atlantic in 1992:

Views, finally, are presentations of the information in
your items, arrayed and selected according to the categories you
specify. This may sound similar to what a normal data-base
program does. With Paradox, dBase IV, RBase, and so on you can
retrieve pieces of information, through a “query,” according to
the criteria you choose. (“Show me the last name, first name, and
phone number for all families whose addresses have a zip code
from 10001 to 10292.”) The difference is that Agenda eliminates
the need for queries. In most data-base programs, there is one
bed-rock chunk of data, the mother lode, from which you request
samplings from time to time. In its fundamental technology,
Agenda also has one mother-lode of data, but – in ways that are,
again, easier to appreciate on the screen – it creates the
illusion that the information exists in small, pre-customized
chunks. You can create an Agenda view called “New York City,”
comparable to the zip-code query above. Whenever you flip there,
with one key, it can show you all the dealings you’ve had with
anyone in New York.

(I love his hyphenation of “data-base”. It feels so quaint and different to the way we write to-day.)

The use of user-defined categories was a key part of Agenda, but you could save the task of categorisation until after your data had been entered. Furthermore, the task was made much easier by being able to define remarkably-capable sets of rules and triggers; see the section on “Automatic Assignment and Implicit Actions” in the designers’ overview. Most importantly, a piece of information could be tagged with any number of categories.

So the way Agenda worked was to let you enter your data (contacts, appointments, notes, ideas) as freely as you liked, then slice and dice with views. However, it could give you a formalised interface to your information, depending on context: Agenda 2.0 came with Planner, a sample, customised view suited to appointments and to-do lists.

One negative point that I continually come across is the idiosyncrasy of Agenda’s interface. While not being too hard to learn, it was still different enough to put most people off, people who preferred to stick to classic interfaces such as Lotus’s other PIM, Organiser. This was its downfall. Rowe:

Victor Cruz, spokesman for Lotus Development Corp., says Lotus stopped developing Agenda after selling only 100,000 copies. They thought Agenda was too difficult to learn, so they bought a no-brainer program called the Threads Organizer from a company in the United Kingdom. Threads looks like a notebook and a day calendar, so it is obvious what it does. Agenda is more subtle. Lotus has sold 450,000 copies of Threads.

Jimmy Guterman speculates that the program fell foul of the subjective suitability of most freeform idea managers:

It’s unlikely that all of the people who bought (or whose companies bought them) Agenda used it, or used it as suggested–not everyone’s mind works like Kapor’s. Anyone who has taken a single course in perception or neurobiology knows that every person’s brain interprets and organizes information differently. There are basic similarities (i.e., we all use the occipital lobe for visual information), but our neurons are as unique as our fingerprints. It’s easy to be skeptical when a company claims to have a program that “organizes your computer like your mind.” A recent PIM, “The Brain,” made such a claim, but it only worked like the developer’s brain and appears to have flopped in the marketplace.

The feature that appears to be most relevant to Agenda’s usefulness, and most lacking in today’s applications, is its use of views and categorisation to slice your information in as many ways as you need. From Guterman’s 1998 interview with Mitch Kapor:

But Kapor realizes that, as millennium approaches, none of the currently popular PIMs match the original vision Agenda. “Oh, we’ve had some evolution. PIMs have evolved a lot. They’ve gotten better at handling contacts and appointments. They’ve become very sophisticated. But the one thing that was the greatest thing about Agenda and why it still has a cadre of followers is the one thing that hasn’t been incorporated into PIMs: multifiling.”

“Today,” Kapor observes, “the PIMs are very Web-influenced, they have connectivity features and all, but they’re stuck in the old mindset. They’re focused on managing contacts and calendars. Agenda was all about managing ideas. Maybe that means Agenda isn’t really a PIM. But then again, the term ‘PIM’ was invented by Connell Ryan, Agenda’s marketing manager, at the time of the product’s first release. He invented that category name, but in retrospect the category didn’t describe what Agenda was.”

I’ve certainly been continually astonished by the lack of these relatively basic features in popular applications. The most obvious one is email: I have yet to find a decent personal-level email system which will let me file the same mail in more than one folder, or allow me to store and reuse views across my mailboxes. I certainly can’t get anywhere near Agenda’s rule and action capabilities without getting into my mail server and writing code. As my friend Manar Hussain said to me, years ago: “Your email is probably the most important database you have, so how come you can do so little with it?”

It also got me thinking about something we take for granted in the software world: continual feature evolution. We tend to think of software functionality as being on a linear good-bad scale. Good tech evolves and thrives, bad tech dies. Yet this is one case where some obviously good technology had to sit in the dustbin of history for many years before being revived; it’s lucky it’s being revived at all (and it still may not be, given Chandler’s current non-existence). If Apple hadn’t rescued NeXT from oblivion, what would have happened to brilliant ideas like Display Postscript?

It’ll be many months before Chandler is anywhere near useful, but I’ll be keeping a close eye on it. It sounds like this thing is easily the closest to my dream PIM, and anything that has the faintest hope of getting a shloch like me organised deserves the red carpet treatment.

(Coming up next: A take on the whole OSAF/Open Source anti-competition argument, and a brief overview of some of the other idea managers that have arrived since Agenda. I’d particularly welcome suggestions for the latter.)

Radio 4’s Book At Bedtime was utterly fantastic tonight: Ewan McGregor read Anton Chekhov’s short story “The Bet”. You can catch it here, but only for the next week. Go on, it’s only 15 minutes.

I forgot to mention that you can download and play with Agenda 2.0 right now; you just need to be able to run DOS programs. Practically any version of Windows will run it. Linux can run it under DOSEMU. (I wonder if anyone’s ported DOSEMU to OS X yet?)

Michael Stocker’s Agenda site has instructions on downloading and creating install disks, but there’s an easier way: use this nicely-zipped install.

An astonishing discovery thanks to Google: Pictures of my cock!

Dammit, the Arecibo message is too wide to use as a phone logo!

Mind you, the new message, while being way bigger, is much cooler. (Have a flick through the PDF)

As I was saying to Quinn just now, it’s good to know that in this age of uncertainty, when new-media heroes rise and fall faster than their socks, there are still some people you can rely on: the ones who are convinced that a homosexual elite is running the British Internet. One particular ambassador to The Land Of The Queer Jackboot has been harassing various friends of mine, on and off, for several years now. I don’t want to invoke him, but let’s just say he’s infamous across several UK mailing lists for his intense streams of vicious phone calls and emails, not to mention a bipolar instability akin to a crackerjack compass needle.

He’s been calling and mailing Quinn pretty much non-stop for the past week or so, making all kinds of bizarre accusations and innuendos. (Apparently he was terribly disappointed to discover that Quinn is both female and Danny’s real wife. Must have ruined his wild three-boys-in-a-bed fantasies.) And being Quinn, rather than trying to run from this nutter, she’s blogging about it, and asking for all the traffic she can get. (So at least she’s guaranteed Dave Winer’s support. Hey, maybe we can set him and Ian off against each other. I’d pay to watch that.) She has also asked me to spank her regularly if she doesn’t keep it updated. This blogging lark gets better every day!

To echo the stalker’s own sign-off: Without prejudice!

So there I was, desperately trying to get some work finished before going out, when Matt IMed me out of the blue and derailed me in the surest way possible: namely, by asking me to do a quick fun bit of Perl hacking for him. (Click here to see why)

What he wanted (and was sure existed out there somewhere but couldn’t find it) was a running CGI script (sorry, RESTian Web Service) that takes a URL to a page and replaces all occurrences of a given string with another string before spitting the page out. Since I had the code on hand already (in my form linker service) it only took a couple of minutes. You can try it here:

As you may see if you run the default example above, it’s good but it’s not perfect. This is a shame because, apart from the base URI problem, writing these kinds of filters is an utter doddle. (That’s British for “really easy”) You too can write your own Pornolizer or ValleyURL! See below for a brief bit of exploration and advice, as well as a plea for help to anyone who can help fix the bug.

Let’s get the basics out of the way first: even a novice server-side coder can write this stuff in a few lines. All your script is going to do is:

1. Read in some form variables to use
2. Go fetch a page from a given URL
3. Alter the page in the desired fashion
4. Change the base URI of the page so as to keep relative links working
5. Spit it out

If you can’t do point 1, you need to go off and learn about CGI programming.

Point 2 is usually achievable with one line of code if you have a decent URI/web toolkit. I use the fantastic LWP::Simple for Perl, like so:

use LWP::Simple;       # exports get()
my $html = get($url);  # undef if the fetch fails

And that’s all it takes to grab a page from the web and stick it in a string.
It is also all it takes to expose a freaking huge security hole.

Most generic URI-fetching libraries will fetch many different types of URIs, not just the ones that start with http://. And this URI that you’re fetching was given to you from an untrusted source. So if some joker comes along and types in:

file:///etc/passwd

… I hope you see the problem. Fortunately, all you need to do is check that the URI starts with http:// and return an error if it doesn’t, and you’re sorted. (You may want to allow https:// URIs too)
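A minimal sketch of that check, in Perl. The name is_safe_url is made up for illustration and isn't lifted from the filter script itself, and it only looks at the scheme: it won't stop someone pointing the script at a web server on your internal network.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for URIs that start with http:// or https://
# followed by the beginning of a host name. Everything else (file:, ftp:,
# no scheme at all, undef) is refused.
sub is_safe_url {
    my ($url) = @_;
    return 0 unless defined $url;
    return $url =~ m{\Ahttps?://[^\s/]+}i ? 1 : 0;
}

for my $candidate ('http://www.domain.com/dir/page.html', 'file:///etc/passwd') {
    printf "%-40s %s\n", $candidate, is_safe_url($candidate) ? 'fetch it' : 'refuse';
}
```

Anchoring with \A rather than ^ means the pattern has to match at the very start of the string, so a file: URI can't sneak through by putting an http:// URI on a second line.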

Next, altering the content. Hey, this kind of text munging is what Perl is for. Note, however, that passing data straight from CGI input into a regular expression can expose more security problems, so clean those variables up first with the quotemeta() function. (This exists for PHP too)

my $target = quotemeta($cgi->param("t"));
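To see why the escaping matters, here's a small sketch of a literal search-and-replace. The helper name replace_literal is invented for the example; the only real machinery is quotemeta() and s///.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every literal occurrence of $from with $to.
sub replace_literal {
    my ($text, $from, $to) = @_;

    # An empty pattern has a special meaning in Perl (reuse the last
    # successful pattern), so leave the text alone in that case.
    return $text unless defined $from && length $from;

    my $pattern = quotemeta($from);   # "a.b" becomes "a\.b"
    $text =~ s/$pattern/$to/g;        # the value of $to is inserted as plain text
    return $text;
}

print replace_literal("a.b and axb", "a.b", "[hit]"), "\n";   # prints "[hit] and axb"
```

Without the quotemeta() call, the dot would match any character and "axb" would be replaced as well. Only the search string needs escaping; the replacement goes in as-is.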

Point 4 is where it gets a bit hairy. The base URI is the web server folder that the browser will look in for relatively-addressed files referenced by the page. As an example, suppose the page we’re dealing with lives at http://www.domain.com/dir/page.html. The base URI of this page is http://www.domain.com/dir/. The page includes an image specified like so: <img src="frog.png"> . To fetch this image, the browser will append the filename to the base URI to get the complete URI.

The trouble is that the browser works out the base URI from the URI it used to fetch the page, which in this case isn’t going to work because the URI points to our filter script and not the original page. So we need to manually force the browser to use a different base URI.

First, we need to derive the base URI from the URI we used to fetch the page:

if ($url =~ m{/[^/]*$})  # match everything from the last slash onwards
{
    $base = $` . "/"; # now grab everything before it and put the slash back
}
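The everything-before-the-last-slash approach trips over a couple of inputs: a bare http://www.domain.com yields "http://" as the base, and a query string containing slashes confuses it too. Here's a slightly more defensive sketch, assuming the URL has already been checked to be http or https. The name base_of is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the "directory" part of an absolute http(s) URL.
# Returns undef if the URL doesn't look like one.
sub base_of {
    my ($url) = @_;
    return undef unless defined $url;

    $url =~ s/[?#].*\z//s;                    # drop query string and fragment

    my ($site, $path) = $url =~ m{\A(https?://[^/]+)(/.*)?\z}i
        or return undef;                      # not an absolute http(s) URL

    $path = '/' unless defined $path;         # bare host: the base is the root
    $path =~ s{[^/]*\z}{};                    # strip the filename after the last slash
    return $site . $path;
}

print base_of('http://www.domain.com/dir/page.html'), "\n";   # http://www.domain.com/dir/
print base_of('http://www.domain.com'), "\n";                 # http://www.domain.com/
```

If you have the URI module from CPAN to hand, it can do this sort of resolution properly; the regexp version is only here to keep the sketch dependency-free.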

There are two ways to force a base URI change, and I use both of them. The first is to change the HTTP header you output to the browser and add a Content-Location: attribute which specifies the new base. The second is to add a BASE element to the document’s HEAD. There are clean and proper ways of doing this, and I’m going to ignore them and just do a dirty regexp:

$html =~ s/(<HEAD\b([^>"']*|'[^']*'|"[^"]*")*>)/$1 <BASE HREF="$base"> /si unless $html =~ /<BASE\s[^>]*HREF=/i;

What that bizarre mess does is look for the document’s HEAD tag and stick a BASE tag immediately after it – but only if the page doesn’t have a BASE tag already. (Let’s hope it’s not in a comment.)
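Putting both methods together, the output stage might look something like the sketch below. It's an illustration under assumptions rather than the script running on this site: build_response is a made-up name, the HEAD matching is simplified to assume no ">" inside the tag's attribute values, and the escaping steps are additions of mine. Also worth knowing: later revisions of the HTTP spec dropped the base-URI meaning of Content-Location, so the BASE element is the one to lean on.

```perl
use strict;
use warnings;

# Hypothetical helper: build a complete CGI response (headers, blank line,
# body) that announces $base both in an HTTP header and in the document.
sub build_response {
    my ($html, $base) = @_;

    # Escape characters that would break out of the attribute value
    (my $attr = $base) =~ s/&/&amp;/g;
    $attr =~ s/"/&quot;/g;

    # Add a BASE element right after the opening HEAD tag,
    # unless the page already declares one
    unless ($html =~ /<base\b[^>]*\bhref\s*=/i) {
        $html =~ s/(<head\b[^>]*>)/$1<base href="$attr">/i;
    }

    # CR or LF inside a header value would let someone inject extra headers
    (my $header = $base) =~ s/[\r\n]//g;

    return "Content-Type: text/html\r\n"
         . "Content-Location: $header\r\n"
         . "\r\n"
         . $html;
}

print build_response(
    '<html><head><title>Frogs</title></head><body><img src="frog.png"></body></html>',
    'http://www.domain.com/dir/',
);
```

With that in place, the browser should resolve frog.png against http://www.domain.com/dir/ rather than against the address of the filter script.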

And this is where my bug comes in. Modifying the base seems to work fine for all relatively-addressed items apart from stylesheets, both in MSIE and Mozilla. I don’t know why, and it’s rather irritating. I decided to have a look at some of the better known filters on the web, and found that Pornolize does a bizarre trick that achieves partial (but not entire) success: they modify any LINK and META tags in the page like so:

From:
<link href="http://cheerleader.yoz.com/styles-site.css" type="text/css" rel="stylesheet">
To:
<link /="/" href="http://cheerleader.yoz.com/styles-site.css" type="text/css" rel="stylesheet">

Now, what the hell does /="/" do?

Am confused. The deep practical interaction between HTML and HTTP is bizarre and impenetrable, and I am tired, so this entry ends here with me shaking my head in despair. Do let me know if you can ease my plight.

5 Comments on “Fun with filters”

1 Earle said at 8:37 pm on November 6th, 2002:
Nice! But as you’ve pointed out, not without some issues. (Try, for example, replacing all instances of “the” on, say, http://downlode.org/blog.pl with “these”.) Still, good one.
2 paul said at 10:55 am on November 28th, 2002:
You also get problems if html tags are fiddled with (eg replace ‘<‘ with ‘ ‘, or ‘div’ with ‘img’ on any page).
In other news, the HTML::Munger module does a lot of the hard work for you. It avoids your problem by doing it the hard way, by rewriting all relevant urls in the page rather than using a base href.
3 Yoz said at 5:47 am on December 1st, 2002:
Earle: Cheers for the bug report; I’ve fixed that.
Paul: HTML::Munger’s nice but doesn’t actually do that much. Plus, I think that ignoring the fact that HTML and HTTP already have a (mostly) good method of changing the base URI is a bit silly. I may use it for a future filter rewrite.
In other news, that example bug I talk about for much of this entry seems to be gone; the filtered page now calls the correct stylesheet, without my having changed anything. Odd.
4 Chris Rimmer said at 9:39 am on January 24th, 2003:
Unfortunately it also finds and replaces in links, image names etc. So if you fetch http://www.oracle.com and replace “ora” with “xxx” you end up with links to xxxcle.com and all the pictures disappear, because it is looking at xxxcle.com for them!
5 talijanska said at 10:20 pm on December 3rd, 2003:
hello, you shitty suckers.

Yoz Grahame's Unresolvable Discrepancy

I came here to apologise and eat biscuits, and I'm all out of biscuits
