I asked a question about manipulating a document through a flat view of
its text this morning, and got back a variety of answers that didn't
quite seem to do what I was looking for. I'm guessing that the reason
for that disjunction is that I didn't make myself very clear. I'll try
to tell this as a story, and see if it helps.
When I first got into hypertext, I was using HyperCard. My method of
creating hypertexts was pretty simple. First, I created a stack of
cards which had text in them. Some of those cards had an understood
sequence, because of the limitations of a card approach on a 512x342
screen, but basically I wrote out a collection of small texts with
titles but minimal internal structure.
In order to turn text into hypertext, I ran a script that searched
through all the texts and added links based on keywords. Effectively it
was marking up the document, but it was invisible markup. I made
hyperlinked text bold to distinguish it. Eventually I added a script to
convert these stacks into HTML for broader distribution, but it did so
by adding textual markup explicitly to the text fields - pretty ugly
when I'd been used to pristine text. Then it dumped the fields to
files, and I threw away the modified stacks without saving.
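
To make the keyword pass concrete, here is a minimal sketch in Python. It is not the original HyperCard script, which isn't reproduced here; the function name `link_keywords`, the sample titles, and the `[[...]]` link notation are all assumptions made up for illustration. It just shows the general idea of scanning plain text and marking every occurrence of a known title as a link.

```python
import re

def link_keywords(text, titles):
    """Wrap each whole-word occurrence of a known title in [[...]].

    `titles` stands in for the list of card titles; the [[...]]
    notation is a hypothetical stand-in for a real link.
    """
    if not titles:
        return text
    # Try longer titles first so "card stack" wins over "stack".
    ordered = sorted(titles, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "[[" + m.group(1) + "]]", text)

linked = link_keywords("A card stack is a stack of cards.",
                       ["stack", "card stack"])
print(linked)
```

The source text is never edited by hand; the links are derived from it and could be thrown away and regenerated whenever the list of titles changes.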
None of this was brilliant programming, but it did very nicely for 34K
of stack overhead. As I shifted gears toward HTML - heck, the world was
moving toward this markup stuff - I still used a similar approach. I'd
write up a document in plain text, and then mark it up. Eventually I
started putting the text into a template with headers and footers before
marking it up, but it was still a pretty simple and straightforward
process, one that echoes (I think) the typesetting usage of markup.
Textual stories appear, and markup gets added.
Over time, I've come to write the markup along with the text, though
it's not exactly fun. I've been looking for an editor that would mesh
well with my style of marking up documents, more or less a markup
painter, but it seems my perspective must be unusual. (Topologi's
editor is extremely cool, probably the best thing I've found.)
In my programming, I've wanted to take a similar approach. Regular
Fragmentations is a 'painter' of a sort, though it applies rather
rigid rules to information, sort of a paint by number. A number of my
projects work with just the marked-up text of a document directly, and
also make changes based on sets of rules, though that's as much
remodeling as painting.
What I'm finding as I build my applications is that most of the toolkits
out there assume that the markup process is already done, and that
content should or must be handled as individual nodes of content defined
by the markup. There is little or no concept of the content as a
coherent whole separate from the markup.
(Although such a concept is often cited as a key differentiator of
'documents' rather than data, I suspect that the relative potential
chaos of documents is a more important differentiator. I don't have
much trouble marking up tables of repetitive information if they're
presented to me as text with headers.)
The toolsets I find readily available are delighted to process nodes,
but they have very little concept of a text separate from or prior to
those nodes. There's not much notion of searching that text or
processing that text in a way which modifies the nodes underneath -
perhaps deleting, adding, or changing - determined primarily by the
contents of the text.
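
As a rough illustration of what a text-first view over nodes could look like, the following Python sketch flattens a small tree into one string while remembering which node each stretch of characters came from. It uses the standard library's `xml.etree.ElementTree`, where character data lives in each element's `.text` and `.tail`; the helper names `flat_view` and `nodes_for` and the sample document are assumptions invented for this example, not part of any existing toolkit.

```python
import xml.etree.ElementTree as ET

def flat_view(root):
    """Return (text, spans); each span is (start, end, element, slot)."""
    parts, spans, pos = [], [], 0

    def add(chunk, elem, slot):
        nonlocal pos
        if chunk:
            parts.append(chunk)
            spans.append((pos, pos + len(chunk), elem, slot))
            pos += len(chunk)

    def walk(elem):
        add(elem.text, elem, "text")      # text before the first child
        for child in elem:
            walk(child)
            add(child.tail, child, "tail")  # text following that child

    walk(root)
    return "".join(parts), spans

def nodes_for(spans, start, end):
    """Spans that overlap the flat character range [start, end)."""
    return [s for s in spans if s[0] < end and start < s[1]]

doc = ET.fromstring("<p>Mark<b>up</b> comes first.</p>")
text, spans = flat_view(doc)

# Search the flat text, then ask which nodes the match touches.
hit = text.find("Markup")
touched = nodes_for(spans, hit, hit + len("Markup"))
print(text)
print([(s[2].tag, s[3]) for s in touched])
```

The search runs against the text as a whole, and the word "Markup" is found even though the markup splits it across two nodes. Turning such a match into deletions, additions, or changes in the tree is the harder part, and this sketch does not attempt it.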
The only spec I've really seen try to address this notion of the text in
a document is XPointer, and I'm afraid XPointer is snarled in the same
thing I am: everyone's working with nodes these days.
The dominant view at the moment seems to be that documents are composed
of nodes, and it's the nodes that are primary, not the content of those
nodes. Document structures are containers filled with content that must
fit precisely, not information added to a document to reflect its
I guess that's fine for what most people are doing, but it also means
that I'll have to roll my own tools. A content-first view doesn't seem
very popular in the XML world at present, and I can't say I see that
changing. Markup now seems to come first.
 - http://simonstl.com/projects/ht22/
 - http://www.topologi.com
 - http://simonstl.com/projects/fragment/
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!