I just thought I'd update people about my RAN thought-bubbles, in case anyone is interested.
RAN has grown to be more of a complete ecosystem: there is little point providing an abstract capability without nutting out concrete uses for it (can thought-bubbles be concrete? Bini shells?); otherwise it risks being design-by-prejudice.
- a document syntax: RAN
- a validation (implementation) approach: Apatak
- fast-indexes to fragments: RAN Pragma PIs
- embedded fielded data - RAN-CSV
- CRUD operations on row-sets of fielded data - RAN-CSV
- relationship to DOM, XDM - RAN-DOM
The technical consideration of RAN boils down to:
- What would a markup language look like that made maximal use of the various parallel processing capabilities of modern CPUs (and GPUs): parallel threads, SIMD vectors, warps, rather than being rife with gotchas?
- In effect, this means that locating each fragment in a large RAN document should only involve n/blocksize operations, plus a single vector comparison.
- How to have a parser/validator that works soon after starting at any arbitrary point in the document?
- What ecosystem of validation etc. would this require? How to avoid schemas and still get datatypes? What conventions should be defined to make a compelling package?
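To make the "n/blocksize operations plus a single vector comparison" point concrete, here is a toy sketch in Python. Nothing below is RAN syntax: the block size, the milestone layout, the fixed-width key, and the function names are all invented for illustration. The idea is just that if fragments sit at block-aligned positions with a fixed-width identifier at each block head, locating one needs only a compare per block head, which a real implementation could do with one SIMD instruction.

```python
# Toy sketch: block-aligned storage with a fixed-width key at each
# block head. All sizes and names here are hypothetical.

BLOCK = 64          # hypothetical block size; a real SIMD width is 16-64 bytes
KEY_WIDTH = 16      # fixed-width identifier stored at the head of each block

def make_store(records):
    """Pack (key, payload) pairs into fixed-size blocks.
    Assumes each payload fits in one block; a sketch, not a format."""
    out = bytearray()
    for key, payload in records:
        block = key.ljust(KEY_WIDTH, b' ') + payload
        out += block.ljust(BLOCK, b' ')
    return bytes(out)

def find(store, key):
    """Scan block heads only: n/blocksize steps, one fixed-width
    compare per step (what a SIMD register compare would do)."""
    target = key.ljust(KEY_WIDTH, b' ')
    for off in range(0, len(store), BLOCK):          # n/blocksize operations
        if store[off:off + KEY_WIDTH] == target:     # one "vector" comparison
            return store[off + KEY_WIDTH:off + BLOCK].rstrip(b' ')
    return None

store = make_store([(b'alpha', b'<a>1</a>'), (b'beta', b'<b>2</b>')])
print(find(store, b'beta'))   # b'<b>2</b>'
```

The point of the sketch is that no byte between block heads is ever examined, let alone parsed.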
The use-cases RAN also addresses, apart from the classic XML/SGML use-cases, might be:
- Many Web APIs have taken over from XML delivery, where they send coarse-grain results of simple queries.
- What would a markup language look like that supported this (with configuration and no programming)?
- Big data systems often store data in files as, for example, NDJSON, which is one JSON document per line. They can scan quickly through the file by line to find the appropriate JSON. (CSV could be used too.) These systems may process gigabytes of JSON per second (Google it!).
- The main requirement is that access to a fragment must take less than one operation per character, and that only necessary data should be parsed.
- Why couldn't there be a markup language that fitted this bill?
- Log and transaction files need append, even if at a coarse grain.
- What would a markup language look like that allowed this? How could one allow coarse CRUD efficiently?
- REDIS is an example of an in-memory non-SQL database. It divides things up into buckets to allow grouped access. You can load data and produce consolidated files.
- Why couldn't we have something as simple and performant as REDIS, but using a markup language?
- SQLite is a standalone SQL engine with portable files. It is built on top of a paging system that allows binary-chop (or B-tree) access to aligned milestones, so that only the head of a page needs to be read in to figure out whether to use it or skip ahead.
- Why couldn't the SQLite file format be some kind of markup language?
- TAR is an old UNIX archive format still in use: it lets us stream a large file and extract only parts we want.
- What would a markup language look like that automatically allowed TAR-like extraction?
- We may have CSV tables that we want to access by date and some field value, without having to load or even parse all that CSV.
- What would a markup language need to allow this without external configuration?
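The NDJSON use-case above is worth making concrete, since it is the access pattern RAN would need to match: scan by newline, apply a cheap byte-level test per line, and fully parse only the one record you actually want. This sketch is illustrative only; the data, the probe string, and the function name are invented, and a real scanner would do the newline and substring search with SIMD.

```python
# Toy of the NDJSON access pattern: newline scan, cheap raw-byte
# probe per line, full JSON parse only for candidate lines.
import io
import json

def first_match(stream, raw_probe, predicate):
    """Return the first record whose raw line contains raw_probe
    and which satisfies predicate after parsing; else None."""
    for line in stream:                  # newline scan, no parsing
        if raw_probe in line:            # cheap substring test on raw bytes
            rec = json.loads(line)       # parse only this candidate
            if predicate(rec):
                return rec
    return None

data = (b'{"id": 1, "name": "ant"}\n'
        b'{"id": 2, "name": "bee"}\n'
        b'{"id": 3, "name": "cow"}\n')
rec = first_match(io.BytesIO(data), b'"bee"', lambda r: r["name"] == "bee")
print(rec["id"])   # 2
```

Only one of the three lines is ever handed to the JSON parser; the other two cost a substring scan each, which is how NDJSON systems reach gigabytes per second.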
Again, the intent is not to be better in every way than everything that already exists, but to bring markup languages forward 25 years.
Cheers
Rick