OASIS Mailing List Archives

Re: [xml-dev] RE: Two hugely significant conversions that XML parsers do

It's less important than it used to be. A very high proportion of online text is now UTF-8 and it'd be frankly weird to send data over the wire with Microsoft CRNL line endings. 

On Thu, Apr 15, 2021 at 11:36 AM Roger L Costello <costello@mitre.org> wrote:

Hi Folks,

I find it totally fascinating that XML parsers convert their input to Unicode and normalize line endings to linefeed characters. Applications that operate (reason) on the post-parsed input know exactly what they are working on.

Wicked neat!

Do other data format specifications specify that their parsers perform similar conversions?

Do JSON parsers convert their input to Unicode and normalize line endings to linefeed characters?

Do CSV (Comma-Separated Values) parsers convert their input to Unicode and normalize line endings to linefeed characters?

Do YAML ("YAML Ain't Markup Language") parsers convert their input to Unicode and normalize line endings to linefeed characters?

Do Protocol Buffer parsers convert their input to Unicode and normalize line endings to linefeed characters?

Or does XML stand apart from other text data formats in this regard?
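For comparison, here is a minimal sketch using Python's standard-library json and csv modules. Treating these two implementations as representative is an assumption; other JSON or CSV parsers may behave differently. The sample data is made up. It suggests that JSON gets you Unicode (Python's json.loads detects UTF-8, UTF-16 or UTF-32 when given bytes) but no line-ending normalization, and that CSV returns line breaks inside quoted fields exactly as they appeared:

```python
import csv
import io
import json

# JSON: a raw CR or LF may not appear inside a string, so a line break in the
# data must be written as an escape. The parser reproduces exactly what was
# escaped; "\r\n" stays "\r\n" and is not folded to "\n".
doc = json.loads(b'{"note": "line1\\r\\nline2"}')
print(repr(doc["note"]))  # 'line1\r\nline2'

# Given bytes, json.loads detects the Unicode encoding (here UTF-16 with a BOM)
# and returns an ordinary Unicode str.
word = json.loads('"caf\u00e9"'.encode("utf-16"))
print(word)  # café

# CSV: with newline="" (as the csv module docs recommend), a CR LF inside a
# quoted field is kept verbatim; nothing rewrites it to LF.
data = 'id,comment\r\n1,"first\r\nsecond"\r\n'
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)  # [['id', 'comment'], ['1', 'first\r\nsecond']]
```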

/Roger

From: Roger L Costello <costello@mitre.org>
Sent: Thursday, April 15, 2021 11:25 AM
To: xml-dev@lists.xml.org
Subject: Two hugely significant conversions that XML parsers do

Hi Folks,

An XML parser does two hugely significant conversions.

Suppose we provide input to an XML parser. Here are the conversions that the parser does to the input:

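1. The parser decodes the input from its character encoding (every parser must support UTF-8 and UTF-16; others, such as ISO-8859-1, are optional) into Unicode characters.

2. The parser converts line endings in the input to a single linefeed character (hex 0A): each carriage return/linefeed pair (hex 0D 0A) and each lone carriage return (hex 0D) becomes one linefeed.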
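A minimal sketch of both conversions, using Python's standard-library xml.etree.ElementTree (which is backed by the Expat parser). The sample document, with its ISO-8859-1 declaration and mixed line endings, is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Raw bytes as they might arrive over the wire: declared as ISO-8859-1,
# containing a Latin-1 "é" (byte 0xE9) and mixed line endings
# (a CR LF pair and a lone CR).
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b'<doc>caf\xe9\r\nline2\rline3</doc>'
)

root = ET.fromstring(raw)

# After parsing, the text is a Python str (Unicode code points), and both
# kinds of line ending have been normalized to a single LF.
print(repr(root.text))  # 'café\nline2\nline3'
```

The application never sees the 0xE9 byte or the carriage returns; it only sees Unicode text with linefeeds.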

What are the consequences of these conversions?

Answer: your applications can operate on the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.

I like the term that Amy used: your applications can _reason_ about the parsed input with the understanding that the characters are Unicode and the lines end with a linefeed character.

/Roger

Copyright 1993-2007 XML.org. This site is hosted by OASIS