An efficient, safe, extensible XML data design ... mimicking in XML a binary data format

Hi Folks,

Consider this scenario: you have installed a device that monitors the data that flows through a router. With that device you record information about the flow, e.g.,

<Flow-data>
    <Number-of-bytes>500</Number-of-bytes>
    <Source-IPv4-address>129.87.74.0</Source-IPv4-address>
    <Destination-IPv4-address>129.87.75.0</Destination-IPv4-address>
</Flow-data>

Suppose there are many like-minded people who collect flow data: they all want to monitor and record the number of bytes, the source IP address, the destination IP address, and so forth. So someone (or a group of people) creates a registry of standard flow information elements:

Information Element    Meaning
-------------------    ------------------------
1                      Number of bytes
2                      Source IPv4 address
3                      Destination IPv4 address
. . .

 

Given that registry, we can create a different XML design, one where consumers are expected to make use of the registry to process/understand the XML:

<Flow-data>
    <Information-element>
        <Code>1</Code>
        <Value>500</Value>
    </Information-element>
    <Information-element>
        <Code>2</Code>
        <Value>129.87.74.0</Value>
    </Information-element>
    <Information-element>
        <Code>3</Code>
        <Value>129.87.75.0</Value>
    </Information-element>
</Flow-data>
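
To make "consumers are expected to make use of the registry" concrete, here is a minimal Python sketch of a consumer. The REGISTRY dict is a hypothetical in-memory copy of the three-row table in this post (not the real IANA numbering), and decode_flow is a name made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry, copied from the table in this post.
REGISTRY = {
    1: "Number of bytes",
    2: "Source IPv4 address",
    3: "Destination IPv4 address",
}

FLOW_XML = """
<Flow-data>
    <Information-element><Code>1</Code><Value>500</Value></Information-element>
    <Information-element><Code>2</Code><Value>129.87.74.0</Value></Information-element>
    <Information-element><Code>3</Code><Value>129.87.75.0</Value></Information-element>
</Flow-data>
"""

def decode_flow(xml_text, registry):
    """Return a list of (meaning, value) pairs, one per Information-element."""
    root = ET.fromstring(xml_text)
    decoded = []
    for element in root.findall("Information-element"):
        code = int(element.findtext("Code"))
        value = element.findtext("Value")
        # Codes missing from the registry are kept and labelled, not dropped.
        meaning = registry.get(code, f"unknown code {code}")
        decoded.append((meaning, value))
    return decoded

for meaning, value in decode_flow(FLOW_XML, REGISTRY):
    print(f"{meaning}: {value}")
```

Without the registry the consumer sees only bare integers, which is exactly the trade this design makes.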

The potential drawback of that design is that it intermingles the data (500, 129.87.74.0, 129.87.75.0) with the codes (1, 2, 3). So let's create an XML design which separates the semantic description of the data (i.e., the codes) from the data, e.g.,

<Flow-data>
    <Data-codes>
        <Code>1</Code>
        <Code>2</Code>
        <Code>3</Code>
    </Data-codes>
    <Data>
        <Value>500</Value>
        <Value>129.87.74.0</Value>
        <Value>129.87.75.0</Value>
    </Data>
</Flow-data>
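
One consequence of separating the codes from the data is that a consumer has to match them up by position: the first Code describes the first Value, and so on. A minimal Python sketch of that pairing follows; the function name pair_codes_with_values and the length check are my own assumptions, not something the design prescribes.

```python
import xml.etree.ElementTree as ET

SEPARATED_XML = """
<Flow-data>
    <Data-codes>
        <Code>1</Code>
        <Code>2</Code>
        <Code>3</Code>
    </Data-codes>
    <Data>
        <Value>500</Value>
        <Value>129.87.74.0</Value>
        <Value>129.87.75.0</Value>
    </Data>
</Flow-data>
"""

def pair_codes_with_values(xml_text):
    """Pair the n-th Code with the n-th Value, in document order."""
    root = ET.fromstring(xml_text)
    codes = [int(code.text) for code in root.findall("Data-codes/Code")]
    values = [value.text for value in root.findall("Data/Value")]
    # Assumed sanity check: a mismatch means the record cannot be interpreted.
    if len(codes) != len(values):
        raise ValueError(f"{len(codes)} codes but {len(values)} values")
    return list(zip(codes, values))

print(pair_codes_with_values(SEPARATED_XML))
```

The order of the Value elements is now significant, so anything that reorders children would silently change the meaning of the record.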

This new design assumes every user will want to collect exclusively the standard flow information elements (number of bytes, source IP address, destination IP address, etc.). But some users (companies) might want to collect non-standard, proprietary flow information elements. So let's add to the XML an indication of whether a code is a standard code or a non-standard, proprietary code:

<Flow-data>
    <Data-codes>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>1</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>2</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>3</Code>
        </Code-specifier>
    </Data-codes>
    <Data>
        <Value>500</Value>
        <Value>129.87.74.0</Value>
        <Value>129.87.75.0</Value>
    </Data>
</Flow-data>

That instance document doesn't show the use of a non-standard flow data item, so let's add one more data item: A company records the non-standard XYZ flow data item (code=15); the metering process measures 23 such items:

<Flow-data>
    <Data-codes>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>1</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>2</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>3</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>false</Standard-flow-data-item>
            <Code>15</Code>
        </Code-specifier>
    </Data-codes>
    <Data>
        <Value>500</Value>
        <Value>129.87.74.0</Value>
        <Value>129.87.75.0</Value>
        <Value>23</Value>
    </Data>
</Flow-data>

Next, consumers want to know, "Hey, what authority (company/enterprise) does that non-standard code come from?" So we need to include an indication of the authority defining the flow information element:

<Flow-data>
    <Data-codes>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>1</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>2</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>3</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>false</Standard-flow-data-item>
            <Code>15</Code>
            <Code-of-the-enterprise-that-defined-the-code>115</Code-of-the-enterprise-that-defined-the-code>
        </Code-specifier>
    </Data-codes>
    <Data>
        <Value>500</Value>
        <Value>129.87.74.0</Value>
        <Value>129.87.75.0</Value>
        <Value>23</Value>
    </Data>
</Flow-data>

TaDa!

I like that XML design. I think it is very powerful. It uses integer codes (very efficient) rather than string element descriptors (very inefficient and prone to security breaches). It supports extensibility; users are not locked into a limited, rigid set of flow information elements.
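
Here is a rough Python sketch of how a consumer might resolve codes under the final design: standard codes go to the standard registry, and non-standard codes go to a registry selected by the enterprise code. Both dicts are hypothetical stand-ins (the "XYZ flow data item" entry for enterprise 115 is just the example from this post), and resolve_flow is a made-up name.

```python
import xml.etree.ElementTree as ET

# Hypothetical registries: the table from this post, plus one enterprise.
STANDARD_REGISTRY = {
    1: "Number of bytes",
    2: "Source IPv4 address",
    3: "Destination IPv4 address",
}
ENTERPRISE_REGISTRIES = {115: {15: "XYZ flow data item"}}

ENTERPRISE_TAG = "Code-of-the-enterprise-that-defined-the-code"

FULL_XML = f"""
<Flow-data>
    <Data-codes>
        <Code-specifier>
            <Standard-flow-data-item>true</Standard-flow-data-item>
            <Code>1</Code>
        </Code-specifier>
        <Code-specifier>
            <Standard-flow-data-item>false</Standard-flow-data-item>
            <Code>15</Code>
            <{ENTERPRISE_TAG}>115</{ENTERPRISE_TAG}>
        </Code-specifier>
    </Data-codes>
    <Data>
        <Value>500</Value>
        <Value>23</Value>
    </Data>
</Flow-data>
"""

def resolve_flow(xml_text):
    """Return (meaning, value) pairs, choosing the registry per specifier."""
    root = ET.fromstring(xml_text)
    specifiers = root.findall("Data-codes/Code-specifier")
    values = root.findall("Data/Value")
    if len(specifiers) != len(values):
        raise ValueError("number of code specifiers and values differ")
    resolved = []
    for specifier, value in zip(specifiers, values):
        code = int(specifier.findtext("Code"))
        flag = specifier.findtext("Standard-flow-data-item").strip()
        if flag == "true":
            registry = STANDARD_REGISTRY
        else:
            # Non-standard code: the enterprise code selects the registry.
            enterprise = int(specifier.findtext(ENTERPRISE_TAG))
            registry = ENTERPRISE_REGISTRIES.get(enterprise, {})
        resolved.append((registry.get(code, f"unknown code {code}"), value.text))
    return resolved

print(resolve_flow(FULL_XML))
```

The enterprise code acts as a namespace: code 15 from enterprise 115 and code 15 from some other enterprise can coexist without clashing.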

Recall that I talked about a "registry of standard data flow information elements." There actually is such a registry: http://www.iana.org/assignments/ipfix/

Also, recall that I talked about (or at least hinted at) a "registry of enterprise identifiers." There actually is such a registry: http://www.iana.org/assignments/enterprise-numbers/

What I have just walked you through is the design approach used by:

IPFIX (IP Flow Information Export): an IETF standard data format for representing flow data (http://tools.ietf.org/html/rfc7011)

IPFIX is a binary data format. I have walked you through the equivalent design in XML.
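
For comparison, here is a small Python sketch of how one Code-specifier could map onto the binary side. As I read RFC 7011, a field specifier in a template is a 1-bit enterprise flag, a 15-bit information element identifier, and a 16-bit field length, followed by a 32-bit enterprise number only when the flag is set. The function name pack_field_specifier is made up, the codes are the illustrative ones from this post, and the field length of 4 is an assumption, since the XML design above carries no length.

```python
import struct

def pack_field_specifier(code, field_length, enterprise=None):
    """Pack one field specifier in network byte order.

    enterprise=None plays the role of Standard-flow-data-item = true.
    """
    if not 0 <= code < 0x8000:
        raise ValueError("code must fit in 15 bits")
    if enterprise is None:
        # Enterprise bit clear: 2-byte identifier, 2-byte length.
        return struct.pack("!HH", code, field_length)
    # Enterprise bit set (top bit of the first 16 bits), enterprise number appended.
    return struct.pack("!HHI", 0x8000 | code, field_length, enterprise)

# Illustrative values from this post; the length 4 is assumed.
standard = pack_field_specifier(1, 4)
proprietary = pack_field_specifier(15, 4, enterprise=115)

print(standard.hex())      # 00010004
print(proprietary.hex())   # 800f000400000073
```

So the Standard-flow-data-item element collapses to a single bit, and the enterprise code is present only when that bit says it is needed.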

Pretty cool, aye?

/Roger


