Re: [xml-dev] Validating tab-delimited text files against an XML Schema?

Hi Roger,
    Here are the approaches I can think of to validate such data.

Both approaches require transforming the tabbed text file into an XML
document and writing an XML Schema (I mean XSD) for it.

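1) Convert the tabbed text file into a well-structured XML document, with
one <rec> element per record and one child element per field (field1,
field2, field3), all inside a <data> root element.

(For each record, the first field has type xs:integer, the second has type
xs:string and the third has type xs:decimal.)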

The schema fragment (an XSD 1.0 schema) for this would be something
like the following:

<xs:element name="data">
   <xs:complexType><xs:sequence>
      <xs:element name="rec" maxOccurs="unbounded">
         <xs:complexType><xs:sequence>
            <xs:element name="field1" type="xs:integer"/>
            <xs:element name="field2" type="xs:string"/>
            <xs:element name="field3" type="xs:decimal"/>
         </xs:sequence></xs:complexType>
      </xs:element>
   </xs:sequence></xs:complexType>
</xs:element>

(this example has not been tested for well-formedness or validity)

The transformation of the text file in this case might require a
detailed XSLT stylesheet. I believe this is the most comprehensive
approach to solving this problem.
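
For anyone who would rather not write the XSLT, here is a minimal sketch of the same transformation in Python (standard library only). It is not XSLT and not part of the approach as described above; the function name tabbed_to_xml and the field1, field2, ... naming rule are assumptions chosen to match the schema fragment.

```python
import xml.etree.ElementTree as ET

def tabbed_to_xml(text):
    """Build <data><rec><field1/>...</rec>...</data> from tab-delimited text."""
    root = ET.Element("data")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines rather than emit empty <rec> elements
        rec = ET.SubElement(root, "rec")
        # Name the child elements field1, field2, ... in column order.
        for i, value in enumerate(line.split("\t"), start=1):
            ET.SubElement(rec, f"field{i}").text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(tabbed_to_xml("1\txabc\t3.5\n2\tpyz\t6.4\n"))
```

The resulting document can then be validated against the XSD 1.0 schema with any schema-aware validator.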

2) This is a possible variant of approach 1.

Put a wrapper element (say <data>) around the tabbed data (this is the
only markup you introduce for the text file). This makes the
transformation of the tabbed text file easier. The result is an XML
document like the following:

<data>
1	xabc	3.5
2	pyz	6.4
</data>

(You can do this transformation either via XSLT or by other means.)
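
As one example of "other means", here is a small Python sketch that wraps the raw text in a <data> element. The function name wrap_tabbed is hypothetical. The escaping step is my assumption about what real data needs: any &, < or > in a field would otherwise make the result ill-formed.

```python
from xml.sax.saxutils import escape

def wrap_tabbed(text, wrapper="data"):
    """Wrap tab-delimited text in a single element, escaping XML specials."""
    # escape() rewrites &, < and > so field content cannot be read as markup.
    # Tabs and newlines are left untouched, so the schema still sees them.
    return f"<{wrapper}>\n{escape(text)}</{wrapper}>"

if __name__ == "__main__":
    print(wrap_tabbed("1\txabc\t3.5\n2\tpyz\t6.4\n"))
```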

And write an XSD 1.1 schema for this as follows:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:simpleType name="TabbedData">
      <xs:restriction base="xs:string">
         <xs:assertion test="every $rec1 in (for $rc in tokenize($value, '\r?\n') return $rc)[position() gt 1]
                             satisfies (
                               (tokenize($rec1, '\t')[1] castable as xs:integer) and
                               (tokenize($rec1, '\t')[3] castable as xs:decimal)
                             )"/>
      </xs:restriction>
   </xs:simpleType>

   <xs:element name="data" type="TabbedData"/>

</xs:schema>


This approach lets you keep the XML document simpler, but it requires
careful design of the <xs:assertion> within the schema document.
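
To prototype that assertion logic before committing it to XPath, a rough Python mirror can help. This is only a sketch under assumptions: it is not an XSD 1.1 processor, the regular expressions merely approximate the lexical forms of xs:integer and xs:decimal, and the names record_ok and tabbed_data_ok are made up for illustration.

```python
import re

# Approximate lexical forms of xs:integer and xs:decimal (an assumption,
# not a full implementation of "castable as").
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def record_ok(record):
    """Field 1 must look like an integer and field 3 like a decimal."""
    fields = record.split("\t")
    return (len(fields) >= 3
            and INTEGER.fullmatch(fields[0].strip()) is not None
            and DECIMAL.fullmatch(fields[2].strip()) is not None)

def tabbed_data_ok(value):
    """Mirror of the test: split on newlines, skip the first token, check the rest."""
    records = re.split(r"\r?\n", value)[1:]  # plays the role of [position() gt 1]
    return all(record_ok(r) for r in records)

if __name__ == "__main__":
    print(tabbed_data_ok("\n1\txabc\t3.5\n2\tpyz\t6.4"))
    print(tabbed_data_ok("\n1\txabc\t3.5\n2\tpyz\t6.4\n"))
```

In this sketch the second call is rejected, because the newline before the closing tag leaves an empty final record. That is the kind of edge case the assertion has to be designed around.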

On Sat, Sep 22, 2012 at 6:23 PM, Costello, Roger L. <costello@mitre.org> wrote:
> Hi Folks,
> I have a customer who has tab-delimited text files.
> My customer writes:
>     It's structured data, but it isn't XML.  I would still
>     like to specify the data using XML Schema, and I
>     would still like to validate the data as if it were
>     an XML document.  Is this reasonable?
> The approach that I can think of is to write an XSLT program that transforms the tab-delimited text into an XML document and then validate the XML document against the XML Schema.
> Are there other approaches?
> /Roger

Mukul Gandhi
