OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

 


 

   RELAX NG and datatypes make XML generation NP-hard

[ Lists Home | Date Index | Thread Index ]

After reading Henry's posting below, I realized
that the following problem is also NP-hard:

  Given an arbitrary RELAX NG schema, which is
  allowed to contain XML Schema Datatypes, is
  there an XML document that conforms to the
  schema?

Here's the proof:

The 3SAT problem is NP-complete. Any 3SAT problem
can be encoded into a RELAX NG schema, such that
an XML document contains a solution to the 3SAT
problem if and only if the document conforms to
the RELAX NG schema. Thus, solving the XML
generation problem (for RELAX NG with XML Schema
Datatypes) is at least as hard as solving 3SAT.

For example, consider the following 3SAT problem:

  ( x1 OR x2 OR x3 ) AND
  ( x2 OR NOT(x3) OR NOT(x4) ) AND
  ( x3 OR x4 OR NOT(x1) )

This 3SAT problem can be encoded as the following
RELAX NG schema:

<?xml version="1.0" encoding="UTF-8"?>
<element name="three_sat_solution"
         xmlns="http://relaxng.org/ns/structure/1.0";
         datatypeLibrary=
           "http://www.w3.org/2001/XMLSchema-datatypes";>
    <data type="string">
        <param name="pattern">x1=[01],x2=[01],x3=[01],x4=[01]</param>
        <param name="pattern">(.*x1=1.*)|(.*x2=1.*)|(.*x3=1.*)</param>
        <param name="pattern">(.*x2=1.*)|(.*x3=0.*)|(.*x4=0.*)</param>
        <param name="pattern">(.*x3=1.*)|(.*x4=1.*)|(.*x1=0.*)</param>
    </data>
</element>

The following XML document conforms to this RELAX NG schema
and solves the 3SAT problem:

<?xml version="1.0" encoding="UTF-8"?>
<three_sat_solution>x1=1,x2=0,x3=0,x4=1</three_sat_solution>

The following XML document does not conform to the RELAX NG schema
and does not solve the 3SAT problem:

<?xml version="1.0" encoding="UTF-8"?>
<three_sat_solution>x1=1,x2=1,x3=0,x4=0</three_sat_solution>

For more information on 3SAT (a.k.a., 3-CNF), see
http://www.wikipedia.org/wiki/Boolean_satisfiability_problem.

Since RELAX NG has a solid mathematical foundation, I suspect that the
XML generation problem can be solved in linear time when the problem is
restricted to RELAX NG schemas that do not use W3C XML Schema Datatypes.

Best regards,

Bob

<sig name    = 'Bob Lyons'
     title   = 'XML and Java Consultant'
     company = 'Unidex, Inc.'
     phone   = '+1-732-975-9877'
     email   = 'boblyons@unidex.com'
     url     = 'http://www.unidex.com/'
     product = 'XML Convert: transforms flat files to XML and vice versa' />

-----Original Message-----
From: Henry S. Thompson [mailto:ht@cogsci.ed.ac.uk]
Sent: Friday, March 28, 2003 4:59 PM
To: xml-dev@lists.xml.org
Subject: ID/IDREF makes XML generation NP-hard


Somewhat surprisingly, it turns out that answering the question, for an
arbitrary XML DTD, "Are there any valid instances of the document type
defined by this DTD?", is an NP-hard problem.

This is true, because it is possible to encode a 3SAT problem [1] in a
DTD, so that there is an instance of the DTD's document type iff the
problem can be satisfied.  So finding a valid instance is as hard a
problem as solving the 3SAT problem, and 3SAT is known to be NP-hard.

[snip]





 

News | XML in Industry | Calendar | XML Registry
Marketplace | Resources | MyXML.org | Sponsors | Privacy Statement

Copyright 2001 XML.org. This site is hosted by OASIS