Re: [xml-dev] XML Schema generation by machine learning from a corpus of XML instance documents?
Michael, you wrote:
"Listing every (grandparent, self, child) and (preceding-sibling, self, following-sibling) triple might work better."
You mean triples of QNames, right? Like:
(parent, self, child): x:company, x:address, x:street
(preceding-sibling, self, following-sibling): x:street, x:zip, x:country
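To make sure we are talking about the same thing, here is a minimal Python sketch of how I understand the triple listing. The function name, the sample document and its namespace URI are my own assumptions for illustration; names appear in ElementTree's "{uri}local" notation rather than as prefixed QNames, and None marks a missing neighbour.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the namespace URI is a placeholder.
NS = "{http://example.com/x}"
SAMPLE = """<company xmlns="http://example.com/x">
  <address><street/><zip/><country/></address>
</company>"""

def name_triples(root):
    """Collect (parent, self, child) and
    (preceding-sibling, self, following-sibling) name triples."""
    vertical, horizontal = set(), set()

    def visit(elem, parent_name):
        children = list(elem)
        # Leaf elements are recorded with None in the child position.
        for child in children or [None]:
            child_name = child.tag if child is not None else None
            vertical.add((parent_name, elem.tag, child_name))
        # Pad the child list with None so first and last children get a triple too.
        names = [None] + [c.tag for c in children] + [None]
        for i in range(1, len(names) - 1):
            horizontal.add((names[i - 1], names[i], names[i + 1]))
        for child in children:
            visit(child, elem.tag)

    visit(root, None)
    return vertical, horizontal

vertical, horizontal = name_triples(ET.fromstring(SAMPLE))
```

Run over a whole corpus, the union of these sets would be the "listing" against which a new document could be compared.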
What do you think about slightly changing/generalizing the description language and using RDF triples:
x:company a yogi:nodeName
x:address a yogi:nodeName
x:street a yogi:nodeName
x:company mk:parent x:address
x:address mk:parent x:street
etc., creating a graph. (Here, the prefixes "yogi" and "mk" represent a corpus of documents and an ontology for describing XML data, respectively.)
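As a rough sketch of how such triples could be generated, the following Python emits Turtle-style statements from one document using only the standard library. All namespace URIs are made-up placeholders, the helper names are hypothetical, and every element is assumed to live in the single namespace bound to "x".

```python
import xml.etree.ElementTree as ET

# All URIs below are placeholders invented for this illustration.
PREFIXES = {
    "x": "http://example.com/x#",
    "yogi": "http://example.com/yogi#",
    "mk": "http://example.com/mk#",
}

SAMPLE = """<company xmlns="http://example.com/x#">
  <address><street/><zip/><country/></address>
</company>"""

def qname(tag, prefix="x"):
    """Turn ElementTree's '{uri}local' tag into 'prefix:local'.
    Assumes all elements share the namespace bound to `prefix`."""
    return f"{prefix}:{tag.rsplit('}', 1)[-1]}"

def turtle_lines(root):
    """Return prefix declarations followed by the distinct statements."""
    statements = set()
    for elem in root.iter():
        statements.add(f"{qname(elem.tag)} a yogi:nodeName .")
        for child in elem:
            # Same direction as in the example above: parent first, child second.
            statements.add(f"{qname(elem.tag)} mk:parent {qname(child.tag)} .")
    header = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    return header + sorted(statements)

lines = turtle_lines(ET.fromstring(SAMPLE))
print("\n".join(lines))
```

Collecting the statements in a set means that repeated structures across the corpus collapse into one triple each, which is what makes the result a graph of names rather than a copy of the documents.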
Perhaps one might then gain insights (not necessarily conventional schemas) via SPARQL, possibly supported by OWL and RDF inference, and possibly by external information merged into the generated triples. What do you think?
With kind regards,
Hans-Jürgen
PS: The resulting insights might also be useful for purposes other than validating documents.
On Wednesday, 14 March 2018, 00:35:03 CET, Michael Kay <mike@saxonica.com> wrote:
> On 13 Mar 2018, at 23:03, Rick Jelliffe <rjelliffe@allette.com.au> wrote:
>
> We had a program to generate every absolute XPath found in a corpus, then complain (Schematron) if any XPath was found that was not in that corpus. It did not test for required elements.
>
Listing every (grandparent, self, child) and (preceding-sibling, self, following-sibling) triple might work better.
Michael Kay
Saxonica