RE: Word processors and semantic content


From: "Laurens van den Oever" <laurens@xopus.com>
To: <xml-dev@lists.xml.org>
Date: Fri, 15 Feb 2008 18:28:34 +0100

Bruce Cox wrote:
>  
> While the software itself was problematic, the bigger issue was that the
> person creating the document did not use the styles (structures)
> appropriately (abstract tagged as the last claim, for example).

If people can do something 'wrong', they will. The environment allowed
them to tag the abstract as the last claim. They are just creating a
nice-looking document. They can do so because the styles change nothing
in the way the environment responds, only the way things look.

> These were folks who knew all about the structure of a patent application
> (professional clerks in very large IP law firms), but had no economic
> motivation to be careful with the markup.

One approach I've heard of is actually paying authors to create valid
XML.

But I don't think such motivation needs to be economic. It is our
experience that people will work in a controlled environment if it helps
them do their job. The available structure can actually be used to help
people, to guide them while editing. If the complexities of the
structure are hidden and its benefits are emphasized, people will be
motivated to create structured content.

> Creating a successful patent application is the art of conforming to the
> rules of the MPEP, correctly using language to which the courts have
> assigned specific interpretations, disclosing the invention to one of
> ordinary skill in the art while escaping the attention of competitors, and
> still compelling the examiner to allow the application.  How do you create
> an authoring tool that enables that process without sacrificing
> sufficient, correct structure?

I think you shouldn't sacrifice structure.

The key feature is prevalidation: the ability to hide all actions that
would render the document invalid, so that the user never has to fix
validation problems. This is the main reason non-technical people can
use Xopus to create structured content.
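
To make the idea concrete, here is a minimal sketch in Python. It is
not how Xopus is implemented; the content model, the element names and
the function names are all assumptions made for illustration. It offers
only those insert actions that leave the document valid:

```python
from typing import List, Optional, Tuple

# Hypothetical content model for one parent element: an ordered list of
# (child name, min occurrences, max occurrences or None for unbounded).
# Assumes each child name appears only once in the model.
Model = List[Tuple[str, int, Optional[int]]]

PATENT_MODEL: Model = [
    ("title", 1, 1),
    ("description", 1, 1),
    ("claim", 1, None),
    ("abstract", 1, 1),
]


def is_valid(children: List[str], model: Model) -> bool:
    """Check that children follow the model's order and occurrence limits."""
    i = 0
    for name, lo, hi in model:
        count = 0
        while i < len(children) and children[i] == name:
            count += 1
            i += 1
        if count < lo or (hi is not None and count > hi):
            return False
    # Anything left over is out of order or not in the model.
    return i == len(children)


def allowed_insertions(children: List[str], model: Model) -> List[Tuple[int, str]]:
    """Return every (position, name) insert that keeps the document valid."""
    actions = []
    for pos in range(len(children) + 1):
        for name, _, _ in model:
            candidate = children[:pos] + [name] + children[pos:]
            if is_valid(candidate, model):
                actions.append((pos, name))
    return actions


doc = ["title", "description", "claim", "abstract"]
print(allowed_insertions(doc, PATENT_MODEL))
```

In this sketch the only action offered is adding another claim next to
the existing one. "Insert a claim after the abstract" never shows up in
the menu, so the user cannot choose it.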

Prevalidation allows you to specify a document structure that supports
the logical structure of a patent application. If you specify required
substructures and element order, the tool will build and maintain those
structures for the user. Less flexibility in the document type
definition allows the tool to make more decisions about the structure,
which lets the user focus on the content. For instance, if reordering
elements doesn't add information, fix the element order and the
software will maintain it.
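
A rough sketch of what "the tool builds and maintains the structure"
could look like, again in Python and again with a made-up content model
and made-up function names (this is not Xopus code). The required
elements are created up front, and a new element is placed at the
position the fixed order dictates instead of where the user happens to
be:

```python
from typing import List, Optional, Tuple

# Hypothetical content model: (child name, min, max or None for unbounded),
# listed in the fixed order the document type prescribes.
Model = List[Tuple[str, int, Optional[int]]]

PATENT_MODEL: Model = [
    ("title", 1, 1),
    ("description", 1, 1),
    ("drawing", 0, None),
    ("claim", 1, None),
    ("abstract", 1, 1),
]


def skeleton(model: Model) -> List[str]:
    """Build the minimal document: every required child, in model order."""
    children: List[str] = []
    for name, lo, _ in model:
        children.extend([name] * lo)
    return children


def insert_in_order(children: List[str], name: str, model: Model) -> List[str]:
    """Add a child at the position the fixed element order dictates."""
    order = {n: i for i, (n, _, _) in enumerate(model)}
    if name not in order:
        raise ValueError(f"{name!r} is not allowed here")
    hi = model[order[name]][2]
    if hi is not None and children.count(name) >= hi:
        raise ValueError(f"no more than {hi} {name!r} allowed")
    # Insert after the last child that sorts at or before the new one.
    pos = 0
    for i, child in enumerate(children):
        if order[child] <= order[name]:
            pos = i + 1
    return children[:pos] + [name] + children[pos:]


doc = skeleton(PATENT_MODEL)
doc = insert_in_order(doc, "drawing", PATENT_MODEL)
doc = insert_in_order(doc, "claim", PATENT_MODEL)
print(doc)
```

The user only says "add a claim" or "add a drawing"; where it goes is
decided by the model, so the abstract can never end up tagged as the
last claim.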

Now you can use the structure to add context-sensitive help that
explains to the user what type of information is needed in a certain
local structure. Since the information and the structure reinforce each
other, the rules that define the structure will be clear to the user.

The documentation contains the guidelines that can't be specified in the
document type definition. You could add checkboxes for content
properties that are hard to validate automatically (objectivity, use of
correct legal terms, use of jargon, etc.) so the user can manually
validate the content according to the rules stated in the documentation.
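
The checkbox idea can be sketched the same way. This is only an
illustration: the check names come from the examples above, and the
class and function names are hypothetical. A document counts as ready
only when it is schema-valid and every manual check has been confirmed:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical manual checks, taken from the examples in the text above.
MANUAL_CHECKS = ["objectivity", "correct legal terms", "use of jargon"]


@dataclass
class ReviewChecklist:
    """Checkboxes for properties the schema cannot validate."""
    confirmed: Dict[str, bool] = field(
        default_factory=lambda: {check: False for check in MANUAL_CHECKS}
    )

    def confirm(self, check: str) -> None:
        """Tick one checkbox; unknown check names are rejected."""
        if check not in self.confirmed:
            raise KeyError(check)
        self.confirmed[check] = True

    def outstanding(self) -> List[str]:
        """Return the checks the user has not confirmed yet."""
        return [c for c, done in self.confirmed.items() if not done]


def ready_to_submit(schema_valid: bool, checklist: ReviewChecklist) -> bool:
    """Ready only if the structure is valid and all boxes are ticked."""
    return schema_valid and not checklist.outstanding()


checklist = ReviewChecklist()
checklist.confirm("objectivity")
print(checklist.outstanding())
print(ready_to_submit(True, checklist))
```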

We have a case study on our website [1] of a less complex but somewhat
similar situation, where law reporters use Xopus to create semantically
structured case digests.

> the blank-page paradigm would evolve into something friendlier to explicit
> structure

Many people don't understand folder structures in file dialogs and just
save their files in My Documents. We can't expect people to mentally map
a document structure to their view of the information. This is why
WYSIWYG editors are so popular. In order to be able to create structured
content, people need to see the end result of their actions. But the
whole point of structured content is that it can be more than a
screen/page of flowing text. 

Therefore, to get the general public to start creating semantically
structured XML documents, they need to start using applications that
require valid mixed-content XML. Social profiles, wikis or mashups
might evolve in that direction.

> Perhaps Google has the best opportunity to do otherwise, but I've seen
> nothing yet to suggest that they will.

So far they have been big advocates of analyzing unstructured content
using brute force. The result is that their spider currently can't tell
the difference between the main content of a page and the sidebars (or
ads!). My guess is that they will continue to add CPU cycles to solve
that problem.

[1] http://xopus.com/blog/2007/lexisnexis-buttersworth-case-study/

Best,

Laurens van den Oever
CEO
Xopus Company

laurens at xopus.com
http://xopus.com

+31 70 4452345
Waldorpstraat 17G
2521 CA Den Haag
The Netherlands

KvK 27308787

Follow-Ups:
- RE: [xml-dev] RE: Word processors and semantic content
  - From: "Michael Kay" <mike@saxonica.com>

References:
- Word processors and semantic content
  - From: "Laurens van den Oever" <laurens@xopus.com>
- RE: Word processors and semantic content
  - From: "Cox, Bruce" <Bruce.Cox@USPTO.GOV>
