XML.org
OASIS Mailing List Archives

 



Re: [xml-dev] XML spell checking command line tool

Hi Joachim,

> Yes, all text nodes should be spell checked.

This may not be what is desired. In the example below:

<p><strong>B</strong>inary search</p>

there would be two spelling errors ("B" and "inary") if every text node is checked separately.

In the case where the concatenation of all text nodes is checked,
there will be problems like this:

<p>Hello</p><p>Good Bye</p>

This will mark the "word" HelloGood as misspelled.

The correct approach must concatenate the text nodes and know which
elements (such as p, but not strong) should insert a space.
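To make this concrete, here is a minimal sketch in Python. It is not XSLT and not how f:spell() works. It assumes the standard library's xml.etree.ElementTree and a hypothetical, hand-picked set BLOCK_ELEMENTS of element names that act as word boundaries. Text inside any other element, such as strong, is joined to its neighbours without a separator.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-picked set of elements that act as word boundaries.
# Every other element (e.g. strong) is treated as inline markup.
BLOCK_ELEMENTS = {"p", "div", "li", "td", "title", "head"}


def collect_text(elem, parts):
    """Append the text content of elem to parts, surrounding block-level
    elements with spaces so that words from adjacent blocks do not merge."""
    is_block = elem.tag in BLOCK_ELEMENTS
    if is_block:
        parts.append(" ")
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        collect_text(child, parts)
        # A child's tail is text of the parent that follows the child's end tag.
        if child.tail:
            parts.append(child.tail)
    if is_block:
        parts.append(" ")


def words_of(xml_string):
    """Return the words of the document, ignoring inline markup boundaries."""
    root = ET.fromstring(xml_string)
    parts = []
    collect_text(root, parts)
    return "".join(parts).split()
```

With this rule, words_of("<p><strong>B</strong>inary search</p>") yields ["Binary", "search"]. The two paragraphs from the second example, wrapped in a hypothetical <doc> root so that the input is well-formed, yield ["Hello", "Good", "Bye"]. For namespaced documents, BLOCK_ELEMENTS would have to hold the "{namespace}local" tag names that ElementTree uses.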


If the above two problems are of no concern, then anyone can use
f:spell() from FXSL. FXSL also has an English dictionary of moderate
size, and anyone can substitute a dictionary for their own language. The
check is the most simplistic possible (it works OK for English, but is
not so well suited to languages such as Bulgarian or Russian) in that
the dictionary is just a sequence of wordforms: there are no
morphological rules for generating wordforms from stems. I am not sure
whether this fits German, but if someone can supply millions of
wordforms, the binary search is fast enough. Because binary search is
used, the wordforms in the dictionary must be sorted using the same
collation that the XSLT processor will use at runtime.
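The dictionary lookup described above can be sketched in Python as follows. This illustrates the idea and is not the FXSL code. It assumes the dictionary is a plain list of wordforms and uses Python's default string ordering (Unicode code points) as the single "collation" for both sorting and searching. The function names are hypothetical.

```python
from bisect import bisect_left


def load_dictionary(wordforms):
    """Sort the wordforms once, using the same ordering the lookups will use
    (here Python's default code-point ordering of str). Duplicates are dropped."""
    return sorted(set(wordforms))


def is_known(sorted_forms, word):
    """Binary search: O(log n) comparisons, so millions of forms stay fast."""
    i = bisect_left(sorted_forms, word)
    return i < len(sorted_forms) and sorted_forms[i] == word


def misspelled(sorted_forms, words):
    """Return the words that do not occur verbatim in the dictionary."""
    return [w for w in words if not is_known(sorted_forms, w)]
```

For example, with forms = load_dictionary(["search", "binary", "Binary", "Straße"]), the call misspelled(forms, ["Binary", "serach", "Straße"]) returns ["serach"]. There is no case folding and no morphology, so every inflected form must be listed explicitly. If the list were sorted with a different ordering (for example a German locale collation), bisect_left could miss entries that are actually present. That is why sorting and lookup must use the same collation.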

If you are interested and need assistance, please let me know.



-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play



On Fri, Feb 12, 2010 at 12:57 AM, Joachim Gasch <gasch@ids-mannheim.de> wrote:
>
> Hello Dimitre,
>
> the XML instances are schema valid - so I don't need any checking of tag
> names.
>
> Yes, all text nodes should be spell checked. An XML-aware spell checker
> (ignoring tag names) that requires no prior conversion/concatenation of
> the text nodes into a temporary data format would be preferable. The XML
> files are mainly in German, but English is also of interest.
>
> Thanks and regards
>
> Joachim
>
>
> -----Original Message-----
> From: Dimitre Novatchev [mailto:dnovatchev@gmail.com]
> Sent: Thursday, February 11, 2010 18:48
> To: Joachim Gasch; xml-dev@lists.xml.org
> Subject: Re: [xml-dev] XML spell checking command line tool
>
> Once again: what could this possibly mean?
>
>  - Should every text node be spell-checked, or should element
> names also be spell-checked?
>  - Using English, or another language?
>  - Should the concatenation of all text nodes be processed as one
> whole, or should every text node be processed separately?
>
> All answers matter. I have a spelling checker for English that uses a
> big English wordform dictionary. Its speed, when last measured years
> ago, was about 3000 words per second. I will probably be able to provide
> the desired command-line tool, written completely in XSLT 2.0, if I have
> time and if somebody is interested.
>
> Or, you can just use f:spell() from FXSL (at:
> http://fxsl.cvs.sourceforge.net/viewvc/fxsl/fxsl-xslt2/f/func-Spell.xsl?revision=1.2&view=markup
> ) and DIY.
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> Truly great madness cannot be achieved without significant intelligence.
> ---------------------------------------
> To invent, you need a good imagination and a pile of junk
> -------------------------------------
> Never fight an inanimate object
> -------------------------------------
> You've achieved success in your field when you don't know whether what
> you're doing is work or play
> -------------------------------------
> I enjoy the massacre of ads. This sentence will slaughter ads without
> a messy bloodbath.
>
>
> On Thu, Feb 11, 2010 at 7:56 AM, Joachim Gasch <gasch@ids-mannheim.de>
> wrote:
>>
>> Suppose you have a huge number of XML files.
>> What I'm looking for is a command-line orthographic spell checker
>> for bulk processing of the XML element contents.
>>
>>
>> -----Original Message-----
>> From: Dimitre Novatchev [mailto:dnovatchev@gmail.com]
>> Sent: Thursday, February 11, 2010 15:43
>> To: Joachim Gasch; xml-dev@lists.xml.org
>> Subject: Re: [xml-dev] XML spell checking command line tool
>>
>> > I am looking for an XML spell checking command line tool.
>>
>>
>> What is that supposed to be?
>>
>>
>>
>> --
>> Cheers,
>> Dimitre Novatchev
>> ---------------------------------------
>> Truly great madness cannot be achieved without significant intelligence.
>> ---------------------------------------
>> To invent, you need a good imagination and a pile of junk
>> -------------------------------------
>> Never fight an inanimate object
>> -------------------------------------
>> You've achieved success in your field when you don't know whether what
>> you're doing is work or play
>> -------------------------------------
>> I enjoy the massacre of ads. This sentence will slaughter ads without
>> a messy bloodbath.
>>
>>
>>
>> On Thu, Feb 11, 2010 at 3:07 AM, Joachim Gasch <gasch@ids-mannheim.de>
>> wrote:
>> >
>> > Dear colleagues,
>> >
>> > I am looking for an XML spell checking command line tool.
>> >
>> > Any recommendations and experiences would be much appreciated.
>> >
>> >
>> > Thank you very much in advance!
>> >
>> > Best regards
>> >
>> > Joachim
>> >
>> >
>> >
>> > ----------------------------------------------------------------------------
>> > Joachim Gasch M.A.
>> > Institut für Deutsche Sprache
>> > R5 6-13, D-68161 Mannheim
>> > E-Mail: gasch@ids-mannheim.de
>> > Web: http://www.ids-mannheim.de
>> >
>> > ----------------------------------------------------------------------------
>> >
>> >
>> >
>> > _______________________________________________________________________
>> >
>> > XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>> > to support XML implementation and development. To minimize
>> > spam in the archives, you must subscribe before posting.
>> >
>> > [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>> > Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>> > subscribe: xml-dev-subscribe@lists.xml.org
>> > List archive: http://lists.xml.org/archives/xml-dev/
>> > List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>> >
>>
>





Copyright 1993-2007 XML.org. This site is hosted by OASIS