Spell check XML files without VIM syntax problems

I’ve been using XMLMind’s XML Editor lately to write some documentation in the DocBook XML format. The editor supports DocBook very well and I’m glad to be using it. There’s one catch though, and it has to do with spell checking: XMLMind have implemented their own system, for which there are no Greek dictionaries available. Not having the time to create one myself, I looked at other solutions.

I have the Hunspell dictionaries on my system anyway, but the command line program that comes with Hunspell garbles Greek characters in the terminal. I don’t know what that’s about, and a Google search didn’t turn up any quick solutions. So my next try was the spell check support in VIM 7, and after a bit of trial and error I’m happy with the result. Here’s what I did:

  1. Download a MySpell-compatible Greek dictionary (word list and affix file): either the original one by Steve Stavropoulos, or the one by Dimitrios Gianitsaros, which combines Greek and English words and also covers fully capitalized Greek words without accents.
  2. In VIM, generate the spell file with :mkspell el /path/to/hunspell/dictionaries/el_GR
  3. Make sure the resulting .spl file gets copied to ~/.vim/spell
  4. Create the file ~/.vim/after/syntax/xml.vim as described below.
  5. Open an XML file and execute :set spell spelllang=el,en or something to that effect.
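
Steps 2, 3 and 5 can be sketched as the following VIM commands. This is only a sketch under a few assumptions: the dictionary path is the placeholder from step 2, the directory ~/.vim/spell already exists, and el_GR.aff and el_GR.dic sit side by side in the dictionary directory. The name of the generated file depends on VIM's 'encoding' setting (for example el.utf-8.spl).

```vim
" Steps 2 and 3 in one go: write the spell file straight into ~/.vim/spell.
" Assumes ~/.vim/spell exists; the input path is a placeholder.
:mkspell ~/.vim/spell/el /path/to/hunspell/dictionaries/el_GR

" Step 5: enable spell checking with Greek and English for the current buffer.
:setlocal spell spelllang=el,en
```

Using :setlocal instead of :set keeps the setting limited to the buffer being checked, which may or may not be what you want.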

One little problem with spell checking XML files in VIM is that the syntax highlighting interferes with it. For example, by default the text content of elements is not spell checked. Another example is that URLs in href and xmlns attributes are reported as mistakes. Both problems can be solved with the following syntax commands, which go in ~/.vim/after/syntax/xml.vim:

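" Do not spell check the values of href and xmlns attributes.
syn match xmlHref +href="[^"]*"+ contained contains=xmlAttrib,@NoSpell
syn match xmlXmlns +xmlns\(:[a-z]*\)\?="[^"]*"+ contained contains=xmlAttrib,@NoSpell
" Let the new matches apply inside start tags.
syn cluster xmlStartTagHook add=xmlHref
syn cluster xmlStartTagHook add=xmlXmlns
" Spell check text that is not inside any syntax item, such as element content.
syn spell toplevel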

Now you should be good to go with spell checking XML in VIM. Another little detail about my workflow is that I have configured gvim as a helper application in XMLMind XML Editor, which lets me press “Ctrl-Shift-D” while editing an XML file and get the file opened in VIM, ready to be spell checked. After my corrections the file is reloaded in the XML editor.
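
To avoid typing the :set command every time a file is opened this way, one possible addition is a filetype plugin. This is an assumption and not part of the setup described above; the file name is hypothetical, and it only takes effect if VIM detects the file as filetype xml (DocBook files may be given a different filetype, in which case the file name would have to match that instead).

```vim
" ~/.vim/after/ftplugin/xml.vim (hypothetical file, not part of the steps above)
" Turn on spell checking with the Greek and English word lists for XML buffers.
setlocal spell spelllang=el,en
```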

A way to track office-suite documents with a VCS?

Nice to hear Sofia’s happy with Dropbox as a solution for finding the latest version of her dissertation. I was thinking about what she could do if she had to track older or alternative versions of her dissertation, perhaps even offline, instead of only the latest version in Dropbox. Of course I thought of Mercurial, which I use for my SCM needs. The problem with Mercurial is that it is good at working with text files, not binary files like those of popular office suites. So I searched a bit and found a possible solution (although I haven’t implemented and tested it yet):

David Heffelfinger posted about OpenOffice.org Document Version Control With Mercurial, in which he writes about using the flat version of the OpenOffice.org ODT format. According to him this solution is not satisfactory because, even in the flat format, a single letter change will change all sorts of metadata in other parts of the file. This makes it hard to tell the important changes between two versions from the inconsequential ones. Instead of going this way, he is currently using the oodiff hack from the Mercurial website.

Obviously a hack would not be an acceptable solution for Sofia to adopt, unless she desperately needed the ability to track changes and could live with using OpenOffice instead of Word (or whatever she’s using). But a comment on David’s post refers to an interesting tool called Beyond Compare, which seems to be able to generate differences both for Microsoft Office XML files and OpenOffice ODT files! They claim integration with popular DVCSs. I wonder how easy it would be to integrate Beyond Compare into, say, TortoiseHg. Has anyone done this? Maybe I will try it sometime soon.

Any other suggestions for tracking office suite documents with a DVCS?