Spell check XML files without VIM syntax problems

I’ve been using XMLMind’s XML Editor lately to write some documentation in the DocBook XML format. The editor supports DocBook very well and I’m very glad to be using it. There’s one catch though, and it has to do with spell checking. XMLMind have implemented their own spell check system, for which there are no Greek dictionaries available. Not having the time to create one myself, I looked at other solutions.

I have the Hunspell dictionaries on my system anyway but the command line program that comes with Hunspell garbles Greek characters in the terminal. I don’t know what that’s about and Google didn’t come up with any quick solutions. So my next try was using spell check support in VIM 7 and after a bit of trial and error I’m happy with the result. So here’s what I did:

  1. Download a MySpell-compatible Greek dictionary (word list and affix file): either the original one by Steve Stavropoulos, or the one by Dimitrios Gianitsaros, which combines Greek and English words and also covers fully capitalized Greek words without accents.
  2. In VIM, run :mkspell el /path/to/hunspell/dictionaries/el_GR
  3. Make sure the resulting .spl file gets copied to ~/.vim/spell
  4. Create the file ~/.vim/after/syntax/xml.vim as described below.
  5. Open an XML file and execute :set spell spelllang=el,en or something to that effect.

One little problem with spell checking XML files in VIM is that the syntax highlighting gets in the way. For example, by default the text content of elements is not spell checked. Another example is that URLs in href and xmlns attributes are reported as mistakes. These problems can be solved by the following syntax commands:

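" Do not spell check the values of href attributes.
syn match xmlHref +href="[^"]*"+ contained contains=xmlAttrib,@NoSpell
" Do not spell check namespace declarations (xmlns and xmlns:prefix).
syn match xmlXmlns +xmlns\(:[a-z]*\)\?="[^"]*"+ contained contains=xmlAttrib,@NoSpell
" Hook both matches into start tags through the xmlStartTagHook cluster.
syn cluster xmlStartTagHook add=xmlHref
syn cluster xmlStartTagHook add=xmlXmlns
" Spell check text that is not inside any syntax item, such as element content.
syn spell toplevel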

Now you should be good to go with spell checking XML in VIM. Another little detail about my workflow: I have configured gvim as a helper application in XMLMind XML Editor, which lets me type “Ctrl-Shift-D” while editing an XML file and get the file opened in VIM, ready to be spell checked. After my corrections the file is reloaded in the XML editor.

How to insert any Unicode character over VNC

I’ve been using GRNET’s ViMA service a lot lately, and sometimes, while using a VM’s console via VNC, I need to input a Unicode character into a file. Neither the VNC viewer applet that is provided nor any vncviewer I’ve tried seems able to input these characters directly from the keyboard. Here’s how I do it:

  1. Install Vim on the VM with apt-get, yum or whatever

  2. Locally, find the Unicode code point of the desired character:

    echo "Ψ" | iconv -f utf-8 -t iso8859-1 --unicode-subst="<U+%04X>"

    In this case it prints <U+03A8>

  3. Open the file you want to input into on the VM with vim.
  4. In insert mode, type Control-V, u, and the four hexadecimal digits (here 0, 3, A and 8)
  5. Voilà!
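
If the iconv on your local machine does not accept the --unicode-subst option, step 2 can also be done with a few lines of Python. This is only a minimal sketch; the function name codepoint_hex is made up for this example and is not part of any tool mentioned above.

```python
def codepoint_hex(char: str) -> str:
    """Return the code point of one character as uppercase hex, padded to four digits."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # ord() gives the Unicode code point as an integer; "04X" formats it
    # as uppercase hexadecimal, zero-padded to at least four digits.
    return format(ord(char), "04X")


if __name__ == "__main__":
    print(codepoint_hex("Ψ"))  # prints 03A8
```

The four digits it prints are the ones to type after Control-V, u in step 4. Characters outside the four-digit range would need Vim's Control-V, U (capital U) with eight digits instead.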

Using forward slash in Unix filenames

Ok, I was automating the creation of some ogg files based on titles appearing in a text file. Doing it myself, with mplayer and oggenc, I hadn’t taken into account many of the caveats that preexisting ripping/encoding tools take care of, like the problem of forward slashes in filenames stored on a Unix filesystem. But the good thing with doing it yourself is that you don’t have to automatically adopt the presuppositions that appear in preexisting tools.

One such presupposition is that you have to replace special characters in filenames, like quotes or forward slashes, so that the files remain accessible to shell users. Well, I have to say that shells and file managers are pretty advanced these days and you don’t really have to replace anything anymore. The only exception is the forward slash, which absolutely cannot appear in a filename on a Unix system. Most tools replace the slash with an underscore character. I find that kind of lame.

In Unicode, the forward slash is the character ‘SOLIDUS’ (U+002F), although according to Wikipedia (http://en.wikipedia.org/wiki/Slash_(punctuation)), calling the slash a ‘solidus’ “contradicts long-established English typesetting terminology”.

So, since the slash has to be replaced in filenames, we can keep its visual nature by using the typographic solidus character as a replacement, which in Unicode is called ‘FRACTION SLASH’ (U+2044). Here is the difference between the two (in the typeface your browser is using):

Slash: / (U+002F)    Solidus: ⁄ (U+2044)

In VIM you can input the solidus by typing Ctrl+V, u, 2044 in insert mode.
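
For the automated case, the same replacement can be sketched in Python. This is not the script I used with mplayer and oggenc, just an illustration of the idea; the function name safe_filename and the sample title are made up for the example.

```python
# U+2044 FRACTION SLASH looks like a slash but is not a path separator.
FRACTION_SLASH = "\u2044"


def safe_filename(title: str) -> str:
    """Replace every forward slash (U+002F) in a title with U+2044."""
    return title.replace("/", FRACTION_SLASH)


if __name__ == "__main__":
    # Hypothetical title, as it might appear on one line of the text file.
    title = "Side A/Side B"
    print(safe_filename(title) + ".ogg")
```

Only the slash is touched; quotes, spaces and other characters are left as they are, in keeping with the point above that nothing else really needs replacing.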