i18n-ing a Silva extension

Silva i18n procedures

Introduction

The Silva i18n procedure follows the Zope 3 internationalization effort: it uses the messages to be translated as message ids and re-uses the Zope 3 i18n tools.

Setting up a sandbox

Requirements:

I18n-ing

  • Mark strings for translation in page templates, Python code, and Zope Python scripts. During the first i18n pass, i18ndude (option: find-untranslated) can be used, but it gives a lot of false positives and fails if translatable strings contain unicode characters. In new development, strings should be marked immediately.

  • Create an i18n directory in the extension product directory. The <domain>.pot (message template) and the translation .po files will be stored in this directory. The naming convention for the domain is to use the product name in lower case, with underscores to separate words, e.g. silva_document or silva_external_sources. Dashes don't work!

  • Create a message template file

    i18nextract -d <domain> -p <product dir> -o i18n

    Check the template file and change the Language-team headers after creation. Make sure your editor and/or terminal is set to UTF-8 encoding (or use poedit)! I quote Martijn: 'There can be only one encoding'. More specific documentation follows.
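The naming convention for domains can be checked with a small helper. This is only a sketch: the functions domain_from_product and is_valid_domain are hypothetical names, not part of Silva or the i18n tools, and the CamelCase product name is an assumption about how the product directory is spelled.

```python
import re

def domain_from_product(product_name):
    """Turn a CamelCase product name into a lower-case, underscore-separated domain."""
    # Insert an underscore before every capital letter except the first one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", product_name).lower()

def is_valid_domain(domain):
    """Accept lower-case letters, digits and underscores only (no dashes)."""
    return re.match(r"^[a-z][a-z0-9_]*$", domain) is not None

print(domain_from_product("SilvaExternalSources"))  # silva_external_sources
print(is_valid_domain("silva-document"))            # False
```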

Translating

If there is no existing translation, copy the <domain>.pot to <domain>-<languagecode>.po and add the Language-code, Language-name and Domain metadata. See the example below for German:

"Language-code: de\n"
"Language-name: Deutsch\n"
"Domain: your_silva_extension\n"

For more information about domain names see the domain names section.

If the translation file is out of sync with the template file, use:

i18nmergeall -l i18n

to get newly generated msgids and remove obsolete ones in the .po file.
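The effect of such a merge can be pictured with plain dictionaries. The sketch below only illustrates the outcome (new msgids appear untranslated, obsolete ones disappear); it is not how i18nmergeall is implemented, and merge_catalog and the sample catalogs are hypothetical.

```python
def merge_catalog(template_msgids, translations):
    """Return a catalog that contains exactly the msgids of the template.

    Existing translations are kept, new msgids start out empty and
    msgids that are no longer in the template are dropped.
    """
    return {msgid: translations.get(msgid, "") for msgid in template_msgids}

# msgids currently present in the .pot file (made up for this example)
template = ["A header", "The Infrae website"]
# an out-of-sync German .po file: one translation still valid, one obsolete
german = {"A header": "Eine Kopfzeile", "Old text": "Alter Text"}

merged = merge_catalog(template, german)
print(merged)
```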

The encoding warning applies here as well: if you edit the translation files,
make sure your editor and/or terminal is set to UTF-8 encoding!

(Real world) Examples

See http://plone.org/documentation/howto/I18nForDevelopers and http://zopewiki.org/ZopeInternationalization for a good introduction to i18n.

Page templates

HTML example:

<html xmlns="http://www.w3.org/1999/xhtml"
   xmlns:tal="http://xml.zope.org/namespaces/tal"
   xmlns:i18n="http://xml.zope.org/namespaces/i18n"
   i18n:domain="silva">
 <body>
   <h1 i18n:translate="">A header</h1>
   <a title="Go to the Infrae website"
     i18n:attributes="title"
     href="http://www.infrae.com"
     i18n:translate="">The Infrae website</a><br />
   <span i18n:translate="">Your username is
     <tal:block tal:content="here/REQUEST/AUTH_USER/getUserName" i18n:name="username">username</tal:block>.
   </span>
 </body>
 </html>

This piece of TAL contains all the information necessary to internationalize page templates:

  • The opening tag (<html>) contains an XML namespace declaration for the i18n namespace and an i18n:domain (both mandatory). The namespace declaration needs to be set so Zope knows what to do with the i18n: directives, and the domain needs to be set so Zope (and the i18n extract tools) know which dictionary to use. Important: the i18n namespace has to be declared with http://xml.zope.org/namespaces/i18n! Other URIs are not recognized by the extraction tools, so no messages are created for them!

  • The <h1> element is an example of basic content translation. Adding empty i18n:translate attributes like this one is the most common operation developers will perform on page templates. The attribute can also have a messageid as content. This may help during development in certain cases, but it makes maintaining translations a lot harder, so it should be avoided if possible.

  • The <a> element contains, apart from an i18n:translate like the one on the <h1>, an i18n:attributes directive, which makes Zope translate the contents of the attribute(s) named in its value (separated by ;).

  • The <span> element is an example of an i18n:translate with a string interpolation: its content is partially static, partially dynamic. This can be solved by creating a new element inside the element with the translatable content, which contains the dynamic part and an 'i18n:name' directive. This string will be made available in the .pot and .po files later on with the name as a dynamic element, in our case the .pot and .po files will contain the following messageid declaration:

    msgid "Your username is ${username}."
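The ${username} placeholder follows the same syntax as Python's string.Template, which makes it easy to see why a name is needed: a translator may move the placeholder to another position in the sentence. The sketch below uses string.Template purely as an illustration; it is not the code Zope uses to render the message, and the German sentence is a made-up translation.

```python
from string import Template

msgid = "Your username is ${username}."
# A hypothetical German msgstr in which the placeholder moved to the front.
msgstr = "${username} ist Ihr Benutzername."

rendered_en = Template(msgid).substitute(username="alice")
rendered_de = Template(msgstr).substitute(username="alice")

print(rendered_en)  # Your username is alice.
print(rendered_de)  # alice ist Ihr Benutzername.
```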
    

Zope Python scripts and Product code

Zope Python scripts (both ZODB and filesystem ones) should get the same treatment: developers have to import a function and make sure that all string literals that are not used by code elsewhere are passed into that function, so that they get wrapped in a message id object.

Let's examine the import:

# we import a MessageIDFactory which generates MessageIDUnicode objects
from Products.Silva.i18n import translate as _

The 'translate' function imported is either a reference to the MessageIDFactory of PlacelessTranslationService, or, if importing PlacelessTranslationService failed, a function that returns whatever came in. This function must be imported with the name '_', since the i18n extract tools search for those calls to generate the .pot files.

Note that this imports the MessageIDFactory for the 'silva' domain, with a specific switch that makes all translations unicode strings. Take care not to use e.g. str() calls on the objects: that may break your code if the translated string contains non-ASCII characters.
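Why str() is dangerous can be shown without Silva. In Python 2, calling str() on a unicode object performs an implicit ASCII encode; the sketch below spells out that encode explicitly so that it also runs on Python 3. The helper to_ascii_bytes and the sample string are made up for this example.

```python
def to_ascii_bytes(text):
    # Python 2's str(u"...") does the equivalent of this implicit ASCII encode.
    return text.encode("ascii")

# A translated string containing a non-ASCII character (a-umlaut).
translated = u"Benutzername \u00e4ndern"

try:
    to_ascii_bytes(translated)
    failed = False
except UnicodeEncodeError:
    # This is the error that an innocent str() call would trigger.
    failed = True

print(failed)
```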

Some examples:

from Products.Silva.i18n import translate as _

# this method returns what once used to be a string-literal, now it returns
# a MessageIDUnicode object that gets converted to a translated string
# somewhere by Zope before displaying it
def foo():
    return _("foo")

This is the simplest case: a plain string literal gets wrapped in a MessageID and the code doesn't process it any further. Before displaying it, Zope converts the MessageID to a translated (in our case unicode) string, so the user sees the translation.

A more complicated case:

from Products.Silva.i18n import translate as _

# sometimes you want to generate explicit strings so you can use them in
# string operations later on, to convert a MessageIDUnicode to a string object
# use the 'unicode' call (in Silva everything is stored as unicode)
def bar():
    msg = ''
    for i in range(10):
        msg += unicode(_("bar"))
    return msg

If you want to perform string operations such as adding other strings, joining the elements of a list that contains MessageID objects, etc., you have to convert the MessageID objects to unicode strings first. Calling 'unicode()' on them this way results in a plain, translated unicode string that you can process like any other string. Obviously there's no need to add unicode() calls everywhere; calling it explicitly only makes sense if you (or some other bit of code) want to process the strings.

String interpolations don't work the same anymore. Consider this:

# this will fail
msg = _('Foo %s baz' % bar)

It is not possible for the extraction tools to get a messageid created from this bit of code, since the string isn't a string literal but contains dynamic elements. Therefore string interpolation requires a somewhat different syntax, that allows doing the interpolation later in the process (on stringification):

from Products.Silva.i18n import translate as _

# string interpolations are not possible anymore, since that would make the
# translatable string dynamic (it can't be found in the message catalog if it's
# not a static string), to do this we use the following syntax
def baz(some_var):
    msg = _('You entered the value ${var} for some_var')
    msg.mapping = {'var': some_var}
    return msg

If the Python code happens to be in a Zope Python script, we need to use the set_mapping method on MessageID to prevent Zope security from kicking in:

msg = _('You entered the value ${var} for some_var')
msg.set_mapping({'var': some_var})

Formulator forms

The form XML needs just one element on the form:

<i18n_domain>silva</i18n_domain>

The extraction tool will automatically extract titles and descriptions from the *.form files. Message ids will be constructed in that domain.

Any error messages explicitly defined in the form will be looked up in the form domain.

In Formulator 1.7 and before this all worked differently, but this is all you'd need to do now.
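To give an idea of what the extraction amounts to, the sketch below pulls the domain and the field titles and descriptions out of a form. The XML is a heavily simplified, hypothetical form file (only the i18n_domain element is taken from the text above), and extract_messages is not the real extraction tool.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical *.form content; real form files contain more.
FORM_XML = """
<form>
  <i18n_domain>silva</i18n_domain>
  <groups>
    <group>
      <fields>
        <field>
          <id>title</id>
          <values>
            <title>Title</title>
            <description>The title of the document</description>
          </values>
        </field>
      </fields>
    </group>
  </groups>
</form>
"""

def extract_messages(form_xml):
    """Return (domain, msgid) pairs for all field titles and descriptions."""
    root = ET.fromstring(form_xml.strip())
    domain = root.findtext("i18n_domain")
    messages = []
    for values in root.iter("values"):
        for tag in ("title", "description"):
            text = values.findtext(tag)
            if text:
                messages.append((domain, text))
    return messages

print(extract_messages(FORM_XML))
```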

Domain names

Each extension gets a new domain, see the rule for this above.

If you have text in a form or Python code, the domain is specified there; explicitly in the form, and usually implicitly in the Python code, in which case (at the time of writing), the domain will be 'silva'. You can also explicitly supply a domain in Python code: _('My message', 'explicit_domain').

If there is a domain specified during message id construction in the form or Python code, the domain given in the page template that shows this message is ignored; instead, correctly, the domain from the message id is used in the lookup procedure.
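The lookup rule can be summarized in a few lines. This is a sketch under assumptions: Message and lookup_domain are hypothetical stand-ins that only model the precedence described here, not the actual lookup code.

```python
class Message(str):
    """Hypothetical message id that may carry its own domain."""
    def __new__(cls, text, domain=None):
        self = super().__new__(cls, text)
        self.domain = domain
        return self

def lookup_domain(message, template_domain):
    # A domain set on the message id wins over the page template's domain.
    own = getattr(message, "domain", None)
    return own if own is not None else template_domain

msg = Message("My message", "explicit_domain")
print(lookup_domain(msg, "silva"))                    # explicit_domain
print(lookup_domain("plain text", "silva_document"))  # silva_document
```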

Pitfalls

There are a number of issues, especially in Product code and Python scripts, that require special attention. These are mainly cases where translating seems appropriate, usually involving an innocent looking plain string literal, but in which it turns out that the string is later used for another purpose than displaying in the public view (sometimes strings get even shown in the public view and used for some other purpose).

Even though the work is generally quite boring, it requires you to pay attention at all times!

The problems we ran into so far:

  • String literals were used as a dictionary key, attribute name, etc.
  • String literals were used as class names for the CSS
  • String literals were used to send emails to multiple people. This one is actually a bit hard to solve: we present those strings directly to users, so ideally we would want to translate them, but there can be several recipients who speak different languages. For now I think we should just skip those cases and consider a solution later on.
  • The resources/sidebar_template.pt is cached. Translating text in it will result in cached translation, which will result in weird effects. We need to refactor this so that translations are not cached.

In all the cases mentioned, translating will break the code (usually with very uninformative tracebacks!), so these strings should not be translated. However, it is very easy to make a mistake, so make sure to test often and thoroughly.
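The dictionary key pitfall is easy to reproduce. In the sketch below the catalog, the translate function and the ICONS table are all made up; the point is only that a translated string no longer matches the key the code expects.

```python
# Hypothetical Dutch catalog.
CATALOG_NL = {"published": "gepubliceerd"}

def translate(text):
    return CATALOG_NL.get(text, text)

# The same literal doubles as a dictionary key elsewhere in the code.
ICONS = {"published": "published.gif", "draft": "draft.gif"}

status = "published"
icon = ICONS[status]                    # works: the key is the untranslated literal
label = translate(status)               # translate only the text that is displayed
broken = ICONS.get(translate(status))   # the translated string is not a key

print(icon, label, broken)
```

Keeping the untranslated literal for lookups and translating only at display time avoids the problem.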

If you run into a bug that you have just created, and the amount of code changed is small, that bug is probably far easier to locate than one that was created a while ago, by someone else, or in a large set of changes.

Localizing dates and times

We haven't done so yet. This is how Plone does it:

http://plone.org/development/teams/i18n/datetime

Zope 3 has yet another way, which we have researched and which may be more powerful.