Official website for Linux User & Developer
Jul
22

Internationalise your apps using Qt

by Kunal Deo

After putting so much effort into creating an application, it would be a shame to see it go unused just because it was only available in English. The bottom line is that most people pay more attention and give more respect to a product which is available in their own language. If you want a global audience for your software, it is very important that you localise your application for your users. Here’s how…

This article originally appeared in issue 89 of Linux User & Developer magazine.

Let’s admit it, writing applications is a complex thing to do; it requires a lot of blood and sweat. After putting so much effort into creating an application, it would be a shame to see it go unused just because it was only available in English. The bottom line is that most people pay more attention and give more respect to a product which is available in their own language. By its very nature, open source software qualifies as some of the most translated on the planet. If you want a global audience for your software, it is very important that you localise your application for your users. Here’s how…

The Basics
The technical terms involved in internationalisation can be very daunting, so let’s clear them up before proceeding. The following are the key components that make up the complete internationalisation framework…

Locale: A locale is the part of a user’s environment that brings together information about how to handle data that is specific to the end user’s particular country, language or territory. The locale is typically installed as part of the operating system. Usually a locale identifier consists of at least a language identifier and a region identifier. It is defined in this format: [language[_territory][.codeset][@modifier]]. For example, British English using the UTF-8 encoding is en_GB.UTF-8. (More on character sets later in this article.) The same code also defines the territorial convention for spelling, currency, date format etc.:

en_US = “color,” mm/dd/yyyy, $1,234.56
en_GB = “colour,” dd/mm/yyyy, £1,234.56
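
To make the identifier format above concrete, here is a minimal sketch that splits a locale identifier into its four parts. The helper name parse_locale and the regular expression are illustrative assumptions rather than part of any standard library, and the pattern is deliberately simplified (for example, it only accepts two-letter territories).

```python
import re

# Simplified pattern for language[_territory][.codeset][@modifier].
# Assumption: 2-3 letter language codes and 2-letter territory codes only.
LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale(identifier):
    """Split a locale identifier into its parts (hypothetical helper).

    Parts that are absent from the identifier are returned as None.
    """
    match = LOCALE_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a valid locale identifier: {identifier!r}")
    return match.groupdict()


parts = parse_locale("en_GB.UTF-8")
print(parts["language"], parts["territory"], parts["codeset"], parts["modifier"])
```

For en_GB.UTF-8 the language is en, the territory is GB, the codeset is UTF-8 and there is no modifier.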

Translation: Rendering the text of the application in another language. The result need not be an exact word-for-word translation, as long as it conveys the correct message.
Localisation (aka L10n): Localisation combines translation with adaptation to the conventions of the relevant locale.
Internationalisation (also known as i18n): The term ‘internationalisation’ refers to the process of building a product that is locale-neutral. It means that the application can be adapted to target languages and countries without making changes to the core of the product.
Globalisation: The combination of localisation and internationalisation. It commonly refers to the process of transforming a locale-specific product into one that supports all locales.
Character sets/encodings: ‘Character set’ is often used to describe a digital representation of text. A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a sequence of natural numbers, or octets, in order to facilitate the transmission of data. The following are the popular character encodings…
8-bit character encodings and multibyte encodings: These include the ISO 8859 family, such as Latin-1 (ISO-8859-1), Latin-2 (ISO-8859-2) and ISO-8859-3. Together, the members of the family support English, Danish, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish, Czech, Hungarian, Polish, Slovak, Cyrillic-script languages, Arabic, Greek, Hebrew, Turkish, the Baltic languages and many others. The term ‘multibyte’ applies when an encoding does not have a one-byte-per-character mapping.
Unicode: This is by far the most complete character set ever produced. The figure quoted at the time of writing is 96,447 characters, covering most of the world’s writing systems, and the count grows with each version of the standard. Unicode comes in several encoding forms, differentiated mainly by the number of bytes used per character. Popular ones are UTF-8, UCS-2 and UTF-16. UTF-8 is a variable-length encoding using 1-4 bytes per character; its primary applications are XML, XHTML and various other text file formats. UCS-2 is a fixed-width 2-byte encoding that served as the native encoding on early NT-based systems. UTF-16 uses 16-bit code units and represents characters beyond that range as 4-byte surrogate pairs; these cover rarer Asian language characters, mathematical symbols, esoteric scripts etc.
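
The difference between these encodings is easiest to see by counting bytes. The following is a small sketch, using only Python’s built-in codecs, that encodes four sample characters as Latin-1, UTF-8 and UTF-16. The helper name encoded_length and the choice of sample characters are assumptions made for illustration.

```python
# Sample characters: ASCII letter, accented Latin letter, euro sign,
# and a musical symbol that lies outside the 16-bit range.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001D11E"]


def encoded_length(ch, encoding):
    """Return the number of bytes ch occupies in encoding, or None
    if the encoding cannot represent the character at all."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in SAMPLES:
    print(
        f"U+{ord(ch):04X}",
        "latin-1:", encoded_length(ch, "latin-1"),
        "utf-8:", encoded_length(ch, "utf-8"),
        "utf-16:", encoded_length(ch, "utf-16-le"),  # -le avoids the 2-byte BOM
    )
```

Latin-1 cannot represent the euro sign or the musical symbol at all, UTF-8 needs between one and four bytes depending on the character, and UTF-16 needs two bytes for the first three characters and a four-byte surrogate pair for the last.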

    • http://www.beli.ws/blog Emil Beli

      This is the right way to do it. However, it is very, very daunting. Depending on the size of the application, and especially if it is small, it may be viable to use other methods.
      My personal favorite is to write a translation function which receives the ID of a message and returns the message in the appropriate language. If I say that language 1 is English, 2 is French and 3 is Portuguese, I build the lookup ID as:
      LANGID * 100000 + MSGID

      so that translate(1) gives back one of the following values, depending on the language:
      100001 = “English”
      200001 = “Français”
      300001 = “Português”
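
The scheme described in this comment could be sketched roughly as follows. This is one possible reading, not the commenter’s actual code: the names MESSAGES, LANG_STRIDE and the lang_id parameter are assumptions, and the fallback to English for missing entries is an added design choice.

```python
# Language IDs as given in the comment: 1 English, 2 French, 3 Portuguese.
LANG_ENGLISH, LANG_FRENCH, LANG_PORTUGUESE = 1, 2, 3

# Multiplier from the comment: key = LANGID * 100000 + MSGID.
LANG_STRIDE = 100000

# Message table keyed by the combined ID (hypothetical contents).
MESSAGES = {
    100001: "English",
    200001: "Français",
    300001: "Português",
}


def translate(msg_id, lang_id=LANG_ENGLISH):
    """Look up message msg_id in language lang_id.

    Assumption: if the message is missing for that language, fall back
    to the English entry; a KeyError is raised if that is missing too.
    """
    key = lang_id * LANG_STRIDE + msg_id
    if key in MESSAGES:
        return MESSAGES[key]
    return MESSAGES[LANG_ENGLISH * LANG_STRIDE + msg_id]


french_name = translate(1, LANG_FRENCH)
default_name = translate(1)
```

Note that with this layout each language can hold at most 99,999 messages before its IDs collide with those of the next language.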