GNU gettext

gawk uses GNU gettext to provide its internationalization
features.
The facilities in GNU gettext focus on messages: strings printed
by a program, either directly or via formatting with printf or
sprintf().[86]
When using GNU gettext, each application has its own
text domain.  This is a unique name, such as ‘kpilot’ or ‘gawk’,
that identifies the application.
A complete application may have multiple components—programs written
in C or C++, as well as scripts written in sh or awk.
All of the components use the same text domain.
To make the discussion concrete, assume we’re writing an application
named guide.  Internationalization consists of the
following steps, in this order:
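1. The programmer reviews the source for all of guide’s components
   and marks each string that is a candidate for translation.
   For example, "`-F': option required" is a good candidate for translation.
   A table with strings of option names is not (e.g., gawk’s
   --profile option should remain the same, no matter what the local
   language).
2. The programmer indicates the application’s text domain
   ("guide") to the gettext library,
   by calling the textdomain() function.
3. Messages from the application are extracted from the source code and
   collected into a portable object template file (guide.pot),
   which lists the strings and their translations.
   The translations are initially empty.
   The original (usually English) messages serve as the key for
   lookup of the translations.
4. For each language with a translator, guide.pot
   is copied to a portable object file (.po)
   and translations are created and shipped with the application.
   For example, there might be a fr.po for a French translation.
5. Each language’s .po file is converted into a binary
   message object (.gmo) file.
   A message object file contains the original messages and their
   translations in a binary format that allows fast lookup of translations
   at runtime.
6. When guide is built and installed, the binary translation files
   are installed in a standard place.
7. For testing and development, it is possible to tell gettext
   to use .gmo files in a different directory than the standard
   one by using the bindtextdomain() function.
8. At runtime, guide looks up each string via a call
   to gettext().  The returned string is the translated string
   if available, or the original string if not.
9. If necessary, it is possible to access messages from a different
   text domain than the one belonging to the application, without
   having to switch the application’s default text domain back and forth.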
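The runtime half of these steps (selecting the text domain, optionally
redirecting the catalog directory, and looking up a string) can be
sketched in C as follows.  This is only an illustration under stated
assumptions: the helper names guide_init_i18n() and guide_message() are
hypothetical, and the code assumes a system whose C library provides
<libintl.h> (as glibc does).

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: adopt the user's locale, optionally point
 * gettext at a nonstandard catalog directory, and select the
 * "guide" text domain.  Returns the name of the active domain,
 * or NULL if textdomain() fails.
 */
const char *guide_init_i18n(const char *localedir)
{
    /* Take the locale from the environment (LC_ALL, LC_MESSAGES, LANG). */
    setlocale(LC_ALL, "");

    /* Only needed when the .gmo files are not in the standard place. */
    if (localedir != NULL)
        bindtextdomain("guide", localedir);

    return textdomain("guide");
}

/*
 * Hypothetical helper: look up one message at runtime.  gettext()
 * returns the translation if one is found, or the original string
 * if not.
 */
const char *guide_message(void)
{
    return gettext("Don't Panic!\n");
}
```

A program would typically call guide_init_i18n() once at startup,
passing NULL to use the standard catalog location, and then print
messages with printf("%s", guide_message()).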
In C (or C++), the string marking and dynamic translation lookup
are accomplished by wrapping each string in a call to gettext():
printf("%s", gettext("Don't Panic!\n"));
The tools that extract messages from source code pull out all
strings enclosed in calls to gettext().
The GNU gettext developers, recognizing that typing
‘gettext(…)’ over and over again is both painful and ugly to look
at, use the macro ‘_’ (an underscore) to make things easier:
/* In the standard header file: */
#define _(str) gettext(str)
/* In the program text: */
printf("%s", _("Don't Panic!\n"));
This reduces the typing overhead to just three extra characters per string and is considerably easier to read as well.
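The same macro can wrap a format string, so that messages built with
printf or sprintf() are looked up before the values are substituted.
The following sketch assumes <libintl.h> is available; the function
name format_count() and the message text are hypothetical and serve
only as an illustration.

```c
#include <libintl.h>
#include <stdio.h>

#define _(str) gettext(str)

/*
 * Hypothetical helper: format a translatable message containing a
 * runtime value.  The format string is the lookup key, so a
 * translator sees "%d messages found\n" and can reorder or reword
 * it as the target language requires.
 */
int format_count(char *buf, size_t size, int count)
{
    return snprintf(buf, size, _("%d messages found\n"), count);
}
```

When no translation is available, gettext() returns the original
format string, and the output is the untranslated English message.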
There are locale categories
for different types of locale-related information.
The defined locale categories that gettext knows about are:
LC_MESSAGES
    Text messages.  This is the default category for gettext
    operations, but it is possible to supply a different one explicitly,
    if necessary.  (It is almost never necessary to supply a different category.)
LC_COLLATE
    Text-collation information (i.e., how different characters and/or
    groups of characters sort in a given language).
LC_CTYPE
    Character-type information (alphabetic, digit, upper- or lowercase, and
    so on) as well as character encoding.
    This information is accessed via the
    POSIX character classes in regular expressions,
    such as /[[:alnum:]]/
    (see Bracket Expressions).
LC_MONETARY
    Monetary information, such as the currency symbol, and whether the
    symbol goes before or after a number.
LC_NUMERIC
    Numeric information, such as which characters to use for the decimal
    point and the thousands separator.[87]
LC_TIME
    Time- and date-related information, such as 12- or 24-hour clock,
    month printed before or after the day in a date, local month
    abbreviations, and so on.
LC_ALL
    All of the above.  (Not too useful in the context of gettext.)
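As a rough illustration of how these categories surface in C, the
sketch below queries LC_NUMERIC information with the standard
localeconv() function and performs a lookup with an explicitly
supplied category using dcgettext().  The helper names and the choice
of LC_TIME are hypothetical, and the code assumes <libintl.h> is
available.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: switch LC_NUMERIC to the named locale and
 * report its decimal-point string.  Returns NULL if the locale is
 * not installed.  The returned pointer may be invalidated by later
 * calls to setlocale() or localeconv().
 */
const char *decimal_point_for(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}

/*
 * Hypothetical helper: look up a message in the "guide" domain
 * under LC_TIME instead of the default LC_MESSAGES.  As noted
 * above, supplying a category explicitly is almost never necessary.
 */
const char *lookup_time_message(const char *msgid)
{
    return dcgettext("guide", msgid, LC_TIME);
}
```

In the "C" locale, decimal_point_for("C") yields a period.  Other
locale names depend on what is installed on the system.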
NOTE: As described in Locales, environment variables with the same name as the locale categories (LC_CTYPE, LC_ALL, etc.) influence gawk’s behavior (and that of other utilities).
Normally, these variables also affect how the gettext library finds translations. However, the LANGUAGE environment variable overrides the LC_xxx variables. Many GNU/Linux systems may define this variable without your knowledge, causing gawk to not find the correct translations. If this happens to you, look to see if LANGUAGE is defined, and if so, use the shell’s unset command to remove it.
For testing translations of gawk itself, you can set
the GAWK_LOCALE_DIR environment variable. See the documentation
for the C bindtextdomain() function and also see
Other Environment Variables.
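A program can offer a similar override by consulting an environment
variable before calling bindtextdomain().  The sketch below shows the
general idea only; it is not gawk’s actual code, and the variable name
GUIDE_LOCALE_DIR and the function name are hypothetical.

```c
#include <libintl.h>
#include <stdlib.h>

/*
 * Hypothetical helper: bind the "guide" text domain to the
 * directory named by GUIDE_LOCALE_DIR if that variable is set and
 * nonempty, or to default_dir otherwise.  Returns the directory
 * now in effect, or NULL if bindtextdomain() fails.
 */
const char *guide_bind_locale_dir(const char *default_dir)
{
    const char *dir = getenv("GUIDE_LOCALE_DIR");  /* assumed name */

    if (dir == NULL || *dir == '\0')
        dir = default_dir;

    return bindtextdomain("guide", dir);
}
```

With a helper like this, a translator could test a freshly built .gmo
file by setting the variable to a scratch directory, without
installing anything in the standard place.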
[86] For some operating systems, the gawk
port doesn’t support GNU gettext.
Therefore, these features are not available
if you are using one of those operating systems. Sorry.
[87] Americans use a comma to separate every three digits and a period for the decimal point, while many Europeans do exactly the opposite: 1,234.56 versus 1.234,56.