Weblate language codes don't line up with Accept-Language header requirements
When pulling translations from Weblate, we get a range of different language codes. Specifically, this is the list of languages supported by Weblate:
ach af ak sq am anp ar ar_DZ ar_MA an es_AR hy as ast de_AT ay az ba eu bar be be_Latn bn bn_BD bn_IN brx bs bs_Cyrl bs_Latn br bg my ca km ch chr hne cgg zh zh_HK zh_Hans zh_Hant ksh kw cr hr cs da doi nl nl_BE dz en en_AU en_CA en_IE en_PH en_ZA en_GB en_US eo et fo fil fi frp fr fr_CA fy fur ff gd gl ka de el kl gu gun ht ha haw he hi hu is ig id ia ga it ja jv kab kn ks csb kk rw tlh tlh-qaak kok ko ku ckb ky lo la lv li ln lt jbo nds lb mk mai mg ms ml mt mnk mi arn mr mni mn me mfe nqo nah nap ne se no nb_NO nb nn ny oc or oj os pap nso fa pms pr pl pt pt_BR pt_PT pa ps ro rm ru sa sat sc sco sr sr_Cyrl sr_Latn sh sn szl sd si sk sl so son st es es_US es_MX es_PR su sw sv de_CH tl tg ta tt te th bo ti ts tr tk ug uk hsb ur ur_PK uz uz_Latn ca@valencia ve vec vi wa cy vls wo sah yi yo yue zu
The requirements for the Accept-Language header, used for content negotiation and to automatically show the correct language to the user, are documented in RFC 7231 Section 5.3.5 (which links to some other RFCs). Notably, these tend to be a language code (e.g. en or de) optionally followed by a - and then a region or script (e.g. zh-Hant or zh-Hans, or perhaps zh-CN).
Weblate is thus outputting languages such as pt_BR whereas Apache/HTTP expects pt-br. Technically, according to RFC 4647 Section 2.1, these can be upper case or lower case (as shown by the use of 1*8ALPHA, where ALPHA is defined as A-Z / a-z in RFC 4234 Appendix B.1).
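To make the case-insensitivity point concrete, here is a minimal sketch of the "basic filtering" comparison described in RFC 4647 Section 3.3.1. The function name basic_filter_matches is made up for illustration, and this only models what the RFC describes, not what Apache actually does internally.

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """Return True if language_range matches tag under RFC 4647 basic filtering.

    Hypothetical helper for illustration only; it does not reflect
    Apache's mod_negotiation implementation.
    """
    # The wildcard range matches every tag.
    if language_range == "*":
        return True
    # The RFC specifies a case-insensitive comparison.
    r = language_range.lower()
    t = tag.lower()
    # Either an exact match, or the range is a prefix of the tag
    # and the next character in the tag is a hyphen.
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    print(basic_filter_matches("pt-BR", "pt-br"))
    print(basic_filter_matches("pt", "pt-BR"))
    print(basic_filter_matches("pt-BR", "pt"))
```

Under this reading of the RFC, pt-BR and pt-br should be treated as the same tag.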
Weirder still, when I experiment with manually changing pt_BR to pt-BR in my Apache webroot and config:
- .../webroot/index.html.pt-BR -> pt_BR/index.html
- .../webroot/index.html typemap, change pt_BR with pt-BR
- Apache2 config AliasMatch and SetEnvIf directives are changed to set prefer-language based on the URL, where the URL starts with /pt-BR/
This works fine, but only if the Apache2 config and the URL are /pt-br/ (i.e. lower case). If they are upper case (e.g. pt-BR), then it fails to negotiate the language correctly.
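If the workaround ends up being to rename things on our side, the manual change above could be automated with something like the following rough sketch. The function name weblate_to_language_tag is hypothetical, and treating the @ modifier (as in ca@valencia) like another subtag separator is an assumption on my part, not something Weblate or the RFCs prescribe.

```python
def weblate_to_language_tag(code: str, lowercase: bool = True) -> str:
    """Convert a Weblate-style code (e.g. pt_BR) to a hyphenated language tag.

    Hypothetical helper for illustration only.
    """
    # Weblate separates subtags with "_"; language tags use "-".
    tag = code.replace("_", "-")
    # Assumption: treat a gettext-style "@modifier" as one more subtag,
    # so ca@valencia becomes ca-valencia.
    tag = tag.replace("@", "-")
    # Lower-casing is optional per the RFCs, but it is the form that
    # worked in the Apache experiment described above.
    return tag.lower() if lowercase else tag


if __name__ == "__main__":
    for code in ["pt_BR", "zh_Hant", "ca@valencia", "en"]:
        print(code, "->", weblate_to_language_tag(code))
```

Passing lowercase=False keeps the conventional casing (pt-BR, zh-Hant) in case the upper-case problem turns out to be a config mistake rather than Apache behaviour.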
@eighthave: Do you have experience with different translation systems (such as Weblate/Translate toolkit) specifying different language codes compared to what the tooling you are using expects?