We should start with either Spanish or Brazilian Portuguese, as we have many people at GitLab who are fluent in these languages.
Who does the translating?
Members of the community have been very active contributing translations, and our partner will also help with the initial translation (they have a team available for this).
Where do we store the translations?
?
How do we retrieve the translations?
?
Original issue
Translation Section Tasks
1: Set up the en locale to have a special dates and times section
3: Determine all Rails validation error message keys that need to be created
3: Set up special sections to support built-in Rails validation error messages
2: Determine all Rails model keys that need to be created
2: Set up special sections to support Rails model and model attribute display translations
3: Update all text and dates and times in app/views
3: Update all text and dates and times in app/controllers
2: Update all text and dates and times in app/finders
3: Update all text and dates and times in app/helpers
3: Update all text and dates and times in app/mailers
3: Update all text and dates and times in app/models
2: Update all text and dates and times in app/services
1: Update all text and dates and times in app/uploaders
1: Update all text and dates and times in app/workers
2: Examine extent of use of text in lib
3: Update all text and dates and times in lib
3: Need to check JavaScript for text output
2: Need a plan for static files in public, such as 404
Additional Tasks/Concerns
3: Need to check what is stored in databases that is generated rather than user-supplied; this text should be translated before being saved
3: Manual QA to ensure all strings are encoded for translation; this task occurs mostly after the main effort is completed and is generally a manual process of using the site in a second language and making sure you don't see any English
Questions:
Do we foresee translating any upstream gems that are referenced within the application? If yes, which ones, and do we have a strategy for internationalizing those gems?
We have not included translation of currencies in our estimation. Are there any points in the application where currency translation might be needed?
Do we intend to run automated tests in non-English languages? In that case, a large number of strings in the RSpec & Spinach tests would need to be changed so that they are referenced through the i18n translation system. Any take on that?
It would be great to have someone do a detailed review of our task list, as we intend to use it in moving ahead with the internationalization effort. And if we have missed any potential action items in our task list, or over/underestimated any, please do bring it up.
Edited by James Ramsay
Sorry, that comment was misleading. I'm afraid I have not enough experience with either GitLab code or translation systems to provide any help implementing this feature. What I was trying to offer is help in the actual translation process (or maybe even at any of the translation section tasks if possible).
I have mostly worked with gettext which you probably know. It is also available for ruby. Recently, I have also used angular-translate which supports asynchronous loading of external JSON files. I am not sure if such an approach is applicable to GitLab.
Additional Tasks/Concerns
I would not store translated strings in the database if it is avoidable. Storing text keys instead has two advantages in my opinion:
developers can debug more easily because they do not need to look up strings in foreign languages
if users switch their language or transfer content to other users (with a different language), localization is easier
I have a lot of experience with Rails' i18n gem, if you need any help setting things up. I'd also recommend i18n-tasks for automating the addition of new strings across multiple languages.
Changing every string in the entire codebase over at once isn't going to work; it's going to take at least two months of work to get the entire codebase integrated with i18n. While it is a bit inconsistent to have some text as "normal" strings and others as i18n strings, this doesn't harm the end user, and the site will work exactly the same as before. It wouldn't prevent a release from being shipped.
We're going to have to make some decisions on how we want to organize and handle strings; this is going to be a fairly complicated process. Some things to note:
In views, we can use the i18n helper like so: t(".string_name"). The dot at the start of the string indicates that the hierarchy will be implied from the view's location (e.g. in dashboard/projects/index.html.haml this would become {lang}.dashboard.projects.index.string_name).
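As a sketch of how that lazy lookup resolves (pure Ruby, no Rails; the helper name `resolve_i18n_key` is made up for illustration):

```ruby
# Resolve a lazy i18n key (".string_name") against a view path,
# mimicking how Rails expands the leading dot from the view's location.
def resolve_i18n_key(view_path, key, locale: "en")
  return "#{locale}.#{key}" unless key.start_with?(".")

  # "dashboard/projects/index.html.haml" -> "dashboard.projects.index"
  scope = view_path.sub(/\.html\.\w+\z/, "").tr("/", ".")
  "#{locale}.#{scope}#{key}"
end

puts resolve_i18n_key("dashboard/projects/index.html.haml", ".string_name")
# "en.dashboard.projects.index.string_name"
```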
A cut-off point for when we go from i18n strings that represent their respective text to a simpler name. For example, we would want "Preferences" and "Application theme" to be represented by preferences and application_theme respectively. However, we wouldn't want a longer piece of text, such as "Choose between fixed (max. 1200px) and fluid (100%) application layout." to be represented in that same way, otherwise the i18n string would be particularly long. I'd suggest a limit of three words before switching to a more generic string name, but this is up for others to dispute. It's less readable in the HTML/HAML, but certainly more maintainable than having to edit the YML string every time a minor wording change is wanted.
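The three-word cutoff could be sketched like this (hypothetical helper; the limit reflects the suggestion above, not any Rails convention):

```ruby
# Derive an i18n key name from English copy: snake_case the text when it
# is short, fall back to a caller-supplied generic name when it is long.
WORD_LIMIT = 3

def key_for(text, fallback:)
  words = text.downcase.scan(/[a-z0-9]+/)
  return fallback if words.size > WORD_LIMIT
  words.join("_")
end

key_for("Application theme", fallback: "application_theme_label")
# => "application_theme"
key_for("Choose between fixed (max. 1200px) and fluid (100%) application layout.",
        fallback: "layout_width_help")
# => "layout_width_help"
```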
We'll need thorough documentation of our preferred usage of the i18n library. Among other guidelines, I would strongly recommend using snake_case for i18n strings, and that we split up the locales directory with each language having its own directory. Within said directories, each view namespace would get a separate YML file.
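As a sketch of that layout (hypothetical file paths and keys, following the per-language, per-namespace split described above):

```yaml
# config/locales/en/dashboard.yml (hypothetical)
en:
  dashboard:
    projects:
      index:
        personal_projects: "Personal projects"

# config/locales/es/dashboard.yml would mirror the same keys under "es:"
```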
Purely static pages should probably provide different documents for different locales, e.g. static_page.en.html.haml and static_page.es.html.haml
I think we should – for the time being – focus exclusively on supporting EFIGS (English, French, Italian, German, Spanish), and not worry about non-Latin or RTL languages. If we want to try to make sure we're prepared for potential future changes, that's fine, but it shouldn't be a blocker to this being added.
Actually translating the interface and allowing users to set language preferences should be considered a "Step 2". Changing all the strings to allow for translation is necessary before we can actually translate anything.
To maintain consistency between all locale files, and to find missing/unused translations, we should use the i18n-tasks gem. I've used it extensively in other projects and it's the best gem available for i18n maintenance.
We may also want to consider adding config.action_view.raise_on_missing_translations = true to the development and test environments. This would ensure we don't have any missing translation strings by failing the build and/or throwing errors for developers when loading a page. Whether or not this is preferred behavior is somewhat subjective, so I'll let others chime in with their opinions.
String size differing between languages. For example, German is kind of notorious for this. Where "Merge Requests" fits in the sidebar in English, in another language it may be 20+ characters long.
Airbnb actually handles testing this exact case with a custom locale. The intent is to make sure that strings in other languages won't escape their container or wrap where they aren't supposed to if they're longer than their English counterparts.
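A minimal sketch of that pseudo-locale idea (pure Ruby; the bracket markers and padding ratio are my own assumptions, not Airbnb's actual implementation):

```ruby
# Generate a "pseudo locale" string that is deliberately longer than the
# English source, to smoke-test layouts against long translations.
def pseudolocalize(text, ratio: 1.5)
  extra = (text.length * (ratio - 1)).ceil
  "[#{text}#{'~' * extra}]"
end

pseudolocalize("Merge Requests")
# => "[Merge Requests~~~~~~~]"
```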
In-context translation. This is a goal to think about for later, but being able to translate the site in-context (e.g. load up GitLab in the browser and edit the strings around the site) leads to much better, more accurate translations because it's obvious what the string is supposed to mean when viewing the site, but not as much in isolation within a long list of strings. I'm not aware of a good open source solution to this problem, although Airbnb has written about it a bit on their blog.
I took part in i18n support for the CodeCombat.com project some time ago.
We had a big problem keeping translated strings up to date in the non-en files.
I'll try to describe the problem.
You add a key-value pair in your en.yaml file
Translators translate this key-value pair in de.yaml (for instance)
You edit your key-value pair in en.yaml: a simple rewording at best, a change of meaning at worst
You now have different meanings of the sentences in en and de
...
Anti-profit :)
We came up with a solution:
When you edit an en line, you add a tag ({change} in our case)
A maintainer periodically runs a script that:
adds new strings to translate to all non-en files
marks no-longer-current translated strings with {change}
removes {change} from the en file
So translators can see which strings must be re-translated.
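The maintainer script described above could look roughly like this (a pure-Ruby sketch over flat key-value hashes; the {change} tag comes from the workflow above, everything else is assumed):

```ruby
CHANGE_TAG = "{change}"

# Sync a target locale against the English source: copy over brand-new
# keys and mark entries whose English text was edited (tagged {change}).
def sync_locale(en, target)
  out = {}
  en.each do |key, en_text|
    if !target.key?(key)
      out[key] = CHANGE_TAG + en_text     # new string, needs translation
    elsif en_text.start_with?(CHANGE_TAG)
      out[key] = CHANGE_TAG + target[key] # English changed, re-translate
    else
      out[key] = target[key]              # translation still current
    end
  end
  out
end

# Strip the tags from the English file once all locales are marked.
def clean_english(en)
  en.transform_values { |text| text.delete_prefix(CHANGE_TAG) }
end
```

After a run, any de.yaml entry still prefixed with {change} is one the translators need to revisit.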
@sytses already did that a year ago. It doesn't support YAML files.
It is surprisingly difficult to find translation tools that support the YAML format.
If I remember correctly, I was only able to find a few paid services like Transifex.
That's why I suggested using another gem that supports .po/.pot files instead.
Quote from the gitlab admin account:
Right now it looks like we'll go with .po/.pot files and gettext/fast_gettext.
@sytses @haynes would Shuttle by Square work? Per the README, it supports Rails' i18n format. The license is Apache 2.0, and it wouldn't technically integrate directly into the GitLab Rails app itself, so I don't think that should be a problem?
@connorshea the README is interesting. I guess someone would have to set it up and test it a bit.
The workflows described in the README seem overly complex to me, given that it just needs to monitor one translation file.
But maybe only the documentation is written in a complicated way and it's actually easy.
Also, while testing it, it would be great to provide a few screenshots of the UI.
It needs to be simple enough that even the least technical user would be able to translate a string.
@haynes in my experience, you'll definitely want more than one translation file. Otherwise you end up with a YML file that's a few thousand lines long, which is a problem when developers have to add new strings to the locale file. Not to mention potential merge conflicts. Dividing it up by the directories within app/views is what I'd suggest.
@connorshea what I meant is: we don't need it to scan every commit for possible translatable strings. It's enough if it monitors the English translation files.
@sytses I can't find information on whether or not it supports the Rails i18n format, and I'm also not seeing any screenshots of the actual interface. Do you know if they have either of those in the docs?
@sytses it doesn't look like they support Rails' i18n format, based on their format list. We could use GetText as suggested above, but I'm not familiar with that library, so I can't comment on whether or not it will do everything we'll need. I'm also not sure how well it integrates into Rails.
I was also planning on opening an MR that adds i18n-tasks to the GitLab development environment, so we may want another solution for synchronizing strings between languages, preventing unused strings, duplicates, etc.
@haynes do you have any experience implementing GetText in a Rails app? I'm not opposed to using it over i18n, but I'm unfamiliar with it so I don't know what its limitations/quirks are.
@connorshea no. I just found a lot more translation tools support the .po format.
But if Shuttle by square works fine, we can go with i18n as well.
I guess we'd need a test setup to verify this.
Right now I think we should use gettext for performance reasons, and I'm interested in whether Shuttle works. If Shuttle works, we can consider bundling it in GitLab EE so you have easy translations for your apps out of the box.
@fabiobeneditto I'm against implementing translations with JavaScript.
We already had several issues where people complained that it wasn't possible to do some basic stuff in GitLab with JavaScript disabled
(e.g. deleting your own account).
If we load all translations with JavaScript, I think we'll get a lot more of those issues.
Also, on a first quick Google search, I didn't see many translation tools for l20n. (This doesn't necessarily mean anything; maybe I have to search for different terms.)
@fabiobeneditto No need to apologise. We appreciate every suggestion.
I just tried to list why I personally don't think l20n would fit our needs.
Maybe I used some bad wording^^
To me, it looks like they put a lot of brainpower into the spec to solve problems other translation systems have. For example, gettext lacks a built-in way to reuse partial translations.
Another advantage over gettext would be that l20n allows different languages to have different grammar. Sounds reasonable, doesn't it?
Also, please note that using the l20n format doesn't necessarily require using JavaScript. There is already a Python implementation. Maybe that can be adapted for Ruby?
Using FastGettext to translate a Rails application: this is from 2012, so I'm not sure if much has changed since then, but the FastGettext implementation is certainly a lot faster than GetText or Rails' Simple i18n backend.
I still can't find a good guide that actually shows the syntax for GetText, and how they handle the different locales, which is really frustrating.
@connorshea the only reason I prefer gettext over i18n is because there are a lot more translation tools available that can handle .po files.
regarding the syntax:
in i18n: t('translations')
in gettext: _('translations')
Also, it seems that you don't need to follow a specific syntax for the keys. (I think this would be better, but it isn't needed.)
See here for an example application: https://github.com/grosser/gettext_i18n_rails_example
@connorshea regarding Transifex:
I suggested that in the past, but their open source plan doesn't apply to the EE edition.
Afaik GitLab talked with Transifex, but deemed it too expensive to use.
But maybe something has changed :)
Please consider right-to-left languages like Farsi (Persian), Arabic, Hebrew, Pashto, etc. These languages need the direction of the page layout and the contents to change.
It is not so complicated, though.
It's an absolute must-have feature these days to be able to localize the user interface in the local language (Russian in my case), given that GitLab aims to be a single tool for SDLC management. There are not only developers any more, but end users, managers, testers, UAT people and many others who are involved in the software change process.
And last but not least, it would be the killer feature against many other Git repository management systems, which means more GitLab users and clients.
GitLab Partner in Russia
@malessio In China, the state-owned enterprises and government agencies really need the Chinese version. Without it, they would have problems buying it.
As for the private enterprises, if the Chinese version is free, they would prefer it. All in all, it is easier for Chinese people to operate software in their native language.
What are the expectations of Chinese consumers with regards to localization? We may not be able to translate every string in time for the monthly release, would seeing English strings occasionally in the interface be problematic?
Would it be necessary for the documentation to be translated fully into Chinese?
Is there an expectation that we would have both Traditional Chinese and Simplified Chinese support?
We may not be able to translate every string in time for the monthly release, would seeing English strings occasionally in the interface be problematic?
@connorshea For a German translation, I don't think an incomplete translation would keep people from using it but still it looks unprofessional. I would separate preparing the translation process from delivering the translations. The initial translation could have a bit more time and ignore the release cycle until it has reached > 90%.
@winniehell We probably wouldn't enable a language until it's at least 95% translated, due to how many strings we have. The problem I'm describing is that, after we've enabled a language, we wouldn't be able to update that language with new strings in time for the release on the 22nd.
There's also the mess that will be managing localizations for CE as well as EE. Merge conflict hell D:
@connorshea I wonder if it would be possible to have only one pool of texts containing both CE and EE. That would mean CE is delivered with unused texts, but it would make the whole process a lot easier (merge hell before translating).
@winniehell the problem is that everything gets messed up when this happens:
Add a new EE-only feature
The strings are added to CE
Wait for CE => EE merge
Then merge the new feature, I guess?
Git was definitely not made to handle this kind of workflow :P
I suppose the alternative would be to implement the new feature without localization support, then add the strings to CE, then merge it into EE and replace the static strings with localizable versions. Fun.
replace the static strings with localizable versions
@connorshea Some translation systems have a fallback language attached to the translation key. For gettext often even the English text is used as translation key. So this step would not be necessary. It would instead be:
implement EE feature with placeholders (and fallback texts in English)
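A tiny sketch of that key-as-fallback behaviour (pure Ruby; the underscore helper mimics gettext's convention, and the catalog entry is hypothetical):

```ruby
# gettext-style lookup: the English text is the key, and untranslated
# strings simply fall back to the key itself.
CATALOG = {
  "de" => { "Settings" => "Einstellungen" }  # hypothetical entry
}

def _(msgid, locale: "en")
  CATALOG.dig(locale, msgid) || msgid
end

_("Settings", locale: "de")          # => "Einstellungen"
_("Brand-new EE feature text")       # no entry yet: shows the English text
```

This is why an EE feature merged before its strings are translated still renders sensible English text.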
@connorshea It's just about a few extra placeholders, can't be that hard, can it? If merging between the two projects is such a big problem, how about feature flags?
Another option would be to keep the translation processes of CE and EE independent but to automatically import translations from CE into EE. That would require that no CE string is translated in EE (hello merge hell again?)
Would it be necessary for the documentation to be translated fully into Chinese?
In my experience, the documentation is the single most important piece to be localised, and a localised UI is a hindrance if the docs have not been localised first.
Is there an expectation that we would have both Traditional Chinese and Simplified Chinese support?
Two very different markets.
In mainland China where Spoken Mandarin/Written Simplified Chinese is king, English skills are not common and localised product is necessary for any volume of sales. (this is where @SHCSINFO is located)
In Hong Kong/Macau (spoken Cantonese/written Traditional Chinese) and Taiwan (spoken Mandarin/written Traditional Chinese), English skills are very common, and the need for a localised product there changes from "necessary" to "nice".
For other Asian markets:
In Korea (written Hangul) a localised UI is not necessary, but documentation is.
In Japan, localised product (UI AND doc) is again necessary.
I can't speak for large enterprise in Taiwan, but we in the community are very used to English contents. That being said, translation for Traditional Chinese is still very good to have (not all people are good with English), and some people could even tolerate Simplified Chinese if there's no Traditional Chinese.
I've just created #19996 (closed) to discuss the translation of only the docs and to what extent this would help everyone involved. Please continue that discussion there.
@connorshea: that one "only" supports .po files, not YML files. This means that we'd have to use gettext/fast_gettext if we want to use this editor.
That's not a bad thing, just something to keep in the back of your mind.
Hi!
I want to contribute to the Spanish translation of GitLab. On Transifex I saw a team working on a Polish translation and another on a Chinese translation; is there any team working on a Spanish translation?
For us in Venezuela, it would be very interesting to use GitLab in our language, because it would be a great help for the many software projects that are migrating from GitHub to GitLab.
Unfortunately we don't currently have official translation support; we haven't begun moving the GitLab app to use localized strings, so all text in the app is currently "hard-coded" as English.
I've been interested in beginning this process, but due to lack of interest and necessity, as well as concerns that this would slow us down, it hasn't been a priority.
The projects you mention are unofficial and simply replace every English string with Chinese or Polish, which unfortunately doesn't allow us to support more than one language at a time.
Thanks @connorshea
I get it. But do you think it would be possible for us, on our own, as the GitLab community, to start our own translation into Spanish?
@petrizzo it would be possible, but it would be a ton of effort and frequently break between releases. If/when we implement proper translation support it'll likely require a huge effort to move the strings into our format.
Personally I wouldn't consider it worth the effort, but if your mind is set on it, it's definitely possible.
Anyway, I recommend looking at the Pontoon site, which includes a demo and gives you an impression of it. Localizing the GitLab UI on pontoon.gitlab.com or any other place would be awesome.
If you post the translation guide or make the strings available in some translation service, I can contribute and translate some part into Russian.
I translated GitLab into Chinese (up to 8.14.0.preview, without the help documentation); it's a really huge amount of work.
I hope this issue gets solved; it would be a big help!!
I think this issue is critical for GitLab's "Everyone can contribute" message. As long as there are no translations, not everyone can contribute; a lot of users are simply scared off by an English-only interface.
I translated GitLab into Chinese (up to 8.14.0.preview, without the help documentation); it's a really huge amount of work.
I hope this issue gets solved; it would be a big help!!
Hi!
I'm glad to read this. Which method did you use to do the translation?
@sytses we'll probably get a very long way by making contributing to repositories very easy, assuming i18n files are checked into git. We could expand that by providing helpers for those specific types of files. What did you have in mind?
On another note, I believe we're slowly getting more resilient to disruption by i18n, due to us now supporting multiple fonts and therefore different word lengths.
Translation would still be an immense effort with many downsides. I much prefer to build features that make this easy first, before actually translating GitLab itself.
I like Gettext a lot and used it successfully in a Rails project, the only downside was that the involved gems (gettext, fast_gettext, and gettext_i18n_rails) frequently took some time until they supported new Rails and Ruby versions (the latter is relevant because Gettext needs to be able to parse Ruby). This was 2-3 years ago so the situation probably has improved, at a glance it seems Ruby 2.3 and Rails 5.0 are currently supported.
Some other arguments for using Rails' builtin I18n support:
easier for contributors, most Rails developers are familiar with it but few have used Gettext
easier for third-party translations which already use Rails-i18n, e.g. the ones included in the devise and doorkeeper gems
@JobV I'm not sure, but let's have a look at the simplest thing to do. Mattermost's translation process includes https://docs.mattermost.com/developer/localization-process.html#translations-updates They had to make scripts to import new strings and generate MRs with updated translations. The minimum for us would be to ensure people don't have to do that work. But even better would be reusing GitLab project credentials for the translation app.
@JobV Pootle is GPL 3.0 and written in Python so it will be controversial to ship it as part of our Omnibus packages. I propose we ask Mattermost for their scripts and work with Pootle to make it very simple to link Pootle to a GitLab project (exporting strings and generating MRs).
[With my GNOME and Mozilla localizer cap on] Do take "in-place localization" into consideration too; something similar to Pontoon from Mozilla. I am not sure whether it is applicable to GitLab or not; I just wanted to make sure it gets mentioned somewhere here. My bad, it is already mentioned. Based on my localization experience, erroneous translations are caused mainly because the context is not clear, i.e. translators have no idea which page/frame a string will appear in, how much space will be available, etc.
@sytses I did not see any option to authenticate with OAuth2 or OpenID Connect in Pootle. Pontoon does support OAuth2, and I will have a closer look at whether it would be compatible with GitLab. I think authentication with GitLab credentials is essential if you think about shipping a translation system with GitLab. I would appreciate it if you could have a closer look at Pontoon to get an impression.
@toupeira Thanks for the details. I had a closer look at Pontoon, and they are also using django-allauth, as Pootle does.
Nevertheless, the in-place localization feature of Pontoon is awesome and, from my perspective, a must-have for web apps; otherwise you end up mailing screenshots all over, with details on context, because of wrong initial translations. What is your experience with managing translations across teams?
@bufferoverflow well, my experience is mostly the "mailing screenshots all over" style :) So I agree that in-place localization could be a big time-saver (and quality-improver) in the long run, even though it probably requires more effort to set it all up. But from the developer instructions it looks like it would integrate nicely with a gettext-based workflow.
I did a lot in this space at Skype, and a little bit at Gitter too.
My 2c:
i18n is a long and daunting task, if you try to bite it off all at once, you'll never get started. Taking a small portion of the UI and just translating it should be the first step. You'll learn what works and what doesn't.
I would personally avoid having translation management as a feature of GitLab itself. It should be part of the build/release process. If people want to edit their translation string files, fine - but building out scope to do this as a feature seems like wasted effort.
With this in mind, and given the merge implications mentioned above, this should just be part of GitLab; it's not an EE "feature".
If you externalise some (and ultimately, most, then all) of your strings, then you build translations into your release process. You can lean on the community, or external agencies to help by providing translated strings for new features/UI elements.
Speaking of UI... we found at Skype that Greek & German words were often significantly longer than English words. This actually impacts UX design quite considerably, as designers need to consider the variable length of a button and how that impacts its spacing, positioning, etc.
I can't comment on the technologies used, but impact on performance does need consideration - especially at GL.com scale.
I'm happy to help drive this forward and try to make a first step in an up-coming release.
I help manage translations for CiviCRM.org (a free software contact relationship management tool for non-profits). We support ~ 25 languages and have had, over the years, more than 1200 contributors. +1 on all of the above.
it's rather common that an organisation will contract an external agency to translate into their language. It helps to use a translation tool that is not geek-specific. We use Transifex.com, but Pootle is OK too. In-line translation also helps to fix the smaller details, once a translation has passed the 90% mark.
when choosing a tool to manage translation strings (Transifex/Pootle), make sure that there is an option to "freeze" strings that have been reviewed. For example, if you support in-line translation, anyone may try to change a string to fit their needs or regional language. Having different permissions levels makes it less risky and easier to deal with change requests. You will probably get a ton of requests to join translation teams, propose changes, etc, and it's not always easy to strike a balance between "accept anything" and "require manual intervention/review by admins who will ignore those notifications".
CiviCRM is written in PHP, and initially developers used a php lib instead of native gettext. It had a 20% performance hit, whereas native gettext has a much smaller performance hit. I would recommend Ruby's native gettext, because it's rock solid and well supported.
the CI servers of CiviCRM generate new translation (.mo) files daily, by pulling the strings from Transifex, converting them to gettext .mo, and then making them available on the web. We also do this for extensions/contribs. This way, people can work on a translation one day and update their production environments the day after, without having to deal with compiling/converting translation files. Most translators are very non-technical.
I don't want to armchair-quarterback something I cannot really contribute to on the code level, but if there is anything I can do with regard to process, I'm happy to help. Our local laws require that employees have the option to use software in the local language, so this is a blocker for wider adoption of GitLab in the co-op where I work (Quebec/Canada).
I really like how Pontoon will prevent a lot of the problems of localizations (breaking buttons and lack of context for translators). https://github.com/mozilla/pontoon
Thanks for the insights @mlutfy! Very much appreciated.
@mydigitalself You have some great points. I propose you lead the effort of doing a first step, but with very heavy buy-in from production, performance, engineering, UX and frontend (all of engineering basically).
Practically, let's try to settle on a strategy in this thread, but create separate issues for the first steps. As you advised @mydigitalself, it should be a small start. This also immediately implies that we add controls / configurations for translations, so we'll cover a whole lot of ground with just that. A page like cycle analytics might be nice because it's relatively low impact and is not very responsive.
Some high level goals:
minimize performance impact
minimize engineering overhead
easy to contribute translations for everyone in the community
ability to do this iteratively, for both the translation and the implementation
We need a list of what languages we want to support initially
Are we supporting any right to left languages?
A general rule of thumb I have used in the past is to treat the length of the words in English as between 1/2 and 2/3 of the total space needed for German, Dutch, and other long languages (like Greek).
Necessary line height/spacing is also impacted depending on the characters we need to support. Characters can be taller than the typical English characters.
Some languages have more complex characters which become less readable at smaller font sizes (can impact Chinese, etc.). I believe we are fine with our current type ramp, but just something to keep in mind based on languages we are supporting
Word choice (copy) is part of UX. When we get strings localized, it is sometimes helpful to add notes to the intent of the copy so localizers have a better shot at translating it appropriately
From a performance perspective we need to take care of two things:
Where do we store this data
How do we retrieve the data
Storing this in a DB might seem a natural solution, but it comes with extra performance overhead: for every request we have to fetch a few thousand translation rows/settings from the DB. The same applies to Redis, though Redis is usually a bit faster.
Retrieval wise I'd recommend to retrieve all data at once, instead of retrieving every translation whenever necessary. When using a database you definitely don't want to retrieve every translation one by one as this could result in thousands of extra queries per page.
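That retrieve-once approach can be sketched as a memoized store (pure Ruby; in a Rails app the built-in I18n backend effectively does this for file-based locales):

```ruby
# Load each locale's translations into memory once, instead of issuing a
# lookup query per string per request.
class TranslationStore
  def initialize(loader)
    @loader = loader  # e.g. reads YAML files, or one bulk DB query
    @cache  = {}
  end

  def translate(locale, key)
    table = (@cache[locale] ||= @loader.call(locale))  # one load per locale
    table.fetch(key, key)                              # fall back to the key
  end
end

loads = 0
store = TranslationStore.new(->(locale) { loads += 1; { "nav.mrs" => "Merge Requests" } })
store.translate("en", "nav.mrs")
store.translate("en", "nav.mrs")
loads  # the locale was fetched only once
```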
From our Slack conversation: IMHO we should start with the high-value localisations: Japanese, Simplified Chinese, Korean, in that order. They are worth the most money in terms of market share.
The only R2L languages that have economic significance are Hebrew and Arabic. They are also very hard, as the whole UI is switched, and typically English words will be embedded in L2R fashion. Fortunately, the economic impact of not doing R2L is small: English is widely used in Israel, and the market in the Arab world is too fragmented (what Pablo said about Spanish is an order of magnitude larger for Arabic).
Why not use the built-in I18n.t with locale files stored in config/locales/?
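For reference, the built-in mechanism mentioned here stores translations in YAML files under `config/locales/` and looks them up with `I18n.t`; a minimal sketch (keys and wording are illustrative, not actual GitLab keys):

```yaml
# config/locales/es.yml (illustrative example)
es:
  issues:
    new_issue: "Nueva incidencia"
```

A view would then render `I18n.t('issues.new_issue')` (or the `t` view helper) after setting `I18n.locale = :es`. Rails's default backend keeps these files in memory after loading them, which also speaks to the retrieval concern above.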
IMHO We should start with the high value localisations; Japanese, Simplified Chinese, Korean.
Absolutely agree we should include the most high-value localisations. However, doing just one language is the bulk of the cost, additional languages are just marginal increases in translation effort whereas the big effort is externalising the strings and the code infrastructure to support this.
Where I'm going might sound like a crazy proposal, but I'd propose a well-spoken language within GitLab as a company, or the language of whoever is assigned to work on this. That way we have a familiar environment to work with and can easily verify the translations are working correctly, and then we just add more languages to the locale files.
Required languages (when done) are: French, German, Spanish, Italian, Brazilian Portuguese, Korean, Japanese, Traditional Chinese, Simplified Chinese.
I updated the issue description.
I think either Spanish or Portuguese is the most commonly spoken language at GitLab from this list (where continental Portuguese speakers can also understand Brazilian Portuguese), so either of those would make a good start.
I would love to get involved with Traditional Chinese, especially the Taiwan variant. (Hong Kong uses Traditional Chinese but with different terms than Taiwan.)
@yorickpeterse I have to agree with @blackst0ne here: files are a nice place. We should just load these files into memory; I don't think that will be much of an issue, particularly since they just don't change.
@winniehell correct. Because there are almost no translation tools available for Ruby's YAML files.
But there are a lot of tools for gettext's .po files
@mydigitalself We have strings everywhere, in random Ruby classes, in HAML views, in JS files, and in some cases in the DB (like the "mentioned in issue #28489 (moved)" system note) just above this. I think the first step would be to integrate with gettext in a way that both the backend and frontend have access to it, which sounds like a mostly backend Platform task to me.
@DouweM My advice is to do what a Unix vendor I used to work for did:
get a script to convert all strings from "...." to _("...."), regardless of content, language or use.
add a suitable no-op function to any changed code, so that the code works again.
from now on have all your developers use _("...") instead of just double quotes.
Now the i18n team can decide at leisure whether to a) support a string using something like gettext, b) send the generated strings to the l10n teams, or c) revert to the former form, since it is a technical string (protocol, etc.).
Most importantly: developers write code in English, don't have to make up any labels, and don't slow down due to not understanding all the implications of the i18n.
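The no-op step described above could look something like this in Ruby (an illustrative sketch of the vendor's approach, not GitLab's actual code): until a real gettext backend is wired in, `_` simply returns its argument.

```ruby
# Temporary no-op translation function (illustrative sketch):
# _("...") behaves exactly like a bare string literal, so code that
# has been converted to the _("...") form keeps working while the
# i18n team decides on a real backend.
def _(msgid)
  msgid
end

puts _("Merge request")   # prints "Merge request", unchanged
```

Later, this definition can be swapped for a real gettext binding without touching any of the call sites.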
@JobV sure, will look at Pontoon & Pootle. The Pontoon demo looks great, but it would probably be quite a big undertaking to get that working with a dynamic web app vs a more static site.
We're currently recommending using a downloadable PO editor such as https://poedit.net/
Mike Bartlett changed title from Internationalization / add translations to META: Internationalization / add translations
I see some people have already contributed some translations, and I'd like to ask how I can do that too. I'm willing to translate to Bulgarian, and I have no problem doing technical git-stuff (I'm a programmer). However, what is the correct flow of translating? Has it been explained in another issue? If so, please post a link. If not, here are some questions:
Does translation happen off-line (POEdit, etc.) or in some online system? I see translations being done here: https://translate.zanata.org/project/view/GitLab . Is that the chosen/preferred system? If so, whom should I contact to add me to the project and add the Bulgarian language?
If Zanata is the chosen system (and I have no experience with this particular one), do the translations automatically go into the Git repository or does that happen manually? If so, what is the proper process?
If contributing is done manually, are there any extra steps/tools that need to be run? How? I see a comment above about generating an app.js file… how is that done? If I know that, I can start working on the Bulgarian translation.
@htve I have already registered. My username is the same as here — lyubomirv — and I would like to translate to Bulgarian (bg). Please add me in Zanata.
@htve I'm already registered too. My username is fabiobeneditto and I'd like to work with pt_BR (Brazilian Portuguese). Could you add me on Zanata for this?
Thanks for your effort :)
@lyubomirv @fabiobeneditto @alexandre.alencar
I have added you to the project and assigned each of you your own language.
PS: Please translate v9.2 first, then import the full translation into v9.3, or wait for the merged version. You can continue to translate the difference between v9.3 and v9.2 while waiting for the merged version.
Now that we have a common repository, maybe we should create a common "META" issue for community translation; the same goes for MRs. That way we'll only have to compile the translations once instead of multiple times.
On the other hand, why not add and compile new translations during CI instead of requiring them to be added manually? All we'd have to do is scan the /locales folder, run bundle exec rake gettext:add_language[$language] for every folder name in /locales, and finish with bundle exec rake gettext:compile.
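The CI step described above could be sketched like this (an illustrative sketch using the rake tasks named in this comment; here we only assemble and print the commands against a throwaway directory, since a real run needs the Rails app):

```shell
# Scan the locale folders and build the rake commands described above
# (gettext:add_language and gettext:compile). Directory names here are
# placeholders for whatever exists under locale/ in the repo.
tmp="$(mktemp -d)"
mkdir -p "$tmp/locale/bg" "$tmp/locale/pt_BR" "$tmp/locale/en"
cmds=""
for dir in "$tmp"/locale/*/; do
  lang="$(basename "$dir")"
  [ "$lang" = "en" ] && continue   # English is the source language, skip it
  cmds="$cmds bundle exec rake gettext:add_language[$lang];"
done
cmds="$cmds bundle exec rake gettext:compile"
echo "$cmds"
```

In a real `.gitlab-ci.yml` job the loop would run the commands directly instead of echoing them.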
P.S.: The 9.3 .pot file has been updated (+2 strings to translate, at least 1 translation removed)
Hi everyone, I'm the Founder/CEO (and original developer) of Transifex. I wanted to extend an invitation from our team to host GitLab's localization and share how excited we'd be to serve the GitLab community.
Even though I see that the testing/adoption of Zanata is underway, I thought it would be useful to reach out and try to be helpful.
To give a taste of what GitLab+Transifex looks like, I went ahead and uploaded the files in locale/ from the master branch to the Transifex GitLab organization created by @sytses a few years back. If people want to play around with the platform, @sytses is the right person to add them to the org. Here are two screenshots from the dashboard and the translation editor.
A couple of points related to i18n issues raised and how they can be addressed, mostly in the context of Transifex:
Merging translations: Merging PO files can be a pain in the butt, since a single translation may cause line numbers to change in the POT file, causing huge diffs and inevitable merge conflicts. Thus, it's worth considering not storing the PO files in the repo at all. The English file can be generated at build- or merge-time and be pushed to Transifex with tx push --source. Tx uses the source file as a template and updates all the translation files. When the translation files are needed, grab them with a tx pull --translations.
Workflow: The typical way of integrating with Tx is to use the command-line client (Python) to sync the files from a directory with Tx. The client can be run automatically from a CI system or directly from the Git repo. The next point is a bit obvious, but if there are any phrases stored in a DB, the API can be used to upload/download them.
Missing translations. In some file formats, using a file with empty translations causes the build to fail. Two solutions: One, export the files with the English phrase untranslated when there is no translation. Two, enable the use of machine translation for the empty phrases so that the user at least sees something in their language vs seeing English. Pros and cons with each approach...
Markdown for documentation: It's supported natively in Tx, so that you can upload .md files, just like PO files.
String sizing in languages like Greek, German:
You can specify a character limit for a phrase. In the Editor, translators will see this and will get a warning if they go over.
To test the UI and make it more adaptive to long phrases, developers can download a so-called "pseudo translation file" from Tx. It includes weird tall characters and appends characters to each phrase to make it longer. This way, they can test the UI for issues before they reach the translators.
In-context translation is supported using screenshots (docs link). Uploading and mapping of the screenshots can be automated using the API.
GitLab EE: Its phrases can be hosted under the same or a different organization and be translated by the same or different teams, professional translators or a combination. Puppet and other mixed-license projects follow this paradigm. The two projects may use the same translation memory, so no duplication of work will happen.
Finally, I don't foresee that pricing will be an issue.
Hope this helped, even remotely. If I can be helpful in some way, regardless of the choice of translation platform, just ping me.
We're just embarking on this journey and are starting to evaluate tools such as Pontoon and Zanata. Would love to have a chat sometime. What's the best way to get in touch?
To be honest, we started the community translation project on Transifex, but the versioning system wasn't as good as on Zanata. Of course, Zanata has its downsides (no "join project" button).
Moreover, some of the features you talked about are behind a paywall (€179/month for 10 users; as an open-source project, we don't have that much money).
For now, Zanata is closer to what we need, but as @mydigitalself said, we are still evaluating the tool, and may or may not decide to change at a later date.
The deployed Zanata is a version from six months ago. I sent an email to the Zanata administrators to ask about the version problem, but got no answer. I think the newer version will have a "join project" feature, and it is an open source project.
PS: If necessary, setting up a Zanata instance of our own might also be a good idea, although we are not familiar with it.
@egeorget Classic versioning isn't supported. This is partly by design -- not that I'm 100% happy with the current approach. Depending on the complexity (number of live versions, the number of files in each version) there are good workarounds which are working for pretty complex projects and teams.
Having already had some experience with translating GitLab (I've translated 100% to Bulgarian), and having extensive experience participating in the community translation of many projects, I've noticed some general issues that I'd like to comment on:
Context part of strings:
In the code, the context for the translatable strings seems to be given as part of the string, separated from the actual string by a |, like this:
s__('ByAuthor|by')
Then in the PO file, this becomes:

```
msgid "ByAuthor|by"
msgstr ""
```

And in Zanata (or wherever), we get to translate the string `ByAuthor|by`. Now, the part before the `|` is the context, and what should actually be translated is **only** the word "by". This is very strange and misleading, and some translators (I did too, at first) may get confused and translate the whole thing.

The proper way to give context is as `msgctxt` in the PO file (which Zanata will show appropriately), or as a comment for the translators — a line starting with `# ` in PO (which, again, will be shown appropriately by Zanata). There should be a way to configure the system to convert the part before the `|` from the code into either a context or a comment when creating the PO files.
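For comparison, this is roughly what the `msgctxt` form described above would look like in a PO file (a sketch following standard gettext conventions, not GitLab's current output):

```
#. Translators: appears after an author's name, e.g. "by Jane"
msgctxt "ByAuthor"
msgid "by"
msgstr ""
```

With `msgctxt`, the translator sees only "by" as the translatable string, with "ByAuthor" shown separately as disambiguating context.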
```haml
set_password_message = _("You won't be able to pull or push project code via %{protocol} until you %{set_password_link} on your account") % translation_params
```

So, in the code, we first have a translation of `set a password`, and then that is input into the long sentence. The reason that is done is that the string `set a password` must be a link, and so must eventually be surrounded by `<a></a>`. However, this is **not** the best way to do it, translator-wise. The two strings (the sentence and the link part) may appear at very different places in the PO file, and the relation between them will not be obvious. Furthermore, if the little part appears before the sentence, it will be completely out of context (the context there, `SetPasswordToCloneLink`, is practically useless in this case, especially since the sentence does not have the same context). What is more, the 'splitting' may be different for different languages, so it has to be handled with a lot of care by the translator. The usual way to do this kind of thing is to either put the link tags in the translation (possibly adding a placeholder for the actual address):

```haml
set_password_message = _("You won't be able to pull or push project code via %{protocol} until you <a href=\"%{address}\">set a password</a> on your account") % { address: link_address }
```

…or, if that is not possible (if the link tags are more complex and/or only they may change in the future), turn the start and end tags into placeholders:

```haml
set_password_message = _("You won't be able to pull or push project code via %{protocol} until you %{link_start}set a password%{link_end} on your account") % { link_start: link_start, link_end: link_end }
```

(My `haml` might be wrong in the substitution part at the end, I haven't used `haml` so far, but I'm sure you get the point.) That way the translators will have all the context they need (`SetPasswordToCloneLink` is not needed), and will translate the whole thing in the best possible way right away, without having to dig into the code or the rest of the PO file. I think this doesn't make the developer's job harder, but it is a big deal for the translation.

I'm guessing these things are better fixed early rather than late, as with everything, and I hope you agree with me that fixing them will make the translation process easier and better at the same time.
@mydigitalself Should I copy my comment above to some other issue, and if so, to which? Or is it visible enough here? Or do you think those are not real problems worth considering?
@bikebilly we are currently just sourcing Spanish & German internally as we have native speakers. The rest may be picked up by the community and partners.
Weblate is a free web-based translation management system which integrates perfectly with CI/CD. We have been using it for a long time now; I can really recommend it for your internationalization plans, see: https://twitter.com/gitlab/status/879819397518458881.
@3_1_3_u Thanks for reaching out, unfortunately, we haven't been able to proofread those Ukrainian translations in time. @vsizov would you have a moment to do that? I've sent you an invite for Crowdin with the correct permissions.
We really do appreciate the tremendous effort you and the GitLab community are putting into translating GitLab.
In fact, the work done was far greater than we could hope for, which is why it takes longer than we would like before new translations reach master:
Right now, the process of validating translations is still very labor intensive; we're working on automating things as much as possible, follow https://gitlab.com/gitlab-org/gitlab-ce/issues/33521 to see where we're going. We've already started automating the validation of translations (for example in this build), but we're not there yet.
@bikebilly @victorwu @joshlambert please could you review the checklists in the description for your respective product areas and update them to include any missing pages/sections?
It would be great if we could localize one page per release where possible. If you have anything planned, it would be great to have the milestone marked in the list above too. Thanks!
@3_1_3_u Sorry, for now we don't pick translations into patch releases. This is because our externalized strings are still in flux: a message-id that was used in 10.0 might not be needed in 10.1, so picking those translations in could break translations in patch releases.
This means that everything translated and imported after the 7th of the month (our freeze date) won't make it in the release on the following 22nd.
GitLab is moving all development for both GitLab Community Edition
and Enterprise Edition into a single codebase. The current
gitlab-ce repository will become a read-only mirror, without any
proprietary code. All development is moved to the current
gitlab-ee repository, which we will rename to just gitlab in the
coming weeks. As part of this migration, issues will be moved to the
current gitlab-ee project.
If you have any questions about all of this, please ask them in our
dedicated FAQ issue.
Using "gitlab" and "gitlab-ce" would be confusing, so we decided to
rename gitlab-ce to gitlab-foss to make the purpose of this FOSS
repository more clear.
I created a merge request for CE, and this got closed. What do I
need to do?
Everything in the ee/ directory is proprietary. Everything else is
free and open source software. If your merge request does not change
anything in the ee/ directory, the process of contributing changes
is the same as when using the gitlab-ce repository.
Will you accept merge requests on the gitlab-ce/gitlab-foss project
after it has been renamed?
No. Merge requests submitted to this project will be closed automatically.
Will I still be able to view old issues and merge requests in
gitlab-ce/gitlab-foss?
Yes.
How will this affect users of GitLab CE using Omnibus?
No changes will be necessary, as the packages built remain the same.
How will this affect users of GitLab CE that build from source?
Once the project has been renamed, you will need to change your Git
remotes to use this new URL. GitLab will take care of redirecting Git
operations so there is no hard deadline, but we recommend doing this
as soon as the projects have been renamed.
Where can I see a timeline of the remaining steps?