Raise encoding confidence threshold to 50

What does this MR do?

Raise encoding confidence threshold to 50

Why was this MR needed?

It is recommended that we set this to 50: https://gitlab.com/gitlab-org/gitlab-ce/issues/35098#note_35036746

In this particular issue, the detected confidence was 42 for Shift JIS, but the content was actually encoded in UTF-8 with a single bad character. In this case, we shouldn't treat it as Shift JIS; we should treat it as UTF-8 and remove the invalid bytes.

Treating it as Shift JIS would corrupt the entire content, not just the one bad character.
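
To illustrate the reasoning above, here is a minimal Python sketch, not the actual GitLab code. The names `CONFIDENCE_THRESHOLD` and `normalize_to_utf8` are hypothetical, the `>=` comparison is an assumption, and the sample bytes are made up, since the real diff cannot be shared. It only shows the idea: trust the detector's guess when confidence reaches the threshold, otherwise treat the data as UTF-8 and drop invalid bytes.

```python
# Hypothetical constant; the value 50 is taken from this MR's description.
CONFIDENCE_THRESHOLD = 50


def normalize_to_utf8(data: bytes, detected_encoding: str, confidence: int) -> str:
    """Decode data, trusting the detected encoding only at or above the threshold.

    The >= comparison is an assumption made for this sketch.
    """
    if confidence >= CONFIDENCE_THRESHOLD:
        # High confidence: transcode from the detected encoding.
        return data.decode(detected_encoding, errors="replace")
    # Low confidence: assume UTF-8 and silently drop invalid bytes.
    return data.decode("utf-8", errors="ignore")


# Made-up sample: valid UTF-8 text with a single invalid byte (0xFF) in it.
expected = "日本語のテキスト more text"
sample = "日本語のテキスト".encode("utf-8") + b"\xff" + " more text".encode("utf-8")

# Confidence 42 is below the threshold, so only the bad byte is removed.
cleaned = normalize_to_utf8(sample, "shift_jis", 42)

# If the guess were trusted, the whole text would be misread as Shift JIS.
misread = sample.decode("shift_jis", errors="replace")
```

With a threshold below 42, the same input would take the first branch and the Japanese text would come out as mojibake, which is the corruption this MR avoids.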

Unfortunately, the diff that triggered this cannot be disclosed, so we can't use it as a test example.

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes #35098 (closed)

Edited by username-removed-423915
