Support NFD combining accents

Comment 1 by gnach... on October 04, 2010 22:44

It will be some time before we can handle NFD, so it's important to get NFC working as well as possible for the near term. I like the approach that you describe VTE taking. We could do that after going to a 4-byte representation for the character.
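A minimal sketch of why NFC alone can't cover every case, using Python's standard `unicodedata` module (an illustration, not iTerm2 code). NFC folds e + U+0301 COMBINING ACUTE ACCENT into the single code point U+00E9. A pair with no precomposed form, such as q + U+0301, stays two code points, so a renderer that stores one code point per cell would still need some way to handle the mark:

```python
import unicodedata

# NFD-style input: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1

# Unicode has no precomposed "q with acute", so NFC leaves both code points.
no_precomposed = unicodedata.normalize("NFC", "q\u0301")
print(len(no_precomposed))  # 2
```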

Comment 2 by gnach... on October 05, 2010 05:05

(No comment has been entered for this change)

Labels: ~~Type: Defect~~, Type: Enhancement

Comment 3 by gnach... on October 11, 2010 04:53

Issue #125 (closed) has been merged into this issue.

Comment 4 by gnach... on November 24, 2010 05:47

(No comment has been entered for this change)

Labels: ~~Priority: Medium~~, Priority: High

Comment 5 by gnach... on December 04, 2010 23:49

Issue #345 (closed) has been merged into this issue.

Comment 6 by gnach... on December 21, 2010 01:06

Issue #389 (closed) has been merged into this issue.

Comment 7 by gnach... on December 23, 2010 05:42

Issue #378 (closed) has been merged into this issue.

Comment 8 by gnach... on January 03, 2011 01:35

Egmont: are all non-spacing marks to be treated like combining marks? If not, how do I select an appropriate set of combining marks? I see this list of nonspacing marks:

http://www.fileformat.info/info/unicode/category/Mn/list.htm

Comment 9 by pe...@norrtorp.nu on January 03, 2011 09:40

Hope you don't mind me answering. :) Non-spacing marks are a subset of combining characters and should be treated similarly. Enclosing marks (Me) are also combining characters.
The unicode.org site is actually quite a good reference: http://unicode.org/glossary/#nonspacing_mark in this case.

PS: You can use libicu to look up general categories if you don't want to keep the list up to date yourself. But perhaps Cocoa provides something as well...

Comment 10 by gnach... on January 03, 2011 10:09

Thank you! I was afraid it would be in Chapter 3 :)
For future reference, the good bits are here: http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G30602
at D51-D52. Specifically:

"Combining characters consist of all characters with the General Category values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and Enclosing Mark (Me)"

Cocoa offers [NSCharacterSet nonBaseCharacterSet] which classifies characters as to whether they have a "non-spacing priority greater than 0", which I need to research further, but I suspect it's not quite right.
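The D52 definition quoted above translates directly into a general-category test. This is a minimal sketch in Python, assuming the standard `unicodedata` module, whose data follows the Unicode version bundled with the interpreter (not necessarily 5.2). `is_combining` is a hypothetical helper name:

```python
import unicodedata

# Per D52, combining characters are exactly those whose General Category is
# Mc (Spacing Combining Mark), Mn (Nonspacing Mark) or Me (Enclosing Mark).
COMBINING_CATEGORIES = {"Mc", "Mn", "Me"}


def is_combining(ch: str) -> bool:
    """Return True if the single code point `ch` is a combining character."""
    return unicodedata.category(ch) in COMBINING_CATEGORIES


# a, COMBINING ACUTE ACCENT, DEVANAGARI VOWEL SIGN I, COMBINING ENCLOSING CIRCLE
for ch in ["a", "\u0301", "\u093f", "\u20dd"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {is_combining(ch)}")
```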

Comment 11 by gnach... on January 03, 2011 19:24

Spacing combining marks are acting funny. The sequence U+0071 U+093F U+0072 (lowercase q + DEVANAGARI VOWEL SIGN I + lowercase r) doesn't do what I'd expect in TextEdit. The 'q' and U+093F act as a single character for the purposes of selection, but when displayed, part of the U+093F glyph hangs over the 'r'. I doubt there are many Devanagari iTerm2 users, but what is supposed to happen in this case?
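Inspecting the sequence from this comment with Python's `unicodedata` (a sketch for inspection only) shows that U+093F is a Spacing Combining Mark (Mc) whose canonical combining class is 0, unlike U+0301 COMBINING ACUTE ACCENT (class 230), which is included here for contrast. A test based on a nonzero combining class would therefore treat U+093F as a base character. The general category is what identifies it as combining:

```python
import unicodedata

# q, DEVANAGARI VOWEL SIGN I, r, plus U+0301 COMBINING ACUTE ACCENT for contrast.
for ch in ["q", "\u093f", "r", "\u0301"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)}, "
          f"combining class={unicodedata.combining(ch)}")
```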

Comment 12 by pe...@morth.org on January 03, 2011 20:03

I'm afraid this is beyond me. D50 clearly states:

The graphic positioning of a combining character depends on the last preceding base character, unless they are separated by a character that is neither a combining character nor either zero width joiner or zero width non-joiner. The combining character is said to apply to that base character.

To me it sounds like TextEdit is doing it wrong, but I'm no expert.
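The D52 rule suggests a simple way to decide which cell a combining character belongs to. What follows is a minimal sketch under that reading, not iTerm2's actual implementation. Each combining character (or ZWJ/ZWNJ) joins the group of the last base character, and anything else starts a new group. It deliberately ignores the extended grapheme cluster rules of UAX #29 (Hangul syllables, emoji sequences and so on). `attach_combining` is a hypothetical name:

```python
import unicodedata
from typing import List

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def is_combining(ch: str) -> bool:
    """D52: combining characters have General Category Mc, Mn or Me."""
    return unicodedata.category(ch) in {"Mc", "Mn", "Me"}


def attach_combining(text: str) -> List[str]:
    """Group each base character with the combining characters that apply to it.

    A combining character (or ZWJ/ZWNJ, which D52 says does not break the
    association) is appended to the current group; any other character starts
    a new group. A mark with no preceding base character forms its own group.
    """
    groups: List[str] = []
    for ch in text:
        if groups and (is_combining(ch) or ch in (ZWJ, ZWNJ)):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


# The sequence from comment 11: the vowel sign attaches to "q", not to "r".
print(len(attach_combining("q\u093fr")))  # 2
```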

Comment 13 by pe...@morth.org on January 03, 2011 20:04

I meant D52, sorry

Comment 14 by gnach... on January 05, 2011 06:07

Dang, I'm really close. The only character I can find that I still render incorrectly is U+239D. It seems to be drawn one cell to the right of its actual position in Terminal and TextEdit, but I can't see anything unusual about it in the Unicode databases. What am I missing?
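For a case like U+239D, the per-code-point properties that usually drive cell layout can be dumped with Python's `unicodedata` (a sketch for inspection only). The character is a math symbol (Sm) with combining class 0, so nothing here marks it as combining, which matches the comment's observation that the Unicode data looks ordinary:

```python
import unicodedata

ch = "\u239d"
print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print("general category:", unicodedata.category(ch))
print("canonical combining class:", unicodedata.combining(ch))
print("East Asian Width:", unicodedata.east_asian_width(ch))
```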

Comment 15 by gnach... on January 05, 2011 06:41

I'm going to split out preserving NFD into a separate issue because it is a performance-killer.

Comment 16 by gnach... on January 05, 2011 06:42

This issue was closed by revision r438.

Status: Fixed

Comment 17 by gnach... on January 08, 2011 19:35

Issue #452 (closed) has been merged into this issue.

Comment 18 by michael.norr... on March 19, 2012 06:01

Though it is marked as fixed, the handling of NFD in file names is still wrong. If you touch a file named, say, fooé (with the é stored as e + a combining acute accent) and then run ls in a directory with lots of file names, ls misaligns the columns because it doesn't realise that the last two Unicode code points combine to form just one printed character.

Comment 19 by pe...@morth.org on March 19, 2012 07:56

That's because ls outputs too few tabs: it doesn't know about zero-width/combining characters. IMO it's up to Apple to fix that (a bug report probably wouldn't hurt).
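The misalignment comes from padding by code-point count instead of display columns. Below is a minimal sketch of width-aware padding in Python. `display_width` and `pad` are hypothetical helpers, and the sketch pads with spaces rather than the tabs ls emits. Following the convention of Markus Kuhn's widely used `wcwidth()` implementation, nonspacing (Mn) and enclosing (Me) marks count as zero columns. Wide East Asian and control characters are not handled:

```python
import unicodedata
from typing import List

ZERO_WIDTH_CATEGORIES = {"Mn", "Me"}  # nonspacing and enclosing marks


def display_width(s: str) -> int:
    """Approximate terminal columns: Mn/Me marks take 0, everything else 1."""
    return sum(0 if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES else 1
               for ch in s)


def pad(name: str, width: int) -> str:
    """Left-justify `name` to `width` display columns with spaces."""
    return name + " " * (width - display_width(name))


# "fooé" in decomposed form (e + U+0301), like the file names HFS+ stores.
names: List[str] = ["foo" + "e\u0301", "bar"]
col = max(display_width(n) for n in names)
for n in names:
    print(pad(n, col) + "|")  # the "|" markers line up in the same column
```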

Comment 20 by michael.norr... on March 19, 2012 20:34

Fair enough! Thanks for replying
