There's a bug in either iTerm2 or weechat; double-width characters require both programs to use exactly the same definition of what is double-width and what isn't.
I wasn't able to reproduce with weechat installed with brew. Did you do any configuration of your weechat client? What version is it?
What version of screen do you have installed? Is this on Mac OS or ssh'ed to another machine? I couldn't repro by running weechat inside screen, either.
Would love to see a debuglog for when you run weechat locally. That looks like a bug in the VTx00 parser.
I'm running screen on the machine that I SSH into.
The reason I think it's not an issue with weechat is that on my Linux machine, the weechat screen is not misaligned (also over SSH).
I've attached a video of a bash session where I repeatedly press up and down between these commands:
echo ".::・┻┻☆()゚O゚)".:・┻┻☆()゚O゚)"
echo ".::・┻┻☆()゚.::・┻┻☆()゚O゚)"
Doing this will cause the )" part to multiply and the bash-3.2$ prompt to disappear.
Out of the strings I posted earlier, I think this one is giving me the most issues: .::・┻┻☆()゚O゚)
Here is the debuglog of the replacing text in bash: debuglog.txt
On weechat (local, NOT via SSH), if I paste .::・┻┻☆()゚O゚) and then remove it with backspace, it will also remove characters that are in front of the pasted string, for example:
When removing the pasted string from @Luna_Moonfang(Zi)] .::・┻┻☆()゚O゚), it will end up looking like @Luna_Moonfang(Zi) (Missing ])
Please let me know if you need a debug log from anything else.
Modifier letters, in the sense used in the Unicode Standard, are letters or symbols that are typically written adjacent to other letters and which modify their usage in some way. They are not formally combining marks (gc=Mn or gc=Mc) and do not graphically combine with the base letter that they modify. They are base characters in their own right.
(emphasis mine)
Section 3.6 D59 also says:
Grapheme extender: A character with the property Grapheme_Extend.
Grapheme extender characters consist of all nonspacing marks, zero width joiner, zero width non-joiner, U+FF9E, U+FF9F, and a small number of spacing marks.
A grapheme extender can be conceived of primarily as the kind of nonspacing graphical mark that is applied above or below another spacing character.
The set of characters with the Grapheme_Extend property and the set of characters with the Grapheme_Base property are disjoint, by definition.
This character is an oddball that is both a base character (albeit one that does not belong to Grapheme_Base) and one that combines with its preceding base character.
The bug arises because the function we use to find grapheme clusters, CFStringGetRangeOfComposedCharactersAtIndex, says that O゚ is a single composed character. But a composed character is different than a grapheme cluster. From Apple's docs:
A composed character sequence is a series of one or more characters where each is a combining character, zero-width joiner or non-joiner, voiced mark, or enclosing mark, optionally including a base character.
Is Apple's Function Broken?
The key question is whether U+FF9F is a combining character, zero-width joiner, non-joiner, or enclosing mark.
Is U+FF9F a combining character?
The canonical combining class of U+FF9F is 0. This doesn't mean much, though. From D52 in 3.6:
Combining character: A character with the General Category of Combining Mark (M).
The general category is Lm, so it doesn't satisfy this part of the definition.
All characters with non-zero canonical combining class are combining characters, but the reverse is not the case: there are combining characters with a zero canonical combining class.
The canonical combining class is 0, so we have learned nothing about U+FF9F from this rule.
Combining characters consist of all characters with the General Category values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and Enclosing Mark (Me).
This rule is not satisfied by Lm.
Conclusion: I can find no evidence that U+FF9F is a combining character.
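For anyone who wants to double-check the properties cited above, here is a minimal sketch using Python's standard unicodedata module. The values come from whatever Unicode version your Python build ships with, and the names props and is_combining_mark are just for illustration.

```python
import unicodedata

ch = "\uff9f"  # HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

props = {
    "name": unicodedata.name(ch),
    "category": unicodedata.category(ch),    # General Category
    "combining": unicodedata.combining(ch),  # canonical combining class
}

# D52: a combining character has General Category M (Mn, Mc, or Me).
is_combining_mark = props["category"].startswith("M")

for key, value in props.items():
    print(f"{key}: {value}")
print("combining character per D52:", is_combining_mark)
```

This only checks the General Category and the canonical combining class; the stdlib module does not expose Grapheme_Extend, so that property has to be looked up in the Unicode Character Database.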
Is U+FF9F a zero-width joiner or a non-joiner?
No. The only zero-width joiner is U+200D, and the non-joiner is U+200C. See The Non-joiner and the Joiner in section 9.2.
Is U+FF9F an enclosing mark?
From D54:
Enclosing marks are a subclass of nonspacing marks that surround a base character, rather than merely being placed over, under, or through it.
U+FF9F is obviously not an enclosing mark.
Conclusion: Apple's function does not do what it claims
How should a terminal handle a modifier letter?
It seems that bash (therefore readline, and perhaps also ncurses) expects that U+FF9F occupies its own cell because it is a base character.
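To make the "occupies its own cell" expectation concrete, here is a rough wcwidth-style sketch in Python. It is an assumption about how a typical width function behaves, not what readline or iTerm2 actually implement: nonspacing and enclosing marks and the joiners take zero cells, East Asian Wide/Fullwidth take two, and everything else (including U+FF9F, which is Lm) takes one. The function name cell_width and the ambiguous_is_wide flag are made up for this example, and control characters are not handled.

```python
import unicodedata

def cell_width(ch, ambiguous_is_wide=False):
    """Approximate number of terminal cells for a single code point."""
    # Nonspacing/enclosing marks and ZWNJ/ZWJ draw on the previous cell.
    if unicodedata.category(ch) in ("Mn", "Me") or ch in ("\u200c", "\u200d"):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    # Ambiguous-width characters are where two programs most easily disagree.
    if eaw == "A" and ambiguous_is_wide:
        return 2
    return 1

for ch in ("O", "\uff9f", "\u0301", "\u30fb", "\u2606"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cell_width(ch)} cell(s)")
```

Under these rules U+FF9F gets a cell of its own, so a terminal that folds it into the preceding cell will end up one column off from what the application expects.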
The issues that come to mind are:
We need a better way to segment a string into base characters that understands grapheme extenders/modifier letters.
What does Terminal do? They have a better way of doing segmentation.
If U+FF9F is treated as a base character, the present logic would make it selectable independent of the preceding base character. I think Cocoa does not do this because a grapheme extender "is applied above or below another spacing character".
We need some way of tying multiple base characters together for the purposes of selection.
Could this use the existing mechanism for double-width characters? How many grapheme extenders can be attached to a base character?
Using NSAttributedStrings to draw whole lines as we do in 3.1 will work as long as the cells are all on the same line. But what happens if there's a line break between a base character and a grapheme extender?
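As a starting point for the segmentation question, here is a small Python sketch that groups each extender with the character before it, so the group can be selected as a unit even if it spans more than one cell. The extender test is only an approximation built from the D59 wording quoted above (nonspacing and enclosing marks, ZWJ, ZWNJ, U+FF9E, U+FF9F); the real Grapheme_Extend property lives in the Unicode Character Database and also covers a few spacing marks. The names EXTRA_EXTENDERS, is_extender, and clusters are hypothetical.

```python
import unicodedata

# Code points named explicitly in the D59 text quoted above (an approximation).
EXTRA_EXTENDERS = {0x200C, 0x200D, 0xFF9E, 0xFF9F}

def is_extender(ch):
    """Rough stand-in for the Grapheme_Extend property."""
    return unicodedata.category(ch) in ("Mn", "Me") or ord(ch) in EXTRA_EXTENDERS

def clusters(text):
    """Split text into groups of one leading character plus trailing extenders."""
    out = []
    for ch in text:
        if out and is_extender(ch):
            out[-1] += ch  # attach the extender to the preceding group
        else:
            out.append(ch)
    return out

print(clusters("O\uff9f)"))  # the O and U+FF9F stay together
```

This keeps drawing and selection separate: each code point can still be laid out in its own cell, while selection operates on the groups. It does not answer what to do when a line wrap falls inside a group.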