Revamp reply email parsing
This is an attempt to summarize all the issues with reply email parsing in one place, so that clues are easier to find.
All related issues
- HTML emails #2847 (closed), #3357 (closed), #15545 (closed), #18388 (closed), #23340 (closed)
- Inline/bottom replies support #3020 (moved), #14805 (moved), #20514 (moved)
- Strip signatures #3061 (moved), #14786 (moved)
- Ignore auto-generated emails #18548 (moved)
- Incident: https://gitlab.com/gitlab-com/infrastructure/issues/1#note_17599430 , #24003 (moved)
- Incident: https://0xacab.org/riseup/0xacab/issues/11
Challenges
- Different email clients (e.g. #18388 (closed))
- Different languages
- HTML emails
- Auto-generated emails
- Signatures
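For the auto-generated emails item, one possible starting point is to look at message headers before parsing the body at all. The sketch below uses Python's standard `email` module. The `Auto-Submitted` header comes from RFC 3834; the `Precedence` check is a non-standard heuristic and is an assumption here, as is the helper name `is_auto_generated`. It is not what GitLab currently does.

```python
from email import message_from_string
from email.message import Message


def is_auto_generated(msg: Message) -> bool:
    """Heuristically decide whether a message was sent by a program."""
    # RFC 3834: any Auto-Submitted value other than "no" marks an automatic
    # message. The value may carry parameters after ";", so keep the keyword only.
    auto_submitted = (msg.get("Auto-Submitted") or "").split(";")[0].strip().lower()
    if auto_submitted and auto_submitted != "no":
        return True
    # Assumption: treat the non-standard Precedence values "bulk" and "junk"
    # as a sign of automated mail. This can misclassify some messages.
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence in {"bulk", "junk"}


raw = "From: a@example.com\nAuto-Submitted: auto-replied\n\nI am out of office."
print(is_auto_generated(message_from_string(raw)))
```

Messages flagged this way could be skipped instead of being turned into comments, though not every auto-responder sets these headers.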
Suggested solutions
- Leave markers in outgoing emails that we can recognize later in the replies (I think Discourse is doing this, as are a ton of support ticket systems)
- Have a list of different formats email clients could be using (some clients would use | for quoting)
- Don't use Markdown, just plaintext (GitHub is doing this, but this could still be very terrible. Here's an example of woes)
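The first two suggestions could be combined roughly as in the sketch below. Everything named here is hypothetical: the marker text `REPLY_MARKER`, the prefix list `QUOTE_PREFIXES`, and the function `extract_reply` are illustrations, not taken from GitLab, Discourse, or GitHub.

```python
# Hypothetical marker text; it would have to be written into outgoing emails.
REPLY_MARKER = "## Reply above this line ##"

# Assumed list of quote prefixes: ">" is the common one, "|" is used by some clients.
QUOTE_PREFIXES = (">", "|")


def extract_reply(body: str) -> str:
    """Return the part of a plaintext body that the sender most likely wrote."""
    # 1. Cut everything from the marker onwards, if the marker survived.
    marker_at = body.find(REPLY_MARKER)
    if marker_at != -1:
        body = body[:marker_at]
    # 2. Drop lines that start with a known quote prefix. Unquoted lines
    #    between quoted blocks (inline replies) are kept.
    kept = [
        line
        for line in body.splitlines()
        if not line.lstrip().startswith(QUOTE_PREFIXES)
    ]
    return "\n".join(kept).strip()


print(extract_reply("Agreed.\n| old text\n> older text"))
```

Two limits of this sketch are worth noting. Attribution lines such as "On Monday, Bob wrote:" are left in place, and they vary by client and language. Treating `|` as a quote prefix would also drop Markdown table rows in a reply, which is one argument for the plaintext-only suggestion.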
Reference implementations
- https://github.com/github/email_reply_parser
- https://github.com/discourse/email_reply_trimmer
- https://github.com/discourse/discourse/commits/master/lib/email/receiver.rb
Stalled efforts
- https://gitlab.com/gitlab-org/gitlab-ce/commits/adopt-email_reply_trimmer (failed build: https://gitlab.com/gitlab-org/gitlab-ce/pipelines/3869613)