Revamp reply email parsing
This issue summarizes the known problems with parsing reply emails, so that related reports and clues are easier to find in one place.
All related issues
- HTML emails: gitlab-foss#2847 (closed), gitlab-foss#3357 (closed), gitlab-foss#15545 (closed), gitlab-foss#18388 (closed), gitlab-foss#23340 (closed)
- Inline/bottom replies support: gitlab-foss#3020 (moved), gitlab-foss#14805 (moved), gitlab-foss#20514 (moved)
- Strip signatures: gitlab-foss#3061 (moved), gitlab-foss#14786 (moved)
- Ignore auto-generated emails: gitlab-foss#18548 (moved)
- Incident: https://gitlab.com/gitlab-com/infrastructure/issues/1#note_17599430, gitlab-foss#24003 (moved)
- Incident: https://0xacab.org/riseup/0xacab/issues/11
Challenges
- Different email clients (e.g. gitlab-foss#18388 (closed))
- Different languages
- HTML emails
- Auto-generated emails
- Signatures
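For the auto-generated emails challenge, one possible starting point is to inspect the headers before any body parsing happens. The sketch below is only an illustration, not what GitLab currently does: the function name `is_auto_generated` is hypothetical, the `Auto-Submitted` check follows RFC 3834, and the `Precedence` check is a non-standard heuristic that will miss some auto-responders.

```python
from email import message_from_string


def is_auto_generated(raw_email: str) -> bool:
    """Guess whether a raw email was produced by an auto-responder.

    Hypothetical helper for illustration only.
    """
    msg = message_from_string(raw_email)

    # RFC 3834: any Auto-Submitted value other than "no" marks an
    # automatically generated or automatically replied message.
    auto_submitted = (msg.get("Auto-Submitted") or "").strip().lower()
    if auto_submitted and auto_submitted != "no":
        return True

    # Non-standard heuristic (assumption): bulk/junk precedence is
    # commonly set by automated senders.
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence in {"bulk", "junk"}


out_of_office = (
    "From: a@example.com\n"
    "Auto-Submitted: auto-replied\n"
    "Subject: Out of office\n"
    "\n"
    "I am away.\n"
)
human_reply = "From: a@example.com\nSubject: Re: x\n\nLGTM\n"
print(is_auto_generated(out_of_office), is_auto_generated(human_reply))
```

A check like this would run before reply extraction, so that out-of-office messages never reach the comment-creation step.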
Suggested solutions
- Leave markers in the notification emails we send, which we can recognize later in replies (I think Discourse does this, as do a ton of support ticket systems)
- Maintain a list of the quoting formats different email clients use (some clients use `|` for quoting)
- Don't use Markdown, just plain text (GitHub does this, but the result can still be very bad. Here's an example of woes)
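To make the marker and quoting-format ideas concrete, here is a minimal sketch that combines them. Everything in it is an assumption for illustration: the marker text `## Reply above this line ##`, the function name `extract_reply`, the two quote prefixes, and the rule that a trailing line ending in `:` is an attribution line ("On ... wrote:"). Real clients and languages vary far more than this.

```python
# Hypothetical marker that outgoing notification emails would contain.
REPLY_MARKER = "## Reply above this line ##"

# Assumed quote prefixes; a real list would be longer and client-specific.
QUOTE_PREFIXES = (">", "|")


def extract_reply(body: str, marker: str = REPLY_MARKER) -> str:
    """Return the text the user wrote, without quoted notification content."""
    # Everything from the marker onwards belongs to the quoted notification.
    head, found, _quoted = body.partition(marker)

    # Drop quoted lines, which also keeps inline replies between quotes.
    lines = [
        line for line in head.splitlines()
        if not line.lstrip().startswith(QUOTE_PREFIXES)
    ]

    # Remove trailing blank lines.
    while lines and not lines[-1].strip():
        lines.pop()

    # Heuristic (assumption): when the marker was found, a last line ending
    # with ":" is the client's attribution line, so drop it as well.
    if found and lines and lines[-1].rstrip().endswith(":"):
        lines.pop()

    return "\n".join(lines).strip()


top_reply = (
    "Looks good to me.\n"
    "\n"
    "On Mon, GitLab wrote:\n"
    "> ## Reply above this line ##\n"
    "> Original notification text\n"
)
print(extract_reply(top_reply))
```

Note that treating `|` as a quote prefix also swallows Markdown table rows, which is one reason a per-client format list and the plain-text option are worth weighing against each other.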
Reference implementations
- https://github.com/github/email_reply_parser
- https://github.com/discourse/email_reply_trimmer
- https://github.com/discourse/discourse/commits/master/lib/email/receiver.rb
Stalled efforts
- https://gitlab.com/gitlab-org/gitlab-ce/commits/adopt-email_reply_trimmer (failed build: https://gitlab.com/gitlab-org/gitlab-ce/pipelines/3869613)