From konstantin at linuxfoundation.org Thu Apr 24 19:08:39 2025
From: konstantin at linuxfoundation.org (Konstantin Ryabitsev)
Date: Thu, 24 Apr 2025 13:08:39 -0400
Subject: [Remail] Remail problems with Content-type: 8bit multipart mails
Message-ID: <20250424-smoky-dazzling-galago-9ca25c@lemur>

Hello, all:

We've discovered that remail has problems with mail containing 8-bit content.
This causes a UnicodeEncodeError traceback for any emails containing non-ascii
characters in the encrypted part. It looks like Thunderbird does this for sure
-- other clients I've tried will always create a quoted-printable 7bit part
before encrypting it.

The traceback I'm seeing is when trying to save the message to the plaintext
archive. Remail is still waiting on the OS upgrade (it's way behind the
firewalls, so it's not getting prioritized), which is why it's still
python-3.6, but I believe that will happen on a more modern version of python
as well.

: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/local/remail/remail/remaild.py", line 196, in process_msg
    if ml.process_mail(msg, dest):
  File "/usr/local/remail/remail/maillist.py", line 339, in process_mail
    res = self.do_process_mail(msg, dest)
  File "/usr/local/remail/remail/maillist.py", line 300, in do_process_mail
    self.archive_mail(msg_plain, admin=dest.toadmin)
  File "/usr/local/remail/remail/maillist.py", line 153, in archive_mail
    mbox.add(msg)
  [...]
  File "/usr/lib64/python3.6/email/generator.py", line 406, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

There are two separate problems. First is the actual decryption of content
containing 8bit characters.
By default, calling gpg.decrypt() will assume that the contents are in
latin-1:
https://gnupg.readthedocs.io/en/latest/#getting-started

To get the correct binary data back from the decrypt() call, remail should set
gpg.encoding = 'utf-8' after defining self.gpg in the __init__ method of
gpg_crypt.

However, by itself this doesn't fix the problem, it just fixes the corruption.
There's still a backtrace because when remail creates a plaintext version of
the message it doesn't expect 8bit content. In theory, this can be fixed
*somewhere* by setting the policy to cte_type='8bit', but I got lost in all
the places where the message gets created.

I wanted to share my findings at this point in hopes that more eyes can start
looking at this.

-K
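
For anyone who wants to poke at the two problems outside of remail, here is a
minimal stdlib-only sketch. It is not remail code and does not import
python-gnupg. The sample text and the helper names (decode_decrypted,
flatten_naive, flatten_8bit) are made up for illustration. It assumes the
corruption comes from UTF-8 bytes being decoded as latin-1, and that the
traceback comes from a str payload with non-ascii characters reaching the
bytes generator. The last helper shows one possible way to build an 8bit part
with the modern email API, which may or may not match where remail needs the
change.

```python
import email.policy
from email.message import EmailMessage, Message

# Illustrative sample only; stands in for the decrypted UTF-8 content.
TEXT = "Привет, мир\n"
RAW = TEXT.encode("utf-8")


def decode_decrypted(raw: bytes, encoding: str) -> str:
    """Decode decrypted bytes with the given encoding (hypothetical helper)."""
    return raw.decode(encoding)


def flatten_naive(text: str) -> bytes:
    """Store a non-ascii str payload with no charset, then serialise it.

    With no charset the payload is kept as a plain str, and the bytes
    generator later tries to encode it as ascii.
    """
    msg = Message()
    msg["Subject"] = "naive"
    msg.set_payload(text)
    return msg.as_bytes()


def flatten_8bit(text: str) -> bytes:
    """Build the part with an 8bit policy and an explicit 8bit CTE."""
    policy = email.policy.default.clone(cte_type="8bit")
    msg = EmailMessage(policy=policy)
    msg["Subject"] = "8bit"
    msg.set_content(text, charset="utf-8", cte="8bit")
    return msg.as_bytes()


if __name__ == "__main__":
    # ascii() keeps the output printable on any terminal encoding.
    print("as latin-1:", ascii(decode_decrypted(RAW, "latin-1")))
    print("as utf-8:  ", ascii(decode_decrypted(RAW, "utf-8")))
    try:
        flatten_naive(TEXT)
    except UnicodeEncodeError as exc:
        print("naive flatten failed:", exc)
    print(flatten_8bit(TEXT))
```

Decoding as latin-1 never raises, which is why the first problem shows up as
silent corruption rather than an error. The second helper fails in the same
generator write() call as in the traceback above, and the third writes the raw
UTF-8 bytes with a Content-Transfer-Encoding: 8bit header.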