Launchpad encodes the eMail Subject header incorrectly

Asked by Thorsten Glaser

I just got an eMail with:

Subject: =?utf-8?q?=5BQuestion_=23708252=5D=3A_builds_abort_after_ca=2E_10=E2=80=9317_minutes=2C_no_logs?=

This obviously violates the usual RFCs. I suspect the underlying cause is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787511 and related bugs; the obvious fix is to shell out to a PHP two- or three-liner that encodes the Subject header.

For reference, here is how I encode the Subject header from a shell script (git post-receive hook):

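        # ksh: encode "Subject: $subj" as RFC 2047 Q-encoded UTF-8,
        # folded with LF line endings; abort if PHP fails
        senc=$(print -nr -- "Subject: $subj" | php -r '
                mb_internal_encoding("UTF-8");
                echo mb_encode_mimeheader(file_get_contents("php://stdin"),
                    "UTF-8", "Q", "\012");
            ' 2>/dev/null) || { print -ru2 -- "E: cannot encode Subject"; exit 1; }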

This is for the Korn shell; for non-Korn shells, replace “print -nr -- "Subject: $subj"” with “printf 'Subject: %s' "$subj"”. For Python (py3k), you’d just write the header to be encoded into the subprocess’s stdin.

Oh, and the “"\012"” makes the output use Unix (LF) line endings, for consistency with the rest of the message; these are converted to CRLF later on, by the MSA. Use “"\015\012"” if you need CRLF right away.
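If the LF-folded value is also needed elsewhere, an alternative to passing “"\015\012"” to PHP is to keep “"\012"” and convert to CRLF only where CRLF is actually required. A minimal POSIX sh/awk sketch follows; the helper name “to_crlf” is hypothetical and not part of the original hook.

```shell
#!/bin/sh
# Hypothetical helper (not from the original hook): rewrite LF line
# endings as CRLF. Assumes the input is LF-terminated text that does
# not already contain CRs.
to_crlf() {
	# "\r" and "\n" are standard escapes in POSIX awk string literals;
	# a final line without a trailing LF still gets a CRLF appended.
	awk '{ printf "%s\r\n", $0 }'
}
```

For example, “printf '%s\n' "$senc" | to_crlf” would emit the folded Subject header with CRLF line endings.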

Question information

Language: English
Status: Expired
For: Launchpad itself
Assignee: No assignee
Colin Watson (cjwatson) said (#1):

In your Debian bug report you say that this is obviously broken and mention RFC822 in support of this, but I can't find any such limit in RFC822. Would you mind being more specific about the standard that is violated by this long header line?

Thorsten Glaser (mirabilos) said (#2):

Ah, right, in 822 itself it’s only alluded to; in 2822 it’s a SHOULD (lines SHOULD be no more than 78 characters, excluding the CRLF). I generally say 822 when I mean 822/2822/5322/… because that’s the number I can remember.

RFC 2047 §2 does contain the relevant limitation:

   While there is no limit to the length of a multiple-line header
   field, each line of a header field that contains one or more
   'encoded-word's is limited to 76 characters.

Sorry about that.
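To check a generated header against the RFC 2047 limit quoted above, one can flag every line that contains an encoded-word and is longer than 76 characters. The sketch below is a hypothetical helper (“check_header_lines” is not an existing tool); it assumes the header section arrives on stdin with LF line endings (pipe CRLF input through “tr -d '\r'” first) and treats any “=?charset?B-or-Q?text?=” token as an encoded-word. Fed the Subject line from the question, it reports 106 characters; the encoded-word alone is 97 characters, which also exceeds RFC 2047’s separate 75-character limit on a single encoded-word.

```shell
#!/bin/sh
# Hypothetical helper: report header lines that contain an RFC 2047
# encoded-word and exceed 76 characters. Reads LF-terminated header
# lines on stdin, prints "line N: L characters" for each offender and
# returns non-zero if any were found.
check_header_lines() {
	awk '
		BEGIN { bad = 0 }
		# [?] matches a literal "?" portably; Q- and B-encoded text
		# cannot itself contain "?", hence [^?]*
		/=[?][^?]*[?][BbQq][?][^?]*[?]=/ && length($0) > 76 {
			printf "line %d: %d characters\n", NR, length($0)
			bad = 1
		}
		END { exit bad }
	'
}
```

In a hook, this could be run on the encoded value, e.g. “printf '%s\n' "$senc" | check_header_lines”, before the message is handed to the MSA.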

Launchpad Janitor (janitor) said (#3):

This question was expired because it remained in the 'Open' state without activity for the last 15 days.