Chinese encoding not accepted by this server
Chinese encoding? I checked the data being sent: there should never have been any non-Latin characters going into the emails. Maybe the application was using the wrong encoding for some reason? But I couldn't find any evidence of that either.
At this point I decided to try to find out under what circumstances the mail server would give this response. We were using Postfix, and after a bit of searching I discovered a file containing header checks that the server ran against every email in an attempt to trap spam messages. Partway into the file I found the line:
/^Subject: =?big5?/ REJECT Chinese encoding not accepted by this server
OK, so the email was being caught by this check. The line is a regular expression that is matched against the email headers, followed by the action to take and finally the message to return. I knew from dealing with email encoding in the past that Big5 is a double-byte Chinese character set, and that the question marks and equals signs are part of the encoded-word syntax for specifying the encoding (RFC 2047). For example, a subject header containing German characters encoded in Latin-1 might look like:
Subject: =?latin-1?q?Gro=DFbritannien?=
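To see what that encoded word carries, here is a minimal sketch in Python (not the JavaMail code the application used) that decodes it with the standard library's email.header module. The helper name decode_subject is made up for illustration.

```python
from email.header import decode_header

# The encoded word from the example above: charset "latin-1",
# "q" (quoted-printable style) encoding, then the payload.
raw_subject = "=?latin-1?q?Gro=DFbritannien?="


def decode_subject(value: str) -> str:
    """Decode RFC 2047 encoded words in a header value (illustrative helper)."""
    parts = []
    for payload, charset in decode_header(value):
        if isinstance(payload, bytes):
            # Encoded words come back as bytes plus the declared charset.
            parts.append(payload.decode(charset or "ascii"))
        else:
            # A header with no encoded words is returned as a plain str.
            parts.append(payload)
    return "".join(parts)


print(decode_subject(raw_subject))
```

The =DF escape is the Latin-1 byte for "ß", so the decoded subject reads "Großbritannien".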
But the application should have been encoding the email headers as UTF-8 - I was setting that encoding explicitly in JavaMail - where had Big5 appeared from, all of a sudden? I looked again at one of the emails that was being rejected by the server. This time I noticed that the email subject began with the word "Big". Surely that couldn't be a coincidence?
Then it dawned on me. The regular expression didn't match what its author thought it did because, of course, '?' has a special meaning in regular expressions: it means "zero or one of the preceding character". Instead of matching the literal sequence of equals and question mark symbols, the expression was matching start-of-line, "Subject: ", an optional equals sign, "big" and an optional "5". Postfix header checks are also case-insensitive by default, so "Big" counted as well. In other words, any email subject starting with the word "big" would trigger the rejection message. Nice one.
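The behaviour is easy to reproduce outside Postfix. The sketch below uses Python's re module as a stand-in for the mail server's regex engine, with re.IGNORECASE to mimic the case-insensitive default of Postfix header checks. The sample subjects and the Base64 payload "AAAA" are invented for illustration.

```python
import re

# The unescaped pattern from the header checks file. Each '?' makes the
# character before it optional instead of matching a literal question mark.
UNESCAPED = re.compile(r"^Subject: =?big5?", re.IGNORECASE)


def is_rejected(header_line: str) -> bool:
    """Return True if the unescaped rule would fire on this header line."""
    return UNESCAPED.search(header_line) is not None


samples = [
    "Subject: Big savings this week",  # ordinary English subject
    "Subject: =?big5?B?AAAA?=",        # genuine Big5 encoded word (dummy payload)
    "Subject: Quarterly report",       # unrelated subject
]

for line in samples:
    print(f"{is_rejected(line)!s:5}  {line}")
```

Under these assumptions the first sample is rejected and the other two pass. That includes the genuine Big5 header: after the optional "=", the pattern expects "b" but finds a literal "?", so the rule misses exactly the subjects it was written to catch.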
A little extra escaping with some backslashes solved the problem:
/^Subject: =\?big5\?/ REJECT Chinese encoding not accepted by this server
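A quick check of the escaped pattern, again using Python's re module as a stand-in for the Postfix regex engine and re.IGNORECASE for its case-insensitive default. The sample headers and their payloads are made up for illustration.

```python
import re

# Escaped pattern: '\?' now matches a literal question mark.
ESCAPED = re.compile(r"^Subject: =\?big5\?", re.IGNORECASE)


def matches_big5(header_line: str) -> bool:
    """Return True only for subjects that open with a Big5 encoded word."""
    return ESCAPED.search(header_line) is not None


# Encoded words that declare Big5 are caught, whatever the letter case...
assert matches_big5("Subject: =?big5?B?AAAA?=")
assert matches_big5("Subject: =?Big5?Q?abc?=")

# ...while plain subjects that merely start with "Big" now pass through.
assert not matches_big5("Subject: Big savings this week")
```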
You can't do big business and make big profits if your mail server is trapping big false-positives!