So I was diagnosing a problem at work. Automated emails were failing to send, but only in some cases. My investigation quickly revealed that the mail server was refusing to deliver them, instead giving back an error code and the following curious message:

Chinese encoding not accepted by this server

Chinese encoding? I checked the data being sent - there should never have been any kind of non-Latin character data going into the emails. Maybe the application was using the wrong encoding for some reason? But I couldn't find any evidence for this either.

At this point I decided to try to find out under what circumstances the mail server would give this response. We were using Postfix, and after a bit of searching I discovered a file containing header checks that the server ran against every email in an attempt to trap spam messages. Partway into the file I found the line:

/^Subject: =?big5?/    REJECT Chinese encoding not accepted by this server

Ok, so the email was being caught by this check. It appeared to be a regular expression that was being checked against the email headers, followed by the action to take, and finally the message to return. I knew from dealing with email encoding in the past that Big5 was a double-byte Chinese character set, and that the question marks and equals sign were part of the syntax for specifying the encoding (RFC 2047). For example, a subject header containing German characters encoded in Latin-1 might look like:

Subject: =?latin-1?q?Gro=DFbritannien?=
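To make the encoded-word syntax concrete, here is a minimal sketch that decodes the value of that example header using Python's standard email.header module. Python is only used for illustration here (the application in this story used JavaMail), and the variable names are my own.

```python
from email.header import decode_header

# The encoded word from the example header above:
# =?<charset>?<encoding>?<encoded text>?=   ("q" means quoted-printable style)
raw = "=?latin-1?q?Gro=DFbritannien?="

# decode_header returns a list of (bytes, charset) pairs; this header has one.
[(payload, charset)] = decode_header(raw)
text = payload.decode(charset)

print(charset, text)  # latin-1 Großbritannien
```

The =DF in the encoded text is the Latin-1 byte for the German sharp s.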

But the application should have been encoding the email headers as UTF-8 - I was setting that encoding explicitly in JavaMail. So where had Big5 suddenly come from? I looked again at one of the emails that was being rejected by the server. This time I noticed that the email subject began with the word "Big". Surely that couldn't be a coincidence?

Then it dawned on me. The regular expression didn't match what the author thought it did, because '?' has a special meaning in regular expressions: it means "zero or one of the preceding character". Instead of matching the literal sequence of equals and question mark symbols, the expression was matching start-of-line, "Subject: ", an optional equals, "big" and an optional "5". Postfix header checks match case-insensitively by default, so any email subject starting with the letters "big", regardless of case, would trigger the rejection message. Nice one.
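The behaviour is easy to reproduce outside the mail server. The sketch below runs the unescaped pattern through Python's re module, which is not the engine Postfix uses but treats '?' the same way for this pattern. The sample subjects are invented for illustration, and re.IGNORECASE is my assumption to mirror the case-insensitive matching of the header checks.

```python
import re

# The pattern exactly as it appeared in the header checks file (unescaped).
buggy = re.compile(r"^Subject: =?big5?", re.IGNORECASE)

# Hypothetical sample headers, not taken from the real rejected emails.
subjects = [
    "Subject: Big savings this week",   # plain ASCII subject
    "Subject: =?big5?B?pKSk5Q==?=",     # a genuinely Big5-encoded subject
    "Subject: Quarterly report",        # unrelated subject
]

buggy_hits = [bool(buggy.search(s)) for s in subjects]
print(buggy_hits)  # [True, False, False]
```

Note the second result: as written, the pattern not only rejects innocent "big" subjects, it also appears to let a real Big5 encoded word through, because after the optional "=" it expects "b" but finds a literal "?".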

A little extra escaping with some backslashes solved the problem:

/^Subject: =\?big5\?/    REJECT Chinese encoding not accepted by this server
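As a quick sanity check of the escaped version, here is a sketch in Python under the same assumptions as before: invented sample subjects, re.IGNORECASE standing in for the case-insensitive header checks, and Python's re module rather than Postfix itself. It also shows re.escape, which is one way to avoid hand-escaping a literal string when the pattern is built in code.

```python
import re

# Hand-escaped pattern, as in the corrected header check.
fixed = re.compile(r"^Subject: =\?big5\?", re.IGNORECASE)

# Equivalent pattern built by letting re.escape quote the literal part.
fixed_auto = re.compile("^Subject: " + re.escape("=?big5?"), re.IGNORECASE)

# Hypothetical sample headers for illustration.
subjects = [
    "Subject: Big savings this week",
    "Subject: =?big5?B?pKSk5Q==?=",
    "Subject: Quarterly report",
]

fixed_hits = [bool(fixed.search(s)) for s in subjects]
auto_hits = [bool(fixed_auto.search(s)) for s in subjects]
print(fixed_hits)  # [False, True, False]
```

Only the genuinely Big5-encoded subject is matched now, and the "Big savings" email gets through.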

You can't do big business and make big profits if your mail server is trapping big false-positives!