+1 vote
by (360 points)
edited

We are having an issue reading mails using Rebex. The mail is on a Courier IMAP server, the file looks like this:

Status: RO
From: "John Johnson" <john.johnson@company.com>
Subject: A mail
To: Peter Peterson; jack.jackson@company.com
Date: Mon, 29 Sep 2014 12:00:00 +0000
Message-Id: <ABC@company.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

This is a test

Notice that the first contact on the To line has no email address. We are not sure how this happened, but we suspect it's an MS Outlook thing. When we parse the mail using Rebex, it returns only one contact for To:

  • Address: "PeterPeterson; jack.jackson"@company.com
  • Displayname: (empty string)
  • User: PeterPeterson; jack.jackson

Is this a bug or is it intended behaviour?

Applies to: Rebex Secure Mail
by (58.9k points)
edited

A workaround for parsing such malformed headers was added in version 2014R3 of Rebex Secure Mail.

2 Answers

+1 vote
by (58.9k points)
edited
 
Best answer

Hello,

This is a bit unfortunate. Various email clients (including Outlook) use a semicolon to delimit recipients' email addresses in the GUI when composing a message. When the message is sent, however, the addresses are actually delimited by commas, as you can check by inspecting the headers of a received email with two recipients.

The Rebex parser therefore uses a comma to separate the email addresses in the To, CC and BCC headers, as defined by Section 3.4 of RFC 2822. I am sorry, but we cannot treat the semicolon as a valid delimiter here because it would certainly break a lot of other properly formed email headers. Moreover, the semicolon already has another special meaning: it is used in the group address syntax.

To sum up, the problem here is the semicolon. If there were a comma instead, Rebex Secure Mail would parse everything correctly:

To: Peter Peterson, jack.jackson@company.com

However, your example revealed an opportunity to further improve our parser, which we will look into. The opportunity lies in the fact that the username in username@domain.com will almost certainly not include spaces. If we take advantage of this fact, we would start treating your header like this:

  • Address: jack.jackson@company.com
  • Displayname: PeterPeterson;
  • User: jack.jackson

We know this is not perfect, as there will be only one resulting MailAddress in the collection, but it is the most we can do with the improper header you encountered. Moreover, consider a slight modification of your original header, with only the name changed:

To: Jack Jackson; jack.jackson@company.com

Here it is clear that deciding whether to treat the two entities as separate recipients or as a single one is a more complicated task, which is beyond the reach of the Rebex Secure Mail component.
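The ambiguity can be illustrated with a minimal sketch. It assumes a purely structural heuristic (a part "looks like" an address if it contains '@' and no whitespace). The class and method names are hypothetical, and this is neither part of Rebex Secure Mail nor how its parser works:

```csharp
using System;
using System.Collections.Generic;

static class RecipientGuess
{
    // Hypothetical heuristic: a part looks like a bare address
    // if it contains '@' and no whitespace.
    public static bool LooksLikeAddress(string part)
    {
        string s = part.Trim();
        return s.Contains("@") && s.IndexOfAny(new[] { ' ', '\t' }) < 0;
    }

    // Splits the header value on ';' and returns the parts
    // that carry no address at all.
    public static List<string> PartsWithoutAddress(string headerValue)
    {
        var result = new List<string>();
        foreach (string part in headerValue.Split(';'))
        {
            string s = part.Trim();
            if (s.Length > 0 && !LooksLikeAddress(s))
            {
                result.Add(s);
            }
        }
        return result;
    }
}
```

For both "Peter Peterson; jack.jackson@company.com" and "Jack Jackson; jack.jackson@company.com" this returns a single leftover part (the name). The structure alone does not reveal whether that name is a separate recipient without an address or the display name of the address that follows.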

by (360 points)
edited

You are correct, changing the semicolon to a comma fixes everything. I guess I'll have to edit the mail files before parsing them.

I'm not sure about the improvement you are suggesting. This would fix the symptoms but not the problem itself, making it less visible and possibly more dangerous.

Another oddity is that the space between Peter and Peterson was lost, but the space between ; and jack remained. This is not a problem for us, since using a comma also fixed this; I just find it odd.

+1 vote
by (144k points)
edited

I'm afraid the problem itself can only be fixed by fixing the application that generated that e-mail in the first place. Using the ';' character to delimit recipients in actual message headers is wrong. Although treating it as ',' would parse your header into two recipients, it's easy to come up with other cases where that would be wrong (as described by Tomas at the end of his reply).

That said, there is no correct way to parse invalid e-mail headers. What we are trying to do now is decide what the least-bad way is. At this point, we are still discussing possible options, and any suggestions are welcome.

Another option available to you is to bypass our parser and work directly with the raw form of the To header, which you can access like this:

string rawTo = message.Headers["To"].Raw;

Then you can modify the value and parse it into MailAddressCollection, like this:

MailAddressCollection to = new MailAddressCollection(rawTo.Replace(';', ','));

This way, you will end up with two recipients. (Of course, one might come up with many cases where doing this would produce incorrect results.)
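One of those cases is a semicolon inside a quoted display name, such as "Doe; John". As a minimal sketch of a slightly more careful pre-processing step, the helper below replaces ';' with ',' only outside double quotes and outside angle brackets. The class and method names are hypothetical and the code is not part of Rebex Secure Mail:

```csharp
using System;
using System.Text;

static class ToHeaderFixer
{
    // Hypothetical helper: replaces ';' with ',' only outside
    // double-quoted strings and outside <...> angle brackets.
    public static string ReplaceSemicolons(string rawHeader)
    {
        var sb = new StringBuilder(rawHeader.Length);
        bool inQuotes = false;
        bool inAngle = false;
        bool escaped = false;

        foreach (char c in rawHeader)
        {
            if (escaped)
            {
                // The character after a backslash inside quotes is copied as-is.
                escaped = false;
                sb.Append(c);
                continue;
            }
            if (inQuotes && c == '\\')
            {
                escaped = true;
                sb.Append(c);
                continue;
            }

            if (c == '"') inQuotes = !inQuotes;
            else if (!inQuotes && c == '<') inAngle = true;
            else if (!inQuotes && c == '>') inAngle = false;

            // Only an unquoted, unbracketed semicolon is treated as a delimiter.
            sb.Append(!inQuotes && !inAngle && c == ';' ? ',' : c);
        }
        return sb.ToString();
    }
}
```

The result of ToHeaderFixer.ReplaceSemicolons(rawTo) could then be passed to the MailAddressCollection constructor in place of rawTo.Replace(';', ','). This still breaks well-formed group addresses, which legitimately end with ';', so it is only suitable when you know the headers come from a source that misuses the semicolon.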

(Note: Although a lot of mail clients such as Outlook use ';' in their UI, they do in fact properly emit ',' when saving or sending the message.)

...