+1 vote
by (210 points)

I'd like to load an EML file and get a .NET string representing its raw contents after it has been read with the correct text encoding.

This works for a simple example:

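using System.IO;
using System.Text;
using Rebex.Mime;

// Load the message and re-save it to memory to obtain its raw bytes.
MimeMessage msg = new MimeMessage();
msg.Load(filename);
MemoryStream ms = new MemoryStream();
msg.Save(ms);
// Pick a single charset: the message's own, else that of the first part.
Encoding encoding = msg.Charset;
if (encoding == null && msg.Parts.Count > 0)
    encoding = msg.Parts[0].Charset;
// Fall back to ASCII so a missing charset does not cause a NullReferenceException.
string s = (encoding ?? Encoding.ASCII).GetString(ms.ToArray());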

But it will not work for files that use mixed character sets. I also looked at MimeEntity.ContentString, but that no longer includes the headers and boundaries.

Is there any way to get the entire, parsed message contents as one string?

Applies to: Rebex Secure Mail

1 Answer

0 votes
by (144k points)
 
Best answer

This is a difficult question to answer! :-)

If all parts of the message use only the 7bit, quoted-printable, or base64 content-transfer-encodings and all headers are properly encoded (that is, standard-compliant), the saved message consists entirely of ASCII bytes, so you can simply call Encoding.ASCII.GetString(ms.ToArray()) on the contents of the MemoryStream.
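
Whether a given message falls into this simple case can be checked on the raw bytes before decoding. The following is only a minimal sketch: the class and method names are hypothetical and not part of Rebex, and it relies on the fact that Encoding.ASCII.GetString replaces any byte above 0x7F with '?' instead of failing.

```csharp
using System;
using System.Text;

static class RawMessageText
{
    // Hypothetical helper: returns the raw message as a string only if every
    // byte is 7-bit ASCII; otherwise returns null so the caller can choose
    // another strategy instead of silently getting '?' characters.
    public static string TryGetAsciiString(byte[] raw)
    {
        foreach (byte b in raw)
        {
            if (b > 0x7F)
                return null; // 8-bit data present: ASCII decoding would be lossy
        }
        return Encoding.ASCII.GetString(raw);
    }
}
```

With the MemoryStream from the question, this would be called as RawMessageText.TryGetAsciiString(ms.ToArray()); a null result means the message contains 8-bit data and needs one of the approaches discussed below.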

If one or more message parts use 8bit or binary content-transfer-encodings and a non-ASCII charset, things get very complicated. To get a single .NET string that contains all of these parts (each using a different charset), you would have to decode each part using a different charset. This process would produce a representation of the message contents, but it could no longer be called raw. Saving that string into a file would result in a broken message if you attempted to parse it later (unless you exactly reproduce the original charsets in different message parts).
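
To illustrate why such a string is no longer raw, here is a small self-contained sketch that uses only the .NET base class library. The two byte arrays stand in for two message parts with different charsets; the class name and sample texts are made up for the demonstration.

```csharp
using System;
using System.Linq;
using System.Text;

static class MixedCharsetDemo
{
    // Decodes two "parts" with their own charsets, joins them into one string,
    // then re-encodes that string with a single charset and reports whether
    // the original bytes come back.
    public static bool RoundTrips(Encoding single)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Part 1 is ISO-8859-1: U+00E9 is the single byte 0xE9.
        byte[] part1 = latin1.GetBytes("caf\u00E9");
        // Part 2 is UTF-8: U+00EF is the two bytes 0xC3 0xAF.
        byte[] part2 = Encoding.UTF8.GetBytes("na\u00EFve");
        byte[] original = part1.Concat(part2).ToArray();

        // Decode each part using its own charset, as described above.
        string text = latin1.GetString(part1) + Encoding.UTF8.GetString(part2);

        // No single charset reproduces both parts byte for byte.
        return single.GetBytes(text).SequenceEqual(original);
    }
}
```

RoundTrips returns false for both Encoding.UTF8 and ISO-8859-1 here: the combined string is readable, but writing it back with any one encoding changes at least one of the parts.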

So this leaves us with two options:

1) Get a .NET string representation of the message contents. This would be somewhat readable, but not necessarily a raw representation, as explained above. Also, when saved to a file, it would not produce a MIME-compliant message identical to the original one.

2) Find all message parts that use 8bit or binary content-transfer-encoding and change it to quoted-printable or base64. Find all improperly-encoded headers and fix them to be MIME-compliant. Then, save the message into a MemoryStream and call Encoding.ASCII.GetString(ms.ToArray()) to get a .NET string representation of the e-mail. The drawback of this is that the string will be the raw representation of the "fixed" message, not the original one.
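
A rough sketch of the first half of option 2 might look like the following. It is not a tested implementation: it assumes that MimeEntity exposes a settable TransferEncoding property with a TransferEncoding enum containing EightBit, Binary and Base64 members (please verify this against the Rebex documentation), the class and method names are hypothetical, and it does not repair improperly encoded headers or descend into embedded message/rfc822 parts.

```csharp
using System.IO;
using System.Text;
using Rebex.Mime;

static class SevenBitNormalizer
{
    // Recursively switches 8bit/binary leaf parts to base64.
    // ASSUMPTION: MimeEntity.TransferEncoding is settable and uses the
    // Rebex.Mime.TransferEncoding enum; check the Rebex documentation.
    static void Normalize(MimeEntity entity)
    {
        if (entity.Parts.Count > 0)
        {
            // Multipart entity: only its child parts carry encoded content.
            foreach (MimeEntity part in entity.Parts)
                Normalize(part);
        }
        else if (entity.TransferEncoding == TransferEncoding.EightBit ||
                 entity.TransferEncoding == TransferEncoding.Binary)
        {
            entity.TransferEncoding = TransferEncoding.Base64;
        }
    }

    // Loads the file, normalizes it, and returns the "fixed" message as text.
    public static string ToAsciiString(string filename)
    {
        MimeMessage msg = new MimeMessage();
        msg.Load(filename);
        Normalize(msg);
        using (MemoryStream ms = new MemoryStream())
        {
            msg.Save(ms);
            // Lossless only if the headers are also pure ASCII.
            return Encoding.ASCII.GetString(ms.ToArray());
        }
    }
}
```

Base64 is used here because it is safe for any content; quoted-printable would keep mostly-ASCII text parts more readable in the resulting string.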

If you like either of these options, please let me know. We can write some code that demonstrates how to do it.

By the way, why do you need a string representation of a MIME message? An answer to that question might help in finding the best solution.

by (210 points)

Thanks for the detailed answer. I think you're right: it's not reasonable to keep the message around as a simple string. Instead, we will use your component to store all the parts in the right encodings as a MimeMessage object, so that everything is kept in place.

...