How to detect the charset of a text message body

+1 vote
asked Jan 26, 2010 by Daniel Spurny (400 points)
edited Jan 26, 2010

To store a message body in a database, I need to know the encoding (charset) of the message body. I am using the S/MIME MailMessage object to work with messages. This object has a DefaultCharset property, but after MailMessage.Load(filename) the property is empty. In the saved message, I see this at the beginning of the MIME body part:

    Content-Type: text/plain; charset=ISO-8859-2; format=flowed
    Content-Transfer-Encoding: quoted-printable

How do I get the charset information for the body part?

Applies to: Rebex Secure Mail

1 Answer

0 votes
answered Jan 26, 2010 by Lukas Pokorny (102,130 points)
edited Jan 26, 2010
 
Best answer

To get charset information for message body parts, use the MailMessage object's AlternateViews collection to access the body parts, and read the charset from each part's ContentType property:

C#:

Encoding encoding = mail.AlternateViews[i].ContentType.Encoding;

VB.NET:

Dim encoding As Encoding = mail.AlternateViews(i).ContentType.Encoding

However, if you simply use the MailMessage object's BodyText and BodyHtml properties to retrieve the body, you get a .NET string that has already been decoded.

(The MailMessage object's DefaultCharset property is only used when constructing and saving e-mail messages.)

commented Jan 26, 2010 by Daniel Spurny (400 points)
Thank you for the quick answer. You say the BodyText and BodyHtml properties are already decoded. Into which charset are they decoded?
commented Jan 27, 2010 by Lukas Pokorny (102,130 points)
In .NET, all strings (instances of the System.String class) use Unicode internally. From the application's point of view, this doesn't really matter:
- BodyText and BodyHtml are .NET strings. They have been decoded from the mail message body using the appropriate charset.
- If your database (and its .NET provider) is capable of storing any System.String value, you don't need to specify or store any charset info.
- If you are storing strings as byte arrays instead, you are free to use any charset, as long as it can represent all Unicode characters (so both UTF-8 and UTF-7 are suitable).
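The point about byte arrays can be illustrated with a minimal sketch in plain .NET (no Rebex types involved). The class name CharsetRoundTrip, the helper RoundTrip and the sample text are hypothetical and only stand in for a decoded BodyText value. The sketch assumes the default replacement behavior of Encoding.ASCII, which substitutes '?' for characters it cannot represent.

```csharp
using System;
using System.Text;

public static class CharsetRoundTrip
{
    // Encodes a string to bytes with the given charset and decodes it back.
    // Hypothetical helper; it mimics "store as byte array, then read back".
    public static string RoundTrip(string text, Encoding charset)
    {
        byte[] stored = charset.GetBytes(text);
        return charset.GetString(stored);
    }

    public static void Main()
    {
        // Stand-in for an already-decoded message body (Czech sample text,
        // written with escapes so the source file encoding does not matter).
        string body = "P\u0159\u00EDli\u0161 \u017Elu\u0165ou\u010Dk\u00FD k\u016F\u0148";

        // UTF-8 can represent every Unicode character, so nothing is lost.
        Console.WriteLine(RoundTrip(body, Encoding.UTF8) == body);   // True

        // ASCII cannot represent the accented characters; they come back as '?'.
        Console.WriteLine(RoundTrip(body, Encoding.ASCII) == body);  // False
    }
}
```

If the database column already stores System.String values directly, none of this is needed; the round trip only matters when the application converts the body to bytes itself.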
commented Jan 28, 2010 by Daniel Spurny (400 points)
Thank you, this is exactly what I needed to know. ;)