Extracting HTML & embedded images

0 votes
asked Jan 18 by ys (120 points)

Hi,

With this code:

var mail = new Rebex.Mail.MailMessage();
mail.Load(@"c:\temp\some-email-with-html-content.eml");
File.WriteAllText(@"c:\temp\some-email-with-html-content.htm", mail.BodyHtml);

I can extract the HTML, but the embedded base64-encoded images in the .eml file are not saved as embedded images in the .htm file (they are still referenced as cid:...)

Is there a way to do this?

Thanks!

1 Answer

+1 vote
answered Jan 18 by Lukas Matyska (39,480 points)

The embedded images are stored in the MailMessage.Resources collection.

To convert an HTML mail to an ordinary HTML page, you need to modify the HTML mail body: each cid:ID reference has to be replaced with an appropriate string. You can either extract the embedded images to files and use the filename instead of cid:ID, or embed the image data directly into the HTML page like this:

// requires: using System; using System.IO;
foreach (var res in mail.Resources)
{
    // skip resources that have no Content-ID or are not images
    if (res.ContentId == null || 
        res.MediaType == null || 
        !res.MediaType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
        continue;

    // read the decoded image content into memory
    byte[] data;
    using (var ms = new MemoryStream())
    using (var content = res.GetContentStream())
    {
        content.CopyTo(ms);
        data = ms.ToArray();
    }

    string cidString = string.Format("cid:{0}", res.ContentId.Id);
    string dataString = string.Format("data:{0};base64,{1}", res.MediaType, Convert.ToBase64String(data));

    // replace image link (cid:) with image data (data:)
    mail.BodyHtml = mail.BodyHtml.Replace(cidString, dataString);
}

Please note that this code only shows the general approach. To make it robust, you should handle letter case and whitespace in the "cid:" string.
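As a rough illustration of that last point, here is a minimal sketch of a more tolerant replacement. CidReplacer and ReplaceCid are hypothetical names, not part of the Rebex API. The sketch assumes that "cid:" may appear in any letter case, that whitespace may surround the colon, and that the reference ends at a quote, whitespace, bracket or the end of the text. It does not handle URL-encoded Content-IDs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Rebex): swaps one cid: reference for a data: URI.
public static class CidReplacer
{
    // contentId is the Content-ID without angle brackets, e.g. res.ContentId.Id
    public static string ReplaceCid(string html, string contentId, string mediaType, byte[] data)
    {
        string dataUri = "data:" + mediaType + ";base64," + Convert.ToBase64String(data);

        // "cid" in any letter case, optional whitespace around the colon,
        // then the escaped Content-ID. The lookahead requires a delimiter
        // (or end of text) so that "img1" does not also match "img10".
        string pattern = @"cid\s*:\s*" + Regex.Escape(contentId) + @"(?=[\s""'<>)]|$)";

        // A match evaluator is used so the replacement text is inserted literally.
        // Note: IgnoreCase also makes the Content-ID comparison case-insensitive.
        return Regex.Replace(html, pattern, m => dataUri, RegexOptions.IgnoreCase);
    }
}
```

Inside the loop above, the plain Replace call could then become something like mail.BodyHtml = CidReplacer.ReplaceCid(mail.BodyHtml, res.ContentId.Id, res.MediaType, data);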

...