Message size from GetMessageInfo doesn't match actual bytes retrieved using GetMessage

0 votes
asked May 24 by mtaber (220 points)

I've found that some of the messages downloaded from the Imap server aren't the same size as what the ImapMessageInfo says they're supposed to be. Initially, I was downloading the message to a MailMessage object and then writing it to disk like this:

var message = _imapClient.GetMailMessage(uniqueId);
message.Save(filePath);

Unfortunately, this parses the data returned into a MailMessage object. When you write the MailMessage to disk, the size is virtually always different from what GetMessageInfo says it should be.

Apparently the MailMessage parser automatically compensates for any oddities it runs into and reorders the resulting MimeMessage when you call message.Save. That actually makes sense, but I didn't see that called out in any of the tutorials, nor was it documented anywhere that I could find.

In an effort to save the messages in a way that the local filesize reflects what the server says, I bypassed using MailMessage as an intermediate object and wrote it directly to disk as follows:

var messageInfo = _imapClient.GetMessageInfo(uniqueId, ImapListFields.Envelope);
var fileLength = _imapClient.GetMessage(uniqueId, filePath);
Console.WriteLine($"ImapMessageInfo=[{messageInfo.Length}] bytes. Downloaded Message[{fileLength}] bytes");

This worked for the most part, but I found a message in the mailbox that the Imap server says is 2106 bytes via the ImapMessageInfo. When I download that same message using GetMessage, it comes out as 2137 bytes.

The result of the above code is:

ImapMessageInfo=[2106] bytes. Downloaded Message[2137] bytes

I downloaded this message directly from Gmail using the web client and their copy of the message is saved to disk as 2106 bytes so I'm fairly confident the size returned by GetMessageInfo is correct.

I did a diff on the files and found that while all of the headers look fine, the body of the directly downloaded message uses \r as the line terminator for everything after the message headers (the last 31 lines of the file). In the copy downloaded by Rebex, those same lines end in \r\n, which accounts for the 31-byte difference (2137 - 2106 = 31).
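To check this kind of discrepancy without a diff tool, the line terminators in each saved file can be counted directly. The following is a minimal sketch, not part of Rebex: the `LineEndingStats` class and its method names are hypothetical, and it assumes the whole message fits in memory.

```csharp
public static class LineEndingStats
{
    // Counts CRLF pairs, bare CR bytes and bare LF bytes in a raw buffer.
    public static (int CrLf, int BareCr, int BareLf) Count(byte[] data)
    {
        int crLf = 0, bareCr = 0, bareLf = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == (byte)'\r')
            {
                if (i + 1 < data.Length && data[i + 1] == (byte)'\n')
                {
                    crLf++;
                    i++; // skip the LF that belongs to this CRLF pair
                }
                else
                {
                    bareCr++;
                }
            }
            else if (data[i] == (byte)'\n')
            {
                bareLf++;
            }
        }
        return (crLf, bareCr, bareLf);
    }

    // Size the buffer would have if every bare CR or bare LF became CRLF.
    // Each bare terminator grows by exactly one byte.
    public static long SizeAfterCrLfNormalization(byte[] data)
    {
        var (_, bareCr, bareLf) = Count(data);
        return data.Length + bareCr + bareLf;
    }
}
```

Passing the result of `System.IO.File.ReadAllBytes(path)` for the 2106-byte copy to `SizeAfterCrLfNormalization` should give 2137 if the 31 single-byte terminators are the only difference between the two files.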

I added additional logging code to the Imap event handlers to see what was happening and Rebex is definitely retrieving 2137 bytes. Here's the output of that:

Command: R0000A UID FETCH 714995 (UID BODY.PEEK[])
Response: * 189271 FETCH (UID 714995 BODY[] {2137}
State[Downloading]: Transferred 0 bytes. Progress:0/2137 bytes
State[Downloading]: Transferred 24 bytes. Progress:24/2137 bytes
State[Downloading]: Transferred 154 bytes. Progress:154/2137 bytes
State[Downloading]: Transferred 412 bytes. Progress:412/2137 bytes
State[Downloading]: Transferred 926 bytes. Progress:926/2137 bytes
State[Downloading]: Transferred 1952 bytes. Progress:1952/2137 bytes
State[Downloading]: Transferred 2137 bytes. Progress:2137/2137 bytes
State[None]: Transferred 0 bytes. Progress:0/0 bytes
Response: ...2137 bytes...
Response: )

My best guess based on this is that these carriage-returns are being converted into CR-LF before they're sent from the server. I got the same result when I used GetMessage to write the data to a MemoryStream instead of a file.

I played around with the _imapClient.Encoding, but that didn't seem to make a difference in the number of bytes being downloaded.

Do you have any ideas for how to get the raw message or is this a known issue without a workaround because it's a problem on Google's side? Any other thoughts on how to keep the local file size the same as what's provided via ImapMessageInfo? Or for all practical purposes, does this not matter at all?

I had planned on using the file size to help figure out whether or not I had a complete and accurate copy of each individual message rather than having downloaded just the headers. But if downloading the entire message still results in a situation where the number of bytes doesn't match, that wouldn't work. Any ideas or help would be appreciated.

Applies to: Rebex Secure Mail

1 Answer

0 votes
answered May 27 by Lukas Matyska (55,470 points)

Your findings are correct. However, please note that the Rebex Imap client does not modify the SIZE response received from the server, and it does not modify the raw data received from the server either. So the behavior you are experiencing comes from the server.

RFC 3501 defines SIZE as:

The number of octets in the message, as expressed in [RFC-2822]
format.

But some servers do not respect this. For example, outlook.office365.com reports an internal size. It probably stores messages in a proprietary format different from [RFC-2822] and reports the size of the stored message (without converting the message into [RFC-2822] format).

Gmail seems to do something similar. It reports the internal size (the size of the stored message as received over SMTP). If you download the message using the Gmail API, it provides the raw message as stored at Gmail. But when you download the message over the IMAP protocol, Gmail normalizes line endings to CR-LF, so the downloaded data no longer matches the reported SIZE.

Regarding your use case:

  • If the GetMessage() method does not throw an exception, the message was downloaded successfully, so you don't need to bother with the message size.

  • What do you mean by "I had a complete and accurate copy of each individual message"?

    • If the IMAP server is working correctly, it sends the whole message when requested by the GetMessage() method.
    • To validate the correctness of the message, you should validate the message signature (of course, this only works if the messages are signed).
commented May 28 by mtaber (220 points)
>> What you mean by "I had a complete and accurate copy of each individual message"?
If I iterate over the messages and only call "GetMessageHeaders", I can get everything except the contents and attachments of the message. However I won't know the size of just the headers until I have downloaded them.

In the vast majority of cases, this size isn't going to match the total size of the complete message. When I compare the size on disk to the size reported by the Imap server, if they don't match, then I know that I've only retrieved the headers and that means I should download the entire message.

However, if the server is changing the size on the fly, then that logic is broken. What would happen is that the server would say 2106 and the disk would say 2137. Because they don't match, the natural assumption would be that I didn't download the complete message. (I understand that I could in theory say that if what I have on disk is larger, then I have the entire message, but I feel like that's a faulty assumption)

So instead, this means that I need to change the basis of my comparison and store additional metadata locally. Otherwise, if I use that discrepancy to determine if I need to download the entire message, I would end up downloading the same message over and over again. Instead, I could store something like:
long expectedSize, actualSizeOnDisk;
bool downloadedCompleteMessage;

What I don't like about that is that, with servers and clients violating the RFCs in so many ways, a message on the server might be changed or updated while keeping the same uid but with a different size. How would I know that the message was updated? For example, it was a draft message and it was updated with new content that should be downloaded.

I think the above mechanism will work: if the size reported by the Imap server differs from the stored expectedSize, ignore the actualSizeOnDisk, download the message, and then update the actualSizeOnDisk. If I need to do a filesystem integrity check against the metadata for each message where we have downloaded the complete message and not just the headers, I would use the actualSizeOnDisk rather than the expectedSize (which comes from the Imap server).
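A rough sketch of how that metadata and the two checks might look. The `LocalMessageRecord` and `SyncDecision` names are made up for illustration, and this assumes the record is persisted somewhere alongside the downloaded file:

```csharp
public sealed class LocalMessageRecord
{
    public string UniqueId { get; set; } = "";

    // Size the server reported (ImapMessageInfo.Length) when the message was downloaded.
    public long ExpectedSize { get; set; }

    // Number of bytes actually written to disk.
    public long ActualSizeOnDisk { get; set; }

    // True once the whole message, not just the headers, has been saved.
    public bool DownloadedCompleteMessage { get; set; }
}

public static class SyncDecision
{
    // Download again if there is no record, only headers were saved,
    // or the server now reports a different size than it did before.
    public static bool NeedsFullDownload(LocalMessageRecord entry, long serverReportedSize)
    {
        return entry is null
            || !entry.DownloadedCompleteMessage
            || entry.ExpectedSize != serverReportedSize;
    }

    // The local integrity check compares against what was written to disk,
    // never against the size reported by the server.
    public static bool PassesIntegrityCheck(LocalMessageRecord entry, long fileLengthOnDisk)
    {
        return entry.DownloadedCompleteMessage
            && entry.ActualSizeOnDisk == fileLengthOnDisk;
    }
}
```

With the message above, the record would hold ExpectedSize = 2106 and ActualSizeOnDisk = 2137, so a later pass that sees 2106 from the server again would not trigger another download.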
commented May 29 by Lukas Matyska (55,470 points)
Please note that messages on a true IMAP server cannot be modified.
If you update a Draft message, its UniqueId changes, so from the client's point of view it is equivalent to this situation: the old message was removed and a new (updated) message was added.

I have verified that Gmail behaves this way (true IMAP server).

You are correct that there are IMAP servers that violate the RFC. You may encounter an "IMAP" server that can modify existing messages.

I think that in your case using the downloadedCompleteMessage metadata is the best solution.
If it is possible for you, you can do it through file naming. Example:
 - if you download only message headers, name the file like "Head-XYZ.eml" or "XYZ.head.eml"
 - if you download whole message, name the file like "Msg-XYZ.eml" or "XYZ.msg.eml"
...