We have a .NET Core application running in a container that uses the IMAP client to retrieve a large number of messages from a folder.
The approach is roughly like this:
- create a new Imap instance, connect and log in
- SelectFolder
- get TotalMessageCount
- call imap.GetMessages(ImapMessageSet.All, ImapListFields.Envelope, messageReceiver)
- in messageReceiver we read the Info property and also call ToMailMessage(). The message details are written to a DataTable, which is periodically flushed to downstream services and cleared.
We have an HMail test server set up with 100k randomly generated test emails.
What we find is that memory usage builds up over time.
Using ANTS Memory Profiler, we see the following:
- Based on two snapshots (at approx 10k and 50k emails, which is approx 2min and 8min30s), we see an increase of approx 47MB of strings.
- Based on two snapshots (at approx 10k and 90k emails, which is approx 1min42s and 18min50s), we see an increase of approx 94MB of strings.
In both cases that works out to roughly 1.2KB of retained strings per message, so the growth looks linear in the number of messages retrieved.
Most of these strings seem to have email info (not content) e.g.
- 14244 FETCH (UID 14244 RFC822.SIZE 2146 FLAGS () INTERNALDATE "12-Oct-2017 19:02:45 +0100" ENVELOPE ("Thu, 12 Oct 2017 18:02:45 +0100" "fishcake luctus vehicula" (("x@y.com" NIL "admin" "test-server.com")) (("admin@test-server.com" NIL "admin" "test-server.com")) (("admin@test-server.com" NIL "admin" "test-server.com")) (("tester@test-server.com" NIL "tester" "test-server.com")) NIL NIL NIL "<30bf5a14b2451c5b65bfdfd053f04a4b@test-server>") BODY[] {14243})
It appears that these strings are not eligible for GC until after the client has disconnected and cleaned up.
Additionally, the memory profiler instance retention graph seems to show that the Imap client is holding onto these items:
Imap -> _items (List) -> System.Object[] -> System.String
This is a problem for us because we could run out of memory in our container, for example if there is an unusually large load or there are multiple concurrent loads.
So the questions are:
1) Are we using the client sensibly? Is there any way we can reduce memory consumption out of the box?
2) Should we consider 'batching up' when we have a large number of emails to retrieve? For example, check the total message count up front, split the requests into batches of (say) 1k emails, and dispose and re-create the Imap client after each batch?
3) Something else? Ideally we do not want to impose a low maximum limit on the number of emails that can be retrieved in any single load.
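To make option 2 concrete, here is a minimal sketch of how we would split the load into batches. It only computes the 1-based, inclusive sequence-number ranges from the total message count; `BatchPlanner` and `GetBatches` are hypothetical names of our own, not part of the client library, and the per-batch work (create the client, fetch the range, dispose the client) is left as a comment because we have not confirmed how the client accepts a range.

```csharp
using System;
using System.Collections.Generic;

public static class BatchPlanner
{
    // Hypothetical helper (not a library API): splits the sequence numbers
    // 1..totalCount into consecutive inclusive ranges of at most batchSize.
    public static IEnumerable<(int First, int Last)> GetBatches(int totalCount, int batchSize)
    {
        // Validate eagerly, before the lazy iterator starts running.
        if (totalCount < 0) throw new ArgumentOutOfRangeException(nameof(totalCount));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(totalCount, batchSize);
    }

    private static IEnumerable<(int First, int Last)> Iterate(int totalCount, int batchSize)
    {
        // long arithmetic avoids overflow when totalCount is close to int.MaxValue.
        for (long first = 1; first <= totalCount; first += batchSize)
        {
            long last = Math.Min(first + batchSize - 1, totalCount);
            yield return ((int)first, (int)last);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // 100k messages in batches of 1k, matching our test server.
        foreach (var (first, last) in BatchPlanner.GetBatches(100_000, 1_000))
        {
            // Assumed per-batch work, not shown here:
            //   create and connect a new client, select the folder,
            //   fetch messages first..last, then dispose the client.
            Console.WriteLine($"{first}:{last}");
        }
    }
}
```

One caveat we are aware of: sequence numbers shift if messages are expunged between batches, so ranges based on UIDs would probably be safer if the folder can change during a load.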
Thanks in advance,
Nick