0 votes
by (190 points)

We use Rebex.FileServer v7.0.8816 to run an SFTP server. It worked well until last week, when a memory leak occurred.

Normally the service uses about 500 MB of memory. When files are being uploaded, memory usage quickly rises to 25 GB and is not released automatically.

We generated a core dump and found a large number of byte arrays in it. The leak seems to be caused by byte arrays that are not released properly.

The dumpheap and gcroot output is shown below:
Result of: dumpheap -stat

7f6166fe8ec8  19,271        770,840 System.RuntimeType
7f6168c83258  15,997      1,279,760 System.Signature
7f6167097bc8  18,323      1,485,448 System.SByte[]
7f6166feb0f8   3,631      1,495,328 System.Object[]
7f616799b188  21,921      2,279,784 System.Reflection.RuntimeMethodInfo
7f616f858730       2      4,194,352 System.Tuple<System.ArraySegment<System.Byte>, System.Action<System.Exception>>[]
7f61670e13d0     401      9,081,248 System.Char[]
7f616f859918 348,422     11,149,504 vsfpj.hyghk+bkosq
7f616f858188 348,422     13,936,880 System.Tuple<System.ArraySegment<System.Byte>, System.Action<System.Exception>>
7f616709d2e0  59,390     19,074,870 System.String
7f616f857cb8 348,422     22,299,008 System.Action<System.Exception>
7f616f859720 348,443     25,087,896 vsfpj.pudhe+vuuht
55913e109de0 811,274  3,057,897,968 Free
7f616783bdc0 709,006 22,892,701,011 System.Byte[]
Total 3,343,412 objects, 26,079,474,739 bytes

Result of: dumpheap -mt 7f616783bdc0

7f61357c0070     7f616783bdc0         65,573
7f61357d0098     7f616783bdc0             32
7f61357d0218     7f616783bdc0         65,573
7f61357e0240     7f616783bdc0             32
7f61357e03a8     7f616783bdc0         65,573
7f61357f03d0     7f616783bdc0             32
7f61357f0538     7f616783bdc0         65,573
7f6135800560     7f616783bdc0             32
7f61358006c8     7f616783bdc0         65,573
7f61358106f0     7f616783bdc0             32
7f6135810858     7f616783bdc0         65,573
7f6135820880     7f616783bdc0             32
7f61358209e8     7f616783bdc0         65,573
7f6135830a10     7f616783bdc0             32
7f6135830b78     7f616783bdc0         65,573
7f6135840ba0     7f616783bdc0             32
7f6135840d08     7f616783bdc0         65,573
7f6135850d30     7f616783bdc0             32
7f6135850e98     7f616783bdc0         65,573
7f6135860ec0     7f616783bdc0             32
7f6135861028     7f616783bdc0         65,573
7f6135871050     7f616783bdc0             32
7f61358711b8     7f616783bdc0         65,573
7f61358811e0     7f616783bdc0             32
7f6135881348     7f616783bdc0         65,573
7f6135891370     7f616783bdc0             32
7f61358914d8     7f616783bdc0         65,573
7f61358a1500     7f616783bdc0             32
7f61358a1668     7f616783bdc0         65,573
7f61358b1690     7f616783bdc0             32
7f61358b17f8     7f616783bdc0         65,573
7f61358c1820     7f616783bdc0             32
7f61358c1988     7f616783bdc0         65,573
7f61358d19b0     7f616783bdc0             32
7f61358d1b18     7f616783bdc0         65,573
7f61358e1b40     7f616783bdc0             32
7f61358e1ca8     7f616783bdc0         65,573

Result of: gcroot

      -> 7f61305472c0     System.Net.Sockets.Socket
      -> 7f6130547338     System.Net.Sockets.SafeSocketHandle
      -> 7f6130547838     System.Net.Sockets.SocketAsyncContext
      -> 7f61301a7d48     System.Net.Sockets.SocketAsyncEngine
      -> 7f61301a7db0     System.Collections.Concurrent.ConcurrentDictionary<System.IntPtr, System.Net.Sockets.SocketAsyncEngine+SocketAsyncContextWrapper>
      -> 7f6130ec4b58     System.Collections.Concurrent.ConcurrentDictionary<System.IntPtr, System.Net.Sockets.SocketAsyncEngine+SocketAsyncContextWrapper>+Tables
      -> 7f6130ec4890     System.Collections.Concurrent.ConcurrentDictionary<System.IntPtr, System.Net.Sockets.SocketAsyncEngine+SocketAsyncContextWrapper>+Node[]
      -> 7f613001c640     System.Collections.Concurrent.ConcurrentDictionary<System.IntPtr, System.Net.Sockets.SocketAsyncEngine+SocketAsyncContextWrapper>+Node
      -> 7f613001c4a0     System.Net.Sockets.SocketAsyncContext
      -> 7f613001c5c0     System.Net.Sockets.SocketAsyncContext+BufferMemoryReceiveOperation
      -> 7f613001c580     System.Action<System.Int32, System.Byte[], System.Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError>
      -> 7f6130ff7c50     System.Net.Sockets.SocketAsyncEventArgs
      -> 7f6130ff7d38     System.EventHandler<System.Net.Sockets.SocketAsyncEventArgs>
      -> 7f6130ff7c20     vsfpj.deniy
      -> 7f5a3a6b55e0     System.Action<System.Net.Sockets.SocketException, System.Int32>
      -> 7f5a3a6b55b8     vsfpj.nrzxq+nijrx
      -> 7f6130ff7a98     vsfpj.ppytl
      -> 7f61313fb5b0     vsfpj.utieg
      -> 7f61313fb618     System.Collections.Generic.Dictionary<System.Int32, vsfpj.pfmkl>
      -> 7f61314c6658     System.Collections.Generic.Dictionary<System.Int32, vsfpj.pfmkl>+Entry[]
      -> 7f61314c66b8     vsfpj.pfmkl
      -> 7f61314b10c8     vsfpj.hyghk
      -> 7f61314b1268     System.Collections.Generic.Queue<System.Tuple<System.ArraySegment<System.Byte>, System.Action<System.Exception>>>
      -> 7f61404cacc8     System.Tuple<System.ArraySegment<System.Byte>, System.Action<System.Exception>>[]
      -> 7f5a9a924c80     System.Tuple<System.ArraySegment<System.Byte>, System.Action<System.Exception>>
      -> 7f5a9a914bc0     System.Byte[]

Do you have any idea what could be causing this?
Thank you!

1 Answer

0 votes
by (148k points)

Would it be possible to periodically call something like this (every 5 minutes, for example) to find out whether SFTP sessions are getting stuck at the server?

// Log basic information about every active session.
foreach (var session in fileServer.Sessions)
{
    fileServer.LogWriter.Write(LogLevel.Info, GetType(), 0, "Server",
        $"Session {session.Id}: IP: {session.ClientAddress} ID: {session.Context} " +
        $"Duration: {(int)session.Duration.TotalMinutes} min " +
        $"LastActivity: {session.LastActivity.ToLocalTime():yyyy-MM-dd HH:mm:ss}");
}
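
A minimal way to schedule that check could look like this (just a sketch; LogSessions is assumed to be a method wrapping the loop above, and the 5-minute interval is only an example):

// Sketch: run the session check above every 5 minutes.
// 'LogSessions' is a hypothetical method wrapping the foreach loop shown earlier.
// Keep a reference to the timer so it is not garbage collected.
var sessionLogTimer = new System.Threading.Timer(
    _ => LogSessions(),        // callback runs on a thread-pool thread
    null,                      // no state object
    TimeSpan.Zero,             // first run immediately
    TimeSpan.FromMinutes(5));  // then every 5 minutes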

Additionally, there is a setting that limits the maximum idle time of SFTP sessions. Setting it to a non-zero value ensures that idle sessions do not get stuck:

fileServer.Settings.MaxIdleDuration = 30 * 60; // 30 minutes

This is actually quite important - we have observed SFTP sessions at our test.rebex.net server that were established successfully but then stopped transmitting any packets (not even FIN or RST), so the server still considered them active and memory consumption gradually grew.

by (190 points)
Last month, the SFTP server experienced a memory leak issue again. Even after I restarted the process, the problem persisted, but it disappeared automatically after about a day. This suggests that it may not be a cumulative issue.

I thoroughly analyzed the service's core dump and decompiled the Rebex.FileServer code. The issue seems to be related to a queue of type Queue<Tuple<ArraySegment<byte>, Action<Exception>>>. For some unknown reason, the number of elements in this queue increases dramatically, causing memory usage to spike rapidly (typically, memory usage increases by 20 GB within a minute).

Could this be related to the network environment? What steps should I take next?

I look forward to your response.
by (148k points)
Thanks for bringing this to our attention.

There is only one usage of that Queue<Tuple<ArraySegment<byte>, Action<Exception>>> in Rebex.FileServer, and it represents the SFTP subsystem's outgoing packet queue. This is actually one of the known bottlenecks we are addressing for the next major update.

It is possible for SFTP clients to trigger a memory usage spike (although none of our users have reported symptoms matching the severity of what you describe). Basically, to trigger a memory usage spike, an SFTP client would have to keep sending lots of SFTP 'read data' requests while not receiving the SFTP responses. This might also occur due to a network issue that prevents the server's TCP packets from reaching the SFTP client. However, after 180 seconds a timeout should be triggered that cleans up the excess resources for the respective SFTP session. Technically, the memory spike is not a bug, although an SFTP server could perhaps choose to employ some kind of throttling to mitigate the memory usage.
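
To illustrate the kind of throttling mentioned above, here is a generic sketch (not Rebex.FileServer API; the class name and the 64-packet cap are made up for illustration) of an outgoing packet queue that applies backpressure instead of growing without bound:

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Illustrative sketch only - not Rebex.FileServer API.
// A bounded channel caps the number of queued outgoing packets; when the client
// stops reading and the cap is reached, EnqueueAsync waits instead of letting
// the queue (and memory usage) grow without bound.
class ThrottledPacketQueue
{
    private readonly Channel<ArraySegment<byte>> _channel =
        Channel.CreateBounded<ArraySegment<byte>>(new BoundedChannelOptions(64) // e.g. 64 packets of up to 64 KB each = 4 MB
        {
            FullMode = BoundedChannelFullMode.Wait // apply backpressure instead of growing
        });

    // Producer side: called by the code that prepares outgoing SFTP packets.
    public ValueTask EnqueueAsync(ArraySegment<byte> packet) =>
        _channel.Writer.WriteAsync(packet);

    // Consumer side: called by the socket send loop.
    public ValueTask<ArraySegment<byte>> DequeueAsync() =>
        _channel.Reader.ReadAsync();
}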

But if the queue consumes 20 GB of memory within a minute, something very strange is going on. In Rebex.FileServer, individual SFTP read requests are limited to 64 KB of data, so there would have to be more than 300,000 read requests within a minute to result in that kind of memory usage. Unless the SFTP client was attempting (either due to a bug or deliberately, perhaps as a kind of denial-of-service attack) to issue as many requests as it could, as fast as it could, this should not happen.
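
For reference, the arithmetic behind that estimate (assuming each request returns a full 64 KB of data):

// Rough arithmetic behind the estimate above.
long queuedBytes = 20L * 1024 * 1024 * 1024;  // ~20 GB of queued response data
long maxReadSize = 64 * 1024;                 // 64 KB per SFTP read request
Console.WriteLine(queuedBytes / maxReadSize); // ~327,680 requests within one minute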

I noticed you just contacted us via e-mail as well, and I'm going to respond to that shortly. We are also going to think about what we can do to help you analyse the issue further. In the meantime, it would be helpful if you could describe what kind of actions the SFTP clients connecting to the server are actually supposed to perform. Also, how does the issue manifest from the client side?
by (148k points)
Update: Based on your e-mail, I would say it's no longer necessary to describe what kind of actions the SFTP clients connecting to the server are supposed to perform, although it might still be helpful. My guess is something along the lines of "transferring very large files over a very fast network connection".
...