Deadlock in "Rebex.FileTransfer" library Version="5.0.7290"

Question

Deadlock in "Rebex.FileTransfer" library Version="5.0.7290"

asked Aug 25, 2021 by oprakash (120 points)

strong textWe are using "Rebex.FileTransfer" library Version="5.0.7290" in our product. We have encountered a deadlock in production due to this library.

Could you please confirm following;

Is there a possibility of Deadloack in Rebex generated threads running stack cited below?
- Since we did not see this deadlock during 2 years of usage; so we would like to know what special conditions may have caused this deadlock now.
- What is your suggestion of mitigating this issue? Also please let us know if this issue is fixed in version later than 5.0.7290.

Please let us know if you need more info.

Observations from memory dump.
0:006> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock(Slim) instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
Scanning for threads waiting on SyncBlocks...
Scanning for threads waiting on ReaderWriterLock locks...
Scanning for threads waiting on ReaderWriterLocksSlim locks...
Scanning for threads waiting on CriticalSections...
DEADLOCK DETECTED
CLR thread 0x3c holds the lock on SyncBlock 000001d2d5ec69f0 OBJ:000001d2bc7a67c0[System.Object]
...and is waiting for the lock on SyncBlock 000001d2d5ff2c78 OBJ:000001d2bc7a67f0[System.Object]
CLR thread 0x50 holds the lock on SyncBlock 000001d2d5ff2c78 OBJ:000001d2bc7a67f0[System.Object]
...and is waiting for the lock on SyncBlock 000001d2d5ec69f0 OBJ:000001d2bc7a67c0[System.Object]
CLR Thread 0x3c is waiting at Rebex.Net.SshChannel.bksd(jnrr)(+0x17 IL,+0x42 Native)
CLR Thread 0x50 is waiting at Rebex.Net.SshChannel.bkry()(+0x17 IL,+0x40 Native)

1 deadlock detected.

The threads involved in deadlocks are thread 11 & 13 they are running following call stack.

0:013> !syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
361 000001d2d5ec69f0 3 1 000001d2d60adc60 958 11 000001d2bc7a67c0 System.Object
362 000001d2d5ff2c78 3 1 000001d2d629e110 2c54 13 000001d2bc7a67f0 System.Object

364 000001d2d5ec6788 1 1 000001d2d629e110 2c54 13 000001d2bc7a21a0 System.Collections.Generic.List`1[[System.Threading.ManualResetEvent, System.Private.CoreLib]]

Total 427
CCW 1
RCW 1
ComClassFactory 0
Free 377

0:011> !clrstack
OS Thread Id: 0x958 (11)
Child SP IP Call Site
000000f076bfe898 00007ff985f667c4 [GCFrame: 000000f076bfe898]
000000f076bfea38 00007ff985f667c4 [GCFrame: 000000f076bfea38]
000000f076bfea88 00007ff985f667c4 [HelperMethodFrame1OBJ: 000000f076bfea88] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
000000f076bfebe0 00007ff9078c2b32 Rebex.Net.SshChannel.bksd(jnrr)
000000f076bfec40 00007ff9078c4991 Rebex.Net.SshChannel.bkso(Byte[], Int32, Int32)
000000f076bfecd0 00007ff9078c46cb Rebex.Net.SshChannel.Send(Byte[], Int32, Int32)
000000f076bfed10 00007ff9078c445e gldk.iier(jnrr, glcu)
000000f076bfed90 00007ff9078c429c gldk.iidq(Rebex.Net.SftpServerType, glcu)
000000f076bfee10 00007ff9078c071d Rebex.Net.Sftp.qjjf(glcu)
000000f076bfee70 00007ff9078bc0d3 Rebex.Net.Sftp.qjjd(System.String, System.String, Rebex.Net.SshPrivateKey, glcu)
000000f076bfeef0 00007ff9078bbcb4 Rebex.Net.Sftp.Login(System.String, System.String)
000000f076bfef30 00007ff9078bbb63 Rebex.Net.FileTransferClient.Login(System.String, System.String)
000000f076bfef70 00007ff9076fbd64 Airwatch.PushRelayServer.RebexImpl.PushRelayServerConnectionRebexImpl.Connect(Int32)
000000f076bfefd0 00007ff9076fb871 AirWatch.PullService.Business.PushRelayServerBusiness+cDisplayClass261.b0()
000000f076bff020 00007ff949b2d409 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [E:\A_work\694\s\src\mscorlib\shared\System\Threading\ExecutionContext.cs @ 167]
000000f076bff0a0 00007ff949be13e7 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef) [E:\A_work\694\s\src\mscorlib\src\System\Threading\Tasks\Task.cs @ 2440]
000000f076bff140 00007ff949c1a9ec System.Threading.ThreadPoolWorkQueue.Dispatch() [E:\A_work\694\s\src\mscorlib\src\System\Threading\ThreadPool.cs @ 588]
000000f076bff5c0 00007ff966a93053 [DebuggerU2MCatchHandlerFrame: 000000f076bff5c0]
0:011> ~13s
ntdll!NtWaitForMultipleObjects+0x14:
00007ff9`85f667c4 c3 ret
0:013> !clrstack
OS Thread Id: 0x2c54 (13)
Child SP IP Call Site
000000f08d9fe428 00007ff985f667c4 [GCFrame: 000000f08d9fe428]
000000f08d9fe5c8 00007ff985f667c4 [GCFrame: 000000f08d9fe5c8]
000000f08d9fe618 00007ff985f667c4 [HelperMethodFrame1OBJ: 000000f08d9fe618] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
000000f08d9fe770 00007ff9078fe1d0 Rebex.Net.SshChannel.bkry()
000000f08d9fe7d0 00007ff9078c1c87 Rebex.Net.SshChannel.bkrt(Byte[], Int32, Int32)
000000f08d9fe860 00007ff90772ac30 Rebex.Net.SshSession.exvy(Byte[], Int32, Int32)
000000f08d9fe920 00007ff907725f0f Rebex.Net.SshSession.exvx()
000000f08d9fea10 00007ff907726914 Rebex.Net.SshSession.xaxv.exwv()
000000f08d9fea80 00007ff90772658b xaxw+ndag.bnom()
000000f08d9feab0 00007ff907726335 xaxw+ndag.bnon()
000000f08d9feaf0 00007ff96f627080 System.Threading.Thread.ThreadMainThreadStart() [E:\A_work\180\s\corefx\src\System.Threading.Thread\src\System\Threading\Thread.cs @ 93]
000000f08d9feb20 00007ff949b2d409 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [E:\A_work\694\s\src\mscorlib\shared\System\Threading\ExecutionContext.cs @ 167]
000000f08d9fed78 00007ff966a93053 [GCFrame: 000000f08d9fed78]
000000f08d9ff000 00007ff966a93053 [DebuggerU2MCatchHandlerFrame: 000000f08d9ff000]

Applies to: File Transfer Pack, Rebex FTP/SSL

1 Answer

Lukas Pokorny · Answer 1 · 2021-08-25T11:26:55+0000

commented Aug 25, 2021 by oprakash (120 points)
edited Aug 25, 2021 by oprakash

commented Aug 25, 2021 by Lukas Pokorny (150k points)

Based on further investigation, we can now confirm that you are right - the deadlock you reported today is indeed at a different location than the previous one (which we fixed in v5.0.7320). Fortunately, the original name of SshChannel.bkry() is SshChannel.SendEofInternal(), and this is the location of the second deadlock, the one we we fixed in v5.0.7357.

The rarity of the deadlock's occurrence is relative. I'm sure there are scenarios where it is very common, and scenarios where it's very rare. We actually discovered this issue ourselves because it caused some of our automated tests to fail occasionally after migrating to a new hardware.

It only manifests when very specific conditions were met, and not consistently, which is common with these kinds of issues - the exact timing of code execution on different threads is not at all deterministic, and the same applies for network communication and SSH's TCP packets. All it takes for a hidden deadlock to occur is for a TCP socket send to take a little bit more or a little bit less time than usual.

Anyway, this comes from the issues description: When the thread processing SshChannel.OnPacketReceive(...) calls SendEofInternal() upon receiving SSH_MSG_CHANNEL_EOF, we already have acquired a lock _sendSyncPacket, and we attempt to acquire lock for _syncSend as well within SendEofInternal(). However, if at that point another thread is waiting inside SshChannel.SendInternal(...) for the remote side to increase its receive window size, a deadlock occurs, because SendInternal keeps a lock for _syncSend while waiting. OnPacket waits for SendInternal to free the _syncSend lock, while SendInternal waits for OnPacketReceive to process the remote side's packet informing about the receive window size adjustment.

So upgrading to v5.0.7357 (https://www.rebex.net/file-transfer-pack/history.aspx#2020R1) or later would indeed resolve this issue. There is currently no other mitigation for it. We can prepare a custom hotfix based on v5.0.7290 if you prefer not to upgrade to v5.0.7357 (or later) yet.

commented Aug 25, 2021 by oprakash (120 points)

commented Aug 25, 2021 by anonymous

commented Aug 26, 2021 by Lukas Pokorny (150k points)

commented Aug 26, 2021 by oprakash (120 points)

commented Aug 26, 2021 by Lukas Pokorny (150k points)

Deadlock in "Rebex.FileTransfer" library Version="5.0.7290"

364 000001d2d5ec6788 1 1 000001d2d629e110 2c54 13 000001d2bc7a21a0 System.Collections.Generic.List`1[[System.Threading.ManualResetEvent, System.Private.CoreLib]]

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Categories

Deadlock in "Rebex.FileTransfer" library Version="5.0.7290"

364 000001d2d5ec6788 1 1 000001d2d629e110 2c54 13 000001d2bc7a21a0 System.Collections.Generic.List`1[[System.Threading.ManualResetEvent, System.Private.CoreLib]]

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Related questions

Categories