Deadlock in Rebex.Net.SshSession class

0 votes
asked Jan 4 by sharmamayank_vmware (220 points)
edited Jan 4 by sharmamayank_vmware

Rebex Product Version: 4.0.6930
.NET Framework: Core 2.0 application on Windows
CLR Version: 4.6.27414.6

We are experiencing slow uploads and delayed timeouts.

While analyzing a memory dump, we detected a deadlock that may be attributable to Rebex-initiated threads.

We need your help to identify the root cause and suggest workarounds that could contain this situation for our customers.

Also, we have the following specific questions for you.

  1. Given the call stacks of the deadlocked threads, is this deadlock limited to the Connect / Login overloads only?

  2. What other methods on the FileTransferClient class may be affected?

  3. Will this affect both Sync and Async overloads on FileTransferClient?

Call stacks of the threads causing the deadlock

First Thread

OS Thread Id: 0x334 (9)
Child SP IP Call Site
000001fb1457df98 00007ffbd9876594 [GCFrame: 000001fb1457df98]
000001fb1457e138 00007ffbd9876594 [GCFrame: 000001fb1457e138]
000001fb1457e188 00007ffbd9876594 [HelperMethodFrame_1OBJ: 000001fb1457e188] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
000001fb1457e2e0 00007ffb4374c003 Rebex.Net.SshSession.cgsk()
000001fb1457e350 00007ffb4374b108 Rebex.Net.SshSession.cgss(Byte[])
000001fb1457e4b0 00007ffb4391047a Rebex.Net.SshSession+kaij.eyci()
000001fb1457e500 00007ffba22ed409 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000001fb1457e580 00007ffba23a13e7 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
000001fb1457e620 00007ffba23da9ec System.Threading.ThreadPoolWorkQueue.Dispatch()
000001fb1457eaa0 00007ffba2a70473 [DebuggerU2MCatchHandlerFrame: 000001fb1457eaa0]

Second Thread

OS Thread Id: 0x1e3c (10)
Child SP IP Call Site
000001fb15a7bcd8 00007ffbd9876594 [GCFrame: 000001fb15a7bcd8]
000001fb15a7be78 00007ffbd9876594 [GCFrame: 000001fb15a7be78]
000001fb15a7bec8 00007ffbd9876594 [HelperMethodFrame_1OBJ: 000001fb15a7bec8] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
000001fb15a7c020 00007ffb43758d11 Rebex.Net.SshSession.cgsb(xmbf, Boolean)
000001fb15a7e760 00007ffba2a3be59 [FaultingExceptionFrame: 000001fb15a7e760]
000001fb15a7ec60 00007ffb43758a8f Rebex.Net.SshSession.cgsb(xmbf, Boolean)
000001fb15a7ed30 00007ffb438dabf3 Rebex.Net.SshSession.cgsf(System.Exception)
000001fb15a7eda0 00007ffb4374ecc3 Rebex.Net.SshSession.cgso(Byte[], Int32, Int32)
000001fb15a7ee40 00007ffb4374d851 Rebex.Net.SshSession.cgsn(Byte[], Int32, Int32)
000001fb15a7ef00 00007ffb4374c4cf Rebex.Net.SshSession.cgsm()
000001fb15a7eff0 00007ffb4375968c Rebex.Net.SshSession.swqo.rrgw()
000001fb15a7f060 00007ffb43759401 swqp+mlsc.qghs()
000001fb15a7f090 00007ffb4374f90e swqp+mlsc.qght()
000001fb15a7f0d0 00007ffb4374f7d6 xlpi.viie()
000001fb15a7f120 00007ffba22ed409 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000001fb15a7f1a0 00007ffba23a13e7 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
000001fb15a7f240 00007ffba22ed409 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000001fb15a7f4a8 00007ffba2a70473 [GCFrame: 000001fb15a7f4a8]
000001fb15a7f720 00007ffba2a70473 [DebuggerU2MCatchHandlerFrame: 000001fb15a7f720]

1 Answer

0 votes
answered Jan 6 by Lukas Pokorny (107,350 points)

Thanks for letting us know about this. It indeed looks like there is a potential deadlock that gets triggered in some scenarios.

1) According to your call stacks, the deadlock is not occurring within Connect / Login. It seems to be occurring later, during an SSH renegotiation.

2) Renegotiation can occur at any time (it can be initiated by either the client or the server), so this could potentially affect all methods that trigger some SFTP communication.

3) Yes.

We believe we have identified and fixed the cause of the deadlock. Please check your e-mail for a hotfix download link. However, we have not yet managed to reproduce the deadlock ourselves, so please let us know if it persists.
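
For readers wondering how two threads can both end up parked in Monitor.ReliableEnter, here is a minimal sketch of one common cause: two threads taking the same pair of locks in opposite order. This is not Rebex's code. The traces above do not show which locks SshSession uses internally, so the lock names (SendLock, StateLock) and the LockOrderDemo class are purely hypothetical. The sketch uses Monitor.TryEnter with a timeout instead of a plain lock statement so that it terminates rather than hanging.

```csharp
using System;
using System.Threading;

public static class LockOrderDemo
{
    // Hypothetical locks; they stand in for any two pieces of guarded state.
    private static readonly object SendLock = new object();
    private static readonly object StateLock = new object();

    // Takes 'first', waits until the other thread holds its own first lock,
    // then tries to take 'second' with a timeout. Returns true on success.
    private static bool HoldThenTry(object first, object second, Barrier barrier, TimeSpan timeout)
    {
        lock (first)
        {
            barrier.SignalAndWait(); // both threads now hold one lock each
            bool taken = false;
            try
            {
                Monitor.TryEnter(second, timeout, ref taken);
                return taken;
            }
            finally
            {
                if (taken) Monitor.Exit(second);
                barrier.SignalAndWait(); // keep 'first' until the other thread has tried too
            }
        }
    }

    // Opposite lock order: each thread waits for the lock the other one holds.
    public static bool[] RunOppositeOrder(TimeSpan timeout)
    {
        var results = new bool[2];
        using (var barrier = new Barrier(2))
        {
            var t1 = new Thread(() => results[0] = HoldThenTry(SendLock, StateLock, barrier, timeout));
            var t2 = new Thread(() => results[1] = HoldThenTry(StateLock, SendLock, barrier, timeout));
            t1.Start(); t2.Start();
            t1.Join(); t2.Join();
        }
        return results;
    }

    // Same lock order on both threads: the second thread simply waits its turn.
    private static bool TakeBothInOrder(TimeSpan timeout)
    {
        lock (StateLock)
        {
            bool taken = false;
            try
            {
                Monitor.TryEnter(SendLock, timeout, ref taken);
                return taken;
            }
            finally
            {
                if (taken) Monitor.Exit(SendLock);
            }
        }
    }

    public static bool[] RunSameOrder(TimeSpan timeout)
    {
        var results = new bool[2];
        var t1 = new Thread(() => results[0] = TakeBothInOrder(timeout));
        var t2 = new Thread(() => results[1] = TakeBothInOrder(timeout));
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return results;
    }

    public static void Main()
    {
        TimeSpan timeout = TimeSpan.FromMilliseconds(200);
        Console.WriteLine("Opposite order: " + string.Join(", ", RunOppositeOrder(timeout)));
        Console.WriteLine("Same order:     " + string.Join(", ", RunSameOrder(timeout)));
    }
}
```

In the opposite-order case both attempts on the second lock time out, which is the situation that becomes a permanent hang when a plain lock statement (Monitor.Enter without a timeout) is used. With a consistent order both threads succeed. Whether the actual fix relies on lock ordering is not stated in this thread.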

commented Jan 7 by sharmamayank_vmware (220 points)
edited Jan 7 by sharmamayank_vmware
Thank you for your response and the quick turnaround on the hotfix.

Unfortunately, we cannot use this hotfix because it does not include the NullReferenceException fix. As suggested on this forum, we upgraded Rebex to version 5.0.7290 to get that fix, so we now need a hotfix (if applicable) on top of version 5.0.7290.

1. Our CI process doesn't play well with directly referenced binaries. Could we get a NuGet package instead?

2. We would like to know when these fixes will be included in the next stable release. Are there any dates? We are keen on using stable releases.

3. Could you give us some idea of the probable impact of this fix? We are concerned about its effect on performance.
commented Jan 7 by Lukas Pokorny (107,350 points)
The provided hotfix build is based on 5.0.7290 and adds the deadlock fix on top of it.

1. I have sent a link to the NuGet package to your e-mail.

2. We are unable to provide any dates at this time. We will only publish the new version when we are confident that it indeed fixes the deadlock. We have not even been able to reproduce a scenario leading to the deadlock yet, so any date at this point would be just a guess. However, please note that the hotfix release is identical to 5.0.7290.0 (except for the deadlock fix), so it might actually be more stable than 5.0.7290.0.

3. The fix will not negatively affect performance at all.
commented Jan 7 by sharmamayank_vmware (220 points)
Thanks!!

Please add me to the mailing list so that we are notified when the new version is released.
commented Jan 7 by Frantisek Bosek (1,500 points)
Hi Mayank, I have added you to our mailing list for the Rebex Newsletter (new version announcements). https://us5.campaign-archive.com/home/?u=6e0e34c882602d1cf12bc584b&id=e7c9ad3959
commented Jan 16 by Lukas Pokorny (107,350 points)
We have published the new version: https://www.rebex.net/file-transfer-pack/history.aspx#2019R4.2
A newsletter will follow in a day or two.
...