0 votes
by (120 points)

We've been using Rebex for SFTP connections for 10 years and it works great.
Recently, we updated our old version (2012) to R6.4, and later to R6.9.
Since then, our customers have been experiencing problems: communication fails out of the blue, or the connection cannot be established in the first place.

The error seems to be "The connection was closed by the server. Make sure you are connecting to an SSH or SFTP server".

Our customers use our IDE for programming a PLC that has an SFTP server on it, and in the test case, it is connected to the PC using a USB cable.

I have written a code example, a console application in C# that opens new connections in a loop:

static void Main(string[] args)
    {
        int success = 0;
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 1000; i++)
        {
            Sftp sftp = new Sftp();
            try
            {
                sftp.Connect("169.254.100.1", 22);
                Console.WriteLine(true);
                success++;
                //sftp.Disconnect();
                //sftp.Dispose();
            }
            catch
            {
                Console.WriteLine(false);
                sftp.Disconnect();
                sftp.Dispose();
            }
            System.Threading.Thread.Sleep(100);
        }

        sw.Stop();

        Console.WriteLine($"Total Time: {sw.ElapsedMilliseconds}");
        Console.WriteLine($"Success: {success}/1000");
        Console.ReadLine();
    }

If I run this example with Rebex 2012, I get about 95% success.

If I run the same example with Rebex 2022 (R6.9), I get 54% success. I see the connect method connects much faster in R6.9 (compared to 2012), and it looks like it's reusing sockets or some resources.

If I uncomment the two lines with sftp.Disconnect() and sftp.Dispose(), the success rate goes up to 99% and even 100% with R6.9 (Dispose has no effect on the results; Disconnect makes all the difference). I also notice that Connect() is now slower (it takes more time to connect).

I'm creating new instances of the new sftp client, so there shouldn't be any "leak" between them, however it seems there is some.

Just to be clear: my actual code does disconnect the SFTP connection when it is done. There was one recently added place in the code that didn't disconnect and that was opening and closing connections frequently, but even after fixing this, I'm still able to reproduce the issue in our IDE (the code cannot be shared since it has millions of lines).

Using the same IDE with Rebex 2012 dlls produces a stable connection, while with R6.9 there are times where connection to the server fails with the error above, and sometimes in the middle of the transfer the communication fails with that error, and when the code tries to open a new connection (for retries) it fails again.

Can you assist?

10x

Applies to: Rebex SFTP
by (147k points)
This almost looks like some kind of denial-of-service attack protection is getting activated at the server... No one else appears to have reported anything similar yet. Perhaps the 2012 version was a bit slower during the SSH negotiation phase, and therefore avoided this by not being able to establish so many connections? Does it make any difference if you increase the delay in Thread.Sleep(...) to a second or two?
by (120 points)
I've increased the sleep to 2 seconds, and the 12th connection failed.
That theory does not hold: with a sleep of 100 ms or even 2 ms, I get 100% success when I call Disconnect on old connections. And with Rebex 2012, with no sleep and without Disconnect, I get 95% success with an average time of 404 ms per iteration (much less than the 2 seconds).

It's not the server's fault in this case.

1 Answer

0 votes
by (147k points)

> I see the connect method connects much faster in R6.9 (compared to 2012).

It's most likely using elliptic curve Diffie-Hellman, which is substantially faster than the classic modular Diffie-Hellman and was not supported in the old version of Rebex SFTP.

> I'm creating new instances of the new sftp client, so there shouldn't be any "leak" between them, however it seems there is some.

There is a substantial leak and it's caused by your code. By failing to dispose the Sftp instances, you are leaving them orphaned until .NET's garbage collector identifies and disposes them. However, each of these Sftp objects represents an active SSH session (connected, negotiated, but not yet authenticated). And until the GC disposes these, the underlying SSH sessions remain active.

Additionally, the operation of the garbage collector is by its nature somewhat unpredictable, which means your code might produce quite different results each run.
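To illustrate the difference between orphaning an object and disposing it deterministically, here is a minimal sketch. FakeSession is a hypothetical stand-in that only counts "open sessions"; it is not part of Rebex and has no finalizer, so nothing ever cleans up the orphaned instances. With the real Sftp class (which is IDisposable, as the Dispose() calls in the question show), the same using pattern would apply.

```csharp
using System;

// Hypothetical stand-in for a connected client: it only counts open sessions.
public sealed class FakeSession : IDisposable
{
    public static int OpenCount { get; private set; }
    private bool _disposed;

    public FakeSession() { OpenCount++; }   // "connect"

    public void Dispose()                   // "disconnect and release"
    {
        if (_disposed) return;
        _disposed = true;
        OpenCount--;
    }
}

public static class DisposeDemo
{
    // Creates sessions and never disposes them; returns how many stay open.
    public static int OpenAfterLeakyLoop(int iterations)
    {
        int before = FakeSession.OpenCount;
        for (int i = 0; i < iterations; i++)
        {
            var session = new FakeSession(); // orphaned at the end of each iteration
        }
        return FakeSession.OpenCount - before;
    }

    // Wraps each session in a using block; returns how many stay open.
    public static int OpenAfterUsingLoop(int iterations)
    {
        int before = FakeSession.OpenCount;
        for (int i = 0; i < iterations; i++)
        {
            using (var session = new FakeSession())
            {
                // work with the session here
            }   // Dispose() runs here, even if the body throws
        }
        return FakeSession.OpenCount - before;
    }

    public static void Main()
    {
        Console.WriteLine(OpenAfterLeakyLoop(20)); // prints 20
        Console.WriteLine(OpenAfterUsingLoop(20)); // prints 0
    }
}
```

The using block releases each session at a known point instead of whenever the garbage collector gets to it, which is what keeps the number of concurrent sessions at the server bounded.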

The following code demonstrates what might actually be going on, and it does it in a way that does not rely on unpredictable garbage collector behavior - it simply keeps successfully connected Sftp instances in a list:

var list = new List<Sftp>();
for (int i = 0; i < 1000; i++)
{
    Sftp sftp = new Sftp();
    try
    {
        sftp.Connect("193.19.176.97", 22);
        Console.WriteLine(true);
        success++;
        list.Add(sftp); // ensure the object stays alive
        //sftp.Disconnect();
        //sftp.Dispose();
    }
    catch
    {
        Console.WriteLine(false);
        sftp.Disconnect();
        sftp.Dispose();
    }
    System.Threading.Thread.Sleep(100);
}

When running this against an OpenSSH server at 193.19.176.97, I get results quite similar to yours, with the first failure occurring at around the 13th connection attempt.

And these are the final results after 1000 runs:
Total Time: 138141
Success: 122/1000

This particular OpenSSH server is using the default values of MaxStartups, which specifies the maximum number of concurrent unauthenticated connections and also introduces some randomness (see https://linux.die.net/man/5/sshd_config for details), so the results seem to be entirely consistent with the server-side configuration.
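As a rough sketch of where the randomness comes from: MaxStartups accepts a start:rate:full triple, and recent OpenSSH releases default to 10:30:100 (an assumption about this particular server; older releases default to a plain limit of 10). Below the start value nothing is dropped, at start the drop chance is rate percent, and it grows linearly to 100 percent at full. The class and method names below are made up for illustration, and the integer arithmetic is a model of the documented behavior, not code taken from sshd.

```csharp
using System;

public static class MaxStartupsModel
{
    // Approximate chance (in percent) that sshd refuses a new connection while
    // 'unauthenticated' connections are pending, for MaxStartups start:rate:full.
    public static int DropPercent(int start, int rate, int full, int unauthenticated)
    {
        if (unauthenticated < start) return 0;    // below the threshold: always accepted
        if (unauthenticated >= full) return 100;  // hard limit reached: always refused
        if (rate == 100) return 100;
        // Linear ramp from 'rate' percent at 'start' up to 100 percent at 'full'.
        return rate + (100 - rate) * (unauthenticated - start) / (full - start);
    }

    public static void Main()
    {
        // Assumed default of recent OpenSSH releases: MaxStartups 10:30:100
        foreach (int n in new[] { 5, 10, 13, 55, 100 })
            Console.WriteLine($"{n} unauthenticated -> {DropPercent(10, 30, 100, n)}% dropped");
    }
}
```

Under this model, the first ten leaked connections are always accepted and every later attempt has at least a 30% chance of being refused, which fits a first failure somewhere around the 12th or 13th attempt.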

On the other hand, if I connect to 195.144.107.198 instead (with no DDoS protection), I get 100% success:
Total Time: 125610
Success: 1000/1000

Furthermore, when I reconfigure the MaxStartups at 193.19.176.97 to allow 100 concurrent unauthenticated connections, I do actually get 100 successful connections (using the same client code) before the first failure.

Based on these experiments, I would say that server-side DDoS protection is indeed the cause of those "The connection was closed by the server. Make sure you are connecting to an SSH or SFTP server" errors.

by (120 points)
Thank you for your reply. I don't think that the PLC has DDoS protection. However, the code was opening and closing a connection for each file it had to upload to the server, and several other actions also opened and closed a connection of their own.
I changed the code so that only one connection is opened at start and the same connection is used for all actions, and the connection is now much more stable.
I also added a reconnect retry when transfer fails, and reconnect attempt when connect itself fails, so it should be more tolerant to errors now.
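A retry of this kind could be sketched roughly as follows. The Retry helper and its parameters are hypothetical, not taken from the actual IDE code or from Rebex; in real use the operation passed in would reconnect if needed and then repeat the transfer.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs 'operation' up to 'maxAttempts' times, sleeping 'delay' between attempts.
    // The exception from the last attempt is not caught and reaches the caller.
    public static T Run<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                Thread.Sleep(delay); // give the server time to release old sessions
            }
        }
    }

    public static void Main()
    {
        int calls = 0;
        // Simulated transfer that fails twice before succeeding.
        string result = Run(() =>
        {
            calls++;
            if (calls < 3) throw new InvalidOperationException("simulated failure");
            return "uploaded";
        }, maxAttempts: 5, delay: TimeSpan.FromMilliseconds(10));

        Console.WriteLine($"{result} after {calls} calls"); // prints "uploaded after 3 calls"
    }
}
```

Catching every exception type is a simplification; production code would probably retry only on connection-related errors and dispose the failed client before creating a new one.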
by (147k points)
Yes, reusing a connection is recommended in any case. SSH negotiation is still quite CPU-intensive, even with elliptic curve Diffie-Hellman.
However, the "The connection was closed by the server. Make sure you are connecting to an SSH or SFTP server" error actually occurs before the negotiation takes place, and indicates that the client was able to connect, but did not receive the server identification string. A network protocol analyzer such as Wireshark (https://www.wireshark.org) can be used to determine whether this is indeed the case.
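If Wireshark is not at hand, a quick check along the same lines is to open a plain TCP connection and look at the first line the server sends. The following is only a sketch: SshBannerProbe and its methods are hypothetical names, the fallback address is the PLC address from the question, and it checks just the first line, although an SSH server is allowed to send other lines before its "SSH-" identification string.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

public static class SshBannerProbe
{
    // An SSH identification string starts with "SSH-" (for example "SSH-2.0-...").
    public static bool IsSshIdentification(string line)
    {
        return line != null && line.StartsWith("SSH-", StringComparison.Ordinal);
    }

    // Connects and returns the first line sent by the server,
    // or null if the server closes the connection without sending anything.
    public static string ReadFirstLine(string host, int port, int timeoutMs)
    {
        using (var client = new TcpClient())
        {
            client.ReceiveTimeout = timeoutMs;
            client.Connect(host, port);
            using (var reader = new StreamReader(client.GetStream(), Encoding.ASCII))
            {
                return reader.ReadLine();
            }
        }
    }

    public static void Main(string[] args)
    {
        string host = args.Length > 0 ? args[0] : "169.254.100.1"; // address from the question
        try
        {
            string line = ReadFirstLine(host, 22, 5000);
            Console.WriteLine(IsSshIdentification(line)
                ? "Server identification received: " + line
                : "No SSH identification received (connection closed or unexpected data).");
        }
        catch (Exception ex) when (ex is IOException || ex is SocketException)
        {
            Console.WriteLine("Connection failed or timed out: " + ex.Message);
        }
    }
}
```

Running this in a loop while the problem is occurring should show whether the server accepts the TCP connection and then closes it without sending its identification string.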
...