Problem with sftp PutFile using offsets

0 votes
asked Apr 8 by j355y (130 points)

Hello,

I don't know if I'm doing something wrong, and there's probably something that I don't understand...

I'm using Rebex 5.0.7733 in a .NET Core 3.1 app.

I'm getting a file, byte by byte, from Azure Blob Storage. Of course, my end goal is not byte by byte; I'm only doing that for testing purposes.

I then call PutFile for each byte, incrementing the offset. I never have any problem with the first byte: the file is written to the SFTP server and contains my first character. As soon as I increment the offset in PutFile, I get an error. If I always leave the offset at 0, the file is correctly overwritten by each byte. So the stream is OK and the length is OK; I guess the problem is with appending to the file.

The code:

long offset = 0;
long remainingBytes = 0;
long length = 1;
do
{
    MemoryStream stream = new MemoryStream();
    azUtilities.GetFileStreamBlock(_AzureFileName, offset * length, ref stream, length, out remainingBytes);
    stream.Position = 0;
    client.PutFile(stream, _SftpFileName, offset * length, stream.Length);
    offset++;
    stream.Dispose();
} while (remainingBytes > 0);

The error:

2021-04-08 18:42:49.494 DEBUG Sftp(1)[1] SSH: Authentication successful.
2021-04-08 18:42:49.502 DEBUG Sftp(1)[1] SSH: Opening channel 'session' (initial window size: 131072, max packet size: 129024).
2021-04-08 18:42:49.529 DEBUG Sftp(1)[1] SSH: Requesting subsystem 'sftp'.
2021-04-08 18:42:49.555 DEBUG Sftp(1)[6] SSH: Adjusted remote receive window size: 0 -> 1048576.
2021-04-08 18:42:49.569 INFO Sftp(1)[1] Command: SSH_FXP_INIT (4)
2021-04-08 18:42:49.588 INFO Sftp(1)[1] Response: SSH_FXP_VERSION (4, 1 extension)
2021-04-08 18:42:49.590 INFO Sftp(1)[1] Info: Using SFTP v4 on a Windows-like platform.
2021-04-08 18:42:49.594 INFO Sftp(1)[1] Command: SSH_FXP_REALPATH (1, '.')
2021-04-08 18:42:49.651 INFO Sftp(1)[1] Response: SSH_FXP_NAME (1, '/')
2021-04-08 18:42:49.652 INFO Sftp(1)[1] Info: Home directory is '/'.
2021-04-08 18:42:52.571 INFO Sftp(1)[1] Command: SSH_FXP_OPEN (2, '/c_DRV3546_Plateforme_Integration_Biztalk_A/UNIT/l2c/new-file.txt', 26)
2021-04-08 18:42:52.654 INFO Sftp(1)[1] Response: SSH_FXP_HANDLE (2, 0x255775CC9D22E8A6FC31)
2021-04-08 18:42:52.658 DEBUG Sftp(1)[1] Command: SSH_FXP_WRITE (3, 0x255775CC9D22E8A6FC31, 0, 1 byte)
2021-04-08 18:42:52.672 DEBUG Sftp(1)[1] Response: SSH_FXP_STATUS (3, 0, 'The write completed successfully')
2021-04-08 18:42:52.674 INFO Sftp(1)[1] Command: SSH_FXP_CLOSE (4, 0x255775CC9D22E8A6FC31)
2021-04-08 18:42:52.847 INFO Sftp(1)[1] Response: SSH_FXP_STATUS (4, 0, 'The operation completed')
2021-04-08 18:42:54.353 INFO Sftp(1)[1] Command: SSH_FXP_OPEN (5, '/c_DRV3546_Plateforme_Integration_Biztalk_A/UNIT/l2c/new-file.txt', 2)
2021-04-08 18:42:54.409 INFO Sftp(1)[1] Response: SSH_FXP_HANDLE (5, 0xC1CB976A22B3693F8D22)
2021-04-08 18:42:54.410 DEBUG Sftp(1)[1] Command: SSH_FXP_WRITE (6, 0xC1CB976A22B3693F8D22, 1, 1 byte)
2021-04-08 18:42:54.427 INFO Sftp(1)[1] Response: SSH_FXP_STATUS (6, 4, 'Cannot write to a file not opened for writing!')
2021-04-08 18:42:54.478 ERROR Sftp(1)[1] Info: Rebex.Net.SftpException: Failure; Cannot write to a file not opened for writing!.
   at hljlg.hzynn.jfphx(jbrvv p0, Type p1)
   at Rebex.Net.Sftp.aquhm(tnzlq p0, wwrjk p1, String p2, Stream p3, Int64 p4, Int64 p5, owqet p6)

Do I have to do something to keep the file open for writing?

Or something else?

Thank you,
Jessy

Applies to: Rebex SFTP

1 Answer

+1 vote
answered Apr 9 by Lukas Pokorny (121,830 points)

Hello,

This seems to be caused by a server-side bug (or a limitation of its file storage; more on that below). The following snippet of the log shows that the file has been opened for writing (the value 2 indicates the SSH_FXF_WRITE file open mode). Despite this, the server rejects the write request, claiming otherwise:

Command: SSH_FXP_OPEN (5, '/c_DRV3546_Plateforme_Integration_Biztalk_A/UNIT/l2c/new-file.txt', 2)
Response: SSH_FXP_HANDLE (5, 0xC1CB976A22B3693F8D22)
Command: SSH_FXP_WRITE (6, 0xC1CB976A22B3693F8D22, 1, 1 byte)
Response: SSH_FXP_STATUS (6, 4, 'Cannot write to a file not opened for writing!')

Please contact the server operators or vendor to find out more about this.

My guess is that the server simply doesn't support random access writes and expects the whole file to be uploaded sequentially with no seeking (and reports an odd error if the client attempts this).

You might try experimenting a bit with Sftp object's Stream-based API (GetStream(string remotePath, FileMode mode, FileAccess access) method) to find out which operations are actually supported.
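
One way to run such an experiment is a small probe that tries a single write at a given offset and reports the outcome. The following is only a sketch: StreamProbe and TryWriteAt are hypothetical names, and the helper works on any System.IO.Stream, so it makes no assumptions about what the server or the Rebex stream actually supports.

```csharp
using System;
using System.IO;

static class StreamProbe
{
    // Hypothetical helper: tries to write one byte at 'offset' and reports the result.
    // 'target' can be any Stream; for the experiment above it would be the stream
    // returned by client.GetStream(...).
    public static string TryWriteAt(Stream target, long offset, byte value)
    {
        if (!target.CanWrite) return "stream is not writable";
        if (!target.CanSeek) return "stream is not seekable";
        try
        {
            target.Seek(offset, SeekOrigin.Begin);
            target.WriteByte(value);
            target.Flush();
            return "ok";
        }
        catch (Exception ex)
        {
            // A server that rejects random-access writes would surface its error here.
            return "failed: " + ex.Message;
        }
    }
}
```

For example, StreamProbe.TryWriteAt(client.GetStream(_SftpFileName, FileMode.Open, FileAccess.Write), 1, (byte)'b') repeats the failing second write from the log, assuming the file already exists. Remember to dispose the stream afterwards.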


However, you might be wondering why the file mode is different (26 vs 2) when a zero remoteOffset is specified in the PutFile call. The reason is that PutFile behaves slightly differently in that case: instead of opening the file in SSH_FXF_WRITE mode only, it opens it in SSH_FXF_WRITE + SSH_FXF_CREAT + SSH_FXF_TRUNC mode, which causes the file to be truncated if it exists and created if it does not. In other words, it behaves like the PutFile(Stream sourceStream, string remotePath) overload.
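
To make the two numbers from the log concrete, here is a small sketch that decodes them using the open-mode bit values from the SFTP protocol drafts (versions 3 and 4). The enum and class names are made up for illustration; they are not part of the Rebex API.

```csharp
using System;

// Open-mode bit flags (pflags) as defined in the SFTP drafts up to version 4.
// The type name is illustrative only.
[Flags]
enum SftpOpenFlags : uint
{
    Read = 0x01,   // SSH_FXF_READ
    Write = 0x02,  // SSH_FXF_WRITE
    Append = 0x04, // SSH_FXF_APPEND
    Creat = 0x08,  // SSH_FXF_CREAT
    Trunc = 0x10,  // SSH_FXF_TRUNC
    Excl = 0x20    // SSH_FXF_EXCL
}

static class OpenFlagsDemo
{
    static void Main()
    {
        // Mode used by PutFile with a zero remoteOffset (first SSH_FXP_OPEN in the log).
        var zeroOffsetMode = SftpOpenFlags.Write | SftpOpenFlags.Creat | SftpOpenFlags.Trunc;
        Console.WriteLine((uint)zeroOffsetMode);      // prints 26

        // Mode used by PutFile with a non-zero remoteOffset (second SSH_FXP_OPEN).
        Console.WriteLine((uint)SftpOpenFlags.Write); // prints 2
    }
}
```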

We understand this might be undesirable in some scenarios, so there is an option that changes this behavior, causing PutFile with zero remoteOffset to not use SSH_FXF_TRUNC mode:

client.Settings.DisablePutFileZeroOffsetTruncate = true;

But this does not seem relevant in your scenario.

commented Apr 9 by j355y (130 points)
Thank you for your answer.

The goal is to transfer files from Azure to an SFTP server, with the code running in a Kubernetes container with limited resources. So what I wanted to do is get and put the file MB by MB.

At the company where I work, we deal with tons of external SFTP vendors to do transactions with them, so I HAVE to be compatible with all of them.

Any suggestions on how I could achieve this? Could I transfer or stream the file chunk by chunk? I want to avoid downloading the complete file into memory and, more importantly, avoid at all costs having to temporarily store the file on disk, because the files may contain personal/financial/medical data. If a file ends up stored on disk, I will have to comply with a lot of security measures that I'm trying to avoid... :)

Thank you,
Jessy
commented Apr 9 by Lukas Pokorny (121,830 points)
Perhaps the Stream-based API (GetStream or GetUploadStream method) might be the way to go? This makes it possible to open or create a remote file at an SFTP server as a writable stream, and then you can write data to it chunk by chunk.

(Alternatively, consider the approach described at https://forum.rebex.net/2892/performance-issue-sftp-direct-stream-access-getuploadstream?show=2893#a2893 if it turns out this is not sufficiently fast.)
commented Apr 9 by j355y (130 points)
Thanks a lot,
using GetUploadStream, I was able to upload my file by writing to the stream.

I don't know if you have the answer to this... but while doing so, what happens to memory consumption?

It may be a newbie question... It's been a while since I had to care about memory management that much. But when I write to the stream, where does my chunk go once I've cleared the original stream? Let's say I have a 500 MB file that I get in chunks of 100 MB. Once I write to the stream and dispose of my "azure stream" (depending on the GC, of course), will my memory consumption still increase?

using (var sftpStream = client.GetUploadStream(_SftpFileName))
{
    do
    {
        MemoryStream azureStream = new MemoryStream();
        azUtilities.GetFileStreamBlock(_AzureFileName, offset * length, ref azureStream, length, out remainingBytes);
        sftpStream.Write(azureStream.ToArray(), 0, System.Convert.ToInt32(azureStream.Length));
        sftpStream.Flush();
        offset++;
        azureStream.Dispose();
    } while (remainingBytes > 0);
}

Thank you
commented Apr 9 by Lukas Pokorny (121,830 points)
The sftpStream.Write method constructs a series of small SFTP packets from the supplied data and sends them to the server. Each packet is constructed only after the previous one has been sent, so the additional memory usage stays very low. The packets are very small (they don't contain a copy of the data, just a reference to your MemoryStream's buffer). When the method returns, the data has already been written to the remote file, and the GC can easily take care of both our small packets and your MemoryStream.

To make the whole process even more memory-efficient, you can actually safely reuse the same MemoryStream for all chunks as well, and use GetBuffer() instead of ToArray() to prevent creating a copy of the data:

using (var sftpStream = client.GetUploadStream(_SftpFileName))
{
    MemoryStream azureStream = new MemoryStream();     
    do
    {
        azUtilities.GetFileStreamBlock(_AzureFileName, offset * length, ref azureStream, length, out remainingBytes);
        sftpStream.Write(azureStream.GetBuffer(), 0, System.Convert.ToInt32(azureStream.Length));
        sftpStream.Flush();
        offset++;
        azureStream.SetLength(0);
    } while (remainingBytes > 0);
    azureStream.Dispose();
}
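
If the blob can be exposed as a readable Stream, the same idea can be written as a plain copy loop over one fixed buffer. This is only a sketch under that assumption: ChunkedCopy is a hypothetical helper, and it is not how azUtilities.GetFileStreamBlock works, since that method's internals are not shown here.

```csharp
using System;
using System.IO;

static class ChunkedCopy
{
    // Copies 'source' to 'destination' through one reusable buffer, so the memory
    // used by the copy stays around 'chunkSize' bytes regardless of the file size.
    // Returns the total number of bytes copied.
    public static long Copy(Stream source, Stream destination, int chunkSize)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        int read;
        // Read may return fewer bytes than requested, so only 'read' bytes are written.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        destination.Flush();
        return total;
    }
}
```

With the stream from client.GetUploadStream(_SftpFileName) as the destination and a 1 MB chunkSize, this would upload the file MB by MB without ever holding the whole file in memory or on disk.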
commented Apr 9 by j355y (130 points)
Thanks a lot Lukas!
commented Apr 9 by Lukas Pokorny (121,830 points)
Oops, I just noticed the "ref" keyword on the azureStream argument of the azUtilities.GetFileStreamBlock method... Does this mean the method might actually create a new instance of MemoryStream and set azureStream to that? If not, the "ref" keyword is not needed. If it does, then you might as well keep your original code and just use GetBuffer() instead of ToArray() (unless GetFileStreamBlock creates the new MemoryStream in a way that prevents GetBuffer() from working).
commented Apr 9 by j355y (130 points)
You are right, there's no need for the "ref"; the stream is not created in the called method... Thanks again!
...