0 votes
by (120 points)

I have an unusual use case for the SFTP server component. We're evaluating it, but there's one particular blocker that's probably more to do with SFTP than the actual library - I'm hoping there's a solution.

We're trying to use the SFTP service to present database content to our users. The hierarchy of the data is different to how we're hoping to present it; essentially we can extract the data from the internal database on demand. To make this available to our users, we want to present it in a virtual hierarchy, something like:

|- Date (directory)
|-- Sector (directory)
|--- File1.dat (file)
|--- File2.dat (file)
|--- File3.dat (file)

.. and so on. We've been able to create the virtual tree, and navigating to the files works great - so far, so good!

However, extracting the "file" is the issue. The files don't actually exist; retrieving the file triggers an internal extraction process, which can take a few seconds. Again, we've got this process working fine - downloading the "file" via an SFTP client triggers the extraction, which creates a stream to send to the client. However, because the initial file length isn't known, the download inevitably fails. From what we can see, GetLength is fired BEFORE GetContent. Because the content length isn't actually known until we extract it, GetLength will be 0 - so the server sends a ton of bytes but the client is expecting 0.

Is there some way that we can detect when GetLength is being fired due to a file download - that way, we could do the data extraction as part of GetLength (and hold it to ultimately be returned by GetContent)? Alternatively, is there a way to detect when GetLength is being fired as part of a normal directory listing - so we can just return a nominal length value when the client is "browsing" (as finding out the true length is an expensive process), but the correct length when the data is being downloaded?

Failing either of those, is there any event or method that runs before GetLength, before the file download starts?

1 Answer

0 votes
by (144k points)

This might be a bit more complex than it appears. The SFTP protocol is basically a POSIX-like file system API and does not actually offer any "download file" or "upload file" operations. Instead, when an SFTP client intends to download a file, it opens it for reading and receives a handle (using an SSH_FXP_OPEN request). Then it issues a series of read requests (SSH_FXP_READ) using that handle until the end of the file is reached. Finally, it closes the file handle (using an SSH_FXP_CLOSE request).
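
To make that sequence concrete, here is a minimal Python sketch that simulates the three requests against an in-memory store. It does not use any real SFTP library and is not how our server is implemented; `FakeSftpServer`, `download`, and the method names are hypothetical stand-ins for SSH_FXP_OPEN, SSH_FXP_READ and SSH_FXP_CLOSE.

```python
import io
import itertools


class FakeSftpServer:
    """Toy stand-in for an SFTP server; not a real protocol implementation."""

    def __init__(self, files):
        self._files = files                      # path -> bytes
        self._handles = {}                       # handle -> open stream
        self._next_handle = itertools.count(1)

    def open(self, path):
        """Models SSH_FXP_OPEN: returns a handle for later reads."""
        handle = next(self._next_handle)
        self._handles[handle] = io.BytesIO(self._files[path])
        return handle

    def read(self, handle, offset, length):
        """Models SSH_FXP_READ. Empty bytes stands in for the EOF status
        (SSH_FX_EOF) that a real server would send."""
        stream = self._handles[handle]
        stream.seek(offset)
        return stream.read(length)

    def close(self, handle):
        """Models SSH_FXP_CLOSE: releases the handle."""
        self._handles.pop(handle).close()


def download(server, path, chunk_size=4):
    """Reads until EOF; the file length is never requested."""
    handle = server.open(path)
    try:
        chunks, offset = [], 0
        while True:
            chunk = server.read(handle, offset, chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
            offset += len(chunk)
        return b"".join(chunks)
    finally:
        server.close(handle)
```

Note that `download` never asks for the length at any point; the end of the data is discovered only when a read comes back empty.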

If the SFTP client just downloads a file this way, GetLength will not be called at all. However, some SFTP clients have chosen to determine the file length prior to performing the download sequence by issuing an SSH_FXP_STAT or equivalent request - this is not really needed (as explained above), but it makes it possible for the client to display transfer progress, for example.

But in your scenario, this presents a problem, because when the client issues SSH_FXP_STAT, it does not imply that a file transfer is about to be started. SSH_FXP_STAT/SSH_FXP_LSTAT requests are used for other purposes as well, for example to determine file attributes or to determine whether a file or directory exists. Unfortunately, this implies that it's not possible to determine whether GetLength has been called because the client is about to start a transfer, and this is due to the SFTP protocol's design.

However, it's worth noting that many SFTP clients won't mind if GetLength returns 0, and proceed with the download anyway. Our Rebex SFTP client does, and I just verified that OpenSSH's sftp client does as well (it issues SSH_FXP_LSTAT and SSH_FXP_STAT before opening the file for reading, which results in GetLength getting called twice, but if it returns zero, OpenSSH's sftp still downloads the whole file - it simply reads the file data until the end of file is reached).
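
The difference between a client that tolerates a zero length and one that doesn't can be illustrated with a small Python sketch. This is only a model of the two behaviours, not code from any actual client; `make_reader`, `download_until_eof` and `download_trusting_length` are hypothetical names.

```python
def make_reader(data):
    """Returns a read(offset, length) function over fixed bytes (hypothetical helper)."""
    def read(offset, length):
        return data[offset:offset + length]
    return read


def download_until_eof(read, chunk_size=8):
    """Ignores any reported length and reads until an empty chunk (EOF)."""
    out, offset = bytearray(), 0
    while True:
        chunk = read(offset, chunk_size)
        if not chunk:
            return bytes(out)
        out += chunk
        offset += len(chunk)


def download_trusting_length(read, reported_length, chunk_size=8):
    """Stops once the length reported by an earlier stat has been received."""
    out, offset = bytearray(), 0
    while offset < reported_length:
        chunk = read(offset, min(chunk_size, reported_length - offset))
        if not chunk:
            break
        out += chunk
        offset += len(chunk)
    return bytes(out)
```

With a reported length of 0, `download_until_eof` still returns all the data, while `download_trusting_length` returns nothing. A client of the second kind would fail or truncate the download in your scenario.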

Which SFTP client did you use for the download? Are your users only going to use a specific SFTP client, or can they choose to use whatever client they want? If there is some control over SFTP clients that are to be used, it might be possible to find a workaround that makes those clients work fine. Otherwise, perhaps some caching mechanism could be used?
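
As one possible reading of the caching idea, here is a rough Python sketch in which the first length or content request for a path runs the extraction, and later requests reuse the result until it expires. Everything here is an assumption for illustration: `ExtractionCache`, `extract`, `ttl_seconds` and `clock` are hypothetical names, the class is not part of any library, and it is not thread-safe as written.

```python
import time


class ExtractionCache:
    """Caches extracted file content for a short time (illustrative only)."""

    def __init__(self, extract, ttl_seconds=60.0, clock=time.monotonic):
        self._extract = extract          # callable: path -> bytes (the slow step)
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._entries = {}               # path -> (expiry_time, content)

    def _get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is None or entry[0] <= now:
            content = self._extract(path)        # extraction runs only here
            entry = (now + self._ttl, content)
            self._entries[path] = entry
        return entry[1]

    def get_length(self, path):
        """What a GetLength-style callback could return."""
        return len(self._get(path))

    def get_content(self, path):
        """What a GetContent-style callback could return."""
        return self._get(path)
```

The trade-off is that any length request, including one made while the client is only browsing, would trigger the expensive extraction once per path per expiry period, so this only helps if that cost is acceptable.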

...