SFTP GetList - Duplicate Filename on Server

0 votes
asked Feb 20 by bloomt (150 points)

I'm seeing a discrepancy between the GetList and GetRawList methods: GetList returns two files, while GetRawList returns three. I believe this is because two of the files have the same filename on the server.

I've contacted the server owner and told them that the duplicate-filename behavior isn't very friendly to clients. I'm just curious why the GetList method misses the newer of the two same-named files while GetRawList lists everything. If GetList returned everything, I probably wouldn't have noticed the server's peculiar behavior.

The server identifies itself as SSH-2.0-WU Transmission SFTP SVR1. I don't know what the actual server software is.

[Image: difference in count. The missing file is ubpr2.DAT, dated Feb 20.]

Log of activity inside the DownloadFilesByDate method shown above:
2017-02-20 10:53:13.060 INFO Sftp(1)[9] Command: SSH_FXP_OPENDIR (4, '/out')
2017-02-20 10:53:13.099 INFO Sftp(1)[9] Response: SSH_FXP_HANDLE (4, 0x30)
2017-02-20 10:53:13.100 INFO Sftp(1)[9] Command: SSH_FXP_READDIR (5, 0x30)
2017-02-20 10:53:13.142 INFO Sftp(1)[9] Response: SSH_FXP_NAME (5, 3 items)
2017-02-20 10:53:13.145 INFO Sftp(1)[9] Command: SSH_FXP_READDIR (6, 0x30)
2017-02-20 10:53:13.173 INFO Sftp(1)[9] Response: SSH_FXP_STATUS (6, 1, 'EOF reached for Mailbox [/out].')
2017-02-20 10:53:13.174 INFO Sftp(1)[9] Command: SSH_FXP_CLOSE (7, 0x30)
2017-02-20 10:53:13.205 INFO Sftp(1)[9] Response: SSH_FXP_STATUS (7, 0, 'The operation completed')
2017-02-20 10:53:13.882 INFO Sftp(1)[9] Batch: Calling GetItems(string = '/out', TraversalMode = 'Recursive').
2017-02-20 10:53:13.895 DEBUG Sftp(1)[9] Batch: Executing multi-file operation: Listing, source = '/out', target = '', TransferMethod.Copy, MoveMode.All, LinkProcessingMode.FollowLinks, ActionOnExistingFiles.ThrowException.
2017-02-20 10:53:13.897 DEBUG Sftp(1)[9] Batch: Normalizing source path ('/out').
2017-02-20 10:53:13.898 DEBUG Sftp(1)[9] Batch: Checking source path ('/out').
2017-02-20 10:53:13.900 INFO Sftp(1)[9] Command: SSH_FXP_LSTAT (8, '/out')
2017-02-20 10:53:13.940 INFO Sftp(1)[9] Response: SSH_FXP_ATTRS (8)
2017-02-20 10:53:13.942 DEBUG Sftp(1)[9] Batch: Multi-file operation started.
2017-02-20 10:53:13.952 DEBUG Sftp(1)[9] Batch: Retrieving items of directory ('/out').
2017-02-20 10:53:13.953 INFO Sftp(1)[9] Command: SSH_FXP_OPENDIR (9, '/out')
2017-02-20 10:53:14.006 INFO Sftp(1)[9] Response: SSH_FXP_HANDLE (9, 0x31)
2017-02-20 10:53:14.007 INFO Sftp(1)[9] Command: SSH_FXP_READDIR (10, 0x31)
2017-02-20 10:53:14.048 INFO Sftp(1)[9] Response: SSH_FXP_NAME (10, 3 items)
2017-02-20 10:53:14.048 INFO Sftp(1)[9] Command: SSH_FXP_READDIR (11, 0x31)
2017-02-20 10:53:14.076 INFO Sftp(1)[9] Response: SSH_FXP_STATUS (11, 1, 'EOF reached for Mailbox [/out].')
2017-02-20 10:53:14.076 INFO Sftp(1)[9] Command: SSH_FXP_CLOSE (12, 0x31)
2017-02-20 10:53:14.127 INFO Sftp(1)[9] Response: SSH_FXP_STATUS (12, 0, 'The operation completed')
2017-02-20 10:53:14.131 DEBUG Sftp(1)[9] Batch: Multi-file operation done.
2017-02-20 10:53:14.638 INFO Sftp(1)[9] Command: SSH_FXP_OPENDIR (13, '/out')
2017-02-20 10:53:14.676 INFO Sftp(1)[9] Response: SSH_FXP_HANDLE (13, 0x32)
2017-02-20 10:53:14.676 INFO Sftp(1)[9] Command: SSH_FXP_READDIR (14, 0x32)
2017-02-20 10:53:14.731 INFO Sftp(1)[9] Response: SSH_FXP_NAME (14, 3 items)
2017-02-20 10:53:14.733 INFO Sftp(1)[9] Command: SSH_FXP_READDIR (15, 0x32)
2017-02-20 10:53:14.762 INFO Sftp(1)[9] Response: SSH_FXP_STATUS (15, 1, 'EOF reached for Mailbox [/out].')
2017-02-20 10:53:14.762 INFO Sftp(1)[9] Command: SSH_FXP_CLOSE (16, 0x32)
2017-02-20 10:53:14.791 INFO Sftp(1)[9] Response: SSH_FXP_STATUS (16, 0, 'The operation completed')
Applies to: Rebex SFTP

1 Answer

0 votes
answered Feb 21 by Lukas Pokorny (85,590 points)
edited Mar 22 by Lukas Pokorny
 
Best answer

You are right - when Sftp.GetList encounters a duplicate filename, it silently ignores it. The main reason for this is the Dictionary-like nature of the SftpItemCollection class: a filename currently acts as a key and can be used as an indexer.

Update: In Rebex SFTP 2017 R2, this behavior can be modified by setting Sftp.Settings.SkipDuplicateItems to false.

Ignoring duplicate filenames is not entirely desirable, but there are not many alternatives:
a) fail when a duplicate filename is encountered
b) modify SftpItemCollection to allow duplicate filenames

We originally did (a), but some of our clients actually encountered duplicate filenames and complained. Switching to (b) would be problematic as well: it would make our API harder to use (not a good way to address a rather rare server bug), users don't expect duplicate files, and there would be no way to distinguish one from the other when working with them through the SFTP protocol. Instead, about six years ago we modified the parser to simply ignore duplicate filenames.
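The constraint can be illustrated with a small analogy (in Python, not Rebex's actual SftpItemCollection implementation; the filenames other than ubpr2.DAT are invented): a name-keyed collection behaves like a dictionary, so it can hold at most one entry per filename, and a parser feeding it must skip, overwrite, or fail on duplicates.

```python
def parse_listing(raw_entries):
    """Keep the first occurrence of each filename, silently skipping
    duplicates (the pre-2017 R2 behavior described above)."""
    items = {}
    for name, attrs in raw_entries:
        if name in items:
            continue  # duplicate filename: silently ignored
        items[name] = attrs
    return items

# A raw directory listing with three entries, two sharing a name:
raw = [("ubpr1.DAT", "Feb 13"), ("ubpr2.DAT", "Feb 13"), ("ubpr2.DAT", "Feb 20")]
parsed = parse_listing(raw)
# len(raw) == 3, but len(parsed) == 2: the newer ubpr2.DAT entry is dropped,
# matching the two-versus-three discrepancy described in the question.
```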

That said, it's true that silently ignoring this server-side issue might be undesirable in many scenarios. In the next release, we will log a warning into the communication log when a duplicate filename is encountered, which will lower the chance that this goes unnoticed. We might even add an option that makes the parser fail in this case (if there is actual demand for that).

Which behavior would you prefer?

commented Feb 21 by bloomt (150 points)
I figured it was a least-of-evils design decision. The real problem is, of course, the server's behavior, since I can never fetch both versions of the same filename.

I would prefer fail-stop with an immediate exception.
I'm moving financial transaction files. If the wrong one gets transferred and posted, it's up to human error-checking to find the issue. This is important to me because if nobody notices, somebody could get their power turned off for non-payment. I think an exception with a warning about bad server behavior is appropriate; this server quirk is rather exceptional, after all! An option flag to ignore the issue could be provided, with the caveat that you can't know in advance which file you're getting. That way, anyone who complains need only make a small change to restore the historic behavior, and they own any mishaps that result.
I'm sure it's not so easy for you to make such a decision, though. I'll take the logged warning if that's what you can do. In the meantime, I'll perform a GetList and a GetRawList and compare the results to detect this issue.
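The comparison workaround could be sketched like this (a Python analogy; `find_duplicate_names` is a hypothetical helper operating on the raw listing's filenames, not part of the Rebex API, and the names other than ubpr2.DAT are invented):

```python
from collections import Counter

def find_duplicate_names(raw_names):
    """Return filenames that appear more than once in a raw listing.
    A non-empty result means a deduplicated listing (like GetList's)
    will be shorter than the raw one."""
    return sorted(name for name, count in Counter(raw_names).items() if count > 1)

raw_names = ["ubpr1.DAT", "ubpr2.DAT", "ubpr2.DAT"]
dupes = find_duplicate_names(raw_names)
# dupes == ["ubpr2.DAT"]: the two listings would disagree in count,
# which is exactly the condition the workaround needs to detect.
```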
commented Mar 7 by Lukas Pokorny (85,590 points)
Thanks for your thoughts. I would prefer fail-stop myself as well, but making this the default behavior is problematic, as it would affect existing customers.

Anyway, we will add a Sftp.Settings.SkipDuplicateItems option to the next release - when set to false, it will raise an exception when a duplicate item is detected. It will be set to true by default for now, but I hope we eventually change that.
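The described semantics of the option can be sketched as follows (again a Python analogy, with the `skip_duplicate_items` parameter standing in for Sftp.Settings.SkipDuplicateItems; this is not the Rebex implementation):

```python
def parse_listing(raw_entries, skip_duplicate_items=True):
    """Mimic the option described above: True drops duplicates
    (the historic behavior), False fails fast with an exception."""
    items = {}
    for name, attrs in raw_entries:
        if name in items:
            if skip_duplicate_items:
                continue  # keep the first occurrence, drop the rest
            raise ValueError("duplicate filename in listing: " + name)
        items[name] = attrs
    return items

raw = [("ubpr2.DAT", "Feb 13"), ("ubpr2.DAT", "Feb 20")]
assert len(parse_listing(raw)) == 1  # default: duplicate silently skipped
try:
    parse_listing(raw, skip_duplicate_items=False)
except ValueError:
    print("fail-stop mode raised as expected")
```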
commented Mar 7 by bloomt (150 points)
Thanks for the update. I'll be sure to use this option when it's released.
commented Mar 22 by Lukas Pokorny (85,590 points)
We just released Rebex SFTP 2017 R2, which adds Sftp.Settings.SkipDuplicateItems.
commented Mar 22 by bloomt (150 points)
Wow that was fast. I've already integrated the change. Excellent support!
...