0 votes
by (270 points)
edited

I have two SftpItemCollection objects. I need to compare them (based on file name, size and modified date) and extract the items that exists on both.

Ex: SftpItemCollection 1
SftpItem 1 = abc.txt, 09/21/2011, 100 bytes
SftpItem 2 = xyz.txt, 09/21/2011, 100 bytes

SftpItemCollection 2
SftpItem 1 = zzz.txt, 09/21/2011, 100 bytes
SftpItem 2 = abc.txt, 09/21/2011, 100 bytes

I need SftpItem 1 from SftpItemCollection 1 as the final result.

This is essentially the Enumerable.Intersect functionality in .net.

link:Enumerable.Intersect

What options do I have? Thanks.

Applies to: Rebex SFTP

2 Answers

0 votes
by (72.7k points)
edited
 
Best answer

To use the Linq Intersect functionality you have to implement your own IEqualityComparer. The code can look like this:

private class SftpItemEqualityComparer : IEqualityComparer<SftpItem>
{
    public bool Equals(SftpItem x, SftpItem y)
    {
        return x.Name == y.Name && x.Size == y.Size && x.Modified == y.Modified;
    }

    public int GetHashCode(SftpItem obj)
    {
        return obj.Name.GetHashCode() ^ (int)obj.Size ^ obj.Modified.GetHashCode();
    }
}

// usage in Linq Intersect method
IEnumerable<SftpItem> result = collection1.Intersect(collection2, new SftpItemEqualityComparer());

If you cannot use Linq, implement your own Intersect method like follows:

private SftpItemCollection Intersect(SftpItemCollection collection1, SftpItemCollection collection2)
{
    SftpItemCollection result = new SftpItemCollection();
    foreach (SftpItem item1 in collection1)
    {
        SftpItem item2 = collection2[item1.Name];
        if (item2 != null && item1.Size == item2.Size && item1.Modified == item2.Modified)
            result.Add(item1);
    }
    return result;
}

// usage of the Intersect method
SftpItemCollection result = Intersect(collection1, collection2);
by (270 points)
edited

Thanks, Lukas. With respect to the second solution, I am worried about the performance, since it is a linear (and not a binary) look up. I am not sure about Linq Intersect method in terms of performance. These are definitely good options.

On a side note, it would be nice if Rebex implemented/supported binary look up/search natively in the SftpItemCollection class.

0 votes
by (72.7k points)
edited

You don't have to be worried about the performance. Please note, the SftpItemCollection behaves as an associative array (see the line collection2[item1.Name]), so searching time complexity is O(1) = Constant time (it works with a Dictionary internally).

If the performance has to be taken into consideration I suggest you one optimization of the second solution. The optimal approach of the Intersect problem on unsorted collection runs in O(n) = Linear time complexity. But in real life it matter whether the 'n' is 3 or 3000. The second algorithm iterates over the collection1, but the optimal solution is to iterate over the shortest one.

private SftpItemCollection Intersect(SftpItemCollection collection1, SftpItemCollection collection2)
{
    bool swapCollections = collection1.Count > collection2.Count;

    SftpItemCollection shorter, larger;
    if (swapCollections)
    {
        shorter = collection2;
        larger = collection1;
    }
    else
    {
        shorter = collection1;
        larger = collection2;
    }

    SftpItemCollection result = new SftpItemCollection();
    foreach (SftpItem item1 in shorter)
    {
        SftpItem item2 = larger[item1.Name];
        if (item2 != null && item1.Size == item2.Size && item1.Modified == item2.Modified)
            result.Add(swapCollections ? item2 : item1);
    }
    return result;
}

To make a picture about the Linq Intersect method time complexity, you can add informative printings into the SftpItemEqualityComparer methods. You will see the algorithm used in .NET 4.0 iterates over both collections. So it runs in O(n+m), but our modified solution runs in O(Min(n,m)), where 'n' and 'm' are lengths of collections.

Please note the Linq Intersect method is not evaluated when the method is called, but each time the resulting IEnumerable is used. And also please note that if the source collection is modified after calling the Intersect method and before using the resulting IEnumerable the change is taken into effect, which can be unwanted behaviour.

by (270 points)
edited

Thanks again Lukas. I like the non-linq approach. I didn’t realize the “SftpItemCollection behaves like an associative array” when I initially eplied. This is really helpful. I appreciate it!

...