0 votes
by (140 points)
edited

Actually, I have another quick question, or rather a request for an optimization tip. I want to open a ZIP file and check all the files inside it. This is how I do it now:

using (ZipArchive zip = new ZipArchive(f.Path, ArchiveOpenMode.Open))
{
    ZipItemCollection jpegFiles = zip.GetItems(
        "*.jpg", ArchiveTraversalMode.NonRecursive, ArchiveItemTypes.Files);

    // display info about each item
    foreach (ZipItem item in jpegFiles)
    {
        if (item.LastWriteTime >= DatumOd && item.LastWriteTime <= DatumDo)
        {
            // Do something with files
        }
    }
}

Again, if the ZIP file holds many files, this can take up to a few seconds to open, and it freezes the whole application. Is there a faster way to look at the properties of the files inside a ZIP file?

Thank you again in advance.

Applies to: Rebex ZIP
by (140 points)
edited

I did some Stopwatch tests. It actually takes the longest just to open the ZIP file. GetItems and the foreach loop are very fast. This is what I get from the log:

13:38:54.358: Before I open ZIP = 0 ms

13:39:02.389: After file is opened = 8029 ms

13:39:02.405: GetItems = 10 ms

13:39:02.639: foreach loop = 234 ms

So the puzzling question is why it takes that long to open a file that contains 2880 images.
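For anyone who wants to collect similar per-stage timings, here is a minimal sketch of a timing helper built on System.Diagnostics.Stopwatch. The names StageTimer and Measure are made up for illustration and are not part of Rebex ZIP; the log format only imitates the lines above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of any library): times one stage of work
// and writes a log line similar to the ones shown in this thread.
public static class StageTimer
{
    // Runs the given stage, logs its duration and returns it in milliseconds.
    public static long Measure(string label, Action stage)
    {
        Stopwatch sw = Stopwatch.StartNew();
        stage();
        sw.Stop();

        Console.WriteLine("{0:HH:mm:ss.fff}: {1} = {2} ms",
            DateTime.Now, label, sw.ElapsedMilliseconds);
        return sw.ElapsedMilliseconds;
    }
}
```

Each stage (opening the archive, calling GetItems, running the foreach loop) can be wrapped in its own Measure call, which makes it easy to see which one dominates.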

by (140 points)
edited

I did even more tests, and the results are all the same. It takes a little less than 10 seconds to open a ZIP file with 2880 JPG files inside.

14:05:11.530: Before I open ZIP = 0 ms

14:05:20.295: After file is opened = 8767 ms

14:05:20.311: After GetItems = 10 ms

14:05:20.530: After foreach loop = 217 ms

And while it is opening, it freezes the entire application GUI, even if the file is opened inside a BackgroundWorker thread.

I tried to open it like so:

using (ZipArchive zip = new ZipArchive(f.Path, ArchiveOpenMode.Open, ArchiveAccessMode.Read))

But it had no effect.

by (140 points)
edited

And yet another test. This time I used the DotNetZip library. It is much, much faster:

14:23:40.405: Before I open ZIP = 0

14:23:40.420: After file is opened = 23

14:23:40.670: After foreach loop = 238

Let me know if I can help to debug this problem and help to improve your library.

2 Answers

0 votes
by (147k points)
edited

I tried reproducing this issue - first, I created a ZIP file that contains 2880 JPEG images (about 64KB each). Then, I tried running the following simple console application:

    public static void Main()
    {
        DateTime DatumOd = DateTime.MinValue;
        DateTime DatumDo = DateTime.MaxValue;

        int t = Environment.TickCount;
        using (ZipArchive zip = new ZipArchive("images.zip", ArchiveOpenMode.Open))
        {
            ZipItemCollection jpegFiles = zip.GetItems("*.jpg",
                ArchiveTraversalMode.NonRecursive, ArchiveItemTypes.Files);

            // display info about each item
            foreach (ZipItem item in jpegFiles)
            {
                if (item.LastWriteTime >= DatumOd && item.LastWriteTime <= DatumDo)
                {
                    // Do something with files
                }
            }
        }
        Console.WriteLine("Duration: {0}ms", Environment.TickCount - t);
    }

It only took 93 milliseconds on a year-old Dell laptop. So the question is: why are your results entirely different? I'm sure we are going to find out. First, please download the ZIP file I used, give it a try and let me know whether it's fast or slow. Thanks!

by (140 points)
edited

Hello Lucas. I get exactly the same slow opening of the file.

07:34:21.748: Before I open ZIP = 0 ms
07:34:26.904: After file is opened = 5156 ms
07:34:26.904: After GetItems = 3 ms
07:34:27.123: After foreach loop = 225 ms

So it takes a good 5 seconds. With DotNetZip it takes a few milliseconds. I partly blame the slow HDD and the cluttered OS, but compared to the other ZIP library, Rebex ZIP is still much slower at opening the ZIP file. The HDD LED lights up and the whole application freezes until the file is fully opened.

But if I reopen it, it only takes a couple of milliseconds:

07:37:39.858: Before I open ZIP = 0
07:37:39.904: After file is opened = 46
07:37:39.920: After GetItems = 3
07:37:40.139: After foreach loop = 223

I am guessing this is due to some sort of caching. Even if I close the application and start it again, it opens the file instantly. But if I try to open some other ZIP file with many images, it opens slowly again. With the other ZIP library there is no difference: it opens any ZIP file instantly.

Are there any other tests you would like me to run? It would be interesting to figure out why this happens.

by (18.1k points)
edited

I've probably found the cause of the problem: when reading info about individual ZIP items, Rebex ZIP walks through the whole ZIP file. It seeks to the beginning of each file within the ZIP archive, reads the info and seeks to the next file.

This should not be necessary: there is a redundant list of all archive items (the central directory) at the very end of the ZIP archive. So it is only necessary to seek to the end of the file and then read about 400 KB, instead of seeking through the whole 180 MB of the archive.

We'll examine why Rebex ZIP does both kinds of seeking, and we'll send you a solution later.
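To illustrate the idea, here is a minimal sketch of how the list at the end of the archive can be located using only System.IO. This is not the Rebex ZIP implementation; the names ZipTail, Eocd and ReadEocd are made up for this example. It relies on the standard ZIP layout, where the archive ends with an "end of central directory" record (signature bytes 50 4B 05 06) that stores the number of entries plus the size and offset of the central directory. ZIP64 and multi-disk archives are not handled.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only (not a Rebex ZIP API).
public static class ZipTail
{
    // Values read from the end of central directory (EOCD) record.
    public struct Eocd
    {
        public int EntryCount;        // total number of entries in the archive
        public long DirectorySize;    // size of the central directory in bytes
        public long DirectoryOffset;  // offset of the central directory from the start
    }

    public static Eocd ReadEocd(Stream stream)
    {
        const int MinEocdSize = 22;       // fixed part of the EOCD record
        const int MaxCommentSize = 65535; // the archive comment may follow it

        if (stream.Length < MinEocdSize)
            throw new InvalidDataException("Too short to be a ZIP archive.");

        // Read only the tail of the file; the rest of the archive is not touched.
        int tailSize = (int)Math.Min(stream.Length, (long)MinEocdSize + MaxCommentSize);
        byte[] tail = new byte[tailSize];
        stream.Seek(-tailSize, SeekOrigin.End);
        int read = 0;
        while (read < tailSize)
        {
            int n = stream.Read(tail, read, tailSize - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }

        // Search backwards for the EOCD signature "PK\x05\x06".
        for (int i = tailSize - MinEocdSize; i >= 0; i--)
        {
            if (tail[i] == 0x50 && tail[i + 1] == 0x4B &&
                tail[i + 2] == 0x05 && tail[i + 3] == 0x06)
            {
                return new Eocd
                {
                    EntryCount = tail[i + 10] | (tail[i + 11] << 8),
                    DirectorySize = ReadUInt32(tail, i + 12),
                    DirectoryOffset = ReadUInt32(tail, i + 16)
                };
            }
        }
        throw new InvalidDataException("End of central directory record not found.");
    }

    // Reads a little-endian 32-bit unsigned integer.
    private static uint ReadUInt32(byte[] b, int p)
    {
        return (uint)b[p] | ((uint)b[p + 1] << 8) |
               ((uint)b[p + 2] << 16) | ((uint)b[p + 3] << 24);
    }
}
```

With DirectoryOffset and DirectorySize known, a reader can seek once to the central directory and read all item names and timestamps in a single sequential pass, which is the behaviour described above.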

Thanks for reporting it!

by (18.1k points)
edited

We've fixed this issue and published the fix in a beta build: http://www.rebex.net/getfile/4b856cb940364a6483411c0f037c9270/RebexZip-BetaBuild4947-Trial-Binaries.zip [Note: this download link expires on 2013-08-16]

Please download this build, give it a try and let us know whether it has helped.

by (140 points)
edited

Problem is... SOLVED! The opening time is now measured in milliseconds again. I am glad I was able to help you find a solution.

Thank you.

0 votes
by (72.7k points)
edited

The discussed optimization is now included in Rebex ZIP 2013 R2. Thank you for your help.

...