Tuesday, July 31, 2012

Improved XLSX Load Time

Although I already did a little work on xlsx file import performance at the beginning of this year's GSoC, Kohei and Markus have now changed my focus from ods import performance to xlsx formula import performance.  Kohei made a large xlsx test file that contains one sheet with 5 columns and 20,000 rows of formula cells.  This file took a long time for the current master (LibreOffice 3.7) to load.  I profiled the import of the test file and found one area of code that was being called repeatedly without need, which wasted a lot of time during import.  So I made some changes and tested the load times.  The change in load time was quite dramatic.

Please keep in mind that these are not rigorous, scientific tests.  They are just meant to give an idea of the improvements we are making.  I ran them on a machine with a 3.2GHz AMD Athlon 64 X2 Dual Core Processor 6400+ and 8GB of RAM running 64-bit GNU/Linux.  I used LibreOffice 3.5.4 and the latest build of my feature branch, which is gradually being merged into master (LibreOffice 3.7).


LibreOffice 3.5.4 took 8 minutes and 30 seconds to load this file.  Before this change, LibreOffice 3.7 took 51 seconds to load it, but after this change, it took only 6 seconds!  That's an 88% reduction in load time on LibreOffice 3.7 from before the change to after it, and a 99% reduction compared to LibreOffice 3.5.4!
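
For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic. It uses only the load times quoted above; the function and variable names are just illustrative and are not part of LibreOffice.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Return the percentage by which a load time dropped."""
    return (before_s - after_s) / before_s * 100


# Load times quoted in this post, converted to seconds.
lo_354 = 8 * 60 + 30   # LibreOffice 3.5.4: 8 minutes 30 seconds
lo_37_before = 51      # LibreOffice 3.7 before the change
lo_37_after = 6        # LibreOffice 3.7 after the change

print(f"3.7 before vs. after: {percent_reduction(lo_37_before, lo_37_after):.1f}%")  # 88.2%
print(f"3.5.4 vs. 3.7 after:  {percent_reduction(lo_354, lo_37_after):.1f}%")        # 98.8%
```

Rounded to whole numbers, these are the 88% and 99% figures above.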

17 comments:

  1. I am currently using libreoffice 3.5.4 and am finding that xlsx files are taking too long to open. How can I use those changes that you have mentioned in this blog post?

  2. Hi, Raghu,

    For the improvements in this specific post, you'll need to download and build the source code because 3.7 hasn't been released yet. Go to http://www.libreoffice.org/developers-2 to see how to get and build the source. Otherwise, you'll have to wait until 3.7 is released early next year. See http://wiki.documentfoundation.org/ReleasePlan#3.7_release.

    There are some XLSX improvements in 3.6 mentioned at http://wiki.documentfoundation.org/ReleaseNotes/3.6#Performance. You can get the latest 3.6 release at http://www.libreoffice.org/download. (If your OS version isn't currently detected, go to http://www.libreoffice.org/download/?nodetect to choose the LibreOffice for your specific OS/distro.)

    1. Thanks Daniel. I will install 3.6 and go from there.

  3. Hi Daniel,

    I have some really big xlsx files (~250mb +). Will 3.7 handle them correctly? Actually v3.6 doesn't even open them...

    1. Hi, sorry, it took so long for me to reply. I'm just now getting back into LibreOffice development.

      It's hard for me to say without knowing the actual file. We are always striving to improve, though.

      Now LibreOffice 4.0.0 RC2 is available, so you can try your file out on that version if you wish.

  4. Thanks, this is superb!

    How is the performance of CSV? A friend complained (with 3.5, I believe) that a 39 MB CSV takes 30s with LibreOffice, and <10s with Excel. These files are pretty common in the field of biology, which is way too attached to MSOffice IMO. Will there be any improvements there?

    1. Hi, sorry, it took so long for me to reply. I'm just now getting back into LibreOffice development.

      It looks like Kohei is doing some really interesting work in this area (and other areas), and it looks very promising: http://kohei.us/2012/08/08/orcus-integration-into-libreoffice/

  5. That must feel great when you make progress like this, well done, Rob

  6. I frequently work with xlsx files of several thousand rows. The very slow xlsx import in LibreOffice Calc slows down my workflow considerably.
    I tried to check out the current trunk from http://cgit.freedesktop.org/libreoffice
    but it says there haven't been any changes to calc for 9 months? Could you please point me to an up-to-date git repository or some nightly builds with the improvements you describe here? Thanks a lot for your efforts...

    1. Hi, sorry, it took so long for me to reply. I'm just now getting back into LibreOffice development.

      I think your best bet is to try git://gerrit.libreoffice.org/core and follow the instructions at https://wiki.documentfoundation.org/Development/Native_Build.

      Thanks!

  7. https://bugs.freedesktop.org/show_bug.cgi?id=47534

    40 minutes to open xlsx file. 4.0 beta doesn't appear to have changed it.

    1. Hi, sorry, it took so long for me to reply. I'm just now getting back into LibreOffice development.

      Indeed as Markus said in the bug report, it looks like something is wrong with the test file. I tried opening the file with Excel 2007. Excel reported that the file had unreadable content and was able to successfully repair the file. I was able to use LibreOffice 6.2.2 and a recent master (4.1) to successfully open the repaired version of the file.

    2. Ha! Sorry, I meant 3.6.2 not 6.2.2.

  8. Are the 3.7-after improvements in the 4.0 release?

    I have a file that takes about 15 minutes to load in version 3.5, and in version 4.0.0.2 it took 30 seconds. I'm happy, but the XLS version of the file still takes less than 5 seconds...

    File data:
    1st tab - 36k lines
    2nd tab - 13k lines
    3rd tab - 17k lines
    Columns - 36

    1. I'm glad to hear about the awesome improvement. We'll keep trying to make it even better.

  9. I've got 32 GB of memory and a Core i7 processor, and can't open a 25 MB xlsx file? 4.0.1.2...
