Although I already did a little work on xlsx file import performance at the beginning of this year's GSoC, Kohei and Markus have now shifted my focus from ods import performance to xlsx formula import performance. Kohei made a large xlsx test file that contains one sheet with 5 columns and 20,000 rows of formula cells. The current master (LibreOffice 3.7) took a long time to load this file. I profiled the import and found one area of code that was being called repeatedly and unnecessarily, which wasted a lot of time. So I made some changes and tested the load times. The improvement was quite dramatic.
Please keep in mind that these are not rigorous, scientific tests. They are just meant to give an idea of the improvements we are making. I ran them on a machine with a 3.2GHz AMD Athlon 64 X2 Dual Core Processor 6400+ and 8GB of RAM running 64-bit GNU/Linux. I used LibreOffice 3.5.4 and the latest build of my feature branch, which gradually gets merged to master (LibreOffice 3.7).
LibreOffice 3.5.4 took 8 minutes and 30 seconds to load this file. Before this change, LibreOffice 3.7 took 51 seconds to load this file, but after this change, it only took 6 seconds to load the file! That's an 88% reduction on LibreOffice 3.7 from before the change to after the change, and a 99% reduction from LibreOffice 3.5.4!
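For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic in Python. The function and variable names are only illustrative; the timings are the ones quoted above, converted to seconds.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Return how much shorter after_s is than before_s, as a percentage."""
    return (before_s - after_s) / before_s * 100.0


# Load times from the post, in seconds.
t_354 = 8 * 60 + 30   # LibreOffice 3.5.4: 8 minutes 30 seconds
t_37_before = 51      # LibreOffice 3.7 before the change
t_37_after = 6        # LibreOffice 3.7 after the change

print(round(percent_reduction(t_37_before, t_37_after)))  # 88
print(round(percent_reduction(t_354, t_37_after)))        # 99
```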
I am currently using libreoffice 3.5.4 and am finding that xlsx files are taking too long to open. How can I use those changes that you have mentioned in this blog post?
Hi, Raghu,
For the improvements in this specific post, you'll need to download and build the source code because 3.7 hasn't been released yet. Go to http://www.libreoffice.org/developers-2 to see how to get and build the source. Otherwise, you'll have to wait until 3.7 is released early next year. See http://wiki.documentfoundation.org/ReleasePlan#3.7_release.
There are some XLSX improvements in 3.6 mentioned at http://wiki.documentfoundation.org/ReleaseNotes/3.6#Performance. You can get the latest 3.6 release at http://www.libreoffice.org/download. (If your OS version isn't detected, go to http://www.libreoffice.org/download/?nodetect to choose the LibreOffice build for your specific OS/distro.)
Thanks Daniel. I will install 3.6 and go from there.
Hi Daniel,
I have some really big xlsx files (~250 MB+). Will 3.7 handle them correctly? Actually, v3.6 doesn't even open them...
Hi, sorry, it took so long for me to reply. I'm just now getting back into LibreOffice development.
It's hard for me to say without knowing the actual file. We are always striving to improve, though.
Now LibreOffice 4.0.0 RC2 is available, so you can try your file out on that version if you wish.
Thanks, this is superb!
How is the performance of CSV? A friend complained (with 3.5, I believe) that a 39 MB CSV takes 30s with LibreOffice and <10s with Excel. These files are pretty common in the field of biology, which is way too attached to MS Office IMO. Will there be any improvements there?
Hi, sorry, it took so long for me to reply. I'm just now getting back into LibreOffice development.
It looks like Kohei is doing some really interesting work in this area (and other areas), and it looks very promising: http://kohei.us/2012/08/08/orcus-integration-into-libreoffice/
That must feel great when you make progress like this, well done, Rob
Yes, indeed! Thanks!
I frequently work with xlsx files of several thousand rows. The very slow xlsx import in LibreOffice Calc slows down my workflow considerably.
I tried to check out the current trunk from http://cgit.freedesktop.org/libreoffice, but it says there haven't been any changes for Calc in 9 months. Could you please point me to an up-to-date git repository or some nightly builds with the improvements you describe here? Thanks a lot for your efforts...
Hi, sorry, it took so long for me to reply. I'm just now getting back into LibreOffice development.
I think your best bet is to try git://gerrit.libreoffice.org/core and follow the instructions at https://wiki.documentfoundation.org/Development/Native_Build.
Thanks!
https://bugs.freedesktop.org/show_bug.cgi?id=47534
40 minutes to open an xlsx file. The 4.0 beta doesn't appear to have changed it.
Hi, sorry, it took so long for me to reply. I'm just now getting back into LibreOffice development.
Indeed, as Markus said in the bug report, it looks like something is wrong with the test file. I tried opening the file with Excel 2007. Excel reported that the file had unreadable content and was able to successfully repair the file. I was able to use LibreOffice 6.2.2 and a recent master (4.1) to successfully open the repaired version of the file.
Ha! Sorry, I meant 3.6.2 not 6.2.2.
Are the 3.7-after improvements in the 4.0 release?
I have a file that takes about 15 minutes to load in version 3.5, and in version 4.0.0.2 it took 30 seconds. I'm happy, but the XLS version of the file still takes less than 5 seconds...
File data:
1st tab - 36k lines
2nd tab - 13k lines
3rd tab - 17k lines
Columns - 36
I'm glad to hear about the awesome improvement. We'll keep trying to make it even better.
I've got 32 GB of memory and a Core i7 processor, and can't open a 25 MB xlsx file? 4.0.1.2...