Description: Hello all, Apache Parquet is becoming increasingly popular in the data community. It originated in Big Data on Hadoop but is now supported by all major software vendors. Its main benefit is that it's fast... However, sometimes I would just like to open it like a table file. Best regards, Simon Actual Results: You can't connect to an Apache Parquet file. Expected Results: You can read an Apache Parquet file. Reproducible: Always User Profile Reset: No Additional Info: None
https://qa.blog.documentfoundation.org/2023/11/22/qa-dev-report-october-2023/ > 40. Kohei Yoshida upgraded liborcus and added support for conditional loading of > Apache Parquet files into Calc
Already done for Calc for version 24.2 with b14583ba37a6d7ce398ccd3cf339f954785b03d8 Kohei: anything to comment here? About Base, what about loading the file into Calc and using the Calc file as a data source in Base?
Also, maybe the info could be added to https://wiki.documentfoundation.org/ReleaseNotes/24.2#Orcus-based_filters ?
Hello. First of all, thanks for the fast answer. I tried to open some samples from https://www.tablab.app/datasets/sample/parquet and it doesn't seem to work and Apache Parquet doesn't seem to be present in the file formats that are available. Version: 7.6.4.1 (X86_64) / LibreOffice Community Build ID: e19e193f88cd6c0525a17fb7a176ed8e6a3e2aa1 CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win Locale: fr-FR (fr_FR); UI: fr-FR Calc: threaded Best regards, Simon
(In reply to Simon Aubert from comment #4) > Hello. First of all, thanks for the fast answer. I tried to open some > samples from https://www.tablab.app/datasets/sample/parquet and it doesn't > seem to work and Apache Parquet doesn't seem to be present in the file > formats that are available. > > Version: 7.6.4.1 (X86_64) / LibreOffice Community > Build ID: e19e193f88cd6c0525a17fb7a176ed8e6a3e2aa1 > CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: > win > Locale: fr-FR (fr_FR); UI: fr-FR > Calc: threaded > > Best regards, > > Simon You should try with LibreOffice 24.2. You can download it from https://es.libreoffice.org/descarga/libreoffice/?type=rpm-x86_64&version=24.2.0
(In reply to Buovjaga from comment #2) > Already done for Calc for version 24.2 with > b14583ba37a6d7ce398ccd3cf339f954785b03d8 The commit states explicitly that the support depends on "orcus has been built with the parquet import filter enabled". With current master on Windows, I can't open files from the resource mentioned in comment 4.
So I tried with 24.2 and it still doesn't work, and Parquet is still absent from the supported file formats. Am I missing something? Best regards, Simon
Reproduced in Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community Build ID: 60150ef4b8fc1d0a30f20c3d9ed6ba0725da16a5 CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: gtk3 Locale: es-ES (es_ES.UTF-8); UI: en-US Calc: threaded
Created attachment 191819: sample 1
Created attachment 191820: sample 2
Created attachment 191821: sample 3
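When a sample fails to open, it can help to first rule out a malformed file. The sketch below only checks that a file starts and ends with the 4-byte `PAR1` marker that unencrypted Parquet files carry; it does not parse the file and says nothing about whether a given LibreOffice build can import it. The function name `looks_like_parquet` is made up for this illustration.

```python
from pathlib import Path
import tempfile

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both ends of an unencrypted Parquet file


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    data_path = Path(path)
    # Smallest conceivable file: header magic + 4-byte footer length + footer magic.
    if data_path.stat().st_size < 12:
        return False
    with data_path.open("rb") as handle:
        head = handle.read(4)
        handle.seek(-4, 2)  # whence=2 means "relative to the end of the file"
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


if __name__ == "__main__":
    # Demo on synthetic bytes only; this is not a real Parquet file.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "fake.parquet"
        fake.write_bytes(PARQUET_MAGIC + b"\x00" * 16 + PARQUET_MAGIC)
        print(looks_like_parquet(fake))
```

A file that passes this check but still does not open in Calc points at the build (no Parquet filter) rather than at the sample.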
(In reply to Xisco Faulí from comment #8) > Reproduced in > > Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community > Build ID: 60150ef4b8fc1d0a30f20c3d9ed6ba0725da16a5 > CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: gtk3 > Locale: es-ES (es_ES.UTF-8); UI: en-US > Calc: threaded So did you build orcus with the parquet filter enabled?
Allow me to give you guys some clarification...

In the current state of the master branch, the internal orcus is built without parquet filter support. The change referenced by the commit only introduces the hooks necessary to enable Parquet support when orcus is built with the parquet filter enabled; that commit by itself is not sufficient to load parquet files.

Now, to enable the parquet filter in orcus, you first need to build the Apache Arrow library, since that becomes orcus's new dependency. And to build the Apache Arrow library, you need to build the libraries that Arrow itself depends on. Depending on how many Parquet features you want to enable (Parquet can support multiple compression algorithms), you may need to build a few extra libraries or even more. So, even in a minimal configuration, we are talking about 3-4 extra libraries that need to be built before we can turn on parquet filter support in orcus.

Here is the main obstacle. Most of these libraries use CMake as their only build system. So if we want to build all of them as part of the regular TDF build, we first need to find a way either to integrate CMake support into our GNU Make based build system, or to have them built outside of our core build system and only reference them (or something similar). Unfortunately, I was not able to come up with a good solution for integrating these libraries, which is the reason why the internal orcus is built without parquet support at the moment...

Having said that, if someone wants to experiment with this, the easiest way to enable Parquet support is to build orcus outside of the libreoffice build along with all of its parquet related dependencies, and use --with-system-orcus to treat it as a system-provided orcus library when building libreoffice.
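The build order described in this comment can be sketched as a small dependency graph. This is only an illustration under stated assumptions: the two `arrow-dependency-*` entries are hypothetical placeholders (the comment does not name the libraries Arrow needs), and the script only computes an order; it does not build anything.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each component to the components that must be built before it.
# "arrow-dependency-a" and "arrow-dependency-b" are hypothetical placeholders
# standing in for whichever libraries a given Arrow configuration requires.
BUILD_DEPENDENCIES = {
    "libreoffice": {"orcus"},
    "orcus": {"apache-arrow"},
    "apache-arrow": {"arrow-dependency-a", "arrow-dependency-b"},
}


def build_order(dependencies):
    """Return the components in an order where prerequisites come first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(build_order(BUILD_DEPENDENCIES), start=1):
        print(step, component)
```

Whatever the concrete libraries turn out to be, the shape stays the same: Arrow's dependencies first, then Arrow, then orcus with the parquet filter enabled, and finally LibreOffice configured with --with-system-orcus.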
(In reply to Kohei Yoshida from comment #13) > Having said that, if someone wants to experiment with this, the easiest way > to enable Parquet support is to build orcus outside of the libreoffice build > along with all of its parquet related dependencies, and use > --with-system-orcus to treat it as a system-provided orcus library when > building libreoffice. This may be a completely doable strategy for distro builds, but I'm not sure whether it's desirable for distro builds to have features the TDF build lacks.