Bug 40186 - Command-line conversion from HTML produces HTML, not RTF, DOC, etc., if output filter name is not specified explicitly
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 3.4.2 release
Hardware: Other
OS: macOS (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTF Commandline
Reported: 2011-08-17 19:30 UTC by em36
Modified: 2022-03-30 12:00 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Description em36 2011-08-17 19:30:06 UTC
This problem seems to have been introduced in 3.3 or later. Under OS X, when I use the command line to convert an HTML file to another format (RTF, DOC), the output file contains HTML rather than the requested format. This is an example of the command line:

cd '/Applications/' ; LibreOffice.app/Contents/MacOS/soffice.bin --headless --nofirststartwizard --invisible --convert-to rtf --outdir /Users/username/ '/Users/username/testfile.html'

Other input formats work correctly. Is there anything I should be doing differently with the current version to make this work with HTML?
Comment 1 Björn Michaelsen 2011-12-23 12:32:54 UTC Comment hidden (obsolete)
Comment 2 Roman Eisele 2012-05-07 09:21:13 UTC
Compare Bug 46026 - "Command line converter + Conversion issues/inconsistencies between odt->doc, docx, pdf".
Comment 3 Florian Reisinger 2012-08-14 13:57:33 UTC Comment hidden (obsolete)
Comment 4 Florian Reisinger 2012-08-14 13:58:53 UTC Comment hidden (obsolete)
Comment 5 Florian Reisinger 2012-08-14 14:03:26 UTC Comment hidden (obsolete)
Comment 6 Florian Reisinger 2012-08-14 14:05:42 UTC Comment hidden (obsolete)
Comment 7 Roman Eisele 2012-08-16 15:40:39 UTC
Wait a minute -- I can (still) reproduce this bug:

REPRODUCIBLE with
* LibreOffice 3.5.6.2 (Build-ID: e0fbe70-dcba98b-297ab39-994e618-0f858f0)
* LibreOffice 3.6.0.4 (Build ID: 932b512)
both with German langpack installed, both running on MacOS X 10.6.8 (Intel).

Using the command line given in the original description, and a simple .html file named "testfile.html" saved in my user folder, a file "testfile.rtf" is generated that contains HTML data instead of RTF data.

Of course, I am no expert on the LibreOffice --headless command line, and can’t tell whether there is an error in the command line supplied by the original reporter (maybe the command line options of soffice.bin have changed, so that the --convert-to argument is no longer honored?). Someone else will have to answer that.

Nevertheless, I can confirm that the command does not work: it is strange (and clearly a bug) that we produce a file named *.rtf that contains HTML data.
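A quick way to detect the symptom described here (a *.rtf file that really holds HTML) is to inspect the first bytes of the output, since RTF documents begin with the control word "{\rtf". The following is a minimal shell sketch under that assumption; the function names is_real_rtf and check_rtf_output are hypothetical helpers, not part of LibreOffice.

```shell
#!/bin/sh
# Sketch: detect a file named *.rtf that actually contains HTML.
# Assumption: the RTF signature "{\rtf" at the start of the file is enough
# to tell a genuine RTF file from an HTML file with the wrong extension.

# Succeeds if the first five bytes of "$1" are the RTF signature "{\rtf".
is_real_rtf() {
    [ "$(head -c 5 "$1" 2>/dev/null)" = '{\rtf' ]
}

# Prints a one-word verdict for a supposed RTF output file:
#   ok               - the file starts with the RTF signature
#   html-in-disguise - no RTF signature, but an <html> tag is present
#   unknown          - neither (or the file is missing)
check_rtf_output() {
    if is_real_rtf "$1"; then
        echo "ok"
    elif grep -qi '<html' "$1" 2>/dev/null; then
        echo "html-in-disguise"
    else
        echo "unknown"
    fi
}
```

For example, running check_rtf_output /Users/username/testfile.rtf after the conversion above would print html-in-disguise when the generated file still contains an <html> tag.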
Comment 8 Roman Eisele 2012-08-16 16:19:40 UTC
Well, it DOES work with LibreOffice 3.6 if I specify the filter to use:

  "LibreOffice.app/Contents/MacOS/soffice.bin" --headless
  --nofirststartwizard --invisible --convert-to 'rtf:Rich Text Format'
  --outdir /Users/username/result '/Users/username/testfile.html'

(NB: the quoted form 'rtf:Rich Text Format' seems necessary; rtf:Rich_Text_Format, with underscores and WITHOUT quotation marks, does not work.)

The same is true for other target file formats; e.g.,
  ... --convert-to 'doc:MS Word 97' ...
works, but
  ... --convert-to doc              ...
does not work: the generated file has the extension .doc but still contains HTML data, so it is invalid.
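The workaround above can be wrapped in a small shell function that always passes an explicit export filter. This is only a sketch: it maps just the two filter names quoted in this comment ('rtf:Rich Text Format' and 'doc:MS Word 97') and rejects anything else rather than guessing. The SOFFICE variable and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the workaround: pass an explicit "ext:Filter Name" to
# --convert-to instead of a bare extension. Only the filter names quoted
# in this bug are mapped. SOFFICE is a hypothetical variable for the binary.

SOFFICE=${SOFFICE:-/Applications/LibreOffice.app/Contents/MacOS/soffice.bin}

# Prints the --convert-to value for target extension $1, or fails if unknown.
convert_to_arg() {
    case "$1" in
        rtf) echo 'rtf:Rich Text Format' ;;
        doc) echo 'doc:MS Word 97' ;;
        *)   return 1 ;;
    esac
}

# Converts HTML file $2 to format $1, writing the result into directory $3.
convert_html() {
    arg=$(convert_to_arg "$1") || { echo "no known filter for '$1'" >&2; return 1; }
    # Quoting "$arg" keeps the filter name, which contains spaces,
    # together as a single command-line argument.
    "$SOFFICE" --headless --nofirststartwizard --invisible \
        --convert-to "$arg" --outdir "$3" "$2"
}
```

For example: convert_html doc /Users/username/testfile.html /Users/username/result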


But I still don’t understand why the short version used by the original reporter:
  --convert-to rtf
does not work; according to

  http://help.libreoffice.org/Common/Starting_the_Software_With_Parameters

which gives the example
  --convert-to pdf
I would expect that it is not necessary to specify the filter explicitly.


I have therefore adjusted the Summary: the problem is that the output filter name is required, while the documentation says it is optional.
Comment 9 Roman Eisele 2012-08-16 16:33:33 UTC
@Stephan Bergmann:

Hello Stephan, I could not find out which developer(s) should be informed about this issue; I am adding you to the CC list because I remember (but I may be wrong ;-) that you have fixed some other issues with running LibO in headless mode.

Can you please take a quick look at this issue and tell whether this is
(a) a problem in LibreOffice (and then, which developer(s) could be interested
    in fixing it?), or if this is
(b) just a documentation error (if the output filter name is required
    in any case, the documentation is wrong in saying that it is optional)?

Or can you give me a hint who (if not you) could help here?

Thank you very much in advance for any hints!
Comment 10 Stephan Bergmann 2012-08-20 14:05:20 UTC
#libreoffice-dev:

<sberg> btw, any dev having insight into fdo#40186, "--convert-to rtf" not working while "--convert-to 'rtf:Rich Text Format'" does
<kendy> sberg: I'd try vmiklos, but there's public holiday in Hungary today :-(
<sberg> kendy, thanks, will cc him on the bug
<caolan> sberg: I might suspect some change in the filter module in source/config subdir. There's also some weirdness where Text Encoded in the file type list is now "csv,txt" which looks very odd to me
Comment 11 Maxim Monastirsky 2014-11-10 13:58:28 UTC
The reason for this bug seems quite simple. HTML files are opened in Writer/Web by default [1], but the RTF filter is registered with the Writer DocumentService [2], so a search for a filter for Writer/Web finds none. Indeed, when changing the DocumentService of the filter entry, or forcing the search to use Writer's DocumentService, the right filter is found and the output is RTF. So it should be easy to hack this to also search for a Writer filter when no filter is found for Writer/Web. But I wonder whether this still needs fixing, given that since 9df3a83c304f3dd0e0233d234dc6036ab5eefb77 there is an easy workaround (adding --writer to the command). Any thoughts?

[1] http://opengrok.libreoffice.org/xref/core/filter/source/textfilterdetect/filterdetect.cxx#140
[2] http://opengrok.libreoffice.org/xref/core/filter/source/config/fragments/filters/Rich_Text_Format.xcu#29
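The lookup described in this comment, including the suggested fallback, can be illustrated with a toy model. This is NOT LibreOffice's actual code: the registry is a hypothetical stand-in for the filter configuration, and the service names com.sun.star.text.WebDocument (Writer/Web) and com.sun.star.text.TextDocument (Writer) are assumptions about how the two document services are identified.

```shell
#!/bin/sh
# Toy model of the export-filter lookup; not the real implementation.
# Assumptions: HTML input becomes a Writer/Web document
# (com.sun.star.text.WebDocument), and the RTF and Word 97 filters are
# registered only for Writer (com.sun.star.text.TextDocument).

# Hypothetical registry: prints the filter registered for service $1 and
# target extension $2, or fails if there is none.
registered_filter() {
    case "$1:$2" in
        com.sun.star.text.TextDocument:rtf) echo 'Rich Text Format' ;;
        com.sun.star.text.TextDocument:doc) echo 'MS Word 97' ;;
        *) return 1 ;;
    esac
}

# Looks up a filter for the document's own service first; for Writer/Web
# documents it then falls back to the Writer service, as proposed above.
find_export_filter() {
    registered_filter "$1" "$2" && return 0
    if [ "$1" = com.sun.star.text.WebDocument ]; then
        registered_filter com.sun.star.text.TextDocument "$2"
        return
    fi
    return 1
}
```

In this model, find_export_filter com.sun.star.text.WebDocument rtf prints Rich Text Format only because of the fallback branch; a plain registered_filter lookup for Writer/Web fails, which corresponds to the behaviour explained in this comment.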
Comment 12 QA Administrators 2015-12-20 16:07:26 UTC Comment hidden (obsolete)
Comment 13 QA Administrators 2019-05-14 03:00:26 UTC Comment hidden (obsolete)
Comment 14 Dennis Roczek 2020-05-06 17:39:27 UTC
With
Version: 6.4.2.2 (x64)
Build ID: 4e471d8c02c9c90f512f7f9ead8875b57fcb1ec3
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: default; VCL: win; 
Locale: de-DE (de_DE); UI language: de-DE
Calc: CL

I get the following output on Windows:

c:\Program Files\LibreOffice\program>soffice.bin --headless --nofirststartwizard --invisible --convert-to rtf --outdir c:\temp\ "c:\temp\1.html"
Error: no export filter for c:\temp\1.rtf found, aborting.
Error: no export filter

Hence, somewhat fixed --> closing

(though not the nicest way of doing it)
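Since conversion now fails with an explicit error instead of silently writing HTML, a script can detect that message and retry. The sketch below retries once with --writer, the workaround mentioned in comment 11; it assumes that workaround still applies and that the error text appears on stdout or stderr as in the log above. SOFFICE and convert_or_retry are hypothetical names.

```shell
#!/bin/sh
# Sketch: treat the "no export filter" error as a failure and retry once
# with --writer (the workaround from comment 11).
# Assumptions: the message is printed on stdout or stderr as in the log
# above; SOFFICE is a hypothetical variable naming the soffice binary.

SOFFICE=${SOFFICE:-soffice.bin}

# Converts input file $2 to format $1, writing the result into directory $3.
convert_or_retry() {
    out=$("$SOFFICE" --headless --nofirststartwizard --invisible \
        --convert-to "$1" --outdir "$3" "$2" 2>&1)
    status=$?
    case "$out" in
        *"no export filter"*)
            # Per comment 11, no filter is found for Writer/Web;
            # ask for the input to be opened in Writer instead.
            echo "no export filter for '$1', retrying with --writer" >&2
            "$SOFFICE" --headless --nofirststartwizard --invisible --writer \
                --convert-to "$1" --outdir "$3" "$2"
            ;;
        *)
            # Pass the original output and exit status through unchanged.
            printf '%s\n' "$out"
            return "$status"
            ;;
    esac
}
```

For example: convert_or_retry rtf "c:\temp\1.html" c:\temp\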