Bug 30732 - Character formatting not retained in entries of TOC, table lists, etc.
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: unspecified (earliest affected)
Hardware: All All
Importance: high enhancement
Assignee: Tamas Bunth
URL:
Whiteboard: target:4.4.0 unitTestNotes:46
Keywords: difficultyInteresting, easyHack, skillCpp
Duplicates: 41111 75021 88046
Depends on:
Blocks:
 
Reported: 2010-10-09 07:47 UTC by RGB
Modified: 2023-10-11 14:13 UTC
CC List: 23 users

See Also:
Crash report or crash signature:


Attachments
See Comment 3 (11.34 KB, application/vnd.oasis.opendocument.text)
2010-12-14 21:58 UTC, Rainer Bielefeld Retired
Test file with fields that do not keep formating (10.11 KB, application/vnd.oasis.opendocument.text)
2016-02-17 17:06 UTC, RGB

Description RGB 2010-10-09 07:47:59 UTC
Originally reported here:
http://www.openoffice.org/issues/show_bug.cgi?id=27377
See original bug report for description.
Comment 1 Thorsten Behrens (allotropia) 2010-11-18 16:25:13 UTC
Cedric, time for a little eval - can this be cast into a (relatively) easy hack (telling from the macro in the OOo issue)?
Comment 2 Cédric Bosdonnat 2010-11-19 01:28:05 UTC
A nice starting point to hack on this would be
http://opengrok.go-oo.org/xref/writer/sw/source/ui/index/toxmgr.cxx#UpdateOrInsertTOX

I think this could be an almost easy hack.
Comment 3 Rainer Bielefeld Retired 2010-12-14 21:56:04 UTC
That's a really important feature (currently for technical sheets I see big problems with automatic TOC), but I doubt that it's an "EasyHack".

There are good reasons to define formatting in styles templates for the tables and indexes, and we should think for each style Item separately whether it should be taken in the TOC lines. This discussion should have to be finished before work on code can start, I believe a draft for a specification should be done in the WIKI
Comment 4 Rainer Bielefeld Retired 2010-12-14 21:58:18 UTC
Created attachment 41135
See Comment 3
Comment 5 RGB 2010-12-15 08:47:23 UTC
(In reply to comment #3)
> That's a really important feature (currently for technical sheets I see big
> problems with automatic TOC), but I doubt that it's an "EasyHack".
> 
> There are good reasons to define formatting in styles templates for the tables
> and indexes, and we should think for each style Item separately whether it
> should be taken in the TOC lines. This discussion should have to be finished
> before work on code can start, I believe a draft for a specification should be
> done in the WIKI

Agree. There was a macro for OOo 2.0 that modified the TOC to recreate the manual formatting, but that macro does not work on 3.x and, first of all, it was a nasty hack. A proper solution is much needed.
In fact, the ability to not only accept, but also to _ignore_ some formatting (line breaks to create two line headings comes to my mind) will be very important.
Comment 6 ryan.jendoubi@gmail.com 2011-04-05 07:47:03 UTC
(In reply to comment #3)
> That's a really important feature (currently for technical sheets I see big
> problems with automatic TOC), but I doubt that it's an "EasyHack".
> 
> There are good reasons to define formatting in styles templates for the tables
> and indexes, and we should think for each style Item separately whether it
> should be taken in the TOC lines. This discussion should have to be finished
> before work on code can start, I believe a draft for a specification should be
> done in the WIKI

I have added it to the /Development/Easy_hacks page, under "Slightly more interesting hacks" to reflect its difficulty.

I've put forward my thoughts re: individual style items there for discussion (if indeed I understood what you meant by style items; I'm not even sure what's meant by "table lists")
Comment 7 ryan.jendoubi@gmail.com 2011-04-11 02:49:08 UTC
http://opengrok.libreoffice.org/xref/writer/sw/source/core/doc/doctxm.cxx#1248

This seems to be where the text of the heading is pulled from the document and put into the TOX (for TOX_OUTLINELEVEL anyway):

  SwTxtNode* pTxtNd = rOutlNds[ n ]->GetTxtNode();
<snip>
  SwTOXPara * pNew = new SwTOXPara( *pTxtNd, nsSwTOXElement::TOX_OUTLINELEVEL );
  InsertSorted( pNew );

I'm still taking baby steps into the code but is the problem that SwTxtNodes lost their formatting? Solution to get a SwCntntNode instead and perform suitable style cleanup manually? Is this anywhere near the right track?
Comment 8 Cédric Bosdonnat 2011-04-14 01:23:27 UTC
(In reply to comment #7)
> http://opengrok.libreoffice.org/xref/writer/sw/source/core/doc/doctxm.cxx#1248
> 
> This seems to be where the text of the heading is pulled from the document and
> put into the TOX (for TOX_OUTLINELEVEL anyway):
> 
>   SwTxtNode* pTxtNd = rOutlNds[ n ]->GetTxtNode();
> <snip>
>   SwTOXPara * pNew = new SwTOXPara( *pTxtNd, nsSwTOXElement::TOX_OUTLINELEVEL );
>   InsertSorted( pNew );
> 
> I'm still taking baby steps into the code but is the problem that SwTxtNodes
> lost their formatting? Solution to get a SwCntntNode instead and perform
> suitable style cleanup manually? Is this anywhere near the right track?

You got to the TOX code which is great... but I don't think it's that the nodes lost their formatting. You may want to have a look at this method as it generates the whole TOX entry:

http://opengrok.libreoffice.org/xref/writer/sw/source/core/doc/doctxm.cxx#1599

I had a quick look, but it sounds like the text is simply copied from the source text node to the TOC node... without the formatting attributes.
Comment 9 Bob Harvey 2011-04-26 16:58:00 UTC
I'm not entirely convinced by this.  Would you really want text from a <heading 1> in the TOC in 19 point bold?

To make this universally useful, each selectable style would need to carry two formats - one for use inline, and one for cross-references (e.g. toc, index, cross-references).  The second could be "as in-line but font=12,nobold" sort of thing, or a complete definition.
Comment 10 RGB 2011-04-27 00:06:40 UTC
(In reply to comment #9)
> I'm not entirely convinced by this.  Would you really want text from a <heading
> 1> in the TOC in 19 point bold?
> 
> To make this universally useful, each selectable style would need to carry two
> formats - one for use inline, and one for cross-references (e.g. toc, index,
> cross-references).  The second could be "as in-line but font=12,nobold" sort of
> thing, or a complete definition.

No, I just want to preserve sub and super scripts for example, and possibly italics and bold applied on top of the text, i.e. everything that is NOT part of the heading paragraph style.
Comment 11 Björn Michaelsen 2011-07-01 05:54:26 UTC
Comments from the EasyHack page:
Character formatting not retained in entries of TOC, table lists, etc.

Background: All available in the following bug reports:

LibreOffice bug 30732

OpenOffice.org bug 27377

This issue has been around since 2004. It was suggested that discussion should move here to hash out requirements. Please do so here, with commentary.

    Paragraph Styles should not be respected - obvious, really.
    Character Styles should be respected
    Manual formatting should be respected 
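
The three requirements above can be read as a filter on where a piece of formatting comes from. What follows is a minimal sketch only, assuming a simplified model in which a heading carries a list of formatting spans. `FormatSource`, `Span` and `keepForToc` are hypothetical names made up for illustration; they are not LibreOffice classes and say nothing about how Writer stores attributes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: where a piece of formatting on a heading came from.
enum class FormatSource { ParagraphStyle, CharacterStyle, Direct };

// Hypothetical span: formatting applied to [start, end) of the heading text.
struct Span {
    std::size_t start;
    std::size_t end;
    FormatSource source;
    std::string attribute;  // e.g. "italic", "subscript"
};

// Keep character-style and direct (manual) formatting; drop anything that
// comes from the heading's paragraph style, as the list above proposes.
std::vector<Span> keepForToc(const std::vector<Span>& spans) {
    std::vector<Span> kept;
    for (const Span& s : spans) {
        if (s.source != FormatSource::ParagraphStyle) {
            kept.push_back(s);
        }
    }
    return kept;
}
```

With this model, a heading whose paragraph style is bold and which has a manually subscripted character would yield a TOC entry that keeps only the subscript span.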

This attachment to the OpenOffice.org bug shows that the problem exists in contexts other than indexes / tables of contents. Therefore maybe the link given in this comment on the LO bug isn't the root of the problem after all?

What other complications are there?

Skills: C++
Comment 12 Björn Michaelsen 2011-12-23 11:33:00 UTC Comment hidden (obsolete)
Comment 13 Björn Michaelsen 2011-12-23 12:56:55 UTC Comment hidden (obsolete)
Comment 14 Stephan van den Akker 2012-03-08 08:07:40 UTC
Lots of votes for this bug over at Apache, and for good reason. At the office I get complaints from co-workers about this as well.

In our case we mainly use subscripted symbols in paragraph headers. These subscripts show up unformatted in the TOC.

So, from our point of view, it would be fine to just retain any character formatting in the TOC.
Comment 15 Owen Genat (retired) 2012-04-03 17:26:04 UTC
(In reply to comment #5)
> In fact, the ability to not only accept, but also to _ignore_ some
> formatting (line breaks to create two line headings comes to my mind)
> will be very important.

I am not sure I agree with this; however, it does highlight a difficult aspect of the issue. Any multi-line heading containing Unicode Line Breaking Algorithm properties, such as:

Mandatory Break (BK) http://unicode.org/reports/tr14/#BK
Non-breaking (GL) http://unicode.org/reports/tr14/#GL
Word Joiner (WJ) http://unicode.org/reports/tr14/#WJ

should have those properties respected (mirrored) in the TOC. The difficulty, as I understand it, is that the TOC is simply a form of x-ref and some of these properties (e.g., BK) would not be appropriate in an in-text x-ref.

(In reply to comment #11)
> Manual formatting should be respected

This is essentially the main problem I have tried to illustrate. The aspect of respecting character styles (superscript, italic, etc.) would seem less an issue.
Comment 16 Florian Reisinger 2012-05-18 09:37:32 UTC
Deleted "Easyhack" from summary.
Comment 17 Alex 2012-10-08 12:32:50 UTC
Version 3.6.2.2
See style "Contents 2" -> Modify -> Tabs -> Type Right.
Update Index/Table
Paragraph -> Tabs ->Type Left!
Must be "Right".
Apply a style "Contents 2", Paragraph -> Tabs ->Type Right!
Update Index/Table, alignment again "Left"...
Comment 18 Marcio H Zuchini 2013-06-28 20:23:16 UTC Comment hidden (me-too)
Comment 19 Björn Michaelsen 2013-10-04 18:46:00 UTC Comment hidden (obsolete)
Comment 20 Bob Harvey 2013-11-19 05:55:41 UTC
(In reply to comment #18)

> I think the point is just to keep superscript, subscript, underline, italic
> and bold character formatting. Or am I missing the point?


I think that's nearly true, but I would generalise to keep /all/ immediate formatting applied over and above the original text's intrinsic style.  So perhaps the casting of a single character into a different font for some reason might be included too, or perhaps the modification of text colour for some bizarre reason.
Comment 21 Cédric Bosdonnat 2014-01-20 08:56:59 UTC
Restricted my LibreOffice hacking area
Comment 22 Regina Henschel 2014-02-15 17:28:11 UTC
*** Bug 75021 has been marked as a duplicate of this bug. ***
Comment 23 Tobias Lippert 2014-05-07 22:37:00 UTC
I am not 100% clear on which formats should be retained. I am currently retaining super and sub-script, because this is the most important thing for anyone working in sciences and a major problem for anyone in this field trying to write a text with libre office.

If more complex logic is required, I would suggest to track this in a separate bug.
Comment 24 Daniel Trebbien 2014-05-11 14:43:14 UTC
(In reply to comment #23)
> I am not 100% clear on which formats should be retained. I am currently
> retaining super and sub-script, because this is the most important thing for
> anyone working in sciences and a major problem for anyone in this field
> trying to write a text with libre office.
> 
> If more complex logic is required, I would suggest to track this in a
> separate bug.

Please also retain italics. This is useful when preparing a TOC of legal precedents cited by a brief, where names of cases should be italicized.
Comment 25 Tobias Lippert 2014-05-17 10:25:43 UTC
*** Bug 41111 has been marked as a duplicate of this bug. ***
Comment 26 E. Trece 2014-05-20 13:50:10 UTC
(In reply to comment #23)
> I am not 100% clear on which formats should be retained. I am currently
> retaining super and sub-script, because this is the most important thing for
> anyone working in sciences and a major problem for anyone in this field
> trying to write a text with libre office.
> 
> If more complex logic is required, I would suggest to track this in a
> separate bug.

Yes, as Daniel suggests, italics are important too if your headings contain words in foreign languages and Latin phrases (e.g. "Indonesia's policy of /Konfrontasi/"), document titles (e.g. "Sports in /Pravda/ (1960-1965)"), or you just need some kind of emphasis (e.g. "On why we /need/ italics"). And in lists of tables, illustrations, etc., this is surely much more common. I'm sure that must be the case also in the sciences.
Comment 27 Commit Notification 2014-06-06 08:00:30 UTC
Tobias Lippert committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9088a4c2d18f59c22fceb81829441b704603415d

fdo#30732 Retain selected character attributes for table of contents



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 28 Stephan van den Akker 2014-07-12 07:46:38 UTC
Confirmed fixed for subscript, superscript and italics. 

"selected character attributes" means no bold, underline, font changes, font colours and Unicode Line Breaks, right? 

This will probably not satisfy Owen Genat (see comment 15), but it will do for my use cases (engineering / science). 

Nice work, Tobias! I vote for VERIFIED FIXED.

Tested on
OpenSuSE 12.3 (64-bit)
LOdev version: 4.4.0.0.alpha0+
Build ID: 8b499cea76577b4221fccb17703aa9e86b625e90
Comment 29 Owen Genat (retired) 2014-07-13 10:43:46 UTC
(In reply to comment #28)
> Confirmed fixed for subscript, superscript and italics. 

Confirmed for these forms of direct formatting for Table of Contents entries only. These forms are not yet supported:

- Character styles using these characteristics (e.g., the pre-defined Emphasis style or custom styles using super/subscript) in ToC entries. 
- Illustration Index or Table Index entries (using direct formatting or a character style). 
- Footnote anchors (small superscripted identifier) in Table of Contents, Illustration Index, or Table Index entries.
- Cross-references to any Heading, Caption, Bookmark, or Reference mark that include these forms of direct formatting or character style.
 
> "selected character attributes" means no bold, underline, font changes, font
> colours and Unicode Line Breaks, right? 

This may be a reference to not only a limited sub-set of characteristics, but also those applied only via direct formatting.

> This will probably not satisfy Owen Genat (see comment 15)

:^) That comment was mainly to point out the intricacies involved (as I saw them), particularly in relation to line breaking situations (which are likely out of scope). Italic+superscript+subscript is a good start, but I do feel it would be good if the points listed above as unsupported were included. Both the Apache issue and this bug cite other forms of index. Character styles / cross-references would seem an unfortunate omission. We do now however have a workaround.
 
> Nice work, Tobias! I vote for VERIFIED FIXED.

Well done from me also, although I am more hesitant on calling it fixed. Tested under Crunchbang 11 x86_64 running v4.4.0.0.alpha0+ Build ID: 3fdd4f069d5436cf39708004af7fda8175fbc4c2
Comment 30 Tobias Lippert 2014-07-13 15:17:51 UTC
@Stephan van den Akker

> no bold, underline, font changes, font colours
correct

> and Unicode Line Breaks, right? 

I have not changed the logic for handling whitespaces. The source code has a check for '\n' in it. (ToxWhitespaceStripper.cxx:25)
I have verified that linebreaks with shift+enter do not appear in the Table of Contents.

If there is a method which operates on sal_Unicode and detects whitespace, it should be used here instead, but I could not find one.
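
The line-break handling described above can be illustrated with a small sketch. This is not the code in ToxWhitespaceStripper.cxx; it is a hypothetical helper written under two assumptions: that a Shift+Enter line break appears in the heading text as '\n', and that only '\n' and ' ' need to be treated as whitespace. `StrippedText` and `stripLineBreaks` are invented names. The position map is included because retained attributes (for example a subscript span) would otherwise point at the wrong characters once the text gets shorter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the cleaned text plus a map from each original
// character index to its index in the cleaned text.
struct StrippedText {
    std::u16string text;
    std::vector<std::size_t> newPosition;
};

// Replace '\n' with a space and collapse runs of spaces into one.
// Assumption: no other Unicode whitespace or line-break class is handled.
StrippedText stripLineBreaks(const std::u16string& in) {
    StrippedText out;
    out.newPosition.reserve(in.size());
    bool lastWasSpace = false;
    for (char16_t c : in) {
        if (c == u'\n') {
            c = u' ';
        }
        if (c == u' ' && lastWasSpace) {
            // Collapsed character: map it onto the space already emitted.
            out.newPosition.push_back(out.text.size() - 1);
            continue;
        }
        out.newPosition.push_back(out.text.size());
        out.text.push_back(c);
        lastWasSpace = (c == u' ');
    }
    return out;
}
```

A formatting span over original indices [a, b) could then be moved to [newPosition[a], newPosition[b - 1] + 1) in the TOC entry.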
Comment 31 Harry Chapman 2014-10-10 02:27:13 UTC
I'm not sure if the fix has already included this, but I would really love if the TOC could transfer *highlighting* from headings to the index.
Comment 32 Tobias Lippert 2014-10-10 15:36:24 UTC
Hello Harry,

I have explicitly included only a few selected character formats. There might be users who would rather not have the highlights in the table of contents.

Unfortunately, I do not know who decides which formats should be applied in the table. If you find out, and get a positive feedback, I can help to add the functionality. :-)

Tobias
Comment 33 Harry Chapman 2014-10-15 03:14:30 UTC
(In reply to Tobias Lippert from comment #32)
> Hello Harry,
> 
> I have explicitly included only a few selected character formats. There
> might be users who would rather not have the highlights in the table of
> contents.
> 
> Unfortunately, I do not know who decides which formats should be applied in
> the table. If you find out, and get a positive feedback, I can help to add
> the functionality. :-)
> 
> Tobias

Hi Tobias,
Thanks for your response! Yeah I understand that transferring the highlighting wouldn't be ideal for everyone - I thought that as it's the default behaviour on MS Word it wouldn't be too controversial, but it's definitely possible that people might get up in arms about it. I'm not really involved in the LO community so wouldn't know how to gauge opinions on this...

I know it would probably be a lot more work, but I guess the ideal situation would be for each sort of character formatting to be able to be turned on and off in the settings of the TOC index. Or, alternatively, you could have a check box to turn on and off the transfer of *all* formatting. Just a thought ;)
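
The idea of switching individual kinds of formatting on and off could be modelled as a set of flags. The sketch below is purely illustrative: the `TocAttr` flags, `kDefaultRetained` and `retained` are hypothetical names, and no such option exists in the TOC dialog according to this thread. The default set is chosen to match what comments 28 and 30 report as retained (superscript, subscript, italics).

```cpp
#include <cstdint>

// Hypothetical flags, one per kind of character formatting.
enum TocAttr : std::uint32_t {
    TocSuperscript = 1u << 0,
    TocSubscript   = 1u << 1,
    TocItalic      = 1u << 2,
    TocBold        = 1u << 3,
    TocUnderline   = 1u << 4,
    TocHighlight   = 1u << 5
};

// Assumed default: the attributes reported as retained in this thread.
constexpr std::uint32_t kDefaultRetained =
    TocSuperscript | TocSubscript | TocItalic;

// Returns the subset of a heading's attributes that the TOC entry keeps,
// given the set of attributes the user has enabled.
constexpr std::uint32_t retained(std::uint32_t headingAttrs,
                                 std::uint32_t enabled = kDefaultRetained) {
    return headingAttrs & enabled;
}
```

A user who wants highlighting carried over would pass `kDefaultRetained | TocHighlight` as the enabled set; a single "transfer all formatting" check box corresponds to enabling every flag.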
Comment 34 Adolfo Jayme Barrientos 2015-01-05 17:26:03 UTC
*** Bug 88046 has been marked as a duplicate of this bug. ***
Comment 35 Robinson Tryon (qubit) 2015-12-14 06:59:15 UTC Comment hidden (obsolete)
Comment 36 jani 2016-02-17 07:22:35 UTC
A polite ping, are you still working on this ?
Comment 37 Frederic Parrenin 2016-02-17 08:49:23 UTC
Superscripts and subscripts now work in recent versions.
Comment 38 RGB 2016-02-17 15:56:28 UTC
(In reply to Frederic Parrenin from comment #37)
> Superscripts and subscripts now work in recent versions.

Just in TOCs, not in fields. TOC now works wonderfully (thanks!!!), but if you insert a cross reference or use a chapter field for your headers or footers, the problem persists (tested on 5.1). 

Do I need to file a new issue for fields or can we continue to use this one?
Comment 39 Tobias Lippert 2016-02-17 16:10:55 UTC
Hello - I thought this was only about the TOC, and that this issue was fixed.
@RGB Can you provide a simple test file? (Just to see if we mean the same thing.) I will check if my fix can easily be ported to other fields.
Comment 40 RGB 2016-02-17 17:06:58 UTC
Created attachment 122738
Test file with fields that do not keep formating

(In reply to Tobias Lippert from comment #39)
> Hello - I thought this was only about the TOC, and that this issue was fixed.
> @RGB Can you provide a simple test file? (Just to see if we mean the same
> thing.) I will check if my fix can easily be ported to other fields.

Sure, here it is. It contains a TOC, a formatted heading and two fields: one a cross reference to the heading and the other (in the page footer) a Chapter field.
Comment 41 jani 2016-04-12 13:40:40 UTC
Tobias@ are you still working on this bug (otherwise please unassign it) ?
Comment 42 Tobias Lippert 2016-04-12 13:46:38 UTC
Unassigned.
Comment 43 raal 2016-05-05 06:41:34 UTC
(In reply to Tobias Lippert from comment #42)
> Unassigned.

Setting status to NEW, unassigning Tobias.
Comment 44 jani 2016-06-01 08:04:07 UTC
Closing this bug, since it works in TOC. It seems to be open for fields, but there is no code pointer and no mentor.
Comment 45 Francisco Pina Martins 2017-10-02 14:53:15 UTC
I just wanted to add that Small Caps formatting in a heading does not carry to the TOC.
Should I open a new issue for this, or should this one get reopened?
Comment 46 Buovjaga 2023-10-08 15:37:16 UTC
Notes for unit test writers:

Tests for subscript and superscript were added in 1ca2a2119ad3e910f848344d51ba9ec173880715, so the only remaining thing to test is italics.

Revert has to be done manually.