Bug 140731 - EDITING Capitalize Every Word operation slow with large file and enabled change tracking
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 7.1.0.3 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.2.0 target:7.4.0
Keywords: perf
Depends on:
Blocks: Track-Changes
Reported: 2021-03-01 10:16 UTC by NISZ LibreOffice Team
Modified: 2023-01-23 14:31 UTC (History)

Attachments
Example file from Writer (30.21 KB, application/vnd.oasis.opendocument.text)
2021-03-01 10:16 UTC, NISZ LibreOffice Team

Description NISZ LibreOffice Team 2021-03-01 10:16:36 UTC
Created attachment 170145
Example file from Writer

Attached file contains about 12 thousand words of Lorem ipsum on 21 pages.
Change tracking is enabled.
When selecting Format – Text – Capitalize Every Word the operation takes about 15 seconds on my machine, and after that rejecting all changes takes about 45 seconds.
Without change tracking enabled it is fast: about 1 second.

Steps to reproduce:
    1. Open attached file
    2. Ctrl-A
    3. Format – Text – Capitalize Every Word
    4. Edit – Track Changes – Reject All

Actual results:
(in step 3) Creating ~12 thousand tracked changes is rather slow.
With only 2 pages and 1000 words it took only ~1 second, and with 4 pages it started to be noticeably janky.

(in step 4) Rejecting all those changes is about three times slower.

Expected results:
Making redline creation/rejection fast even in such an extreme (or even accidental) use case.
For the short term, not making redlines at all for this operation (above some fixed number of words) might work as well.

LibreOffice details:
Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: e60bebd4c5257b0f592d27c74399de1498ac725b
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: en-US (hu_HU); UI: en-GB
Calc: CL

Additional Information: 

Bibisected using bibisect-win64-7.1 to:
URL: https://cgit.freedesktop.org/libreoffice/core/commit/?id=2d3c77e9b10f20091ef338e262ba7756eb280ce9

Author: László Németh <nemeth@numbertext.org>
Date:   Thu Nov 5 10:17:03 2020 +0100

    tdf#109266 sw change tracking: track transliteration
    
    Format->Text->UPPERCASE, tOGGLE cASE etc. weren't
    supported by change tracking.
Before this the operation was not change tracked.
Comment 1 Commit Notification 2021-03-24 14:08:41 UTC
Balazs Santha committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e463d239555d3a4dc61797eeb8c638b6442112a3

tdf#140731: sw transliteration: avoid too many redlines

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 2 László Németh 2021-03-24 14:09:30 UTC
Commit description:

tdf#140731: sw transliteration: avoid too many redlines

As a workaround for the performance regression
from commit 2d3c77e9b10f20091ef338e262ba7756eb280ce9
(tdf#109266 sw change tracking: track transliteration),
switch off redlining to avoid ~freezing, if a single
transliteration could result in too many (>~500) redlines.

A single transliteration creates n redlines
for n paragraphs of the selected text, except in
the case of transliterating to title case, where it
creates n redlines for n words. It's very easy
to freeze Writer, because Writer's slowing down with
n redlines is described by an O(n²) (quadratic) time
complexity. E.g. in an experiment, title casing
~660 words took 6 sec, but ~3000 words took 85 sec,
corresponding to creating 660 vs 3000 redlines.
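
As a rough check on the quadratic claim, the two timings quoted above can be fitted to a power law t ∝ n^k. The sketch below is only a two-point estimate; it assumes the timings are representative and that fixed overhead is negligible. The variable names are illustrative and do not come from the commit.

```python
import math

# Timings quoted in the commit message: (redlines created, seconds).
small_n, small_t = 660, 6.0
large_n, large_t = 3000, 85.0

# If t grows like n**k, then k = log(t2 / t1) / log(n2 / n1).
k = math.log(large_t / small_t) / math.log(large_n / small_n)

# What a strictly quadratic model would predict for the larger run.
quadratic_prediction = small_t * (large_n / small_n) ** 2

print(f"estimated exponent k = {k:.2f}")
print(f"pure O(n^2) prediction for {large_n} redlines: {quadratic_prediction:.0f} s")
```

With these two data points the fitted exponent comes out a little below 2, and a strictly quadratic model would overshoot the measured 85 sec somewhat. That is still clearly superlinear growth, which is consistent with the freezing described.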

Note: this is a partial revert of commit 2d3c77e9b10f20091ef338e262ba7756eb280ce9, if the
selection contains more than 500 paragraphs (or, in the
case of transliterating to title case, ~500 words).
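
The workaround described above amounts to a threshold check before the transliteration runs. The following is a minimal sketch of that decision in Python, not the actual Writer code: the function names and the constant name are hypothetical, paragraphs are approximated as non-empty lines, and words as runs of word characters.

```python
import re

# Approximate threshold from the commit message ("more than 500").
# The constant name is hypothetical.
MAX_TRACKED_REDLINES = 500


def expected_redlines(selection: str, title_case: bool) -> int:
    """Estimate how many redlines one transliteration would create.

    Per the commit message: one per paragraph, or one per word for title case.
    """
    if title_case:
        return len(re.findall(r"\w+", selection))
    return sum(1 for line in selection.splitlines() if line.strip())


def keep_change_tracking(selection: str, title_case: bool) -> bool:
    """Return False when tracking should be switched off for this operation."""
    return expected_redlines(selection, title_case) <= MAX_TRACKED_REDLINES


# 200 paragraphs of 5 words each, i.e. 1000 words in total.
text = "lorem ipsum dolor sit amet\n" * 200
print(keep_change_tracking(text, title_case=False))  # True: 200 paragraphs
print(keep_change_tracking(text, title_case=True))   # False: 1000 words
```

The same selection stays tracked for UPPERCASE-style operations but loses tracking for title case, which is the asymmetry questioned later in this thread.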
Comment 3 Commit Notification 2021-05-10 08:56:19 UTC
Balazs Santha committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e9f9e2315ba5a4f10ac0d3a6a6a6cca711d49b6f

sw: test fix of tdf#140731 (freezing with track changes)

It will be available in 7.2.0.

Comment 4 Commit Notification 2022-01-21 07:49:31 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b438a4502d2b388012b0744374e86d1ff0543e8d

tdf#140731: sw: move UItest to CppUnittest

It will be available in 7.4.0.

Comment 5 Mike Kaganski 2022-12-01 12:27:00 UTC
I would think that modifying the title case function to prepare a full selection/per-paragraph text, and replacing as a whole, to produce results similar to what UPPERCASE gives, would make the hack unneeded here? Because disabling redlining when the user needs it is ... wrong?
Comment 6 László Németh 2022-12-01 13:24:38 UTC
(In reply to Mike Kaganski from comment #5)
> I would think that modifying the title case function to prepare a full
> selection/per-paragraph text, and replacing as a whole, to produce results
> similar to what UPPERCASE gives, would make the hack unneeded here? Because
> disabling redlining when the user needs it is ... wrong?

Hi, Mike! Great idea! There is a trade-off among the different implementations (keeping the old tracked changes, portion formatting, etc.), so it's not an easy task, and the result could easily be worse. Using the recent word-level track changes it's easy to fix the bad title casing of the title case algorithm by reverting the bad capitalization of some words with a few clicks (see the fix for NatNum12: https://wiki.documentfoundation.org/ReleaseNotes/7.5#Default_.E2.80.9Cspell_out.E2.80.9D_number_and_currency_formats).
So I suggest combining or extending the current solution, i.e. applying your fix only when too many paragraphs are being formatted (if that is possible without losing other formatting or data), which is the case where redlining is currently disabled.
Comment 7 Mike Kaganski 2022-12-01 13:57:00 UTC
(In reply to László Németh from comment #6)
> Using the recent word-level track changes it's easy to fix the bad title casing
> of the title case algorithm by reverting the bad capitalization of some words
> with a few clicks (see the fix for NatNum12

Heh. I didn't realize that this was a regression - or did I misunderstand? I don't quite see which fix you refer to specifically - it would be better if you pointed to a commit, rather than to release notes that point to a bug with multiple commit notifications :)

But regarding "it's easy to fix the bad title casing ... with a few clicks" - that is so wrong!
See bug 152340. It easily overflows the undo stack, so in addition to not being able to "easily fix", one loses the ability to undo something they did a few operations ago.

No, please don't. It is plain wrong. When one wants to change something, they just type over, and it all still is one big change, that simply includes some unchanged words. Just as UPPERCASE does. See how it includes parts that are already uppercase, even when they are at the very start/end of the selection.
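
To illustrate the difference between the two approaches discussed here, the sketch below contrasts per-word change records with one change per paragraph. It is a simplified model and not Writer's implementation: the `Change` tuple, the function names, and the "uppercase the first character" rule are all assumptions made for the example, as is skipping words that are already capitalized in the per-word variant.

```python
import re
from typing import List, Tuple

Change = Tuple[int, int, str]  # (start offset, end offset, replacement text)


def capitalize_word(word: str) -> str:
    # Simplified stand-in for "Capitalize Every Word": uppercase the first character.
    return word[:1].upper() + word[1:]


def per_word_changes(paragraph: str) -> List[Change]:
    """One change record per word whose text actually changes."""
    changes = []
    for m in re.finditer(r"\w+", paragraph):
        new = capitalize_word(m.group())
        if new != m.group():
            changes.append((m.start(), m.end(), new))
    return changes


def whole_paragraph_change(paragraph: str) -> List[Change]:
    """A single change record spanning the paragraph, unchanged words included."""
    new = re.sub(r"\w+", lambda m: capitalize_word(m.group()), paragraph)
    return [(0, len(paragraph), new)] if new != paragraph else []


para = "lorem ipsum Dolor sit amet"
print(len(per_word_changes(para)))   # 4 records; "Dolor" is already capitalized
print(whole_paragraph_change(para))  # one record covering the whole paragraph
```

In this model the number of records grows with the word count in the first variant and with the paragraph count in the second, at the cost of the single record also covering words that did not change.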