Bug 68924 - FORMATTING: Words split & lines end too soon - problem with 'hard spaces' in scanned text
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.1.1.2 release (earliest affected)
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-04 09:44 UTC by John Rose
Modified: 2013-09-05 08:41 UTC

See Also:
Crash report or crash signature:


Attachments
An .odt file just containing this paragraph. (10.54 KB, application/vnd.oasis.opendocument.text)
2013-09-04 09:44 UTC, John Rose
screen print showing the spaces (129.79 KB, image/png)
2013-09-04 12:10 UTC, Cor Nouws
Example Screenshot of Displayed Document (32.29 KB, image/png)
2013-09-04 12:24 UTC, John Rose

Description John Rose 2013-09-04 09:44:52 UTC
Created attachment 85179 [details]
An .odt file just containing this paragraph.

Problem description:
Words split & lines end too soon after deleting unwanted paragraph breaks at the end of every line. 


Steps to reproduce:
I have created a document by scanning pages from various printed documents. As a result, there are paragraph breaks where I do not want them (i.e. at the end of every line). When I delete them, words split & lines end too soon. 

PS I noticed that the style for the scanned text was Pre-formatted. So I changed it to Default using Edit>Replace. However, it made no difference to the word splitting. I've also tried using AutoText to correct it.

Current behavior:
In a paragraph, words split & lines end too soon. In the example below, the word 'hand' is split over 2 lines & 'of' is at the end of a line when there is room for 'AKQJ10' (the first 2 lines displayed below are actually 1 line in Writer):
With a 5-card major suit in a 5-3-3-2 hand, open one of the major suit. Exceptionally, an otherwise suitable h
and with a weak 5-card major (not rebiddable: max 2 out of
AKQJ10) may be opened 1NT. All other balanced hands in the 12-14 HCP range are opened 1NT.

Expected behavior:
With a 5-card major suit in a 5-3-3-2 hand, open one of the major suit. Exceptionally, an otherwise suitable hand with a weak 5-card major (not rebiddable: max 2 out of AKQJ10) may be opened 1NT. All other balanced hands in the 12-14 HCP range are opened 1NT.

      
Operating System: Ubuntu
Version: 4.1.1.2 rc
Comment 1 John Rose 2013-09-04 09:49:13 UTC
I found this bug when using LibreOffice 3.5 (under Ubuntu Precise 64 bit). I've since upgraded to version 4.1.1.2 (using Ubuntu's Launchpad LibreOffice ppa) & the bug is still present. I was not able to specify 3.5 when submitting the bug. So I specified version 4.1.1.2 for both first encountered and current version of LibreOffice.
Comment 2 Cor Nouws 2013-09-04 10:51:08 UTC
Hi John,

thanks for the report & the example file.
The problem that you see is due to 'hard spaces' (Insert > Formatting mark ...) that are in place between many words, e.g. between 'AKQJ10)' and 'may' and 'be' etc.
They are shown in grey with my settings.

So I think the problem comes from the scanning software that does this?

Hope this helps & regards,
Cor

NB When reporting a bug, it's the habit to enter the oldest known version with the bug in the version field. Just for future reference of course.
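
To illustrate the diagnosis in this comment outside Writer, here is a minimal Python sketch. It uses the standard `textwrap` module, which, like a typical line breaker, treats only ordinary whitespace as a break opportunity. This is not Writer's layout engine, so it only shows the general principle; the sample words, the width of 12 and the helper name `wrap_lines` are assumptions chosen for illustration.

```python
import textwrap

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, the 'hard space' discussed above


def wrap_lines(text, width):
    """Wrap text with textwrap, which breaks only at ASCII whitespace."""
    return textwrap.wrap(text, width=width)


# Hypothetical sample: the same words joined by hard spaces and by normal spaces.
words = ["AKQJ10)", "may", "be", "opened", "1NT."]
glued = NBSP.join(words)
normal = " ".join(words)

# With normal spaces, every line ends at a word boundary.
print(wrap_lines(normal, 12))

# With hard spaces, the whole run is one unbreakable chunk. Once it is longer
# than the line, the wrapper has to cut it at an arbitrary character.
print(wrap_lines(glued, 12))
```

A run of words glued together by hard spaces behaves like one very long word, which is consistent with 'hand' being cut after the 'h' in the report.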
Comment 3 John Rose 2013-09-04 11:03:00 UTC
Cor,

You state that the problem is due to 'hard spaces'. I have put hard spaces (i.e. non-breaking space) using Insert > Formatting mark into the document that I previously attached. When I do so a dot (centralised vertically i.e. not the same as a period) is NOT shown (by View > Nonprinting characters set ticked) at that position. Please try this yourself.

I therefore suggest that this is a bug, admittedly caused by having a paragraph break at the end of every line & then removing them by Edit > Replace etc.

Regards,
John
Comment 4 John Rose 2013-09-04 11:04:22 UTC
Sorry, I forgot to mention that there are no 'hard coded' spaces (i.e. nonprinting spaces) in the previously attached document.
Comment 5 Cor Nouws 2013-09-04 12:10:31 UTC
Created attachment 85193 [details]
screen print showing the spaces

Hi John,

the attachment you made is full of hard spaces. See the screen print pls.
Comment 6 John Rose 2013-09-04 12:24:32 UTC
Created attachment 85195 [details]
Example Screenshot of Displayed Document

Part of Writer Window
Comment 7 Cor Nouws 2013-09-04 14:49:03 UTC
(In reply to comment #6)

There is a setting in Tools > Options > Writer > Formatting aid ...
Comment 8 John Rose 2013-09-04 17:19:44 UTC
I have set that for non-breaking spaces now, though it is partially equivalent to 'Nonprinting Characters' & 'Field Shadings' in View etc. It makes no difference to the display except to display shading i.e. there are no spaces at the end of lines as per your previous attachment.
Comment 9 Urmas 2013-09-05 04:43:11 UTC
Do you have Arial font installed?
Comment 10 John Rose 2013-09-05 06:07:30 UTC
Urmas,

I do have Arial font installed. I use it as my default font i.e. as the font in my default template. This document uses it throughout.
Comment 11 Cor Nouws 2013-09-05 07:24:36 UTC
(In reply to comment #8)
> I have set that for non-breaking spaces now, though it is partially
> equivalent to 'Nonprinting Characters' & 'Field Shadings' in View etc. It
> makes no difference to the display except to display shading i.e. there are
> no spaces at the end of lines as per your previous attachment.

It helps when you add a normal space after e.g. " AKQJ10) "
Comment 12 John Rose 2013-09-05 07:46:12 UTC
In the offending paragraph, I've replaced the non-breaking spaces by standard spaces. The paragraph then did wrapping correctly. I need to do this throughout a document. But I didn't see a way, especially as I didn't see a way to identify a non-breaking space by means of a regular expression. I tried using a regular space in the Find & Replace, but that only found regular spaces. Is there a way of doing this replacement through a document, without laboriously doing it space character by character?

PS I still think there is a bug in the way that non-breaking space characters cause arbitrary line feeds. If they worked correctly, then offending paragraphs would only be on one line rather than multiple lines.
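
For text that is available outside Writer (for example, copied into a plain-text file), the bulk replacement asked about in this comment can be sketched in Python. This is only an illustration of the idea, not a Writer feature; the helper names `count_nbsp` and `replace_nbsp` and the sample string are hypothetical.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def count_nbsp(text):
    """Return how many non-breaking spaces the text contains."""
    return text.count(NBSP)


def replace_nbsp(text, replacement=" "):
    """Return a copy of text with every non-breaking space replaced."""
    return text.replace(NBSP, replacement)


# Hypothetical sample mixing regular and non-breaking spaces.
sample = "max 2 out" + NBSP + "of AKQJ10)" + NBSP + "may be opened 1NT."

print(count_nbsp(sample))                # non-breaking spaces before cleaning
cleaned = replace_nbsp(sample)
print(count_nbsp(cleaned))               # none should remain
print(cleaned)
```

Regular spaces are left untouched, so only the characters that block line breaking are changed.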
Comment 13 Cor Nouws 2013-09-05 08:22:30 UTC
(In reply to comment #12)
> In the offending paragraph, I've replaced the non-breaking spaces by
> standard spaces. The paragraph then did wrapping correctly. 

Sorry that I did not write that immediately. I thought that would be obvious.

> Is there a way of doing this replacement through a document,
> without laboriously doing it space character by character?

No idea offhand. Pls try the regular support listed here ;)
  www.libreoffice.org/get-help/
Comment 14 John Rose 2013-09-05 08:41:21 UTC
Cor,

I've just had to key in this comment again as my previous attempt seemed to collide with yours!

It was too obvious how to find (& replace) non-breaking spaces globally in a document. I simply copied a non-breaking space & pasted it into the 'Search for' text box (in the Find & replace popup). Another solution (of using \x00a0) did not work in 4.1.1.2, though it works in 3.5.3.2 (see http://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=23474). 

I think that there should be an enhancement (or correction of an omission, depending on your point of view) in Writer to allow searching for a non-breaking space (in the 'Search for' text box in the 'Find & Replace' popup) using a Regular Expression: use of \s seems obvious. This expression could also be used to do replacement of something else by a non-breaking space (in the 'Replace with' text box in the 'Find & Replace' popup).

As I previously said, I still think there is a bug in the way that non-breaking space characters cause arbitrary line feeds. If they worked correctly, then offending paragraphs would only be on one line rather than multiple lines.
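
As a point of comparison for the regular-expression idea above, this is how the same search looks in Python's `re` module. It says nothing about how Writer's own Find & Replace engine interprets these patterns in any version; the helper name `count_matches` and the sample string are assumptions for illustration.

```python
import re


def count_matches(pattern, text):
    """Return the number of non-overlapping matches of pattern in text."""
    return len(re.findall(pattern, text))


# Hypothetical sample: three regular spaces and two non-breaking spaces.
sample = "max 2 out\u00a0of AKQJ10)\u00a0may"

# In Python, \xhh takes exactly two hex digits, so the non-breaking space is
# written \xa0 (or \u00a0); \x00a0 would mean a NUL character followed by "a0".
print(count_matches(r"\xa0", sample))   # non-breaking spaces only
print(count_matches(r" ", sample))      # regular spaces only
print(count_matches(r"\s", sample))     # for str patterns, \s matches both

# Replace only the non-breaking spaces, leaving regular spaces alone.
print(re.sub(r"\xa0", " ", sample))
```

Note that in Python `\s` matches regular spaces as well, so an explicit code point is the safer choice when only non-breaking spaces should be found.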