Bug 83836 - FILESAVE: Corrupted content.xml after saving a ODS spreadsheet
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 4.2.6.2 release
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
Whiteboard: BSA
Reported: 2014-09-14 08:19 UTC by Jan Rathmann
Modified: 2017-04-17 15:25 UTC

Attachments
Defect spreadsheet created by calc (50.84 KB, application/vnd.oasis.opendocument.spreadsheet), 2014-09-14 08:19 UTC, Jan Rathmann
Spreadsheet with manually fixed content.xml (50.78 KB, application/vnd.oasis.opendocument.spreadsheet), 2014-09-14 08:20 UTC, Jan Rathmann
Screenshot of output of wdiff command highlighting the changes of content.xml (432.74 KB, image/png), 2014-09-14 08:23 UTC, Jan Rathmann
Second defect spreadsheet created by calc (59.18 KB, application/vnd.oasis.opendocument.spreadsheet), 2014-10-07 12:15 UTC, Jan Rathmann
Second spreadsheet with manually fixed content.xml (59.13 KB, application/vnd.oasis.opendocument.spreadsheet), 2014-10-07 12:16 UTC, Jan Rathmann
Another corrupted spreadsheet (919.58 KB, application/vnd.oasis.opendocument.spreadsheet), 2014-10-21 14:44 UTC, Pierre-Alain Dorange
file is ok (61.84 KB, application/vnd.oasis.opendocument.spreadsheet), 2014-10-21 16:22 UTC, Pierre-Alain Dorange
this file is corrupted (61.51 KB, application/vnd.oasis.opendocument.spreadsheet), 2014-10-21 16:23 UTC, Pierre-Alain Dorange
fixed document (36.36 KB, application/vnd.oasis.opendocument.text), 2014-10-21 18:51 UTC, raal

Description Jan Rathmann 2014-09-14 08:19:36 UTC
Created attachment 106241
Defect spreadsheet created by calc

Problem description: 

Recently I created a spreadsheet with LibreOffice Calc and saved it. Later I tried to open it again, but Calc could not open the file and showed the following error message:

"Read-Error.
Format error discovered in the file in sub-document content.xml at 2,8442(row,col)."

After investigating, I found that the cause was duplicated attributes within some XML tags in content.xml. After manually deleting those attributes and adding the modified content.xml back to the ods file, I was able to open the spreadsheet again.

I have attached the defective spreadsheet as created by Calc, the manually fixed spreadsheet that can be opened again, and a screenshot showing the output of wdiff when comparing the two content.xml files.
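
For anyone who wants to check a file for this kind of damage without opening it in Calc, here is a minimal Python sketch that scans content.xml inside the .ods package for tags carrying the same attribute twice. The function names and the regular expressions are illustrative assumptions, not part of LibreOffice. The scan is purely text-based, so it does not special-case comments or CDATA sections, and the position it reports is the start of the offending tag, which may differ from the position shown in Calc's error message.

```python
import re
import zipfile

# One attribute: whitespace, name, '=', quoted value (quotes are consumed
# explicitly so that a '>' inside a value does not end the tag early).
ATTR = r"""\s+([\w:.-]+)\s*=\s*(?:"[^"]*"|'[^']*')"""
ATTR_RE = re.compile(ATTR)
# A start tag or empty-element tag: '<', name, zero or more attributes, '>' or '/>'.
TAG_RE = re.compile(r"<([A-Za-z_][\w:.-]*)((?:" + ATTR + r")*)\s*/?>")


def find_duplicate_attributes(xml_text):
    """Return a list of (row, col, tag_name, duplicated_attribute_names)."""
    findings = []
    for match in TAG_RE.finditer(xml_text):
        seen = set()
        duplicates = []
        for name in ATTR_RE.findall(match.group(2)):
            if name in seen and name not in duplicates:
                duplicates.append(name)
            seen.add(name)
        if duplicates:
            start = match.start()
            # 1-based row and column of the '<' that opens the tag.
            row = xml_text.count("\n", 0, start) + 1
            col = start - (xml_text.rfind("\n", 0, start) + 1) + 1
            findings.append((row, col, match.group(1), duplicates))
    return findings


def scan_odf(path, member="content.xml"):
    """Read one XML member out of an ODF package and scan it."""
    with zipfile.ZipFile(path) as archive:
        xml_text = archive.read(member).decode("utf-8")
    return find_duplicate_attributes(xml_text)
```

Calling `scan_odf()` on the attached defective spreadsheet should list each tag that repeats an attribute, together with the attribute names involved.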

Steps to reproduce:
1. Try to open spreadsheet-defect.ods
2. Calc fails to open it and quits with the error message quoted above.

Current behavior:
The spreadsheet file is corrupted and cannot be used without knowledge of the internals of the ODF/XML format.

Expected behavior:
While saving files, Calc should under every circumstance produce a file that can be opened again.

Kind regards,
Jan
              
Operating System: Ubuntu
Version: 4.2.6.2 release
Comment 1 Jan Rathmann 2014-09-14 08:20:45 UTC
Created attachment 106242
Spreadsheet with manually fixed content.xml
Comment 2 Jan Rathmann 2014-09-14 08:23:25 UTC
Created attachment 106243
Screenshot of output of wdiff command highlighting the changes of content.xml
Comment 3 m_a_riosv 2014-09-14 09:22:15 UTC
I can't reproduce with Win7x64
Version: 4.2.6.3 Build ID: 3fd416d4c6db7d3204c17ce57a1d70f6e531ee21

Please try resetting the user profile; this sometimes solves strange issues.
https://wiki.documentfoundation.org/UserProfile
Comment 4 Jan Rathmann 2014-09-14 11:35:51 UTC
I have tested it with a pristine user profile, and also with the current versions of OpenOffice and Gnumeric (within a virtual machine); none of them was able to open the file (OpenOffice showed the same error message). So it has nothing to do with my particular user profile.

A standardized way to reproduce the bug is to boot a virtual machine from an Ubuntu 14.04.1-Live-CD and then try to open the file. It does not work.

Kind regards,
Jan
Comment 5 MM 2014-09-14 18:06:08 UTC
(In reply to comment #4)
> I have tested it with a pristine user profile, and also with the current
> version of OpenOffice and Gnumeric (within a virtual machine) - none of them
> was able to open the file (OpenOffice showed the same error message). So it
> has nothing to do with my particular user profile.
> 
> A standardized way to reproduce the bug is to boot a virtual machine from an
> Ubuntu 14.04.1-Live-CD and then try to open the file. It does not work.


This issue is supposed to be fixed.
Check out https://bugs.freedesktop.org/show_bug.cgi?id=80499
The error should be in the *saving* part, not the loading.
Yes, the file can't be opened, but that's because it is corrupted.

Can you reproduce the bug again, by loading the correct file and saving it as a corrupt one?
Comment 6 Jan Rathmann 2014-09-14 20:53:15 UTC
Unfortunately, I have not been able so far to reproduce the conditions that led Calc to create the corrupted file. If I make small modifications to the fixed file, Calc saves it correctly. Thus I hoped that the corrupted file would give a hint as to how the corruption was caused.

Kind regards,
Jan
Comment 7 raal 2014-09-25 08:50:37 UTC
Setting to NEEDINFO; please set to UNCONFIRMED once you have steps to reproduce or if the bug occurs again. Thank you.
Comment 8 David Tonhofer 2014-09-27 22:26:17 UTC
May or may not be related:

For the first time ever I have an error like this in "LibreOffice/4.2.6.3" today (more precisely, 4.2.6.3.3, on Fedora 20):

---------------------
Read-Error.
Format error discovered in the file in sub-document content.xml at 2,37882(row,col).
---------------------

Uh-oh, great. 

(Looks like we have come full circle and are back at the kind of situation that caused Neal Stephenson to crack and write his "In the Beginning .. Was the Command Line" anti-MS rant. Or nearly so. Ok, manual fixing 'R us!)

Some peeves:

1) Could content.xml be formatted more nicely? Like with CRLF and indentations? The jar file is compressed anyway.... 

2) The error message could indicate what exactly the parser doesn't like. What's special about position 37882? Is the attribute before that position or after that position problematic?

3) Extreme Brittleness! Why not spit out a fat warning and continue processing regardless?  


In this case:

<style:style style:name="T11" style:family="text">
<style:text-properties 
fo:font-weight="bold" 
style:font-weight-asian="bold" 
style:font-weight-complex="bold" 
style:font-name="Liberation Sans" 
style:font-name-asian="DejaVu Sans" <---- the end of this is position 37882 
style:font-name-complex="DejaVu Sans" 
fo:font-weight="normal" 
style:font-weight-asian="normal" 
style:font-weight-complex="normal" 
fo:font-style="italic" 
style:font-style-asian="italic" 
style:font-style-complex="italic"/>
</style:style>

So it looks like

fo:font-weight
style:font-weight-asian
style:font-weight-complex

have been defined twice, but this has nothing to do with the reported position.


Try more info?

Let's see whether the document is well-formed:

$ xmlwf content.xml 

content.xml:2:37882: duplicate attribute

$ xmllint --format content.xml 

content.xml:2: parser error : Attribute fo:font-weight redefined
style="italic" style:font-style-asian="italic" style:font-style-complex="italic"
                                                                               ^
content.xml:2: parser error : Attribute style:font-weight-asian redefined
style="italic" style:font-style-asian="italic" style:font-style-complex="italic"
                                                                               ^
content.xml:2: parser error : Attribute style:font-weight-complex redefined
style="italic" style:font-style-asian="italic" style:font-style-complex="italic"
                                                                               ^
content.xml:2: parser error : Attribute fo:font-weight redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^
content.xml:2: parser error : Attribute style:font-weight-asian redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^
content.xml:2: parser error : Attribute style:font-weight-complex redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^

Ok, let's try this:


$ xmllint --recover --format content.xml > content_new.xml
$ mv content_new.xml content.xml
$ xmllint --format content.xml 

content.xml:435: parser error : Attribute fo:font-weight redefined
style="italic" style:font-style-asian="italic" style:font-style-complex="italic"
                                                                               ^
content.xml:435: parser error : Attribute style:font-weight-asian redefined
style="italic" style:font-style-asian="italic" style:font-style-complex="italic"
                                                                               ^
content.xml:435: parser error : Attribute style:font-weight-complex redefined
style="italic" style:font-style-asian="italic" style:font-style-complex="italic"
                                                                               ^
content.xml:474: parser error : Attribute fo:font-weight redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^
content.xml:474: parser error : Attribute style:font-weight-asian redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^
content.xml:474: parser error : Attribute style:font-weight-complex redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^

Okidoki, this is a good error message that tells me what to fix. After fixing:

$ xmlwf content.xml 

No error messages? Hell yeah.

Pack it up; do not forget the M flag, so that jar does not create a new manifest:

$ jar Mcf Fixed.odt Configurations2/ content.xml manifest.rdf META-INF/ meta.xml mimetype settings.xml styles.xml Thumbnails/

$ soffice Fixed.odt

WORKS! Until next time, of course.

I suppose the problem comes from merging/unmerging and copy-pasting cells in the original spreadsheet, something this spreadsheet has seen a lot of. Let's see what happens.
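
As an alternative to the jar call above, the repacking step can be sketched with Python's zipfile module. This is only a sketch under assumptions: `replace_member` is a hypothetical helper name, and it follows the ODF packaging convention that the mimetype entry comes first in the archive and is stored uncompressed. All other entries are copied unchanged apart from being recompressed.

```python
import zipfile


def replace_member(src_path, dst_path, member, new_bytes):
    """Copy an ODF package to dst_path, swapping in new_bytes for one member."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        infos = src.infolist()
        # Stable sort: 'mimetype' (key False) moves to the front, the
        # remaining entries keep their original order.
        infos.sort(key=lambda info: info.filename != "mimetype")
        for info in infos:
            if info.filename == member:
                data = new_bytes
            else:
                data = src.read(info.filename)
            # The mimetype entry is stored without compression.
            if info.filename == "mimetype":
                compression = zipfile.ZIP_STORED
            else:
                compression = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=compression)
```

Writing to a new file instead of modifying the package in place keeps the corrupted original available for comparison.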
Comment 9 David Tonhofer 2014-09-27 22:37:18 UTC
Oh darn, the content is messed up now due to "significant whitespace". Looks like pretty-printing content.xml is not such a good idea after all. Sob! Oh well, fixing again, with no pretty-printing in between.
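
One way to avoid the whitespace problem is to remove the repeated attributes directly in the one-line content.xml, without reformatting anything. A minimal Python sketch, assuming that the first occurrence of each attribute is the one to keep (which value is actually intended has to be judged per document); `drop_duplicate_attributes` is a hypothetical helper, not a LibreOffice or xmllint feature:

```python
import re

# One attribute: whitespace, name, '=', quoted value.
ATTR = r"""\s+([\w:.-]+)\s*=\s*(?:"[^"]*"|'[^']*')"""
ATTR_RE = re.compile(ATTR)
# A start tag or empty-element tag; group 2 is the whole attribute block.
TAG_RE = re.compile(r"<([A-Za-z_][\w:.-]*)((?:" + ATTR + r")*)\s*/?>")


def drop_duplicate_attributes(xml_text):
    """Remove repeated attributes inside each tag, keeping the first one.

    Everything outside the attribute blocks, including whitespace in the
    text content, is passed through untouched.
    """
    def fix_tag(match):
        seen = set()

        def keep_first(attr):
            name = attr.group(1)
            if name in seen:
                return ""  # drop the later duplicate, with its leading space
            seen.add(name)
            return attr.group(0)

        fixed_attrs = ATTR_RE.sub(keep_first, match.group(2))
        whole = match.group(0)
        # Positions of the attribute block relative to the start of the tag.
        begin = match.start(2) - match.start()
        end = match.end(2) - match.start()
        return whole[:begin] + fixed_attrs + whole[end:]

    return TAG_RE.sub(fix_tag, xml_text)
```

The result should still be checked with xmlwf before it is packed back into the document.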
Comment 10 Julien Nabet 2014-10-03 20:32:37 UTC
Miklos: thought this tracker might interest you.
It seems the file is corrupted because of duplicated attributes (see David Tonhofer's interesting comment 8).
In fdo#84621, which I put in See Also, here are the duplicates found with xmllint --format:
font-size-asian
font-size-complex
font-weight-asian
font-weight-complex
text-underline-color
text-underline-style

Any idea if some (all?) of them are already fixed on master?
Comment 11 Jan Rathmann 2014-10-07 12:12:20 UTC
Today I was bitten by the bug again, while working on the same file. This time there was only one occurrence of duplicate attributes, namely:

<style:text-properties fo:color="#000000" fo:color="#ff0000"/>

at 2,16701 (row,col).

The tag seems to relate to a field in the spreadsheet that has some text with its color set to red and some other text with its color set to black. So I tried changing the text color in random fields, copying text between fields, etc., to find the steps one needs to take in Calc to trigger the bug, but unfortunately I have had no success so far: all other files created after that were fine.

I'll attach my modified spreadsheet again, both the faulty version and the manually fixed one.

Kind regards,
Jan
Comment 12 Jan Rathmann 2014-10-07 12:15:17 UTC
Created attachment 107485
Second defect spreadsheet created by calc
Comment 13 Jan Rathmann 2014-10-07 12:16:07 UTC
Created attachment 107486
Second spreadsheet with manually fixed content.xml
Comment 14 Julien Nabet 2014-10-09 21:33:23 UTC
Jan: as a test, could you give LO 4.3.2 a try (see https://launchpad.net/~libreoffice/+archive/ubuntu/ppa)?
I mean, perhaps the bug is already fixed on this version(?)
Comment 15 Jan Rathmann 2014-10-11 14:30:49 UTC
Julien, I would gladly try LO 4.3.2, but the main problem is that I still have not been able to deduce a series of steps in Calc that make the bug appear and lead to the generation of faulty files. The two times the bug happened recently, I was doing "normal work" on that file and could not remember exactly which actions I had performed beforehand, or in which order. All attempts to find such editing steps through dedicated testing have failed so far.

The only thing I could test with 4.3.2 at this stage is whether it handles the opening of files with such duplicated attributes more gracefully, i.e. opens them anyway and just prints a warning. But since all other OpenDocument-aware applications I have tested (OpenOffice, Gnumeric, Calligra Sheets) also refuse to open the faulty file, I don't assume that this has changed in LO recently (although more graceful behaviour towards documents with duplicated attributes would be a great improvement from an end user's point of view, IMHO).

Kind regards,
Jan
Comment 16 Julien Nabet 2014-10-11 14:33:25 UTC
Jan: in this case, I suggest setting this one to WFM for the moment.
If you finally find the steps to reproduce this, don't hesitate to reopen this tracker of course.
Also, perhaps with 4.3.2 you won't be able to reproduce it at all! :-)
Comment 17 Jan Rathmann 2014-10-11 17:10:11 UTC
Julien: OK, then the best thing for me seems to be to upgrade to LO 4.3.2 and watch whether the bug occurs again. If it does, I'll reply here again.

Kind regards,
Jan
Comment 18 Julien Nabet 2014-10-13 18:03:20 UTC
(nitpicking mode :-)) WFM since there's no specific fix spotted.
Comment 19 Pierre-Alain Dorange 2014-10-21 14:41:46 UTC
Someone at our office faced the same bug (corrupted spreadsheet) twice today.

Config:
LibreOffice 4.2.6.2 on MacOS X 10.6.8 (last stable)

* Open a spreadsheet
* Save it under a new name (over a Mac network)
* Modify some cells
* Save

At that point the user could continue to work on the file, but once he closed it and tried to reopen it, it was corrupted.

---------------------
Read-Error.
Format error discovered in the file in sub-document content.xml at 2,137417(row,col).
---------------------

I tried older and more recent LO versions and also OpenOffice, but got the same error each time.
I think the corruption happened during save.

The user could not tell me exactly what was done to the file (he just modified some text in some cells; it was a text tab).

As the file was corrupted, I restored the previous version (from a few hours earlier).
The restored file opened fine (but without the modifications).

The user redid the modifications, saved as, and closed again: the file was corrupted a second time.

I got a copy of the corrupted file and of the non-corrupted (but unmodified) one.
I tried to reproduce this with some random modifications to the original followed by a save, but could not reproduce the bug.

I'll investigate more.

Setting Status to UNCONFIRMED until we can find a way to reproduce the bug.
It is definitely a bug, but I can't find a way to reproduce it yet.
Comment 20 Pierre-Alain Dorange 2014-10-21 14:44:39 UTC
Created attachment 108189
Another corrupted spreadsheet

Another corrupted spreadsheet made with LO 4.2.6 on MacOS X 10.6.8
Comment 21 Julien Nabet 2014-10-21 15:02:25 UTC
Can he reproduce the problem with LO 4.3.2, starting from a backup of a non-corrupted file?
Comment 22 Pierre-Alain Dorange 2014-10-21 16:20:40 UTC
Tracking the bug, following comment 19.

I asked the user to redo the job (again), starting from the uncorrupted spreadsheet and doing a "save as" with a new name every 5 minutes.
I got 10 files: the first 4 are OK, the last 6 are corrupted.

The user can't tell exactly what was done between steps 4 and 5, but it seems that he just modified some text and inserted a few lines.

The 2 files follow: test-4-ok (good) and test-5-bad (corrupted).

I can't really inspect content.xml myself (I have an editor, but the whole XML is on a single line, which is not convenient to examine and correct manually).
But fixing test-5-bad and comparing it to test-4-ok should show what was modified.

Hoping this will help.
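
Since content.xml is a single long line, a small helper that prints only the text around the reported position can make it easier to examine. A minimal Python sketch, assuming the (row,col) pair from the error message is 1-based and counted in characters (the parser may count differently, so the window is best kept generous); `context_at` is a hypothetical name:

```python
def context_at(xml_text, row, col, radius=120):
    """Return the slice of line `row` within `radius` characters of `col`.

    Both row and col are treated as 1-based, as an assumption about how
    the error message counts positions.
    """
    line = xml_text.split("\n")[row - 1]
    index = col - 1
    start = max(0, index - radius)
    end = min(len(line), index + radius)
    return line[start:end]
```

For the error quoted in comment 19, this would be called as `context_at(text, 2, 137417)` with `text` holding the decoded content.xml.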
Comment 23 Pierre-Alain Dorange 2014-10-21 16:22:33 UTC
Created attachment 108193
file is ok

This file is good, but the modifications made next lead to test_5_bad, which is corrupted
Comment 24 Pierre-Alain Dorange 2014-10-21 16:23:28 UTC
Created attachment 108194
this file is corrupted

Only a few modifications lead to this corrupted file (compare with test_4_ok)
Comment 25 Julien Nabet 2014-10-21 17:40:35 UTC
Pierre:
these are exactly the same kind of errors as those indicated in comment 10.

Here's the output of xmllint after unzipping your corrupted file.
xmllint --format content.xml > content_new.xml
content.xml:2: parser error : Attribute fo:font-weight redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^
content.xml:2: parser error : Attribute style:font-weight-asian redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^
content.xml:2: parser error : Attribute style:font-weight-complex redefined
ght="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"
                                                                               ^
Please ask your colleague to give LO version 4.3.2 a try, and ask him to rename his LO profile directory (see https://wiki.documentfoundation.org/UserProfile#Mac_OS_X).

If he reproduces this with that version, ask him about the details between steps 4 and 5; then don't hesitate to reopen this tracker and write those down.
Comment 26 raal 2014-10-21 18:51:58 UTC
Created attachment 108201
fixed document

Here is the fixed file.
The problem is in:
<style:style style:name="T5" style:family="text"><style:text-properties style:font-name="Arial" fo:font-size="10pt" fo:font-weight="bold" style:text-underline-style="none" style:text-underline-color="font-color" fo:font-style="normal" style:text-outline="false" fo:text-shadow="none" style:text-position="0%" style:font-size-asian="10pt" style:font-size-complex="10pt" style:font-weight-asian="bold" style:font-weight-complex="bold" style:font-style-asian="normal" style:font-style-complex="normal" style:font-weight-asian="normal" style:font-weight-complex="normal"/></style:style>

deleted:
 style:font-weight-asian="normal" style:font-weight-complex="normal"

I compared test_4_ok.ods with the fixed document and they look the same (using the Calc function Edit -> Compare Document).
Comment 27 Pierre-Alain Dorange 2014-10-21 19:19:30 UTC
I manually compared the file fixed by raal (thanks) and there are differences.
In fact, my colleague deleted the English text (to keep only the French).

Between revision_4 and revision_5, the modified part is lines 12 to 17.

I tried to reproduce this at home (with LO 4.3.0.4) and got no corrupted file.
But that is not a proof, just a clue.
I'll try tomorrow at the office with my colleague, using the same method he used.