Bug 82757 - Commit of unchanged date field corrupts the field
Summary: Commit of unchanged date field corrupts the field
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
4.4.0.0.alpha0+ Master
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Lionel Elie Mamane
URL:
Whiteboard: target:4.4.0 target:4.3.2
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-18 08:53 UTC by tim
Modified: 2014-08-21 06:38 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments
Test Database (12.75 KB, application/vnd.sun.xml.base)
2014-08-18 08:53 UTC, tim
Details

Note You need to log in before you can comment on or make changes to this bug.
Description tim 2014-08-18 08:53:40 UTC
Created attachment 104807 [details]
Test Database

I was testing Version: 4.3.2.0.0+
Build ID: 2087b54aab25ddbab5128af9f777b1686bc0ce5e
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:libreoffice-4-3, Time: 2014-08-17_03:55:33 

This was to check that the problem I had in #82151 has been fixed.  It has.  

However, another issue has arisen.  The attached self-contained DB demonstrates it.

I have a table with a creation date and update count in it.  I set the creation date on the 1st update/insert and then not again.  The update count gets set for the 1st update and then incremented.  This is done by a form macro in a 'before record action'.

The attached database has the table and a form in it plus one macro (loosely based on the system which has the problem).  Open the form.  Enter something in the info column and save or tab to the next row.  All is well.

Now modify the info column in the row just entered.  The date changes (and is sometimes invalid), even though the code has not changed it.

Now I can fix this in my code (and have in my main system so as to get on with what I was doing), by not committing the creation date field if I haven't changed it.

However, this is a change from 4.3.0.4, and it can't be right that any field gets corrupted in this way by committing it unchanged.

If I've not been clear please let me know.  I've been chasing this issue for many hours and it was really hard to find in my main system since the similar code I use there for the whole of my system has been unchanged for a long time and is somewhat hidden and automatic.  The problem manifested itself by producing an invalid date, but it didn't say which table or which field was at fault.
Comment 1 Lionel Elie Mamane 2014-08-18 17:06:53 UTC
You are using text controls with an ambiguous format for dates. Don't do that. It is fragile, and has always been. If it worked in your precise setup before, it was by pure luck (or dare I say pure accident). It likely didn't work in previous versions if you only changed the locale of the computer (e.g. from English to German, and with associated default date format from M/D/Y to D/M/Y). I have successfully reproduced the problem in LibreOffice 4.1 in some configurations.

The issue resolves around what date is "18/08/14". Depending on context and locale, this could be:

 * 14 August 18
 * 14 August 2018
 * 14 August 1918

 * 18 August 14
 * 18 August 2014
 * 18 August 1914

 * 8th day of month number 18 of year 14
 * 8th day of month number 18 of year 2014
 * 8th day of month number 18 of year 1914

And in processing the Commit, several different interpretations are used, some fixed by the SQL standard and some fixed by the current locale and some fixed by the default locale (American English). I regret the fact that several different interpretations are used, but I don't think I can easily fix that.

Replace the text field by a Date field (plus a time field), this will be much more robust. An IMHO inferior solution which seems to work is to use a formatted field instead of a text field. Probably (but I didn't check in the code) the formatted field knows its format, so it avoids the issue of several differing string-to-date interpretations.
Comment 2 Commit Notification 2014-08-18 18:43:38 UTC
Lionel Elie Mamane committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5eaa0e76fb59ed7699c12b84a6d9c052fec3c128

fdo#82757 call convertStringToNumber and detectNumberFormat with same locale



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 3 Lionel Elie Mamane 2014-08-18 18:53:06 UTC
I tracked down one reason that (in some scenarios) it used to work with older LibreOffice versions but not anymore with newer ones: the string-to-date conversion (parsing) has become stricter, so that it does not allow invalid dates such as "35th day of month" or "14th day of month 18" (in a calendar that has less than 18 months).

The situation was that:

 1) One parse was done with the format and locale
    attached to the column in the table.

 2) A second parse was done with the default locale,
    but only to see what type of value came out,
    the date read from it was thrown out.

With previous versions the second parse gave an invalid date, but that was thrown away anyway, while in more recent versions the second parse failed, falling back to passing the string as string (instead of as date) to the database layer, which was parsing the string as ISO format (which led to the corruption... in ISO, 18/12/8 is "8 December 18").

I changed the second parse to use the same locale, which makes much more sense. So with this commit, "it should work again as before".
Comment 4 tim 2014-08-18 19:37:28 UTC
Thanks very much.  At least a minor inconsistency has been smoothed over, even if not really a fault.  I do appreciate it.


A few comments:

The English, like the Germans, also use D/M/Y - the Americans use M/D/Y (I know not why).

I know I should ideally, use Date and Time. However, there was a reason why I used Text.  I think it was that I have a lot of old data imported from MS Access that was a bit variable and suspect and so Text was easier to implement.  It has been that way for some time now, certainly back to LO 3.3 or thereabouts.

I have just tested using a formatted field as YYYY-MM-DD HH:MM:SS and that also works OK, so I have more than one way out of my dilemma, including the change you have made.

I'll retest and verify when I can get a daily deb build I'm sure it's in.
Comment 5 Lionel Elie Mamane 2014-08-19 07:51:36 UTC
(In reply to comment #4)
> Thanks very much.  At least a minor inconsistency has been smoothed over,
> even if not really a fault.  I do appreciate it.

Just to be completely clear: I *do* think that, as far as possible, we should avoid "just commit without change" to change (corrupt) the data. OTOH, a commit *should* write the data to the database (so that e.g. in a multi-user scenario, if the data had changed from another SQL user, it is overwritten again). Ideally, the same data that was read. However, in the context of a text control, it is sometimes not possible. Think e.g. of a format "D/M/YY", where the date now in the database is "13 September 1515". This is displayed as "13/9/15", and is stored by the text control (which knows its content as *text*) as this string. Which when parsed by the commit will be understood as "13 September 2015". And that's what will be written to the database.

Having a "Date" control avoids that problem entirely, since it knows its content as a *Date*, even if it is displayed in a lossy way, it is still stored as a date -> no loss.

Having a "formatted field" control probably also avoids that problem, since (I think, not checked), it knows its content as a double-precision float (number of days since 30 December 1899 or something like that). Again, even if displayed in a lossy way, there is no loss internally, and a "commit without change" will commit the same value, not something reparsed from the displayed string (again, I predict, not formally checked). This works as long as there is enough precision in a double-precision float, that is for date+time up to about microseconds for dates not too far in the future. If you try to use nanoseconds in a formatted field, you'll have rounding.

> I know I should ideally, use Date and Time. However, there was a reason why
> I used Text.  I think it was that I have a lot of old data imported from MS
> Access that was a bit variable and suspect and so Text was easier to
> implement.  It has been that way for some time now, certainly back to LO 3.3
> or thereabouts.

If you want your data to be handled (and considered) as text, then store it in a text (VARCHAR) database (table) column. If you want to "clean up" your "a bit variable and suspect" data, then do so, but in a way *you control* (*you* choose the parsing locale for a *one-time* text-to-date conversion) and then from there on consider it as "clean" data: dates, not text.

My general point is to match the type of data expected by the database and the type of data handled by the control... in the case of Dates, text is not a good match; "Date" is the best match, but double-precision float is an OK match (because there is a "canonical" bijection).

> I have just tested using a formatted field as YYYY-MM-DD HH:MM:SS and that
> also works OK,

Yes; the second parse I mentioned above (or really, in this case, even the first parse since in a British English locale it will try to read 2014-08-14 as "day 2014 of month 8 of year (20)14") will fail, fallback to giving a string to the SQL system, which has no notion of locale in dates, and uses ISO YYYY-MM-DD which is a match with the string...
Comment 6 Lionel Elie Mamane 2014-08-19 07:58:06 UTC
Also, to be clear, when I said "date + time", I meant two controls bound to the same timestamp (Date+Time) database column. There is special support so that they cooperate and a commit (even with change) of the time control does not touch the date part and vice-versa.
Comment 7 tim 2014-08-19 09:50:49 UTC
Thanks very much for your very detailed response.

Short term I will just avoid committing unchanged data.

Longer term I will probably go for the formatted field.  I have a lot of forms and subforms, and the fields are hidden (there are actually two, one for creation and one for update).  

Adding two more fields for something not visible is quite a bit of work, although I agree that in a purist sense I should go that route.  I will look into how to do that before embarking on too many alterations.

I will also test your change in due course so that the report can go to verified status.  I assume it will be in 4.3.2 daily by tomorrow?
Comment 8 Lionel Elie Mamane 2014-08-19 10:18:22 UTC
(In reply to comment #7)
> I will also test your change in due course so that the report can go to
> verified status.  I assume it will be in 4.3.2 daily by tomorrow?

Only if someone reviews it on gerrit for the 4.3 branch today :) Else, in the daily of the day after someone does that. https://gerrit.libreoffice.org/11003

It should be in the 4.4.0 alpha (master) dailies of today (if any exists).
Comment 9 Commit Notification 2014-08-20 08:12:51 UTC
Lionel Elie Mamane committed a patch related to this issue.
It has been pushed to "libreoffice-4-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=ff4d02e35e5187211da457facbb07b8e89c0d556&h=libreoffice-4-3

fdo#82757 call convertStringToNumber and detectNumberFormat with same locale


It will be available in LibreOffice 4.3.2.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 10 tim 2014-08-21 06:38:39 UTC
Verified in Version: 4.3.2.0.0+
Build ID: b2d54aa61607e477cb4b81f1a70e555ee3adb0af
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:libreoffice-4-3, Time: 2014-08-20_23:12:37

Thanks very much for the fix, and the advice.