
Inconsistency in NativeStore BTree indices after transactional update operation

Description

We encountered a bug in the Sesame NativeStore that causes inconsistencies in the BTree indices after a transactional update operation. The problem appears to originate in the low-level BTree implementation.

These inconsistencies appear only for very particular index states and are therefore very hard to reproduce. We managed to find a database state and an update operation that deterministically produce corrupt BTree indices. Please find attached a Java Eclipse project (including the required anonymized data) that reproduces the problem.

High-Level Steps of the program:

1) Load the initial database
2) In a single transaction, delete a particular context, then add new data (which partially overlaps with the deleted statements)
3) Analyse the database and index states
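Before analysing the indices in step 3, it helps to state the state the store should reach after steps 1 and 2. The sketch below is a hypothetical, index-free reference model (the class `ExpectedState` and the sample values are illustrative, not part of the attached project): the expected result is the initial data minus every statement of the deleted context, plus the new data. It assumes that "partially overlapping" means some triples of the update also existed in the deleted context, so they should reappear only in their new context.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reference model of steps 1) and 2): computes the state the store
// *should* reach, independently of any index, as an oracle for step 3).
final class ExpectedState {

    // A statement in a named context (subject, predicate, object, context).
    record Quad(String s, String p, String o, String c) {}

    // Delete every statement in `contextToDelete`, then add `additions`,
    // as one logical transaction.
    static Set<Quad> apply(Set<Quad> initial, String contextToDelete, Set<Quad> additions) {
        Set<Quad> result = new HashSet<>();
        for (Quad q : initial) {
            if (!q.c().equals(contextToDelete)) {
                result.add(q);
            }
        }
        // Overlapping triples come back, but only in their new context.
        result.addAll(additions);
        return result;
    }

    public static void main(String[] args) {
        Set<Quad> initial = Set.of(
                new Quad("ex:a", "ex:p", "ex:b", "oldContext"),
                new Quad("ex:a", "ex:q", "ex:c", "oldContext"),
                new Quad("ex:x", "ex:p", "ex:y", "other"));
        // (ex:a ex:p ex:b) overlaps with oldContext and is re-added in newContext.
        Set<Quad> update = Set.of(new Quad("ex:a", "ex:p", "ex:b", "newContext"));
        System.out.println(apply(initial, "oldContext", update).size()); // 2
    }
}
```

In this model no statement of "oldContext" survives, which is exactly the property that observation 1 below reports as violated.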

Observations:

1) Not all statements of the particular context ("oldContext") are deleted
2) SPOC and PSOC are out of sync
2a) the number of statements differs (hint: look for "Problematic Triple" in the output). "Problematic Triple" is correctly added to PSOC, but not to SPOC.
2b) in the SPOC index, some triples of "oldContext" are not deleted
2c) in the PSOC index the triples are correctly added to "newContext"
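Observation 2 concerns the invariant that SPOC and PSOC contain the same set of statements and differ only in key order. The following is a minimal in-memory sketch of that invariant, assuming statements are keyed by numeric value IDs; `TreeSet` merely stands in for the on-disk BTrees, and the class `IndexPair` is hypothetical.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.TreeSet;

// Hypothetical model of two triple indices: same quads, different sort order.
final class IndexPair {

    // Quad of value IDs (subject, predicate, object, context).
    record Quad(long s, long p, long o, long c) {}

    // SPOC: ordered by subject, predicate, object, context.
    static final Comparator<Quad> SPOC = Comparator.comparingLong(Quad::s)
            .thenComparingLong(Quad::p).thenComparingLong(Quad::o).thenComparingLong(Quad::c);

    // PSOC: ordered by predicate, subject, object, context.
    static final Comparator<Quad> PSOC = Comparator.comparingLong(Quad::p)
            .thenComparingLong(Quad::s).thenComparingLong(Quad::o).thenComparingLong(Quad::c);

    final TreeSet<Quad> spoc = new TreeSet<>(SPOC);
    final TreeSet<Quad> psoc = new TreeSet<>(PSOC);

    // Every mutation has to reach every index; a write that reaches only one
    // index (like "Problematic Triple" missing from SPOC) breaks inSync().
    void add(Quad q) {
        spoc.add(q);
        psoc.add(q);
    }

    void remove(Quad q) {
        spoc.remove(q);
        psoc.remove(q);
    }

    // True iff both indices hold exactly the same quads, regardless of order.
    boolean inSync() {
        return new HashSet<>(spoc).equals(new HashSet<>(psoc));
    }
}
```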

Consequences for users:

Query results differ depending on which index is used to evaluate a particular user query. This can be very problematic at the application level when the data is consumed (e.g., for computations).
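Why the results depend on the index: to our understanding, the NativeStore answers each statement pattern from the index whose key order matches the longest prefix of bound terms. The sketch below models that selection rule under this assumption; `IndexSelection`, `patternScore` and `bestIndex` are hypothetical names, and `-1` marks an unbound position. It is a simplification, not the store's actual code.

```java
import java.util.List;

// Hypothetical index selection: prefer the index whose leading key fields are bound.
final class IndexSelection {

    // Number of leading fields of `fieldOrder` (e.g. "spoc") that are bound (>= 0).
    static int patternScore(String fieldOrder, long s, long p, long o, long c) {
        int score = 0;
        for (char f : fieldOrder.toCharArray()) {
            long value = switch (f) {
                case 's' -> s;
                case 'p' -> p;
                case 'o' -> o;
                case 'c' -> c;
                default -> throw new IllegalArgumentException("unknown field " + f);
            };
            if (value < 0) {
                break; // the first unbound field ends the usable key prefix
            }
            score++;
        }
        return score;
    }

    // Picks the index with the highest score; ties go to the first index listed.
    static String bestIndex(List<String> indexes, long s, long p, long o, long c) {
        String best = indexes.get(0);
        int bestScore = -1;
        for (String idx : indexes) {
            int score = patternScore(idx, s, p, o, c);
            if (score > bestScore) {
                best = idx;
                bestScore = score;
            }
        }
        return best;
    }
}
```

In this model a subject-bound pattern such as `ex:a ?p ?o` is read from SPOC, while a predicate-bound pattern such as `?s ex:p ?o` is read from PSOC, so a statement missing from only one index is visible to one query shape and not the other.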

Output (the actual difference sets can be found in the attached file):

Preserving initial state ...
Number of statements: 59991
Removing old context
Adding updated context
Not deleted statements: 17
Repository size with SPOC index only: 48606
Repository size with PSOC index only: 48607
Computing differences of sets...
Difference SPOC MINUS PSOC: 17
Difference PSOC MINUS SPOC: 18
Different statements in SPOC MINUS PSOC (Mind the contexts):
  • NOTE: see attached file for further details
Different statements in PSOC MINUS SPOC (Mind the contexts):
  • NOTE: see attached file for further details
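The "Computing differences of sets" step can be sketched as two plain set differences over full index scans. The figures above are internally consistent: for any two sets, |PSOC| − |SPOC| = |PSOC MINUS SPOC| − |SPOC MINUS PSOC|, and indeed 48607 − 48606 = 1 = 18 − 17. The class `IndexDiff` below is hypothetical. It also shows why the output says "Mind the contexts": the same triple stored under two contexts is two distinct quads, so it lands in both difference sets.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for comparing full scans of two indices.
final class IndexDiff {

    record Quad(String s, String p, String o, String c) {}

    // Elements of `left` that are missing from `right`, e.g. SPOC MINUS PSOC.
    static <T> Set<T> minus(Set<T> left, Set<T> right) {
        Set<T> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        // Same triple, different contexts: two distinct quads.
        Set<Quad> spoc = Set.of(new Quad("ex:a", "ex:p", "ex:b", "oldContext"));
        Set<Quad> psoc = Set.of(new Quad("ex:a", "ex:p", "ex:b", "newContext"));
        System.out.println(minus(spoc, psoc)); // contains only the oldContext quad
        System.out.println(minus(psoc, spoc)); // contains only the newContext quad
    }
}
```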

Further analysis:

  • After TripleStore.commit() has finished, some statements remain in an uncommitted state. The method checkAllCommitted(), if uncommented, reports 35 "unexpected triples".

  • The bug occurs when the updatedTriplesCache becomes invalid because of too many updates: isValid() returns false because recordCount.get() > maxRecords.get(). Increasing the cache size in TripleStore to

      long maxRecords = indexes.get(0).getBTree().getValueCountEstimate();

    prevents the bug from occurring in this specific setting.

  • Whether the bug occurs depends not on the specific data values but on the sequence of index operations: URIs and values could be replaced with aliases without affecting reproducibility.

  • Removing certain statements from the update.nt file (e.g., the first statement) prevented the bug from occurring, whereas removing other statements did not.

  • Note: The provided example uses v2.7.2. We also managed to reproduce the bug with version 2.6.6, using logically equivalent code based on the old transaction mechanism.

  • In this specific scenario, the SPOC index was corrupted while PSOC remained clean. In other scenarios, however, we observed both the PSOC and SPOC indices being corrupted.
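The checkAllCommitted() finding in the first bullet above can be illustrated with a per-record status flag. To our understanding, NativeStore keeps a small flag byte with each index record that marks pending additions and removals until commit; the flag names and bit values below are assumptions for illustration, and the body of `uncommitted` is a hypothetical stand-in for checkAllCommitted(), not the method from the attached project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-commit check over one index's records.
final class CommitCheck {

    // ASSUMED flag layout; the real NativeStore names and values may differ.
    static final byte EXPLICIT_FLAG = 0x1;
    static final byte REMOVED_FLAG = 0x2;
    static final byte ADDED_FLAG = 0x4;
    static final byte TOGGLE_EXPLICIT_FLAG = 0x8;

    // Transaction flags that must all be cleared once commit() has finished.
    static final byte PENDING_MASK = REMOVED_FLAG | ADDED_FLAG | TOGGLE_EXPLICIT_FLAG;

    record IndexRecord(long s, long p, long o, long c, byte flags) {}

    // Returns every record still carrying a transaction flag ("unexpected triples").
    static List<IndexRecord> uncommitted(List<IndexRecord> index) {
        List<IndexRecord> unexpected = new ArrayList<>();
        for (IndexRecord r : index) {
            if ((r.flags() & PENDING_MASK) != 0) {
                unexpected.add(r);
            }
        }
        return unexpected;
    }
}
```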
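The updatedTriplesCache bullet above describes a capacity-limited cache. As we understand it, such a cache lets commit visit only the records touched by the transaction, and once it overflows, commit must take a different path over the index itself; the bug only shows up on that path. The sketch below models just the validity rule quoted in the report (`recordCount.get() > maxRecords.get()` makes it invalid). The class `UpdatedRecordsCache` and its methods other than isValid() are hypothetical, and the full-scan fallback is an assumption about the surrounding logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a capacity-limited "updated records" cache.
final class UpdatedRecordsCache<R> {
    private final AtomicLong maxRecords;
    private final AtomicLong recordCount = new AtomicLong();
    private final List<R> records = new ArrayList<>();

    UpdatedRecordsCache(long maxRecords) {
        this.maxRecords = new AtomicLong(maxRecords);
    }

    // Counts every update but stops retaining records once the limit is exceeded.
    void storeRecord(R record) {
        if (recordCount.incrementAndGet() <= maxRecords.get()) {
            records.add(record);
        }
    }

    // Condition from the report: the cache is invalid once recordCount > maxRecords.
    boolean isValid() {
        return recordCount.get() <= maxRecords.get();
    }

    // Valid: commit only the cached records. Invalid: fall back to the full scan.
    List<R> recordsToCommit(List<R> fullIndexScan) {
        return isValid() ? List.copyOf(records) : fullIndexScan;
    }
}
```

Raising maxRecords to the BTree's value-count estimate, as in the change quoted above, presumably keeps isValid() true for this update, so the fallback path is never exercised; that would explain why the bug disappears without the underlying problem being fixed.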
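The aliasing mentioned above (replacing URIs and values without losing reproducibility) only needs a mapping that is consistent (the same value always gets the same alias) and injective (distinct values stay distinct), so the store sees the same pattern of equal and distinct terms in the same load order. A minimal sketch, where `Aliaser` and the `urn:alias:` scheme are hypothetical; a real anonymizer for N-Triples would additionally keep each term's kind (URI, literal, blank node) intact.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical anonymizer: stable, one-to-one aliases numbered by first appearance.
final class Aliaser {
    private final Map<String, String> aliases = new HashMap<>();
    private int next = 0;

    // The same input always yields the same alias; distinct inputs get distinct aliases.
    String alias(String value) {
        return aliases.computeIfAbsent(value, v -> "urn:alias:" + (++next));
    }
}
```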
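The observation that removing some statements from update.nt hides the bug while removing others does not suggests shrinking the update systematically. The sketch below is a greedy, one-statement-at-a-time reduction (a simplified, one-minimal variant of delta debugging), not a procedure from the report. `reproduces` is a hypothetical callback that would load a fresh copy of the initial database, apply the candidate update in one transaction, and report whether SPOC and PSOC diverge; since the bug depends on the exact operation sequence, every check must start from the same initial state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Greedy reduction: drop any statement whose removal still reproduces the bug.
final class UpdateMinimizer {

    static <T> List<T> minimize(List<T> statements, Predicate<List<T>> reproduces) {
        List<T> current = new ArrayList<>(statements);
        boolean changed = true;
        while (changed) { // repeat until a full pass removes nothing
            changed = false;
            for (int i = 0; i < current.size(); i++) {
                List<T> candidate = new ArrayList<>(current);
                candidate.remove(i); // remove by index
                if (reproduces.test(candidate)) {
                    current = candidate; // statement i is not needed to trigger the bug
                    changed = true;
                    i--; // re-check the element that shifted into position i
                }
            }
        }
        return current; // removing any single remaining statement hides the bug
    }
}
```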

Environment

None

Status

Assignee

JeenB

Reporter

Andreas Schwarte

Components

Fix versions

Affects versions

2.7.2
2.6.6

Priority

Blocker