
Constant (uncontrollable) growth of BTree index size in NativeStore due to empty nodes

Description

An earlier fix prevented index corruption of the BTree for large transactions (those including both add and remove operations).

However, this fix caused a critical problem in the physical BTree layout on disk: the repository grows without bound for large transactions which add and remove the same set of triples.

The cause is that, after the end of a transaction, the BTree contains empty nodes (which still occupy physical space) with a usage counter greater than zero. Note: the BTree internally uses Node#use() and Node#release() calls, which have been unbalanced since the fix introduced in SES-1867.
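To illustrate why an unbalanced use()/release() pair keeps an empty node alive, here is a minimal sketch. It is a toy model and not the actual Sesame BTree code: the class NodeCacheSketch, its fields, and the reclamation rule (a node is dropped only when it is empty and its usage counter is zero) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of usage-counted nodes; NOT the actual Sesame BTree implementation. */
class NodeCacheSketch {

    /** Hypothetical node that tracks how many callers currently hold it. */
    static final class Node {
        final int id;
        int usageCount = 0;
        int valueCount = 0; // 0 means the node holds no values (it is empty)

        Node(int id) {
            this.id = id;
        }

        boolean isEmpty() {
            return valueCount == 0;
        }
    }

    private final Map<Integer, Node> allocated = new HashMap<>();

    /** Returns the node with the given ID and increments its usage counter. */
    Node use(int id) {
        Node node = allocated.computeIfAbsent(id, Node::new);
        node.usageCount++;
        return node;
    }

    /** Decrements the usage counter; reclaims the node if it is empty and unused. */
    void release(Node node) {
        node.usageCount--;
        if (node.usageCount == 0 && node.isEmpty()) {
            allocated.remove(node.id);
        }
    }

    int allocatedNodeCount() {
        return allocated.size();
    }

    public static void main(String[] args) {
        NodeCacheSketch cache = new NodeCacheSketch();

        // Balanced: one use(), one release(). The empty node is reclaimed.
        Node a = cache.use(1);
        cache.release(a);
        System.out.println(cache.allocatedNodeCount()); // prints 0

        // Unbalanced: two use() calls but only one release().
        // The counter stays at 1, so the empty node is never reclaimed.
        Node b = cache.use(2);
        cache.use(2);
        cache.release(b);
        System.out.println(cache.allocatedNodeCount()); // prints 1
    }
}
```

In this model every extra use() without a matching release() leaves one empty node allocated permanently, which matches the kind of growth described above.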

We observed the issue in a test (with a large transaction of ~140,000 triples) which performs the following operations in a loop:

1) add the initial data to ctx1
2) repeat (incrementing i): in a single transaction, remove ctx[i-1] and add the same data to ctx[i]

After each step, we checked the number of triples in the repository as well as the physical disk storage.
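For reference, the "size of directory" figure can be obtained by summing the sizes of all regular files below the repository's data directory. The following is a minimal sketch of such a measurement, assuming Java 8 or later; the class name DirectorySizeSketch and the file names in main are hypothetical and are not taken from the original test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper; not part of the original test code. */
class DirectorySizeSketch {

    /** Sums the sizes (in bytes) of all regular files below the given directory. */
    static long sizeOfDirectory(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(DirectorySizeSketch::fileSize)
                    .sum();
        }
    }

    /** Wraps Files.size so it can be used inside a stream pipeline. */
    private static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a repository data directory (file names are made up).
        Path dataDir = Files.createTempDirectory("repo-size-sketch");
        Files.write(dataDir.resolve("a.dat"), new byte[1024]);
        Files.write(dataDir.resolve("b.dat"), new byte[512]);

        System.out.println("size of directory: " + sizeOfDirectory(dataDir)); // 1536
    }
}
```

Calling sizeOfDirectory after each committed transaction, together with the repository's triple count, gives the two values reported per run.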

The results of this test with Sesame 2.7.2 and 2.7.3 are as follows:

With Sesame 2.7.2

Run 0: Repo size: 142014, size of directory: 14134537
Run 1: Repo size: 142014, size of directory: 25784375
Run 2: Repo size: 142014, size of directory: 25684049
Run 3: Repo size: 142014, size of directory: 25682035
Run 4: Repo size: 142014, size of directory: 25682069
Run 5: Repo size: 142014, size of directory: 25561263
Run 6: Repo size: 142014, size of directory: 25561297
Run 7: Repo size: 142014, size of directory: 25510131
Run 8: Repo size: 142014, size of directory: 25510165
Run 9: Repo size: 142014, size of directory: 25481527
Size of directory: 25481527

With Sesame 2.7.3

Run 0: Repo size: 142014, size of directory: 14134537
Run 1: Repo size: 142014, size of directory: 25835575
Run 2: Repo size: 142014, size of directory: 36565737
Run 3: Repo size: 142014, size of directory: 46933379
Run 4: Repo size: 142014, size of directory: 57047061
Run 5: Repo size: 142014, size of directory: 66931343
Run 6: Repo size: 142014, size of directory: 76481785
Run 7: Repo size: 142014, size of directory: 85784403
Run 8: Repo size: 142014, size of directory: 94783901
Run 9: Repo size: 142014, size of directory: 103580623
Size of directory: 103580623

Note: the first jump in size occurs because the BTree keeps unused nodes with IDs less than the maximal used ID for future reuse.
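The reuse behaviour mentioned in the note can be pictured with a small free-list allocator. This is only a sketch under stated assumptions, not the Sesame implementation: the class NodeIdAllocatorSketch, the node size of 4096 bytes, IDs starting at 1, and the rule that the file shrinks only when the node with the highest ID is freed are all hypothetical.

```java
import java.util.TreeSet;

/** Toy node-ID allocator with a free list; NOT the actual Sesame BTree code. */
class NodeIdAllocatorSketch {

    static final int NODE_SIZE = 4096; // hypothetical node size in bytes

    private final TreeSet<Integer> freeIds = new TreeSet<>();
    private int maxUsedId = 0; // IDs start at 1; 0 means no node is allocated

    /** Reuses the lowest free ID if one exists, otherwise extends the file. */
    int allocate() {
        Integer reusable = freeIds.pollFirst();
        if (reusable != null) {
            return reusable;
        }
        return ++maxUsedId;
    }

    /** Frees a node. Only freeing the highest ID lets the file shrink (assumption). */
    void free(int id) {
        if (id == maxUsedId) {
            maxUsedId--;
            // Trailing nodes that were already free can be dropped as well.
            while (freeIds.remove(maxUsedId)) {
                maxUsedId--;
            }
        } else {
            // Node stays in the file and is kept for future reuse.
            freeIds.add(id);
        }
    }

    long fileSize() {
        return (long) maxUsedId * NODE_SIZE;
    }

    public static void main(String[] args) {
        NodeIdAllocatorSketch alloc = new NodeIdAllocatorSketch();
        for (int i = 0; i < 4; i++) {
            alloc.allocate(); // IDs 1..4
        }
        alloc.free(2);
        alloc.free(3);
        System.out.println(alloc.fileSize()); // 16384: freed nodes still occupy space

        System.out.println(alloc.allocate()); // 2: a free ID is reused, file does not grow

        alloc.free(4);
        System.out.println(alloc.fileSize()); // 8192: trailing nodes 4 and 3 are dropped
    }
}
```

In this model, nodes freed below the highest used ID still count toward the file size until they are reused, so a one-time increase followed by a plateau is expected as long as freed nodes actually return to the free list.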

Please find attached a possible patch which resolves the issues described in (see comments about the assertion error in NativeStoreConsistencyTest) and, in addition, prevents the BTree from growing without bound. The patch moves the call to Node#use() from pushStacks() to #rotatedLeft() and hence avoids the imbalance in the usage count.

Running the test described above with Sesame 2.7.3 (including the possible patch) yields the following results:

Run 0: Repo size: 142014, size of directory: 14134537
Run 1: Repo size: 142014, size of directory: 25784307
Run 2: Repo size: 142014, size of directory: 25683981
Run 3: Repo size: 142014, size of directory: 25681967
Run 4: Repo size: 142014, size of directory: 25682001
Run 5: Repo size: 142014, size of directory: 25561195
Run 6: Repo size: 142014, size of directory: 25561229
Run 7: Repo size: 142014, size of directory: 25510063
Run 8: Repo size: 142014, size of directory: 25510097
Run 9: Repo size: 142014, size of directory: 25481459
Size of directory: 25481459

Environment

None

Status

Assignee

JeenB

Reporter

Andreas Schwarte

Labels

None

Components

Fix versions

Affects versions

2.7.3

Priority

Critical