Severe concurrency issues in HttpRepository of Sesame 2.7: number of threads can grow uncontrolled under certain conditions

Description

When testing and working with Remote Repositories of Sesame 2.7 beta, we encountered severe problems that arise in concurrent environments.

As it is hard to see in the code (and only reproducible in environments with a high number of parallel requests / a high number of requests in a short time), I will try to illustrate the issue using the HttpTupleQuery as an example.

In HttpTupleQuery#evaluate, the HTTPTupleQueryResult (i.e. the query request itself, which becomes an iteration once evaluated) is executed in a shared, unbounded cached thread pool. Note in particular that the HTTP request itself is only sent from a new thread obtained from the thread pool, so the limited number of connections of the HTTPClient never blocks the caller. As a consequence, the number of threads grows in an uncontrolled way when requests are submitted in parallel (or even in quick succession).
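
The effect can be reproduced without Sesame. The following is a minimal sketch, not the actual Sesame code: the class and method names are hypothetical, a Semaphore stands in for the HTTP client's connection limit, and a CountDownLatch simulates a slow server. Because the "connection" is only acquired inside the worker, every submitted request gets its own thread, regardless of the connection limit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only models the pattern described above.
class SendInsideWorkerDemo {

    /** Returns the largest number of pool threads that existed at the same time. */
    static int largestPoolSize(int requests, int maxConnections) {
        // Stand-in for the connection limit of the HTTP client (assumption).
        Semaphore connections = new Semaphore(maxConnections);
        // Keeps all simulated requests pending until submission is finished.
        CountDownLatch serverResponds = new CountDownLatch(1);
        // Cached pool without upper bound, as in the report.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < requests; i++) {
                pool.execute(() -> {
                    try {
                        // The wait for a connection happens inside the worker
                        // thread, so the submitting caller is never blocked.
                        connections.acquire();
                        try {
                            serverResponds.await(); // simulated slow response
                        } finally {
                            connections.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getLargestPoolSize();
        } finally {
            serverResponds.countDown(); // let all simulated requests finish
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // 50 requests against a limit of 2 connections.
        System.out.println(largestPoolSize(50, 2));
    }
}
```

With 50 requests and a limit of 2 connections, the pool still creates 50 threads: 2 hold a connection and 48 are parked waiting for one.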

In contrast, this cannot happen in the SPARQLRepository: there the HTTP query request is sent first (thus using the inherent queuing of the multi-threaded connection manager, i.e. a limit of currently 20 connections), and only then is the result passed in a BackgroundTupleResult to a separate new thread. The number of threads in the extra pool is therefore limited by the number of connections.
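
The send-first pattern can be sketched in the same simplified way. This is an illustration under assumptions, not the SPARQLRepository implementation: the names are hypothetical, a Semaphore again stands in for the connection manager's limit, and a short sleep simulates parsing the result. The caller acquires the "connection" before handing work to the pool, so it blocks as soon as all connections are in use.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it only models the send-first pattern.
class SendBeforeHandOffDemo {

    /** Returns the peak number of worker tasks that were running at the same time. */
    static int peakBusyWorkers(int requests, int maxConnections) {
        // Stand-in for the connection manager's limit (assumption).
        Semaphore connections = new Semaphore(maxConnections);
        AtomicInteger busy = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < requests; i++) {
                // The caller waits for a free connection BEFORE a worker is
                // involved, which throttles the submission rate.
                connections.acquire();
                pool.execute(() -> {
                    try {
                        int now = busy.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // simulated result parsing
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        busy.decrementAndGet();
                        connections.release(); // connection is free again
                    }
                });
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 50 requests against a limit of 2 connections.
        System.out.println(peakBusyWorkers(50, 2));
    }
}
```

In this sketch the number of simultaneously busy workers can never exceed the connection limit, because a task only starts after its connection has been acquired and releases it as its last step. The pool itself may briefly hold a few additional idle threads.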

My setup worked perfectly in Sesame < 2.6.10, where the remote repository did not have this concurrency.

In my opinion, this issue can arise quite easily in practice with frequent database access: the number of threads cannot be controlled, which causes the application to stop working properly.

Environment

None

Status

Assignee

Andreas Schwarte

Reporter

Andreas Schwarte

Labels

None

Components

Fix versions

Affects versions

2.7.0-beta1

Priority

Major