We're updating the issue view to help you get more done. 

Performance improvement by replacing lists with sets in PathIteration code

Description

Per mailing list discusion

Jerven Bolleman was looking into a profiler trace of my code and noticed that a lot of time is spent in the PathIteration code. Which is expected when looking at a path query

Looking a bit deeper I saw that this code uses two lists for caching reported and unreported values. I noticed that one of the most common operation on this list is contains
This is not an optimal operation on a list. I replaced this with a HashSet and this gives improved performance. I was wondering if I missed a reason for using a list instead of a
HashSet there? If not should I open a issue?

Dale Visser looked at the code you're talking about (EvaluationStrategyImpl.java in the sesame-queryalgebra project), and I agree, the two private fields reportedValues and unreportedValues are declared as List<ValuePair> and implemented with ArrayList<ValuePair>. However, the only operations used on them are add(ValuePair) and contains(ValuePair). They should be declared Container<ValuePair>, and given an implementation type that is very efficient for add() and contains(), such HashSet<ValuePair>, as you used.

Environment

None

Status

Assignee

JeenB

Reporter

Jerven Bolleman

Labels

Components

Fix versions

Affects versions

2.6.10

Priority

Major