Per mailing list discusion
Jerven Bolleman was looking into a profiler trace of my code and noticed that a lot of time is spent in the PathIteration code. Which is expected when looking at a path query
Looking a bit deeper I saw that this code uses two lists for caching reported and unreported values. I noticed that one of the most common operation on this list is contains
This is not an optimal operation on a list. I replaced this with a HashSet and this gives improved performance. I was wondering if I missed a reason for using a list instead of a
HashSet there? If not should I open a issue?
Dale Visser looked at the code you're talking about (EvaluationStrategyImpl.java in the sesame-queryalgebra project), and I agree, the two private fields reportedValues and unreportedValues are declared as List<ValuePair> and implemented with ArrayList<ValuePair>. However, the only operations used on them are add(ValuePair) and contains(ValuePair). They should be declared Container<ValuePair>, and given an implementation type that is very efficient for add() and contains(), such HashSet<ValuePair>, as you used.