UTF-8 BOM mark causes Rio parsers to fail

Description

Attached a minimized version of a TriG file that should be legal, as far as I can tell. Certainly, it was originally produced by a Sesame export.

Parsing this file with Sesame 2.7.0 results in the following error:

FATAL ERROR: Expected ':', found '@' [line 1]

On a debug run through the TriGParser, I noticed that in TriGParser.parseStatement() (lines 88-122), when processing this file, the variable holding the character read from the current line (`c`) gets the value 65279 before it proceeds to the first real char in the file, namely the @ char of the first prefix declaration. This causes the check on line 107 (directive.startsWith('@')) to fail, which eventually leads to the parse error.

Char 65279 (U+FEFF) is the Unicode byte order mark (encoded as the bytes EF BB BF in UTF-8), if I'm not mistaken, but I am at a complete loss to understand why it shows up here. Surely the InputStreamReader should take care of this?
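
For what it's worth, a plain java.io.InputStreamReader decoding UTF-8 passes a leading BOM through as the char U+FEFF rather than dropping it, which would explain the 65279. Below is a minimal sketch of one possible workaround, not the actual Rio code: the class BomSkipSketch and its methods skipBom and firstChar are hypothetical names made up for this illustration. It wraps the reader in a PushbackReader and discards a leading U+FEFF before the parser sees any input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Sesame/Rio.
class BomSkipSketch {

    // U+FEFF, i.e. the value 65279 observed in the debugger.
    static final int BOM = 0xFEFF;

    // Returns a reader positioned after a leading BOM, if one is present.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader reader = new PushbackReader(in, 1);
        int first = reader.read();
        // Push back anything that is not a BOM (and not end of stream).
        if (first != -1 && first != BOM) {
            reader.unread(first);
        }
        return reader;
    }

    // Decodes the bytes as UTF-8 and returns the first char the parser would see.
    static int firstChar(byte[] utf8Bytes, boolean skipBom) {
        try {
            Reader reader = new InputStreamReader(
                    new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
            if (skipBom) {
                reader = skipBom(reader);
            }
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 encoding of the BOM, followed by "@p".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '@', 'p'};
        System.out.println(firstChar(withBom, false));       // BOM is passed through
        System.out.println((char) firstChar(withBom, true)); // BOM is skipped
    }
}
```

Assuming the parser accepts an arbitrary Reader, wrapping the input with skipBom before parsing should let the first directive start with '@' again. Input without a BOM is returned unchanged, since the first char is pushed back.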

Environment

None

Status

Assignee

Peter Ansell

Reporter

JeenB

Labels

None

Components

Fix versions

Affects versions

2.7.0

Priority

Minor