The LiteralImpl class in Sesame's model currently normalizes all language tags to lower case before passing them on to the rest of Sesame. Strictly speaking, this conforms to the spec, as the RDF Concepts and Abstract Syntax recommendation states that language tags should be normalized to lower case.
However, the issue reporter needs to be able to store language tags 'as-is', that is, without Sesame normalizing them.
5b. I posted the following to public-rdf-comments (http://lists.w3.org/Archives/Public/public-rdf-comments/2014Jan/0011.html):
"Lexical representations of language tags may be converted to lower case. The value space of language tags is always in lower case."
"Lexical representations of language tags MAY be normalized according to BCP47 section 2.1.1, 'Formatting of Language Tags' (country codes in upper case, script codes capitalized, the rest in lower case).
Language tags MAY also be normalized by converting all to lower case, but BCP47 normalization is preferred.
No matter which method is chosen, the semantics of language tags MUST NOT depend on case.
In particular, implementations MUST NOT store as separate statements, two statements that differ only by the case of language tags."
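To make the BCP47 option above concrete, here is a minimal sketch of the section 2.1.1 case conventions: everything lower case, except two-letter subtags (upper case) and four-letter subtags (title case) that are neither the first subtag nor follow a singleton such as "x". The class name Bcp47Case is hypothetical and not part of Sesame. Because semantics must not depend on case, this only changes presentation, never identity.

```java
import java.util.Locale;

/** Hypothetical helper (not Sesame API): BCP47 section 2.1.1 case formatting. */
final class Bcp47Case {
    private Bcp47Case() {}

    static String format(String tag) {
        String[] subtags = tag.split("-");
        StringBuilder out = new StringBuilder(tag.length());
        boolean afterSingleton = false;
        for (int i = 0; i < subtags.length; i++) {
            String s = subtags[i].toLowerCase(Locale.ROOT);
            // Only subtags that are not first and not after a singleton get special case.
            if (i > 0 && !afterSingleton) {
                if (s.length() == 2) {
                    s = s.toUpperCase(Locale.ROOT);                           // region, e.g. "CA"
                } else if (s.length() == 4) {
                    s = Character.toUpperCase(s.charAt(0)) + s.substring(1); // script, e.g. "Latn"
                }
            }
            if (s.length() == 1) {
                afterSingleton = true; // extension / private-use subtags stay lower case
            }
            if (i > 0) {
                out.append('-');
            }
            out.append(s);
        }
        return out.toString();
    }
}
```

For example, "EN-ca-X-CA" becomes "en-CA-x-ca" and "AZ-LATN-X-LATN" becomes "az-Latn-x-latn", matching the examples given in BCP47 itself.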
I noticed your comment on public-rdf-comments. It may be best to wait until the RDF-WG makes a formal response before continuing. Given the timelines, particularly the recent January 9 update, it is possible that the comment will be postponed to a later release.
However, I am definitely interested in making the different behaviours easily configurable.
One issue is that the ValueFactory.createLiteral(String,String) method doesn't currently have any context in which to put configuration parameters. This means that, to stay internally consistent with RDF 1.1 Concepts and Abstract Syntax with respect to language tag and triple comparisons, and to do so in a performant manner, a single normalisation algorithm must be applied in or below that method. If it were possible to configure ValueFactory in a similar way to RDFParser, the user could specify their desired algorithm at that point.
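A minimal sketch of what a configurable factory could look like, assuming the normalisation is chosen once when the factory is constructed and applied inside createLiteral. ConfigurableLiteralFactory and LangLiteral are hypothetical stand-ins, not Sesame's ValueFactory or Literal.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.function.UnaryOperator;

/** Hypothetical literal type for this sketch only (not Sesame's Literal). */
final class LangLiteral {
    final String label;
    final String language;

    LangLiteral(String label, String language) {
        this.label = label;
        this.language = language;
    }

    // Compares the tag exactly as stored, so consistency depends entirely
    // on the factory having applied one normalisation up front.
    @Override public boolean equals(Object o) {
        if (!(o instanceof LangLiteral)) return false;
        LangLiteral other = (LangLiteral) o;
        return label.equals(other.label) && language.equals(other.language);
    }

    @Override public int hashCode() {
        return Objects.hash(label, language);
    }
}

/** Hypothetical factory: every literal passes through the same normaliser. */
final class ConfigurableLiteralFactory {
    static final UnaryOperator<String> LOWER_CASE = t -> t.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> PRESERVE = UnaryOperator.identity();

    private final UnaryOperator<String> normaliser;

    ConfigurableLiteralFactory(UnaryOperator<String> normaliser) {
        this.normaliser = normaliser;
    }

    LangLiteral createLiteral(String label, String language) {
        return new LangLiteral(label, normaliser.apply(language));
    }
}
```

With LOWER_CASE, "chat"@FR and "chat"@fr are equal; with PRESERVE they are not, which is exactly the case where two statements differing only in tag case would be stored separately.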
Another alternative is to make it a configuration setting for RDFWriter, and produce the desired upper/lower/BCP47/etc. case by modifying the language tags on the fly inside RDFWriter.handleStatement. It could use an existing setting (i.e., RDFWriter.getWriterConfig().set(BasicParserSettings.LANGUAGE_HANDLERS, ...)), so there would be no need for any new configuration settings. If the RDF-WG do respond and keep the "value space is lowercase" condition, then, to be interoperable, all Sesame Sail/Repository implementations could still emit Statement objects with lowercase language tags, which the RDFWriter would then modify to the user's specification.
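A rough sketch of the writer-side approach, assuming statements arrive with lowercase tags and the case is changed only at serialisation time. TagRewritingOutput and its handleStatement method are hypothetical simplifications, not Sesame's RDFWriter API, and the output skips N-Triples escaping. The BCP47 option here leans on the JDK's Locale.forLanguageTag(...).toLanguageTag(), which yields BCP47-style case (e.g. "en-ca" to "en-CA") but can also canonicalise or reject some tags, so it only approximates pure case formatting.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical sketch (not Sesame's RDFWriter): rewrite tags only on output. */
final class TagRewritingOutput {
    // BCP47-style case via the JDK parser; may alter unusual or invalid tags.
    static final UnaryOperator<String> BCP47_CASE =
            tag -> Locale.forLanguageTag(tag).toLanguageTag();
    static final UnaryOperator<String> LOWER_CASE =
            tag -> tag.toLowerCase(Locale.ROOT);

    private final UnaryOperator<String> rewrite;
    private final StringBuilder out = new StringBuilder();

    TagRewritingOutput(UnaryOperator<String> rewrite) {
        this.rewrite = rewrite;
    }

    /** Stand-in for handleStatement: appends an N-Triples-like line (no escaping). */
    void handleStatement(String subjectIri, String predicateIri, String label, String storedTag) {
        out.append('<').append(subjectIri).append("> <").append(predicateIri).append("> \"")
           .append(label).append("\"@").append(rewrite.apply(storedTag)).append(" .\n");
    }

    String result() {
        return out.toString();
    }
}
```

The stored data stays in the lowercase value space; only the serialised text reflects the user's preferred case.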
Agreed, we must wait for the RDF-WG response.
In this case, variety is a bad thing, so let's not add options needlessly. There are "only" two alternatives: lower case and BCP47 case.
Based on the RDF 1.1 abstract syntax specifying that the value space of language tags is lowercase strings, I am working on the first patch for version 4, which will allow comparisons using that value space even if the parser or repository uses a different case.
Assigning this ticket to you, Peter, since you are active on this issue.
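A minimal sketch of comparing in the lowercase value space while keeping the supplied case for display. ValueSpaceLiteral is a hypothetical class, not the actual patch; the key point is that equals and hashCode must both use the lowercased tag, otherwise hash-based sets and indexes would keep "en-GB" and "en-gb" as separate entries.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical sketch: equality in the value space, lexical case kept as given. */
final class ValueSpaceLiteral {
    final String label;
    final String lexicalTag; // case as supplied by the parser or repository

    ValueSpaceLiteral(String label, String lexicalTag) {
        this.label = label;
        this.lexicalTag = lexicalTag;
    }

    /** The value-space form of the tag: always lower case. */
    String valueSpaceTag() {
        return lexicalTag.toLowerCase(Locale.ROOT);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ValueSpaceLiteral)) return false;
        ValueSpaceLiteral other = (ValueSpaceLiteral) o;
        return label.equals(other.label) && valueSpaceTag().equals(other.valueSpaceTag());
    }

    // Must agree with equals: hash the lowercased tag, not the lexical one.
    @Override public int hashCode() {
        return Objects.hash(label, valueSpaceTag());
    }
}
```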