Emoji SPARQL😝!

If emojis have Unicode code points, then we can...

I knew that emojis have Unicode code points, but it wasn't until I saw this goofy picture in a chat room at work that I began to wonder about using emojis in RDF data and SPARQL queries. I have since learned that the relevant specs are fine with it, but as with the simple display of emojis on non-mobile devices, the tools you use to work with these characters (and the tools used to build those tools) aren't always as cooperative as you'd hope.

After hunting around a bit among these tools, I did have some fun with this. Black and white emojis, as shown in the Browser column of the unicode.org Emoji Data page, display with no problem in my Ubuntu terminal window and in web page forms, but I wanted the full-color emojis from that page's Sample column. The Emacs Emojify mode did the trick, so what you see below are screen shots from there.

[Image: sample RDF with emoji]

I started by converting that same unicode.org web page (as opposed to the site's much larger Full Emoji Data page) to a Turtle file called emoji-list.ttl with a short Perl script. (You can find both on GitHub at emojirdf.) On the right, you can see triples from that web page's row about the french fries emoji. For the keywords assigned to each character, the Emoji Data web page has links, so it was tempting to use the link destinations as URI values for the lse:annotation values instead of strings, but some of those link destinations have local names like +1, which won't make for nice URIs in RDF triples.
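The Perl script itself is not reproduced here, so the following is only a rough Python sketch of how one table row might be turned into Turtle. The property names lse:char, rdfs:label, and lse:annotation are the ones used in the queries in this post; the function names, the subject naming scheme (lse:U plus the hex code point), and the sample annotation values are assumptions made up for illustration, not copied from emoji-list.ttl.

```python
def turtle_string(value):
    """Quote a Python string as a Turtle string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return '"' + escaped + '"'


def turtle_for_emoji(char, label, annotations):
    """Return Turtle triples for one emoji.

    Prefix declarations for lse: and rdfs: are assumed to appear
    elsewhere in the file.
    """
    # Hypothetical subject naming: U + hex code point, joined with "_"
    # when the emoji is made of more than one code point.
    local_name = "_".join(f"U{ord(c):04X}" for c in char)
    pairs = [("lse:char", turtle_string(char)),
             ("rdfs:label", turtle_string(label))]
    if annotations:
        # Keep annotations as plain strings, as the post does, not URIs.
        tags = ", ".join(turtle_string(a) for a in annotations)
        pairs.append(("lse:annotation", tags))
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"lse:{local_name} {body} ."


# Illustrative values only; the real annotations may differ.
print(turtle_for_emoji("\U0001F35F", "french fries", ["french", "fries"]))
```

Writing the emoji as a \U escape in the Python source sidesteps any editor or terminal that cannot display the character; the generated Turtle still contains the real character.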

I thought about augmenting my emoji-list.ttl file to turn it into an emoji ontology. I first dutifully searched for "emoji rdf" on Google (which asked me "did you mean emoji pdf? emoji def?") to avoid the reinvention of any wheels. The most promising search result was an Emoji Ontology that adds some interesting metadata to the emojis, but its Final emoji ontology in OWL/XML format has little to do with OWL or even RDF, and I didn't feel like writing the XSLT to convert its additional metadata to proper RDF.

With no proper emoji ontology already available, I thought more about creating my own by adding triples that would arrange the emojis into a hierarchical ontology or taxonomy. This would let me say that the ant 🐜 and the honeybee 🐝 are both insects, and that the ox 🐂 and the many, many cats are mammals, and then I could query for animals and see them all or query for insects and see just the first two. This would add little, though, because the existing annotation values already serve as a non-hierarchical tagging system that identifies insects, so I could just query for those lse:annotation values.

Some of these annotation values led to some fun queries of the emoji-list.ttl file. I used Dave Beckett's Redland roqet as a query processor, telling it to give me CSV data that I redirected to a file. Here's a query asking for the character and label of any emojis that have both "face" and "cold" in their annotation values:

PREFIX lse:  <http://learningsparql.com/emoji/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?char ?label
WHERE {
  ?s lse:annotation 'face', 'cold' ;
     rdfs:label ?label ;
     lse:char ?char .
}

It returned this result, showing that "cold" can refer to both low temperature and wintertime sniffles:

[Image: result of first SPARQL emoji query]
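Because roqet was asked for CSV, the redirected file can be read back with a few lines of Python. This is a minimal sketch, assuming the standard SPARQL CSV results layout in which the first row holds the variable names; the function name and the sample text (a made-up variable name and placeholder values, not real query output) are hypothetical.

```python
import csv
import io


def read_sparql_csv(text):
    """Parse SPARQL CSV results into (variable names, list of row dicts)."""
    # DictReader consumes the header row as field names, so the
    # variable-name row never shows up among the returned values.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    variables = reader.fieldnames
    rows = list(reader)
    return variables, rows


# Placeholder data for illustration only.
sample = "annotation\r\nexample-tag-1\r\nexample-tag-2\r\n"
variables, rows = read_sparql_csv(sample)
print(variables)
print(rows)
```

For a real file, something like `open("results.csv", encoding="utf-8", newline="")` would replace the in-memory sample; the explicit UTF-8 encoding matters once the values contain emoji.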

This next query uses emojis in string data to ask which annotations have tagged both the alien head and one of the moon face emojis:

[Image: SPARQL query]

(Apparently, Emacs SPARQL mode thinks that the "not" in "annotation" is the SPARQL keyword, because it resets the substring's font color.) Here is the query result; note that, as is typical with many query tools, the first row is the variable name, not a returned value:


Many emoji code points lie above U+FFFF (the Emoticons block, for example, runs from U+1F600 to U+1F64F, and the satellite dish is U+1F4E1), which puts them in the #x10000 to #xEFFFF range that SPARQL spec productions 164 - 166 say is legal for use in variable names. The following query requests the satellite dish character's annotation values and stores them in a variable whose three-character name is three emojis:

[Image: SPARQL emoji query]

Here is our result:

[Image: SPARQL query result]
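The grammar claim can be checked mechanically. The sketch below transcribes the character ranges from SPARQL 1.1 grammar productions 164 (PN_CHARS_BASE), 165 (PN_CHARS_U), and 166 (VARNAME) into Python and tests a name against them. The function names are my own, and it checks only the part of a variable name after the leading ? or $.

```python
# Code point ranges from SPARQL 1.1 production [164] PN_CHARS_BASE.
PN_CHARS_BASE_RANGES = [
    (0x0041, 0x005A), (0x0061, 0x007A),
    (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x02FF),
    (0x0370, 0x037D), (0x037F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra ranges that production [166] VARNAME allows after the first character.
VARNAME_TAIL_RANGES = [(0x00B7, 0x00B7), (0x0300, 0x036F), (0x203F, 0x2040)]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_varname_start(ch):
    # [165] PN_CHARS_U is PN_CHARS_BASE plus "_"; VARNAME may also
    # start with an ASCII digit.
    return _in_ranges(ch, PN_CHARS_BASE_RANGES) or ch == "_" or "0" <= ch <= "9"


def is_valid_varname(name):
    """True if name (without the leading ? or $) matches VARNAME."""
    if not name or not is_varname_start(name[0]):
        return False
    return all(is_varname_start(c) or _in_ranges(c, VARNAME_TAIL_RANGES)
               for c in name[1:])


satellite_dish = "\U0001F4E1"
print(is_valid_varname(satellite_dish * 3))  # three-emoji variable name
print(is_valid_varname("\u263A"))            # U+263A, an older smiley in the BMP
```

The second check is a reminder that not every emoji-like character qualifies: U+263A falls in a gap between the listed ranges, so it would not be legal in a variable name even though the satellite dish is.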

This is actually why I used roqet—the Java-based SPARQL engines that I first tried may have implemented the spec faithfully, but some layer of the Java tooling underneath them couldn't handle the full extent of Unicode in every place where it should.
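I never tracked down which layer was at fault, so the following is only a guess at the kind of thing that goes wrong. Java strings are sequences of UTF-16 code units, and any code point above U+FFFF takes two of them (a surrogate pair), so code that treats one code unit as one character can mangle emoji. This Python sketch (the function name is hypothetical) shows the arithmetic for the satellite dish:

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units (as ints) that encode a string."""
    data = ch.encode("utf-16-be")
    # Each UTF-16 code unit is two bytes, big-endian here.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


satellite_dish = "\U0001F4E1"
units = utf16_code_units(satellite_dish)

# One code point in Python, but two code units in UTF-16.
print(len(satellite_dish), [hex(u) for u in units])
```

A tool that splits, truncates, or validates text one 16-bit unit at a time would see two unpaired surrogates here instead of a single legal character.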

Emojis in RDF data are not limited to quoted strings. When I told roqet to run a query against this next Turtle file, which uses emoji characters as prefixes and as subject and predicate local names in its one triple, it had no problem:

[Image: Turtle file with emoji properties]

This final query went even further, and roqet had no problem with it: it defines a bowl of spaghetti emoji as a namespace prefix and then, using emojis for the variable names, asks for the subjects and objects of any triples that have the predicate from the one triple in the Turtle file above.

[Image: SPARQL query with an emoji namespace prefix and emoji variable names]

Of course, it's difficult to read, and the fact that running the query and even just displaying it required me to dig around for the right combination of tools doesn't speak well for the use of emojis in queries. Besides being a fun exercise, though, the experience and the result—that it all ultimately worked—provided a nice testament to the design of the Unicode, RDF, and SPARQL standards.
