12 June 2016

Emoji SPARQL😝!

If emojis have Unicode code points, then we can...

I knew that emojis have Unicode code points, but it wasn't until I saw this goofy picture in a chat room at work that I began to wonder about using emojis in RDF data and SPARQL queries. I have since learned that the relevant specs are fine with it, but as with the simple display of emojis on non-mobile devices, the tools you use to work with these characters (and the tools used to build those tools) aren't always as cooperative as you'd hope.

After hunting around a bit among these tools, I did have some fun with this. Black and white emojis, as shown in the Browser column of the unicode.org Emoji Data page, display with no problem in my Ubuntu terminal window and in web page forms, but I wanted the full-color emojis from that page's Sample column. The Emacs Emojify mode did the trick, so what you see below are screen shots from there.

sample RDF with emoji

I started by converting that same unicode.org web page (as opposed to the site's much larger Full Emoji Data page) to a Turtle file called emoji-list.ttl with a short Perl script. (You can find both in the GitHub repository emojirdf.) On the right, you can see triples from that web page's row about the french fries emoji. For the keywords assigned to each character, the Emoji Data web page has links, so it was tempting to use the link destinations as URIs for the lse:annotation values instead of strings, but some of those link destinations have local names like +1, which won't make for nice URIs in RDF triples.
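Pulling together the property names that appear in the queries below, each row of the table became a clump of triples shaped roughly like this (the subject's local name and the exact annotation strings are my own illustrative guesses):

@prefix lse:  <http://learningsparq.com/emoji/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# hypothetical reconstruction of the french fries row
lse:e1F35F lse:char "🍟" ;
           rdfs:label "FRENCH FRIES" ;
           lse:annotation "fast food", "french", "fries" .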

I thought about augmenting my emoji-list.ttl file to turn it into an emoji ontology. I first dutifully searched for "emoji rdf" on Google (which asked me "did you mean emoji pdf? emoji def?") to avoid the reinvention of any wheels. The most promising search result was an Emoji Ontology that adds some interesting metadata to the emojis, but its Final emoji ontology in OWL/XML format has little to do with OWL or even RDF, and I didn't feel like writing the XSLT to convert its additional metadata to proper RDF.

With no proper emoji ontology already available, I thought more about creating my own by adding triples that would arrange the emojis into a hierarchical ontology or taxonomy. This would let me say that the ant 🐜 and the honeybee 🐝 are both insects, and that the ox 🐂 and the many, many cats are mammals; then I could query for animals and see them all, or query for insects and see just the first two. This would add little, though, because the existing annotation values already serve as a non-hierarchical tagging system that identifies insects, so I could just query for those lse:annotation values.

Some of these annotation values led to some fun queries of the emoji-list.ttl file. I used roqet, from Dave Beckett's Redland libraries, as a query processor, telling it to give me CSV data that I redirected to a file. Here's a query asking for the character and label of any emojis that have both "face" and "cold" in their annotation values:

PREFIX lse:  <http://learningsparq.com/emoji/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?char ?label
WHERE 
{ 
  ?s lse:annotation 'face', 'cold' ;
     rdfs:label ?label ;
     lse:char ?char .
}
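With the query saved in a file (coldface.rq here, a made-up name), the roqet invocation looked something like this; the csv result format needs a reasonably recent Rasqal build:

roqet -q -r csv -D emoji-list.ttl coldface.rq > results.csv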

It returned this result, showing that "cold" can refer to both low temperature and wintertime sniffles:

result of first SPARQL emoji query

This next query uses emojis in string data to ask which annotations have tagged both the alien head and one of the moon face emojis:

SPARQL query
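The query survives here only as a screenshot, but based on the description it was shaped something like this (the variable names and the choice of 🌝 as the moon face are my assumptions):

PREFIX lse: <http://learningsparq.com/emoji/>

SELECT ?annotation
WHERE
{
  ?alien lse:char "👽" ;
         lse:annotation ?annotation .
  ?moon  lse:char "🌝" ;
         lse:annotation ?annotation .
}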

(Apparently, Emacs SPARQL mode thinks that the "not" in "annotation" is the SPARQL keyword, because it resets the substring's font color.) Here is the query result; note that, as is typical with many query tools, the first row is the variable name, not a returned value:

annotation
face
nature
space

Emoji Unicode code points (for example, the emoticons block runs from x1F600 to x1F64F) fall within the x10000 through xEFFFF range that SPARQL spec productions 164 - 166 say is legal for use in variable names. The following query requests the satellite dish character's annotation values and stores them in a variable whose three-character name is three emojis:

SPARQL emoji query
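Again a sketch, because the query is only a screenshot (📡 is the satellite dish; the three emojis I picked for the variable name are stand-ins):

PREFIX lse: <http://learningsparq.com/emoji/>

SELECT ?🍔🍟🍕
WHERE
{
  ?s lse:char "📡" ;
     lse:annotation ?🍔🍟🍕 .
}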

Here is our result:

SPARQL query result

This is actually why I used roqet—the Java-based SPARQL engines that I first tried may have implemented the spec faithfully, but some layer of the Java tooling underneath them couldn't handle the full extent of Unicode in every place where it should.

Emojis in RDF data are not limited to quoted strings. When I told roqet to run a query against this next Turtle file, which uses emoji characters as prefixes and as subject and predicate local names in its one triple, it had no problem:

Turtle file with emoji properties
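The file's one triple looked something like this (the namespace URI and the particular emojis here are hypothetical stand-ins):

@prefix 🍕: <http://example.com/emoji#> .

🍕:🍩 🍕:🍴 "tasty" .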

This final query went even further, and roqet had no problem with it: it defines a bowl of spaghetti emoji as a namespace prefix and then, using emojis for the variable names, asks for the subjects and objects of any triples that have the predicate from the one triple in the Turtle file above.

SPARQL query with emoji prefix and variable names
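A reconstruction of that query, reusing the hypothetical namespace and emojis from the Turtle sketch above:

PREFIX 🍝: <http://example.com/emoji#>

SELECT ?😀 ?😎
WHERE
{
  ?😀 🍝:🍴 ?😎 .
}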

Of course, it's difficult to read, and the fact that running the query and even just displaying it required me to dig around for the right combination of tools doesn't speak well for the use of emojis in queries. Besides being a fun exercise, though, the experience and the result—that it all ultimately worked—provided a nice testament to the design of the Unicode, RDF, and SPARQL standards.

Please add any comments to this Google+ post.

17 May 2016

Trying out Blazegraph

Especially inferencing.

I've been hearing more about the Blazegraph triplestore (well, "graph database with RDF support"), especially its support for running on GPUs, and because they also advertise some degree of RDFS and OWL support, I wanted to see how quickly I could try that after downloading the community edition. It was pretty quick.

Downloading from the main download page with my Ubuntu machine got me an RPM file, but I found it simpler to download the jar file version that I could start as a server from the command line as described on the Nano SPARQL Server page. I found the jar file (and several other download options) on the SourceForge page for release 2.1.
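Their quick start documentation shows the one-line startup command (tune the -Xmx heap setting for your machine):

java -server -Xmx4g -jar blazegraph.jar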

The jar file's startup message tells you the URL for the web-based interface to the Nano SPARQL Server (by default, http://localhost:9999/blazegraph/), shown here:

At this point, uploading some RDF on the UPDATE tab and issuing SPARQL queries on the QUERY tab was easy. I was more interested in sending it SPARQL queries that could take advantage of RDFS and OWL inferencing, so after a little help from Blazegraph Chief Scientist Bryan Thompson via their mailing list (with a quick answer on a Saturday), I learned how: I had to first create a namespace on the NAMESPACES tab with the Inference checkbox checked. The same form also offers checkboxes for Isolatable indexes, Full text index, and Enable geospatial when configuring a new namespace. I found this typical of how Blazegraph lets you configure it to take advantage of more powerful features while leaving the out-of-box configuration simple and easy to use.

For finer-grained namespace configuration, after you select checkboxes and click the Create namespace button, a dialog box lets you edit the configuration details, with each of these lines explained in the Blazegraph documentation:

I wanted to check Blazegraph's support for owl:TransitiveProperty (such a basic, useful OWL class) as well as its ability to do subclass inferencing. I created some data about chairs, desks, rooms, and buildings, specifying which chairs and desks were in which rooms and which rooms were in which buildings, and also made dm:locatedIn a transitive property:

@prefix d: <http://learningsparql.com/ns/data#> .
@prefix dm: <http://learningsparql.com/ns/demo#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

dm:Room rdfs:subClassOf owl:Thing .
dm:Building rdfs:subClassOf owl:Thing .
dm:Furniture rdfs:subClassOf owl:Thing .
dm:Chair rdfs:subClassOf dm:Furniture .
dm:Desk rdfs:subClassOf dm:Furniture .

dm:locatedIn a owl:TransitiveProperty. 

d:building100 rdf:type dm:Building .
d:building200 rdf:type dm:Building .
d:room101 rdf:type dm:Room ; dm:locatedIn d:building100 . 
d:room102 rdf:type dm:Room ; dm:locatedIn d:building100 . 
d:room201 rdf:type dm:Room ; dm:locatedIn d:building200 . 
d:room202 rdf:type dm:Room ; dm:locatedIn d:building200 . 

d:chair15 rdf:type dm:Chair ; dm:locatedIn d:room101 . 
d:chair23 rdf:type dm:Chair ; dm:locatedIn d:room101 . 
d:chair35 rdf:type dm:Chair ; dm:locatedIn d:room202 . 
d:desk22 rdf:type dm:Desk ; dm:locatedIn d:room101 . 
d:desk59 rdf:type dm:Desk ; dm:locatedIn d:room202 . 

The following query asks for furniture in building 100. No triples above will match either of the query's two triple patterns, so a SPARQL engine that can't do inferencing won't return anything. I wanted the query engine to infer that if chair 15 is a Chair, and Chair is a subclass of Furniture, then chair 15 is Furniture; also, if that furniture is in room 101 and room 101 is in building 100, then that furniture is in building 100.

PREFIX dm: <http://learningsparql.com/ns/demo#> 
PREFIX d: <http://learningsparql.com/ns/data#> 
SELECT ?furniture
WHERE 
{ 
  ?furniture a dm:Furniture .
  ?furniture dm:locatedIn d:building100 . 
}

We need the first triple pattern because the data above includes triples saying that rooms 101 and 102 are located in building 100, so those would have bound to ?furniture in the second triple pattern if the first triple pattern wasn't there. This is a nice example of why declaring resources as instances of specific classes, while not necessary in RDF, does a favor to anyone who will query that data—it makes it easier for them to specify more detail about exactly what data they want.

When using this query and data in a namespace (in the Blazegraph sense of the term) configured to do inferencing, Blazegraph executed the query against the original triples plus the inferred triples and listed the furniture in building 100: d:chair15, d:chair23, and d:desk22.

Several years ago I backed off from discussions of the "semantic web" as a buzzphrase tying together technology around RDF-related standards because I felt that the phrase was not aging well and that the technology could be sold on its own without the buzzphrase, but the example above really does show semantics at work. Saying that dm:locatedIn is a transitive property stores some semantics about that property, and these extra semantics let me get more out of the data set: they let me query for which furniture is in which building, even though the data has no explicit facts about furniture being in buildings. (Saying that Desk and Chair are subclasses of Furniture also stores semantics about all three terms, but that won't be as interesting to a typical developer with object-oriented experience.)

Blazegraph calls their subset of OWL RDFS+, which was inspired by Jim Hendler and Dean Allemang's RDFS+, a superset of RDFS that adds in OWL's most useful bits. (It's similar but not identical to AllegroGraph's RDFS++ profile, which has the same goal.) Blazegraph's Product description page describes which parts of OWL it supports, and their Inference And Truth Maintenance page describes more.

A few other interesting things about Blazegraph as a triplestore and query engine:

  • The REST interface offers access to a wide range of features.

  • Queries can include Query Hints to optimize how the SPARQL engine executes them, which will be handy if you plan on scaling way up; see the sketch after this list.

  • I saw no direct references to GeoSPARQL in the Blazegraph documentation, but they recently announced support for geospatial SPARQL queries. (I've been learning a lot about working with geospatial data at Hadoop scale with GeoMesa.)
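For example, the hint:Query hint:optimizer pattern from their documentation turns off the join-order optimizer; dropped into the furniture query from above, it would look something like this:

PREFIX hint: <http://www.bigdata.com/queryHints#>
PREFIX dm: <http://learningsparql.com/ns/demo#>
PREFIX d: <http://learningsparql.com/ns/data#>
SELECT ?furniture
WHERE
{
  hint:Query hint:optimizer "None" .
  ?furniture a dm:Furniture .
  ?furniture dm:locatedIn d:building100 .
}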

Blazegraph's main selling points seem to be speed and scalability (for example, see its Scaleout Cluster mode), and I didn't play with those at all, but I liked seeing that SPARQL querying with inferencing support can take advantage of such new hotness technology as GPUs. It will be interesting to see where Blazegraph takes it.


Please add any comments to this Google+ post.

23 April 2016

Playing with a proximity beacon

Nine-dollar devices send URLs to your phone over Bluetooth.

I've been hearing about proximity beacons lately and thought it would be fun to try one of these inexpensive devices that broadcast a URL for a range of just a few meters via Bluetooth Low Energy (a.k.a. BLE, which I assume is pronounced "bleh"). Advocates often cite the use case of how a beacon device located near a work of art in a museum might broadcast a URL pointing to a web page about it—for example, one near Robert Rauschenberg's Bed in New York's Museum of Modern Art could broadcast the URL http://moma.org/collection/works/78712, their web site's page with information about the work. When the appropriate app on your phone (or perhaps your phone's operating system) saw this, it would alert you to the availability of this localized information.

beacon in phone charger

You can find these beacons for as little as $14, and even cheaper on eBay, where colorful bracelet versions can cost less than $10. Most need batteries, typically the kind you put in a watch, so to avoid this I got a RadBeacon USB from Radius Networks that draws its power from any USB port where you plug it in. At the right you can see mine plugged into a conference swag phone recharger.

I also chose this one because it supports Google's Eddystone open beacon format, Apple's iBeacon format, and Radius Networks' AltBeacon. I haven't dug into the pros and cons of these different formats yet; I just wanted something that was likely to work out of the box with both my Samsung S6 Android phone and my wife's iPhone. The RadBeacon USB did fine.

You configure it with a phone app built for that particular beacon product line. The Android RadBeacon app generally worked, although I often had to press "Apply" several times and restart Bluetooth before new settings would actually take hold. Its documentation shows the kinds of properties it lets you set, such as the URL to broadcast and the Transmit Power (which affects the battery life and the distance that the URL is broadcast—in a museum, you want people receiving the URL of the painting in front of them, not the one twenty feet to the left of it).

I set mine to the URL of a sample web page that I created for this purpose. While waiting for my RadBeacon to arrive in the mail, I learned a lot from the mobiForge article Eddystone beacon technology and the Physical Web, which Dan Brickley had tweeted, about which components of my web page would be picked up by an app that received the broadcast URL.

After I configured the beacon, the open source Physical Web app found it and displayed the following on my Samsung S6:

screenshot of physical web app

Tapping the blue title took the phone to the web page. This all worked the same, with the same app, on my wife's iPhone.

I don't want to have to bring such an app to the foreground every time I want to check for nearby beacons, so I was glad to see that the app also added something to my phone's notifications list:

screenshot of Android notifications

Touching the notification sent the phone to the referenced web page.

Both notifications above show what the app pulled from my sample web page: the content of the head element's title element and the value of the content attribute from the meta element that had a name attribute value of "description". They also displayed the hastily-drawn favicon image I created for the web page.
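In markup terms, the parts of my sample page that the app picked up looked something like this (the text and file name are placeholders, not my page's actual content):

<head>
  <title>My Physical Web test page</title>
  <meta name="description" content="A sample page served for my RadBeacon USB experiments">
  <link rel="icon" href="favicon.ico">
</head>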

A beacon won't broadcast just any URI that you want, because the allowable length is somewhat limited. (The exact limit can vary by beacon format and product.) The article mentioned above describes the role of URL shorteners in the architecture. Still, the idea of such inexpensive hardware using URIs to identify things brings a nice semantic web touch to an Internet of Things architecture.

One experiment I tried was the use of Audio Tag Tool to add every metadata field available to an MP3. I then configured my beacon to broadcast that MP3's URL, but none of the metadata showed up on my phone's display. I thought that the idea of location-specific audio might be interesting. (You could also implement location-specific audio with much older technology—for example, Victrolas—but the ability to control the audio from a central server could lead to interesting possibilities.)

The museum use case for beacons is nice and cultured, but I wonder about the attraction of a technology whose real main use case for now is to pump ads at people. (When was the last time you scanned a QR code with your phone?) I say "for now" because I remain hopeful that creative people will come up with more interesting things to do with these, especially if they dig into the Eddystone, iBeacon, and AltBeacon APIs. For example, you could add features to your own apps to check for or even act as beacons, communicating with other beacons and apps around your phone whether these devices had Internet connections or not. The Opera browser's use of schema.org metadata stored in web pages referenced by beacons is also promising, and I know that Dan is putting more thought into what role schema.org can play.

The idea of the broadcast URL showing up as a notification on your phone that you can follow or ignore is much simpler than starting up a special app on your phone and then pointing the phone at one corner of a poster, which the QR enthusiasts thought we'd be happier to do. The short article 5 Common Misconceptions About Beacons and Proximity Marketing gives a good perspective on where beacons can fit into the communications ecosystem in general and the world of marketing in particular. The article is from one of several companies building a business model around advertising via beacons, but like I said above, I hope that the APIs inspire other uses for them as well.


Please add any comments to this Google+ post.