Converting CSV to RDF

The simplest way yet.

There are probably dozens of ways to convert comma-separated values to parsable RDF, but I recently came up with one that was so simple that I wanted to share it.

Here is a sample CSV list:

"red" , "blue", "gray"

If I put the following before it and a period after it,

@prefix : <http://rdfdata.org/csv#> . :csvList :item 

I get this: parsable RDF using the Turtle syntax.

@prefix : <http://rdfdata.org/csv#> . :csvList :item "red" , "blue", "gray" . 

That's it. It works as a single line like that, but it's easier for human eyes to read if you look at it as an abbreviated version of the following:

@prefix : <http://rdfdata.org/csv#> .  
:csvList :item "red" .
:csvList :item "blue" .
:csvList :item "gray" .

Or, "the csvList resource has 'red', 'blue', and 'gray' as item property values".

I just made up the URI, subject, and predicate. Your next step would probably be to use SPARQL to convert them to something more appropriate to your application.

I've used the semicolon in Turtle and SPARQL many times to avoid repeating a triple's subject for multiple triples. I've used the comma, which delimits a list of objects that share the same subject and predicate, less often, and it's the key to the trick here: a comma-separated list of quoted values is already valid Turtle syntax for a list of objects.

Converting CSV data to RDF in just about any programming language would be a very short script, and it's easy enough with products such as TopBraid Composer, so I'm not interested in accumulating a list of other ways to do it here, unless you can beat mine for simplicity. I just thought it was neat that something as simple as prepending a short string and appending a period would turn a CSV list into legal, parsable RDF.
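As a minimal sketch of how short such a script can be, here is the prepend-and-append trick in Python. The function name csv_line_to_turtle is made up for illustration, and the sketch assumes every value in the line is already double-quoted and contains no embedded quotes.

```python
# The made-up prefix, subject, and predicate from the example above.
PREFIX = "@prefix : <http://rdfdata.org/csv#> . :csvList :item "


def csv_line_to_turtle(csv_line: str) -> str:
    """Wrap one line of quoted CSV values so it reads as a Turtle object list.

    Assumes each value is already double-quoted and has no embedded quotes.
    """
    # Prepend the prefix declaration, subject, and predicate; append a period.
    return f"{PREFIX}{csv_line.strip()} ."


print(csv_line_to_turtle('"red" , "blue", "gray"'))
```

The CSV text itself is never parsed or changed; the script only adds text around it, which is the whole point of the trick.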

7 Comments

Cute! Sadly, CSV escapes internal quotes by doubling them, and Turtle requires them to be escaped as \", so this trick will only work if you have no quotes in your data.

Of course, it also totally fails to capture any of the semantics of table rows/columns/cells, so it's not like you were going to use it for real!
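The quoting mismatch described in this comment can be worked around by parsing the CSV and re-quoting each value, at the cost of no longer being a pure prepend-and-append trick. The following Python sketch assumes standard CSV quoting (doubled quotes inside quoted fields); the helper names turtle_literal and csv_row_to_object_list are hypothetical.

```python
import csv
import io


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes, and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_row_to_object_list(csv_line: str) -> str:
    """Parse one CSV row and return its fields as a Turtle object list."""
    # skipinitialspace lets the reader accept a space after each comma.
    row = next(csv.reader(io.StringIO(csv_line), skipinitialspace=True))
    return " , ".join(turtle_literal(field) for field in row)


line = '"red", "say ""hi""", "gray"'
print(":csvList :item " + csv_row_to_object_list(line) + " .")
```

Python's csv reader keeps whitespace that sits between a closing quote and the next comma as part of the field, so input spaced that way may need trimming first.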


Glenn,

You're assuming that spreadsheets are the only source of CSV. I've already used this trick for real, when I was passing a few values from a JavaScript script to a SPARQL query that was acting on RDF data combined from several sources.


Good point. Replace the word "real" in my comment with "whole spreadsheets"!


Bob, that's neither a list in CSV nor RDF/Turtle.

It's a row in CSV (being picky, but I'll explain why in a second), but more importantly you've created a set in RDF, not a list.

The Turtle list syntax would be:

:csvRow rdf:value ("red" "blue" "grey")

Why does that matter? Because, as much as I agree with your "next step would probably be to use SPARQL to convert them to something more appropriate to your application" (it's what we do in the JSON2RDF approach of Linked Open Services), you've lost the structure and can't differentiate between columns in a graph pattern.

Jumping back to the comment about CSV rows, this is clearer if (instead of having homogeneous data across columns in your source), you had something like:

"red", "FF0000"
"green", "00FF00"
"blue", "0000FF"
"yellow", "FFFF00"

You could project this into a list of lists:

(("red" "FF0000")
("green" "00FF00")
("blue" "0000FF")
("yellow" "FFFF00"))

(A valid Turtle doc being:

Then you could actually make a CONSTRUCT query like:

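PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT
{?item rdfs:label ?colour; rdf:value ?code}
WHERE
{[rdf:first ?item] .
?item rdf:first ?colour; rdf:rest [rdf:first ?code]}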

Leading to:
[rdfs:label "red"; rdf:value "FF0000"] .
[rdfs:label "green"; rdf:value "00FF00"] .
[rdfs:label "blue"; rdf:value "0000FF"] .
[rdfs:label "yellow"; rdf:value "FFFF00"]


Ideally, of course, rather than leaving these as blank nodes you would reuse or mint a URI scheme for them, but doing that within the query requires two features that are new in SPARQL 1.1.
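For illustration, the list-of-lists projection described in this comment could be generated with a short Python script along these lines. This is only a sketch: the :csvTable subject and the function names are assumptions made up here, and rdf:value is borrowed from the single-row example earlier in the comment.

```python
import csv
import io

# Prefix declarations; the default prefix is the made-up one from the post.
HEADER = (
    "@prefix : <http://rdfdata.org/csv#> .\n"
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
)


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rows_to_turtle_collection(csv_text: str) -> str:
    """Turn CSV rows into a Turtle list whose members are lists of cell values."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    inner = [
        "(" + " ".join(turtle_literal(field) for field in row) + ")"
        for row in rows
        if row  # skip blank lines
    ]
    return "(" + "\n ".join(inner) + ")"


def csv_to_turtle(csv_text: str) -> str:
    """Wrap the nested list in a document; :csvTable is a hypothetical subject."""
    return HEADER + ":csvTable rdf:value\n " + rows_to_turtle_collection(csv_text) + " .\n"


print(csv_to_turtle('"red", "FF0000"\n"green", "00FF00"\n'))
```

Because each row becomes an ordered list, the column position of every value survives the conversion, which is the structure the comma-based trick discards.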


Thanks Barry, that's interesting.

What I did worked for my needs--it wasn't just a demo, but something in an actual application I was developing for a client--but I appreciate the clarification of terminology.


No problems. Actually I only realised this was so long ago after I posted. I think you were in the thread about tools to achieve this, including Google Refine?


No, I wasn't really looking for extra tools. I just had to hand off a bit of data from some JavaScript to TopBraid Composer and was looking for the simplest way to represent it as parsable triples, and I thought it was neat how simple it turned out to be. Neat enough to blog it...