
Semantic data entry

Instead of motivating users to use new tools, can we build on the tools that they're already motivated to use?

I think that Tim O'Reilly is overly pessimistic about semantic web technology, but in a recent O'Reilly Radar posting, part of the Freebase vs. semantic web technology debate that was bouncing around about two weeks ago, he raised an important and often overlooked issue: what motivates a user to go to the extra trouble of indicating the semantics of a piece of data to a program that may read that data? For example, when you add "On April 2nd, breakfast will be served at 8" to a web page, any literate English speaker can understand it. What motivates you to attach the string "2007-04-02T08:00" and an indication of its type somewhere in there? The belief that you're making a better world isn't good enough; you have to believe that it will help the people interested in eating that breakfast.
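To make the "extra trouble" concrete, here is a minimal Python sketch of what that attached value might look like as one RDF statement in N-Triples form. It assumes the year is 2007, that "8" means 8 a.m., and that XML Schema's xsd:dateTime is the type indication; the event URI, the predicate URI, and the function name are hypothetical, since the post names no vocabulary.

```python
from datetime import datetime

# Hypothetical URIs for illustration; the post names no vocabulary.
EVENT_URI = "http://example.org/events/breakfast-2007-04-02"
START_PREDICATE = "http://example.org/vocab/startTime"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def start_time_triple(event_uri: str, start: datetime) -> str:
    """Return one N-Triples line saying the event starts at `start`.

    xsd:dateTime requires seconds, so the lexical value is
    2007-04-02T08:00:00 rather than the shorter 2007-04-02T08:00.
    """
    value = start.isoformat(timespec="seconds")
    return f'<{event_uri}> <{START_PREDICATE}> "{value}"^^<{XSD_DATETIME}> .'


# "On April 2nd, breakfast will be served at 8" (assuming 2007 and 8 a.m.)
print(start_time_triple(EVENT_URI, datetime(2007, 4, 2, 8, 0)))
```

None of this is hard to generate; the question the post raises is who cares enough to produce it alongside the human-readable sentence.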

Tim's posting, along with a recent conversation I had with Eric Miller about using semweb technology to accumulate knowledge in the workplace, got me wondering why the intranet wiki and SharePoint deployments that I've taken part in didn't work very well. The first-level answer is that the users felt no motivation to add the kind of information that those programs are good at accumulating. Instead of asking the obvious second-level question (how do we motivate these users to use these programs?), I have a different question: how can a knowledge-sharing system build on the users' current practices for storing knowledge?

This brings up today's big question: how do people store knowledge? For example, if Joe HR Guy brings his laptop to a meeting and types up meeting notes during the meeting, what program is he typing with? If Joe tells Jane Project Manager a URL for something that will help her with her current project, how does she remember this URL and what it's for? (This assumes she wasn't told by email, in which case we have one important answer to this question: she remembers that it's in an email from Joe. In fact, if Joe was responsible for taking minutes at his meeting, he may have typed them into his email client so that he could send them to everyone invited to the meeting. I'm more interested in Joe's personal notes about what's worth remembering from the meeting.)

Is anyone aware of research on this issue? If ten or fifteen people leave comments here about how they store such information, it won't help much, because I want to know about a more representative sample of the population. While I'm sure that some of you use some combination of Emacs, nxml, and elisp macros to automate data entry on everyday topics like I do, I know that's not representative. That's why I asked about Joe HR Guy and Jane Project Manager. I want to know what accountants and assistant vice presidents in all kinds of industries use, not what other XML/metadata geeks use.

For the most part, I'm sure they use Bill Gates' tool of choice: MS Word. I know one business process analyst who takes meeting notes using nested bulleted lists in Word, and it works out just fine for him and for everyone who has to read his notes. Many people, when going to a meeting where they have to work out A, B, C, and D for W, X, Y, and Z, record notes about the relationships by opening up a blank spreadsheet, writing "A B C D" across the top row and "W X Y Z" down the first column, and then filling in the spreadsheet as they talk about A's relationship to W and C's relationship to X. (Or perhaps they create a separate worksheet tab for each of four categories, accumulating the information in a third dimension.)
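Part of the appeal of the spreadsheet approach is that the grid is already structured data. As a hedged illustration, the sketch below assumes the sheet was saved as CSV with A through D across the top row and W through Z down the first column, and turns each non-empty cell into a (row label, column label, note) triple that could later be aggregated with notes from other meetings. The function name and the sample cell contents are hypothetical.

```python
import csv
import io


def grid_to_triples(csv_text):
    """Turn a relationship grid (column labels in the first row, row labels
    in the first column) into (row_label, column_label, note) triples,
    skipping empty cells and blank lines."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    column_labels = rows[0][1:]  # the top-left cell is blank
    triples = []
    for row in rows[1:]:
        if not row:
            continue  # blank line in the CSV
        row_label, cells = row[0], row[1:]
        for column_label, note in zip(column_labels, cells):
            if note.strip():
                triples.append((row_label, column_label, note.strip()))
    return triples


# Hypothetical meeting notes, exported from the spreadsheet as CSV.
sample = """,A,B,C,D
W,A owns W,,,
X,,,C reviews X,
Y,,,,
Z,,B funds Z,,D audits Z
"""

for triple in grid_to_triples(sample):
    print(triple)
```

Once each cell is a triple, notes from different people's grids that share row or column labels can be merged without anyone changing how they take notes.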

Then there's the third corner of the MS Office triumvirate: PowerPoint, which few people use to take notes on ongoing activities, but which many use to assemble knowledge for transmission to other people. We can all complain about a presentation that consists of bulleted lists, but ideas like OPML and its esteemed competition wouldn't have gotten any traction if nested lists of items weren't often a more straightforward, structured approach to storing and transmitting knowledge than paragraphs of prose.

What do you see around you? What commonly available applications do less technical people use to store knowledge as they accumulate it, or can we just assume a default of email folders plus MS Office files scattered around their hard disks? Have you ever heard of any broad, systematic study of how people do this and patterns that may have shown up? I'm not sure what's "semantic" about this particular data, except that it's recorded knowledge that would benefit from aggregation with related knowledge to create a whole that's greater than the sum of its parts, but that sounds pretty worthwhile.


(Note: I usually close comments for an entry a few weeks after posting it to avoid comment spam.)

Hi Bob,

You asked if there was any research in this area, and I'm happy to say there is.

Firstly, there is doingPad, a collaboration between MIT and the University of Southampton researching ways to capture and recall "information scraps", including determining the semantics of what the user has typed.

There is also Rich Tags, a University of Southampton project to utilise the "tagging" method of data entry as a way of semantically marking up resources, and hence to build on the tools that people already use, as you say.

Daniel Alexander Smith

Talking about MS tools, you forgot OneNote, the actual note-taking and sharing tool from MS (and popular with the non-CS people around here).

And you might want to search for "Social Semantic Desktop" (the Nepomuk project and others). They address similar questions, although I don't know how much empirical data they have or have published.

You might check out some of the papers at http://pim.ischool.washington.edu/breakouts.htm. While they probably won't answer your question directly, there are a number of interesting papers. The ideas and authors might make for a good starting point if you were serious about doing some research on your own. Hope that helps! - Brian

I generally agree with your contention that we need to go where the people are. In my ISWC keynote some years ago, I suggested that we are wrong in thinking that semantics make it harder for people to author; the secret is to harness them to make authoring easier. I think Freebase is a good example of this direction: when you say someone is a "person", it suggests the properties you should fill in for a person. I think RDFS and OWL open up a lot of potential for this. I'm a tad frustrated because I could never talk any of my grad students into doing this work; maybe it'll be time to revisit it in my new lab. For example, imagine a "home page creator" that would suggest topics people like to put on a home page (hobbies, family, etc.). You could then enter a term (scuba diving) and it would search for an ontology in that area, letting you use it to enter information (or extend it yourself, another idea Freebase got right). I wasn't thinking so much of retrofitting existing tools, although when I was at DARPA I proposed that adding DAML to clip art objects could greatly improve searching for clip art in PowerPoint (another idea worth revisiting?). Anyway, this is all to say that I think you are right about needing to think a lot more about the value proposition and to figure out what would motivate people to do the right thing, by making it easier for them to do what they do anyway (or by creating social worth in the annotation, sort of like why people created web pages in the first place).
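The "say someone is a person and get prompted for person properties" idea can be approximated with plain RDFS. The following is a minimal sketch, not a description of how Freebase works: it assumes a schema held as (subject, predicate, object) tuples, loosely treats rdfs:domain as a hint about which properties to offer for a class (a UI heuristic, not what RDFS semantics strictly define), and follows rdfs:subClassOf so that a subclass inherits its superclasses' suggestions. The ex: terms and the function names are hypothetical.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DOMAIN = RDFS + "domain"
SUBCLASS_OF = RDFS + "subClassOf"

# Hypothetical schema: the ex: terms are made up for illustration.
SCHEMA = [
    ("ex:birthDate", DOMAIN, "ex:Person"),
    ("ex:hobby", DOMAIN, "ex:Person"),
    ("ex:employer", DOMAIN, "ex:Person"),
    ("ex:ScubaDiver", SUBCLASS_OF, "ex:Person"),
    ("ex:certificationLevel", DOMAIN, "ex:ScubaDiver"),
    ("ex:maxDepth", DOMAIN, "ex:Dive"),
]


def classes_of(triples, cls):
    """Return cls plus all of its superclasses (transitive rdfs:subClassOf)."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and s == current and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def suggested_properties(triples, cls):
    """Properties whose rdfs:domain is cls or one of its superclasses."""
    classes = classes_of(triples, cls)
    return sorted({s for s, p, o in triples if p == DOMAIN and o in classes})


print(suggested_properties(SCHEMA, "ex:Person"))
print(suggested_properties(SCHEMA, "ex:ScubaDiver"))
```

A form generator built on something like this would show a "scuba diver" entry the person fields plus the diving-specific ones, which is the "semantics make authoring easier" direction described above.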


When it comes to what I call the "Web 2.0" application profile, we have Blogs, Wikis, Shared Bookmark Managers, Feed Aggregators, Blog Rolls, Photo Galleries, etc.

I have attempted to address the fusion of Web 2.0 application profiles (in fact Distributed Collaborative Apps & Services) via our OpenLink Data Spaces (ODS) platform.

In a nutshell, ODS gives the Web 2.0 user or developer an accelerated leap into the Data Web (or Web 3.0) without any RDF Tax.

I have also authored a number of demonstrations via my blog which has been a live demonstration of all of this for a very long time :-)

BTW, there are Live (demonstration) and Live (serious experimentation) instances of ODS for anyone to evaluate today.

It's a bit of a trap to think about this in terms of what "people do" to record knowledge / data, and Tim O'Reilly's comments fall into that trap. There are a lot of different kinds of contexts which shape why people record information / data, and what is or isn't a "record" is also affected by context.

For example, I might be willing to use a calendar program at work, but not use one for my personal calendar at home. The different contexts shape my motivations towards the efforts of data entry, and I might have different standards of "semantically correct" calendars between home and work.

So, a fully structured vCalendar event might be required at work, whereas "this week: Colin's party Sat" might be all I need at home.
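For comparison, here is roughly what the "fully structured" work version involves. This is a minimal sketch that uses iCalendar (RFC 5545), the standardized successor to vCalendar, as a stand-in; the UID, the PRODID, the party's date and times, and the helper name are all hypothetical, and a real calendar program would add more properties.

```python
from datetime import datetime


def icalendar_event(uid: str, summary: str, start: datetime, end: datetime,
                    stamp_utc: datetime) -> str:
    """Build a minimal iCalendar (RFC 5545) object holding one event.

    start and end are written as floating local times; stamp_utc is assumed
    to already be in UTC, so a trailing "Z" is appended. Assumes values
    contain no commas, semicolons, backslashes, or newlines, so no text
    escaping or line folding is done here.
    """
    fmt = "%Y%m%dT%H%M%S"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//semantic data entry sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp_utc.strftime(fmt)}Z",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # RFC 5545 content lines end with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical values for "Colin's party Sat": Saturday, April 7, 2007, 7-11 p.m.
print(icalendar_event(
    uid="colins-party-20070407@example.org",
    summary="Colin's party",
    start=datetime(2007, 4, 7, 19, 0),
    end=datetime(2007, 4, 7, 23, 0),
    stamp_utc=datetime(2007, 4, 2, 12, 0),
))
```

Eleven lines of markup versus five words of note: whether that effort is worth it depends on the context the entry is made in.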

There are many layers that factor into what people do: everything from the physical entry devices, to the concept of a computer (as opposed to, e.g., a network), to the concept of an application, to the concepts of sites, pages, files, etc. And then there are our motivations: obligation, communication, gratification, and so on.

There are many kinds of designs that can make recording data (or more semantically elaborate data) easier or better for people. But, they're generally only better relative to specific contexts of people's motivations and needs, given the constraints of the computers.

(By that "constraints of the computers" bit, I mean to imply that one could alternately create interfaces uniquely suited to specific types of semantic data entry, e.g., a physical device that looks like a wall calendar with icons for people and project names that makes it easy to generate data sets of FOAF + vCard-RDF + DOAP + etc.)

Thanks everyone, there are a lot of great leads here.

Jay - I agree that trying to generalize this too much would lead to a mess, which is why both of my use cases are about using information that people record in order to get work done at their jobs. I think that's a fine place to start.

Going through some old bookmarks, I just found the Keeping Found Things Found project at the University of Washington, which looks valuable for this research.