Folksonomy, or How I Learned to Stop Worrying and Love the Mess
Clay Shirky, Stewart Butterfield, Joshua Schachter, Jimmy Wales
http://conferences.oreillynet.com/cs/et2005/view/e_sess/6329

At the O'Reilly Emerging Technology Conference
San Diego, California, 16 March 2005

Impressionistic transcript by Cory Doctorow
doctorow@craphound.com

--

Clay: Why did you open up to user categorization and what's surprised you about it?

Jimmy: We launched ours last summer after knowing we needed it for a long time. For the first two weeks in English Wikipedia, it was a madhouse with all kinds of categorization. The Germans were more reserved but after a few weeks it caught on there. Eventually, because people could adjust categories, it all settled down. We did it because that's the Wikipedia way -- we never considered doing it any other way.

Stewart: It's not really categorization on Flickr -- it's about letting users remember. If I add the "Norma" tag to pix of my mom, whose name is Norma, I don't think it goes into the Norma category. The unfortunate thing about the term "folksonomy" is that it implies that it's a replacement for categorization. People categorize things by noting what they do or don't have: mammals have hair and live babies; does it have property a? then it's a whatever.

Joshua: I was collecting 20,000 links in a text-file, and somewhere along the way I started adding a hash mark and some text, so I could e.g. grep out all the WiFi links and send them to a friend. Later I built a Web version so I could send an URL to a friend, but it was standalone. Eventually I made it massively multiplayer. The interesting group behavior is the tagging that isn't categorization, e.g., "To read" -- not a category, though it has a big group and a lot of social and user context. People make tags for groups working together, workflow in RSS -- that's what's most surprising.

Clay: Stewart and Joshua talked about the value to the individual and Jimmy talked about the group.
How do you resolve the tension between the individual and the group?

Jimmy: We have a group goal: producing a high-quality encyclopedia. If there's a tension between an individual and a group, it's a tension between the individual and the encyclopedia.

Joshua: But you attain consensus -- for some stuff it's clear what category something is in and for others it's very nebulous. If it's a confusing or complicated categorization task, maybe some consensus-building tool is needed. Delicious is in part a reaction to Wiki, where everyone fights over the same space. We each get our own space, which sometimes collapses. In the Delicious folksonomy, the top terms for Wikipedia are "free", "encyclopedia" and "reference" -- which doesn't appear on the Wikipedia page. That means that Delicious users think of Wikipedia in ways that you don't.

Stewart: There is one article on San Diego on Wikipedia and a hundred on Delicious. On Flickr, people tag their Tijuana photos with "ETECH" because they're on their trip to ETECH. They tag their photos of Tokyo hotel rooms with TOKYO.

Clay: The circle-in-square group has widespread agreement on Flickr. Social activity has arisen despite the individual bias. On Delicious, people use the comment fields to have discussions.

Stewart: When we first created tagging, people used it to do group projects, e.g. "globe" for photos of globular things.

Marc Canter: Now that I have tags, can I connect them across different systems?

Jimmy: I've been talking to Technorati about sharing database dumps of our users' tags.

Stewart: Technorati's always doing that. We have 12,000,000 tags, as compared to the average English vocabulary of 25,000 words.

Joshua: The vast majority are single-use, or typos, or what-have-you. Just give people tools to trim hedges in their data-gardens. We have different axes of why you're tagging, what you're tagging and how it happens.
In Flickr you mostly tag your stuff for your own purposes; in Technorati, it's your stuff for others' purposes; and in Delicious, it's others' stuff for your purposes. They're different things; they don't necessarily flow together.

Clay: You guys have RESTful APIs, so someone else can do this.

Stewart: The objective of tags shouldn't be to exhaustively cover the field -- we'll have a million photos of Tokyo, and if the TOKYO tag only gets you 400k of them, it's OK. You're only going to look at 20 of them anyway.

Joshua: I've been trying to resolve this with a UI that gives you your tags, the most popular tags others have applied to your links, etc. We don't want you to be dominated by group-think -- your intuition is the most memorable thing. If you tag something with Java, it doesn't matter that it's more about networking or P2P -- because you think of it as Java and you'll find it that way.

Clay: In Flickr and Delicious -- in traditional systems, "user" and "time" are impermissible categories; you want eternal and universal categories. But in folksonomies we want My Stuff, Others' Stuff and Recent Stuff.

Audience: Semantic Web people are trying to create large-scale taxonomies for categorization. How does folksonomy work with this?

Stewart: Wikipedia is best: a large group of casuals and a small group of curators who sort it all out. It's not impossible, but it's the kind of thing you'd have to pay me to do.

Jimmy: In creating a large-scale category system, a small group of domain experts can't even remotely compete with a large group of people. I wouldn't even want to think of what it would cost to replicate the Wikipedia categories with paid labor.

Stewart: It's a deep philosophical issue: ontology is a controversial subject. The idea that it's possible to cleave nature at the joints is controversial. Yes, there are countries, Uzbekistan is a country, but ask a physicist or a biologist and the categories are very fraught.
Joshua: The problem is that ontology doesn't tie in with what my users are trying to do, like remembering something later. You tend not to be too broad nor too narrow, but rather try for a middle ground in your personal tags that serve as mnemonic aids. If you're into insects, you might have BEETLES, MOSQUITOS, etc., but if you're not into insects, you might just have INSECTS.

Audience: These systems seem perfect for RDF implementations -- do you use them?

Joshua: I don't think RDF and tags fight -- the Semantic Web is a way to pull in schemes to your database when it's convenient and useful. I implemented an RDF triple-store and it was 7 orders of magnitude slower. It might be useful to me later for things other than file formats and interchange.

Stewart: There's a lot implied in RDF that isn't in tags, or stuff that's always the same in each tag: THE PICTURE|HAS TAG|$TAG.

David Weinberger: How much metadata do we need for tags -- who created them, in which app, what date, what language, is it a location tag, etc.? How much do we need to place into the tags themselves?

Joshua: If you encumber tags with hierarchical categorization they become less easy to type/use, and you raise the barrier to entry. It may make sense given the data, but it doesn't make sense given the usability.

Stewart: It has to be post-facto -- you have to infer it from the stuff the user has entered, but the user won't enter it. Mapr doesn't make people choose from a list of 50 states, it just knows that CLEVELAND OH is a place.

Jimmy: If we had a CLEVELAND OH! category and a CLEVELAND, OH category we'd just fix it. When it's CARDINALS and there's a bird and a baseball team, the fact that we're hierarchical fixes it.

Stewart: You can figure this out by examining the user's set -- if she's got Thrushes and Robins, you know it's a bird.

Clay: It's turtles all the way up -- there's a link associated with every tag that can be tagged, etc.
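
Three ideas from the last few exchanges can be sketched in a few lines of Python: Stewart's observation that every tag reduces to the same triple (THE PICTURE|HAS TAG|$TAG), the CLEVELAND OH! versus CLEVELAND, OH clean-up, and his suggestion that the rest of a user's tag set can disambiguate a tag like CARDINALS. Everything below is an assumption for illustration: the function names, the punctuation-stripping rule, and the tiny `SENSES` table of hint tags are invented, and none of it describes how Flickr, Delicious, Mapr, or Wikipedia actually work.

```python
import re


def as_triple(item, tag):
    """Express a tagging as a (subject, predicate, object) triple."""
    return (item, "has tag", tag)


def normalize(tag):
    """Assumed clean-up rule: drop punctuation, lowercase, collapse spaces."""
    no_punct = re.sub(r"[^\w\s]", "", tag)
    return " ".join(no_punct.lower().split())


# Hypothetical hint table: for an ambiguous tag, the tags that tend to
# accompany each sense. A real system would have to learn or curate this.
SENSES = {
    "cardinals": {
        "bird": {"thrushes", "robins", "sparrows"},
        "baseball team": {"baseball", "ballpark", "pitcher"},
    }
}


def guess_sense(tag, users_other_tags, senses=SENSES):
    """Guess which sense of a tag a user means from her other tags.

    Returns None when the tag is not in the table or no hint tag matches.
    On a tie, the sense listed first in the table wins.
    """
    candidates = senses.get(normalize(tag))
    if not candidates:
        return None
    others = {normalize(t) for t in users_other_tags}
    scores = {sense: len(hints & others) for sense, hints in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(as_triple("photo-1", "tokyo"))
    print(normalize("CLEVELAND OH!") == normalize("CLEVELAND, OH"))
    print(guess_sense("CARDINALS", ["Thrushes", "Robins"]))
```

The inference runs after the fact on what the user already typed, which matches Stewart's point that the user should never be asked to supply this metadata herself.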