Folksonomy, or How I Learned to Stop Worrying and Love the Mess

Clay Shirky, Stewart Butterfield, Joshua Schachter, Jimmy Wales

http://conferences.oreillynet.com/cs/et2005/view/e_sess/6329

At the O'Reilly Emerging Technology Conference

San Diego, California, 16 March 2005

Impressionistic transcript by

Cory Doctorow

doctorow@craphound.com

--

Clay: Why did you open up to user categorization and what's
surprised you about it?

Jimmy: We launched ours last summer after knowing we needed it
for a long time. For the first two weeks in English Wikipedia, it
was a madhouse with all kinds of categorization. The Germans were
more reserved but after a few weeks it caught on there.
Eventually, because people could adjust categories, it all
settled down. We did it because that's the Wikipedia way -- we
never considered doing it any other way.

Stewart: It's not really categorization on Flickr -- it's about
letting users remember. If I add the "Norma" tag to pix of my
mom, whose name is Norma, I don't think it goes into the Norma
category. The unfortunate thing about the term "folksonomy" is
that it implies that it's a replacement for categorization.
People categorize things by noting what they do or don't have:
mammals have hair and live babies; does it have property a? then
it's a whatever.

Joshua: I was collecting 20,000 links in a text-file, and
somewhere along the way I started adding a hash mark and some
text, so I could e.g. grep out all the WiFi links and send them
to a friend. Later I built a Web version so I could send a URL
to a friend, but it was standalone. Eventually I made it
massively multiplayer. The interesting group behavior is the
tagging that isn't categorization, e.g., "To read" -- not a
category, though it has a big group and a lot of social and user
context. People make tags for groups working together, workflow
in RSS -- that's what's most surprising.
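
A minimal sketch of the text-file stage Joshua describes, assuming one link per line followed by a hash mark and some free-form tag text. The line format, the links_tagged name and the sample URLs are illustrative assumptions, not a record of his actual file.

```python
def links_tagged(lines, tag):
    """Return the URLs whose trailing '#' note contains `tag` (case-insensitive)."""
    matches = []
    for line in lines:
        # Split at the first " #": URL on the left, tag text on the right.
        # Requiring the leading space keeps URL fragments like /page#top intact.
        url, sep, note = line.partition(" #")
        if sep and tag.lower() in note.lower().split():
            matches.append(url.strip())
    return matches


# Hypothetical sample of the link file, one entry per line.
sample = [
    "http://example.com/antennas # wifi hardware",
    "http://example.org/recipes # cooking",
    "http://example.net/hotspots # WiFi maps",
]

print(links_tagged(sample, "wifi"))
```

This is roughly what a case-insensitive grep for the tag would pull out, with the tag text stripped so only the links are left to send on.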

Clay: Stewart and Joshua talked about the value to the individual
and Jimmy talked about the group. How do you resolve the tension
between the individual and the group?

Jimmy: We have a group goal: producing a high-quality
encyclopedia. If there's a tension between an individual and a
group, it's a tension between the individual and the
encyclopedia.

Joshua: But you attain consensus -- for some stuff it's clear
what category something is in and for others it's very nebulous.
If it's a confusing or complicated categorization task, maybe
there is some consensus building tool needed.

Delicious is in part a reaction to Wiki where everyone fights
over the same space. We each get our own space which sometimes
collapses.

In the Delicious folksonomy, the top terms for Wikipedia are
"free," "encyclopedia" and "reference" -- which doesn't appear on
the Wikipedia page. That means that Delicious users think of
Wikipedia in ways that you don't.

Stewart: There is one article on San Diego on Wikipedia and a
hundred on Delicious. On Flickr, people tag their Tijuana photos
with "ETECH" because they're on their trip to ETECH. They tag
their photos of Tokyo hotel rooms with TOKYO.

Clay: The circle-in-square group has widespread agreement on
Flickr. Social activity has arisen despite the individual bias.
On Delicious, people use the comment fields to have discussions.

Stewart: When we first created tagging, people used it to do
group projects, e.g. "globe" for photos of globular things.

Marc Canter: Now that I have tags, can I connect them across
different systems?

Jimmy: I've been talking to Technorati about sharing database
dumps of our users' tags.

Stewart: Technorati's always doing that. We have 12,000,000 tags,
as compared to the average English vocabulary of 25,000 words.

Joshua: The vast majority are single-use, or typos, or
what-have-you. Just give people tools to trim hedges in their
data-gardens. We have different axes of why you're tagging, what
you're tagging and how it happens. In Flickr you mostly tag your
stuff for your own purposes; in Technorati, it's your stuff for
others' purposes; and in Delicious, it's others' stuff for your
purposes. They're different things, they don't necessarily flow
together.

Clay: You guys have RESTful APIs, so someone else can do this.

Stewart: The objective of tags shouldn't be to exhaustively cover
the field -- we'll have a million photos of Tokyo, and if the
TOKYO tag only gets you 400k of them, it's OK. You're only going
to look at 20 of them anyway.

Joshua: I've been trying to resolve this with a UI that gives you
your tags, the most popular tags others have applied to your
links, etc. We don't want you to be dominated by group-think --
your intuition is the most memorable thing. If you tag something
with Java, it doesn't matter that it's more about networking or
P2P -- because you think of it as Java and you'll find it that
way.
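
A rough sketch of the two views Joshua mentions: your own tags for a link, next to the most popular tags other people applied to it. The (user, url, tags) record shape, the tag_views name and the sample data are assumptions made for illustration; nothing here reflects how Delicious actually stores or ranks tags.

```python
from collections import Counter


def tag_views(bookmarks, user, url, top_n=3):
    """Return (the user's own tags for `url`, the top_n tags others applied to it)."""
    own = []
    others = Counter()
    for owner, link, tags in bookmarks:
        if link != url:
            continue
        if owner == user:
            own = list(tags)       # the user's own wording is kept untouched
        else:
            others.update(tags)    # everyone else's tags are only counted
    popular = [tag for tag, _ in others.most_common(top_n)]
    return own, popular


# Hypothetical bookmarks: (user, url, tags)
url = "http://example.com/p2p-lib"
bookmarks = [
    ("alice", url, ["java"]),
    ("bob", url, ["p2p", "networking"]),
    ("carol", url, ["p2p", "java"]),
    ("dave", url, ["p2p", "networking"]),
]

own, popular = tag_views(bookmarks, "alice", url, top_n=2)
print(own, popular)
```

Keeping the two lists separate means the group's vocabulary is shown as a suggestion and never overwrites the tag the user will actually remember.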

Clay: In Flickr and Delicious -- in traditional systems, "user"
and "time" are impermissible categories, you want eternal and
universal categories. But in folksonomies we want My Stuff,
Others' Stuff and Recent Stuff.

Audience: Semantic Web people are trying to create large-scale
taxonomies for categorization. How does Folksonomy work with
this?

Stewart: Wikipedia is best: a large group of casuals and a small
group of curators who sort it all out. It's not impossible, but
it's the kind of thing you'd have to pay me to do.

Jimmy: Creating a large-scale category system, a small group of
domain experts can't even remotely compete with a large group of
people. I wouldn't even want to think of what it would cost to
replicate the Wikipedia categories with paid labor.

Stewart: It's a deep philosophical issue: Ontology is a
controversial subject. The idea that it's possible to cleave
nature at the joints is controversial. Yes, there are countries,
Uzbekistan is a country, but ask a physicist or a biologist and
the categories are very fraught.

Joshua: The problem is that ontology doesn't tie in with what my
users are trying to do, like remembering something later. You
tend not to be too broad nor too narrow, but rather try for a
middle ground in your personal tags that serve as mnemonic aids.
If you're into insects, you might have BEETLES, MOSQUITOS, etc,
but if you're not into insects, you might just have INSECTS.

Audience: These systems seem perfect for RDF implementations --
do you use them?

Joshua: I don't think RDF and tags fight -- the Semantic Web is a
way to pull in schemes to your database when it's convenient and
useful. I implemented an RDF triple-store and it was 7 orders of
magnitude slower. It might be useful to me later for things other
than file formats and interchange.

Stewart: There's a lot implied in RDF that isn't in tags, or
stuff that's always the same in each tag: THE PICTURE|HAS
TAG|$TAG.
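
To make Stewart's point concrete, here is a small sketch that expands a plain tag list into subject/predicate/object triples. The tuple layout and the tags_to_triples and triples_to_tags names are assumptions for illustration; this is not RDF syntax and not Flickr's data model.

```python
def tags_to_triples(item, tags, predicate="HAS TAG"):
    """Expand a flat tag list into (subject, predicate, object) triples."""
    return [(item, predicate, tag) for tag in tags]


def triples_to_tags(triples):
    """Recover the flat tag list: only the object position carries information."""
    return [obj for _subject, _predicate, obj in triples]


triples = tags_to_triples("THE PICTURE", ["tokyo", "hotel"])
for triple in triples:
    print("|".join(triple))

# Subject and predicate repeat on every row; only the tag varies.
print(triples_to_tags(triples))
```

The round trip loses nothing, which is the sense in which the first two fields are "always the same" for a tagging system.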

David Weinberger: How much metadata do we need for tags -- who
created them, in which app, what date, what language, is it a
location tag, etc? How much do we need to place into the tags
themselves?

Joshua: If you encumber tags with hierarchical categorization,
they become less easy to type/use, and you raise the barrier to
entry. It may make sense given the data, but it doesn't make
sense given the usability.

Stewart: It has to be post-facto -- you have to infer it from the
stuff the user has entered, but the user won't enter it. Mapr
doesn't make people choose from a list of 50 states, it just
knows that CLEVELAND OH is a place.

Jimmy: If we had a CLEVELAND OH! category and a CLEVELAND, OH
category, we'd just fix it. When it's CARDINALS and there's a bird
and a baseball team, the fact that we're hierarchical fixes it.

Stewart: You can figure this out by examining the user's set --
if she's got Thrushes and Robins, you know it's a bird.
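
One way to picture the inference Stewart describes is to score each possible sense of an ambiguous tag by how many related tags appear in the user's own set. The SENSES table, the guess_sense name and the overlap scoring are all hypothetical; a real system would need a far richer source of related terms.

```python
# Hypothetical table: ambiguous tag -> sense -> tags that hint at that sense.
SENSES = {
    "cardinals": {
        "bird": {"thrushes", "robins", "sparrows"},
        "baseball team": {"cubs", "dodgers", "baseball"},
    },
}


def guess_sense(tag, user_tags, senses=SENSES):
    """Guess which sense of `tag` a user means from the rest of her tags.

    Returns None when the tag is unknown, nothing overlaps, or the top
    two senses tie.
    """
    candidates = senses.get(tag.lower())
    if not candidates:
        return None
    normalized = {t.lower() for t in user_tags}
    # Score each sense by how many of its hint tags the user also uses.
    ranked = sorted(
        ((len(hints & normalized), sense) for sense, hints in candidates.items()),
        reverse=True,
    )
    top_score, top_sense = ranked[0]
    if top_score == 0:
        return None
    if len(ranked) > 1 and ranked[1][0] == top_score:
        return None  # ambiguous: no single sense wins
    return top_sense


print(guess_sense("Cardinals", ["Thrushes", "Robins"]))
```

The user never declares a category; the guess comes entirely from tags she entered for her own reasons, and the function declines to answer when the evidence is missing or split.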

Clay: It's turtles all the way up -- there's a link associated
with every tag that can be tagged, etc.