
How IBM Watson can mine knowledge from TED Talks

What is the relationship between income inequality and crime? IBM Watson is imagining a way to search across TED Talks to build a nuanced answer to that question from the speakers who tackle it. Image courtesy IBM Watson

Imagine being able to ask a panel of TED speakers: Will having more money make me happy? Will new innovations give me a longer life? A new technology from IBM Watson is set to help people explore the ideas inside TED Talks videos — by asking the questions that matter to them, in natural language.

Users will be able to search the entire TED Talks library by asking questions about ideas — and then go directly to the segments within each video where the ideas are discussed. By analyzing concepts within each video, Watson can curate a playlist of short video clips that offer many perspectives on the user’s question. Below each clip is a timeline that shows more concepts that Watson found within the talk, so that users can “tunnel sideways” to follow another interest that’s contextually related, allowing for serendipitous exploration. Users can keep leaping from one video to another, using concepts as a bridge to other topics that might interest them.
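Neither team has published the code behind this, but the interaction model described above is easy to sketch. Below is a minimal, hypothetical Python illustration of the clip-and-concept structure; every name and type in it is an assumption for illustration, not IBM's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    """A short segment of a talk where one or more concepts are discussed."""
    talk_id: str
    start_sec: float
    end_sec: float
    concepts: list[str] = field(default_factory=list)  # concepts tagged in this span

def build_playlist(question_concepts: set[str], clips: list[Clip], limit: int = 10) -> list[Clip]:
    """Rank clips by how many of the question's concepts they touch."""
    scored = [(len(question_concepts & set(c.concepts)), c) for c in clips]
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
    return [c for _, c in ranked[:limit]]

def sideways_links(clip: Clip, question_concepts: set[str]) -> list[str]:
    """Concepts on the clip's timeline that were NOT part of the question:
    the entry points for 'tunneling sideways' into related topics."""
    return [c for c in clip.concepts if c not in question_concepts]
```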

Today, IBM and TED showed a demo of the technology at World of Watson, a symposium and hackathon in Brooklyn aimed at expanding the role of cognitive computing in society.

The new project came out of a brief hallway meeting at a TED event last winter, and was developed by an ad hoc team in about eight weeks. It’s a great example of how quickly developers can build new applications on the open Watson platform.

Here, IBM project lead Jeffrey Coveyduc and TED.com editor Emily McManus discuss how a simple idea turned into a prototype of a brand-new way to find insights in unstructured video data.

Emily: Last winter at a TED Institute conference in Berlin, I watched Dario Gil from IBM’s Research group talk about Watson and cognitive computing, and I loved his vision of how computers and humans could work together to answer hard questions using natural language analysis. I started thinking: what kinds of new questions could we answer from TED Talks, if we fed them into Watson? We have a set of almost 2,000 videos covering a pretty broad set of human knowledge. And sometimes it’s just hard to know what’s in there — we have good metadata, but people think in language, not metadata. So I asked Dario backstage: What could Watson do with our TED Talks videos and transcripts?

Jeff: Dario came to me to discuss the idea. As mutual fans of TED Talks, we both knew there was a tremendous amount of compelling information and insight contained in those videos. But it wasn’t just the content: TED’s mission is about spreading ideas, and here at IBM we’re using Watson to help extract knowledge that’s relevant to you and was previously difficult to access. It seemed like a great match.

Emily: TED Talks has an API to help developers access our videos and transcripts, but to speed up the delivery of all this material, since we were both in New York City, Aja Bogdanoff from my team put all 1,900 videos, transcripts and metadata onto a hard drive. It felt very Mission: Impossible. I told the team at IBM, “I’ll hand you the package as if we’re both spies.”

Jeff: We referred to the hard drive as “the football,” and kept the project under wraps, nicknaming it “Secret Squirrel.” We went into the effort without an ending in mind, not knowing where the adventure would take us. This was important, because the team — about a dozen experts in natural language processing, visualization, video analysis, speech recognition, along with media artists — needed the freedom to be creative, to experiment, to make mistakes as they explored what was possible.

The team used a bunch of Watson services from our developer cloud, like Concept Insights, Personality Insights, AlchemyAPI’s Concept Tagging and Entity Identification, along with some of Watson’s core capabilities like natural language processing and its ability to understand context in questions.
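As an aside for developers: the project's code hasn't been published, so the snippet below is only a rough sketch of what posting a transcript to a concept-tagging service over REST might look like. The endpoint URL, parameter names, and response shape are all assumptions for illustration; the real request formats are defined in the Watson Developer Cloud documentation.

```python
import requests

# NOTE: the endpoint URL, parameters, and response fields below are assumptions
# made for illustration; consult the Watson Developer Cloud docs for the real API.
CONCEPT_TAGGING_URL = "https://gateway-a.watsonplatform.net/calls/text/TextGetRankedConcepts"

def tag_concepts(transcript: str, api_key: str) -> list[dict]:
    """Post a talk transcript to a concept-tagging service and return ranked concepts."""
    resp = requests.post(
        CONCEPT_TAGGING_URL,
        data={"apikey": api_key, "text": transcript, "outputMode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    # Hypothetical response shape: [{"text": "Income inequality", "relevance": "0.92"}, ...]
    return resp.json().get("concepts", [])
```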

At the end of our first flurry of activity, Watson had assembled all of the TED videos and the ideas within them into a visualization of the TED universe. Within this universe, ideas began to group themselves into neighborhoods, and we found that the relationships among these neighborhoods were interesting in themselves, offering us new insights into the TED Talks as a whole. For example, Music talks sit close to those focused on Time and Mind.
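IBM hasn't said how this visualization was built, but one conventional way to approximate such a concept map is to embed each transcript, cluster the embeddings into "neighborhoods", and project them into two dimensions. Here's a brief sketch under those assumptions, using scikit-learn and matplotlib as a stand-in toolchain, not the team's actual one:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

def plot_universe(transcripts: list[str], n_clusters: int = 20) -> None:
    """Embed each talk, cluster into 'neighborhoods', and plot a 2-D map."""
    X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(transcripts)
    labels = KMeans(n_clusters=min(n_clusters, len(transcripts)), n_init=10).fit_predict(X)
    # t-SNE requires perplexity < number of samples; cap it for small corpora.
    perplexity = min(30, len(transcripts) - 1)
    coords = TSNE(n_components=2, init="random", perplexity=perplexity).fit_transform(X.toarray())
    plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=10, cmap="tab20")
    plt.title("Talks grouped into concept neighborhoods")
    plt.show()
```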

Emily: Four weeks ago, I got a call from Dario. He said he had something exciting to tell my team, and asked if we could come over to Astor Place the next day.

He demonstrated Secret Squirrel and how Watson had in essence indexed all our TED Talks — and was now able to show a universe of concepts extracted from TED videos, including some topics that were not covered in our current metadata. (Who knew we had so many talks that mentioned World War II?)

Then they showed us how a user could ask Watson a question in natural language and get a string of short clips from many talks, building a nuanced and complex answer to a big question. We know this is how our users want to talk about talks — when someone’s trying to remember a video they once saw, they don’t remember keywords, they remember ideas.
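Watson's natural language question answering goes well beyond this, but the core retrieval step, matching a question against time-stamped transcript segments and returning the best-scoring clips, can be approximated in a few lines. A hedged sketch, where the segment format is an assumption:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def answer_with_clips(question: str, segments: list[tuple[str, float, str]], top_k: int = 5):
    """segments: (talk_id, start_sec, text) transcript chunks.
    Returns the chunks most similar to the question: a rough 'playlist' of clips."""
    texts = [text for _, _, text in segments]
    vectorizer = TfidfVectorizer(stop_words="english").fit(texts + [question])
    scores = cosine_similarity(vectorizer.transform([question]), vectorizer.transform(texts))[0]
    ranked = sorted(zip(scores, segments), key=lambda p: -p[0])
    return [seg for score, seg in ranked[:top_k] if score > 0]
```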

Jeff: Applying Watson in this way has shown us how we can make it much easier to explore diverse expertise and viewpoints in video content. For example, think of all the TV journalism and university MOOC content that is produced.

On the TED.com site, visitors can search TED Talks by keywords and other metadata. Together, we’ve shown that by adding Watson, it’s possible to go beyond keywords and extract the essence of each talk, identifying the ideas and concepts within it. Doing so will enable people to discover information they care about. This project is focused on TED videos, but the broader idea of video analysis is a big deal. More than 95% of the world’s digital material is video. Unlike text, video is difficult and time-consuming to search for particular slices of information.

We know that some of the great advances produced by humans come at the intersection of disciplines and through the collision of ideas. We think this project is helping show that the vast storehouse of video that society is producing, once it’s unlocked, could become a shared source of creativity and innovation.

Read Fast Company’s coverage of the announcement »

This post will also appear on the Watson blog. Read much more there about IBM’s cognitive computing system »