Transcript

Three powerful talks from TED2007

This week we’re posting three of the most-talked-about talks from TED2007 — Ngozi Okonjo-Iweala, John Doerr and Blaise Aguera y Arcas’ remarkable demo of Seadragon/Microsoft Photosynth.

Ngozi Okonjo-Iweala, the former Finance Minister for Nigeria (and the first woman to hold that job), argues for investment — rather than aid — as the means to help Africa. Okonjo-Iweala will also speak at next week’s TEDGlobal conference in Arusha, Tanzania. John Doerr, legendary Silicon Valley venture capitalist, has turned his investment focus from high tech to greentech — because his daughter asked him to. Blaise Aguera y Arcas, software architect for Microsoft and architect of Seadragon, put Microsoft’s jaw-dropping Photosynth software through its paces in a demo that had TED2007 abuzz. (Recorded March 2007 in Monterey, CA.)


What I’m going to show you first, as quickly as I can, is some foundational work- some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago.

(zooms in on a photo on a desktop screen of many photos)

This is Seadragon, and it’s an environment in which you can either locally or remotely interact with vast amounts of visual data. (zooms out again) We’re looking at many, many gigabytes of digital photos here, seamlessly and continuously zooming and panning through the thing, rearranging it in any way we want.

(photos spin and reshuffle, starts zooming and panning in and out again)

And it doesn’t matter how much information we’re looking at, how big these collections are, how big the images are, or the fact that most of them are ordinary digital camera photos.

(zooms in on a map within the set of photos)

But this one, for example, is a scan from the Library of Congress, and it’s in the 300 megapixel range.

(zooms in ultra-close on the scan, showing detail of Library of Congress stamp)

It doesn’t make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment.
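
The claim above, that the only real limit is the number of pixels on your screen, is the core property of a multi-resolution tile pyramid, the general technique behind deep-zoom systems like Seadragon. The talk doesn’t describe the internals, so the sketch below is a generic illustration rather than Seadragon’s actual code; the tile size and function names are invented for the example (Microsoft’s Deep Zoom format actually uses 254-pixel tiles plus overlap).

```python
import math

TILE = 256  # hypothetical tile edge length, in pixels, for this sketch

def level_for_scale(scale):
    """Pick the pyramid level whose resolution best matches the on-screen scale.

    Level 0 is full resolution; level k is downsampled by 2**k.
    `scale` is screen pixels per full-resolution image pixel (< 1 when zoomed out).
    """
    if scale >= 1:
        return 0
    return math.floor(math.log2(1 / scale))

def visible_tiles(view_x, view_y, view_w, view_h, scale):
    """Return the tiles (col, row) at the chosen level that intersect the viewport.

    The viewport is given in full-resolution image coordinates. The number of
    tiles returned is proportional to the viewport area, never to the total
    image size -- which is why a 300-megapixel scan zooms as smoothly as a
    snapshot.
    """
    level = level_for_scale(scale)
    step = TILE * (2 ** level)  # span of one tile, in full-res image pixels
    c0, c1 = int(view_x // step), int((view_x + view_w - 1) // step)
    r0, r1 = int(view_y // step), int((view_y + view_h - 1) // step)
    return level, [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

Zoomed in (scale 1.0), a 1024-pixel-square viewport needs a 4×4 grid of full-resolution tiles; zoomed out to a quarter scale, the same viewport needs a single tile from a coarser level, so the work stays roughly constant.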

(zooms in on book pages, each chapter arranged page by page in columns)

It’s also a very flexible architecture- this is an entire book. So this is an example of non-image data. This is Bleak House, by Dickens. Every column is a chapter,

(zooms WAY down so we can see just how much text there is, and how detailed and fine the print is)

And to prove to you that it’s really text, and not an image, we can do something like so, to show that this is a real representation of the text; it’s not a picture. Maybe this is kind of an artificial way to read an e-book- I wouldn’t recommend it- but this is a more realistic case-

(zooms out, pictures flip, then he zooms back in on an e-book version of The Guardian)

This is an issue of The Guardian. Every large image is the beginning of a section, and this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We’ve also done a little something with the corner of this particular issue of The Guardian. We’ve made up a fake ad-

(zooms in on car ad)

That’s very high resolution, much higher than you’d be able to get in an ordinary ad, and we’ve embedded extra content-

(zooms in on tiny embedded information inside ad)

If you want to see the features of this car, you can see them here. Or other models-

(zooms in on different imbedded section, then scrolls and zooms to super tiny tech specs)

Or even technical specifications. And this really gets at some of these ideas about doing away with those limits on screen real estate. We hope that this means no more pop-ups, and that no other rubbish of that kind should be necessary.

(switches to world map)

Of course mapping is one of those really obvious applications for technology like this, and this one I really won’t spend any time on except to say that we have things to contribute to this field as well.

(zooms in on street map of San Jose, then out again)

But- those are all the roads in the U.S., superimposed on top of a NASA geo-spatial image.

So let’s pull up, now, something else-

(photosynth intro page)

So this is actually live on the web now. You can go check it out. This is a project called Photosynth, which really marries two different technologies. One of them is Seadragon, and the other is some very beautiful computer vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U-dub, and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the web, it’s powered by Seadragon, you can see that when we kind of do these sorts of views, where we can dive through images and have this kind of multi-resolution experience-

(scrolls through images, gradually revealing a meshed panoramic view of a mountainside and lake, in which he zooms in and out to reveal levels of detail)

But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together, so they correspond to the real space in which these shots, all taken near Grassi Lakes in the Canadian Rockies, were taken. So you see elements here of a stabilized slide show, or panoramic imaging, and these things have all been related spatially. I’m not sure if I have time to show you any other environments. There are some that are much more spatial.
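
The registration step described here- deciding which photos belong to the same scene and how they relate- is done in real systems by matching local image features (such as SIFT descriptors) between photo pairs, then grouping photos that share enough matches. The sketch below is a toy stand-in, not the Photosynth pipeline: it uses sets of quantized “visual words” in place of real descriptor matching, and the names and threshold are invented for the example.

```python
from itertools import combinations

MIN_MATCHES = 3  # arbitrary threshold for this sketch

def match_images(features):
    """features: dict image_name -> set of quantized feature IDs.

    Returns edges (img_a, img_b, shared_count) for pairs that share enough
    features to be worth attempting a geometric registration.
    """
    edges = []
    for a, b in combinations(sorted(features), 2):
        shared = len(features[a] & features[b])
        if shared >= MIN_MATCHES:
            edges.append((a, b, shared))
    return edges

def connected_components(names, edges):
    """Group images into scenes via union-find: photos linked by feature
    matches end up in one component, like the cones clustering around the
    cathedral, while unrelated shots (guys in t-shirts) stay separate."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())
```

A real structure-from-motion pipeline (such as Snavely’s work that the talk credits) follows this matching stage with geometric verification and bundle adjustment to recover actual camera positions; the grouping idea, though, is the same.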

But I’d like to jump straight to one of Noah’s original data sets. And this is from an early prototype of Photosynth that we first got working in the summer-

(3D image of Notre Dame cathedral, rendered in bright dots on black background)

To show you what I think is really the punch line behind this technology- the Photosynth technology- And it’s not necessarily so apparent from looking at the environments that we’ve put up on the website- We had to worry about the lawyers and so on.

(pans around so full grounds, facade, etc. of Notre Dame is seen)

This is a reconstruction of Notre Dame cathedral, that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in t-shirts, and of the campus, and so on, and each of these orange cones represents an image that was discovered to belong to this model.

(shows that each cone represents a photographer, clicking on cone shows what part of ND is photographed, then the individual detail photo of the facade is zoomed in on)

And so these are all Flickr images. And they’ve all been related spatially in this way, we can just navigate in this very simple way-

(begins scrolling through photos in a similar way to the mountain landscape before, going around the grounds, zooming in on architectural features, etc)

(applause) Thank you. You know, I never thought that I’d end up working at Microsoft, and it’s very gratifying to have this kind of reception here. (laughter)

(zooms back, showing series of long shots of ND, related spatially as before)

So this is- As you can see, this is lots of different types of cameras, it’s everything from cell phone cameras to professional SLRs, quite a large number of different- but it’s stitched together into this environment- if I can find some of the sort of weird ones-

(switches to view of grid of all the different photos, zooms into corner)

There- so many of them are occluded by faces, and so on- Somewhere in here there’s actually a series of photographs- here we go.

(zooms in on picture of guy in front of a poster)

This is actually a poster of Notre Dame that registered correctly. OK? So if we- we can dive in from the poster to a physical view of this environment.

(zooms on rose window in poster, which gives way to a photo of the rose window)

So, what the point here really is, is that we can do things with the social environment- this is now taking data from everybody, from the entire collective memory of what the Earth looks like, visually, and linking all of that together. All of those photos become linked together, and they make something emergent that’s greater than the sum of the parts. You have a model that emerges of the entire Earth- think of it as the long tail to Stephen Lawler’s Virtual Earth work.

And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with metadata that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame cathedral suddenly gets enriched with all that data. And I can use it as an entry point to dive into that space, into that metaverse, using everybody else’s photos, and do a kind of cross-modal and cross-user social experience that way. And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on but from the collective memory. Thank you so much.
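
The enrichment described here, where one person’s tags reach everyone else’s photos, can be modeled as propagation over the graph of spatially registered images: a photo inherits every tag reachable through registration links. A minimal sketch, assuming a simple adjacency-list link structure (all photo names and tags are hypothetical):

```python
from collections import deque

def propagate_tags(links, tags):
    """links: dict photo -> list of spatially registered neighbour photos.
    tags: dict photo -> set of tags someone entered by hand.

    Each photo collects, via breadth-first search, the tags of every photo
    reachable through registration links -- so one person's annotation of
    the saints enriches everyone's shots of the cathedral.
    """
    enriched = {}
    for start in links:
        seen, queue = {start}, deque([start])
        collected = set(tags.get(start, ()))
        while queue:
            p = queue.popleft()
            collected |= tags.get(p, set())
            for q in links.get(p, ()):
                if q not in seen:
                    seen.add(q)
                    queue.append(q)
        enriched[start] = collected
    return enriched
```

With this model, an untagged snapshot linked (directly or through intermediate photos) to a tagged professional shot ends up carrying the professional shot’s metadata, which is the cross-user effect the talk describes.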

(Chris Anderson walks on)

CA: Do I understand this right- that what your software is going to allow is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to basically link together?

BAyA: Yes. What this is really doing is discovering- it’s creating hyperlinks, if you will, between images. And it’s doing that based on the content inside the images, and that gets really exciting when you think about the richness of the semantic information that a lot of those images have. Like when you do a web search for images- you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. Now what if that picture links to all of your pictures? Then the amount of semantic interconnection- the amount of richness that comes out of that is really huge. It’s a classic network effect.

CA: Blaise, that is truly incredible. Congratulations.

BAyA: Thanks so much.