Finding answers to complex questions on the Internet is often a challenge, as a simple search can sometimes lead you down a rabbit hole of impersonal data. In A Haystack Full of Needles: Cutting Through the Clutter of the Online World to Find a Place, Partner or President, Jim Hornthal explores groundbreaking new approaches to discovering the useful insights buried deep within our complex and noisy datasphere. Hornthal, a venture capitalist in Silicon Valley, introduces us to innovators who are pushing the edges of data science and data visualization by applying the principles of pattern recognition to isolate relevant signals in the noise. Their efforts will have enormous implications for the way we practice medicine, discover music and movies, and even identify our romantic partners.
Curious to hear more about the ideas he explores in his e-book, the TED Blog asked Hornthal a few questions over email.
So much content comes pouring at us every day and it’s getting trickier to find material that is useful. What gives you hope?
It comes from the efforts of great innovators and entrepreneurs who are investing their talents and creating technologies to manage the data, filter the noise, and provide powerful new tools to help users navigate the growing tsunami of content. So while an online search today can often feel generic and not suited to your individual tastes, there are a lot of innovators who continue to push the limits of what’s possible. And there’s hope that we can stay ahead of this unimaginable content overflow.
In your book you talk about ‘discovery engines’ as promising great rewards in this area. What are those?
A discovery engine is a smart set of algorithms and techniques that help identify a potential list of useful answers for a user when there is more than one “right” answer. One of the key defining points of discovery engines is that they get smarter with each use. For example, Pandora uses sophisticated pattern recognition techniques to build custom playlists. The default playlists are further refined by the billions of “thumbs up” and “thumbs down” that Pandora users have registered as they listen to the music Pandora recommends.
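The feedback loop described above, where a default ranking is refined by accumulated “thumbs up” and “thumbs down” votes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pandora's actual method: the class name FeedbackRanker, the starting scores, and the vote_weight parameter are all hypothetical.

```python
from collections import defaultdict


class FeedbackRanker:
    """Toy discovery engine: a default ranking adjusted by user votes.

    Hypothetical sketch; not any real service's algorithm.
    """

    def __init__(self, base_scores, vote_weight=0.1):
        # base_scores: item -> default relevance (assumed to come from
        # some upstream pattern-matching step that is not modeled here).
        self.base_scores = dict(base_scores)
        self.vote_weight = vote_weight
        self.votes = defaultdict(int)  # net thumbs up minus thumbs down

    def thumbs_up(self, item):
        self.votes[item] += 1

    def thumbs_down(self, item):
        self.votes[item] -= 1

    def ranked(self):
        # Each item's score is its default score plus a small bonus or
        # penalty proportional to its net votes.
        def score(item):
            return self.base_scores[item] + self.vote_weight * self.votes[item]

        return sorted(self.base_scores, key=score, reverse=True)


ranker = FeedbackRanker({"song_a": 0.5, "song_b": 0.4})
before = ranker.ranked()
ranker.thumbs_up("song_b")
ranker.thumbs_up("song_b")
after = ranker.ranked()
```

With these made-up numbers, two thumbs up are enough to move song_b ahead of song_a, which is the sense in which such a system gets smarter with each use.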
What is the difference between the capabilities of the old-fashioned search engines and this next-generation discovery engine?
Filtering out the junk is something that both approaches to data mining have to tackle. However, next-generation discovery engines are structured more like specialized “genomes” that reflect the complexity of a user’s interests and needs, and then generate a custom set of alternatives. For example, Triporati’s Destination Genome Project deals with 560 quadrillion potential unique combinations of attributes (interests and activities) to present personalized “bucket lists” from over 2,500 indexed destinations for leisure travelers looking to discover their perfect vacation destination.
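One simple way to picture attribute-based matching of this kind is to score each destination by how much its attributes overlap with a traveler's stated interests. The sketch below uses Jaccard similarity for that purpose. It is an assumption chosen for illustration, not a description of how Triporati's system works, and the destination names and attributes are invented.

```python
def match_score(user_interests, destination_attributes):
    """Jaccard similarity: shared attributes divided by all attributes."""
    user, dest = set(user_interests), set(destination_attributes)
    union = user | dest
    return len(user & dest) / len(union) if union else 0.0


def shortlist(user_interests, destinations, top_n=3):
    """Return the names of the top_n best-matching destinations."""
    scored = sorted(
        destinations.items(),
        key=lambda pair: match_score(user_interests, pair[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_n]]


# Invented example data, for illustration only.
destinations = {
    "alpine_village": {"skiing", "hiking"},
    "beach_town": {"surfing", "snorkeling", "nightlife"},
    "old_city": {"museums", "food", "nightlife"},
}
picks = shortlist({"surfing", "nightlife"}, destinations, top_n=2)
```

A traveler interested in surfing and nightlife gets beach_town first and old_city second, because those destinations share the most attributes with the stated interests.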
While data analysis programs can help us become smarter and more efficient consumers of goods, the ability to find the signal in the noise comes with added responsibilities and risks when protecting our personal data. What are they?
There is a trade-off we all make between providing information providers with a clearer picture of who we are (our interest graph, demographic profile, etc.), and the expected increase in relevance that this disclosure implies. What is not always so clear is how that data might be used, and what its ‘shelf life’ might be. If information providers are transparent about their intent (and that does not mean hiding behind obtuse “Terms of Service” paragraphs that most people don’t read or can’t understand) and consumers willingly opt in for the benefits provided by the increased level of intimacy and sharing, then the system works. When either part starts to break down, the potential for harm increases. Be aware of who you are telling what to, and don’t be afraid to use “Incognito Windows” or “Private Browsing” settings on your browsers when you are surfing new sites that you don’t necessarily know or trust.
What will the search or discovery engine of 2015 look like? Where are we heading?
Siri is a leading indicator of how sophisticated voice recognition and contextual search can deliver more relevant results. With continued higher bandwidth and faster processing speeds, big data challenges will be more readily tamed by the pioneers in data science and data visualization, whose combined efforts should help consumers.