Shyam Sankar isn’t satisfied with the current state of data analysis. In his recent TEDTalk, “The rise of human-computer cooperation,” Sankar explained why we have a responsibility to create computer programs that drive human-centered decisions, rather than trying to supplant them with computer-centered data processing. In his talk, Sankar — the Director of Forward Deployed Engineering at Palantir Technologies, which is devoted to real-world data analysis — briefly touched on his company’s role in the case known as the Sinjar records.
In 2007, Palantir worked with the Combating Terrorism Center at West Point to sift through and analyze files uncovered in Sinjar, Iraq, a town near the Syrian border, containing records for 700 foreign fighters recruited to aid al Qaeda in Iraq.
Curious to know more about this project, the TED Blog caught up with Sankar and Brian Fishman — then the lead researcher at the CTC on the Sinjar records, now working at Palantir — to talk about the role of the increasingly visible foreign fighters in Syria.
What is the relationship between the Sinjar records and the current conflict in Syria?
Sankar: When the [Syrian] regime was essentially helping — from an anti-American perspective — transit foreign fighters through Syria into Iraq to fuel the insurgency that was happening there, they never really contemplated that the ideological insurgents would settle into Syria, make that home, and then at a later point in time — with the Arab Spring — essentially fuel the insurgency and the revolt against the regime itself. And that a lot of the data that was captured in a context related to the counterinsurgency in Iraq would become critical to understanding who the players are that are actually fighting the Syrian regime right now.
Do you think that foreign fighter groups like al Qaeda are going to play a big role in the conflict in Syria?
Fishman: I think there’s pretty clear evidence that there is a strong jihadi component within the “rebel alliance” opposed to the Assad regime. I’m skeptical that al Qaeda or jihadis — and those two terms, we often use them interchangeably but they’re really not — I think they’re likely to benefit from the rebellion in Syria, but they’re unlikely to come to dominate the Syrian rebellion. I mean, most people, when given a choice between al Qaeda and basically anybody else, choose anybody else. But I think what you see now is that the jihadis, including al Qaeda, that have experience fighting in Afghanistan and experience fighting in Iraq, can bring militarily relevant skills to the table in Syria, and fighters that used to be, you know, bakers and shopkeepers, six months ago, are going to look for that kind of assistance where they can get it. One of the places they can get it these days is from jihadis.
You’ve got a dynamic where there is no singular opposition. There’s this immensely variable collection of people and organizations that are all sort of roughly pointed in the same direction and, within that mess, a group like al Qaeda and, speaking a little bit more broadly, jihadis in general, are going to be able to find folks that they can latch onto. What the Sinjar records showed was that there were these networks, some of them ideologically minded, some of them criminally minded, that existed in Syria going back to at least 2007, or at least 2006, that were tolerated to some degree by the Syrian regime.
And in some ways that’s just an extension of the same dynamic, right? At that point there was this wide collection of people that were generally pointed in the same direction, vis-à-vis Iraq, in that they didn’t like the American presence there and they wanted to disrupt that. But what you’re seeing, I think, is that when you play with groups like al Qaeda, there’s blowback, and when you play with jihadis, there tends to be blowback. We learned that in the 1980s in Afghanistan. I think the Assad regime has learned that in this case. Certainly many of the Iraqi tribal folks that cooperated with al Qaeda early in the Iraq War learned that, and I think that rebel groups in Syria are going to learn that now.
Without the Sinjar records and the more nuanced human-centric analysis that Palantir does, what difficulties might you have faced in trying to parse out these nebulous groups?
Sankar: Essentially [Palantir] allows you to go beyond the first order effects. So, the first order of realization from Sinjar is: Okay, we now know where the foreign fighters are coming from. The second order might be: Oh, I can now characterize how they’re getting here. What does the network of coordinators look like? That in and of itself is really interesting and novel and was difficult to do without Palantir. The third order of effects are things like: if you look at the rise of Libyan foreign fighters, it correlates significantly with a speech and the activity of Abu Yahya al-Libi, who was a prominent Libyan cleric, but he rose to becoming the number two in al Qaeda. And so having this early warning in 2007 that there’s a new dominant and prominent figure, that’s not in the data itself. It’s when you bring that data and combine it with all the other data, and the knowledge you have of the world, that the insight emerges.
Fishman: When we were doing this back in late 2007, early 2008, we had the Sinjar records, and we did a lot of hard work without Palantir at first, to do some basic statistics and learn what we could. We could do all of those kinds of things, but what we couldn’t do, or what would have been very, very difficult for us to do, was some of the second order analysis on, for example, the funnelers, the folks that helped transit people through Syria into Iraq. We had personnel records that corresponded to each individual traveling fighter, and we were able to generate statistics about that fighter, but we weren’t able to easily understand the networks that were embedded within that data asking different kinds of questions, and Palantir helped us ask those kinds of questions.
We were able to identify all of the different fighters that had coordinated with specific smugglers, and we could also easily see the kinds of payments that those fighters were making to each smuggler, and from that we were able to make judgments about whether or not those smugglers were motivated just by, you know, criminality and financial resources, or whether or not they were interested and motivated by ideology. That gave us a sense of what this network actually looked like in Syria, because it was extremely variable. You couldn’t just say, “Every smuggler in Syria is a jihadi.” Some of them were criminals, and understanding that variation is really important.
When you get lots of information loaded into a sort of dynamic platform like Palantir, you can ask any sort of question that comes to your mind, and you don’t necessarily know ahead of time the kinds of questions that you want to ask. I think that that’s illustrated even more now, when we look back, and we had no idea five years ago that the Sinjar records would be useful for at least having a starting point, a baseline, for understanding and thinking about the role of jihadis in a rebellion in Syria today.
How has human-centric data mining changed wartime intelligence tactics in the past few decades, especially since the first Persian Gulf War, or the 1990 Gulf War?
Sankar: In a more conventional fight, you have a well-defined adversary. I don’t want to pick on any country — but you have some country, that’s the adversary who is trying to hurt you. You’re trying to assess their motives, you’re trying to understand how they think about the world. Why are they moving tanks here or there? But as a result, the analysis — I don’t want to say it becomes linear — but the problem is significantly more constrained and focused. But in today’s world, it’s unclear who is your adversary or if you have an adversary. It’s more about understanding. Understanding is a very nuanced thing. And so you can’t just focus on you and the counter-party. You don’t even have a counter-party. It’s you and the world, and contextualizing every piece of information. And in a sense, you know, the Assad regime … you can understand the marriage of convenience that’s happening between the rebels and the ideologues, but you’re also going to want to understand in a post-Assad world how does that unfold? And a lot of that is going to be informed by, who are the ideologues? How are they meshing? Who are the personalities? What motivates them? It’s no longer a constrained counter-party. It’s a fabric, and mapping that fabric becomes very, very hard. It’s intractable using conventional means.
For the Combating Terrorism Center at West Point, why is it so hard to snuff out members of jihadist groups? Is it because their technology evades us? Or is it something more traditional, like really well kept secrets, or big guns, or, in this case, just confusing data?
Fishman: A lot of the folks that have been involved in terrorist organizations over time seem to have gone offline. They’re not exposing themselves to technological data collection, and, you know, at the end of the day — garbage in, garbage out, right? If you don’t have that much data to analyze, then you don’t have that much to analyze.
Sankar: From my technologist’s perspective, if you think about the fundamental cycle of understanding, usually what happens is that a human is sitting down, thinking. They develop a hypothesis. They explore that hypothesis. That hypothesis leads to some amount of insight. But more interesting than the insight is actually the subsequent hypotheses that are generated from that exploration. So I think of something, I have an idea, I explore it, I come up with three new ideas that I need to explore. So deep understanding comes from maybe going around that cycle 20 times, so the velocity through which you can go through that cycle becomes really important. If you’re drowning in data on one hand and you don’t know where to start on the other, the most important thing is being able to get started and iterate on those cycles very quickly. So how quickly can I ask questions of the data and get answers so I can generate the next set of meaningful questions? Because it’s going to take me a while before the questions I’m actually asking are truly insightful and change the course of how we’re thinking about the world.
I think that’s the difficulty with the computer-only approach. The questions you can ask are highly constrained, and you never get to the interesting questions. In this context, what’s very difficult about analyzing the jihadists is it’s a very recent phenomenon, it’s changing very quickly (on a world history scale, it’s recent), and so we don’t necessarily always know where to begin or have the depth of understanding that we do about, say, Russia, or just adversaries on a nation-state level.
In an article from Bloomberg Businessweek, the author cites a hypothetical example given by Palantir, in which we could use security video footage from an ATM or phone records or geolocation information to find out if a person is a potential terrorist. Shyam, as you mentioned in your talk, this kind of data mining obviously has dangerous implications for privacy and for people’s civil liberties. Could you speak to the gray areas in preemptive counterterrorism?
Sankar: Yes. … The paradox here is essentially in how you decide what data you’re going to share and what data you can use under what circumstances. We kind of bristle — I know it’s going to seem like a subtle technical distinction — but we bristle at the idea of being a data mining platform … [Here's] the data mining approach at the core level: Essentially, you develop an algorithm that looks at all the data to come up with things that the algorithm suspects are suspicious. Our approach is to have humans, who have to have predicates … where the data is actually protected. So, as a hypothetical example, if you’re in the Department of Defense, you can’t see any information on U.S. persons. Even though you’re seeing large amounts of data, the data you can see is constrained by constitutional and legal mandates, and having a way that is verifiable by a third party or an Inspector General that those mandates are enforced is part of the platform. So it’s a big deal. I think privacy and civil liberties are always a discussion around what, as a society, do we believe are the right rules and mandates, but our goal as a company is for democratic societies to be able to decide those rules and then guarantee that they’re enforced.
In Marc Goodman’s ominous talk on crime in the future, he gave an example of the terrorists’ ops center in the 2008 Mumbai attacks, which was monitoring BBC, al Jazeera, CNN, and local stations in real time. What if the terrorists had access to Palantir? Are you ever worried that your work will fail to “protect the Shire,” as it were?
Sankar: Obviously it would be devastating, and we do everything we can to keep [Palantir] out of the wrong hands. In terms of failing to protect the Shire, we aspire to make the world a better place. We obviously can’t prevent every bad thing from happening, but I think it’s a noble thing for computer scientists — especially people who would otherwise kind of jokingly be in a cubicle unable to affect the world — to do what they can to make the world a better place.
You said earlier that you and Marc caught up in Palo Alto. Are your views in conflict with one another? While you are very idealistic about technology, he seems to have the cynic’s view.
Sankar: I don’t know if I’d call him a cynic. I know it can seem that way, but the question is: are we thinking critically about [technology’s implications]? Because the cynic, in my mind, and maybe this is because I’m a technologist, would be the Luddite who says, “Wow, look at how all this can be used for evil. We should just give up.” I think in Marc’s mind, by thinking critically about how it could be perverted and building defenses on it, we ensure the future, and that’s a perspective I agree with. At PayPal, we were fighting the Russian mob. (I called it organized crime in my talk, so as not to call it the Russians; I don’t need any more scrutiny from them.) Their fundamental thing is they’re highly adaptive. They kept adapting to everything you learned how to block. And so I think that’s structurally similar to what Marc’s saying. It’s just the rate of adaptation and the level of damage the adversary can inflict have increased tremendously. So to not think about how someone could synthesize your DNA and put it at a crime scene, it calls the entire justice system that we’ve built into question since DNA testing came around, and I think that has some really interesting and fundamental implications. And I’m positive, as a technologist, as a society, we can figure out how to defeat that sort of gaming of our system, but not if we’re burying our heads in the sand.
What on the frontier of human and computer interaction excites you?
Sankar: I think there’s a lot more to come with the integration of non-computer data. It could be video, images. It could be the way that people think about and categorize this sort of information, but essentially applied to really important problems. We’ve been doing some of this stuff with child pornography. The image itself has a lot of context — [for example,] where are they located? What’s going on? The platform was used to take down the largest child pornography ring in the world. At Google Ideas, we did a presentation — Brian’s actually pretty closely involved in it — on organ trafficking. Every one of these domains that we’re pushing into influences how we want to think about human-computer symbiosis. The question we tend to ask is: what is the problem in the world we want to solve? How can the technology support it? Which is exactly the same position that Licklider was coming from when he was thinking about human-computer symbiosis. And yes, artificial intelligence would be great, but today, what can I do today? Today I can use the computer to solve the problems in this way.
Fishman: Shyam’s the technologist here, but the idea that really fascinates me is the notion that one of the things that we’re doing at Palantir is redefining how information is stored and how people interact with it fundamentally. In the future, you could have libraries that were accessible through a platform like Palantir, where you are essentially exploring information via relationships, and books are modeled in Palantir. And that’s the kind of thing that I would like to see in the future, is ways to break down existing corpuses of data so that they’re more searchable, more accessible, easier for people to access globally. Because at the end of the day, the whole purpose here is to make this information accessible to people, so that they can do things with it, and I think that there is a lot we can do about bringing different information sources into this platform in order to do that.
Sankar: We used to call it emancipatory intelligence. Most systems you need to be a technologist to use. Google made every person a researcher. Palantir makes every person an analyst.