Evaluating the ICC’s Chief Prosecutor

Geoff Dancy joins us for this contribution to our ongoing joint symposium with Opinio Juris on the Next ICC Prosecutor. Geoff is an Associate Professor of Political Science at Tulane University.

Systematically evaluating the performance of prosecutors is extraordinarily difficult. Even in the criminal law-obsessed United States, attempts to develop prosecutor performance measures have not really gotten off the ground.

One problem is that publics often do not share a clear conception of what they want from prosecutors, despite the power that these individuals wield in most common law systems. Another problem is that performance evaluation reports can easily get bogged down in an off-putting morass of indicator design and audit culture newspeak.

The challenge of developing shared evaluative criteria and tools of prosecutor assessment, if hard at the domestic level, is even harder at the International Criminal Court (ICC). The job of chief prosecutor is a wholly unique position that has existed for less than two decades. Its occupant serves a much wider group of constituencies than do national or local prosecutors, and she faces radical disagreements on what the office should try to achieve—fueled by well-meaning and not-so-well-meaning observers alike.

On top of that, in June 2021, when the newest prosecutor assumes office, he or she will find limited guidance on how to measure success. As I and others have written, the Court itself has attempted to develop performance benchmarks over the last five years, but mainly produced a collection of 63 loosely configured indicators with little discussion of how they fit into a coherent theory. And it’s doubtful that this year’s much anticipated independent expert review of the ICC will provide a clearer set of evaluative guidelines.

In fact, we should probably accept the reality that there will never be a single, uniform performance framework that can be applied to the Office of the Prosecutor (OTP). Running OTP is less like flying a plane and more like triage. The job cannot be converted into a checklist.

Indeed, the way we conduct performance review of the ICC is contingent on our plural worldviews and the divergent preferences that derive from them. We will never eliminate this pluralism, but we can do more to recognize it and to think more systematically when judging the ICC Prosecutor’s work. I have three suggestions in this regard.

  1. Distinguish between performance evaluation and impact assessment

Though they are regularly lumped together by evaluation scientists, gauging how well a person or organization has performed a job is distinct from assessing their broader impact in the world.

Performance evaluation simply means determining the worth of a thing. Worth, or value, is subjective, so this task necessarily introduces normative judgment. In the case of OTP operations, performance evaluation usually involves weighing how well the chief prosecutor makes decisions in situations of discretion.

Ordinarily, critics conduct such evaluations by approving or disapproving of “outputs” produced by prosecutorial choices. These choices include how long to engage in preliminary examination, which investigations to pursue, which individuals to target, which charges to bring, whether to seal an arrest warrant, how to collect evidence, and how to present evidence. Negative evaluations, which seem most common, usually take the following form: “X action by OTP was not ideal.” This implies that things would have been more “just, right, equitable, or reasonable” if the Prosecutor had done this instead of that.

Impact assessment is different. It means studying how the outputs of particular decisions translate into observable changes in society. By definition, this must include some kind of causal inference.

For example, it is widely accepted that Luis Moreno Ocampo’s decision in late 2010 to charge rival Kenyan politicians Uhuru Kenyatta and William Ruto – for crimes against humanity related to election violence from three years prior – inspired the two to form an unexpected anti-ICC coalition. Forces combined, they came from behind to win the 2013 presidential election and allegedly used their new power to intimidate witnesses and promote African non-cooperation with the Court. From this, many contend (e.g., here and here) that the ICC intervention had perverse effects: it entrenched the impunity of perpetrators, caused harm to victims and witnesses, and undercut the Court’s general legitimacy.

This contention is an impact assessment because it posits a relationship between charges issued by OTP (X) and external events (Y), in this case an electoral victory by accused atrocity criminals.

It is useful to distinguish between impact assessments like this and performance evaluations because good impacts do not always come from good performance. Some lauded the Prosecutor for simultaneously summoning Kenyatta and Ruto because doing so demonstrated a “political balance” that would dispel complaints about victor’s justice. However, this positively evaluated move arguably produced impacts that were on balance negative. The point is, we may positively judge a strategic move in the moment, only to find out later that the outcomes of that move were less than desirable.

  2. Be transparent about performance evaluation criteria

This raises an important point. When one is evaluating the relative worth of a prosecutorial decision or strategy, it is imperative that one specify the normative criteria being applied. The ICC commentariat often rushes to judgment following big moves by the Prosecutor, but rarely do writers specify how they have chosen to frame their particular critique.

There are no outright successes for the Prosecutor. Whatever decision she makes will be attacked by some and complimented by others. The reason is that there are myriad standards available for appraising her work, and they do not all fit neatly into the same theoretical model. I can think of four such models, all of which point to different evaluative touchstones, measuring good performance in disparate ways.

The first is the managerial model, which comes from public administration and zeroes in on the prosecutor’s role as the leader of an organization with a public budget and a regular staff. For the managerialist, the most important criteria for evaluating the chief prosecutor are economy and efficiency. In short, the prosecutor is operating best when limiting costs and, over time, devoting fewer resources to producing more convictions.

Managerialism pushes the Prosecutor to do whatever she can to secure the apprehension of suspects under warrant, which means being pragmatic and cooperating with states. In many ways, this reflects the outlook of the Assembly of States Parties, which would prefer a “basic size” OTP with quicker trials and more bang for the buck. The benchmark outputs for this model are low cost, expeditiousness, arrest rate, and conviction rate.

The second is the rule of law model. Advocates who care most that rule of law standards remain unsullied place a great deal of emphasis on procedural fairness, transparency, and impartiality, as well as the prosecutor’s adherence to apoliticism. Though the structure of the court imposes “difficult policy dilemmas on even the most well-meaning and politically detached Prosecutor,” rule of law purists will insist on absolute non-recognition of political circumstances in Prosecutor decision-making.

According to this model, the most important criterion for evaluating the ICC Prosecutor is whether she horizontally transmits legal norms throughout the international community and vertically transmits legal accountability through positive complementarity. The benchmark outputs for this model are factors like objective, evidence-based investigations and filings; accessibility of information; and local capacity-building in target states.

Third is the anti-impunity model. Presumably, because the ICC has a consistent budget that is unlikely to balloon any time in the near future, the Prosecutor will only be able to pursue a certain handful of cases in any given year. Recognizing this, those who think that the Court serves an expressive or “socio-pedagogical” function (that it sends a message of accountability to would-be perpetrators in the future) might place more emphasis on making a splash.

Rather than plodding along with medium-level defendants, anti-impunity advocates would urge the Prosecutor to go after big fish, regardless of how much political power they wield. For instance, adherents to this model would support moving forward with an investigation in Afghanistan, even if it means running afoul of great powers like the United States. Anti-impunity benchmark outputs are things like high-profile arrest warrants and public communications.

Fourth and finally is the victims’ rights model. For observers most concerned about the right to effective remedy and reparations, it is absolutely critical for the Prosecutor to consult, register, protect, and encourage victims to participate, even if it creates inconveniences. For instance, to victims’ rights proponents, it was wrong for Moreno Ocampo to resist a right for victims to participate in investigations in the DRC situation, even though granting such a right would have limited the Prosecutor’s discretion and sapped precious OTP resources.

Victims’ rights advocates might also insist that the Prosecutor pursue many more cases down the chain of command, as victims are most tormented by local atrocity criminals, not architects of violence living in the capital. Benchmark outputs for this model are number of trials, number of victim participants, witness security measures, and reparations.

These four models are not necessarily mutually exclusive. One might agree in principle with elements of all of them. But the evaluative prescriptions that derive from each will at times contradict. For example, the managerial push for expeditiousness might conflict with the emphasis on fair trials and rights of the defense, both important elements of the rule of law model. Or the victims’ rights-based demand for more trials with more victim participation will obviously come into conflict with the managerial push for less expenditure, especially given that over 70% of ICC costs go to staff, not operating expenses. And the anti-impunity emphasis on symbolic messaging and attention-grabbing maneuvers might fly in the face of all other models’ emphasis on careful political and legal strategy.

These theoretical models are also not collectively exhaustive; there are probably others. However, even this limited presentation demonstrates that the Prosecutor cannot perform well according to all sets of standards at once. It would be best if analysts acknowledged this reality—that decisions imply tradeoffs—when deploying particular benchmarks for measuring Prosecutor performance.

  3. Attempt to link outputs of interest to impacts in the world

Being open about one’s evaluative criteria means engaging in normative theory: Why do I think that some decisions are better than others? Impact assessment means engaging in causal theory: Can it be shown that certain decisions and outputs create change?

A regrettable fact about studies of the ICC is that, although the Court is followed closely and studied extensively, we actually do not know whether its performance on various key benchmarks is at all related to its impacts in the world. For instance, we all think that convictions matter, but do we know that they do?

Assessing the effect that the Prosecutor’s decisions have on outcomes is a three-step process: (1) identify certain instances of prosecutorial discretion, (2) isolate and evaluate the output (or non-output) that derives from the Prosecutor’s choice, and (3) test whether that output produces outcomes of interest in the world.

By now, you have likely spotted a hole in my argument. Instead of separating performance evaluation from impact assessment, why don’t we just evaluate the prosecutor’s performance based solely on how effectively it generates positive impacts? In other words, why not promote an alternative evaluative model called pure consequentialism? In short, if the Prosecutor produces good outcomes, she has performed well.

It’s not that easy. One must choose which impacts are the most desirable. What should be the ICC’s goals? Regular answers are that the Court should promote perceptions that it is a legitimate and efficient institution, that it should aim to end conflicts in situation countries, or that it should attempt to deter future atrocities. Because the world is complex, we cannot possibly analyze all impacts, intended and unintended, at once. Whichever outcome we highlight is a choice that must be guided by the analyst’s preferences, which fall back on our normative models.

Once one isolates an outcome of interest, a second problem is that conducting impact assessment is very hard. It requires careful counterfactual reasoning.

Say we think that Prosecutor incompetence led to Bemba’s acquittal (output), which will lead to more violence in Central African Republic in the near term (outcome of interest). This means saying “Near-term violence in the CAR following Bemba’s acquittal would not have existed in an alternative world where his conviction was upheld.” Technically, it is impossible to prove this is true because we cannot observe the non-existent alternative world we are using for comparison. It is a mental construct. Nevertheless, we must assess the plausibility of the argument as best we can, with resort to observable and undeniable patterns in collected evidence.
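
For readers who find notation helpful, the counterfactual claim can be sketched in the potential-outcomes shorthand common in causal inference. This is an illustrative addition, not notation drawn from the studies discussed here, and the symbols are hypothetical labels: $Y(a)$ stands for near-term violence in the CAR in the world where Bemba is acquitted, and $Y(c)$ for violence in the alternative world where his conviction is upheld.

```latex
\[
\tau = Y(a) - Y(c), \qquad \text{claim: } \tau > 0
\]
```

Only $Y(a)$ is ever observed, so $\tau$ cannot be calculated directly. It can only be argued for, by weighing the claim against observable patterns in the evidence.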

The best scholarship on the impact of the Court does exactly that. Some authors use controlled statistical models to demonstrate correlations between Court actions and positive impacts like fewer acts of mass violence over time (here, here, and here), but also negative impacts like longer tenure of abusive leaders. Others carefully trace the causal mechanisms that link certain Prosecutorial interventions to conflict dynamics in specific contexts like the DRC and Kenya, and find decidedly mixed results. In both types – correlational and process-tracing impact studies – the best scholarship consists of formally outlined assumptions and theories, which can be weighed with evidence.


When it comes to reviewing the work of the ICC’s chief prosecutor, the state of the field is this: there is no standard approach. Everyone must identify their own preferred criteria for evaluating performance, and do their best to study the links between the prosecutor’s choices, measurable outputs, and real-world impacts.

My argument, in short, boils down to this: when criticizing the Prosecutor’s decisions, specify the normative assumptions or theory that undergird your criticism. Then, if you are able, causally assess whether your understanding of good Prosecutor performance actually matters in the real world.

In this process, most beneficial are transparency, a willingness to challenge one’s own worldview, and a healthy skepticism in the face of unjustified certitude. Incidentally, these are all qualities we might hope to find in a good Prosecutor.

