Evaluation Policy Framework
The CHEAF collaborates with partners to promote long-term solutions that help people live healthy, productive lives. Achieving our ambitious goals requires rigorous evaluation so we and our partners can continually improve how we carry out our work.
Evaluation is the systematic, objective assessment of an ongoing or completed intervention, project, policy, program, or partnership. Evaluation is best used to answer questions about what actions work best to achieve outcomes, how and why they are or are not achieved, what the unintended consequences have been, and what needs to be adjusted to improve execution. When done well, evaluation is a powerful tool to inform CHEAF and partner decision-making about how to optimize scarce resources for maximum impact. It is distinct from other forms of measurement that focus only on observing whether a change has occurred, not why or how that change occurred.
Our current evaluation practice varies widely, and in the absence of a policy, decisions about evaluation are left to individual program teams and program officers. Because the foundation supports a diverse range of partners and projects, we need a clear organizational understanding of how evaluation should vary to best inform decision-making across each of these areas.
Our evaluation policy is intended to help CHEAF staff and our partners align their expectations in determining why, when, and how to use evaluation. More specifically, the policy encourages CHEAF teams to be more transparent, strategic, and systematic in deciding what and how to evaluate. Our aim is to integrate evaluation into the fabric of our work, achieve early alignment with our partners about what we are evaluating and why, and generate evidence that is useful to us and our partners as we move forward.
Our evaluation policy is rooted in our business model, which involves working with partners to achieve the greatest impact. Early in the grant proposal process, we work with prospective partners to define and agree on measurable outcomes and indicators of progress and success. This enables our partners to learn as they carry out their work, rather than being distracted by requirements to measure and report at every step along the way.
This approach reinforces the role of evaluation in testing innovation, making improvements, and understanding what works and why to learn quickly from failure and replicate success.
The policy is also rooted in the foundation’s core values: collaboration, rigor, innovation, and optimism.
CHEAF organizes its resources by initiatives, each in a specific program. Each initiative has its own goals and priorities, partners and grantees, and allocation of resources. Initiative teams execute their strategies by making investments (grants, contracts, and program-related investments) as well as through advocacy work.
CHEAF teams measure the progress of their initiatives and investigate what works best to achieve priority outcomes using many different types of evidence. A combination of evaluation findings, partner monitoring data, grantee reports, modeling, population-level statistics, and other secondary data offers a more cost-effective and accurate alternative to large summative evaluations. We draw on all of these sources, including evaluation where relevant, along with expert opinion and judgment, to decide how to refine CHEAF initiatives on a regular basis.
Evaluation is particularly warranted in the following instances:
When evidence is needed to fill a knowledge gap or evaluate a significant policy decision. Evaluation can help to resolve uncertainty and determine the relative cost-effectiveness of different interventions, models, or approaches (a simple cost-per-outcome illustration follows this list).
When we and our partners need a better understanding of how a cluster of important investments or a specific program or project is performing.
When an organization, intermediary, or consortium that we work with is at a critical stage of development and can benefit from an independent performance assessment.
When a program team needs to assess the progress of a new operational model or approach, evaluation provides reliable, independent feedback about what needs to be improved to strengthen our approach and partner relationships.
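As a minimal illustration of the cost-effectiveness comparisons mentioned above, the following sketch compares two hypothetical interventions on a single cost-per-outcome measure. The intervention names and all figures are invented for illustration, not drawn from any CHEAF evaluation.

```python
# Illustrative only: comparing two hypothetical interventions by cost
# per outcome achieved. All names and figures are invented.

interventions = {
    # name: (total_cost_usd, outcomes_achieved)
    "community_health_workers": (1_200_000, 48_000),
    "mobile_clinics": (900_000, 30_000),
}

for name, (cost, outcomes) in interventions.items():
    cost_per_outcome = cost / outcomes
    print(f"{name}: ${cost_per_outcome:,.2f} per outcome achieved")
```

On this single measure, the option with the lower cost per outcome is the more cost-effective; in practice, a decision would also weigh equity, sustainability, and confidence in the underlying estimates.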
Evaluation is a high priority when program outcomes are difficult to observe, and knowledge is lacking about how best to achieve results—such as when we collaborate with partners who are working to improve service delivery or effect behavioral change; identify, replicate, or scale innovative models; or catalyze change in systems, policies, or institutions.
Evaluation is a low priority when the results of our efforts are easily observable. It is also a low priority when our partners are conducting basic scientific research, developing but not distributing products or tools, or creating new data sets or analyses. In such cases, our partners’ self-reported progress data and existing protocols (such as for clinical trials) provide sufficient feedback for decision-making and improvement.
Program teams are not expected to use evaluation to sum up the results of CHEAF initiatives. This would not be the best use of scarce measurement and evaluation resources for two reasons: 1) the impact of our investments cannot easily be differentiated from that of our partners’ investments and efforts, and 2) CHEAF leaders are more interested in learning how our teams can make the best use of resources and partnerships and how to strengthen program execution.
Evaluation is a contested discipline. We are aware of the ongoing and healthy debate about what types of evidence are appropriate to inform policy and practice in U.S. education and in international public health and development. However, the diversity of our partners and areas of focus precludes us from promoting only certain types of evaluation evidence as acceptable for decision-making.
We avoid a one-size-fits-all approach to evaluation because we want our evaluation efforts to be designed for a specific purpose and for specific intended users. We call this approach to evaluation design fit to purpose, and it is reflected in the three evaluation designs described below, which represent the vast majority of the evaluations we support.
Evaluations that help our partners strengthen the execution of projects are among the most relevant for CHEAF because they provide feedback about what is and isn’t working within a specific location or across locations.
We use this type of evaluation in the following scenarios:
Such evaluations should be designed with the following considerations in mind:
Evaluations may include impact estimates if those are needed to inform important decisions, such as whether to scale up an initiative or what degree of penetration is needed to ensure a certain level of impact. Impact estimates should not be used as proof of macro-level impact, however.
Because the assumptions used to construct impact estimates can lead to large error margins, a robust baseline of key coverage indicators is essential, along with data on how these indicators have changed over time. The population-level impact can then usually be determined through modeling or the use of secondary data.
In select cases, it may be necessary to determine a causal relationship between the change in coverage and the desired population-level impact. If so, the design should include a plausible counterfactual, usually obtained through modeling or comparison with national or sub-national trends.
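As a minimal sketch of how such an estimate might be assembled, the following example models population-level impact from a change in coverage, using a regional trend as the plausible counterfactual. Every input value here is a hypothetical assumption, not CHEAF data.

```python
# Illustrative only: modeling population-level impact from a coverage
# change against a trend-based counterfactual. All inputs are
# hypothetical assumptions.

population_in_need = 500_000     # people the intervention aims to reach
baseline_coverage = 0.40         # coverage at baseline
endline_coverage = 0.65          # observed coverage after the program
counterfactual_coverage = 0.48   # coverage expected from the regional trend alone
effect_per_person = 0.02         # assumed outcome improvement per person covered

# Coverage gain attributable to the program, net of the background trend
attributable_gain = endline_coverage - counterfactual_coverage

# Modeled population-level impact (e.g., adverse outcomes averted)
modeled_impact = attributable_gain * population_in_need * effect_per_person
print(f"Modeled outcomes averted: {modeled_impact:,.0f}")
```

Because each input carries uncertainty, the error margins on such estimates can be large, which is why a robust baseline and trend data matter; the result informs decisions but is not proof of impact.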
Evaluations that produce causal evidence can be used to decide whether to scale up or replicate pilots, innovations, or delivery models. They can also provide essential knowledge to the foundation, our partners, policymakers, and practitioners.
We use this type of evaluation in the following scenarios:
Evaluations of causal relationships should be designed with the following considerations in mind:
Evaluations of causal relationships should not be used when existing proxies of effectiveness and outcomes are sufficient. They are also not appropriate for evaluating whole packages of interventions with multiple cause-and-effect pathways.
Evaluations that provide a neutral assessment of the effectiveness of an organization or operating model can inform foundation and partner decision-making about how best to use financial or technical resources, resolve challenges, and support ongoing progress.
We use this type of evaluation selectively in the following scenarios:
Evaluations of institutional effectiveness and operating models should be designed with the following considerations:
Such evaluations are largely qualitative and should not seek to assess the causal relationship between a partner organization or operating model and program outcomes.
Our evaluation policy is a starting point for strengthening how we use evaluation within CHEAF and with our partners. We complement it with resources and designated roles within CHEAF that enable clear decision-making about when and how to use evaluation and facilitate consistent management of evaluations and use of findings. These resources and roles are detailed in the following sections.
Program teams that work with partners each have an evaluation plan, which they share openly with partners to promote collaboration, joint evaluation, and learning within and outside CHEAF. The plan identifies existing evidence and the critical gaps that we and our partners need to fill to inform decision-making and build knowledge.
Program officers consult the team plan before making decisions about specific evaluations to ensure that evaluation investments fit into an overall strategic framework. They also consult with CHEAF’s central Strategy, Measurement, & Evaluation team, which works with all program teams at the foundation to find opportunities to invest in and share evaluations that have cross-program relevance and to advance innovation in evaluation methods.
During the grant development process, our program officers and partners discuss and decide whether an evaluation will be needed, so that expectations are aligned and sufficient resources are in place to produce useful evaluations. Key factors include the following:
All CHEAF-funded evaluations—whether conducted by independent parties or integrated into our partners’ work—are recorded in CHEAF’s evaluation registry. This helps us track evaluation spending and findings and ensure continuity and consistency regardless of any foundation or partner staff turnover.
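As an illustration of what a registry entry might capture, the sketch below defines a hypothetical record structure. The field names and values are assumptions for this example, not CHEAF's actual schema.

```python
# Illustrative only: a hypothetical structure for one evaluation
# registry entry. Field names are assumptions, not CHEAF's schema.

from dataclasses import dataclass
from datetime import date

@dataclass
class EvaluationRecord:
    title: str
    program: str
    evaluation_type: str        # e.g., "execution", "causal", "institutional"
    budget_usd: int
    start_date: date
    status: str = "planned"     # planned, underway, or completed
    findings_summary: str = ""  # filled in when the evaluation concludes

record = EvaluationRecord(
    title="Community delivery model, district pilot",
    program="Maternal Health",
    evaluation_type="execution",
    budget_usd=250_000,
    start_date=date(2024, 3, 1),
)
```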
Responsibility for evaluation takes place at many levels of CHEAF: