Statistics is a key tool in science, helping us to understand what data reveal about important questions. Yet the idea of “statistical evidence” itself remains difficult to define. Professor Michael Evans of the University of Toronto explores this complex issue in a recent study published in the journal Encyclopedia (2024).

The field of statistics is concerned with situations where there is a quantity of interest whose value is unknown, data has been collected, and it is believed that this data contains evidence concerning the unknown value. There are then two major problems that statistical theory is supposed to answer based on the data: (i) provide a reasonable estimate of the quantity of interest together with a measure of the accuracy of that estimate, and (ii) assess whether there is evidence in favor of or against a hypothesized value for the quantity of interest, together with a measure of the strength of that evidence. For example, an estimate of the proportion of those infected with COVID-19 who will suffer serious disease is certainly of interest, as is knowing whether measurements taken by the Webb telescope provide evidence for or against the hypothesized existence of dark matter.

As the paper discusses, there are two broad themes for how these problems are addressed: the evidential approach and the decision approach. The evidential approach focuses on ensuring that any statistical methodology used is clearly based on the evidence in the data. By contrast, decision theory aims to use methodologies that minimize potential losses, based on an assumed penalty for incorrect conclusions. For scientific applications, however, it is argued that prioritizing the evidence in the data fits comfortably with the fundamental aim of science, namely, determining the truth. Professor Evans’ article places him firmly in the evidential camp.

The following quote from the paper establishes a basic problem for the evidential approach: “Most statistical analyses refer to the concept of statistical evidence as in phrases like ‘the evidence in the data suggests’ or ‘based on the evidence we conclude’, etc. It has long been recognized, however, that the concept itself has never been satisfactorily defined or, at least, no definition has been offered that has met with general approval.”

The fundamental issue for the evidential approach is then: how should statistical evidence be defined? For without a clear prescription of what statistical evidence means, how can it be claimed that a particular methodology is evidence-based? Professor Evans’ article reviews many of the approaches taken over the years to address this question.

There are some well-known statistical methods that are used as expressions of statistical evidence. Many are familiar with the use of p-values for problem (ii). There are well-known issues with p-values as measures of statistical evidence, and some of these are reviewed in the article. For example, a cut-off value alpha must be chosen to determine when a p-value is small enough to say there is evidence against a hypothesis, and there is no natural choice for alpha. Moreover, p-values never provide evidence in favor of a hypothesis being true. The concept of a confidence interval is closely tied to the p-value and so suffers from similar defects.
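A small illustrative sketch of these points, using hypothetical numbers (62 successes in 100 trials) and a standard two-sided binomial test from SciPy rather than anything taken from the paper:

# Hypothetical example, not from the paper: a two-sided binomial test of the
# hypothesis that an unknown proportion equals 0.5, given 62 successes in 100 trials.
from scipy.stats import binomtest

result = binomtest(k=62, n=100, p=0.5, alternative="two-sided")
print(f"p-value = {result.pvalue:.4f}")

# Whether this counts as "evidence against" the hypothesis depends entirely on
# the cut-off alpha, and there is no natural choice for it.
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: reject hypothesis? {result.pvalue < alpha}")

# Note also that a large p-value would not be interpreted as evidence in favor of 0.5.

The same hypothetical data are reused in the relative belief sketch further below, so the two approaches can be contrasted.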

One substantial attempt to establish the concept of statistical evidence as central to the field of statistics was made during the 1960s and 70s by Allan Birnbaum, and his work is discussed in the paper. This resulted in the discovery of a number of interesting relationships among principles that many statisticians subscribe to, such as the likelihood, sufficiency, and conditionality principles. Birnbaum did not succeed in fully characterizing what is meant by statistical evidence, but his work points to another well-known division in the field of statistics: frequentism versus Bayesianism. Birnbaum sought a definition of statistical evidence within frequentism. The p-value and the confidence interval are both frequentist in nature. A frequentist imagines the statistical problem under study being repeated many independent times and then searches for statistical procedures that will perform well in such a sequence.

By contrast, a Bayesian wants the inference to depend only on the observed data and does not consider such an imagined sequence. A cost of the Bayesian approach is the need for the analyst to provide a prior probability distribution for the quantity of interest that reflects what the analyst believes about its true value. After seeing the data, a Bayesian statistician is compelled to update their beliefs, as expressed by the posterior probability distribution of the quantity of interest. It is the comparison of prior and posterior beliefs that leads to a clear definition of statistical evidence through the intuitive principle of evidence: if the posterior probability of a particular value being true is greater than the corresponding prior probability, then there is evidence that this is the true value, and if the posterior probability is smaller than the prior probability, then there is evidence against it being the true value. It is the evidence in the data that changes beliefs, and the principle of evidence characterizes this explicitly.
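In symbols, using standard Bayesian notation that is not taken from the article itself (a prior probability written as a function of a hypothesized value, a posterior probability given the observed data x), the principle of evidence can be written as:

\[
\pi(\theta_0 \mid x) > \pi(\theta_0) \;\Longrightarrow\; \text{evidence in favor of } \theta_0,
\qquad
\pi(\theta_0 \mid x) < \pi(\theta_0) \;\Longrightarrow\; \text{evidence against } \theta_0 .
\]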

As explained in Professor Evans’ paper, additional elements beyond the principle of evidence are required. To estimate the quantity of interest and to measure the strength of the evidence, it is necessary to order its possible values, and a natural way to do this is through the relative belief ratio: the ratio of the posterior probability of a value to its prior probability. When this ratio is greater than 1 there is evidence in favor, and the larger the ratio the stronger that evidence; conversely, when the ratio is less than 1 there is evidence against, and the smaller the ratio the stronger that evidence. The relative belief ratio leads to natural answers to both the estimation and the hypothesis assessment problems.
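As a minimal sketch of how such a ratio can be computed, assume a simple Beta-Binomial model with the same hypothetical data as in the p-value example above (62 successes in 100 trials); for a continuous quantity the ratio of posterior to prior probabilities is taken as a ratio of densities. None of the specific numbers come from the paper.

# Minimal sketch, assuming a Beta(a, b) prior on a proportion theta and
# k successes observed in n trials, so the posterior is Beta(a + k, b + n - k).
from scipy.stats import beta

a, b = 1.0, 1.0          # uniform prior on theta (an assumption for illustration)
n, k = 100, 62           # hypothetical data, as in the p-value sketch above

def relative_belief(theta0):
    """Relative belief ratio: posterior density over prior density at theta0."""
    return beta.pdf(theta0, a + k, b + n - k) / beta.pdf(theta0, a, b)

for theta0 in (0.5, 0.6, 0.7):
    rb = relative_belief(theta0)
    verdict = "evidence in favor" if rb > 1 else "evidence against"
    print(f"theta0 = {theta0:.2f}: relative belief ratio = {rb:.2f} ({verdict})")

In the relative belief approach, the value where this ratio is largest serves as the natural estimate, which is how the ratio also answers the estimation problem.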

There is much more discussed in the paper, including how to deal with the inherent subjectivity of statistical methodology, for example through model checking and checking for prior-data conflict. Perhaps most surprising, however, is that the evidential approach via relative belief leads to a reconciliation of frequentism and Bayesianism. Part of the story is that the reliability of any inference should always be assessed, and that is what frequentism does. This arises in the relative belief approach via controlling the prior probabilities of obtaining evidence against a value when it is true and of obtaining evidence in favor of a value when it is false. In the end, inference is Bayesian, as it reflects beliefs and provides a clear definition of statistical evidence, while controlling the reliability of inferences is frequentist. Both play key roles in the application of statistics to scientific problems.
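As a rough sketch of what such reliability checks might look like in the hypothetical Beta-Binomial example used above (again, an illustration rather than code from the paper), the two prior probabilities just mentioned can be approximated by simulation:

# Rough simulation sketch, assuming the same Beta-Binomial setup as above:
# estimate the prior probability of obtaining evidence against theta0 when it
# is true, and of obtaining evidence in favor of theta0 when the true value is
# instead drawn from the prior (so it differs from theta0 with prior probability 1).
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
a, b, n = 1.0, 1.0, 100        # same prior and sample size as above
theta0, sims = 0.5, 20_000     # hypothesized value and number of simulations

def rb(theta, k):
    # Relative belief ratio of theta after observing k successes in n trials.
    return beta.pdf(theta, a + k, b + n - k) / beta.pdf(theta, a, b)

# Evidence against theta0 when theta0 is actually true.
k_true = rng.binomial(n, theta0, size=sims)
p_against_when_true = np.mean(rb(theta0, k_true) < 1)

# Evidence in favor of theta0 when the true value is drawn from the prior.
theta_prior = rng.beta(a, b, size=sims)
k_false = rng.binomial(n, theta_prior)
p_favor_when_false = np.mean(rb(theta0, k_false) > 1)

print(f"P(evidence against theta0 | theta0 true)  ~ {p_against_when_true:.3f}")
print(f"P(evidence in favor of theta0 | theta0 false) ~ {p_favor_when_false:.3f}")

If these prior probabilities of being misled are small, the relative belief inferences can be reported with some assurance of their reliability, which is the frequentist ingredient described above.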

As the world becomes more reliant on data-driven insights, understanding what qualifies as solid evidence is increasingly important. Professor Evans’ research offers a thoughtful foundation to tackle this pressing issue.

Journal Reference

Evans, M. “The Concept of Statistical Evidence, Historical Roots, and Current Developments.” Encyclopedia 2024, 4, 1201–1216. DOI: https://doi.org/10.3390/encyclopedia4030078

About the Author

Michael Evans is a Professor of Statistics at the University of Toronto. He received his Ph.D. from the University of Toronto in 1977 and has been employed there ever since, with leaves spent at Stanford University and Carnegie Mellon University. He is a Fellow of the American Statistical Association, and he served as Chair of the Department of Statistics (1992-97), Interim Chair (2022-23), and President of the Statistical Society of Canada (2013-14). He has served in a number of editorial capacities: Associate Editor of JASA Theory and Methods (1991-2005), Associate Editor of the Canadian Journal of Statistics (1999-2006 and 2017-present), Associate Editor of the journal Bayesian Analysis (2005-2015) and an Editor (2015-2021), Subject Matter Editor for the online journal FACETS (current), and Associate Editor of the New England Journal of Statistics in Data Science (current).
Michael Evans’ research has been concerned with multivariate statistical methodology, computational statistics, and the foundations of statistics. A current focus of his research is the development of a theory of inference called relative belief, which is based on an explicit definition of how to measure statistical evidence. His research is also concerned with the development of tools to deal with criticisms of statistical methodology associated with its inherent subjectivity. He has authored or co-authored numerous research papers as well as the books Approximating Integrals via Monte Carlo and Deterministic Methods (with T. Swartz), published by Oxford in 2000; Probability and Statistics: The Science of Uncertainty (with J. Rosenthal), published by W.H. Freeman in 2004 and 2010; and Measuring Statistical Evidence Using Relative Belief, published by CRC Press/Chapman and Hall in 2015.