Comments on Proof in Science from Alan Journet, Co-Facilitator of Southern Oregon Climate Action Now

For some 30 years I taught Introductory Biology courses I developed for biology majors and non-majors at Southeast Missouri State University. The focus of these courses was on exploring how science is conducted, along with the contributions it offers to our understanding of how the world works and the limitations of the process. The following discussion represents a brief summary of the understanding that I hoped students would take away from the experience.

Science and Proof

The methodology we have developed for exploring how the material (physical / real) world operates is called ‘science.’  The goal of science is to seek and elucidate patterns or relationships in that real world. Science remains the most effective method we have for determining the cause and effect relationships that help us identify the natural rules (the physical principles) governing how our world operates.  However, although this process remains the most reliable method for understanding the inherent rules determining how our world operates, it has limitations.

The ‘scientific method’ is based on the principle of testing hypotheses. The term ‘testing’ is often misunderstood. In science, this means seeking or constructing situations (i.e., experiments) that can tell us if the hypothesis is false. The experimenter spends considerable time exploring what variables (x1, x2, x3, etc.) might influence a given outcome (y), and seeking or designing situations (experiments) that will indicate if the hypothesized cause (x1) is the prime factor influencing (y), or if some other variable (x2, x3, etc.) is the prime cause, or if fluctuation in y is purely random.

Those who question the conclusions of science often argue that ‘this’ or ‘that’ conclusion has not been ‘scientifically proven.’  This criticism of scientific conclusions exhibits an unfortunate failure to understand the limitations of science, or – equally likely – a willingness to promote public misunderstanding of what the scientific method can provide.

There are essentially three reasons relating to the experimental process employed by science that explain why science cannot provide us with certainty:

1. Different results in a different place and time

When we conduct experiments whose results indicate that (x1) causes (y), those experiments are conducted in a specific location at a specific time. However, if another research team were to conduct the same experiment in a different place and time, they might record different results. While our results might suggest (x1) causes (y), the results elsewhere may not. Thus, we cannot conclude that our results are the only possible results. This limitation is remedied by conducting a diversity of tests at different times and locations, and assessing what the pattern suggests. Finding the same results many, many times is what leads to consensus.

2. Recording the expected result for the wrong reasons

When our experiment indicates that variable (x1) causes result (y), that conclusion rests on our having designed an experiment that controls for all other variables that might possibly have affected (y). However, it is always possible that there is some variable (xn) influencing (y) that we have not even imagined or considered, and thus have not controlled. The only remedy that accounts for this possibility is for the experimenter to understand enough about the system in which they are operating to have controlled for all other possible variables. Clearly, however, we can never be absolutely certain that there does not exist some other mystery variable about which we have no knowledge.
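The mystery-variable problem can be illustrated with a small simulation. This is only a sketch under invented assumptions: the sample size, the random seed, and the setup in which a single unmeasured variable (xn) feeds into both (x1) and (y) are hypothetical, chosen to show how an uncontrolled variable can produce the expected result for the wrong reason. In the simulation, (x1) has no direct effect on (y) at all.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out so the sketch is self-contained."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(hidden_varies, n=2000, seed=7):
    """Correlation between x1 and y when x1 has NO direct effect on y.

    A hidden variable xn is added to both. If hidden_varies is False,
    xn is held constant, which is what 'controlling for it' means here.
    """
    rng = random.Random(seed)
    x1, y = [], []
    for _ in range(n):
        xn = rng.gauss(0.0, 1.0) if hidden_varies else 0.0
        x1.append(xn + rng.gauss(0.0, 1.0))
        y.append(xn + rng.gauss(0.0, 1.0))
    return pearson_r(x1, y)

print(f"hidden variable uncontrolled:  r = {simulate(True):+.2f}")
print(f"hidden variable held constant: r = {simulate(False):+.2f}")
```

With the hidden variable left free, (x1) and (y) should appear clearly related; once it is held constant, the apparent relationship should largely vanish. The catch described above remains: we can only hold constant the variables we know about.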

3. Probability and the limitation of statistical analysis

Scientific experimentation is usually (though not always) based on the collection of numerical data. These numerical data are then subjected to statistical analysis. This analysis is designed to answer the question: Does variation in our hypothesized cause – variable (x1) – induce variation in the hypothesized effect – variable (y)? When we consider the results, we apply statistical analysis to assess how strongly the data support the idea that variable (x1) influences variable (y). Let’s look at some possible outcomes:

If the results of a test of whether Temperature (Y) increased over time were as follows, what would you conclude?

 

[Figure: Test of Y (Temperature) as a function of X1 (Time)]

 

These results (the dashed line represents the ‘best fit’ line through the data, which is what Regression Analysis computes) would probably suggest that, indeed, Temperature is increasing with Time.

But suppose instead, you recorded these results:

With this data set, you’d be less convinced that Temperature increases with time.

And then, suppose your results looked like this:

Now, you’d probably be convinced that there is no relationship.  If the Regression line is horizontal, the conclusion is that there is no relationship:  Y is not a function of (fluctuates independently of) X.
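The three situations just described can be sketched with made-up data. Everything numeric here is an assumption for illustration only (the 30-year span, the starting temperature of 15, the slopes, the amount of scatter, and the random seed); none of it is real temperature data. The function fit_line computes the ordinary least-squares ‘best fit’ line, which is what Regression Analysis does.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares 'best fit' line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def simulate_temperatures(true_slope, scatter, n_years=30, seed=1):
    """Hypothetical record: a straight-line trend plus random scatter."""
    rng = random.Random(seed)
    years = list(range(n_years))
    temps = [15.0 + true_slope * t + rng.gauss(0.0, scatter) for t in years]
    return years, temps

# (true slope per year, size of random scatter) for each invented scenario
SCENARIOS = {
    "clear trend": (0.05, 0.05),
    "noisy trend": (0.05, 0.5),
    "no trend": (0.0, 0.5),
}

for label, (true_slope, scatter) in SCENARIOS.items():
    years, temps = simulate_temperatures(true_slope, scatter)
    slope, intercept = fit_line(years, temps)
    print(f"{label}: fitted slope = {slope:+.3f} per year")
```

Note that even in the ‘no trend’ case the fitted slope will rarely be exactly zero; random scatter alone tilts the line a little. Deciding whether a tilt is more than scatter would produce is the job of the statistical test.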

Statistical analysis is designed to ask the question: How likely is it that data like ours would arise if reality matched the last example, ‘no relationship’? That assumption is termed the Null Hypothesis of no pattern or relationship.

In conventional analysis, the scientific convention is to use the 5% rule. This means that only if data like ours would arise less than 5% of the time when there is really no relationship will we reject the null hypothesis of no relationship and accept the hypothesis that there is, indeed, a pattern or relationship.

But here’s the crunch. That 5% criterion represents the 1 in 20 rule. This means that we only reject the null hypothesis of no pattern or relationship, and accept the alternate hypothesis of a pattern or relationship, if results like ours would turn up less than 1 time in 20 when no pattern or relationship exists. As will be evident, there is always the possibility that the conclusion that a pattern or relationship exists is wrong: when there really is none, about 1 test in 20 will still lead us to conclude that one exists. On the other hand, and possibly more likely, we may conclude there is no pattern or relationship when one actually exists.
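The 1 in 20 rule can be checked by simulation. The following is a minimal sketch, not a description of any particular study: it assumes a simple two-group experiment (treated versus control) with a known scatter of 1, analysed with a two-sided z-test, and the group size, number of repeated experiments, effect size and seed are all invented for illustration.

```python
import random
from statistics import NormalDist, mean

def two_group_p_value(n_per_group, true_effect, rng):
    """Two-sided z-test p-value for one simulated experiment (known SD of 1)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    standard_error = (2.0 / n_per_group) ** 0.5
    z = (mean(treated) - mean(control)) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def rejection_rate(true_effect, alpha=0.05, n_experiments=4000,
                   n_per_group=20, seed=42):
    """Fraction of repeated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        two_group_p_value(n_per_group, true_effect, rng) < alpha
        for _ in range(n_experiments)
    )
    return rejections / n_experiments

# No real effect: every rejection is a false alarm.
print(f"false alarms when nothing is there: {rejection_rate(0.0):.3f}")
# A real but modest effect: every non-rejection is a missed relationship.
print(f"misses when a modest effect exists: {1.0 - rejection_rate(0.5):.3f}")
```

Under these assumptions the first figure should come out close to 0.05, the 1 in 20 of the rule. The second figure illustrates the final sentence above: with a modest effect and small groups, missing a real relationship can be far more common than inventing one.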

So What Can we Conclude?

While the above discussion leads to the conclusion that we can never be 100% certain that x1 causes y, we can offer both confidence about a refutation and growing confidence about a confirmation. If our tests of the relationship between x1 and y consistently reveal there is none, we can confidently falsify the hypothesis that x1 causes y. On the other hand, if repeated tests in different locations and times, and constructed differently, suggest that x1 causes y, our confidence in the veracity of the relationship grows. The more a hypothesis about the relationship between x1 and y is tested and remains unfalsified, the more confident in that hypothesis we become.

The Remedy to Doubt

The bottom line is that although science offers us the best protocol for understanding how the (real) world works, it is by no means infallible.  We increase our confidence in scientific conclusions by looking for experiments that really explore all alternative possible hypotheses and which have been replicated and confirmed sufficiently that our confidence in the outcomes grows.  Fortunately, in science, we don’t base our conclusions on just one study, but wait until the replication has occurred.

The difference between this and ‘common knowledge’ or political argument is that in the latter, when we are seeking to evaluate opinions, we tend to look for evidence that supports the opinion; we do not set up genuine tests that will tell us if we are wrong. Science seeks to falsify opinions, whereas in everyday life we seek to confirm an opinion and, not surprisingly, we find evidence to support it, even if the opinion is totally wrong.

Errors in Decision-making

Once we accept that there is no certainty in science, that we always have the possibility of drawing incorrect conclusions from our research, we have to recognize the implication of this limitation.

Decision theory provides a guide to thinking about the implications of this conundrum by first  defining the Errors:

A Type I Error (False positive) occurs when a researcher concludes a pattern or relationship exists when it does not. This is equivalent to a test for COVID-19 indicating the subject has the disease when they do not.

A Type II Error (False negative) occurs when a researcher concludes no pattern or relationship exists when one does. This is equivalent to a test for COVID-19 indicating the subject has no disease when they do.

The problem in science – as in all life decisions (large or small) – is that when we make a decision we are always subject to the possibility of making either a Type I or Type II error – and we cannot evade that reality.

The two-by-two contingency table organized as a scientific Truth Table (below) illustrates the dilemma. The vertical axis represents the Reality we are trying to understand. The Pattern or Relationship we are exploring / hypothesizing may be True (i.e., exists) or False (doesn’t exist). The horizontal axis then represents our interpretation / conclusion about that reality. We may conclude it exists (True) or it doesn’t exist (False). The top left combination and the bottom right combination of reality and our interpretation are great. If we land in one of these boxes, there is no problem. But if we land in either the top right or bottom left, we have a problem.

                   Conclusion: True                 Conclusion: False
Reality: True      Correct                          Type II Error (False negative)
Reality: False     Type I Error (False positive)    Correct

The general rule of thumb in science is to set the criterion probability at which we switch our conclusion from ‘no pattern or relationship’ (the Null Hypothesis) to ‘pattern or relationship’ so that we are unlikely to make a Type I error. That is what the 5% (0.05) rule discussed above does. We will only conclude there IS a pattern/relationship if the probability associated with there being no pattern/relationship is below 5% (0.05). This means our confidence in a pattern/relationship must be 95% (0.95) or greater before we will conclude that it is True.

There are times when we want our confidence in there being a pattern/relationship to be even higher than 95% before we will conclude that it’s True. For example (warning: trivially silly example), suppose I am testing the hypothesis that eating peas causes cancer. Before I will warn a pea-loving nation that they should stop eating peas, I would want to be as sure as I can be that peas ARE carcinogenic. Thus, I might raise my criterion for accepting the hypothesis from 95% to 99% or even higher (say 99.9%). This is the same as lowering my criterion (called the alpha level or critical value) for rejecting the null hypothesis from 5% to 1% or 0.1%.

The insightful reader will immediately see a problem. If I decrease the probability that I make a Type I Error, I automatically (for a given amount of data) increase the probability of making a Type II Error. Thus, back to the silly pea example, if I reduce the probability criterion for concluding peas cause cancer when they don’t, I inevitably increase the probability of concluding peas don’t cause cancer when they are, indeed, carcinogenic. Thus, wherever we set our critical value, we are balancing the costs of making a Type I versus a Type II Error. The scientific convention is always to err on the side of a Type II error. This reduces the probability of our making claims about relationships when those claims are false; this protects our credibility. But, to someone who is really concerned about the dangers of suffering cancer, lowering the critical value may be exactly the wrong tactic. Such a person might reasonably argue: “If there is any danger that peas are carcinogenic, I want to know.” Thus, such a person would want to raise that critical value. Indeed, they might argue: “Even if there is a slightly greater than 50:50 chance that peas are carcinogenic, I want to know.” Such a person would want that critical value to be raised to 50% (0.5). In science, this is never done; the only question asked is whether that critical value should be 5%, 1% or 0.1%.
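The trade-off between the two kinds of error can be put into numbers. The sketch below rests on assumptions that are mine, not part of the pea example: a two-group comparison analysed with a two-sided z-test, a true effect of half a standard deviation, and 20 subjects per group. For each critical value it calculates the probability of a Type II Error, that is, of missing an effect that really exists.

```python
from statistics import NormalDist

def type_ii_error_rate(alpha, effect_size=0.5, n_per_group=20):
    """Probability of missing a real effect with a two-sided z-test.

    effect_size is the true difference between groups in units of
    standard deviations (a hypothetical value for illustration).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)   # rejection threshold
    shift = effect_size / (2.0 / n_per_group) ** 0.5  # true effect in SE units
    # Power: chance the test statistic lands beyond either threshold.
    power = (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return 1.0 - power

for alpha in (0.5, 0.05, 0.01, 0.001):
    beta = type_ii_error_rate(alpha)
    print(f"critical value {alpha}: Type II error probability = {beta:.2f}")
```

Reading down the output, each tightening of the critical value should push the probability of a Type II Error higher, which is the balance described above. The only ways to lower both at once are to collect more data or to reduce the scatter in the measurements.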

Risk Enters the Equation

The discussion of Type I and Type II errors brings us to the conundrum about drawing inferences about the hypothesized link between human emissions of greenhouse gases and global warming and its climate change consequences. To the simple discussion of the probability of an event occurring, we need to add the severity of that outcome should it occur (as was introduced in the pea example where cancer rather than a short-term upset stomach was identified as the possible outcome). This is because decision-making often involves the assessment of risk.

Risk is more than merely a measure of the probability of an event occurring; it is also a measure of the severity of an event should it occur. Thus: Risk = Probability * Severity.

We often consider risk when making decisions. For example: ‘Should I cross the street to the ice cream truck for a treat, or skip the ice cream and avoid the risk of being hit by a truck?’ The treat is a benefit, but the truck encounter would be a great cost. So, if we want the ice cream, we make very certain there is no truck in sight. In so doing we can minimize the risk and increase the pleasure. In scientific experimentation, the equivalent is designing a very robust and legitimate test of our hypothesis.

Pascal’s Wager, building on the Truth Table above, is a useful way to consider this dilemma.  In the parallel consideration of Global Climate Change, we identify Reality again on the vertical axis. But, on the horizontal axis we identify whether we accept or reject the wager (hypothesis of human greenhouse gas emissions causing global climate change) and either take action or not.

So, let’s complete the contingency table with the risk outcomes:

Top left is a positive outcome: we accept the hypothesis and act to reduce emissions, thus averting disaster though with some economic cost. Bottom left is a less positive outcome in that we accept the hypothesis and induce an economic depression, but there never was a problem. Working our way anti-clockwise, we come to the bottom right which turns out to be the best possible outcome in that we do not incur any economic cost, but there never was a problem. The last option – top right – is by far the worst. Here we reject the hypothesis and it was true, thus we take no action and global climate change continues destroying our natural ecosystem, agriculture, fisheries and forestry, and our human health while it stimulates total global economic collapse and the end of civilization as we know it.

What this contingency table tells us rather clearly is that our best choice is definitely to avoid the top right outcome.  Rejecting the science may lead us to the best case outcome, but it also leads to the worst, and the severity of that outcome is such that we assuredly should avoid it.  While concluding that the climate science consensus is true may result in some economic depression if we act and it turns out to be false, if we take action and that consensus is true, we can protect the livability of our planet.
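The reasoning in this table can be sketched by applying Risk = Probability * Severity to each choice. The severity scores below are purely hypothetical placeholders on an arbitrary 0 to 100 scale; they are not estimates of real economic or climate costs, and only their ordering follows the descriptions of the four boxes. The probability that the hypothesis is true is likewise left as an input rather than asserted.

```python
# Hypothetical severity scores (0 = no harm, 100 = catastrophe).
# Keys are (reality of the hypothesis, our choice). Illustrative only.
SEVERITY = {
    (True, "act"): 10,         # top left: disaster averted, some economic cost
    (False, "act"): 20,        # bottom left: economic cost, no problem existed
    (False, "no action"): 0,   # bottom right: no cost, no problem
    (True, "no action"): 100,  # top right: no action, hypothesis was true
}

def expected_risk(choice, p_true):
    """Probability-weighted severity of a choice, given P(hypothesis is true)."""
    return (p_true * SEVERITY[(True, choice)]
            + (1.0 - p_true) * SEVERITY[(False, choice)])

def worst_case(choice):
    """Most severe outcome that a choice could lead to."""
    return max(SEVERITY[(reality, choice)] for reality in (True, False))

for choice in ("act", "no action"):
    print(f"{choice}: worst case = {worst_case(choice)}, "
          f"expected risk at p = 0.5: {expected_risk(choice, 0.5):.1f}")
```

With these placeholder scores, taking no action gives both the best possible box and the worst, while acting caps the damage. Different scores would move the break-even probability, but the shape of the argument stays the same as long as the top right box is far more severe than the others.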

For an entertaining 15-minute video illustrating Pascal’s Wager prepared by High School Chemistry teacher Greg Craven, visit How It All Ends

Hypotheses, Theories and (Natural) Laws

The term theory is commonly (i.e., in everyday language) misused.  The phrase ‘it’s just a theory’ is the perfect illustration of this mistaken usage.  The problem is that ‘theory’ in everyday language is what in science we call a ‘hypothesis.’  Thus:

A hypothesis is the tentative opinion or idea that the scientist generates as a possible statement of a pattern or as an explanation for a phenomenon (i.e., the possible cause for some effect).

A theory, however, comes much later after considerable confidence has developed as a result of repeated studies on a phenomenon producing parallel results. The American Museum of Natural History, for example, offers the following: “A theory is a well-substantiated explanation of an aspect of the natural world that can incorporate laws, hypotheses and facts. The theory of gravitation, for instance, explains why apples fall from trees and astronauts float in space. Similarly, the theory of evolution explains why so many plants and animals—some very similar and some very different—exist on Earth now and in the past, as revealed by the fossil record.” This largely seems reasonable, though I have difficulty with the almost glib use of the term ‘fact’ as though this term is well understood and has a generally accepted definition that incorporates the notion that we can have certainty (a conclusion that was rejected above).

According to Live Science, the term ‘Law,’ on the other hand, is the description of an observed phenomenon. It doesn’t explain why the phenomenon exists or what causes it. Good examples of these are the Law of Gravity and the Laws of Thermodynamics. We may not understand exactly why these work as they do, but we are incredibly confident that they cannot be broken.

Bottom line, here, is that you will not hear a scientist state something like: “Oh, that’s just a theory,” since the concept of theory demands we have a great deal of confidence in its veracity.