For this week’s Crimes of Statistics COP session, I’d like to have an open discussion about the history and the controversy surrounding the p-value. I’ll write up my thoughts and some quotes here to discuss with you. If you are unable to make this session and want to add your thoughts, please add a comment to this post, and we can start the discussion online as well.
So, how did the p-value come into existence? Are you aware of the history of this commonly used aspect of statistics? We all feel compelled to report the p-values from our analyses – okay, let’s be honest: in order to get our research published, we NEED to report our p-values. But how many of us have conducted research that we have NOT published because it didn’t measure up to the 0.05 mark? We have been taught, and maybe brainwashed, to believe that our results must have a p-value that is less than 0.05. I cannot count the number of times that I’ve worked with students who are so disappointed when their results do not yield this “significant” value – so many years of work and NOTHING is significant, and they feel that they can no longer publish their results except in their thesis.
But let’s take a closer look at this p-value and how it came about. I’ll admit that I’ve always been fascinated by the p-value and the “magical” powers of “0.05”. Many of you have heard me say that there is nothing magical about 0.05, and I’m sure you’ve thought: oh… she’s off her rocker – I’ve been taught from Day 1 of second-year statistics that in order for us to talk about our stats, we need that p-value that is < 0.05. Well… let’s talk about this for a bit.
Here is a famous quote from Fisher (1926) that started it all:
“If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 percent point), or one in a hundred (the 1 percent point). Personally, the writer prefers to set a low standard of significance at the 5 percent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”
Yup! This is how it all started! If you read more about this history, you’ll discover that Fisher never intended for his statement to be interpreted and used in the way it is today.
It’s amazing that a statement made in a 1926 journal article still drives the 5% rule to this day. We have nearly a century of research (yes, 92 years!) that has been hidden away because it never made the cut. What does this mean? What are the implications?
Let’s continue down that road for a moment. We recognize and acknowledge that most publications report results where p < 0.05. In other words, we tend to publish only results that are significant, right? Who is really interested in reading a study with no significant results? Well, how about this quote from Moore (1979), talking about this very problem:
“Such a publication policy impedes the spread of knowledge. If a researcher has good reason to suspect that an effect is present, and then fails to find significant evidence of it, that may be interesting news. Perhaps more interesting than if evidence in favour of the effect at 5% level had been found.”
Hmm… we are dealing with the same issue in 2018. What are your thoughts? Let’s talk about the implications. And how do we define 0.05? If we stick to that rigid line of 0.05, how do YOU handle a p-value of 0.045 or one of 0.054?
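To make that borderline concrete, here is a minimal sketch (assuming a two-sided z-test; the two z statistics, 2.01 and 1.93, are purely illustrative) showing how two studies with nearly identical evidence can land on opposite sides of the 0.05 line:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    # Standard normal CDF via the error function (standard library only)
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

# Two hypothetical studies with almost the same strength of evidence:
p_a = two_sided_p(2.01)   # lands just under 0.05
p_b = two_sided_p(1.93)   # lands just over 0.05

print(f"Study A: p = {p_a:.3f}")  # "significant"
print(f"Study B: p = {p_b:.3f}")  # "not significant"
```

The two p-values differ by about 0.01, yet under a rigid 0.05 rule one study is publishable and the other is filed away – exactly the dichotomy we’ll be discussing.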
I’m looking forward to a great discussion! See you tomorrow, Wednesday, January 18, 2018, in ANNU Rm 101 at 9:30 am.
Fisher, R.A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 23: 503–513.
Moore, D.S. (1979). Statistics: Concepts and Controversies. San Francisco: Freeman.