
Some ironies in the ‘replication crisis’ in social psychology (3rd installment)


There are some ironic twists in the way social psychology is dealing with its “replication crisis”, and they may well threaten even the most sincere efforts to put the field on firmer scientific footing–precisely in those areas that evoked the call for a “daisy chain” of replications. Two articles, one from the Guardian (June 14), and a second from The Chronicle of Higher Education (June 23) lay out the sources of what some are calling “Repligate”. The Guardian article is “Physics Envy: Do ‘hard’ sciences hold the solution to the replication crisis in psychology?”

The article in the Chronicle of Higher Education also gets credit for its title: “Replication Crisis in Psychology Research Turns Ugly and Odd”. I’ll likely write this in installments…(2nd, 3rd)

^^^^^^^^^^^^^^^

The Guardian article answers yes to the question “Do ‘hard’ sciences hold the solution“:

Psychology is evolving faster than ever. For decades now, many areas in psychology have relied on what academics call “questionable research practices” – a comfortable euphemism for types of malpractice that distort science but which fall short of the blackest of frauds, fabricating data.

But now a new generation of psychologists is fed up with this game. Questionable research practices aren’t just being seen as questionable – they are being increasingly recognised for what they are: soft fraud. In fact, “soft” may be an understatement. What would your neighbours say if you told them you got published in a prestigious academic journal because you cherry-picked your results to tell a neat story? How would they feel if you admitted that you refused to share your data with other researchers out of fear they might use it to undermine your conclusions? Would your neighbours still see you as an honest scientist – a person whose research and salary deserves to be funded by their taxes?

For the first time in history, we are seeing a co-ordinated effort to make psychology more robust, repeatable, and transparent.

“Soft fraud”? (Is this like “white collar” fraud?) Is it possible that holding social psych up as a genuine replicable science is, ironically, creating soft frauds too readily?

Or would it be all to the good if the result is to so label large portions of the (non-trivial) results of social psychology?

The sentiment in the Guardian article is that the replication program in psych is just doing what is taken for granted in other sciences; it shows psych is maturing, it’s getting better and better all the time …so long as the replication movement continues. Yes? [0]

^^^^^^^^

It’s hard to entirely dismiss the concerns of the pushback, dubbed in some quarters as “Repligate”. Even in this contrarian mode, you might sympathize with “those who fear that psychology’s growing replication movement, which aims to challenge what some critics see as a tsunami of suspicious science, is more destructive than corrective” (e.g., Professor Wilson, at U Va) while at the same time rejecting their dismissal of the seriousness of the problem of false positives in psych. The problem is serious, but there may be built-in obstacles to fixing things by the current route. From the Chronicle:

Still, Mr. Wilson was polite. Daniel Gilbert, less so. Mr. Gilbert, a professor of psychology at Harvard University, … wrote that certain so-called replicators are “shameless little bullies” and “second stringers” who engage in tactics “out of Senator Joe McCarthy’s playbook” (he later took back the word “little,” writing that he didn’t know the size of the researchers involved).

Wow. Let’s read a bit more:

Scrutiny From the Replicators

What got Mr. Gilbert so incensed was the treatment of Simone Schnall, a senior lecturer at the University of Cambridge, whose 2008 paper on cleanliness and morality was selected for replication in a special issue of the journal Social Psychology.

….In one experiment, Ms. Schnall had 40 undergraduates unscramble some words. One group unscrambled words that suggested cleanliness (pure, immaculate, pristine), while the other group unscrambled neutral words. They were then presented with a number of moral dilemmas, like whether it’s cool to eat your dog after it gets run over by a car. Ms. Schnall wanted to discover whether prompting—or priming, in psych parlance—people with the concept of cleanliness would make them less judgmental…..These studies fit into a relatively new field known as embodied cognition, which examines how one’s environment and body affect one’s feelings and thoughts. …

For instance, political extremists might literally be less capable of discerning shades of grey than political moderates—or so Matt Motyl thought until his results disappeared. Now he works actively in the replication movement.[1]


Aside: Nosek, Spies and Motyl wrote an interesting article: “Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability.” From a quick read: I agree with their goal of promoting “truth over publishability”, and some of their strategies might well help, if followed. My main gripe is that they felt the need for footnote 1 to soften their notion of truth: “We endorse a perspectivist approach…––the idea that all claims may be true given the appropriate conditions…” Well, if a statement S is not a self-contradiction, then it has a model, and so conditions under which it comes out true, but that’s not at all helpful in an article urging truth over publishability. The rest of note 1 gets squishier, their galoshes sinking further into murky swamplands. I could have helped if they’d asked! There’s no need to backtrack on “truth”, especially if it’s already in your title.
Links are here.
7/1: By the way, since Schnall’s research was testing “embodied cognition” why wouldn’t they have subjects involved in actual cleansing activities rather than have them unscramble words about cleanliness?
^^^^^^^^^^
Another irony enters: some of the people working on the replication project in social psych are the same people who hypothesize that a large part of the blame for lack of replication may be traced to the reward structure, to incentives to publish surprising and sexy studies, and to an overly flexible methodology opening the door to promiscuous QRPs (you know: Questionable Research Practices.) Call this the “rewards and flexibility” hypothesis. If the rewards/flex hypothesis is correct, as is quite plausible, then wouldn’t it follow that the same incentives are operative in the new psych replication movement? [2]

A skeptic of the movement in psychology could well ask how a replication can be judged sounder than the original study. When RCTs fail to replicate observational studies, the presumption is that the RCT would have found the effect, were it genuine. That’s why the failure is taken as an indictment of the observational study. But here, one could argue, the replication is just another study, not obviously one that corrects the earlier one. The question some have asked, “Who will replicate the replicators?”, is not entirely without merit. Triangulation for purposes of correction, I say, is what’s really needed. [3]

Daniel Kahneman, who first called for the “daisy chain” (after the Stapel scandal), likely hadn’t anticipated the tsunami he was about to unleash.[4]

Daniel Kahneman, a Nobel Prize winner who has tried to serve as a sort of a peace broker, recently offered some rules of the road for replications, including keeping a record of the correspondence between the original researcher and the replicator, as was done in the Schnall case. Mr. Kahneman argues that such a procedure is important because there is “a lot of passion and a lot of ego in scientists’ lives, reputations matter, and feelings are easily bruised.”

That’s undoubtedly true, and taking glee in someone else’s apparent misstep is unseemly. Yet no amount of politeness is going to soften the revelation that a published, publicized finding is bogus. Feelings may very well get bruised, reputations tarnished, careers trashed. That’s a shame, but while being nice is important, so is being right.

Is the replication movement getting psych closer to “being right”? That is the question. What if inferences from priming studies and “embodied cognition” really are questionable? What if the hypothesized effects are incapable of being turned into replicable science?

^^^^^^^^^

The sentiment voiced in the Guardian bristles at the thought; there is pushback even to Kahneman’s apparently civil “rules of the road”:

 For many psychologists, the reputational damage [from a failed replication]… is grave – so grave that they believe we should limit the freedom of researchers to pursue replications. In a recent open letter, Nobel laureate Daniel Kahneman called for a new rule in which replication attempts should be “prohibited” unless the researchers conducting the replication consult beforehand with the authors of the original work. Kahneman says, “Authors, whose work and reputation are at stake, should have the right to participate as advisers in the replication of their research.” Why? Because method sections published by psychology journals are generally too vague to provide a recipe that can be repeated by others. Kahneman argues that successfully reproducing original effects could depend on seemingly irrelevant factors – hidden secrets that only the original authors would know. “For example, experimental instructions are commonly paraphrased in the methods section, although their wording and even the font in which they are printed are known to be significant.”

“Hidden secrets”? This was a remark sure to enrage those who take psych measurements as (at least potentially) akin to measuring the Hubble constant:

If this doesn’t sound very scientific to you, you’re not alone. For many psychologists, Kahnemann’s cure is worse than the disease. Dr Andrew Wilson from Leeds Metropolitan University points out that if the problem with replication in psychology is vague method sections then the logical solution – not surprisingly – is to publish detailed method sections. In a lively response to Kahnemann, Wilson rejects the suggestion of new regulations: “If you can’t stand the replication heat, get out of the empirical kitchen because publishing your work means you think it’s ready for prime time, and if other people can’t make it work based on your published methods then that’s your problem and not theirs.”

Prime time for priming research in social psych?

Read the rest of the Guardian article. Second installment later on…maybe….

What do readers think?
^^^^^^^^^^^^^^

2nd Installment 7/1/14

Naturally the issues that interest me the most are statistical-methodological. Some of the methodology and meta-methodology of the replication effort is apparently being developed hand-in-hand with the effort itself—that makes it all the more interesting, while also potentially risky.

The replicationist’s question of methodology, as I understand it, is alleged to be what we might call “purely statistical”. It is not: would the initial positive results warrant the psychological hypothesis, were the statistics unproblematic? The presumption from the start was that the answer to this question is yes. In the case of the controversial Schnall study, the question wasn’t: can the hypotheses about cleanliness and morality be well-tested or well-probed by finding statistical associations between unscrambling cleanliness words and “being less judgmental” about things like eating your dog if he’s run over? At least not directly. In other words, the statistical-substantive link was not at issue. The question is limited to: do we get the statistically significant effect in a replication of the initial study, presumably one with high power to detect the effects at issue. So, for the moment, I too will retain that as the sole issue around which the replication attempts revolve.
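To make that “purely statistical” question concrete, here is a minimal sketch, in Python with statsmodels, of the sort of power calculation a replication team might run. The effect size and group sizes below are illustrative placeholders I have chosen, not figures from Schnall’s study or its replication.

```python
# Sketch of the replicationist's question: given the effect size estimated in an
# original two-group study, how much power does a replication have to detect it,
# and how big must it be for high power?  All numbers below are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

d_original = 0.6   # hypothetical standardized effect (Cohen's d) from the original study
alpha = 0.05       # conventional significance level

# Power of a same-sized replication (20 per group, i.e., a 40-subject study)
power_same_n = analysis.solve_power(effect_size=d_original, nobs1=20,
                                    ratio=1.0, alpha=alpha)

# Per-group sample size needed for 95% power to detect the original effect
n_for_95 = analysis.solve_power(effect_size=d_original, power=0.95,
                                ratio=1.0, alpha=alpha)

print(f"Power of an n = 20-per-group replication: {power_same_n:.2f}")
print(f"Per-group n for 95% power: {n_for_95:.0f}")
```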

Checking statistical assumptions is, of course, a part of the pure statistics question, since the P-value and other measures depend on assumptions being met at least approximately.
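For illustration only, here is the kind of routine assumption check that sits alongside the P-value computation; the “ratings” are simulated stand-ins, not anyone’s actual data.

```python
# Checking two assumptions behind the two-sample t-test's P-value:
# rough normality within each group and roughly equal variances.
# The data are simulated placeholders, not real moral-judgment ratings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
primed  = rng.normal(loc=5.2, scale=1.4, size=20)  # simulated judgments, cleanliness-primed group
neutral = rng.normal(loc=6.0, scale=1.4, size=20)  # simulated judgments, neutral-word group

_, p_norm_primed  = stats.shapiro(primed)           # normality check, primed group
_, p_norm_neutral = stats.shapiro(neutral)          # normality check, neutral group
_, p_eq_var       = stats.levene(primed, neutral)   # equal-variance check
_, p_ttest        = stats.ttest_ind(primed, neutral)

print(f"Shapiro-Wilk p (primed):  {p_norm_primed:.3f}")
print(f"Shapiro-Wilk p (neutral): {p_norm_neutral:.3f}")
print(f"Levene p (equal var):     {p_eq_var:.3f}")
print(f"Two-sample t-test p:      {p_ttest:.3f}")
```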

The replication team assigned to Schnall (U of Cambridge) reported results apparently inconsistent with the positive ones she had obtained. Schnall shares her experiences in “Further Thoughts on Replications, Ceiling Effects and Bullying” and “The Replication Authors’ Rejoinder”: http://www.psychol.cam.ac.uk/cece/blog

The replication authors responded to my commentary in a rejoinder. It is entitled “Hunting for Artifacts: The Perils of Dismissing Inconsistent Replication Results.” In it, they accuse me of “criticizing after the results are known,” or CARKing, as Nosek and Lakens (2014) call it in their editorial. In the interest of “increasing the credibility of published results” interpretation of data evidently needs to be discouraged at all costs, which is why the special issue editors decided to omit any independent peer review of the results of all replication papers. (Schnall)

Perhaps her criticisms are off the mark, and in no way discount the failed replication (I haven’t read them), but CARKing? Data and model checking are intended to take place post-data. So the post-data aspect of a critique scarcely renders it illicit. The statistical fraud-busting of a Smeesters or a Jens Forster was based entirely on post-data criticisms. So it would be ironic if, in the midst of defending efforts to promote scientific credentials, they inadvertently labeled post-data criticisms as questionable.

^^^^^^^^^^^^^^^^^^^^^^^^^^^

3rd installment 7/3/14

Uri Simonsohn [5] at “Data Colada” discusses, specifically, the objections raised by Simone Schnall (2nd installment), and the responses by the authors who failed to replicate her work: Brent Donnellan, Felix Cheung, and David Johnson.

Simonsohn does not reject out of hand Schnall’s claim that the failed replication can be explained away (e.g., by a “ceiling effect”). (In fact, he has elsewhere discussed a case that was rightly absolved on just those grounds [6].) He provides statistical grounds for denying that a ceiling effect is to blame in Schnall’s case. However, he also agrees with Schnall in discounting the replicators’ response to the ceiling-effect charge: simply lopping off the most extreme results.

In their rejoinder (.pdf), the replicators counter by dropping all observations at the ceiling and showing the results are still not significant.

I don’t think that’s right either. (Data Colada)
Since the replicators here have the burden of proof, the statistical problems with their ad hoc retort to Schnall are grounds for concern, or should be.

http://datacolada.org/2014/06/04/23-ceiling-effects-and-replications/
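For readers who want to see the worry in miniature, below is a small simulation (entirely invented numbers, in Python) of how a capped rating scale can compress a genuine group difference, and why simply discarding the at-ceiling cases is not a neutral fix: it selectively throws away the highest responders.

```python
# Ceiling-effect sketch with made-up data: a real latent difference between
# groups, a rating scale that tops out, and the consequence of dropping the
# observations stuck at the ceiling.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 50
latent_control = rng.normal(7.0, 1.5, n)   # latent judgments, control group
latent_treated = rng.normal(7.8, 1.5, n)   # latent judgments, treated group (true shift = 0.8)

ceiling = 9.0                              # top of the hypothetical rating scale
obs_control = np.minimum(latent_control, ceiling)
obs_treated = np.minimum(latent_treated, ceiling)

_, p_latent   = stats.ttest_ind(latent_treated, latent_control)
_, p_censored = stats.ttest_ind(obs_treated, obs_control)

# The ad hoc remedy: drop everyone at the ceiling and re-test.
below_c = obs_control < ceiling
below_t = obs_treated < ceiling
_, p_dropped = stats.ttest_ind(obs_treated[below_t], obs_control[below_c])

print(f"t-test on latent scores:             p = {p_latent:.3f}")
print(f"t-test on ceiling-censored scores:   p = {p_censored:.3f}")
print(f"t-test after dropping ceiling cases: p = {p_dropped:.3f}")
```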

What follows from this? What follows is that the analysis of the evidential import of failed replications in this field is an unsettled business. Despite the best of intentions of the new replicationists, there are grounds for questioning whether the meta-methodology is ready for the heavy burden being placed on it. I’m not saying that facets of the necessary methodology aren’t out there, but the pieces haven’t been fully assembled ahead of time. Until they are, the basis for scrutinizing failed (and successful) replications will remain in flux.
^^^^^^^^^^
Final irony. If the replication researchers claim they haven’t caught on to any of the problems or paradoxes I have intimated for their enterprise, let me end with one more…No, I’ll save it for installment 4.

 

 

[0] Unsurprisingly, replicationistas in psych are finding well-known results from experimental psych to be replicable. Interestingly, similar results are found in experimental economics, dubbed “experimental exhibits”. Experimental economists recognize that rival interpretations of the exhibits are still open to debate.

[1] In Nuzzo’s article: “For a brief moment in 2010, Matt Motyl was on the brink of scientific glory: he had discovered that extremists quite literally see the world in black and white”.
(Glory, I tell you!)

[2] Some of the results are now published in Social Psychology. Perhaps it was not such an exaggeration to suggest, in an earlier post, that “non-significant results are the new significant results”.  At the time I didn’t know the details of the replication project; I was just reacting to graduate students presenting this as the basis for a philosophical position, when philosophers should have been performing a stringent methodological critique.

[3] By contrast, statistical fraudbusting and statistical forensics have some rigorous standards that are hard to evade, as seen recently in the Jens Forster case.

[4] In Kahneman’s initial call (Oct, 2012) “He suggested setting up a ‘daisy chain’ of replication, in which each lab would propose a priming study that another lab would attempt to replicate. Moreover, he wanted labs to select work they considered to be robust, and to have the lab that performed the original study help the replicating lab vet its procedure.”

[5] Simonsohn is always churning out the most intriguing and important statistical analyses in social psychology. The field needs more like him.

[6] For an excellent discussion of a case that is absolved from non-replication by appealing to the ceiling effect, see http://datacolada.org/2014/06/27/24-p-curve-vs-excessive-significance-test/.


Filed under: junk science, science communication, Statistical fraudbusting, Statistics
