Fixing Academic Publishing: Jose Cortina on Research Integrity

The Great IO Get-Together (The GIG)

Richard and Tara conclude their research methods series with Dr. Jose Cortina, Professor of Management and Entrepreneurship at Virginia Commonwealth University. Cortina advocates for results-blind reviewing as a solution to publication bias, arguing that requiring statistically significant results creates perverse incentives for researchers. The discussion explores how reviewer behavior punishes candor, the misuse of meta-analysis mean effects while ignoring heterogeneity, problematic control variable practices, and the disconnect between methodological best practices and actual research conduct. Cortina shares insights from his extensive experience conducting methods reviews for the Journal of Management, revealing consistent gaps between what researchers do and what methods research shows they should do.

Key Takeaways:

  • Results-blind reviewing could eliminate pressure for statistically significant findings
  • Reviewers often punish researcher candor despite claiming to value it
  • Meta-analysis heterogeneity reveals more than mean effects but gets ignored
  • Control variables are frequently misused without proper justification
  • Ratio variables create methodological problems with no benefits
  • The sociology of science explains many poor research practices
  • Constructive replication studies are rare but essential for advancing knowledge
  • Business schools’ reward structures incentivize questionable research practices
  • Theory-only journals exist but no empirics-only journals despite data’s value
  • Methodological problems consistently slip past peer review

Website: https://thegig.online/
Follow us on LinkedIn: https://www.linkedin.com/company/great-io/
Join our Discord here: https://discord.gg/WTzmBqvpyt
Join The GIG Email List: https://docs.google.com/forms/d/e/1FAIpQLSfVQ4hyF8MA4G9W-ERwVL8_e91a-MUMuhNvxhXmgkSFUDFatg/viewform?embedded=true

Transcript

[Richard Landers] (0:00 - 0:33)
Welcome to The Great IO Get-Together. On tonight's show, quips and queries about the world of work as IO psychology comes alive. Now, please welcome our hosts, Richard and Tara. Welcome everyone to The Great IO Get-Together, number 41.

My name is Richard. This is my co-host, Tara. Today, we are exploring chapter 15 of our textbook, Research Methods for IO Psychology, and this is the final chapter, wrapping up all the lessons of the book.

So, to help us wrap up, on the show today, we have Dr. Jose Cortina, Professor of Management and Entrepreneurship at Virginia Commonwealth University. Welcome to the show.

[Jose Cortina] (0:34 - 0:35)
Thank you. Good to be here.

[Richard Landers] (0:37 - 1:04)
We'll start big picture. This is the end of the book, so folks who have been following along, either with this series or the other pieces, have had all the lessons that we can provide. We provided a lot of recommendations throughout, guidance on how to identify and basically do good, valid research, but reality is often a little different.

So, let's say that right now, you could snap your fingers and cure the field of one bad habit. What would you change?

[Jose Cortina] (1:04 - 4:03)
I think I would change the requirement that the data are flattering to the hypotheses. The solution there is remarkably simple. If you ask me why this change hasn't happened, I'm not going to be able to answer that, but results-blind reviewing is a fairly simple way to fix this.

Right now, no matter what anybody says, if your data aren't flattering to your hypotheses, then when we put our reviewer hats on, something happens and we just have to see lots of, you know, p-less-than-.05s. And if we don't, then there must be something wrong with the paper: it doesn't make a contribution and it gets rejected. So that incentivizes everybody to ensure that what they put in the results section is consistent with what they put in the introduction section.

I think that business schools are more to blame for this simply because they have more money and they can afford to reward people for top-tier journal hits with quite a lot of money. So, you know, a scholar sitting around and saying, well, a minority of my p-values are less than 0.05, but if I pretend that these two or three hypotheses never existed, then a majority of my p-values are less than 0.05 and I should be able to get past reviewers. This insistence on consistency between the results and introduction sections creates lots of problems and, as far as I can tell, has no upside.

So the simple solution is just reviewers evaluating the intro and methods and having all of the normal sorts of comments, but basically once you sign off on the methods, then the data are what they are and they say what they say. They might be completely contradictory with regard to the hypotheses, but if you as a reviewer think that this was a reasonable way to test those hypotheses, then the results of that test are what they are. If I had to do one thing, I think that's the thing that I would change. I've suggested this to a lot of editors and other leaders over the years, and there's always some worry about unintended consequences.

Any change has potential unintended consequences, it's just that we know the consequences of the current system and they appear to be getting worse, so let's try this and if there are unintended consequences, let's find out what they are. Maybe there are fixes for them. In the meantime, we can remove this really insidious incentive that I think is the driver for most of the academic fraud in our field.

[Richard Landers] (4:05 - 4:53)
So you're characterizing it all as very purposeful. I've occasionally found myself slipping into a sort of post hoc reasoning when things don't go the way that I thought they would, where you start to question your own earlier decision making: well, maybe this didn't work out because we actually didn't operationalize that very well, or maybe we should have made different decisions. I've really been curious about the balance between how much is actively trying to beat the system versus how much is just a very human tendency to rationalize when we think we've made a mistake and are trying to fix it.

I don't know. Do you think editorial changes are the best way to address that whole shape of that problem then? Like this is a matter of journal policy.

[Jose Cortina] (4:54 - 6:50)
I think it is. When we put our reviewer hats on, we just don't value candor. We think we do.

We know we should, but we don't. I mean, I don't know how many times I've had a reviewer say, I appreciate your candor. I appreciate the fact that you told me everything that you actually did.

But then they go on to punish me for it. That seems to be this very human thing. And it's very consistent.

And if it's a very consistent human thing, then the only solution that I can see is to remove that element. And the only way to do that is to force reviewers to evaluate the paper based on the arguments and theory in the introduction section and the description of the study design. And if they're OK with that stuff after it's revised or whatever, then there might be some additional suggestions with regard to how the data were analyzed, but it's easy enough to analyze the data in a different way.

Fine. But the decision is in no way driven by the degree to which the results are consistent with the introduction section. That's the point that I think we need to get to.

And there are probably some challenges associated with results blind reviewing. OK, let's figure out what those are, because the way that we're doing things now, not only does it not work as far as I'm concerned, it can't work. If it could work, it would have started working by now.

And it hasn't. So it must be something in our brains. And we have to bypass that thing in our brains if the system is going to work.

[Tara Behrend] (6:51 - 6:53)
Is there anywhere you think is getting it right currently?

[Jose Cortina] (6:54 - 7:57)
Some journals have options for results-blind reviewing. I think LQ does. But even then, I don't think the option actually works, because as an author, I don't really know how reviewers are going to see that. They might very well infer, correctly or incorrectly, that I want results-blind review because my results didn't work.

And the thinking goes: if the results didn't work, it must be something in the methodology. I know what it is. It's that, and that's now a deal breaker.

It would have been OK. That element of the design would have been fine if the results had been flattering to the hypotheses. But I can't see the results.

I'm assuming they're unflattering to the hypotheses, so that element of the design that wouldn't have bothered me before is now a deal breaker, and I'm going to reject the paper, whether that's fair or not.

It's certainly the worry that people would have before they ask for results-blind reviewing. So I just feel like it has to be a journal-wide thing, or it can't really serve its purpose.

[Tara Behrend] (7:58 - 8:16)
I always wondered whether imposing a two-page limit on introduction sections would help. Just, you know, tell us what you predicted, but please save us the 40 pages of made-up theory. And then spend all that extra energy explaining what you did and why you did it, and explain your methodological decisions.

What do you think about that?

[Jose Cortina] (8:17 - 10:15)
Well, I think you know me well enough to know that I'm bang alongside that idea. But I will also say that there is a reason that one of the journals on everybody's A-list publishes only papers that make no empirical contribution whatsoever, while there are no top journals that publish papers that make only an empirical contribution.

There's a reason for that. I think this is, again, more of a business school thing than it is a psychology thing. But business school people, and particularly elite business school people, really love the idea that they can just sort of sit at their desk and dream up a wonderful story that explains some really interesting organizational phenomenon.

And I think they think of it the way that Einstein thought of his theories. You know, when he was told that somebody had found a way to test such-and-such a theory of his and was asked what he thought, he said, well, they'll either support the theory or there's something wrong with their methodology.

I mean, he was so confident in the math that the testing of it was really incidental. What Einstein would have called theory and what we call theory are completely different things. But there are a lot of people, especially people in the elite places that set a great deal of store by what we call theory and are much less impressed by the empirical part.

So they would never go along with that. They're fine consigning a lot of the details of the methodology to an appendix. But the idea of anything other than a full-blown, really interesting introduction section would not fly at all.

[Tara Behrend] (10:16 - 10:55)
I mean, it seems sort of strange to me to even talk about our field when it seems obvious that there are two fields. One is a science and one is a humanities. And they're both valid.

But to try to evaluate them both on the same metrics or to try to compare them to each other is just obviously not going to work because they have different goals. And if you look at papers published in Science and Nature, they do not have 40 pages of introduction. I mean, the whole thing is like five pages.

But then read a philosophy paper. The point of a philosophy paper is not to put out something replicable; it's to put out an interesting idea, a new way of thinking about something. That's great.

I want both of those things in my life. I don't want them in the same journal.

[Jose Cortina] (10:56 - 13:59)
I think there's a middle ground. I would love it if we could just jettison the whole Davis notion that it has to be interesting, that it has to capture our imagination. And no, it doesn't. It has to be something that's relevant to organizations.

It has to involve some question that organizations actually have that hasn't been answered many different times. If those things aren't true, then forget it. But if they are true, then absolutely, let's look at what we actually know and try to extrapolate and reason and come up with hypotheses where they're appropriate.

Otherwise, we just have research questions. There's nothing shameful about that. Find a good way to answer those questions and see what the data suggests.

And it's just one study, of course. We shouldn't attach a great deal of importance to any one study. We do.

But what I just described is a middle ground, and it means that we don't have to have these ponderous dozens-of-pages introductions. I mean, I'd like to see more introduction sections like the Eden and Shani (1982) Pygmalion effect paper introduction, where it's just: here's a problem. Previous research on Pygmalion usually involved contrived manipulations.

"Pretend this is true" setups, or school kids, you know, sort of low-stakes settings. There were a variety of problems, and they usually weren't true experiments.

We're just going to fix all of those major flaws and, you know, do a test of Pygmalion that's relevant for the workplace. But that was it. I mean, the introduction section really had no theorizing at all.

They basically said the theorizing has been done. This just hasn't been tested properly. We found a way to test it properly.

Our field could use a lot of papers like that. I told Dov at some point years ago that although that's probably my single favorite empirical paper in the history of our field, there's a decent chance that it would be desk rejected today because it doesn't make a theoretical contribution. I don't know that that's true, of course, but some people, I know this for sure, some people would look at that paper and say you aren't making a theoretical contribution.

So this doesn't belong in a top journal. I would love to see us get rid of that mindset. But there can be a middle ground.

You know, let's use our heads and previous research to come up with testable hypotheses, to justify those. And then let the chips fall where they may. That would work for me too.

But it's a pretty long way from where we are now. And that's especially true of the top business school journals, I think.

[Richard Landers] (14:00 - 14:31)
A lot of the things you're talking about are partially in the hands of reviewers. There's sort of an interesting tug of war between reviewers and editors sometimes. But reviewers in particular are a topic you've had, I believe, great interest in.

When you were SIOP president, you focused on that as a topic of importance for the whole field, maybe the most important topic in the field at the time. Can you maybe talk a little bit about what led you to that focus? How did it go?

What would you recommend we do from here?

[Jose Cortina] (14:31 - 19:15)
I mean, improving reviewer quality was one element of it, but really the broader issue was just improving the methods that we use to test our hypotheses. And I remember trying to decide between that issue and the migration of IOs from psychology to business schools.

I decided that although that one might be more important, there's probably not much I can do about that. So I let that one go and focused on improving the methods. I usually see eye to eye with people like Larry Williams and Bob Vandenberg and Jeff Edwards.

When we disagree, it's usually because I'm coming at it from the sociology-of-science perspective and they're coming at it from more of a what's-technically-correct perspective. I believed then and believe now that we will get what we incentivize.

And if what we incentivize is easy methods, or at least if what we allow is easier, quicker methods, then that's what people will do. If we incentivize methodological rigor, then that's what people will do. The concern that I was addressing as a president and have continued to work on to this day is not just training people about what the best methods are but trying to change the incentives in the system so that people feel like they have no choice but to use better methods.

So when I talk about this sort of topic, I talk about us when we wear our author hats and us when we wear our reviewer hats, and I try to remind people that when they have the reviewer hats on, they're doing something really, really important, but they have to remember that they are in a small way providing certain incentives and disincentives. If one wants to see better methods, then as a reviewer, one has to insist on better methods, and that means more work as a reviewer. One example comes from my current role: I'm the Journal of Management methods editor for micro papers now.

So when a revision comes in and it's a micro paper, it comes to me, and mostly I review the methods section. Sometimes I farm it out to somebody else, but one thing that I always do in that role, and in the other role that involves scales, is I go check the original cited source for the scale, because quite often the scale that the authors used is different from the one that they're saying they used. Maybe they modified it, often shortened it, but rather than say "I used the 15-item Tepper (2000) abusive supervision scale," they say "I used the five-item Tepper (2000) abusive supervision scale."

There is a five-item version out there; it's not by Tepper. And if you have reason to use that one, okay. I mean, I'd have some questions about that too, but you at least have to tell me. And Eric Heggestad had a paper in JOM in 2018 or '19 looking at scale modifications, and what they found is basically that people modify scales all the time and either don't say that they did, or say that they did but provide no evidence that the modified scale functions in the same way as the original.

You see this with translations of scales too, and all of this happens, I think, because reviewers incentivize the path of least resistance that we, when we have our author hats on, are often looking for. If we want people to be more rigorous, then when we put our reviewer hats on, we have to insist on that rigor. If we do that as a field, methods will get more rigorous.

Right now I think there are a lot of people out there who mostly focus on the intro, make sure that there aren't any egregious problems with the method, make sure that enough p-values have asterisks next to them, and that's it, we're good.

[Richard Landers] (19:15 - 20:23)
In a very similar vein, people follow what's incentivized and the path of least resistance, and an increasingly hard-to-resist path for both reviewers and authors has been AI solutions to their problems. As a journal editor, I've seen both submissions and reviews coming in that are clearly AI; I actually had a submission that was co-authored by ChatGPT.

Oh yeah, so it seems like it's become an easy option for folks, and even on the review side it's been a little unexpected, because AI reviews, I think, are fairly obvious, but partly obvious because they're broader. Instead of the sort of cherry-picking of "here's the thing I care about when I'm doing my review," they're actually a little more comprehensive. The insights are less deep.

It's just this shallow pass on stuff. I don't know if you've tried any of these AI approaches or have seen this, but what is your reaction to these kinds of changes? Is that a good path of least resistance?

Are we incentivizing good behavior that way? And if not, what do we do about it?

[Jose Cortina] (20:24 - 23:18)
I've done that a few times just out of curiosity, mostly because, when I have the doctoral students read a paper that we're gonna discuss, or have them do reviews of papers, I wanna make sure that I know what ChatGPT says about that paper. So, starting there is one thing, but finishing there is another. I haven't done it enough, though, to really have a sense of what you just said, Richard, although I think I understand what you mean: it's broader, it's less about pet peeves, less about these sort of reflexive things that we often say when we have our reviewer hats on.

But it does look fairly superficial to me. This issue actually came up at the ORM board meeting at Academy, in part because it's a violation of copyright. If you just attach a published paper, or for that matter a draft of a paper, then it goes into the training data for that generative AI.

And somebody somewhere, if it's published, is supposed to get paid for the use of that paper. So, unless you have your settings such that the generative AI is not including that as training data, it's a copyright violation. That said, it might very well be a time-saver.

People perceive it to be a time-saving device if they suppose that they can just submit whatever it is that ChatGPT gives them. Like you just said, that's pretty transparent, and it's unethical for a variety of reasons. If someone wants to start there and then add in the things that they consider to be important, that's certainly better, apart from the copyright issue.

I think it takes me less time to just do the review than to edit somebody else's review. Part of that is just because I have more pet peeves than most people, and I know what they are and how to look for them quickly. So, I am actually more worried about AI generating the papers in the first place than I am about AI as reviewer.

It's an issue both ways and it certainly ain't going anywhere. It's just gonna be another arms race. There's the AI solution and we figure out how to identify the AI solution.

So, the AI solution shifts and we figure out how to identify that. I don't think it ever ends. I don't have an answer.

Nobody does as far as I know.

[Richard Landers] (23:18 - 24:01)
So, I'd like to shift a little bit into talking about a particular paper of yours which, one, has been massively cited, but two, I think is probably required reading in at least every IO program and probably quite beyond that: your 1993 "What Is Coefficient Alpha?" I'm curious about your perspective on, one, what made that paper resonate with so many people, and two, why in your view it is important.

I assume you think it's important. So, I'd just love to hear in your own words what is the story of that paper? Where did it come from?

Why write such a thing? What do you want people to know today? Should the message change if you wrote it today?

[Jose Cortina] (24:01 - 29:40)
Those are great questions. I have thought about why that paper generated so many citations and I don't know. A big part of me is just thinking that it's like any sort of fashion trend.

It sort of ticks along, and then it hits a tipping point and takes off, and who knows what gets a paper to that tipping point; I don't really know. It clearly accomplished something that people wanted to have accomplished. The story of the paper is: I was a first-year graduate student, and I thought I understood what coefficient alpha told me.

I thought it was a measure of unidimensionality. And then in Neal Schmitt's psychometrics class, Neal said it assumes unidimensionality. I was like, wait, assumes?

I thought that's what it told me in the first place. Now, if a scale is unidimensional, then alpha tells you about basically the strength of that dimension. Okay. Then I was curious about why I had thought otherwise; it's not as though I made it up.

So I started paying attention to what authors were saying in their articles about alpha. If they said anything, what they said suggested: we said this was unidimensional, alpha is 0.8, so it is. Okay, so I was getting that from the field, but then the question was: what else do we not know about this more or less universally reported statistic?

And this is a long time ago, this is pre-internet. So that meant every morning going to the library, going to the Social Sciences Citation Index, looking at the tens of thousands of citations to Cronbach's work from the '40s and early '50s, finding, of those, the ones that are actual methods papers, and then just trying to do a sort of thematic analysis. I didn't know what a thematic analysis was at that time, but that's what I was trying to do with what these methods people say is true of coefficient alpha, and then just summarize it.

And I certainly learned a lot about what it was, but it turned out that my misconceptions were pretty common. The thing about alpha telling you whether your scale is unidimensional or not was, I think, a nearly universal misconception at the time. Anyone who has read that paper knows things that might not be obvious: that alpha is highly sensitive to the number of items in the scale, and that it can actually be quite large in a scale with two clear dimensions, if the items in each one load pretty strongly on their respective dimensions.

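Those two properties lend themselves to a quick demonstration. What follows is a minimal sketch, not from the episode or the paper, using the standardized alpha formula α = k·r̄ / (1 + (k − 1)·r̄) plus a small simulation with hypothetical loadings; all numbers are illustrative only:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Property 1: alpha climbs with scale length even when inter-item correlations are modest.
r_bar = 0.3  # hypothetical average inter-item correlation
for k in (3, 5, 10, 20):
    alpha = k * r_bar / (1 + (k - 1) * r_bar)  # standardized alpha
    print(f"{k:2d} items, mean r = {r_bar}: alpha = {alpha:.2f}")
# 3 items -> 0.56, 5 -> 0.68, 10 -> 0.81, 20 -> 0.90

# Property 2: alpha can be large for a scale with two clearly separate dimensions.
rng = np.random.default_rng(0)
n = 100_000
f1, f2 = rng.standard_normal((2, n))  # two UNCORRELATED factors

def items_for(factor: np.ndarray, k: int = 4, loading: float = 0.9) -> np.ndarray:
    """k items that load only on the given factor, plus unique noise."""
    return loading * factor[:, None] + np.sqrt(1 - loading**2) * rng.standard_normal((n, k))

scale = np.hstack([items_for(f1), items_for(f2)])  # 8 items, unmistakably two-dimensional
print(f"alpha for the two-dimensional scale: {cronbach_alpha(scale):.2f}")  # ~0.81
```

The second block is the trap Cortina describes: a reviewer who sees an alpha above 0.80 and moves on has learned nothing about dimensionality, let alone domain coverage.
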
And because we're in the habit of reporting it, I think a lot of people felt like they needed to know what it was. With regard to your second question, where have we gone since then?

When I got to VCU, I wanted to do a project with the students who were there, and I thought maybe a 25-year follow-up to that paper would make some sense. Let's see if we've learned anything.

People cite this paper. Does it appear that we have learned very much? And that paper got broader and broader through the review process until it became about psychometric soundness anyway.

But what we found with regard to alpha in particular is that, number one, scales have gotten much shorter since that paper, and coefficient alpha has gone up. If you just take a scale and dump some of the items, coefficient alpha is going to go down. So how can it be that as a field we're using much shorter scales with much larger alphas?

Once we realized that, then the answer was kind of obvious. People need shorter scales because we have more complex models, more things to measure. Respondents don't have more time.

I need a shorter scale, but I still need a big alpha or the reviewers will say that my scale is no good. The way that I achieve that is not so much by asking different questions as by asking the same question three or four times. Grammatical redundancy will get me a large coefficient alpha in a three- or four-item scale.

What people are compromising is domain coverage; I'm not actually getting at the construct of interest. But again, when people put their reviewer hats on, they're mostly just checking whether the items pass the eyeball test and coefficient alpha is more than 0.7. If the answer is yes, we move on from there. So I think you're right, Richard, that that '93 paper gets assigned a lot to graduate students, and it has certainly been cited a lot. I'm not sure that the lessons in there, or in follow-up work, have shown up in the work that people actually do. But for me, this comes down to, once again: what are we doing when we have our reviewer hats on?

If you look for grammatical redundancy, you will see it in just about every short scale. Not all of them, but just about every one. If we as a field insist on domain coverage in scales, we'll get domain coverage.

But if we're good with asking basically the same question three or four times, that's what we'll get, because that's easy.

[Tara Behrend] (29:41 - 30:08)
I mean, it seems like what you're identifying, in a number of different ways, is that people might not care about whether what they're saying is true. They don't think their work is important, and they don't think it's worth getting right, as long as they have a paper. So who cares if they're not measuring what they think they're measuring?

Who cares if they're not reporting things accurately because that's not the goal. The goal is not to learn something about the world or to contribute knowledge, it's just to have a paper that no one will read.

[Jose Cortina] (30:08 - 32:34)
There's way too much of that. And some people are kind of candid about it. They won't necessarily broadcast it, but I've heard lots of people say, you know, it's not like we're curing cancer.

And that's true, we aren't. I'm not really sure the cancer researchers are either, but we're not trying to. And they're not trying to figure out how to make organizations function better.

We are, and I think that's important. I think for most people, we're somewhere in between the two extremes. I wanna do the best possible work, but I also need publications, especially for my students.

So personally, I'm not gonna do anything that's outright fraudulent, but if there are things that I know reviewers will punish me for doing, then I'm gonna be reluctant to do those things. I will try to change the field. I will try to make people aware that punishing certain things makes for bad science.

But in the meantime, there is a pragmatic aspect to this. And I think that's where a lot of us are. There are a few people out there who really just wanna do the best possible stuff.

And there are, unfortunately, also people out there who really just look at it as a game, and they rationalize it: none of this really matters, so I will just go so far as to make up data out of whole cloth. That's what a lot of the highest-profile fraud cases have involved, because, the thinking goes, what's the difference between me making up data and me going and getting data?

I mean, either way, I'm going to keep adjusting my theoretical model until I have a model that's consistent with whatever data I have. So I can imagine people over time just sort of chipping away at their integrity. I'm gonna let this little thing slide.

Next time, I'm gonna let this little thing slide, and pretty soon you've let everything slide. It comes back to what gets incentivized. If the process incentivizes a certain kind of work, that's what we'll get.

And if it incentivizes easy stuff, quote unquote, interesting stuff, counterintuitive stuff, then that's what we'll get, for good or ill, that's what we'll get.

[Tara Behrend] (32:34 - 33:04)
Yeah, I mean, I do think there's something we can do about that. You know, we all just assume that we have the same idea about what the top journals are, and they're the top because we say they're the top, but we could say that other journals are at the top based on the fact that we trust what's published in them. So I think reviewers are one vector, but then deans are another vector. Also people who are established, who can lead by example by just choosing not to behave that way.

But I do think it just takes all of those, all those various vectors working together.

[Jose Cortina] (33:04 - 35:49)
I agree. I have said publicly in other venues that I think Journal of Management is our top journal right now. And my reason is that it has a similar sort of process and it has a similar sort of acceptance rate as other top journals, but they also have this mechanism for catching methodological problems before these papers end up in print.

I think the system that Cindy Devers put in place is probably the right one. Methods reviews aren't done for all submissions or even all submissions that go out for review. It's just for revisions and it's just advisory to the AE.

So the AE can choose to emphasize certain things or not, but I've built a career on reviewing the methods that people actually use. And inevitably what I and others find is that there's a big difference between recommended practices and actual practices. I would like to have a field in which I can't publish those papers anymore because the distance between those two is so small.

And I think the distance between those two is smaller in Journal of Management simply because they put this into place. It's more of a hassle. I mean, it involves more people.

It's more of a hassle for authors because now there's this second step. We've all had papers where a new reviewer got introduced at some point later in the process and that can often be really frustrating because they've decided the things that the existing team didn't think were important are crucial. That's especially true when you're talking about sticking me onto a paper because there are things that I know to be important.

Things that I know from the methods literature to be important, but that not everybody does. So a paper has gotten an R&R, and some of the things that I raise are things that the reviewers also raised, but sometimes they're not.

And sometimes they are what I know to be pretty big things. I would love to see all of the journals embrace that idea somehow. Right now, it's another one of those changes that people seem to be reluctant to embrace.

They're worried about, again, unintended consequences, author reactions, whatever. But it comes down to what kind of a field we want to be. If we want to be a science, we have to act like a science.

And that means insisting that every paper get scrutinized not just for how cool the introduction section is, but also for whether the hypotheses really were scrutinized by the empirical part of the paper.

[Richard Landers] (35:49 - 36:36)
A lot of what we're talking about really stems from something that we spent a lot of time in the book talking about, which is values: how your self-concept as a researcher, your identity as a researcher, plays out in the kind of decision making you do, where even despite incentives, you may do something counter to them because of those personal values. So what I'd like to ask next is: how have your own self-concept and values evolved since your grad student days?

Is the version of you, the kind of things you're telling us now, is that how you felt back when you started, or has that changed over time? And what does your journey as a researcher look like, I suppose?

[Jose Cortina] (36:36 - 42:10)
Well, the alpha paper that we were talking about was more just understanding this one statistic that everybody reports. My later methodological work was more along the lines of: here is a particular methodology; here is what I know to be the correct way to do this thing, or maybe there were a few different correct ways; this is commonly used, and it looks at first blush like there's a disconnect between the way it's supposed to be used and the way it is used.

Some of my early work was on moderators and meta-analysis. And at the time, what most people did was the old Hunter & Schmidt 75% test. You know, you've got a bunch of effect sizes and they're gonna vary due to chance and you can estimate how much they ought to vary just by chance.

And if the amount that they actually vary is pretty close to what we would expect by chance, then it's just chance. And we don't have to worry about moderators. If there's more than that, then we can start looking at moderators.

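For readers who haven't seen the test, here is a back-of-the-envelope sketch with made-up correlations and sample sizes; the expected sampling-error variance is the standard Hunter-Schmidt quantity, (1 − r̄²)² / (N̄ − 1):

```python
import numpy as np

# Hypothetical meta-analytic database: observed correlations and their sample sizes.
rs = np.array([0.10, 0.35, 0.22, 0.05, 0.41, 0.18])
ns = np.array([120, 85, 200, 150, 60, 110])

r_bar = np.average(rs, weights=ns)                   # sample-size-weighted mean effect
var_obs = np.average((rs - r_bar) ** 2, weights=ns)  # observed variance of effect sizes

# Variance expected from sampling error alone (Hunter & Schmidt).
var_err = (1 - r_bar**2) ** 2 / (ns.mean() - 1)

pct = 100 * var_err / var_obs
print(f"mean r = {r_bar:.2f}; sampling error accounts for {pct:.0f}% of observed variance")
# Rule of thumb: if that percentage is at least 75, treat the spread as artifactual
# and report the mean; if it is well below 75, moderators are likely lurking.
```

As Cortina goes on to say, in real meta-analyses that percentage is usually nowhere near 75, yet readers carry away only the mean.
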
And what we were told at the time, what Jack Hunter told me in his class, was that usually the amount of variance that you see in effect sizes is what you would expect by chance, so that mean effect size is really all you need to know. As I reviewed meta-analyses, I found that actually that's not true at all.

It's very uncommon for observed variance to be anywhere near what's expected from sampling error. It's usually a lot more than that, and it's usually not explained by moderators. Why would that be common knowledge in the field when in fact we can look at the field and see that it's not accurate?

Well, I don't really know. It's a kind of convenient misunderstanding, and it has persisted. We still ignore heterogeneity of effect sizes, and largely, when we're reporting the results of meta-analyses or citing other meta-analyses, what we focus on is those mean effects.

And even if there's huge heterogeneity around those mean effects, we generally pretend it's not there, because it's kind of inconvenient. The various papers that I and others have done have shown that we seem to be thinking about meta-analyses in the wrong way; we can tell by looking at what people say and do and comparing that to what we know to be best practices. Papers on meta-analysis and lots of other topics got me thinking about the sociology of our science.

Why is it that papers appear in the form that they appear in? When we've got our author hats on, we are just trying to get our work done and we're usually doing some satisficing. Fine. But then we put our reviewer hats on, and it's then that we really need to be hardasses.

There are things that people attach importance to that aren't really that important, and there are lots of examples of that. But there are so many other things that really are important that get overlooked. I see these constantly; I've done about 75 of these JOM methods reviews now.

And I've seen at least one, but usually multiple, serious problems with the methodology that were not raised by the reviewers. I don't know why that is. I'm glad they're being caught now.

But I suppose my perspective has changed over time because everywhere I've looked, and I've looked at just about any design and analysis approach that you can think of, there's a big disconnect between what we do and what we're supposed to do. Not my idea of what we're supposed to do, but what methods research shows is the right way to go about things. Because I've never done a paper where that wasn't true, it gets me thinking about why that is.

That was what drove my SIOP presidential mission. It's what's driven almost all of my methods research over the last 25 years. And to the degree that I and others can call the attention of us as reviewers to these disconnects, I know that some people, when they then put their reviewer hats on, have this in mind.

And they'll say something in the review process. You just have to hope that one of those pairs of eyes gets on every paper before it ends up published. But I know that's not always true.

So all I and anyone else can do is raise as much awareness as possible about these disconnects, about the way that things are done and how it differs from the way that things ought to be done. In the hope that us as reviewers, more than us as authors, pay attention to that stuff and push it so that we as authors feel like we have no choice but to do things properly.

[Richard Landers] (42:11 - 42:38)
So as a last question, one of the things we like to ask our guests is whether there's a specific paper or study that they've read in the last year that they really liked or found thought-provoking. I'll add, too, that this is the very last of these we'll do. So, no pressure.

A paper to end on. Do you have any recommendations for what our viewers should take a look at?

[Jose Cortina] (42:39 - 46:13)
I have a few. One I already mentioned, Eden and Shani (1982). I mean, if you want a comprehensive constructive replication example, that's one of the very few in our field, and I don't even understand how they were able to pull that study off, how they were able to get the Israeli military to sign off on it.

But it's wonderful that they did because we learned something really important. The other category that comes to mind is when I'm doing these methods reviews, there are certain things that come up over and over and over again. People as authors will often just stick in control variables and they won't even give a reason.

Or if they do give a reason, they'll say previous research has shown that this is related to the DV. Well, that's not actually a reason to control for anything. There's a paper by Kevin Carlson in ORM in 2012 or 2013 about control variable use.

And among other things in that paper, they explain, here are the reasons to control for stuff. If this is one of your reasons, then say that. And if one of these is not your reason, don't control for it.

There's no reason to control for it. The sociology of science angle is of course, all right, I measured 15 other things. I'm going to find the specific configuration of control variables that lets the coefficients of actual interest generate small enough p-values.

That's the bit that scares me. So if the justification for the controls is not sound, I just want them to get rid of them. Trevis Certo has an ORM paper on the use of ratio variables.

This is more common in strategy research than it is in micro research. He didn't really discover anything new, but he does a really good job of explaining why there's really no justification for using ratio variables, period. None.

Usually the denominator is some sort of scaling factor. Well, that's fine. You just include that as a control variable.

That's a good reason to control for something. If instead you use a ratio, you're just manufacturing problems with no upside. Any careful treatment of ratio variables shows that they don't actually work and can't. All they can do is screw things up, and that paper does a nice job of explaining why that is.

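The classic demonstration of the problem, a sketch with made-up data rather than anything from the Certo paper itself, is that dividing two otherwise unrelated variables by a shared denominator manufactures a correlation, while entering the denominator as a control recovers the true null:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Three mutually independent variables; size plays the role of the scaling denominator.
x = rng.normal(10, 2, n)
y = rng.normal(10, 2, n)
size = rng.lognormal(0, 0.8, n)

print(f"corr(x, y)           = {np.corrcoef(x, y)[0, 1]:.3f}")                # ~ 0, the truth
print(f"corr(x/size, y/size) = {np.corrcoef(x / size, y / size)[0, 1]:.3f}")  # spuriously large

# The alternative described above: keep the numerator, control for the denominator.
X = np.column_stack([np.ones(n), x, size])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"coef on x, controlling for size = {beta[1]:.4f}")  # ~ 0, as it should be
```
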
In meta-analysis, I mentioned the issue of heterogeneity.

And Sven Kepes has a paper that looks at how we treat heterogeneity as meta-analysts and as users of meta-analysis. Oftentimes, for me, most of the story is in the heterogeneity. And you wouldn't know that if you just looked at our field and how we conduct and treat meta-analyses.

You'd think it was all about those mean effects, and that heterogeneity is almost, if not entirely, irrelevant. That's ignoring most of what we're actually finding in a meta-analysis.

Mostly what we're finding is that the effects are all over the place, and we've got to figure out why. Is there some factor that is creating stronger effects in some situations or for some people than in others?

And instead we tend to just focus on that mean. So, I think those are the ones that jump out at me.

[Richard Landers] (46:14 - 46:29)
Well, we're at the end. Let me say thank you so much for joining us today. This has been a lot of fun.

I think it'll also be a really great wrap-up for our viewers and readers here. So yeah, just thank you again so much for coming.

[Jose Cortina] (46:30 - 46:36)
Thank both of you. This was a very interesting discussion. It's important stuff.

At least I think it's important.

[Tara Behrend] (46:37 - 46:38)
I think so too.

[Jose Cortina] (46:38 - 46:39)
That's it for another GIG.

[Richard Landers] (46:40 - 46:52)
To stay in touch, subscribe on YouTube. Check out our website at thegig.online. Join our LinkedIn group. Sign up for our email notification list and join our Discord.

Thanks for joining us and see you next time for another great IO get together.