Dr. Adam Meade, professor of industrial-organizational psychology at North Carolina State University, discusses data cleaning methods and research methodology. He shares his journey from engineering to IO psychology, driven by curiosity about how we know what we know. The conversation covers careless responding detection, the evolution of academic publishing, and challenges in contemporary research methods. Dr. Meade reflects on meta-analyses, cultural influences on research, and the growing complexity of academic papers.
Key Takeaways:
• Data cleaning requires systematic approaches beyond simple checklists
• Research methods knowledge often comes from review work and editorial experience
• Academic publishing has shifted toward longer, more complex papers
• Cultural context influences what research gets published and accepted
• Careless responding detection has both benefits and limitations when oversimplified
• Methods training requires patience and good mentorship
• External factors shape the foundations of our knowledge base
• Technical skills can be developed with time and practice
Website: https://thegig.online/
Follow us on LinkedIn: https://www.linkedin.com/company/great-io/
Join our Discord here: https://discord.gg/JcTcMu335K
Join The GIG Email List: https://docs.google.com/forms/d/e/1FAIpQLSfVQ4hyF8MA4G9W-ERwVL8_e91a-MUMuhNvxhXmgkSFUDFatg/viewform?embedded=true
Transcript
[Richard Landers] (0:00 - 0:31) Welcome to The Great IO Get-Together. On tonight's show: quips and queries about the world of work as IO psychology comes alive. Please welcome our hosts, Richard and Tara. Welcome, everybody, to The Great IO Get-Together Number 37. My name is Richard. This is my cohost Tara today. We're exploring chapter 11 of our textbook, Research Methods for IO Psychology, and this chapter is all about data cleaning. So to help us understand the cutting edge of data cleaning on the show today, we have Dr. Adam Meade, professor of industrial-organizational psychology at North Carolina State University. Welcome to the show. [Adam Meade] (0:31 - 0:45) Thank you for having me. I'm honored as a guest. I'm a little disappointed, though. You guys were scraping the bottom of the barrel, and as Tara's former mentor, I've got to say, I'm a bit disappointed in that. But happy to be here as a guest, so thank you for having me. [Tara Behrend] (0:46 - 0:56) Well, we're very happy to have you here. To start off, we like to ask our guests to tell us a little about themselves, maybe something people might not know from reading your work. Like, how did you first get into methods and stats? [Adam Meade] (0:57 - 3:03) Yeah, good question. So I'm originally from a really small town in Virginia, in the mountains. Like, 10 miles away from the nearest stoplight kind of place, way out there. Not to give you the full history, but I started college as an engineering major, which it turns out is not that uncommon amongst methods researchers. I've mentioned that before and had several other people say, oh yeah, me too. So I started as an engineering major, made it most of the year, and then Chemistry II failed me out, essentially. So I got into psychology and then slowly gravitated back toward the methods and stats side of things. It was sort of a way to get into some of the things I liked about engineering — the science and data analysis, those kinds of things — but from more of a framework of psychology. So I got into it a bit, I found that I was pretty good at it, and that's really where the interest came from. So I did well as an undergrad and I wanted to go to grad school. I didn't want to get a job; the job market for psychology majors with an undergrad degree was not good. So I knew I wanted to go to grad school, and, in one of the least inspirational stories you've ever heard, I kind of asked around and found out — this was pre-Google, right? So I asked around and found out what paid well in psychology. And I was told clinical and IO, and I said, well, I know I don't want to do clinical, so let's try IO. I had never had a class in IO before I went to grad school, and, funny story, I got there and in the first two weeks of grad school, one of the first readings was R.J. Harvey's 120-page handbook chapter on job analysis. And I thought, oh my God, I made a terrible mistake. I say that, but I like R.J. a lot. He's a good guy and all that kind of thing. But it was a heck of a start. So from there, I went to grad school thinking I would do leadership and groups and teams, and quickly found I didn't like that stuff at all. What I did like was the stats and methodology. And I'm interested in the way that we know things, right? That's ultimately my interest: how do we know what we know? And that gets you into the realm of statistics, quantitative methods, and then ultimately measurement.
And that's kind of what most of my research has been focused on ever since. [Tara Behrend] (3:03 - 3:22) I guess we should all be happy that chemistry didn't work out very well for you. But, you know, it's funny you say that about job analysis. I think the unfortunate paradox of job analysis is that it's extremely fun to do and torture to read about. And so all these people never try it, even though it's just great. [Adam Meade] (3:23 - 3:39) Yeah. When I've done it — you walk around the factory floor and you look at how stuff is made — it's wonderful. But yeah, talking about it is a little bit tedious, even by my standards. Right? I do some tedious stuff, and even I'm like, oh, that's rough, guys. So, yeah. [Tara Behrend] (3:39 - 3:55) Yeah. Hopefully our job analysis friends are not listening. Well, on the topic of what you find interesting: is there a paper that you've read in the past few months that you thought was especially thought-provoking, that we should take a look at? [Adam Meade] (3:55 - 6:48) The short answer is no. But the reason is that, I don't know about you, but I don't really read. You know, when I took this job, I thought, oh, as an academic, I'm going to be reading all the latest journals that come out, and I'm going to be having interesting conversations with colleagues about them and thinking deeply about stuff. And I found the job is just not conducive to that. You're pushing paper early in your career, you get burnout later in your career, and eventually you get to where you never want to read anything you don't have to. At least that's me. So I don't really read a lot in the field. Almost all my reading is through review work. So, probably like you guys, I get asked to review a lot of stuff, and what I end up reviewing can be really interesting. In my role as associate editor at the Journal of Business and Psychology, I read some really cool stuff, kind of all over the discipline, which I like. But if you had to push me on one, I would probably say the big meta-analysis redo from Paul Sackett, Filip Lievens, those guys, a few years ago. And for a few reasons. Number one is that the results are just interesting: what do we know about all these predictors, what do we think we know, and how do we know it? But for me, the thought-provoking part of that is the context in which it came. And so, full disclaimer, I really appreciate and respect all those guys. They do science at the highest level. I don't think they have an agenda or did anything even remotely wrong or inappropriate. I trust all their findings completely. But it's interesting because of how it came about. The thing that jumped out at me is that it arrived in a broader cultural context of sort of an anti-testing vibe, right? All these schools are getting rid of the GRE. There's all this stuff in the popular press and media about testing and how it can be bad or disadvantageous for certain people. And it kind of made me wonder: could this paper have been done 20 years ago? Would reviewers and editors have seen it the same way without that cultural context? That part I find really interesting. And the other thing about it that interests me is that it's not just saying testing doesn't work as well as we thought it did. It doesn't say something like testing doesn't work as well as it used to. It's sort of a rewrite of the entire history of our knowledge base.
And so from that perspective, again, I have no reason to doubt anything in that paper. But it almost looks like a way that external factors and cultural factors might have a real impact on the very foundations of what we think we know about IO psychology and predictors for selection and employment testing. So, yeah, from that perspective, I find it a really interesting look at not so much the paper itself, but how our process works. How does our information get into the system? What determines whether something is published or not? What makes an impact and what doesn't, and why? Those kinds of things I find personally interesting. [Tara Behrend] (6:49 - 7:18) It's a really great point that we like to pretend we're objective, but everything we do is wrapped in the context of what's going on in the world. And I can't think of a better example than testing. And, you know, everybody learns something about testing, but it's worth asking the question of why we think this is true. And if we looked at it with new eyes, would we feel the same way about that evidence base? I think it's a really important reminder that everyone should be reflecting on why they think they know what they think they know. [Adam Meade] (7:19 - 7:59) Yeah, it's interesting. You know, some of my colleagues have taught classes — and these are people doing mostly qualitative research — where they start the class with sort of a personal statement. You know, here are my values and beliefs and the biases I might bring to bear. And I can't decide if I love that or hate it. Part of me wants to think, no, this should be a science. We should do things in an objective way, and that shouldn't be a factor at all, and it doesn't matter what your background is. But another part of me thinks, well, maybe it does actually matter quite a lot to the kinds of questions we ask, how we evaluate them, whether we decide to file-drawer something or not. So, yeah, I'm really kind of struggling with that question on an ongoing basis. [Tara Behrend] (7:59 - 8:22) Yeah, me too. In fact, we include a story in the book, in the qualitative chapter, about that: being asked to reflect on positionality and having sort of a negative reaction to that, because, of course, we're objective. And your examples are great examples of how that human component shows up. Even when the statistics are not affected by my opinion, whether I choose to publish them, and the questions that I ask, certainly are. [Adam Meade] (8:22 - 8:29) Well, yeah. And of course, I pulled that right from your book, because I read it nightly and keep it by the bedside table. So it's fresh on my mind. [Tara Behrend] (8:29 - 8:52) Legally required for all podcast guests, so that's expected. On the theme of impact, I wanted to talk about your paper from 2012, because we include a story from you in the book about that paper — sort of where it came from, why you wanted to write it. Now it's 13 years old. So looking back over the past decade, do you think the field has mostly adopted the advice that you put forward in that paper? [Adam Meade] (8:53 - 11:14) Wow, 13 years. I've been riding the coattails of that paper for a long time, Tara. You know, I think people have changed the way that they approach data. There seemed to be kind of a sweet spot in history where things moved online, but it wasn't commercialized. It wasn't on a huge scale. It was mostly researchers putting together their own HTML forms and sending them out.
So there seemed to be a time, in the early 2000s I would say, when you could get data a lot more easily than you could back in the paper-and-pencil survey days. I'm old enough to have done that in grad school — keying it in and doing the whole data entry thing. There was a time when you could get seemingly pretty good data. And then we kind of went all in on that, and commercial tools came up. And then with the advent of these crowdsourced platforms, it became almost like a business model. Around the time I wrote that paper, it was a classic case of being in the right place at the right time. There were other people working on the same stuff and publishing at the same time — Nathan Bowling and Paul Curran and a bunch of other guys doing the same kind of work. So I completely believe it wasn't any inspiration on my part; it just fits that theory where science isn't really done by people so much as discovered — the information gets discovered when the time is right. There was such a clear need for something like that paper that we just happened to be able to get it out there, and into a good journal, at the right time. So I do think people have changed the way they go about data analysis, much in the way that you look for outliers or influential cases and things like that. I think people now do pretty regularly screen for data quality — not because they read my paper and were inspired by it, but because everybody's in the same boat. We all have data, we all sort of suspect maybe there's some junk in the data, and we don't want that junk in there. And so we try to clean it up. And then once the paper gets cited a few times, it kind of snowballs: if you're doing this kind of screening, people, for whatever reason, chose to cite our paper a lot of the time. Yeah, I think people have changed — not everywhere, but in a lot of disciplines. If you look at who's citing that paper, it's from all over: anybody doing survey work, across lots of different disciplines. So I'm hopeful that it's made our data samples a little cleaner and our results a little better and truer, but who knows? [Tara Behrend] (11:14 - 11:25) Well, relatedly, there was a paper that came out last year that you wrote with MK Ward that was sort of an update. So what would you say are some of the major advances in the field that happened during that period? [Adam Meade] (11:25 - 13:33) Yeah, that's a great question. I think it's mostly the way we think about careless responding. Talking a lot with the people that do this kind of work, and thinking about it deeply, I think there's been some consensus about what careless responding looks like and what the goal of data cleaning is. So first off, we typically flag people as careless or not careless, and then we throw them out of the sample or we keep them. But really, you can think about it on a continuum. There are some people that are really diligent. There are some people that are a little bit careless. Some people are quite careless. Some people are careless to the point that they don't care whether you know it or not, and other people try to hide it. So it's a continuum. I think that thought has emerged as a bit of a consensus.
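To make the continuum idea concrete, here is a minimal R sketch — ours, not from the episode — of two common carelessness indices, a long-string count and intra-individual response variability (IRV), computed per respondent. The flagging thresholds shown are purely hypothetical:

```r
# A sketch of continuous carelessness indices, assuming a data frame of
# 1-5 Likert responses with one row per respondent. Thresholds are
# illustrative only, not recommendations from the episode.

longstring <- function(x) {
  # length of the longest run of identical consecutive responses
  max(rle(as.numeric(x))$lengths)
}

irv <- function(x) {
  # intra-individual response variability: the SD of a person's own
  # responses; values near zero suggest straightlining
  sd(as.numeric(x))
}

set.seed(1)
survey <- as.data.frame(matrix(sample(1:5, 200 * 20, replace = TRUE),
                               nrow = 200))

indices <- data.frame(
  longstring = apply(survey, 1, longstring),
  irv        = apply(survey, 1, irv)
)

# Flag only the most egregious cases rather than anyone with any noise,
# e.g., one answer repeated for half of a 20-item survey, or near-zero
# within-person variability (hypothetical cutoffs)
indices$flag <- indices$longstring >= 10 | indices$irv < 0.5
table(indices$flag)
```

Because each respondent gets continuous index values rather than a single pass/fail flag, exclusion can be reserved for the most extreme cases.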
The other thing I think there's a little bit more of a consensus on is the goal. The goal isn't really to throw out every person that might be the least bit inattentive, to have the purest sample possible, because as you start chucking people out, you lose power and you may lose some representation. There are some people showing that maybe personality correlates a bit with carelessness. So if you start throwing people out left and right, you're maybe throwing out some of the more disagreeable people or the less conscientious people. Really what you're looking for are the most egregious people — the people that are really careless almost from the word go, that just do things that really junk up your data. The people whose attention drifts a bit here and there, maybe they do miss even an instructed response question where you tell them to put strongly agree. We don't know; that may be the only time in that survey they were careless. So there's more nuanced thinking about that now. There have been a lot of other developments in the nuts and bolts of how we do this — for example, individual variability is maybe a better indicator than something like straightlining or long-string indices. There are some other advancements as well, but I think the big ones are that we should be doing at least some of this, and we should be doing at least something to try to chuck out those people that are the most egregiously careless in our data set. [Tara Behrend] (13:33 - 14:40) I really appreciate that perspective — that you can think of every person as having some signal and some noise, and you don't need to throw out people that have any noise. On that topic of thinking about data quality as a continuum: there's a recent paper that came out that argued that throwing people out of your data set who fail your manipulation check is not necessarily the right move. And I obviously found this claim very shocking — I couldn't imagine not doing this — but I thought the paper made a compelling argument. The argument is that you can think of the manipulation as having some effect on the manipulation check variable, which is a perception variable, and that might not be a 100% effect. It might be something that, again, has some signal and some noise in it. You model it like you'd model anything else. It's essentially an instrumental variables approach: you then use the manipulation check variable as your predictor in the model. And so it allows for the fact that the manipulation might have different kinds of effects, or different levels of effect, on people, and that that's still interesting and valuable — you shouldn't throw them out. So again, I found it very shocking. I'm curious to know if you agree or would rebut that claim. [Adam Meade] (14:40 - 15:35) I haven't read the paper, full disclosure, so I don't know the nuances of the argument. My gut reaction is that it depends a little bit on what you're trying to do. In a traditional lab environment, where you're interested in whether the independent variable affects the dependent variable, if somebody fails the manipulation check — if it's set up in the usual way — it means that they didn't see or didn't notice the independent variable. So it probably depends on what you're trying to know.
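As a sketch of the instrumental-variables idea Tara describes — with hypothetical variable names and effect sizes, not the actual model from the paper she mentions — random assignment serves as the instrument for the noisy manipulation check rather than as a reason to drop participants who fail it:

```r
# Hypothetical two-stage least squares illustration: random assignment
# instruments the noisy manipulation check instead of excluding failures.
set.seed(7)
n <- 400
assigned   <- rbinom(n, 1, 0.5)            # random assignment (the instrument)
perception <- 0.6 * assigned + rnorm(n)    # manipulation check: imperfect take-up
outcome    <- 0.5 * perception + rnorm(n)  # DV affected through perception

stage1 <- lm(perception ~ assigned)        # stage 1: perception from assignment
iv_fit <- lm(outcome ~ fitted(stage1))     # stage 2: predicted perception as predictor
coef(iv_fit)                               # slope recovers the ~0.5 effect

# The manual second stage gets the coefficient right but not the standard
# errors; a dedicated routine such as AER::ivreg(outcome ~ perception | assigned)
# handles both.
```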
If you're wondering whether the independent variable really does affect the dependent variable in the way that you've implemented it, then I think throwing out people that missed the manipulation check probably makes some sense. But if you're curious about the more holistic thing — maybe you want to learn something about why your manipulation didn't actually work — then I could see keeping them in. But earlier I did mention that I don't really read anything I don't have to, so this is evidence of that, just to show you that I'm not making it up. [Tara Behrend] (15:35 - 16:07) You did, but I think it's an argument that comes from a sort of economics way of thinking about instrumental variables. So it's a different way of thinking, and it was tough for me, but I appreciated it, and I'm thinking of trying to incorporate that into the way that I think about experiments. Well, let me shift gears for a minute. So when I was in graduate school, you were one of my mentors, and we've worked on several papers together since — in fact, we're working on one right now. So my question is: why did you think it was necessary to make your students memorize IRT equations? [Adam Meade] (16:07 - 18:18) So I don't remember doing this. My hypothesis is that things were so tedious and boring in that class that I must have required the memorization of formulas, because there's no other way to explain the negative association you had with that course — and, of course, my teaching. No, I don't know. My goal was never to make anybody memorize anything, right? The goal is to get you to understand it well enough that you can recite the formula because you know in a fundamental way how things are working. So yeah, I don't know that that happened. I'm going to take issue with the premise of the question. But even if it did happen — and it may well have, because my memory is not the best — it would be because I wanted you to understand it in that way. And I think what you're getting at here, the broader point, is sort of philosophy of teaching. This is something that's come up a few times with some classes in our department that are a little more oriented toward point-and-click stats: they're statistics classes, but they lean heavily on knowing what to do with the software and knowing how to interpret the output. And I think that's a perfectly fine approach for a lot of people. If you're in a discipline and you're not a methodologist and you're just pushing theory and testing theory all day, that may be all you need to know. But, you know, our field, IO, is more advanced than most of psychology in terms of methodology and understanding and things like that. And so for me, what I really hope students get is a deep-level understanding of what's happening in the software. You know, software changes. I learned structural equation modeling on LISREL because that was cutting edge in the day — it predated Mplus and predated R, of course. And I haven't used that in 15 years, but I understood what it was doing well enough that I can use something else. And so my philosophy is, yeah, you're not going to use BILOG like we did back in the day, Tara. But if you understood some of that well enough, you could go and use R and the mirt package and things like that today, understand what it's doing, and then be able to understand what you're seeing in the output.
And that's kind of more my philosophy of what I'd like people to know. [Tara Behrend] (18:18 - 18:53) It's a great point about brittle knowledge versus knowledge that can be flexible and adapt to new situations, right? Which has come up a lot lately in the context of AI and learning. And you're right: if you don't understand why something works, then you can't adapt when it changes. Like, if I had only learned BILOG, I wouldn't be able to translate that to R, et cetera. Just for the record, I did not imagine this. Richard, I have proof — I have evidence that we had to memorize these equations for prelims. And I did. And also for the record, I actually loved that class. It was great, but I memorized a lot of equations. [Adam Meade] (18:54 - 20:03) Well, that reflects very poorly on you, having loved that class. No, thank you, I appreciate the kind words. But yeah, it's a great point, and I struggle with this now. You know, what is relevant for students to know today? Is it how to use AI? I'm guilty myself: I was writing R code today, and I was asking ChatGPT to help me modify it, because it's easier than doing it myself with trial and error and all that kind of stuff. And it's pretty good at it, honestly, even now. And in 10 years — the notion that it won't be way better than any student, or even any expert, it's hard to know if that would still be true. It's harder and harder to deal with this stuff. It's harder as an instructor to know how to assess student learning. You know, I've got 200 students, my undergrads. I don't feel like I can give them a writing assignment, because AI can do that for them. I don't feel like I can give them unproctored testing, because it's really good at that. So I don't really know what to do that's scalable for 200 people. Maybe an oral exam, but you can't do that with 200 people. It's a tough question, and I'm glad you guys are thinking about that. It's nice to know people that are better at this than I am are out there giving some thought to it. [Tara Behrend] (20:03 - 20:16) Well, I don't know about that, Adam, but I do think it's hard to predict the future. I don't think it's reasonable to say that it will definitely be better in 10 years. It could definitely be trash, because we've been training it on garbage this whole time. We really have no way of knowing that. [Adam Meade] (20:17 - 20:31) It could be, yeah. I hope somebody out there is thinking, maybe we should stop and give this some attention before we plow ahead with what it can do and give it more capabilities. Like, maybe let's address this thing first, you know? [Tara Behrend] (20:31 - 20:49) Speaking of AI, actually: what do you think about the practice that's becoming, I think, really interesting to a lot of people, of using AI to simulate human participants — synthetic data — to basically avoid the hassle of data cleaning entirely and having to deal with real people and all of their messiness? Pros and cons? [Adam Meade] (20:49 - 24:08) Yeah, that's a good question. So I don't know a ton about this. I did review an invited paper at the Journal of Business and Psychology on this, and they were using synthetic data not to replace real data — not a we're-not-going-to-collect-data situation — but more like: we've collected data, but it's got sensitive information in it, and we want to be able to put this out on the internet.
It's like a free repo that people can download and analyze themselves, but instead of putting the actual data out there, you put out synthetic data. So I did review that paper and found it really interesting. One of the takeaways, at least from the one I reviewed, is that it wasn't great. It sort of introduces its own error — we have error in our data from humans, of course, and it was trying to mimic that a bit with its own kind of error — and it didn't seem to be as accurate. If you treat the human data as the population, the true data, the thing you want, it wasn't great at reproducing it in an easy way. And it had enough hassles that my takeaway was: if you can make human data anonymous and post that, you should do it, because that's better and it's going to be more accurate. As far as the idea that you're not going to collect data at all, you're just going to simulate data based on what some AI agent thinks it would be — I don't think we're there yet with the accuracy. I mean, we're not able to predict things well. So it depends. It might know that the correlation between some predictor and job performance is 0.35, let's say, and it could create data that would do that, much like a Monte Carlo study would. But the nature of its errors, and the nature of that data, will look different than what most human data would look like. And I would also start to worry about some drift. Like, if an AI is trained on images to create new images, but those go out into the public domain and then get used as training images in the future, you end up getting weirder and weirder stuff as you repeat that cycle. So I would certainly worry about that. Ultimately, you know, I think that what we study is human behavior, certainly in IO and most of psychology. So I don't think we should lose sight of that just because of ease of use. I kind of feel like in some ways we've already lost sight of going back to the Carol Shephardic stuff, collecting data, now that crowdsourcing is so easy and so fast and so convenient. I worry that's going to replace actually going into an organization and trying to collect data from actual people, those kinds of things. And I see it. I see students do that all the time: they want to get their PhD and get out of here, and they don't want to spend six months — they want to spend three days collecting their data. And I worry, as we move more and more toward that, that we're getting further away from people that have skin in the game, that care about what the data is and care about the surveys — not people just doing it professionally for money in some remote corner of the world and pretending to be somebody they're not, all those kinds of things, too. And I'm guilty of it — I'm as guilty as anybody of that. I do worry about the direction of that stuff. I guess my only hope is that I'll be retired by the time it gets to that point and can leave it to younger people to worry about. [Tara Behrend] (24:09 - 24:12) Classic, classic. Pass that on to the next generation. [Adam Meade] (24:12 - 24:14) It's passing the buck. That's right. [Tara Behrend] (24:14 - 25:16) Well, there were so many great points raised in your comment. I wanted to follow up with a few of them, sort of working backwards.
The idea of not getting out into organizations — I think there's a tension there, right? Because we really value big sample sizes as a field. And so if you're going to ask a question and go into an organization, you are sacrificing quantity for quality, I think, in some cases, or you're trying to work with huge organizations, which are often harder to work with. Like, it's pretty easy to convince a local restaurant to cooperate, but now you only have 13 people that filled out your survey, right? So I think we've done that tradeoff to ourselves in a lot of ways by expecting bigger and bigger samples. To the second point you made, about synthetic data, I totally agree with you that the paradox there is that if we understand something well enough to simulate it that well, we don't really need to do that study, do we? Like, we already know that answer. And if we don't, then we don't have good enough information to simulate the data. So, like, what do we think is happening here? [Adam Meade] (25:16 - 25:17) Yeah, yeah. [Tara Behrend] (25:18 - 25:21) A question that I have — and maybe Richard disagrees, but I don't care... [Adam Meade] (25:22 - 26:47) Sorry to interrupt, but there are different standards in, like, computer science than there are in psychology. So for example, several years ago, I was working with a company that was interested in my — not trying to plug my own assessment, but I have a personality testing business, and they were interested in it. But what they were doing instead was using text to infer personality — inferring personality from text using AI and some other tools. And by the standards of AI and computer science, they were adequate. But if you look at it, it was only correlating something like 0.2, 0.3 with survey-based personality measures. And so from a psychological perspective on convergent validity, that's terrible, right? That's really bad. But in the field in which they were working, that was pretty good. And so the company doing it was perfectly happy to sell that as a product: you know, give us samples of somebody's writing and we'll tell you their personality. So that's another thing to keep in mind, too. So the synthetic data might be able to reproduce the degree of the association, somewhat, but it's going to be composed of God knows what. And I think your point's a great one. If we know it well enough to simulate it with a high degree of accuracy, what are we doing? Because, you know, we're not that good at predicting anything. So yeah, I don't really know what that study would be about. [Tara Behrend] (26:48 - 26:53) The other point you brought up, about differential privacy — I think that's a really exciting application of this. [Adam Meade] (26:53 - 28:02) Well, if you think about where some of our original knowledge came from about how predictors work in organizations, it came from meta-analysis of published studies. And we just don't do that anymore, right? Who's publishing a basic validation study? Who's got basic biodata and a correlation with performance and is putting that in a journal? People aren't doing it. The research that's getting done is mostly behind closed doors at Hogan and SHL and all these other consulting firms, and you don't have access to it.
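Circling back to Dr. Meade's Monte Carlo comparison: reproducing a known association — say, a .35 predictor-performance correlation — is the easy part, as this minimal R sketch (with hypothetical values) shows. What it does not reproduce is the messy error structure of real human data:

```r
# Drawing from a bivariate normal with a population correlation of .35,
# much like a small Monte Carlo study. Note that the marginals here are
# clean normals, unlike real, careless-response-laden survey data.
library(MASS)

set.seed(42)
rho   <- 0.35
Sigma <- matrix(c(1, rho,
                  rho, 1), nrow = 2)
dat <- as.data.frame(mvrnorm(n = 500, mu = c(0, 0), Sigma = Sigma))
names(dat) <- c("predictor", "performance")

cor(dat$predictor, dat$performance)  # close to .35 in any given sample
```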
It's not clear to me that we're still doing the fundamentals the way they did, you know, 50, 60 years ago: hey, we just learned that interviews predict this well in this context, and here's my six-page validation paper I'm going to publish and get out there in the public domain. So I'm hopeful that what replaces that is what we're talking about. You know: I've got this data set, I collected it in an organization, but it's sensitive and they won't let me publish it, so I'm going to make synthetic data that sort of reproduces it and put that out there. That might actually be a case where, no, it's not as good as the original data, but if we have lots of it, it might be a lot better than what we have now, which is relatively little access to basic validation work. [Tara Behrend] (28:03 - 28:17) That's a really good point. Yeah, I think as the academic side of our field has evolved away from those basic core questions, we don't see that kind of work published anymore. I mean, when was the last time you saw a training study in a top journal? [Adam Meade] (28:18 - 29:16) Yeah, you can't get it published, and you can't get access to it a lot of the time. There's relatively little avenue to advance that kind of thing. Also, as journals have moved much more toward an emphasis on theoretical development and understanding, that changed things a lot too. I mean, what's the shortest paper you've reviewed lately? Maybe 30 pages, if you're lucky. I routinely see them at 70-plus pages now, with the kitchen sink — theory building and things like that. Tara, you and I just had a paper rejected with that exact criticism: oh, you didn't include the latest — you've only got 50 studies on this topic; you need at least 80 in your intro, essentially. Who wants to read that, man? But apparently that's what we're pushing as a field: these tediously detailed setups for a basic hypothesis that's fairly common sense to begin with. [Tara Behrend] (29:17 - 29:49) Yeah, I think it's more than annoying. It's really alarming to me, because what it means is a lot of people wasting their energy making up a fake theory instead of testing a theory. You know, there are all these studies of what percent of AMR propositions are ever tested, and it's like 4% of those propositions are ever tested, right? So we're building an incredibly shoddy, poorly studied field if we're just pushing everybody to tell beautiful stories and not wanting to publish any sort of tests of those theories. That's really alarming. [Adam Meade] (29:49 - 31:06) You know what I really love? I teach social psych, and I love seeing an absolutely classic paper from the 50s or 60s — especially the 40s, actually. You go and pull it, and it's like six pages — not even six modern journal pages; it's like three pages, maybe. And it's just no BS: here's what we did — and they're usually doing some cool stuff, which is why it's a classic — and here's our findings, and you make sense of it. You know, like: here's what we did. We did this cool thing. We put a bunch of kids in a room and told them they couldn't play with toys of different types. Some we told, you know, if you play with these toys, you're screwed — we're going to take our stuff and go home. And others, we didn't do that. And here's what we found.
And it's this really interesting insight into cognitive dissonance and severe threats. They don't need 40 pages to beat you over the head with why this is important; you can read it and understand it. But, you know, Steven Rogelberg has talked about this at JBP quite a bit: the studies that get cited most tend to be the longest papers. And in my head, I'm like, nobody wants to read your 70-page papers. But from the journal's perspective, those are the ones that tend to get cited a lot, for whatever reason. And so that seems to be where we've gone as a field: these really long, bloated papers that are just kind of exhausting to read. [Tara Behrend] (31:06 - 31:28) In general, I think it's hard to ask a good question, a question that is important. And so if you can't do that, then you can write 70 pages of nonsense instead to make it seem like your question is important. I think that's a big piece of what's happening here: it's an attempt to conceal the fact that the question isn't that interesting in the first place. [Adam Meade] (31:29 - 31:57) Yeah. Or occasionally the cynical side of me thinks the findings came out weird, and so they've got this long, convoluted, we're-going-to-piece-together-four-different-theories setup to help explain this hypothesis that happened to turn out, you know, that kind of thing. So yeah, I don't know why it's happening exactly, but it's unfortunate. And my dream is to see somebody stand up a journal that becomes reputable, where you can just write up a normal-length 10-page paper and send it in and get it published. [Tara Behrend] (31:57 - 32:13) Well, this has all been really just such a great conversation. And I'm wondering, just as a last question, if you have any advice for students who might want to get into a methods focus but feel intimidated, or like they don't know where to start. Any words of wisdom or encouragement? [Adam Meade] (32:14 - 33:39) Yeah: it's hard for everybody. There are people in the world that can read a psychometric paper that's got 45 formulas in it and make sense of it. I'm not that person; I can work in this area without being able to do that. I'm the kind of person that spends time with it — Tara, when I taught class, you know, we'd have these technical IRT papers, and it sometimes took me hours and hours and hours to parse through those. And I will pull stuff up in Excel; I'll type the formula into Excel so I can see it and understand it. So just know that if it's hard for you, that's normal. That's absolutely to be expected, unless you're kind of off the charts in your ability to understand that kind of language. And I can see why it gets written up in a very technical way, because math is sort of an international language and things like that. But just know that it's hard for a lot of people in this field. Even people where you read the paper and think, oh, how did they ever do that? This must be easy for them. It's not — it can be really hard. The things I've published with formulas in them, sometimes it took me a long time to come up with the formula. And then a reviewer is like, this formula is wrong — and they were right about that, you know? So don't be discouraged.
Try to find a good mentor — somebody that can explain it to you in common language and help you understand what you're looking at. And don't be afraid to spend some time with it and try to replicate it on your own. That's, for me, kind of what works best. [Tara Behrend] (33:40 - 33:42) That's great advice. Richard, any last words from you? [Richard Landers] (33:42 - 34:18) We've rotated around data cleaning a few times; I actually have a data cleaning question. This is in line with the idea of heuristics and people taking shortcuts, which is that anytime a paper gets popular — and maybe I'll pick on your 2012 careless responding paper — people tend to miss the substantive message a little bit and focus instead on: here's the checklist of things you can do so that people won't question you. So I don't know, is that a net good if people do it that way? What are your feelings on how people have used your work in that way? [Adam Meade] (34:18 - 36:23) I mean, that's how you get into one of the urban legends books, right? Like how Chuck Lance and Bob Vandenberg made some books on that — you know, nobody ever actually said that 0.7 was the cutoff for suitable reliability, and things like that. Yeah, I think it's a mixed bag, honestly. I think that to the extent you're getting people to do the stuff and think about it at all, that's a bit of a win. So if people can go through and check these three things, they probably got cleaner data than they would if they didn't do those three things. But you're right, sometimes nuance gets missed. And it's so tricky as an author, and even as an editor and reviewer, to try to be mindful of that. You write this really nuanced paper, and then what people take away from it isn't really at all what you're saying, some of the time. And God forbid it gets picked up by the media — the media are really guilty of just reporting something that is not at all what you said, and sometimes the opposite of what you said. So I think it's a bit of a mixed bag, Richard, and I don't know how to get around it. Somebody that's interested in just testing their theory is told by a reviewer they need to screen for careless responding, and they Google it, and they probably don't even read the paper — they just get the AI summary of what you're supposed to do. It's probably better than nothing, so I can't complain too much, I guess. But yeah, that's always going to be happening, I think. There are some good examples. I can't think of the name of the site, but apparently there's a paper in item response theory that got miscited. It was an old paper from the late 60s, early 70s, and it got miscited because somebody early on gave the wrong issue or page number, and then other people never read the paper — they just copied the citation. And so it's like 80, 90% of the citations of this thing are erroneous. Or if you think about the Tupes and Christal technical report on personality — it's been cited tons. Who's actually read that thing? Where do you even get it from? [Tara Behrend] (36:24 - 36:51) The point is, we give AI a hard time for just summarizing things at a very high level, and we've been doing that the hard way for a long time, right? In a lot of cases, just passing the same wisdom down without interrogating it. This has been a great conversation.
I really appreciate having you on the show, both to hear your perspective on these issues and to get an apology for making me memorize IRT equations, which definitely happened. I have very... [Adam Meade] (36:52 - 37:08) You wouldn't be where you are today without memorizing those equations. I'm confident of that. Thanks for having me, I really appreciate it. Again, I hate to see you guys scraping the bottom of the barrel in this way — it's sad and tragic to see — but I do appreciate you having me on the channel. And thanks for having me. [Richard Landers] (37:08 - 37:22) That's it for another GIG. To stay in touch, subscribe on YouTube, check out our website at thegig.online, join our LinkedIn group, sign up for our email notification list, and join our Discord. Thanks for joining us, and see you next time for another Great IO Get-Together.
