Machine Learning for Human Problems: Computational Methods in Workplace Research

The Great IO Get-Together (The GIG)

Hosts Richard Landers and Tara Behrend welcome Dr. Ivan Hernandez and Dr. Louis Hickman, both from Virginia Tech, to discuss the intersection of qualitative methods and natural language processing in industrial-organizational psychology. Dr. Hernandez directs the Computational Outreach Lab, using NLP to promote social change and study work experiences of non-human animals. Dr. Hickman brings expertise in machine learning applications to selection, assessment, and professional development. Their conversation explores how NLP extends beyond simple word counts to capture nuance, the balance between psychology expertise and computational skills, and the importance of domain knowledge when applying complex models. They discuss collaboration with computer scientists, the role of community in staying current, and practical advice for researchers entering this evolving field.

Key Takeaways:

  • Natural language processing works with qualitative data and can capture nuance beyond simple word frequencies
  • NLP applications in IO psychology range from analyzing worker narratives to predicting job skills
  • Domain expertise is crucial—simpler models often outperform complex ones when researchers understand their data
  • Psychologists can effectively apply computational techniques developed by computer scientists without becoming specialists
  • Building community connections helps researchers stay current in rapidly evolving AI and NLP fields
  • LLMs should be evaluated using the same standards as traditional methods: reliability and validity
  • Strong foundational knowledge of IO psychology remains essential even when using advanced computational methods
  • Cross-disciplinary collaboration requires clear communication and realistic expectations about model complexity

Website: https://thegig.online/
Follow us on LinkedIn: https://www.linkedin.com/company/great-io/
Join our Discord here: https://discord.gg/WTzmBqvpyt
Join The GIG Email List: https://docs.google.com/forms/d/e/1FAIpQLSfVQ4hyF8MA4G9W-ERwVL8_e91a-MUMuhNvxhXmgkSFUDFatg/viewform?embedded=true

Transcription

[Richard Landers] (0:00 - 0:10)
Welcome to The Great IO Get-Together. On tonight's show: quips and queries about the world of work as IO psychology comes alive. Now please welcome our hosts, Richard and Tara.

[Richard Landers] (0:11 - 0:34)
Welcome everyone to The Great IO Get Together number 31. My name is Richard. This is my co-host, Tara.

Today we are exploring chapter 5 of our textbook, Research Methods for IO Psychology, and this chapter is all about qualitative methods. So to help us understand the cutting edge in qualitative methods, on the show today we have Dr. Ivan Hernandez, Associate Professor of IO Psychology, and Dr. Louis Hickman, Assistant Professor of IO Psychology, both at Virginia Tech. Welcome to the show.

[Louis Hickman] (0:35 - 0:38)
Thanks for having us. Pleasure to be here. Thank you.

[Tara Behrend] (0:39 - 0:57)
Well, we are very excited to have the two of you on the show today. Both of you are rising star researchers doing really interesting work. I think the listeners would be very excited to hear about it.

So let's just start with a quick overview of who you are and what you do. And maybe Ivan can start with you.

[Ivan Hernandez] (0:57 - 2:04)
Thank you. So my name is Ivan Hernandez. I direct the computational outreach lab at Virginia Tech, and this lab is dedicated to using technology to promote social change.

So much of my research now is focused on promoting consideration towards non-human animals in work environments, and highlighting areas of inequity in those environments. I specialize in natural language processing, and I apply it to this domain in a variety of ways. One way that natural language processing, which I'll just abbreviate as NLP from now, one way that NLP can help is by compiling and analyzing narratives of those who work with animals, and can speak to their experiences.

This could be transcribing open-ended voice interviews to text, extracting domain-related responses, and applying topic modeling to those responses to examine common themes. And other ways that NLP can help understand the work experiences of non-human animals is by developing tools that can support research or measurement, such as identifying relevant I.O. journals, or predicting job-related skills based on behavioral task descriptions of a population that is largely non-verbal.

[Tara Behrend] (2:05 - 2:21)
I love that. Your work is definitely not just the same old I.O. topics. It kind of makes me think of Steven Rogelberg's old work with animal shelters, and I'm sure that you're very closely connected to that body of work.

Okay, and Louis, how about you?

[Louis Hickman] (2:22 - 3:31)
Well, I'm an assistant professor here at Virginia Tech. I also just recently started as a visiting academic at Amazon. So now I guess what that means is I have to say things like my views and opinions don't represent Amazon's policy or positions when I talk.

My research focuses on applications of machine learning, natural language processing, and artificial intelligence, mostly to the problems of selection and assessment. But of course, now, as that area gets a bit more mature and saturated, I'm branching out and trying to figure out how we can use this to help promote individual development, training and development, topics that can help individuals as well as organizations.

My family was sick over the holiday break, so I started growing out my facial hair with the hope that I can shave it soon and then scare my seven-month-old baby, where I come out and she doesn't recognize me, and we all get a good laugh and a video we can show her when she's older. Really excited to chat with y'all.

I've done a couple trips as an academic internationally. Both times I got to hang out with either Richard or Tara. So it was fun to get to chat with y'all today.

[Tara Behrend] (3:31 - 3:52)
Yeah, the time-honored tradition of psychologists experimenting on their own children. I'm glad to see you alive and well. That's great.

Well, this is not one of the questions that we sent you, but I'm curious and I wanted to ask you before we get started, what do you think of the fact that we are talking about NLP in the same breath as qualitative methods? Do you find that scandalous?

[Louis Hickman] (3:53 - 4:02)
It's qualitative data that NLP works with, right? Kind of by definition, I think. So it makes a lot of sense.

I don't know if Ivan has a different view.

[Ivan Hernandez] (4:02 - 4:52)
I love that. I truly love that. We'll probably explore it in the subsequent questions because they're very qualitative analysis oriented.

But NLP doesn't just have to be word frequencies or these kinds of simplifications of language. I think a lot of times why I use NLP is for nuance. NLP often extends well beyond, I think, the economical quantitative uses.

I mentioned topic analysis is there. That's something that you could apply Gaussian mixture modeling to. But it's also like translation research, right?

It's a very qualitative domain. You don't necessarily get numbers out of that. You just want to go from one person's understanding in one context to another individual's understanding in a different context.

So I think it's cool that you're putting it in that sort of context because I think it's the strength of what NLP can often offer.

[Tara Behrend] (4:53 - 5:34)
I felt a little subversive, but I agree with you that it not only transforms the way that we do qualitative research, but itself just adds so many layers of richness and meaning and can help us understand things in different ways. So I'm glad to hear it. All right, so this field that you both work in is really fast-changing and growing very quickly.

I'm curious to know just what kinds of things you read. So if you could tell me, each of you, what's the most thought-provoking paper that you have read in the last year? And tell us a little bit.

It doesn't have to relate specifically to your own research, but just something that really got you thinking in the last year. And maybe we'll start with you, Louis.

[Louis Hickman] (5:34 - 7:15)
I guess I think a lot now about this Rathje et al. paper that came out in PNAS last year.

It's titled, GPT is an effective tool for multilingual psychological text analysis. It basically shows that, across several different variables, large language models can outperform models that you train specifically for a task in natural language processing. Whether that's sentiment analysis or some other classification or regression task, you can apply LLMs to do the same thing, and they give validity that's more or less the same.

Andrew Speer has a paper in Organizational Research Methods that shows the same thing, but in one very specific context. What that means now is we can potentially apply these methods to a much broader set of problems than we could before, because we don't need nearly as much labeled training data in order to apply NLP to research. It reminds me, too, of Richard's paper in Psychological Methods showing the sample size that we need to replace human content coders.

Well, now the sample size is zero. Now you can use LLMs to do that content coding. You don't have to train a model.

You don't have to know a lot of code. What they don't cover in that paper, which I think is really important, is that you still need some evidence that it's accurate or valid. You can't just throw it at it and say, well, let's assume validity like we do for none of our other measures in our whole field. You still need to show some evidence of accuracy and/or validity.

That way we have confidence that you're actually measuring what you claim you're measuring, but we can be pretty confident that most of the time it's going to work pretty well if you kind of play around with a prompt until it seems like it's working well.
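For readers who want to try the workflow Louis describes, here is a minimal sketch of using an LLM as a zero-shot content coder and then checking its output against human codes. The model name, prompt, and data are illustrative assumptions, not details from the episode or the papers mentioned.

```python
# A minimal sketch (hypothetical prompt, model, and data): use an LLM as a
# zero-shot sentiment coder, then validate its scores against human codes.
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

comments = [
    "I love the autonomy I have in this role.",
    "My manager never listens to anyone on the team.",
    "The work is fine, but the hours are long.",
    "Best team I have ever been a part of.",
]
human_codes = [5, 1, 3, 5]  # illustrative 1-5 human sentiment ratings

def llm_score(text: str) -> int:
    """Ask the model for a 1-5 sentiment rating and parse the reply."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model would do
        messages=[{
            "role": "user",
            "content": "Rate the sentiment of this employee comment from 1 "
                       "(very negative) to 5 (very positive). Reply with only "
                       f"the number.\n\n{text}",
        }],
        temperature=0,
    )
    return int(reply.choices[0].message.content.strip())

llm_codes = [llm_score(c) for c in comments]
# Convergence with human coders is the validity evidence Louis calls for.
print(f"LLM-human correlation: {np.corrcoef(human_codes, llm_codes)[0, 1]:.2f}")
```

In real use, the validation sample would need to be far larger than this toy example, and the prompt would be iterated on a holdout set, as Louis suggests.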

[Ivan Hernandez] (7:16 - 11:19)
Funny that you picked that paper, Louis. We didn't coordinate anything on our responses. But the paper I chose was also a paper that was kind of on the paucity of information needed in order to derive some sort of semantic understanding.

It was a paper that came out in NeurIPS, which I don't know how much the audience knows, but NeurIPS is like the leading conference in computer science. It's usually where the greatest sort of innovations come out of, in terms of these large language models and neural networks. This paper was by Goldwasser and colleagues, where they presented this theory of unsupervised translation.

Unsupervised translation is when you want to know what a population that speaks some language is saying, but you really only have that language itself to learn the meaning of. You don't really have a ton of parallel translations, if any. It proposed that the more complex a language is, the easier it is to translate.

That's a little counterintuitive, but it's really because language is made of units. You can think of them like words. And those units have to fit together.

We don't say things like, I swallowed the sun. That just isn't really something that works within our physical understanding of the environments that we perceive. Complexity in a language is the information richness of the units.

Imagine a puzzle where all the pieces look the same. It'd be really hard to put that puzzle together. But if the pieces were more clearly defined in their shape, they were more informationally rich, then it's a lot easier to put all the puzzle pieces together because you can tell where everything belongs once you know where other things belong.

And so what's interesting is that language is a very information-rich modality, and things that are information-rich tend to follow a very specific distribution. It's called a Zipfian distribution. So every single language in English, I'm sorry, in the world, like English, like the Hanzi of China, like Arabic, Spanish.

If you look at the frequency of the most common word, it occurs twice as often as the second most common word, three times as often as the third most common word, four times as often as the fourth most common word, et cetera. Generally, things that are information-rich tend to follow that distribution. And researchers find that, for example, like dolphins, their whistles follow a Zipfian distribution.

So they have probably a lot of information-richness in their communication. Why I think this paper is interesting is because it really is trying to establish what you need in order to be able to have unsupervised translation. And it kind of is like a power analysis for translation where it's showing that if you have greater complexity in the language, greater information-richness, you don't need as many samples and you don't need as much compute time.

And it's showing that, really, there seems to be enough richness with the animal language that we might be able to tractably create unsupervised translation models of what they're saying. Really, it just becomes a matter of, like, can we gather a feasible enough amount of samples? And given the richness, it seems like we can gather a feasible amount of samples.

And do we have enough compute power? So I think that's a really fascinating prospect for the future of sort of understanding other cultures, and cultures that are, like, organizationally affected. Because, for example, the Navy works with dolphins.

They've been working with them since, like, the 1960s with the Navy Marine Mammal Program. They use them to detect where mines are. And our submarines' sonar affects things like whale migrations, because whales use acoustics to be able to communicate with each other and locate each other.

So the sonar can interfere. Being able to translate to those kind of populations, I think, can be a way to, you know, coexist more peacefully and be able to perhaps warn them of impending dangers and understand to what extent our organizations are affecting their welfare.
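As a rough illustration of the Zipfian pattern Ivan describes, the sketch below counts word frequencies in a text sample and compares each rank's observed frequency with the Zipf prediction (the top frequency divided by the rank). The file name is a placeholder for any reasonably large corpus.

```python
# A quick check of the Zipfian pattern Ivan describes: in natural language,
# the word at rank r occurs roughly 1/r as often as the most common word.
from collections import Counter
import re

# Placeholder: point this at any reasonably large text file.
text = open("corpus.txt", encoding="utf-8").read().lower()
counts = Counter(re.findall(r"[a-z']+", text))
ranked = counts.most_common(10)

top_freq = ranked[0][1]
for rank, (word, freq) in enumerate(ranked, start=1):
    zipf_pred = top_freq / rank  # Zipf's law prediction for this rank
    print(f"{rank:2d}. {word:<12} observed={freq:7d}  predicted={zipf_pred:9.1f}")
```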

[Tara Behrend] (11:20 - 12:33)
Well, I love that idea for so many reasons. One of them is that one of my favorite books of all time is The Hitchhiker's Guide to the Galaxy, in which, you know, understanding what the animals are saying is quite an important plot point.

I think I wonder what that book would look like if it were written today. But I appreciate the broader point that you're both making about the amount of data not being nearly as relevant as the quality of that data. And I think that's an important point to keep in mind just given the various conversations now about how the models have sucked up all the information on the Internet already.

And, you know, as they look for more data to train on, they're sort of absorbing things that are just lower quality slop. What I hear you both saying is maybe that's not necessary, that maybe making sure that the data they do have is high quality is more important. So that's a really great point.

Let me shift gears a little bit. So, you know, I want you to reflect a little bit on the fact that a lot of the traditional qualitative methods that people have used for decades have been replaced with this sort of very data-driven, potentially atheoretical NLP approach. What do you think about this shift, both in terms of what we've gained and what we've potentially lost from this change?

[Ivan Hernandez] (12:34 - 16:33)
I think sometimes we think about the procedural gains that, like, modern NLP, especially the computationally-based methods, offer relative to traditional qualitative methods, things like human-based content analysis. A lot of times you read in papers that they say, you know, these methods, the reason I want to use them is because they're faster, they can be applied more easily at scale. But I always thought that that was like a double-edged sword.

Like, those aren't necessarily great properties if the accuracy or the validity of those methods wasn't so good. Because if that's true, then you're just applying something that's more erroneous more quickly or, like, more often. And so if you get garbage coming out of a model and you have it fast and at scale, then you really just end up with, like, a lot of garbage.

So that was, I think, where a lot of NLP methods maybe started out, where they were fast and they were quicker. You know, I don't want to call any methods out specifically, but ones that are very rudimentary, ones that lack context and the sort of nuance of words being able to take on different types of meanings. Those earlier methods were fast, and they were able to be applied with the click of a button.

But I don't know if they necessarily provided more information about, like, a problem that we were studying. I think that sometimes the way you look at it, you might have been coming to a wrong conclusion simply because, like, some words were more frequent and we erroneously inferred that, like, oh, this word must correspond to this construct. Nowadays, though, I think, like, these models offer a sort of edge on these aspects of validity that the earlier approaches didn't have.

So things like, for example, Louis talked about multilingual settings, right? Being able to do research where you can keep a person's language in the original context, versus having to translate it to English, but have it analyzed in the same exact embedding space. I think it's really amazing for being able to capture some of those things that different languages can convey that English might not be able to. And in terms of separating experimenter expectations from our analyses, that was really challenging with older, qualitative, human-based content approaches. Now our methods are able to be a little bit more automated, in a way where you can't push the results as easily toward what you might expect, or even unconsciously push the results.

So I love gaining those types of aspects of research: leaving things in their original form, preserving the nuance or the original semantic meaning, and being able to remove, I don't know if you want to call it bias, but some sort of expectancy effect from the analyses. I think those are phenomenal. What you lose, though, I mean, if I were to say downsides, it's that it's really easy for these things to be applied now for purposes that may not agree with our values.

So, for example, like, online, if you wanted to astroturf as some sort of, like, nation that was opposed to sort of, like, some type of value that we were trying to espouse in the United States, like, maybe we espouse things like egalitarianism or pro-LGBTQ sort of views. As a nation, it's very easy to apply these kinds of models and get very nuanced human-like responses that are generated that seem to come from actual people. And that also complicates research, like, a lot of my earlier research was talking about how you can use Twitter or, like, social media to understand people, but now it's really hard to know if, like, the responses, the text online is actually of human origin.

So it complicates that nice population that was convenient to look at before in terms of being able to derive, like, truths about humanity.

[Tara Behrend] (16:34 - 16:38)
Yeah, terrific. Louis, did you want to elaborate or call out anything that Ivan addressed?

[Louis Hickman] (16:38 - 20:05)
Sure, and I can echo that. Probably the most common question that I get, you know, I taught a CARMA course last week on predictive analytics and machine learning, right? And kind of the most common application of predictive analytics and machine learning in organizational research is with NLP. Almost always, you get questions about topic modeling, and most of the topic modeling research published is using the type of methods that Ivan was talking about, where they're really counting words and phrases in this decontextualized sense. And if anybody's ever run topic modeling themselves with those approaches, what they know is most of the topics you get out are garbage.

They don't make any sense. You can look at them and try to interpret them, but it's pretty tough most of the time in my experience, and then the topics from dataset to dataset are different, and so it makes it hard to create a cumulative science and develop things that are building on each other because you're getting different topics out with potentially different labels every time. I know Ivan's working on new methods that address that better with some of these modern NLP methods that better understand context, better understand meaning, don't just rely on counting words and phrases.
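As a rough sketch of the context-aware alternative being described here: embed each response with a pretrained sentence encoder and cluster the embeddings, rather than counting decontextualized words. The model name, responses, and cluster count are illustrative assumptions.

```python
# A minimal sketch of embedding-based topic discovery: cluster contextual
# sentence embeddings instead of counting words and phrases.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

responses = [
    "My supervisor gives me no feedback on my work.",
    "I never hear from my manager about how I'm doing.",
    "The commute eats two hours of my day.",
    "Traffic makes getting to the office exhausting.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any encoder works
embeddings = model.encode(responses, normalize_embeddings=True)

# Two clusters for this toy example; real work would tune k or use a
# density-based method so the number of topics isn't fixed in advance.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for label, text in sorted(zip(labels, responses)):
    print(label, text)
```

Because the encoder captures meaning, "gives me no feedback" and "never hear from my manager" land in the same topic despite sharing almost no words, which is exactly what count-based topic models tend to miss.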

I think that's part of what's so exciting. In addition to what I was mentioning, there are efficiencies gained from NLP that are only desirable if we have high-quality methods. I think there's also, at times, a lot of reliability to be gained.

It's not true of LLMs, what I'm about to say, but if we're talking about the previous generation where we're training a supervised model, if I give it the same data tomorrow that I gave it today, it's going to give me the same output. If I give it that same data a year from now, it's going to give me the same output if I haven't changed the model. With humans, whether it's scoring an interview or doing content coding, we just can't guarantee that that's the case.

I go through pre-antibiotics for being sick versus post-antibiotics. I may be giving different ratings. I forget my coffee in the morning or I spill my coffee on my jacket.

Suddenly, I'm giving different ratings than I otherwise might. In addition, you've got multiple human raters making these ratings who have inter-rater unreliability, on top of the intra-rater unreliability that can happen. I think there is a lot of potential for really scaling up, in a reliable way, a lot of the work that we do.

I think that what... I guess I've seen some initial works that are trying to do this, but I don't think it's there yet. I think when it comes to grounded theory, we're still probably not quite there, except maybe in a very hybrid approach where you're using the AI as a kind of assistant and you're working together to figure out what the possible codes are that you're going to label and then maybe use the AI to help you categorize them into a smaller group of codes before then you and the AI potentially going and labeling all of those. But I think in terms of content coding, I think we should be using AI when we can, if we've got our set of codes decided and we've got something well-defined that we want to do, but that I'm less confident still at this stage in the inductive and abductive insights coming from AI that we can get when we're humans reviewing data and reviewing narratives and trying to uncover that new thing that hasn't been noticed yet.

[Tara Behrend] (20:06 - 20:30)
So most IO graduate programs do not teach NLP methods. My first question is, should they? Is that something they should be thinking about?

And what advice would you give to students who are in an IO graduate program who may be curious about these methods and want to learn more? You can tell us about how you yourself gain these skills and what you'd recommend to other people.

[Louis Hickman] (20:31 - 24:02)
I think we both have pretty idiosyncratic paths to this. I'd say first, I don't remember a ton of content on qualitative research in grad school, even short of NLP. I know there must have been some there, but then if you don't go apply it right away, tough to remember.

And it was just part of the broader research methods class, which focuses so much on experimental design. And I guess I've run one or two experiments now, but it's still not the most common thing that I do. I got similar questions in this CARMA class I was teaching last week.

There are textbooks and things out there now. Versus 2018, when I really started doing this work in earnest, there are just so many more resources available online to get started with this stuff if you're interested, including organizational research publications, whether they're coming from IO psychologists or management scholars. There's so much out there, so many good opportunities to get up to speed with things. We could make a huge list and share that with folks, because there are so many of them out there.

What I find interesting is I do get reviewers telling me at times, what do you need to tell people how to do this stuff for? There's textbooks from 1999 to tell them how to do this or whatever. Then I go look at what they're mentioning in the textbook and I see that, oh, it disagrees with this paper that person just wrote last year and so on, because things have changed.

Those count-based methods from back then, the same recommendations aren't necessarily going to hold now that we're working with transformer-based models and LLMs. I do think that now there's much less education required in a lot of ways if people are going to rely on LLMs for the research that they're doing. You don't need much coding skill. You don't need to even know that much about NLP.

You need to have some data you can validate the outputs against, play around with a prompt, maybe look up prompting best practices, though prompts seem to be having less of an influence on model outputs as the models get more powerful. Finding a couple of these papers that have shown how you go about doing this and how you can use it in research, those are great places to start. Then if you have a need for some reason to go deeper, go deeper, but really try to let the research problem at hand, the research question, and the data that you have drive it.

Don't be picking up NLP for the purposes of picking up NLP. Pick up NLP because you've got some problem you want to solve with it. Otherwise, I know I found it very difficult to learn to code without having something that I was actively doing to help me learn it.

I dropped out of two computer science courses while I was in grad school because I was going to fail them. Now I teach predictive analytics and machine learning at CARMA and do all this work with code, because I've had so many things I've wanted to do with the code. I've been able to go out and find resources and examples to help me do it, but I think learning in that more decontextualized sense, at least for me, doesn't work, and I don't see the point of it.

I guess I'm conflicted if they should teach NLP in programs. I think it's helpful for people to have someone there they can talk to and help. I don't think that every student should become an expert in NLP.

Just like every student doesn't become an expert in qualitative methods, every student doesn't become an expert in multi-level modeling.

[Tara Behrend] (24:02 - 24:20)
It sounds like what you're saying is that the domain expertise is what's really important. Understanding things like reliability and validity is important. Understanding the content area that you want to ask questions about matters, but the coding can come if you have that expertise.

Is that a fair summary?

[Louis Hickman] (24:20 - 25:18)
I think that's very fair. It's key with this previous generation of models, where we were building these supervised predictive models (I could go back and break down supervised versus unsupervised versus reinforcement learning, but I don't know). When we were doing that, the domain expertise is so important, because you need to figure out which predictors are irrelevant if you're going to make a model that's reliable, valid, and unbiased.

Say we're going to put in a bunch of facial information in an interview evaluator. You're going to get differences as a function of glasses, potentially skin color, weight, attractiveness. All these things could suddenly be contaminating your model and biasing it in ways that you don't want.

So that domain expertise, I think, is really important for making sure that you're understanding how and why you're applying the model, what data that you're using, if it makes sense to use it. It's key all throughout.

[Tara Behrend] (25:18 - 25:21)
Great. Ivan, what do you think about that? Do you agree?

[Ivan Hernandez] (25:22 - 28:18)
I think for teaching, it might depend on what level of teaching you mean. Certainly, one option is teaching entire courses that are just NLP-based, the way we might teach an entire course on structural equation modeling or something like that. I think that could be challenging if the faculty's not familiar with the implementation of the techniques.

It does take a little bit of time to learn. And so I think that it could be a good idea for a department, certainly, because NLP is this bridge that can work in many different topic areas. One colleague does leadership research, so she can analyze, for example, construct contamination amongst leadership items.

Amal Sheely, she does diversity research, so she can understand nuanced ways of how people identify, what their identity is, in much more depth than demographics or demographic questionnaires would traditionally impart. Chris Wynn, he does personality research and understands traits, and NLP can expedite very burdensome coding systems like the CAVE system for understanding people's internal versus external attributions. These are very distinct areas of IO and psychology that all benefit from NLP in some way.
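To make the construct-contamination example concrete, one hedged sketch: embed the items from two scales with a sentence encoder and flag cross-construct pairs that are suspiciously similar. The items, model, and threshold below are illustrative, not from the episode.

```python
# A sketch of screening for construct contamination: embed items from two
# scales and flag cross-construct pairs with high semantic similarity.
from sentence_transformers import SentenceTransformer, util

leadership_items = [
    "My leader communicates a clear vision.",
    "My leader treats team members fairly.",
]
satisfaction_items = [
    "Overall, I am satisfied with my job.",
    "I feel my workplace treats me fairly.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any encoder works
lead_emb = model.encode(leadership_items, normalize_embeddings=True)
sat_emb = model.encode(satisfaction_items, normalize_embeddings=True)

sims = util.cos_sim(lead_emb, sat_emb)  # cosine similarity across constructs
for i, li in enumerate(leadership_items):
    for j, si in enumerate(satisfaction_items):
        score = float(sims[i][j])
        if score > 0.5:  # illustrative threshold, not an established cutoff
            print(f"Possible overlap ({score:.2f}): {li!r} ~ {si!r}")
```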

So I think it's one of these methods that, if a department would want to give students maybe a lot of ability to transition, or to offer many different types of insights, yeah, it's a great approach. But if they are not really well versed in it, then I think they should consider perhaps hiring in those directions, for individuals who have that sort of expertise. Just because it is so hard, I think, as a faculty member to learn an entirely different set of techniques, unless you take Louis Hickman's CARMA course.

So I will say, though, at a lower level of teaching it, I think absolutely, even if a department doesn't offer meta-analysis as its own course, I think we'd all agree that grad students should be exposed to the idea of what a meta-analysis is, that even if an area didn't offer, let's say, structural equation modeling as a course, they should certainly know about it. Because these are not just tools, they're also perspectives, they're ways of thinking about problems, being able to think about the problem of, how do I reconcile disparate findings? A meta-analysis is this perspective of aggregation, right?

Being able to think about processes, like the structural equation modeling is perhaps a way to conceptualize the way that processes interrelate to each other. And with that sort of processing, I think it's also this perspective of, what is data? Data is not just like Excel tables, it's things left in their kind of natural, unstructured form as well.

So I would love for students to, I think, be able to have that perspective, so that when they see problems, they don't maybe see them as impossible or insurmountable, like, oh yeah, no, I think we can do this.

[Tara Behrend] (28:18 - 28:45)
I love that point, that we can think of these analytical approaches as ways of thinking about problems, and the more ways a student has to think about a problem, the less likely it is that they'll be shoving everything into the newest approach, just because it sounds cool, but they'll really be asking the right kinds of questions and then finding the approaches that support that. Richard, I also wanted to ask you, since I know you teach a lot of coding-heavy classes, what your perspective is on this.

[Richard Landers] (28:46 - 29:55)
Sure. Well, I actually had a follow-up question, because the thing that I worry the most about in this area is the sort of downsides to democratization of tools. You know, we arguably in psych have seen a lot of downsides to easy access to very powerful analytic tools.

Going back as far as SPSS, probably ruining generations of researchers thinking they can click a button to figure out if something's true or not, right? And we've seen similar problems in SEM. We've seen them in meta-analysis.

As soon as you make these tools easily available, very suddenly you have people that don't really fully understand what they're doing using those and getting that work published, and it's appearing in the literature and affecting the way that future researchers think about problems. So large language models, in particular, seem like an area of significant risk because they're incredibly accessible. It's very simple to open ChatGPT and say, hey, analyze this text and give me a number.

So I'm curious what you both think the risks are here. How severe is that problem? Can we avoid it?

Is this inevitable that we're going to see incredible misuse of NLP over the next decade? Where are we headed?

[Louis Hickman] (29:56 - 32:34)
Just in research, or in other things? I mention it because I think there's plenty of misuse of LLMs already going on, right? And people are already complaining about Google search results or other search results. I guess I shouldn't name a specific company.

The results are being dominated by AI-generated garbage, especially when it comes to image search, where the results are dominated by AI-generated images that are not reality. So there's a big risk to the entire internet, for one. I probably don't know well enough about some of the previous issues to know where they all emerged, although I imagine they emerged everywhere.

But I do think that the current widespread proliferation of research is itself inherently a problem, that there are millions or whatever it is now of journals that are out there publishing research. And to the outside observer, they can't tell them apart. People in fields can tell them apart.

But then we also know that there are issues at the journals that are supposed to be particularly reputable that make it where folks maybe have to go publish things about gamification or other topics in different areas than the journals core to our field, or somebody else's field, to get knowledge that we think is valuable out there. So it's like any tool, it can be misused. I think if we can, well, I don't know if that's true.

The optimistic view, right, is that if we can get information out there to say what needs to be done to trust work done with LLMs, then we can help minimize that misuse. But then it requires authors and/or reviewers being aware of those best practice guidelines and information to prevent it from going out into the world as: here it is, we just ignored what people said was important.

We put something out there that wasn't validated and you can't really rely on it. I'll tell you what, I go back and look at a lot of this work that's been done with count-based NLP. And I go, oh, I don't know if I can trust a lot of these results, particularly when it's based on topic modeling, because I can't tell what they did a lot of the time because they don't really tell me and they don't give me their code.

And then I'm really wondering what on earth they even did. So it's just, I guess Pandora's box is open in the research world. The LLMs are just the latest thing to emerge out of it.

I don't worry about it more than I worry about every other method, I think.

[Tara Behrend] (32:35 - 32:41)
Kids these days, right? That seems to be the essence of that. I think every...

[Louis Hickman] (32:41 - 33:36)
It's not just kids, though. I think there are senior researchers and scholars interested in applying these tools or encouraging their students to apply these tools, and supervising them and using them, and not necessarily knowing the issues, limitations, or what you should do to... I got a chance, because I was reviewing a qualitative paper recently, to read about this perspective that qualitative research should be trustworthy as opposed to focusing on validity.

It's hard to make that case, maybe, for LLMs being trustworthy. But I think in qualitative research, you have to think about it the same way that you think about making human evaluation and judgment trustworthy. And then when it comes to the quantitative research that you're using LLMs for, we want reliability and validity.

I think if you can give us that, we're probably doing okay, probably doing better than nothing.

[Tara Behrend] (33:37 - 34:23)
I agree. I mean, I think my joke is because I think people have been worried about the downfall of deep thought for a thousand years. And we're still here for now, so we don't have to worry too much.

But let me ask you another question. So both of you are, I think, very obviously experts in this area and also you're psychologists and not computer scientists. And so how do you find that balance between being an expert yourself and staying on top of new developments versus collaborating with people in other fields and bringing the psychology to a collaboration where they're bringing the cutting edge computer science?

Like, do you have any examples you can talk about or just tell us about how you think about that balance?

[Louis Hickman] (34:24 - 36:50)
Well, maybe I can share a funny example where the computer scientists failed their computer science task, and then I had to do the computer science thing. I'm actually a hybrid. My master's is in computer and information technology, specializing in NLP.

Cool thing about that is people think it's computer science. It wasn't. Instead, I had to drop out of the computer science classes I took in grad school.

But we had this funded project when I was in grad school as a PhD student in I.O. It was a collaboration with some computer scientists, and they were supposed to be building all the predictive models for automatically scoring interviews. For some reason, they tasked some terminal master's student with it, and he came back with this super complex model that he was getting no validity out of. This was after he'd been working on it for a year, and they're telling us, oh, there's no signal in your data.

You collected garbage, so you were putting garbage into the model. Now we're getting garbage out. I went and trained a ridge regression model, which was much simpler than what they had done and got validity coefficients around .5. And I go, no, you're wrong. I don't know what you're doing, but it's not working. And the domain expertise is relevant there, and a lot of our data is not so complex that it needs super powerful models. Language is special in that way in that it is really complex capturing semantics and the nuances of meaning.

But when it comes to tabular data, you don't need a super complex model. And so sometimes computer scientists think, oh, I've got my super complex model. That's my hammer, and everything's a nail, and I need always to have a super complex model, or we're not going to be able to make a computer science contribution.

But generally, that's just not the case for a lot of human data. It tends to need to be a little bit simpler. So the computer science collaborations can be good.

I've had some that we've gone and published, and I'm still applying for grants with computer scientists where they're going to do a lot of that technical work. But sometimes it's nice just to be able to run it on your own, not have to worry, and to know that you know your data well. You know kind of what's worked on similar data in the past, and you can build a model that's effective, even if not, you know, earth shattering.

So sometimes the computer scientists can be great. Sometimes you have to take over, because otherwise your project's not going to get done. But that's probably pretty idiosyncratic to my experience.
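For flavor, here is a minimal sketch of the kind of simple baseline Louis describes: plain TF-IDF features into ridge regression, with cross-validated predictions correlated against human interview ratings. The transcripts, ratings, and settings are illustrative stand-ins, not the project's actual data or pipeline.

```python
# A sketch of a simple ridge baseline for scoring interview answers: TF-IDF
# features, ridge regression, and a cross-validated validity coefficient.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Illustrative stand-ins for interview transcripts and human ratings.
transcripts = [
    "I organized the team and delegated tasks clearly.",
    "I was not sure what to do, so I waited for instructions.",
    "I set a timeline, tracked progress, and kept everyone informed.",
    "I mostly worked alone and avoided talking to the group.",
    "I asked questions, listened, and built consensus on a plan.",
    "I missed the deadline because I did not plan ahead.",
    "I coached a struggling teammate until the project recovered.",
    "I got frustrated and let others finish the work.",
]
ratings = np.array([5, 2, 5, 1, 4, 1, 5, 2])  # human interview scores

model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
preds = cross_val_predict(model, transcripts, ratings, cv=4)

# The "validity coefficient" is the correlation between out-of-sample
# predictions and human ratings (the ~.5 Louis mentions was in this spirit).
print(f"Cross-validated validity: {np.corrcoef(ratings, preds)[0, 1]:.2f}")
```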

[Ivan Hernandez] (36:51 - 41:08)
Yeah, that's a good question, just because in some ways, you know, I do think that there are going to be individual differences in terms of researchers, how complex what they need is, and to what extent that can be serviced by a collaborating computer scientist. But, you know, Richard, I love the example you gave of how we have this issue already with traditional statistics, right?

Like, not everyone runs multi-level models in a way that is up to snuff with how statisticians might say we should run them, or even interprets very basic frequentist kinds of things the right way. I think, you know, we've seen a lot of pushback in the literature that we actually are making very common mistakes in terms of just the simplest of statistics.

But that said, I think that we are, as psychologists and like social science researchers, able to run these models independently. I think like correction is certainly helpful. But once those corrections are understood, then like, you know, it's okay to not always work with a statistician every time you want to run like a regression, or interpret a p-value, or things like that.

So basically what I'm trying to say is that I believe our goal in IO psychology in this qualitative domain is largely going to be to apply the techniques that computer scientists have developed to problems that we understand very well. And with that ability to apply, I don't think we have to specialize in developing methodology the way that computer scientists do. I think that we can do that.

Same with researchers in biology. They draw from advances that chemists and physicists methodologically developed, but they can use those to impact the world just through their own application of them. Medical researchers apply the types of techniques developed by biologists to create these profound benefits for the world.

And I think in psychology, if we just really understand techniques well enough to apply them, then we're fully capable of creating very profound impacts, or profound benefits, in the workplace through proper application of them. So, short answer: in the learning phase, it'd be great to run these ideas off of computer scientists, you know, just to make sure that you're using techniques appropriately. But once you have that, I think you're able to work more independently, and also supervise other psychologists, sort of as reviewers for journal articles, where you have to review your peers and things like that.

But that's all from a strong foundation that was created by learning from computer scientists. I'm really inspired by a comment, Tara, that you made, I think before this call, emphasizing the importance of community, having sort of a club, right, where you can interface and interact with individuals who are working from a very similar kind of perspective on machine learning and NLP and AI, because there's so much out there that's constantly being developed.

I think it is challenging to always stay on top of all of it. Through these kinds of collaborations, these discussions, being able to have these conversations, I think that's a great way to bridge that distance: understanding what can be relevant to the problems that you're working on, and also helping others with the problems that they're working on, with the knowledge that you've gained. As you can see, Louis and I are both in this space, but the kinds of papers that resonate with us differ and also overlap, and we can draw from that to have a more nuanced understanding.

So I would tell like an early version of myself to really, I think, like embrace that community and cultivate it.

[Louis Hickman] (41:08 - 43:38)
Well, it would have been nice to have that back in 2018. I was muddling my way through and, you know, emailing the one or two researchers I knew could give advice to ask for advice. I guess I was doing that already by reaching out, but it'd be nice for that to be formalized.

I think my advice kind of skews in maybe the opposite direction of how I would think it would, because I think what I would have benefited from would be to focus even more on the substantive components of the research I'm doing, not the methods components, actually. Focusing more on that foundational understanding: what's the research that's brought to bear on this? What does this look like in practice? And using that as a foundation to build from. I think I've been fortunate in that I've been able to proceed in a pretty phenomenological fashion with a lot of my research, but I think I'd be even better positioned with an even stronger foundation in the content, having that domain expertise that we talked about earlier. People who know me know I'm my own hardest critic, and so that's maybe pushing me in an area where, you know, every day I feel like there's just always more to learn about IO psychology and relevant topics that I haven't seen yet. Whether it's recent papers, papers from the 90s, or papers from the 60s, I come across things where I'm like, wow, they spoke to this problem that we're working on today with such clarity, maybe because there was less publishing and less noise back then in the research world. I think it would serve me, and anybody interested in research, to continue to dig in. Make sure that you read; don't rely on the LLMs for reading. Still go and gain that knowledge, because it's necessary to build the research in a way that's received well by reviewers of grant applications and papers. I think that's so key for setting yourself up for success, rather than just trying to get papers out, which is often what we're focused on.

[Louis Hickman] (44:09 - 44:10)
See you there. Thanks again.

[Richard Landers] (44:10 - 44:23)
That's it for another gig. To stay in touch, subscribe on YouTube, check out our website at thegig.online, join our LinkedIn group, sign up for our email notification list and join our Discord. Thanks for joining us and see you next time for another great IO Get Together.