Dr. Liberty Munson, Director of Psychometrics at Microsoft, shares her journey from Boeing employee selection to overseeing millions of global certification exams. She explores the similarities and differences between research scale development and high-stakes certification testing, discusses innovative item types beyond multiple choice, and addresses the challenges of being a lone psychometrician in a large organization. The conversation covers AI’s impact on test security, the role of human expertise in psychometric work, and practical advice for IO psychology professionals navigating internal organizational roles. Dr. Munson emphasizes the importance of building credibility, knowing when to advocate for psychometric rigor, and maintaining work-life balance throughout a long career.
Key Takeaways:
- High-stakes certification testing requires more rigorous development processes with hundreds of subject matter experts involved
- Microsoft uses diverse item types including case studies, hot areas, active screens, and drag-and-drop questions beyond traditional multiple choice
- AI tools can assist with item generation but require human oversight to maintain quality and prevent bias
- Building credibility as an internal IO psychologist involves admitting knowledge gaps while demonstrating problem-solving skills
- Knowing “which battles to fight” is crucial for lone psychometricians in organizational settings
- Work-life balance and finding humor in challenges sustain long-term career satisfaction
- Recent research shows concerning bias in perceptions of AI use, particularly affecting women and minorities
- Agent-based AI approaches show promise for automated item generation by splitting tasks among specialized roles
- Test development fundamentals remain consistent whether building research scales or certification exams
Website:
https://thegig.online/
Follow us on LinkedIn: https://www.linkedin.com/company/great-io/
Join our Discord here: https://discord.gg/JcTcMu335K
Join The GIG Email List: https://docs.google.com/forms/d/e/1FAIpQLSfVQ4hyF8MA4G9W-ERwVL8_e91a-MUMuhNvxhXmgkSFUDFatg/viewform?embedded=true
00:00 – Introduction and Welcome
00:31 – Dr. Munson’s Career Path and Origins
03:32 – Scale Development vs Certification
08:48 – Interactive Item Types at Microsoft
11:24 – Test Security and AI Challenges
16:40 – The Future of Psychometric Careers
19:08 – Being the Lone Psychometrician
22:48 – Fighting Battles and Building Trust
27:50 – Imposter Syndrome and Credibility
30:41 – Recent Research Recommendations
34:00 – Closing Thoughts
Transcript
[Richard Landers] (0:00 - 0:31) Welcome to the Great IO Get Together! On tonight's show: quips and queries about the world of work, as IO psychology comes alive. Please welcome our hosts. Welcome, everyone, to Great IO Get Together number 36. My name is Richard. This is my cohost, Tara. Today we are exploring chapter 10 of our textbook, Research Methods for IO Psychology, which is all about scale and test development. So to help us all make better tests, on the show today we have Dr. Liberty Munson, Director of Psychometrics at Microsoft. [Tara Behrend] (0:31 - 0:59) Welcome to the show. Thank you for having me. I'm so excited about this. We're really excited to have you. So let's just start off by understanding a little bit more about the work that you do. Your work involves overseeing the deployment of these massive, global, very high-stakes tests with huge audiences. This is something that a lot of IO psychologists are probably not familiar with, just operating at this scale. Maybe you could start by telling us how you got into this field, and give us a high-level picture of your career so far. [Liberty Munson] (0:59 - 3:32) So, my origin story. My PhD is in industrial organizational psychology, of course, from the University of Illinois at Urbana-Champaign. When I got done with graduate school, my first job was at Boeing in their employee selection group, and I got interested in testing because I ended up doing an internship at GTE before it became Verizon.
So that kind of gives you... that ages me. But I really fell in love with the concept of assessment and testing by doing that internship at GTE. So when I got done and I started looking for a job, I wanted to do selection testing, and that's what I went to do at Boeing. I did that for a few years, and then Boeing had a reorg where they brought the employee survey in with the selection stuff, and it was really cool because I got to work on both of them. Then, of course, they did another reorg. If you haven't been in organizations, reorgs happen all of the time. When they did that, they split out the selection and the employee survey again, and they asked me, which one do you want to work on? Since I wanted to grow my skills, I thought, I'll go work on the employee survey. So I was the co-lead on the employee survey for Boeing for a few years, and then I realized, you know, I really like assessment better. I really like tests, exams, whatever people want to call them. So how do I get back into testing? But I'm not going to leave the Pacific Northwest, so my options were somewhat limited; the IO community in Seattle is not huge, or it wasn't at the time. I started looking for different opportunities, and being in the Pacific Northwest, another great company to work for is Microsoft, right?
So at the time they were hiring a psychometrician. It wasn't selection testing, but it got me back into doing something that I really love, and so I came to Microsoft almost 19 years ago to become their psychometrician on their technical... it was a certification program at the time, and now it's expanded and it's more about credentialing. I like to say I have a unique set of skills because I have stayed in this role. Although of course my responsibilities have changed, and I've been promoted and things like that, my whole career at Microsoft has been as the psychometrician, and that's unusual at Microsoft. Most people stay in a role for like three to five years and then they move to a different role, and they get lots of experience around the company. But I really love the work that I do. The Microsoft certification program really changes people's lives, and I think that's the power of the Microsoft platform: we deliver millions of exams a year, we're certifying millions of people a year, and I'm the only psychometrician. It's just really cool to have that kind of influence over so many people who are getting certified. [Tara Behrend] (3:32 - 4:08) Certifications can change people's lives and are hugely impactful. And just as a side note, it must be so interesting to have a front seat to all these conversations about alternative credentials and badges and certifications, and to be thinking about it from what you've seen in people and how these credentials can change people's careers. Yeah, so walk us through: what are the basic, high-level major steps in this work? And how does making a test that goes out to millions of people look different than making a scale that we would use in a research paper? This is kind of an interesting question.
[Liberty Munson] (4:08 - 8:48) When you asked me to be a part of this, I started thinking about what the big differences and similarities are when it comes to creating a certification exam versus a research scale. Of course, there are some obvious similarities. No matter what you're building, when you're building some sort of evaluation or assessment, you have to start with a clear definition of what you're trying to measure. When you think about creating a scale, you're probably going to start by doing some sort of literature review, but when it comes to creating an exam, you start by talking to subject matter experts. The point is that no matter how you get started, the start is really understanding what you're trying to measure and why. When you think about the items and the way you're going to develop them, there are some consistencies around how you think about item development, whether it's a scale or an exam. You want the items to be clear, you want them to be unbiased, and at the end of the day, they need to be aligned to whatever construct you're trying to measure. So you will pilot test those items to see how they perform statistically, and make sure they're valid and reliable, to ensure that the tool is measuring what it's supposed to be measuring. In the case of exams, and this is where you start seeing some of those differences, I'm looking at characteristics of the items such as the p-value, which is the percent of people who answer it correctly, and the point-biserial correlation: how well is it differentiating between high and low performers? And then, Microsoft has always used a one-factor, one-parameter IRT model.
That's the Rasch model, so I look at Rasch item difficulty. But then of course there are some big differences when you look at the purpose and the stakes. Certification is high stakes. It's going to be used to make decisions about whether someone is qualified to perform a job or to have this certification, so the development process is much more rigorous and it requires much more input from subject matter experts. When I think about a Microsoft certification exam, I'm having hundreds of people look at those, when you consider the number of people I invite to participate in a beta test. Hundreds of subject matter experts are playing a role in helping us define the content domain, which usually starts with a job or task analysis so we can figure out what we're going to measure. We also use a wider variety of items. When you think about scales, you're usually talking about Likert and things like that. But when we're talking about a certification exam, we have the traditional multiple choice, of course, but one of the very first things when I started working at Microsoft is I wanted to move us away from just multiple choice questions to more interactive item types. So we have things like case studies. We have hot area. We have active screen. We have drag-and-drop. We have build list.
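The item statistics mentioned above (p-value as proportion correct, point-biserial as item-total discrimination) can be sketched in a few lines of Python. This is an illustrative toy, not Microsoft's actual tooling, and the sample data in the usage note is made up:

```python
# Illustrative sketch: classical item statistics for a scored 0/1 response
# matrix, where rows are examinees and columns are items.
from statistics import mean, pstdev

def item_stats(responses):
    """Return (p_value, point_biserial) for each item in a 0/1 response matrix."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]  # each examinee's total score
    sd_total = pstdev(totals)
    stats = []
    for j in range(n_items):
        scores = [row[j] for row in responses]
        p = mean(scores)  # p-value: proportion answering the item correctly
        # Point-biserial: correlation between the item score and the total score.
        # Positive values mean the item separates high from low performers.
        if sd_total == 0 or p in (0.0, 1.0):
            r_pb = 0.0
        else:
            mean_correct = mean(t for t, s in zip(totals, scores) if s == 1)
            r_pb = (mean_correct - mean(totals)) / sd_total * (p / (1 - p)) ** 0.5
        stats.append((p, r_pb))
    return stats
```

For example, `item_stats([[1, 1], [1, 0], [0, 0], [1, 1]])` gives p-values of 0.75 and 0.5, with both items positively discriminating; in practice these statistics are computed over beta-test populations of hundreds of examinees.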
Some of our exams even have labs. The exams do need to be legally defensible, fair, and secure. I did learn early on in my career at Microsoft not to use the words "legally defensible" so much, because people in organizations don't react as well as you might think they should to those words. They think I'm putting handcuffs on their ability to do what they need to do, so I needed to change. As you enter the workforce, if you choose to go into organizations, just keep that in mind. There's not as much tolerance for words like "legally defensible," but you can talk about these things without using those words. But it does have to be, because oftentimes certifications do play a role in a decision about whether somebody is offered a job, hired into a job, or promoted. And then there's the concept of a cut score. I have to make a decision about whether somebody's qualified or not, and you don't see that kind of thing on a research scale, of course. What's interesting with the cut score is one of the things that I learned: there's a lot of baggage people bring. They spend 12 to 18 years taking tests, right, and usually those tests aren't great, the ones you take in school. Then they take that baggage and apply it to a certification exam. So I hear a lot from people saying, well, my score... I must be more qualified because my score is higher. But that's not the way it works with certifications, right? It's a binary decision.
You're either qualified or not, and whether your score is high or not, that's not what this exam is optimized for. It's optimized to make sure that I'm making the right decisions around the cut score. Another big difference is the security and maintenance. For certification exams, we have to make sure that they're secure and delivered in a way that ensures people don't have unfair advantages. Cheating is a huge, huge thing that we have to deal with in the certification industry. And when you think about my exams, I'm basing them on content in the cloud, so I'm constantly having to update them, and neither of those kinds of ideas, I think, applies to research scales. So it comes down to this: they're very similar when you think about them at the 30,000-foot level around how you develop them, but because credentialing and certifications are high stakes and they're about decision making, there's a different process that has to go in place when I'm thinking about how we design and develop those, versus what you might do with a research scale, which is more about understanding and exploring human behavior or experience. [Tara Behrend] (8:48 - 9:38) That's incredibly helpful. Yeah, and when you think about cheating, it's obviously only an issue when people care about the outcome, and other researchers might be thinking about insufficient effort, or trying to identify people who don't care enough about taking the test. So we're still using our psychology skills, it sounds like, to understand the experience of the test taker. You know, in the field we tend to talk about reliability, validity, and fairness as the gold standard of test quality, right? These are the standards that we hold ourselves to as a field. Are there other pieces, though? You mentioned security, you mentioned looking for cheating. Are there other things that you have to take into account that might not be on a researcher's mind?
Like, are you thinking about the cost of the exam, for example, or other things, when you're evaluating whether this is a high-quality exam? [Liberty Munson] (9:38 - 18:25) I don't know to what extent some of the ways I think about exams would apply for other researchers, like if you're thinking about scales and such, but one of the things I've been very focused on in the last few years is trying to make our exam experience more inclusive and accessible. One of the challenges is that when you're doing a proctored exam, there's a lot of friction in the system, and that friction can make the experience inaccessible and exclusive. So how do we change it so more people can get certified? We've been doing things like reimagining all of our policies around accommodations: what really does require documentation, versus what is just a notification to our exam delivery provider that something is going to be different about the experience? We've really doubled down on when documentation is truly needed. Medical documentation for many disabilities is super expensive, and with most certification providers there's an age limit on how old the documentation can be. At Microsoft, it used to have to be less than two years old. So if somebody wants to take a Microsoft certification exam and they need some sort of accommodation, and I'm saying that the documentation to demonstrate the need has got to be less than two years old, a lot of people can't afford that. Some of it, especially for ADHD and those kinds of diagnoses, can cost a couple thousand dollars, so that makes it incredibly expensive for some people to even request the accommodation. So we made some changes around being very specific: there are only two accommodations that require documentation. One is extra time, because that truly does provide an unfair advantage, and one is having another
person in the room, because again, having somebody in the room when we don't know who it is can create an unfair advantage. So we have removed a lot of the documentation requirements; the rest of the accommodations are just a notification to the exam delivery provider so they can notify their proctors. In some cases it requires the exam to be configured differently, so we can just have the accommodation ready. We are also allowing more behaviors during the exam than many other programs. I did some research a couple of years ago where we looked at all the different types of behaviors that online proctoring was saying you can't do, like looking away from the monitor, fidgeting, mumbling, and things like that, and we evaluated what the security risk would be if we allowed people to just do those behaviors. Based on that research, we came up with a risk matrix, and I made some decisions, based on my risk tolerance for Microsoft certifications, around what behaviors I would allow during an exam, because they're more likely to be driven by test anxiety than by someone trying to cheat. And then finally, we are also allowing children in the room, which has actually opened up the Microsoft exams to more women, especially in third world countries where child care options are much more limited. I'm just super proud of that kind of stuff we've been doing, but I don't know to what extent people think about the exam experience needing to be more inclusive and accessible. I do have the luxury that it's an IT certification, and for the most part, if somebody earns a certification and they're really not qualified, I think bad things aren't going to happen. In medical disciplines, they probably do need to be more rigorous around some of these things that Microsoft is changing to make it more inclusive. But I'm just trying to balance the reality that we do want to have more people certified. We want to give people more opportunities, because good
things happen to people who have certifications. But at the end of the day, you know, what my job really is about is finding the right balance between the psychometric requirements for validity and reliability and the reality of the business context. So, you mentioned budget: I never had the luxury of creating an exam the way I was taught to in graduate school. I don't think I ever have done that, actually. But I'm very good at figuring out how to reimagine the exam development process in a way that maintains psychometric integrity while still meeting the reality that I have a limited budget and very aggressive timelines. Over the time I've been at Microsoft, I've actually reduced the cost of our exams by more than two-thirds, and where it used to take nine months to get an exam to market, now we're down to three. And I'm very convinced that we have valid and reliable exams in market; I've just been very good at figuring out how to navigate the realities of working within the constraints that happen in a business environment. Another big difference here is that if it was taking me nine months to get an exam to market, my guess is it's not going to be valid and reliable anyway, because I'm basing it on cloud content, and all of my certifications are cloud-based, right, where things are literally changing every week, maybe every day. So how do I think about the development process so that we can make sure it's valid and reliable despite these constant changes in the technology, which do have an impact on the items that I'm asking? Even if it is a role-based exam and we're measuring skills, which are probably a little more durable, skills in the context of the technology itself are less durable. So we update our certification exams quarterly, and we're going to start... I don't know if this is going to work at all.
But for our AI exams, we're going to start trying to update them monthly. I have no idea how that's going to work. But this is unique to IT certifications in many regards, because almost all of them are based in cloud technologies and things are moving faster. In other spaces like nursing and medicine, things are certainly moving faster with the advances in AI and how those are affecting job roles, but probably not as fast as it's changing in my world. The other things I have to think about are security around the exam. We talked a little bit about that, and we are leveraging proctors and AI assistants to help us identify and prevent cheating. But when you think about how AI is continuing to evolve, it's not only introducing opportunities to help us better identify cheaters and bad behaviors, it's also introducing a lot of challenges. There are AI-powered cheating tools: generative AI can answer complex questions in real time. If you have a multiple choice test and you're not putting any security around it at all, I'm at the point right now of saying, what's the point? There's been lots and lots of research showing ChatGPT can answer multiple choice questions correctly something like 99% of the time. So if you are trying to measure somebody's ability to do something, and you don't have any security around it, and you're not trying to measure their ability to use ChatGPT, I don't know that an unproctored and insecure assessment has any real value anymore. It kind of feels like it's just a waste of money. We're also seeing deepfake technology.
This is scary. What I've been seeing is this ability for deepfake technology to impersonate test takers, and of course, identity verification systems can't truly tell the difference. Some of them can, because deepfakes still have some obvious tells; if you know what you're looking for, you can kind of tell. But I'm very worried that the technology is going to get so good that we're not going to be able to tell the difference, so we have to stay ahead of those kinds of things. Even with proctoring systems, AI can simulate human behaviors to bypass detection. One of the things I noticed early on with AI is that I can set up my computer and the camera on my computer so that it always looks like I'm looking at the camera when in fact I'm not. And that's just AI, making it look like I'm doing something. So if somebody has that kind of technology going on in the background while they're taking an exam, the system can't tell that they're not actually looking at it. We have to be super creative around how we identify when somebody's cheating using AI: the cheaters are using AI, and we, as the test provider and deliverer, are trying to figure out how to use AI to catch it. And then, of course, there are fake credentials. I think it's getting super easy for AI to generate something that looks real, so it becomes difficult to distinguish what is fake from what is not. That one's actually a little bit easier to manage, because we have control of the transcript, and today we never tell people to, you know, share the certificate, share the downloadable file, or whatever, right?
If you really want the truth, you need to go to the transcript and make sure it's an authenticated version of the transcript, to really know if somebody has earned that credential. So those are some of the things that go beyond the validity and reliability question that I have to think about as I do my job, to ensure that at the end of the day we still have a valid and reliable assessment of somebody's skills. [Tara Behrend] (18:25 - 18:32) Speaking of AI, some people think that AI can just make tests now. Do you think that, and what are some things to consider? [Liberty Munson] (18:33 - 23:57) If we were to, say, allow AI to make an exam... the short answer is no, I don't think AI can create an exam, at least not in a way that's going to meet the standards we expect for validity, fairness, security, and certification. Certainly, AI can help generate the items. It can create good first drafts, but what I have noticed in all of the AI items that have been generated is that it misses a lot of the nuance, the context, and maybe even the complexity of what a human can create. I do think at some point we'll get there, but I think it's still early in AI item generation, even though we've been talking about it for probably two years as a good solution for creating items. I still think there's a lot of room for improvement. It will work pretty effectively for more basic knowledge-based questions, so if you think about foundational knowledge and math and the stuff you maybe see in school, it probably does work pretty well for that. But a Microsoft associate or expert exam requires deep thinking and deep understanding of the technology, and understanding some of those nuances that you can only get if you've actually used the technology. It's stuff that you're not going to find in learning content and things like that. People, I think, forget that the AI we're using for this is
generative, which means there's got to be content from which it can generate the items. One of the challenges Microsoft has, especially when we try to do this with new exams, is that the content may not exist. So how do I generate items in my space? We are trying to do some of this experimentation with AI item generation, and we're using it for an architect-level exam, which I have some concerns about, because I don't think AI is smart enough to understand some of the nuances of what it means to be a good architect of some of our solutions. But we can't start creating items yet, because we have no content. It's a brand new role. For us, that's a challenge, and it's something to keep in mind as people think about using AI to generate items: remember, it's generative. It cannot create items if there's nothing it can use as a basis to generate from. At least for the foreseeable future, you're not going to be in a situation where we can just automatically create quality items using AI and do it on the fly. The human element is going to be critical. There's this concept of human in the loop and human on the loop. In the loop means the humans are actively guiding the item development, reviewing content, ensuring alignment to whatever it is we're trying to measure. On the loop is more of an oversight role. I think some combination of those is needed; right now, it's got to be human in the loop when it comes to item generation. Maybe five to seven years from now, it might be more of an on-the-loop situation, where you're just checking for bias, checking the quality, making judgment calls that AI simply can't make. So AI is a powerful tool.
We've been very focused on how we use it for item generation, but I think there are other parts of the exam development process where it could have some interesting implications. So, the job task analysis: I've been doing a lot of experimentation with using AI to help us create draft JTAs, and taking those drafts and talking to SMEs. What's interesting is that we haven't done it much with original development, because of my whole content problem, but we have been trying it for JTA refreshes. And the thing I keep seeing over and over again is that AI keeps telling us to add stuff back in that my SMEs have told me they don't think is important and shouldn't be in it. At some point, I think we'll get there, but I think it's funny right now that it's adding back in things we have intentionally removed. I also think there are some interesting possibilities around using AI examinees. Originally, when I started thinking about AI and psychometrics, I was thinking there are probably some characteristics of items that would help us: if we had a smart system like AI that could look at all those items and, knowing the psychometrics of existing items, figure out whether certain characteristics led to an item being too easy, too difficult, or not differentiating, and then use that to help predict what the psychometrics will be for new items. I read an article about nine months ago about this concept of AI examinees, and I just love the idea. Now, they didn't have much luck with it, but I think this is the direction we need to think about: how AI can help us solve the problem that we're going to get to a point where we can create a lot of items whose psychometric properties we don't know. So how do we solve that problem? AI examinees is a very interesting idea to me.
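The AI-examinees idea can be illustrated with a plain Monte Carlo sketch: simulate a population of examinees under the one-parameter (Rasch) IRT model mentioned earlier, have them "take" a short exam, and compute item p-values from the simulated responses. Everything here, the ability distribution, the item difficulties, the sample size, is invented for illustration:

```python
# Toy illustration of "AI examinees": generate a simulated population, have it
# take a short exam under the Rasch (1PL) model, then compute item p-values
# the way you would from a real beta test. All numbers are invented.
import math
import random

def rasch_prob(ability, difficulty):
    """P(correct) under the Rasch model: logistic of (ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def simulate_responses(abilities, difficulties, rng):
    """0/1 response matrix: rows are simulated examinees, columns are items."""
    return [[1 if rng.random() < rasch_prob(a, d) else 0 for d in difficulties]
            for a in abilities]

rng = random.Random(42)
abilities = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # simulated audience profile
difficulties = [-1.0, 0.0, 1.5]                         # easy, medium, hard item
matrix = simulate_responses(abilities, difficulties, rng)
p_values = [sum(col) / len(matrix) for col in zip(*matrix)]
# Harder items come out with lower p-values, mirroring real pilot data.
```

In this framing, the simulated psychometrics could then be fed into the same screening rules used for human beta data, flagging items that look too easy, too hard, or non-discriminating before any human ever sits the exam.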
So basically, you're having AI create a population of people who meet your audience profile, and then they're essentially taking the exam. You get psychometrics based on those AI examinees, and then you can do your normal statistical analysis, which is just a really neat idea. So I think AI could play a lot of roles, but can it develop a test? No. It's a powerful tool, but it's not a replacement for human expertise. The way Microsoft talks about AI is in terms of Copilot, and that's truly what it is: it's a collaborator, not a creator. We don't want to lose sight of that, because the minute we do, we're building assessments that look valid but actually are not. [Tara Behrend] (23:57 - 24:12) Well, I couldn't agree with you more. I think a lot of students will find it heartening that there will still be work for human psychometric experts. If any students listening to this episode are interested in this kind of work, do you have any recommendations for things they should read or things they should do, or how they should spend their time?
That one's going to include kind of certification and licensure as a Program so you're going to get some pieces of the business side of it You're going to get some of the test development side of it and then I the technology which is the Element that I brought into the chapter that I wrote If you compare the two the handbook of test development and the ICE handbook It's a broader Look at this field and the different things that go into thinking about having a certification program or a licensure program [Richard Landers] (25:26 - 25:54) One thing I was wondering is the role of what I think a couple of times you describe yourself as like a lone Psychometrician in certain contexts So in my own consulting experience when we deal with psychometrics I often feel like the psychometrics person is the one Coming from the back saying like hey guys, maybe let's slow down Let's do things a little better and it has to really like advocate in that role Curious how what your experience has been with that and how how your values end up getting translated into these very complex teams [Liberty Munson] (25:54 - 27:50) So having done this for so long I've learned some lessons So I kind of alluded to a little bit when I first joined microsoft I leaned into legal defensibility and I think that was a bit of a mistake because it did come across as being a hammer As a psychometrician, I have to say no a lot I learned how to say no without saying no what i've gotten really good at is saying, okay So I see what the problem is in my brain I'm like I see what the problem is and i'd like to say no you shouldn't do this because But how can I say it in a way that gets them to move in the direction? 
They need to go and still to maintain the integrity of the process without saying the words No, and it is a bit of a balancing act But at the end of the day, I am kind of the cop Making sure that we stay on the road and we're headed in the right direction because if we did All the weird stuff that i've heard my lt say We would be we would not have a great program There are times too when I I try to treat my role as an advisor And so there are times where you have to disagree but commit because that's just the nature of the work When it turns out I was right. I try really hard not to say I told you so How do we get how do we like come back from whatever it was that we did that got us here and and move forward? But it is I will say there are many psychometricians Who if you're really into the i'm going to say the academic side of thinking about test development where working at microsoft would be? Extremely stressful. I have just been able to find the right balance to figure out how to make it work So I still find it fun I say if I can't laugh That's some of the stuff i've heard then I probably need to leave and do a new job But I still if I can still find humor and still move forward That's I think the key to being successful in an internal role when you're going to get challenged all the time About what you want to do and why you want to do it [Richard Landers] (27:50 - 28:35) Yeah, that's that's really great. You know, I I think a lot of our Students when they graduate end up becoming maybe low in psychometrician, but often at least a lone i-o or of a very small team And I hear a lot of stories about how challenging it could be to find your voice in that kind of environment Because there's so many people with so much greater expertise than you in related areas But you have your own sort of like core and it it plays out in some really It seems to play out in some really interesting ways in this kind of research methods and kind of dimension Where I I don't know. 
I'm really interested in your characterization of being a cop. That seems like such a difficult line to walk.

[Liberty Munson] (28:35 - 30:41)
It can be. I'm trying to think back to when I first started at Microsoft. I remember I had a serious case of imposter syndrome, because I did testing at Boeing for my first three years there, then I was doing surveys for the next three years, and then I'm coming into a role that's not even selection testing, it's certification testing, something I hadn't done in a couple of years. So I was like, they're going to totally figure out that I don't belong here. But what I realized is a test is a test is a test, so I had the right foundation, and I got really good at saying, "You know what, I don't know the answer, but I know how to find the answer. I'm going to go get it and come back to you." That's how I built up my credibility: recognizing when I didn't know something, knowing that I would get the answer, and then following up. It seems like such a little thing, but respond to people; if you tell people you're going to do something, do it. You show that you're reliable and credible, and over the course of time, as I started demonstrating more of my expertise in these conversations, people would defer to me. I believe I'm held in pretty high regard here at Microsoft, but there are still times when I feel like people really don't want to hear what I have to say. I have to say it anyway, because my role is to make sure that we're designing something that's valid and reliable within the constraints I'm given. If they say they're going to do something different, knowing when to fight back, and how much, is very much an art, and it's something you're just going to have to learn through experience, because it's not going to be the same from one organization to the other. I do think I'm able to fight back more now because I'm 18-plus years into my role here at Microsoft. But at the beginning of your career, I think you have to find the right balance, be good at reading the room, know when you're pushing too far, and recognize, you know what, this is not the time, this is not the horse I want to die on, for lack of a better phrase. Then, if they choose to go down that road, figure out what you can do to protect the integrity of what you're doing as much as possible within whatever decision was made.

[Tara Behrend] (30:41 - 30:55)
Lastly, we like to ask all of our guests about a paper they've read in the last year that they really liked or found thought-provoking. You mentioned one about AI test takers; is there anything else you've read recently that you wanted to give a shout-out to?

[Liberty Munson] (30:55 - 34:00)
This morning I read one in the Harvard Business Review called "The Hidden Penalty of Using AI at Work." What was really interesting about this article is the research behind it: they gave participants examples of completed work and told them who did it, whether it was a man or a woman, and they may have varied ethnicity and things like that, and then they said it was written with AI. It shouldn't surprise anybody at all, but when it was a woman or a minority who was using AI, the perceived quality of the work was much lower than when it was a man. As a result, I think the use of AI is affected by this, because there's a perception that people who use it will be seen as less competent. It's not just a training problem. There's been a lot of talk about people not using AI because they don't have the skills or the training.
I think it's bigger than that, which is why I liked that article: it highlights some of the bias around the use of AI. Being at Microsoft, I wonder to what extent that's true for us, when we're being told to use AI every day. And I strongly suspect Microsoft is actually checking that I'm using AI every day. But if you have that bias, and I'm using AI, is that somehow going to hurt the perception of my performance because I happen to be a woman, versus somebody on my team who's using it who happens to be a man? It's just an interesting question, and I'm super curious whether that's something that would happen at Microsoft, because nothing tells me that it wouldn't; that just seems like human bias.

The other one I read, interestingly enough, was in the Journal of Business and Psychology: "AI-Powered Automatic Item Generation for Psychological Tests." It's based on the use of agents, and at Microsoft we were just starting to think about agents a little bit differently. What I loved about this article is that, at the time, what Microsoft was doing around AI item generation was trying to build all the rules I have around what makes a good Microsoft item into the prompt.
Our prompt was really long and super complicated, and if there was even a hint of a rule that seemed to contradict another rule, it would ignore both of them. It was getting really hard to get the prompt to work and create good items. But what they did here is they thought about the item-writing process in terms of the SME roles, split it up, and had agents doing each of the roles individually, which is just a fascinating idea. For me, being at Microsoft, agents are now a thing I use every day, so it's not so new to me anymore, but I have to imagine that anybody who's not dealing with AI all the time would find this an interesting take on how to use AI differently to make yourself more efficient. Don't try to put everything into one solution; think about the jobs to be done, create an agent for each one of those jobs, make them work together, and you're going to get a better outcome.

[Tara Behrend] (34:00 - 34:22)
Yeah, I really like that paper too. Liberty, this has been a wonderful conversation. It's so generous of you to share your expertise and your hard-earned wisdom with us, and I'm personally very inspired by your outlook for the future. I'm glad to hear that there's still a need for human psychometricians. I totally agree with you, and thank you so much for coming.

[Liberty Munson]
You're welcome. Thanks so much for having me.

[Richard Landers] (34:22 - 34:36)
I had fun. That's it for another GIG. To stay in touch, subscribe on YouTube, check out our website at thegig.online, join our LinkedIn group, sign up for our email notification list, and join our Discord. Thanks for joining us, and see you next time for another Great IO get-together.
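The agent-per-role approach Dr. Munson describes, splitting item writing into narrow SME roles instead of one giant rule-laden prompt, can be sketched in miniature. The role names and checks below are illustrative assumptions; in a real pipeline, each function would be a separate model call carrying its own short, single-responsibility instruction.

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    """An exam item as it moves through the agent pipeline."""
    stem: str
    options: list
    notes: list = field(default_factory=list)

# Each "agent" below is a plain function standing in for a model call
# scoped to one SME role (hypothetical role names for illustration).

def writer_agent(topic):
    # Item-writer role: draft a stem and four options for the topic.
    return Draft(
        stem=f"Which approach best addresses {topic}? ",
        options=["Option A", "Option B", "Option C", "Option D"],
    )

def reviewer_agent(draft):
    # Technical-reviewer role: enforce one rule at a time, rather
    # than packing every style rule into a single giant prompt.
    if len(draft.options) != 4:
        draft.notes.append("reject: item must have exactly four options")
    else:
        draft.notes.append("review: passed")
    return draft

def editor_agent(draft):
    # Copy-editor role: final polish before the item moves on.
    draft.stem = draft.stem.strip()
    draft.notes.append("edited")
    return draft

def run_pipeline(topic):
    # Chain the roles: writer drafts, then each downstream agent
    # transforms or annotates the draft in turn.
    draft = writer_agent(topic)
    for agent in (reviewer_agent, editor_agent):
        draft = agent(draft)
    return draft

item = run_pipeline("load balancing")
```

The design point matches her "jobs to be done" advice: each stage has one job, so a contradiction between rules surfaces as a disagreement between agents' notes rather than silently collapsing a single over-stuffed prompt.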
