#629: Why generative AI transparency is so important with Matt Van Itallie, Sema Software

Do you know who—or what—created the foundation of your software? And can you trust it?

Today, we’re joined by Matt Van Itallie, CEO of Sema Software, a leading code and software analytics platform that has analyzed over $1 trillion worth of software. A lifelong tech expert, Matt has been instrumental in shaping the AI industry with innovations like the GenAI Code Monitor and Generative AI Bill of Materials.

He’s here to discuss transparency in Generative AI for software development and how AI is transforming coding practices while introducing new challenges in accountability, ownership, and quality assurance.

About Matt Van Itallie

As the CEO of Sema, Matt Van Itallie is passionate about problems that are hard and problems that matter. His career has always focused on a desire to serve the common good. At Sema, he leads a cross-functional team of technologists to build solutions that empower engineers to work with GenAI code. He co-founded Sema Technologies Inc., which he currently leads as CEO, to help organizations drive better outcomes through their software.

Sema’s flagship solution, Comprehensive Codebase Scans, has earned a reputation as the leading codebase scanning tool for the technical due diligence process. Having evaluated more than $1T in value for leading private equity firms and enterprise M&A teams, Sema has built one of the most comprehensive datasets to provide actionable suggestions to engineering organizations. With this foundation, Matt is leading the development of the AI Code Monitor, which helps engineering organizations increase developer productivity and address the regulatory risks of generative AI code.

When he’s not building the AI Code Monitor or thinking about code as data and CTO dashboards, Matt enjoys hiking with his wife, cooking with his kids, and throwing dinner parties to bring diverse minds together.

Resources

Sema Software: https://www.semasoftware.com/

Don’t miss Medallia Experience 2025, March 24-26 in Las Vegas. Registration is now available at https://cvent.me/AmO1k0. Use code MEDEXP25 for $200 off registration.

Register now for HumanX 2025, an AI-focused event that brings together some of the most forward-thinking minds in technology. Use the code “HX25p_tab” for $250 off the regular price.

Connect with Greg on LinkedIn: https://www.linkedin.com/in/gregkihlstrom

Don’t miss a thing: get the latest episodes, sign up for our newsletter and more: https://www.theagilebrand.show

Check out The Agile Brand Guide website with articles, insights, and Martechipedia, the wiki for marketing technology: https://www.agilebrandguide.com

The Agile Brand podcast is brought to you by TEKsystems. Learn more here: https://www.teksystems.com/versionnextnow

The Agile Brand is produced by Missing Link, a Latina-owned, strategy-driven, creatively fueled production co-op. From ideation to creation, they craft human connections through intelligent, engaging, and informative content. https://www.missinglink.company

Transcript

Greg Kihlstrom:
Do you know who or what created the foundation of your software? And can you trust it? Today, we’re joined by Matt Van Itallie, CEO of Sema Software, a leading code and software analytics platform that’s analyzed over $1 trillion worth of software. A lifelong tech expert, Matt has been instrumental in shaping the AI industry with innovations like the GenAI Code Monitor and the Generative AI Bill of Materials. He’s here to discuss transparency in generative AI for software development and how AI is transforming coding practices while introducing new challenges in accountability, ownership, and quality assurance. Welcome to the show, Matt.

Matt Van Itallie: Thanks so much for having me, Greg. I am so excited.

Greg Kihlstrom: Yeah, looking forward to talking about this with you. You know, as you can imagine, we’ve talked about GenAI and AI a lot, but this is a topic that I think needs some more discussion. Before we dive into all that, why don’t we start with you giving a little more background on yourself and what inspired you to found Sema Software?

Matt Van Itallie: Yeah, for sure. So I’m the son of a math teacher and a computer programmer. So if you had told my earlier self, well, you’re going to end up running a company that treats code as data, I’d say, well, that probably makes sense. But I had a very circuitous path to get here. I worked in government reform. I worked in enterprise software companies. And I founded Sema based on a logical problem, an itch that I needed to scratch. Sitting in software company executive team meetings, the chief revenue officer, and while we’re at it, the chief marketing officer, would talk about the state of the sales funnel using Salesforce or the CRM of their choice. And a CRM like that is an executive dashboard explaining the state of sales, of the funnel, of marketing under certain circumstances, and the state of the sales team, thanks to code. Then we’d get to the CTO, the amazing CTOs I worked with, and they would explain the state of code and coders by hand, from a qualitative perspective: well, I kind of think it’s this, it’s kind of that. And my logical mind just exploded. Why can’t we use code to create an executive dashboard about code, just like sales and marketing and all these other teams have? I love solving logical problems that try to have some meaningful impact, and this one is just so fun because it’s a really hard problem: trying to make code understandable given all the different languages and all the different components of what it means to have good code.

Greg Kihlstrom: Yeah, definitely. So let’s dive in here then. We’re going to talk about a few things, but the overall theme is transparency in GenAI for software development. As I mentioned, I’ve talked a lot about GenAI for martech and customer data and other things like that on the show, but less about using it in software development, and yet it’s being used plenty; tools like GitHub’s Copilot and others are transforming that process. For those less familiar, and we have a lot of marketers listening to the show who may be a little less familiar with the engineering side, can you explain a little bit: what do these tools do, and what do you see as some of the most significant benefits they offer to software engineering teams?

Matt Van Itallie: Absolutely. So let’s just describe it generally first. There are LLMs that help give advice and draft code, just like I hope every single one of your listeners is now using one or more LLMs to draft and write human language. This is about computer language instead. And just like, my goodness, certainly from my own experience, GenAI tools can help so much with human writing, it is incredibly helpful to coders, whether it’s check my work, or give me a hundred different proposals and return the best one, or autocomplete this, or suggest it. It is great for developers, not in all situations under all circumstances, but many of them. And as a result, it helps organizations be that much more productive. Again, listeners, if you can extrapolate from how much more productive you are with human-language GenAI tools, the same is true, and maybe even more so in some circumstances, for coders using it to code. The introduction of GenAI is absolutely one of the biggest productivity boosts of the last 25 years for coders.

Greg Kihlstrom: Yeah. Those benefits aside, what are some of the key challenges that organizations face when integrating some of these GenAI tools, particularly around accountability and security?

Matt Van Itallie: Yeah. So we do say that the biggest risk is not using it enough, because it’s so valuable and has such impact. But if we put that aside and think about the prevention of negative risks, there are four in particular. First, security risk being introduced. Any code can come with security risk, regardless of whether it’s an LLM or a human writing it, so you’ve got to make sure that it’s sufficiently safe. Second, intellectual property risk. Under certain circumstances, you don’t own the output of a GenAI tool, and that includes code. Third is maintainability and understandability for the future. Imagine if a junior member of the marketing team produced something based on prompts and just turned it in without reading it. You’d wonder if it was correct. You’d wonder if the person knew what it was about. That document would be less maintainable. The same is very much true for code. And finally, something that sounds far off but is already here, is what we call exit risk. Sema got started in technical due diligence, helping companies evaluate the health and the overall validity, let’s say, of software organizations they’re purchasing. It’s already happening that in the diligence process, folks are looking and saying, well, this code was 90% written by GenAI; maybe we don’t need the company anymore, maybe we can just build it ourselves. So it’s really changing M&A. Now, I don’t want you to be scared. The flip side is you can build M&A- or investment-ready products faster, because you can go faster. But for all four of these risks, security, intellectual property, maintainability, and exit, the solution, and maybe I’m jumping ahead to a later question, is making sure that a human stays in the loop. You’ve got to make sure code coming out of LLMs is read by humans and reviewed by humans. Code reviews and all of the code quality tools matter as much, and arguably more, including human judgment, to make sure that the right code is being introduced.

Greg Kihlstrom: Yeah, you mentioned a good example: if you have a junior engineer very early on writing code, you’re going to want to review it. So much like what you were just talking about, you definitely want a human to review it. Do you think there’s a danger of people not thinking about that review process as much because they kind of take it for granted? Again, I know in the content creation world, people kind of take for granted that ChatGPT will generate good content, but when you review it, there are weird writing styles or whatever. I know that world more than the coding world. From your perspective, are people less likely to feel like they need to review that stuff?

Matt Van Itallie: Yeah, it’s a great question. I’m worried about it a little, in part because it comes out looking so polished and so right that sometimes people let their guard down. But to me, that really is overwhelmed by just the craftsmanship nature of coding. Most people are not coding for a paycheck, I mean, some are, but most are coding because they love coding. And so they want the opportunity to dig in and make things better, rather than figuring out how to most efficiently get stuff done. So we are evaluating this. We built software to look at the ingredients list, if you will, of a codebase: how much of it is GenAI and, of that, how much was human-modified. And the good news is almost every codebase we’ve seen is well within the appropriate human level. We say GenAI code needs to be blended. Almost all codebases are meeting that blended standard: they may be using GenAI a little or a lot, but definitely when they do use it, they’re putting humans against it to make sure it’s appropriate, which is a great sign.

Greg Kihlstrom: And to your point, it’s not that AI-generated code is bad. It’s that the understanding and transparent documentation of it is important. Can you talk a little bit more about that? I know you touched on this, but talk a little bit more about why that transparency is so important.

Matt Van Itallie: Yeah, you know, back to my data days: if you don’t have metrics, if you don’t have data, you’re not having the best form of the conversation that you could have. You can guess about whether or not the code is maintainable or understandable or correct, but if you haven’t measured it, you don’t really know. You don’t really know if the code’s getting reviewed, which is super important. Literally, by definition, if you haven’t reviewed the code and modified it, you won’t get some kinds of intellectual property protection. Now, that’s not true for all companies; not everyone needs to worry about that yet. But as your listeners are very familiar, you can’t get copyright protection for stuff that comes straight out of an LLM, because machines don’t get copyright protection. So if you’re an organization seeking copyright protection for your work, whether it’s art or words or code, you literally have to modify it to make it copyright-protectable. Which, by the way, is going to be huge. The U.S. Copyright Office is coming out with new regs on this. So if you’re an organization large enough to be worried about copyright, make sure you and your legal teams and your external counsel are paying attention to the newest Copyright Office statements, which should come in late January. There’s been a lot of talk about not infringing copyright, but this is about actually receiving copyright when you’re using AI for whatever it is you’re doing, coding or otherwise.

Greg Kihlstrom: So you created something called the Generative AI Bill of Materials. Can you walk us through how it works and why it’s a game changer for software development?

Matt Van Itallie: Yeah, sure. So I love using analogies. Think about the ingredients list on packaged goods, packaged foods, excuse me: calories, sugar, how much vitamin D, et cetera. The Generative AI Bill of Materials is an ingredients list that breaks your code into three parts. How much was completely generated by GenAI and not modified, we call that GenAI pure. How much was generated by GenAI and then modified, which is much safer, we call that blended GenAI code. And then all other code, non-GenAI originated. The reason we call it the Generative AI Bill of Materials, or GBOM, and it has a pretty nice nickname, but the real reason, is that there already is something called a Software Bill of Materials, which is about how much open source code is in a codebase. For the last decade-plus, organizations have known that in high-stakes situations, diligences, insurance, procurement, coding teams have to show the provenance of their code: how much did their team write versus how much came from the open source community. And all we did, I mean, it was incredibly hard engineering work, and I’m so proud of our research, engineering, and product teams, but what we did is extend that idea. Now you need an ingredients list that’s not just open source versus your team’s creation, but, of the code, how much was GenAI or not. The GBOM, if you will, is really just an ingredients list to help you understand: are you in that too-little zone and you should use it more, in that too-much zone and you should manage it better, or blend it at least, or in the just-right Goldilocks zone in the middle.
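
To make that ingredients list concrete, here is a minimal sketch in Python of how such a three-way breakdown could be computed from per-line provenance tags. The tag names, data structure, and example numbers are illustrative assumptions for this sketch, not Sema’s actual GBOM schema or methodology.

```python
from dataclasses import dataclass

# Per-line provenance tags; the names are assumptions for this sketch.
GENAI_PURE = "genai_pure"        # generated by GenAI, never human-modified
GENAI_BLENDED = "genai_blended"  # generated by GenAI, then human-modified
NOT_GENAI = "not_genai"          # human-written or open source code

@dataclass
class GBom:
    """A GBOM-style 'ingredients list': percent of code in each category."""
    genai_pure_pct: float
    genai_blended_pct: float
    not_genai_pct: float

def build_gbom(line_tags: list[str]) -> GBom:
    """Summarize per-line provenance tags into the three-way breakdown."""
    total = len(line_tags)
    pct = lambda tag: 100.0 * line_tags.count(tag) / total
    return GBom(pct(GENAI_PURE), pct(GENAI_BLENDED), pct(NOT_GENAI))

# Example: a codebase where most GenAI output was human-reviewed ("blended").
tags = [GENAI_PURE] * 5 + [GENAI_BLENDED] * 35 + [NOT_GENAI] * 60
gbom = build_gbom(tags)
print(f"GenAI pure: {gbom.genai_pure_pct:.0f}%, "
      f"blended: {gbom.genai_blended_pct:.0f}%, "
      f"not GenAI: {gbom.not_genai_pct:.0f}%")
```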

Greg Kihlstrom: Yeah. And so, using that analogy to the open source software world, which probably a lot of people are familiar with as well: how much overlap is there in the way that organizations should think about the two? I mean, you mentioned the copyright standpoint and the diligence standpoint. Does that always hold through, or are there some differences?

Matt Van Itallie: Yeah. I would love everyone to listen to this a year from now, in January 2026, and see how accurate these predictions are. Right now, we are in the early adoption phases of GenAI for coding. Even though it’s been around for several years, it’s really just starting to take off, and organizations are most interested in understanding usage to increase adoption. In the open source world, it would be crazy to say, you know what we need to do? We need to figure out how to explain to developers why they should use open source. Everyone knows they have to do that: why would you reinvent the wheel when you can go to the open source community and use something? So today, with GenAI, the challenge is increasing adoption. We predict that a nearly identical, let’s call it risk profile, will be developed for GenAI code. In open source, you have to worry about security risk through CVEs; that is coming for GenAI code. You have to worry about intellectual property risk; that’s GPL licenses for open source, and here it’s copyrightability, and so forth. You definitely have to worry about maintainability and keeping it up to date, true for both. And then exit risk. This is a little bit arcane, but if you are selling a software company to a medium- or large-size investor or acquirer, you have to show your open source provenance. You have to. The SBOM is a huge part of technical due diligence. Actually, that part’s already come true. Sema’s first product does technical due diligence, and we did SBOMs for years, and then some of our major clients asked us to build a GBOM as well. So if you’re getting bought by one of our clients, you will get a GBOM produced for you, and you’ll have to talk about it. And the prediction is that procurement offices, insurers, investors, all the folks who care about open source provenance, the open source ingredients list, will increasingly care about GenAI provenance and GenAI transparency as well. That was a little bit weaselly because I didn’t put numbers on it. I bet that by January 14, 2026, at least 10 major Fortune 500 procurement offices will care, and at least some of the insurance companies who ask for an SBOM will be asking for a GBOM. So there, hold me to it. I’ll put a stake in the ground.
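
As a rough illustration of that SBOM-to-GBOM parallel, the sketch below shows what a component record might look like with both kinds of provenance attached. The field names and structure are hypothetical for this example; they do not reflect any real SBOM standard (such as SPDX or CycloneDX) or Sema’s actual GBOM format.

```python
import json

# Hypothetical component record showing the SBOM -> GBOM parallel.
component = {
    "name": "example-payments-service",  # hypothetical codebase
    "sbom": {
        # Conventional open source provenance, as an SBOM records it.
        "open_source_components": [
            {"name": "openssl", "version": "3.0.13", "license": "Apache-2.0"},
        ],
    },
    "gbom": {
        # The analogous GenAI provenance "ingredients list".
        "genai_pure_pct": 5,
        "genai_blended_pct": 35,
        "not_genai_pct": 60,
    },
}
print(json.dumps(component, indent=2))
```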

Greg Kihlstrom: We’ll have you back next year. Nice. So one of the other topics around AI that’s been coming up more often lately is agentic AI. I’ve had a few folks on talking about that topic already, but looking at it from this perspective: with the emergence of agentic AI pursuing, I don’t know if it’s open-ended objectives, but at least multi-step ones, and by that token maybe a little harder to trace in some aspects, what are some of the new ethical and operational challenges that you foresee with the dawn of agentic AI?

Matt Van Itallie: Yeah, I think it just increases the need for transparency. The more that you are putting your organization’s health in the hands of something else, the more you’re going to need to understand what’s going on. Sema has spent a lot of time thinking about metrics about coders, and, bottom line, they largely are used for evil, not for good, because code really is a craft, not a competition. It doesn’t lend itself to metrics the way sales does. With my sales hat on, how many bookings I have and how many calls I make really is indicative of my performance. In coding, if someone adds more code, it’s not certain that that’s a good thing, right? Some of the best coding there is, is removing code. So metrics about people, we’re really deeply skeptical of; there are some limited uses where they make sense. But metrics about individual agents and exactly what they’re doing and exactly what they’re changing, one, I think come with fewer objections, and two, they’re just that much more important. You need to know what all these things are doing on your behalf and how they’re interacting with each other. I am still passionate about the need for humans in the loop. I don’t care how good the tools are: if the stakes matter for what the code’s going to do on the other end, you really need someone looking at it. I’m sure this is a bastardization, but in my personal AI usage journey, I started by asking an LLM, tell me the answer to this question. Now, if I’m working on a new product brief, I have a task for that product brief with 10 different personas who interact with each other and then give me back an answer. The second approach is a lot better than the first, but my goodness, do I need to carefully read and review what that final answer is. I just don’t see that changing. The more important the stakes, the more carefully you’re going to have to look at the output, and under certain circumstances, look under the hood at how it was made to make sure it was done in a safe and appropriate way.

Greg Kihlstrom: Yeah. Again, though, I wonder about the analogy going back to open source. The big open source projects that are well documented, well used, and at least reasonably secure have similar processes; it’s the human analog to those, right? So I think there’s an analog here for all of us: we invented this, and maybe it looks different. To go outside the software world, Wikipedia has a process for reviewing content and things like that. So there are analogs here to draw on, even though, to your point, this moves quickly, and by design it’s not always as transparent. The reassuring thing is that if there are tools that help us have some of that transparency, it seems like there’s some good precedent, right? And along those lines, talking about accountability and quality control: I know you’ve touched on this a bit, but how do you look at organizations establishing that accountability for code created by AI?

Matt Van Itallie: Well, I do have to share a brief story. Before I did coding work and before I was in enterprise software, I was in school district reform, deeply passionate about trying to make systems work, coding systems now, school systems at the time, trying to help increase teaching and learning on behalf of low-income kids. I don’t think there’s a harder problem on earth, or a more important one. But I came to an organization where my title was Chief Achievement and Accountability Officer, and I said, well, that’s a mouthful, I’m going to shorten it. I shortened it to Chief Accountability Officer, and that was among the worst decisions of my entire life. If you ask me, I personally love accountability. I have to-do lists. I have a coach. I’m a competitor, back to the craft-versus-competition point. Code is not like that. By the way, teachers are not like that either. I am less about accountability for coders, using GenAI tools or otherwise, and more about, my goodness, smart, passionate professionals who get to create for a living. How amazing is that? Let’s meet them where they are and explain, as professionals, why this really matters. In the coding world we talk about functional and non-functional requirements; to over-explain a bit, functional is what the product does, and non-functional is security, quality, maintainability, et cetera. Most coders love worrying about the non-functional requirements, and they just wish they had more time to do it. So if you can explain it, it’s the craft. They’re not going to Ikea to buy a mug; they want to build the mug themselves, and they want it built the right way. That process, that craftsmanship, really matters. So I always like to start with explaining to coders why this should matter and why they should think about it, and then giving them data. If there were some magic way to give coders data about their work without having Big Brother looking over their shoulder, that would be ideal. But folks don’t really believe that; they assume that if there are going to be developer-level metrics, there’s going to be top-down spying. So I would say to organizations: find a way for developers to have data without spying on them, that’s really the fine point. And I’d start with getting to full understanding before worrying about accountability.

Greg Kihlstrom: Yeah. Well, for those that sit outside of the engineering and technology parts of the business, rather than micromanaging and mistrust and all the negative stuff, what should they be doing in relation to that? They’re using their own GenAI tools for their own stuff, but what should they be doing in relation to this part of the business?

Matt Van Itallie: Yeah, it depends on where you’re sitting. For CEOs, probably the most important thing they can do is role model the use of GenAI tools, whatever those are. I use it on the product side and the communication side and the fun side; it’s amazing for writing limericks, as everybody knows. Another is explaining the why behind it. I think one of the things leading GenAI adoption to go slower than would be better for developers and for their organizations is this identity conversation. It’s a sense of, am I really a coder if I’m so heavily relying on this? And what I say to that is, I do understand, but are you a real coder if you use open source? Of course I am; why would I reinvent the wheel if something already exists? Well, same thing. Are you a real coder if you use GitHub as a version control system, or if you use an IDE? This is a very powerful tool, I don’t want to put that aside, but we’ve developed so many tools that have advanced the work, and this one just feels so different. So really support colleagues in understanding that to be a professional is to use the best available tools, and that it’s not a dig, it’s actually a plus, to go off and see how you can use these tools. I really think that’s the right way to think about it.

Greg Kihlstrom: Yeah, I mean, it seems like in a few years, it’s going to be like using autocorrect when you type in Microsoft Word, right?

Matt Van Itallie: Why would you possibly turn that off? Exactly right.

Greg Kihlstrom: Right. I mean, that’s technically AI, right? Grammarly, all those tools. It’s like, yeah, why would you turn that off? So, definitely. I mean, you think about this stuff a lot. What are you excited about on the horizon? What should we be looking out for here?

Matt Van Itallie: I am really excited about how the power of AI can be used to understand code, not just to help coders carry out their vision. There’s a lot of drudgery. There’s so much about coding that is great, and engineering management can be really fun, but man, there is a lot of drudgery. I had a dear friend, and we were talking about the 25 engineering teams he was trying to keep track of. By hand, even with a traditional metrics dashboard, it’s either nearly impossible or extremely unpleasant, because you’d have to look through 25 things and compare and contrast.

Greg Kihlstrom: Probably both, right?

Matt Van Itallie: And so instead, using the power of AI to augment humans in understanding what is going on in engineering teams, I am thrilled by that. You’ll be shocked to hear we’re working on it. So if any of your listeners want a taste, feel free to reach out to us; we’ll be happy to show you what we’re working on. But using AI to let professionals do what they do best and take out some of the drudgery work, in particular in understanding the engineering and the delivery roadmap, I am over the moon excited about that.

Greg Kihlstrom: Yeah, love it. Well, thanks so much for all your insights here. One last question before we wrap up. I like to ask everybody, what do you do to stay agile in your role and how do you find a way to do it consistently?

Matt Van Itallie: I mean, I spend a lot of time with Claude these days. I have Claude prompts that help me figure out the next prompt for the next problem. I have self-improvement prompts. We have a Slack channel to share ideas. I cannot stress enough, personally, just how transformational this technology has been, certainly internet-level disruption, in a good way, of my workflows. I’m only scratching the surface of it. I do continue to make time for my family, but on the margin, my goodness, there’s a lot to do with LLMs to learn and to try to become a better learner and a better professional.
