#845: Bright Data Chief Product Officer Ariel Shulman on why access to real-time web data is critical in the age of autonomous AI


The Agile Brand with Greg Kihlström® | Listen on: Apple | Spotify | YouTube 

Help others find the show by leaving us a review


We talk endlessly about the power of AI models, but what happens when the public web data they rely on is incomplete, outdated, or just plain wrong?

Agility requires not just reacting to market changes, but anticipating them by having a clear, real-time view of the world. This means having the infrastructure to see the digital landscape as it truly is, not just how it’s presented in curated reports.

Today, we’re going to talk about the foundational element that will determine the winners and losers in the age of AI: access to high-quality, real-time web data. As AI agents become more autonomous and LLMs more integrated into our workflows, the quality of the public web data they consume is no longer an academic concern—it’s a critical business imperative that impacts everything from competitive intelligence to dynamic pricing and customer experience.

To help me discuss this topic, I’d like to welcome Ariel Shulman, Chief Product Officer at Bright Data.

About Ariel Shulman

Ariel Shulman is Chief Product Officer at Bright Data, the world’s #1 web data infrastructure company for AI & BI. With more than 20,000 global customers, including 14 of the top 20 LLM labs, Bright Data powers 100 million daily AI agent interactions. Ariel is an accomplished executive with extensive experience in technology management, business development, marketing, and strategy. Since joining Bright Data in 2021, Ariel has leveraged his networking, security, and Internet expertise to drive innovation to access high-quality public web data solutions at scale. He now serves as CPO, responsible for Bright Data’s AI-integrated product suite that leads innovation.
Fluent in English, French, Spanish, and Hebrew, with professional experience across multiple countries, Ariel plays a pivotal role in shaping Bright Data’s global positioning.

Ariel Shulman on LinkedIn: https://www.linkedin.com/in/arielshu/

Resources

Bright Data: https://www.brightdata.com

The Agile Brand podcast is brought to you by TEKsystems. Learn more here: https://aglbrnd.co/r/2868abd8085a9703

Drive your customers to new horizons at the premier retail event of the year for Retail and Brand marketers. Learn more at CRMC 2026, June 1-3. https://aglbrnd.co/r/d15ec37a537c0d74
We’re proud to be a media partner for #MAICON26 – Oct. 13-15! Learn how AI can power your marketing and business and help you grow smarter. Use code AGILE150 to save! https://aglbrnd.co/r/7fe458ced0f04658

Enjoyed the show? Tell us more and give us a rating so others can find the show: https://aglbrnd.co/r/faaed112fc9887f3

Connect with Greg on LinkedIn: https://www.linkedin.com/in/gregkihlstrom

Don’t miss a thing: get the latest episodes, sign up for our newsletter and more: https://aglbrnd.co/r/35ded3ccfb6716ba

Check out The Agile Brand Guide website with articles, insights, and Martechipedia, the wiki for marketing technology: https://www.agilebrandguide.com

The Agile Brand is produced by Missing Link—a Latina-owned strategy-driven, creatively fueled production co-op. From ideation to creation, they craft human connections through intelligent, engaging and informative content. https://www.missinglink.company

Transcript

[Greg Kihlstrom] We talk a lot about the power of AI models, but what happens when the public web data they rely on is incomplete, outdated, or just plain wrong? Agility requires not just reacting to market changes, but anticipating them by having a clear, real-time view of the world. This means having the infrastructure to see the digital landscape as it truly is, not just how it’s presented in curated reports. Today, we’re going to talk about the foundational element that will determine the winners and losers in the age of AI. Access to high quality, real-time web data. As AI agents become more autonomous and LLMs more integrated into our workflows, the quality of the public web data they consume is no longer an academic concern. It’s a critical business imperative that impacts everything from competitive intelligence to dynamic pricing and customer experience.

To help me discuss this topic, I’d like to welcome Ariel Shulman, Chief Product Officer at Bright Data. Ariel, welcome to the show.

[Ariel Shulman] Thank you, Greg. Thanks for having me. 

[Greg Kihlstrom] Yeah, looking forward to talking. Definitely a timely topic to be diving into. Before we do, though, why don’t you give a little background on yourself and your role at Bright Data?

[Ariel Shulman] Sure. So, as you said, I’m Bright Data’s Chief Product Officer. I’ve been with the company for about 11 years across several different roles. At my core, I’m an engineer. I studied industrial engineering, followed by an MBA. So, you know, I like multidisciplinary issues: talking to developers, talking to designers, talking to customers, and building things at scale. I’m married. I have two grown kids who are now also attending engineering school, continuing the tradition. At Bright Data, I’ve worked on many different parts of our infrastructure and our product, back from when we actually invented the concept of proxies and residential proxies. I was the guy who built our residential proxy network, which gives us a really strong position to access web data from. And today I lead a team of about 10 engineers. We move really fast with quick iterations and tight feedback loops. Things are moving very quickly, especially because we’re in the web space, where everything is online and can be changed instantly. This is not a hardware business. So we need to be on our toes, and there we are.

[Greg Kihlstrom] And maybe just a little more detail: can you explain what Bright Data does and what’s the core problem you solve? Who are your customers, and everything like that?

[Ariel Shulman] Yeah. So, Bright Data is really the biggest company doing what we call public web data infrastructure. What this means is that we help companies access and use publicly available web data at scale. Publicly available web data means things like prices, reviews, and things like that. This is information that’s available on the web for humans, right? And what we do is make it accessible to machines. This is hard because the web keeps changing, websites keep changing, and it’s very hard to access at scale programmatically. We do that, and we’ve been doing that for the last 10 years. Today, we’re something like 500 people. We have an annual revenue rate of over $300 million. We have a huge proxy network and unblocking infrastructure. We process more than 50 billion pages per day, which is, by the way, way more than what Google processes.

[Greg Kihlstrom] Wow. So, yeah, let’s dive in here, and I want to start at the strategic level. Many marketers are used to working with first-party data or syndicated market research. How has the role and strategic importance of public web data evolved for brands, especially in the last 12 to 18 months with the explosion of generative AI?

[Ariel Shulman] Yeah, I think that in the last 12 months especially, once we saw the adoption of AI chatbots really take off, public web data moved from something that was kind of nice to have, or complementary, to being strategic. The reason is that your customers are looking at your brand and your products not only through your channels, for example your website, but also through other things: review sites, social platforms, forums, and obviously generative AI platforms. If you don’t monitor that closely, you risk making decisions based just on the information you have, ignoring a wider base, and that can leave you with blind spots. Generative AI has really accelerated how this is happening, because people still search, but they ask kind of open questions, like, what is the best so-and-so? Or, how does this brand compare to this brand? And then it is the AI agent that actually makes the decisions, or at least the recommendations, on behalf of the customer. And it feeds on public web data. So how you rank in Google, for example, is important, but that’s not the only source for these chatbots. So it is important to have a wide kind of web presence, and that’s why web data is important. It’s important to know how you stack up against the competition, how to appear, and how to be smart about it.

[Greg Kihlstrom] And I think this is getting a lot more notice and attention, surely, but what do you think are still the biggest blind spots or maybe misconceptions that marketing leaders and execs still have when they think about using public web data? Is it primarily technical? Is it strategic, ethical? All of the above?

[Ariel Shulman] It’s kind of all of the above, but I think one of the biggest misconceptions is that it’s easy, right? Because you can go on ChatGPT and, let’s say you work for some e-commerce brand, you say, write me a scraper that will collect the prices and reviews for my competitor. And that looks okay, but it doesn’t really hold at scale, because things change constantly, and obviously, for most customers, writing scrapers or doing things at scale is not their core competency. In fact, one of the analogies I like to make about web data collection is that it’s kind of like quantum mechanics but in reverse. If you think about quantum mechanics, when you go really small, the laws of physics change, right? Things go crazy. With web scraping or web data collection, that’s exactly the case, but in the opposite direction. Everything works okay until you go above a certain threshold, and then things become crazy. You get blocked, you get CAPTCHAs, you get missing information. It’s very hard to pull off at scale. And for enterprise customers, the scale is what’s interesting. So they tend to underestimate the complexity, and they also tend to underestimate the value of what’s in there. Okay? Because if you look at things like reviews, they can help you with sentiment analysis. You just launched a new product and you want to know how it’s doing. You have some sort of media crisis and you want to see how the Internet is reacting.

All of those things are on the web and are not available in what you have now. So it can get very complicated when analyzing it. Lastly, there are the ethical considerations. Web data collection is perfectly acceptable, and we can talk about that soon, because, you know, we have been sued by some big players, and that’s a really interesting story. But it doesn’t mean that you can collect web data as you see fit. There are ways of doing this. There are some things you need to respect. You must never log in. If you log in, it ceases being public web data. You need to do this while respecting private information. You need to do this while safeguarding the website’s performance. So many customers, especially the large ones, are concerned about these things, and those are some of the things that are addressed very robustly inside the Bright Data platform.

[Greg Kihlstrom] Well, and so, building on that, generative AI has certainly accelerated some of the use cases as well as the usage. Agentic AI probably takes that yet another step further, and your company recently launched a suite of products aimed at powering AI agents. So, what are some of the most compelling use cases that you’ve seen for AI agents to reliably access some of this live web data?

[Ariel Shulman] Yeah. So, one of the major use cases that we’re seeing right now is what we call GEO, so that’s SEO but for generative models, or LLM visibility. That is understanding how AI systems perceive your brand. In fact, I actually listened to your podcast, and I saw that you had Brandlight a couple of weeks ago. That’s one of the things that they do, right? They help brands do that. So we actually provide the infrastructure to do this at scale. Okay? We’re not strong in analyzing what the answers are like, but we provide uninterrupted access to these kinds of information sources. So it is important, when people are looking at your brand through these answers, to know how it appears. When do competitors show up? What is your kind of share of answer, and things like that. So that’s one major use case, which is LLM visibility, or how you appear inside chatbots as opposed to, let’s say, your standard Google SEO. A second one is competitive pricing, and everything that’s e-commerce related.

So if you’ve bought a product or booked a flight or did anything along those lines, most likely your traffic, or some of the related traffic, went through Bright Data, because e-commerce relies on this kind of information. And if you ask an AI agent an open question, such as, what is the best guitar for beginners, it will go out and read forums and do all sorts of things, but it will also try to find you some relevant information regarding prices and things like that. If you are a retailer, it’s very important for you to know this information. And thirdly, and this is also related to e-commerce and to retailers, there are review and sentiment operations. You want to know how people are talking about your product. AI agents will actually give that a relatively high score because it’s content that’s human-driven, and it’s usually in the form of questions and answers and things that AI agents kind of like. So it’s important for you to understand what’s going on, because AI agents will then use that in compiling their response.
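As a rough illustration of the "share of answer" idea mentioned above, here is a minimal Python sketch. It assumes you have already collected a set of chatbot answers as plain text. The function name, the sample answers, and the brand names are all hypothetical, and simple name matching is only one of several reasonable ways to count a mention; none of this reflects a Bright Data API.

```python
import re
from collections import Counter


def share_of_answer(answers, brands):
    """For each brand, return the fraction of answers that mention it at least once.

    Matching is a case-insensitive whole-word search on the brand name,
    which is an assumption made for this sketch.
    """
    counts = Counter()
    for text in answers:
        for brand in brands:
            pattern = r"\b" + re.escape(brand) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[brand] += 1
    total = len(answers)
    return {brand: (counts[brand] / total if total else 0.0) for brand in brands}


# Hypothetical answers collected for the question "what is the best guitar for beginners?"
sample_answers = [
    "Acme and Zenith both make solid beginner guitars.",
    "Most forum users recommend Zenith for the price.",
    "Neither; a used Orbit is the better deal.",
]
shares = share_of_answer(sample_answers, ["Acme", "Zenith"])
```

With this sample, `shares["Zenith"]` is twice `shares["Acme"]`, since Zenith is mentioned in two of the three answers and Acme in one. Running the same questions across countries and languages would give one such table per market.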

[Greg Kihlstrom] Yeah. So, for a marketing leader out there who’s listening to this and maybe has a kernel of an idea that there’s some potential here, what’s a first step to start realizing that potential, or even integrating this kind of intelligence into their existing MarTech stack without causing a lot of disruption?

[Ariel Shulman] It’s not surprising that you need to focus on something. So probably the important thing is to choose one important, high-value decision that you want to make based on this kind of web data. For example, what should pricing be for this new product that I’m about to launch, or what do people think about my recently released product, or anything along those lines. Then you need to look at the scope. Which domains, or what are the sources that you will be looking at? How fresh do you want the data to be? It can be real-time, or it can be like a month old; it depends. And the volumes and the geographies also, because, especially for large multinationals, this can be very different in different countries based on languages, cultural preferences, and things like that. So the data that you collect needs to match the question that you’re trying to answer. You should always start light, with kind of a pilot. It’s very tempting to download terabytes of data, and it’s possible, but it can actually be counterproductive because you can get overwhelmed. Start with a few gigabytes or megabytes. Put them into maybe even something like Google Sheets. See what the data feels like. Once you have a feel for it, you can go to scale, and then the information becomes very statistically robust. So the idea is to go fast, to get some kind of proof of value, and then, once you’re ready to go, go full blast.

[Greg Kihlstrom] So, I want to get back to something you briefly touched on earlier, and that’s the governance, the ethics, and those things. Accessing web data at scale certainly does bring up questions of ethics, which you addressed briefly, as well as legal challenges. As you mentioned, Bright Data has famously faced and won legal battles with platforms like Meta and X. What’s the key principle that marketers need to understand about the right to access public data, and how should they build a governance framework to do this responsibly?

[Ariel Shulman] That was an interesting time, absolutely. So, you know, technology in general moves very fast, and definitely much faster than the law, and this is no exception. You’re right, we were indeed sued by Meta and by X, and we won both cases in federal court in California. These are actually very important precedents that have shown the world that public web data is something that is free to collect under certain conditions, which I’ll explain. And it’s interesting to see that our rulings are actually used as precedents in big trials that are taking place right now with some other big companies. By the way, just as an interesting tidbit of information, Meta, while suing us, was actually a customer of Bright Data. They were scraping e-commerce sites for data to compare for Facebook Marketplace. So, the key principle is that if the information is available publicly, it can be collected lawfully. And there’s a famous quote from the trial that we had against X, Twitter at the time, where the judge basically told the lawyer, and I quote, you do not own the internet. Okay? He said that. You don’t own the internet. And, what else? You are trying to appropriate part of something that is public and open to everyone and say it’s only open to you and your customers. Information can be accessed by everyone.

So, if you do not log in, you have not accepted the site’s terms and conditions, you have not entered into an agreement with them, and it’s okay. You also need to do that in a way that will not have any kind of negative implications for the site’s performance, because when you scrape or collect data, if you do that irresponsibly, you can actually damage the site or its response time. So one of the things that we do at Bright Data is measure the response time of the website as we collect the data, and if we see that it starts to slow down, for any reason whatsoever, we slow down, or maybe we even stop, because we never want to be something that’s going to be in the way. So, going back to your question: for customers, this is very deep tech. Okay? I’m a product guy, so I love this. It’s very deep tech.

Companies are very unlikely to be able to build this kind of thing. As part of the platform, we offer all of these things: we make sure that we never log in, that website performance is protected, and that no personal information is collected. All of those things are part of the system. So, I think governance is super important. For us, it’s important. We have a special page on our website, called the Trust Center, that explains exactly how all of these things are done, and it’s kind of part of the package. This is typically important for large customers: large Fortune 500 customers, sometimes even SEC-regulated companies. They’re very concerned about these things, and we’re very pleased with the outcome of these trials, because it actually proved what our CEO always says: public data should remain public.
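The response-time safeguard described in this answer (measure how fast the site responds while collecting, then slow down or stop if it degrades) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Bright Data’s implementation: the function names, the first-response baseline, the 1.5x and 4x thresholds, and the doubling and halving of the delay are all hypothetical choices made for the example.

```python
import time
from typing import Callable, Iterable, List, Optional


def next_delay(baseline_s: float, latest_s: float, current_delay_s: float,
               slow_factor: float = 1.5, stop_factor: float = 4.0,
               min_delay_s: float = 1.0, max_delay_s: float = 60.0) -> Optional[float]:
    """Return the pause before the next request, or None to stop collecting.

    All thresholds are assumptions made for this sketch.
    """
    if latest_s >= stop_factor * baseline_s:
        return None  # the site looks badly degraded: stop
    if latest_s >= slow_factor * baseline_s:
        return min(current_delay_s * 2, max_delay_s)  # back off
    return max(current_delay_s / 2, min_delay_s)  # healthy: ease back toward the minimum


def polite_collect(urls: Iterable[str], fetch: Callable[[str], str],
                   delay_s: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch URLs one by one, adapting the pace to the site's observed response time."""
    results: List[str] = []
    baseline_s: Optional[float] = None
    for url in urls:
        started = clock()
        results.append(fetch(url))
        elapsed_s = clock() - started
        if baseline_s is None:
            # The first response sets the baseline; the floor avoids a zero baseline.
            baseline_s = max(elapsed_s, 0.001)
        decision = next_delay(baseline_s, elapsed_s, delay_s)
        if decision is None:
            break
        delay_s = decision
        sleep(delay_s)
    return results
```

Here `fetch` is whatever function retrieves a page; `clock` and `sleep` are injectable so the pacing logic can be exercised without real network calls or real waiting. A production system would likely use a rolling baseline per site rather than the first response alone.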

[Greg Kihlstrom] Love it. So, let’s talk a little bit about measuring success then. Moving past the point where we’ve implemented something, how should customers measure the ROI of investing in web data infrastructure like this? What are some ways that ROI is measured here?

[Ariel Shulman] Okay, well, this depends really on how you use the data. We have customers who are using Bright Data as an integral part of their technology stack. If we go down, which fortunately doesn’t happen, they go down as well. So it’s not even a matter of ROI, it’s a matter of life or death for the business. Say you run a price comparison site and you want real-time information regarding prices on different sites. A query from a customer will result in a query through Bright Data, which will go to websites and send the information back. So ROI can be as extreme as life or death, or it can be, I would say, softer: stuff that has to do with, for example, the quality of the data, or the staleness, or the freshness, you would say, of the data, or how structured and predictable it is.

For many customers, this is very important, because it has direct implications, for example, on their pricing models. People take web signals from all over to decide on pricing and promotion based on the availability of a product on competitors’ sites, or even in competitors’ local brick-and-mortar stores. This is a very sophisticated system. In some cases, we have clearly demonstrated that the web data that we provided doubled revenue for certain products during certain time windows, because some discounts were removed as a result of the retailer understanding that they are the ones with the stock, so they’re in a good position. They don’t have to undercut the competition. So that’s about it. If you’re looking at marketing, then we’re trying to provide information regarding campaign efficiency, conversions, and things like that. That’s more on the marketing side.

[Greg Kihlstrom] And so, looking ahead a bit, for those who maybe don’t have that idea yet or are contemplating some things, what’s an important action that a marketing leader should take to make sure their brand is prepared for this future you’ve described, where autonomous AI agents are a key part of the digital landscape?

[Ariel Shulman] Well, I think it’s important to start experimenting. The cost of entry, and I’m not even talking about the dollar cost, but the effort cost, even the mental cost of trying these things, is not very high. At least at Bright Data, we have what we call a PLG, so product-led, kind of system. You can sign up, you can try things for yourself, and you can see even on a small scale the value of public web data. It’s fascinating for brands to just send a couple of queries anonymously to chatbots in different countries, in different languages, for example, and see how they are perceived. It’s a very eye-opening experience. And I think that this is becoming more and more important because these things are actually impacting consumer decisions, right? You’re asking an open question and you’re getting a recommendation. Whereas before, in the traditional SEO world, you would get a bunch of links, and as a human, you would be tasked with going to those links and making some decisions. Right now, you have an agent that processes web information and gives you an answer. So you want to know what the web information that AI agents are looking at looks like, and how you can potentially manipulate that web information so that you would be better off when that AI agent makes its recommendation.

[Greg Kihlstrom] Makes sense. Well, Ariel, thanks so much for joining today. I’ve got a couple of questions for you as we wrap up here. First one: if we were having this interview one year from today, what is one thing that we would definitely be talking about?

[Ariel Shulman] Okay. So, I can’t give you the name of a customer, but I can tell you that a few weeks ago, I went to visit a customer, a robotics company, and I saw with my own eyes one of those home robots. Okay? And I think that’s really the next thing. We’re going to see AI moving from the virtual world into the physical world. We’re going to see robots starting to appear, impacting the economy, and impacting people’s lives. Today, we serve something like 14 or 15 of the top LLMs in the world, and a lot of the information that they are asking of us is related to the physical world. That would be video or images or spoken language, for understanding people, and maybe even responding back. So, I think we’re going to see robots, because we know that this information is being used to train robots. What could now appear to be science fiction might, next year, seem pretty normal.

[Greg Kihlstrom] Wow. And last question for you here: what do you do to stay agile in your role, and how do you find a way to do it consistently?

[Ariel Shulman] So, I try to, as we say here at Bright Data, stay in the trenches. Even though I’m a Chief Product Officer and this is a big company, I look at support tickets. I look at what breaks in production. I talk to customers. The web changes in literally real time. Okay? Things can change immediately. Websites change overnight, blocking mechanisms change overnight, all sorts of things change. So it’s very important to stay agile in understanding what’s going on and in shipping. When we ship products here, we ship in really small, measurable steps. At Bright Data we do 60 product releases per day, so we iterate extremely fast. And personally, one of the things I like is to try new tools constantly. Whenever I read about some new startup, some Y Combinator company or something like that, I will go and pay for the first month and try it out. The cost is meaningless. The cost of missing out is much higher. Okay? So I will try this, and as a product fanatic, it’s always interesting to see new products and new onboarding mechanisms.

So, just try new things, constantly. Thanks, Greg. Thank you for all of these thoughtful questions. You know, public web data in general is kind of a misunderstood space. A lot of people have questions about it: is it legal, is it okay? And I think that we have demonstrated that it is. The mission that we’ve had at Bright Data has always been the same: to allow uninterrupted public access to this public web data for everyone. It used to be purely old-fashioned, so to speak, web scraping. Now it’s more AI-related, but at the core, it’s the same thing. We take this human information and make it available to machines at scale, and these are going to be a few interesting years in AI and robots.

