This article is based on an interview with Bright Data Chief Product Officer Ariel Shulman on why access to real-time web data is critical in the age of autonomous AI, conducted by Greg Kihlström, AI and MarTech keynote speaker, for The Agile Brand with Greg Kihlström podcast.
As marketing leaders, we are squarely in the midst of an AI-fueled transformation. The conversation is dominated by the power of large language models, the promise of autonomous agents, and the potential to create hyper-personalized customer experiences at an unprecedented scale. We’re rightfully excited about the applications, the interfaces, and the magic of it all. Yet, in our rush to harness these new capabilities, we often overlook the very foundation upon which they are built: the data. And I’m not talking about the clean, structured, first-party data sitting comfortably in our CDPs. I’m referring to the messy, chaotic, and utterly vital torrent of public web data—the real-time pulse of the global marketplace.
This brings us to the critical question that will separate the winners from the losers in this new era. What happens when the AI models shaping your brand’s perception and driving customer decisions are fed incomplete, outdated, or simply incorrect public data? The consequences are not academic. They directly impact competitive intelligence, dynamic pricing, brand reputation, and ultimately, revenue. The new strategic imperative for marketing leaders is no longer just about *using* AI, but about ensuring that the AI we use has a clear, accurate, and real-time view of the world. It’s about building an infrastructure that can see the digital landscape as it truly is, not just as it’s presented in last quarter’s market research report.
The Shift from Complementary to Critical
For years, public web data was often seen as a secondary or complementary source of insight, useful for occasional competitive analysis or sentiment tracking. The primary focus remained on owned channels and first-party data, where we could control the narrative. The rapid adoption of generative AI has fundamentally and permanently altered this dynamic. Your brand story is no longer solely yours to tell. It is being continuously assembled and retold by AI agents that synthesize information from a vast ecosystem of third-party sources.
Ariel Shulman, Chief Product Officer at Bright Data, explains that this shift has elevated public web data from a tactical tool to a strategic necessity.
“The public web data moved from something that was kind of nice to have or complementary to being strategic… your customers are actually looking at your brand and at your products, not only through your channels… but also through other things. So review sites, social platforms, forums, and obviously generative AI platforms.”
This is a profound change for marketers. The customer journey now includes a powerful, and often opaque, intermediary: the AI agent. A potential customer asking, “What is the best camera for travel bloggers under $2,000?” is no longer just served a list of links to review. They are given a synthesized, authoritative-sounding answer. That answer is constructed from product reviews on e-commerce sites, discussions in Reddit forums, and articles from tech publications. If your brand’s information is absent, misrepresented, or overshadowed by a competitor in those public sources, you effectively lose the sale before you even know it is happening. Monitoring and understanding this external perception is no longer optional; it’s as critical as monitoring your own website’s uptime.
The Misconception of “Easy”
Given the importance of this data, the immediate temptation for many technically inclined teams is to try to gather it themselves. After all, how hard can it be to write a script to pull down some prices or reviews? This line of thinking, however, gravely underestimates the complexity of the task at enterprise scale. What works for a one-off query of a few hundred data points completely falls apart when you need reliable, real-time data from millions of pages across different geographies and platforms.
Shulman offers a rather fitting, if slightly nerdy, analogy to describe the challenge.
“One of the analogies I like to make about web data collection is that it’s kind of like quantum mechanics but in reverse… everything works okay until you go above a certain threshold, and then things become crazy. You get blocked, you get captchas, you get missing information. It’s very hard to pull off at scale.”
This is the reality that many internal teams discover too late. Enterprise marketing demands consistency. Whether you’re powering a dynamic pricing engine, monitoring for a brand crisis, or tracking competitive promotions, the data feed must be robust. As soon as you scale up your requests, websites deploy sophisticated blocking mechanisms. You’re suddenly battling a constantly evolving landscape of IP blocks, user-agent fingerprinting, and CAPTCHAs that are designed specifically to stop the kind of automated access you need. This isn’t a marketing challenge; it’s a deep infrastructure and engineering problem that requires a specialized focus. The effort to build and maintain an in-house solution invariably diverts precious resources from core marketing activities and rarely achieves the reliability of a dedicated infrastructure provider.
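To make the reliability problem concrete, here is a minimal Python sketch of how a team might measure the health of a data feed by sorting fetch results into the failure modes Shulman lists: blocked, CAPTCHA, and missing information. The function names, the status-code rules, and the CAPTCHA marker strings are all assumptions chosen for illustration; real block pages vary widely from site to site.

```python
from collections import Counter

# Hypothetical marker strings; real CAPTCHA and block pages differ by site.
CAPTCHA_MARKERS = ("captcha", "verify you are human")


def classify_response(status_code: int, body: str) -> str:
    """Label one fetch result as 'ok', 'blocked', 'captcha', or 'missing'."""
    # 403 (Forbidden) and 429 (Too Many Requests) are treated as blocks here.
    if status_code in (403, 429):
        return "blocked"
    lowered = body.lower()
    if any(marker in lowered for marker in CAPTCHA_MARKERS):
        return "captcha"
    # Any other non-200 status, or an empty page, counts as missing data.
    if status_code != 200 or not body.strip():
        return "missing"
    return "ok"


def feed_health(results):
    """Return the share of each outcome across (status_code, body) pairs."""
    counts = Counter(classify_response(code, body) for code, body in results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Invented sample results, one per outcome.
sample_results = [
    (200, "<html>Price: $19.99</html>"),
    (429, ""),
    (200, "Please complete the CAPTCHA to continue"),
    (200, "   "),
]
print(feed_health(sample_results))
```

Tracking these shares over time is one simple way to see the threshold effect described above: a feed that looks healthy at a few hundred requests can show a very different mix once volume grows.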
The New SEO: Mastering Generative Optimization (GO)
For the better part of two decades, marketers have been engaged in the art and science of Search Engine Optimization (SEO). We’ve learned how to structure our content, build backlinks, and optimize keywords to earn a coveted spot on the first page of search results. Now, a new competitive arena is emerging, and it requires a different playbook. Shulman calls this “GO,” or Generative Optimization.
“One of the major use cases that we’re seeing right now is what we call GO, so that’s SEO but for generative models, or LLM visibility. That is understanding how AI systems perceive your brand… it is important when you’re looking at these answers, when people are looking at your brand, how does it appear? Where do competitors show up? What is your kind of share of answer and things like that.”
Unlike traditional SEO, where success is measured by a list of ten blue links, GO is about influencing a single, synthesized answer. It’s about winning the “share of answer.” This requires a sophisticated understanding of which data sources an LLM prioritizes for a given query and ensuring your brand is not only present but positively and accurately represented in those sources. If a model consistently finds that your competitor’s product is mentioned more frequently in positive contexts within trusted forums or review sites, that competitor will win the AI’s recommendation.
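One simple way to put a number on “share of answer” is to collect the answers an AI system gives to a set of relevant prompts and count how often each brand is mentioned. The Python sketch below does only that. It is an illustrative assumption, not Bright Data’s method: the brand names Acme and Globex and the sample answers are invented, and a plain mention count ignores sentiment, ranking within the answer, and which sources the model drew on.

```python
import re


def share_of_answer(answers, brands):
    """Return, per brand, the fraction of answers mentioning it at least once.

    Matching is case-insensitive and on whole words, which suits simple
    alphanumeric brand names; names with punctuation would need more care.
    """
    if not answers:
        return {brand: 0.0 for brand in brands}
    shares = {}
    for brand in brands:
        pattern = re.compile(r"\b" + re.escape(brand) + r"\b", re.IGNORECASE)
        hits = sum(1 for text in answers if pattern.search(text))
        shares[brand] = hits / len(answers)
    return shares


# Invented answers to a prompt such as the travel-camera question above.
sample_answers = [
    "For travel bloggers, the Acme X100 is a strong pick.",
    "Many reviewers prefer Globex, though Acme is lighter.",
    "The Globex Z5 is the usual recommendation.",
    "Globex leads on battery life.",
]
print(share_of_answer(sample_answers, ["Acme", "Globex"]))
```

Run against the same prompts week after week, even a crude measure like this shows whether a brand’s presence in generated answers is rising or falling relative to competitors.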
Mastering GO means treating the entire public web as your potential brand canvas. It involves systematically monitoring how your brand and products are perceived across countless domains, identifying gaps in information, and developing content and engagement strategies to improve your visibility within the data sets that AI agents are most likely to consume. This is the new frontier of brand management, and it is entirely dependent on having access to comprehensive, real-time public web data.
The Right to See: Navigating the Legal and Ethical Landscape
For any enterprise leader, the prospect of collecting web data at scale immediately raises important questions about legality and ethics. The space can feel like a gray area, and the fear of legal repercussions can lead to inaction. However, recent landmark legal battles have brought significant clarity to the issue, establishing a firm precedent for the lawful collection of publicly available information.
Shulman references a particularly telling moment from one of their legal victories, which underscores the core principle at stake.
“The key principle is that if the information is available publicly, it can be collected lawfully… there’s a famous quote from the… trial that we had against X, Twitter at the time, where the judge basically told the lawyer, and I quote, ‘You do not own the internet.’”
This is the crux of the matter. Information that is publicly accessible to any human with a web browser, without requiring a login or acceptance of specific terms of service, is generally considered fair game for collection. For marketing leaders, this provides a clear legal foundation. The responsibility, then, is to ensure that this collection is done ethically and responsibly. This means never attempting to access private or personally identifiable information, respecting robots.txt files, and ensuring that collection activities do not degrade the performance of the target website. Partnering with a platform that has these ethical guardrails built into its core infrastructure is not just a best practice; it’s a requirement for any enterprise operating in good faith.
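Two of these guardrails, respecting robots.txt and not overloading the target site, can be sketched in a few lines of Python using the standard library’s `urllib.robotparser`. This is a minimal illustration under stated assumptions, not a description of any vendor’s infrastructure: the `PoliteGate` class, the `example-bot` user agent, the one-second default delay, and the sample robots.txt content are all hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteGate:
    """Hypothetical helper: checks permission and spaces out requests."""

    def __init__(self, parser, user_agent, default_delay=1.0):
        self.parser = parser
        self.user_agent = user_agent
        # Use the site's Crawl-delay if it declares one, else our default.
        declared = parser.crawl_delay(user_agent)
        self.delay = float(declared) if declared is not None else default_delay
        self._last_request = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_time(self, now=None) -> float:
        """Seconds still to wait before the next request may be sent."""
        now = time.monotonic() if now is None else now
        if self._last_request is None:
            return 0.0
        return max(0.0, self.delay - (now - self._last_request))

    def mark_request(self, now=None) -> None:
        """Record that a request was just sent."""
        self._last_request = time.monotonic() if now is None else now


# Invented robots.txt content for illustration.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /account/
Crawl-delay: 2
"""

gate = PoliteGate(build_robots(SAMPLE_ROBOTS), "example-bot")
print(gate.allowed("https://example.com/products/1"))
print(gate.allowed("https://example.com/account/settings"))
```

A collector would call `allowed()` before each fetch, sleep for `wait_time()` seconds, and call `mark_request()` once the request goes out. Keeping personal data out of scope is a policy decision that sits above code like this.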
The conversation around web data has matured. It is no longer a question of whether it is permissible, but of how to do it responsibly and strategically. The legal frameworks exist, and they affirm the principle that public data should remain public, accessible not just to a few dominant platforms, but to any organization seeking to understand the world and compete within it.
The ground has shifted beneath our feet. The reliance of AI on the public web has transformed data from a supplemental resource into the foundational infrastructure for modern marketing. We’ve seen that accessing this data is a complex engineering challenge, not a simple marketing task. It is the raw material for the new competitive discipline of Generative Optimization, and there is a clear legal and ethical framework for doing it responsibly. The theoretical discussions are over; the time for implementation is now.
The winners in the age of AI won’t be the brands with the most creative prompts or the flashiest chatbot interfaces. They will be the organizations that possess the most comprehensive, accurate, and real-time understanding of their digital ecosystem. Investing in a robust public web data infrastructure is not merely a MarTech expenditure; it’s a core business capability that grants the agility to anticipate market shifts, not just react to them. For every leader, the question is no longer whether AI can help your brand, but rather, can your AI see? Is it operating with a clear view of the world as it is, or is it flying blind?





