TechTok #13. Does AI use your data for training?

AI today has seemingly found its way into every single aspect of life, its applications ranging from obvious areas like coding or image processing to less apparent ones like disease diagnostics and legal work. AI is absolutely everywhere. And even if you know very little about how it works, you have probably at least heard that every AI requires troves of data to learn from before it can be put to use.

This data has to come from somewhere, and this gets us to the first question of today’s TechTok:

Are apps and websites using my data to train AI without me knowing?

There is no short, definitive answer to this question. The best we can come up with is: “Yes, they do, but not necessarily in the way you might think.” We know you probably didn’t come here for a broad answer like that. But before we dive any deeper, let’s get one thing clear: “training AI” and “collecting data” are not synonyms, although they are related. Simply put, to train AI you need data, so finding ways to obtain that data is one of the biggest challenges when building an AI system. However, there are countless other reasons why someone might want to get their hands on your information.

The thing is, online data collection has existed for decades, long before AI even appeared on the digital horizon, and for many years the main driving force behind gathering user data has been advertising. Insanely complex systems have been built to create user profiles and track users across apps and websites, all with the goal of knowing exactly which ad to show to which person at what time, and of increasing the probability that the person clicks the banner. The digital ad market is estimated at about USD 600 billion to USD 700 billion per year, and user data sits at the foundation of this market. That should give you an idea of why data is so often called the new oil.

Of course, there were other reasons why companies would seek digital data: personalization, recommendations, fraud detection, billing, retention, product analytics. These are often important in sectors like finance, retail, telecom, and marketplaces. The exact reasons are beside the point. What we want to highlight here is that global, rampant data collection was not spawned by the emergence and subsequent spread of AI. In fact, in many cases, the methods used today to gather data for AI training are the same ones that have been used for years for other purposes, so AI companies didn’t have to reinvent the wheel; at the very least, they had a very solid foundation to stand on.

The types of data required for ad tracking and AI training overlap heavily too, which might come as a surprise to some. In the minds of many people, the terms ‘AI’ and ‘LLM’ (large language model) are synonyms. Indeed, chatbots (which are basically user-facing shells with an LLM underneath) are probably the type of AI the average user interacts with most. Common sense dictates that training a generative AI used in a chatbot would require datasets that include tons of user-generated text, such as posts and comments on online platforms like Reddit or X, chat inputs, reviews, etc. This is correct, as these LLMs need to learn how people actually talk, how to answer questions, and how real-life conversations flow, including things like humor, slang, and tone. But what many people do not realize is how many non-generative types of AI there are, built for many different purposes: recommendation systems, search ranking, and ad targeting, just to name a few. For these AI systems, behavioral data is king, while content itself matters much less. And many modern platforms combine both approaches: they need raw content, but they also want to know what you click and when.

So, circling back to the initial question: yes, some AI companies take advantage of your data to train their systems, but they largely do so in the same ways that they (and other companies) collected your data for other purposes before AI came along. And here comes the tricky part: technically, most companies do not collect data behind your back, to train AI or otherwise, since doing so is illegal in many jurisdictions. Some even publicly announce their intention to use your data for AI training, although some sugarcoat it more than others. At the same time, it’s common practice to bury the details of data collection in lengthy privacy policies, tedious terms of service, and other long and boring legal documents. Those with a darker sense of humor might even find it funny that privacy policies covering data collection for AI training often use the same vague language and broad wording you would find in similar documents about gathering information for ad tracking.

But even if you do your due diligence and power through all the legalese to confirm that the app you want to install doesn’t use your data to feed the proverbial machine, the sad reality is that you are still not in the clear. Sometimes developers ‘forget’ to mention it, as in the recent case where OkCupid, a popular dating app, shared 3 million user photos with an AI company to train on, all without telling its users. This is nothing new; the same shady practices existed long before AI. Unfortunately, where there is profit to be gained, there will always be those willing to turn a blind eye to the law.

How does your data end up training AI?

Let’s now take a step back. We’ve touched briefly on which data is being used to train AI and mentioned that anything goes: both raw content, like texts and photos, and behavioral data, like clicks and other interactions. But many readers would probably like us to be more specific: “Which of my data, exactly, could end up being used for AI, and how?” Well, not all data is used in the same way. Some data is more sensitive than other data, and data from different sources may feed AI differently. If your goal is to train AI, there are countless potential sources of training data. For the purposes of this article, we will identify four categories, based on how the data is collected:

  • Social media (publicly available data)
  • Chatbot conversations (direct input)
  • Platform interactions (behavioral data)
  • Third-party apps and websites

First off, if you post or comment something publicly (on Reddit, YouTube, X, Facebook, etc.), that does not automatically mean anyone can use it for AI training, but you usually lack any real means to stop the platform from training AI on your content or sharing your data with third parties. Of course, everything varies greatly from platform to platform, but the rule of thumb remains: if it’s public, you probably don’t control it. Platforms that don’t use their users’ data themselves often sell or share it with others in some form or fashion. Users in the EU are generally better protected than others, thanks to the EU’s advanced privacy legislation. Regulations like the GDPR and the EU AI Act give people in the EU the right to be informed, to object to certain processing, to request access to or deletion of their data in some cases, and to challenge or restrict the use of their personal data for AI training.

But what if you talk to a chatbot directly: what are the chances that your input will be used for AI training? It depends on the service, of course, but more often than not, with consumer-facing AI tools, anything you type in or upload may be used to improve that service. Even if you are on a paid plan, unless it’s a corporate/enterprise (not an individual) plan, your data is still mostly treated as fair game. To be fair, many AI chatbots at least offer users an opt-out, even if it is often buried deep in the settings. We imagine that for many readers this is one of the key questions: “How do I opt out of data collection when talking to my chatbot?” It seems important to provide some practical advice here rather than settle for generalities. There are hundreds, if not thousands, of chatbots, so let’s focus on some of the most common ones (we assume personal use throughout, not enterprise or similar plans):

  1. ChatGPT. Open ChatGPT, go to your profile, then Settings → Data Controls, and turn off “Improve the model for everyone.” OpenAI says this stops your chats from being used to train ChatGPT going forward, though some retention may still apply. OpenAI also used to grant opt-out status in response to a request sent to its support team. If you did that at some point in the past, OpenAI says it still honors that request, but this option is no longer available to new users.

  2. Perplexity. Open Account settings → Preferences and switch off “AI data retention.” Note that this opt-out only affects future data; anything collected before the opt-out date may be used by Perplexity for AI training and cannot be deleted or removed.

  3. Gemini. In your Google account, go to Data & privacy, find “Gemini Apps Activity,” and select “Turn off” or “Turn off and delete activity.” This only prevents future sampling and does not affect past interactions. Keep in mind that Gemini is built into multiple Google products, and the exact training and privacy behavior depends on the product.

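  4. Claude. Anthropic originally did not train its models on consumer Claude conversations unless you opted in, but in 2025 it updated its consumer terms so that chats on Free, Pro, and Max plans may be used for training unless you opt out. To check, open Settings → Privacy and turn off “Help improve Claude.” If you delete a conversation, Anthropic says it removes it from its systems within approximately 30 days.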

As for behavioral data collection, a simple (but mostly accurate) rule of thumb is: the larger the platform, the more it relies on your behavioral data; smaller, narrowly focused apps and services rarely engage in tracking your behavior. Big content platforms like YouTube, TikTok, or Netflix, search engines, and e-commerce platforms like Amazon or eBay are the ones you can be sure about. They will collect as much data about your activities as they can to hone their recommendation and ranking algorithms. That doesn’t mean smaller apps never do this, but for them this kind of tracking is much less relevant.

But what about the ‘regular,’ smaller apps and websites that we use every day? Not everything is a chatbot or a huge platform: what if you just install a random app or game, or visit a smaller website? Again, it is impossible to give a single answer for all of them, as there are literally millions. But, in general, such smaller apps and websites are not interested in your data for training AI of their own, and they rarely sell user data directly to anyone who might be. However, it is extremely common for their developers to include analytics SDKs, ad networks, and other tracking tools for monetization purposes. These tools can, and very much do, collect behavioral data, device info, usage patterns, and so on. And once this data reaches ad networks, data brokers, and analytics firms, it gets aggregated and can easily be used for modeling, sold, or otherwise contribute indirectly to AI training (among many other things, of course).

When you look at all the ways your data can end up in some AI’s training dataset, you might think: “That’s a lot to worry about!” That is somewhat true, but keep in mind that not every bit of information you provide gets used, and not all companies behave the same way. And, last but not least, there are ways to minimize the amount of data collected about you. Which brings us to the second question of today’s TechTok:

Can using an ad blocker and/or a VPN stop AI tracking and data collection?

As you just saw, AI tracking takes so many different forms that it is impossible to give a “yes or no” answer to this question. Both an ad blocker and a VPN can help, each in its own way, but not against everything.

First of all, neither of them will help if you actively provide data: talk to a chatbot, post on social media, leave comments. Ad blockers and VPNs can’t magically prevent a platform from using something you have already given it, directly or indirectly. Against that type of data collection, your best bets are privacy settings, opt-out toggles, and laws aimed at protecting privacy. Check the privacy policies and available privacy settings of the platforms and apps you use, and if you don’t like what you see, consider picking a different option.

What ad blockers can help with is third-party trackers that collect data about you for future use and, to some extent, behavioral tracking. Stopping third-party analytics is, without question, the strongest suit of ad blockers when it comes to preventing your data from leaking. Ad blockers like AdGuard can deal with most, if not all, third-party trackers on websites. Inside apps, things can get trickier, as they generally do: Android and iOS place rather strict limits on interfering with other apps’ traffic.

Ad blockers can also help stop the collection of behavioral data, but not entirely. Unfortunately, most major platforms rely heavily on first-party tracking and don’t need third parties to build recommendations, train models, and analyze behavior. Blocking first-party tracking, especially on large platforms, often breaks useful functionality: imagine blocking first-party tracking on YouTube, only for videos to stop loading. And again, these problems are more pronounced in mobile apps than on websites.
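To make the difference between third-party and first-party tracking more concrete, here is a deliberately simplified Python sketch of a domain-based blocking rule. It is not how AdGuard’s filtering engine actually works: the tracker domains are hypothetical placeholders, and the “registrable domain” logic naively keeps the last two labels of a hostname, whereas real blockers consult the Public Suffix List and apply far richer filter rules.

```python
from urllib.parse import urlparse

# Hypothetical blocklist, for illustration only (not a real filter list).
TRACKER_DOMAINS = {"tracker.example", "analytics.example"}


def registrable_domain(host: str) -> str:
    """Naive approximation: keep the last two labels of the hostname.

    Real blockers use the Public Suffix List, so that e.g. 'bbc.co.uk'
    is handled correctly; this sketch would get that wrong.
    """
    parts = host.lower().rstrip(".").split(".")
    return ".".join(parts[-2:])


def is_third_party(request_url: str, page_url: str) -> bool:
    """A request is third-party if it goes to a different site than the page."""
    req = registrable_domain(urlparse(request_url).hostname or "")
    page = registrable_domain(urlparse(page_url).hostname or "")
    return req != page


def should_block(request_url: str, page_url: str) -> bool:
    """Block only third-party requests whose site is on the blocklist."""
    host = urlparse(request_url).hostname or ""
    return is_third_party(request_url, page_url) and registrable_domain(host) in TRACKER_DOMAINS


if __name__ == "__main__":
    # A tracking pixel from another site embedded in a news article.
    print(should_block("https://cdn.tracker.example/pixel.gif", "https://news.example/article"))
    # A statistics endpoint served by the same site as the videos.
    print(should_block("https://video.example/api/stats", "https://video.example/watch"))
```

The second check illustrates the limitation described above: a statistics endpoint served from the same site as the videos is first-party, so a simple rule like this lets it through, and blocking that whole domain would risk breaking the site’s own content.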

Still, an ad blocker is one of the best resources available to you if your goal is to starve the AI training algorithms. But what about VPNs?

VPNs are great (some may even say essential) for privacy protection. But when it comes specifically to stopping your data from being used for AI training, their usefulness is limited. They can still help, just not directly. VPNs hide your IP address and mask your location, making it harder for websites and third-party trackers to link your activity across different sites or build a profile based on your network identity. However, a VPN does not stop the platforms you use from seeing what you do on them. If you are logged into an account, or even just interacting with a website or app, your clicks, searches, and inputs are still recorded directly by that service. Nor will a VPN stop third-party trackers from gathering information about you; leave that job to ad blockers (although a VPN may make their tracking less precise).

Let’s recap: ad blockers and VPNs are great tools in your privacy protection arsenal, and they certainly will not hurt if you seek to protect your data from becoming AI training fodder — especially ad blockers. But in the end, your data’s safety is first and foremost dependent on your own attentiveness and diligence. If you study the privacy policies before using apps and services, if you are mindful about what you post online and what information you share with a chatbot — the chances of your personal details becoming a part of some future AI’s learning dataset can go down significantly. It’s good to have strong tools on your side, but nothing beats good old caution.
