
The Life, Death, and Life of Cookies, Part II: Data and Targeting

Welcome back to our series on cookies: how they work, what they're used for, and how they're changing. We have a lot of ground to cover - last time around we talked about what cookies are, with an eye toward the basics. Today we're going to talk about how they're used for data trafficking in digital advertising, and the differences between good data and bad data.


What is Data?

Well, that's an incredibly loaded question. Broadly speaking, data is just information, but in digital advertising the word "data" tends to refer specifically to information about a person which can be used for targeting decisions: their location, age, gender, household income, interests, the sites they've visited, the device they're on, or the actions they've taken online. Data covers a broad range of potential categories, but to have value it has to be usable for making targeting decisions. Until recently, almost all of this data was stored in cookies, placed there either by the site someone was visiting or by a third-party company interested in collecting and selling that data.


How is Data Collected?

You can collect data on someone in a number of different ways, but the easiest is to place a piece of code called a "pixel" on a web page. When the page loads, the pixel creates a cookie on the user's computer containing information, usually about the site being visited. This makes "interest" data the easiest information to acquire on the internet - someone visits a page with NFL scores, your code fires and puts "interested in football" in their cookie file. While you can always see what site someone is currently on before you serve them an ad, knowing the other sites they've been on in the past is much more difficult without cookies. Likewise, you can embed this kind of tracking code in a button (such as a "share this" button) or in an ad itself, using it to track whether someone has seen the ad or clicked on it.
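To make the pixel idea concrete, here's a minimal Python sketch of what the server behind a pixel might do on each page load: read the cookie it set earlier, add an interest label for the current page, and send the updated cookie back. The page-to-interest mapping, the cookie name `interests`, and the one-year lifetime are all assumptions made up for illustration, not how any particular vendor does it.

```python
from http.cookies import SimpleCookie

# Hypothetical mapping from page paths to interest labels.
PAGE_INTERESTS = {
    "/nfl/scores": "football",
    "/mlb/scores": "baseball",
}

ONE_YEAR_SECONDS = 60 * 60 * 24 * 365  # assumed cookie lifetime


def pixel_set_cookie(page_path, cookie_header=""):
    """Build the Set-Cookie value a pixel might return for one page view."""
    # Read whatever interests this pixel's owner stored on earlier visits.
    incoming = SimpleCookie()
    incoming.load(cookie_header)
    interests = set()
    if "interests" in incoming:
        interests = {i for i in incoming["interests"].value.split("|") if i}

    # Add the label for the page that just loaded, if we have one.
    label = PAGE_INTERESTS.get(page_path)
    if label:
        interests.add(label)

    # Write the merged list back so it persists across visits.
    outgoing = SimpleCookie()
    outgoing["interests"] = "|".join(sorted(interests))
    outgoing["interests"]["max-age"] = ONE_YEAR_SECONDS
    outgoing["interests"]["path"] = "/"
    return outgoing["interests"].OutputString()


# A visitor already tagged with "baseball" loads the NFL scores page.
print(pixel_set_cookie("/nfl/scores", "interests=baseball"))
```

Note that the pixel only ever reads and rewrites its own cookie, which is why the list of interests grows over time rather than being replaced.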


Other data is collected when you log in to a site or service, or when you provide information to it directly - sites which ask for your age and gender can add that data to your cookies every time you log in. This is the primary way that much of the available demographic data is collected - for years dating sites like OkCupid were responsible for a ton of the internet's age and gender data, selling that data to third-party data companies.


Other data may not require pixels or cookies at all. Your approximate geographic location can be inferred from your IP address. The device you're on and the operating system and browser you're using are reported in the User-Agent string your browser sends with every request, and the size of your screen can be read by a script on the page. All of these are key pieces of data necessary to deliver ads correctly.
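As a rough illustration of how much a User-Agent string gives away, here's a deliberately crude Python sketch that pulls device type, operating system, and browser out of one. The function name and the substring rules are assumptions for the example; real User-Agent strings are messy and production systems rely on maintained parsing libraries rather than a handful of checks like these.

```python
def describe_user_agent(ua):
    """Very rough guess at OS, browser, and device type from a User-Agent."""
    # Order matters: iPhone strings also mention "Mac OS X",
    # and Android strings also mention "Linux".
    if "iPhone" in ua or "iPad" in ua:
        os_name = "iOS"
    elif "Android" in ua:
        os_name = "Android"
    elif "Windows" in ua:
        os_name = "Windows"
    elif "Mac OS X" in ua:
        os_name = "macOS"
    elif "Linux" in ua:
        os_name = "Linux"
    else:
        os_name = "unknown"

    # Order matters here too: Edge mentions "Chrome", Chrome mentions "Safari".
    if "Edg/" in ua:
        browser = "Edge"
    elif "Chrome/" in ua:
        browser = "Chrome"
    elif "Firefox/" in ua:
        browser = "Firefox"
    elif "Safari/" in ua:
        browser = "Safari"
    else:
        browser = "unknown"

    device = "mobile" if "Mobi" in ua else "desktop"
    return {"os": os_name, "browser": browser, "device": device}


sample = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(describe_user_agent(sample))
```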


Of course, not all data comes from online sources like websites - there's an entire section of the industry devoted to bringing "offline" data collected from other sources into the online ecosystem, and LiveRamp is the most notable player in this space. These companies map offline identities to email addresses and use the email address as a means of tagging someone when they come online, which makes it possible to target that person based on any offline factors you can connect to their email address. Within this space, there are also companies which specialize in certain types of offline data - companies like IQVIA specialize in targeting healthcare providers, while companies like L2 offer voter file data.


Deterministic vs. Probabilistic Data


It's worth talking about the two categories of data at play here. So far we've been talking about deterministic data, that is, data which is explicitly provided by a user or directly observed about them. This data is generally correct - if you're on a page looking at baseball scores, you're interested in baseball, at least marginally. On the other hand, there's a ton of data which is probabilistic, or predicted based on other factors. You could, for example, use the fact that around 61% of MLB fans are male to predict that the person reading the scores is male, and assign them that data point. What you lose in accuracy, you gain in reach - there's a ton more interest data out there than gender data, so if you're really focused on reaching men, that's sometimes worth the risk.
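Here's a small Python sketch of that kind of probabilistic tagging. It uses the 61% figure from the MLB example; the lookup table, the function name, and the confidence threshold are assumptions made up for illustration. The function returns both the guess and how confident it is, which is the accuracy you're trading away for reach.

```python
# Share of each interest's audience that is male. The baseball figure is the
# ~61% MLB number from the text; the table itself is a hypothetical structure.
MALE_SHARE_BY_INTEREST = {"baseball": 0.61}


def infer_gender(interests, min_confidence=0.6):
    """Guess gender from interests; return (label, confidence) or None."""
    best = None
    for interest in interests:
        p_male = MALE_SHARE_BY_INTEREST.get(interest)
        if p_male is None:
            continue  # no base rate known for this interest
        if p_male >= 0.5:
            label, confidence = "male", p_male
        else:
            label, confidence = "female", 1 - p_male
        # Keep the strongest guess that clears the threshold.
        if confidence >= min_confidence and (best is None or confidence > best[1]):
            best = (label, confidence)
    return best


print(infer_gender({"baseball"}))        # tagged, but only 61% likely correct
print(infer_gender({"baseball"}, 0.75))  # stricter threshold: no tag at all
```

Raising `min_confidence` makes the tags more accurate but leaves more people untagged, which is the same reach-versus-accuracy trade described above.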


Data Clashes


Data is cumulative; it can be removed from cookies, but that seldom happens, and you can't edit someone else's cookies - only the ones you've placed. So over time computers and devices tend to accumulate more and more cookies, and more and more data within them. This gets compounded by probabilistic data and by situations where multiple people use the same machine. It's very likely that, given enough time, a single user will be tagged as both male and female, and in multiple age groups.


Contrary to what you might think, this doesn't actually stop you from targeting: if you're targeting cookies with the female signifier and an age in the 25-54 range, the fact that they also carry male and 55+ tags doesn't matter - you're just looking for those first two values. What it does do is make accurate targeting and good use of data tricky, but the details behind that are a subject for a future article.
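A quick Python sketch shows why conflicting tags don't block a match. The segment labels and the `matches` helper are hypothetical; the point is only that a targeting check asks "is the tag I want present?" and never "are there tags I don't want?".

```python
def matches(segments, target):
    """True if the cookie has at least one accepted value in every category."""
    return all(bool(segments & accepted) for accepted in target.values())


# A shared family computer that has picked up contradictory tags over time.
cookie_segments = {"gender:male", "gender:female", "age:25-54", "age:55+"}

# Target: female, 25-54.
target = {
    "gender": {"gender:female"},
    "age": {"age:25-54"},
}

print(matches(cookie_segments, target))                   # conflicting tags, still a match
print(matches({"gender:male", "age:55+"}, target))        # none of the wanted tags
```

The first check passes even though the same cookie would also pass a check for men 55+, which is exactly why clashing data makes accurate targeting harder.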


Targeting with Data


So how is all this data used? When it comes to advertising, the primary use is making targeting decisions: specifically, deciding who to show ads to for a given campaign and, in programmatic media, how much to bid on those ad opportunities. A campaign might target women ages 25-44 with interests in healthy diet and exercise. These are readily available data points which can be collected or purchased from data providers (we'll cover that process in another future article). You can then either set up a buying strategy or program a bidder to show ads to devices with cookies matching your criteria, i.e. devices that have those data points sitting in cookie files you can read. This cookie check can be done every time you want to serve an ad, which helps ensure that you only serve ads to people in your target audience.
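As a sketch of what "programming a bidder" could look like, here's a small Python example built around the campaign described above. The `Campaign` structure, the segment labels, and the flat CPM bid are all assumptions for illustration; real bidders weigh many more signals when setting a price.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    demographics: dict  # category -> set of acceptable segment labels
    interests: set      # interest labels, all of which must be present
    bid_cpm: float      # flat bid per thousand impressions (assumed)


def decide_bid(segments, campaign):
    """Return the CPM bid for this ad opportunity, or None to skip it."""
    demo_ok = all(
        bool(segments & accepted) for accepted in campaign.demographics.values()
    )
    interests_ok = campaign.interests <= segments  # every interest is present
    return campaign.bid_cpm if demo_ok and interests_ok else None


campaign = Campaign(
    name="healthy-living",
    demographics={
        "gender": {"gender:female"},
        "age": {"age:25-34", "age:35-44"},
    },
    interests={"interest:healthy-diet", "interest:exercise"},
    bid_cpm=4.50,
)

in_target = {"gender:female", "age:35-44", "interest:healthy-diet", "interest:exercise"}
off_target = {"gender:female", "age:35-44", "interest:football"}

print(decide_bid(in_target, campaign))   # bids
print(decide_bid(off_target, campaign))  # skips the opportunity
```

Running this check on every opportunity is what keeps the budget pointed at the target audience instead of being spread across everyone.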


The Value of Cookies and Data


At its core, this is a strategy built around ad efficiency - you have a limited budget to spend, and you don't want to waste it showing ads to the wrong people. Data has the most value when your budget is limited and your audience is niche - in those instances where people outside your target just aren't going to buy the product. On that note, the value of data can vary dramatically. Deterministic data is generally more valuable than probabilistic data, and the smaller and more valuable an audience is, the more valuable (and expensive) data around them will be - LinkedIn, for example, charges a large premium for being able to effectively target C-Suite executives on its network.


All of this creates an interesting tug-of-war between reach and efficiency when it comes to using data to advertise: you want your targeting to be as accurate as possible, but at the same time you want your reach within your target to be as large as possible. And no matter the audience you're trying to reach, no online data source will give you 100% reach - there are always going to be people who are "dark" with regard to certain data points, e.g. a car buyer who hasn't visited a dealer site. This makes audience building a mixture of art and science - building an accurate audience with good reach using existing data segments can mean getting creative with your choices and doing some homework. For example, you can identify boat owners by buying data segments around people visiting dealership websites, but you can also identify them by finding people buying trailers and looking for ways to wire them up.
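The tug-of-war is easy to see with a little arithmetic. The Python sketch below uses entirely made-up user IDs for the boat owner example: one segment from dealership site visitors, one from trailer buyers, and a "true" list of boat owners that in practice you would never actually have. Combining segments raises reach but lets in more people outside the target.

```python
def reach_and_accuracy(audience, true_targets):
    """Return (share of true targets reached, share of audience on target).

    Assumes both sets are non-empty.
    """
    hits = audience & true_targets
    reach = len(hits) / len(true_targets)
    accuracy = len(hits) / len(audience)
    return reach, accuracy


# Hypothetical IDs, purely for illustration.
boat_owners = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
dealer_site_visitors = {1, 2, 3, 11}
trailer_buyers = {3, 4, 5, 6, 12, 13}

# Dealer segment alone: accurate, but reaches few owners.
print(reach_and_accuracy(dealer_site_visitors, boat_owners))

# Both segments combined: reach goes up, accuracy comes down.
combined = dealer_site_visitors | trailer_buyers
print(reach_and_accuracy(combined, boat_owners))
```

In this toy data the dealer segment reaches 3 of the 10 owners with 3 of its 4 members on target, while the combined audience reaches 6 of 10 with 6 of its 9 members on target.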


Next Time: Data and Privacy


We've been talking mostly about data in broad terms up to this point. It's worth noting that there are ways to target without cookies, but cookies generally represent the easiest, most reliable way to gather, store, and access data about someone from their device. In a future article we'll talk about what removing cookies from the equation actually entails, and what the alternatives are - because even in a world without cookies, there's still plenty of data available, and even in a world where they stick around, there are environments where you can't use them.


There was a lot to cover here, and if it seems like there's too much data out there, well, you're not the first person to think so. Next time we'll talk about the privacy concerns this created, how that led to new legislation around data collection and processing, and Google's decision to deprecate cookies.


