Online Advertising 101

I don’t like ads. I think web tracking is creepy although I realize that nothing on the Internet is private. Ads slow down web browsing and steal screen real estate. Over the years I’ve used a number of tactics to reduce the number of ads I see. I thought I’d take a few minutes and try to organize them into something useful for others. I apologize in advance for the “It’s one big conspiracy plot” tone.

How ads work

Ads in newspapers are pretty straightforward. Unless they aren’t. Have you ever come across a full page ad that’s printed in a format (font, columns, etc) that mirrors the newspaper content? That’s a sneaky way to get people to read their ad because people don’t think they’re reading an ad.

Ads on web pages are even more sophisticated. Most follow a few basic principles except where they don’t – for example, logged-in sites and advertisers that don’t play by the rules are stickier. I’ll try to break it down by layer.

1) Web pages, with few exceptions, are simply html files. Internet browsing in its basic form is simply a string of file transfers from the web server to the client computer. The client sends a request to an address, and the web server responds by sending an html file back. That’s it.

For example, typing “yahoo.com” into your address bar sends a request to an address (yahoo.com) for content (everything after the “.com” which is nothing in this example, so this tells the server to send the default, or home, page). A request to “yahoo.com/mail” will send the “mail” html page. The html file is discarded by the client computer when requesting a different page or closing the browser. It’s pretty simple.

2) Ads appear on web pages when they are embedded in the html code on the web page. However, since most ads are constantly changing, the ad is not included in its entirety in the html code. Rather the ad is “served” from a link in the original web page that is “called” every time the web page is loaded. For example, Google’s ad server is “doubleclick.net” – so if Google is selling ads on, say, nytimes.com, when a person goes to (requests) the nytimes.com web page, an ad originating from doubleclick.net will be piggybacked onto it. From a network perspective the client computer has requested two files: one from nytimes.com (news) and one from doubleclick.net (ad content). Sly.

3) Ads served from independent sources present a number of opportunities for the ad shyster. Blacklisting is a good place to start. List the common ad domains (doubleclick.net, etc) in the hosts file of the client computer, which “kills” all requests to these domains. More details to follow. However, this isn’t quite as easy as it sounds, since ad domain addresses change frequently to get around blacklists.

4) Another way to disable ads is to block the request to fetch the ad from executing in the client browser. Most ad requests are written in javascript, so disabling javascript will kill 95% of ads. However, websites need javascript to function so you’ll lose functionality at the same time.

5) Ad blocker extensions are sometimes malware in disguise – so choose carefully. Always be very careful when installing any browser add-in or plug-in. It’s a minefield out there. I’m here to help.
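
Points 1 through 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample page, the ad script URL, and the helper names (ResourceCollector, third_party_hosts, is_blacklisted) are made up for the example, and only doubleclick.net comes from the text above. It lists the outside hosts a browser would contact while loading a page, then checks each one against a blacklist.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceCollector(HTMLParser):
    """Collect the URLs a browser would fetch for scripts, images, and iframes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "img", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def third_party_hosts(page_url, html):
    """Return hosts contacted for embedded resources, other than the page's own host."""
    page_host = urlparse(page_url).hostname
    collector = ResourceCollector()
    collector.feed(html)
    hosts = set()
    for url in collector.urls:
        host = urlparse(url).hostname  # None for relative URLs (same host as the page)
        if host and host != page_host:
            hosts.add(host)
    return sorted(hosts)


# Made-up page: the image is served by the news site itself, the ad script is not.
SAMPLE_HTML = """
<html><body>
  <img src="/images/front-page.jpg">
  <script src="https://ad.doubleclick.net/hypothetical/ad.js"></script>
</body></html>
"""

BLACKLIST = {"doubleclick.net"}  # the one ad domain named in the text


def is_blacklisted(host):
    """True if host is a blacklisted domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in BLACKLIST)


if __name__ == "__main__":
    for host in third_party_hosts("https://www.nytimes.com/", SAMPLE_HTML):
        print(host, "blocked" if is_blacklisted(host) else "allowed")
```

The relative image URL stays on the news site's own host, so only the ad script shows up as an outside request. A real browser fetches many more kinds of resources than these three tags, so treat this as a toy.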

Tracking

Cross-site tracking is where things get really creepy. Say you’re looking at a Eureka vacuum on Amazon, then jump over to reading a blog or a news site, and Gotcha! – there’s an ad for the exact same Eureka vacuum that you were looking at on Amazon. How did they know? Cookies.

Cookies are tiny text files – here are the contents of one of my cookies from our tracking friends at webtrends.com:

ACOOKIE

C8ctADY3LjIzNC4yMDMuMTM3LTE3MzI2MzkzMjguMzAyODk0ODIAAOOOKKKBBBBB6gEAAOqVWFHqlVhRAQAAABLLAADqlVhR6pVYUAAAQA-

m.webtrends.com/

2379230982

2765834594

83772094

309849019

30286683

*

The numbers look like gibberish but this is actually a series of identifiers. Information about my web browsing trends is encoded here, such as date, number of visits, and site history. My name will generally not be included, although it’s a small feat for today’s servers to reach a conclusion on a name given the banks of data that are aggregated around my web browsing history. Other cookies are much longer than this with much more information. Fortunately marketing companies aren’t interested in knowing names (let’s hope not, anyway) but all the data is there when they decide otherwise. As I visit sites, the text in a cookie will be appended with updated history and updated on Big Data servers wherever they might be. Elsewhere on the web, a tracker embedded on other sites gets its own cookie sent back to it (a browser only hands a cookie to the domain that set it), which tells it where I’ve been, who I am, and what I look at on the Internet. And finally, this is a generalization of cookie operations – different sites have different levels of tracking and tenacity regarding how aggressively they peruse my cookie jar. Cookies have many legitimate functional uses, which is why we can’t outlaw them or block them altogether.

Facebook uses a basic form of ad customization. Users “Like” things by clicking on “Like” buttons scattered around the Internet. Each of my “Likes” goes into a database attached to my name. No surprise here, nor should it be a surprise that Facebook is using this catalog of my tastes to feed me customized advertisements. I’m a boring person on Facebook because I don’t “Like” anything, but Facebook doesn’t stop there. It peruses my photo locations and photo captions, as evidenced by the string of ads below. My profile picture is a kayak, I have a lot of Grand Canyon photos, I have an iPhone 5 (the app), I ride a bike, and I post about coffee probably more than any other food (not sure why Gravedigger is in there though). Boring of them to tell this back to me in so many ads. A little creepy, but it gets weirder.

[Image: Facebook creep-factor]

Those little social media buttons on almost every page are the big guns of online advertising. These, combined with multi-tab browsing and persistent cookies, give advertisers world domination power. Here’s where the plot turns dark.

[Image: Widget bar]

One item at a time: Multi-tab browsing lets users stay logged into Facebook or Google while they’re off browsing other sites. Furthermore, there are two types of cookies to be aware of: session cookies die on browser-close, and persistent cookies live forever or until they meet the ad shyster. Persistent cookies are supposed to get your explicit permission before being set; i.e. you will check the box that says “Keep me logged on” which sets a persistent cookie. Not everyone follows the rules of the Internet, unfortunately, and less scrupulous sites will sometimes set persistent cookies for kicks and giggles. The benefit for advertisers is they can aggregate much more data across multiple browsing sessions, and they also can track across sites.
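
The difference between the two cookie types comes down to one attribute in the Set-Cookie header the server sends. Here is a rough sketch using Python’s standard http.cookies module. The cookie names, values, and the one-year lifetime are invented for the example, and the helper names build_set_cookie and is_persistent are my own.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age=None):
    """Return a Set-Cookie header value; max_age=None means a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # lifetime in seconds
    return cookie[name].OutputString()


def is_persistent(set_cookie_value):
    """A cookie outlives the browser session if it carries Max-Age or Expires."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_value)
    morsel = next(iter(cookie.values()))
    return bool(morsel["max-age"] or morsel["expires"])


if __name__ == "__main__":
    one_year = 365 * 24 * 60 * 60
    for header in (
        build_set_cookie("sid", "abc123"),                    # dies on browser-close
        build_set_cookie("uid", "abc123", max_age=one_year),  # sticks around
    ):
        print(header, "->", "persistent" if is_persistent(header) else "session")
```

A cookie with neither attribute is dropped when the browser closes. One with a long Max-Age is what lets a site recognize the same browser across sessions.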

Here’s how they track across sites. I’ll use Facebook as an example. Say you’re logged into Facebook on Tab 1, or you’ve logged in and closed the tab without closing the browser, so you have a persistent cookie from Facebook sitting in your cookie jar. You visit lifehacker.com for some tips on dealing with life, and there’s a Facebook “Like” button on the page. Even if you’ve told your browser to ignore third-party cookies, it won’t, because you’ve got a Facebook persistent cookie in your cookie jar, which makes Facebook a first-party site. So the browser pings the address in the cookie and sends a string of text up to Facebook telling them the date, time, computer stats, article ID, etc. It does this even if you ignore the “Like” button because you’ve already got that Facebook cookie containing a unique identifier. Every site you visit which has that “Like” button will add a line to your account on Facebook, creating a play-by-play breadcrumb trail of your path through the internet. Could we say Ad infinitum?

If you could empty that cookie jar before every page load, the “Like” button would still ping Facebook with that data. But with third-party cookies disabled, Facebook’s cookie would now count as third-party and the session would reset on every page load, so the unique identifier would be different each time, making it slightly harder to track that path from site to site. However, there is little that mere humans can do to stop the powerful algorithms of Big Data.
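
To make the breadcrumb trail concrete, here is a toy simulation in Python. It is not how Facebook or any real tracker is implemented. WidgetTracker, browse, and the page names are all hypothetical, and the only idea borrowed from above is that an embedded button reports the page it sits on along with whatever identifier is in the cookie.

```python
from collections import defaultdict
import uuid


class WidgetTracker:
    """Toy stand-in for the server behind an embedded social button."""

    def __init__(self):
        self.trails = defaultdict(list)  # visitor id -> pages where the widget loaded

    def handle_widget_request(self, cookie_id, referring_page):
        """Record one widget load; issue a fresh id if no cookie came with it."""
        visitor_id = cookie_id or uuid.uuid4().hex
        self.trails[visitor_id].append(referring_page)
        return visitor_id  # sent back to the browser as the cookie value


def browse(tracker, pages, keep_cookie):
    """Visit pages that embed the widget, with or without a surviving cookie."""
    cookie_id = None
    for page in pages:
        returned_id = tracker.handle_widget_request(cookie_id, page)
        cookie_id = returned_id if keep_cookie else None


if __name__ == "__main__":
    pages = ["lifehacker.com/tips", "nytimes.com/news", "example.com/blog"]

    sticky = WidgetTracker()
    browse(sticky, pages, keep_cookie=True)
    print(len(sticky.trails), "visitor(s) seen with the cookie kept")

    scrubbed = WidgetTracker()
    browse(scrubbed, pages, keep_cookie=False)
    print(len(scrubbed.trails), "visitor(s) seen with the cookie cleared each time")
```

With the cookie kept, all three pages land in one visitor’s trail. With the cookie cleared before every load, the tracker sees three apparent strangers, though a real tracker has other signals (such as the computer stats mentioned above) to stitch them back together.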
 

Stop!

There are many different ways to implement ad blocking and weaken tracking. Here are a few options, in order from least to most effective, with the more effective methods also being more involved:

1) Clear cookies on every browser close. Look through your browser settings, usually in Privacy or History, and set the browser to delete all cookies on every browser close.

2) Disable third-party cookies in your browser. Look for this setting in the Privacy area, under Custom settings for cookies. Accept only first-party cookies and keep only until you close your browser.

3) Use Private Browsing Mode (go “Incognito” in Chrome). This won’t remember any history and will block all third-party cookies. The downside is that you’ll have to log in everywhere every time because this mode really scrubs the cookie jar clean.

4) Install an ad blocker browser extension. Be careful here because some ad blockers are a front for malware that is much worse than any ad could be. A tried and proven blocker is AdBlock for Chrome. This will block about 90% of all ads and works very well.

5) Blacklist ad sites in your Hosts file. The Hosts file is located in C:\Windows\system32\drivers\etc. Copy it to the Desktop, open the copy using Notepad, add the blacklist sites from my list to the bottom of it, save and close, then move it back into the original location, overwriting the previous version. I’ve attached a Hosts file example here, which includes my list of blacklisted sites.

6) Install a javascript blocker such as ScriptSafe. Most ads provide content by executing javascript code. This is a whitelisting tool, so you’ll have to unblock all of your good sites (banking, email, etc) as you visit them in order for them to function. ScriptSafe saves site information so you’ll only have to unblock a site once, and sites will sync across computers in the Chrome browser. This speeds up browsing considerably because you’re only allowing the necessary content – everything else will be blocked. Note: This is not for non-techies because it will break your internet. This also blocks all widgets, gadgets, “Like” buttons, etc until unblocked, which really speeds up the internet since your page only has to load half the content. I use this on my primary browser and love it. You’ve been warned, though. A javascript blocker will break your internet.

[Image: Widget bar with ScriptSafe]
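
For option 5, the Hosts file edit can be sketched in Python. This is an illustration rather than my actual list: the function name add_blocked_hosts is made up, ads.example.com is a placeholder domain, and pointing blocked names at 0.0.0.0 is one common convention (127.0.0.1 is another). The function only builds the new file text, so the copy-to-Desktop-and-back routine from option 5 still applies.

```python
def add_blocked_hosts(hosts_text, domains, sink_ip="0.0.0.0"):
    """Return hosts-file text with a sink entry appended for each new domain."""
    already_listed = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        already_listed.update(fields[1:])       # everything after the IP is a hostname

    new_lines = []
    for domain in domains:
        if domain not in already_listed:
            new_lines.append(f"{sink_ip} {domain}")
            already_listed.add(domain)          # also guards against repeats in domains

    if not new_lines:
        return hosts_text
    separator = "" if (not hosts_text or hosts_text.endswith("\n")) else "\n"
    return hosts_text + separator + "\n".join(new_lines) + "\n"


if __name__ == "__main__":
    existing = "127.0.0.1 localhost\n"
    # doubleclick.net is from the article; ads.example.com is a placeholder.
    print(add_blocked_hosts(existing, ["doubleclick.net", "ads.example.com"]))
```

One thing to keep in mind: the Hosts file matches exact names only, so an entry for doubleclick.net does nothing for ad.doubleclick.net. Each subdomain needs its own line, which is part of why these lists get long.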

There you have it. Make this Easter the day you go ad-free!