by Andrew Oram
American Reporter Correspondent
January 8, 2010
THE COOKIE THAT ATE THE COFFEE SHOP
CAMBRIDGE, Mass. -- Editor's Note: After the Introduction, this is the second of seven parts of an exclusive, 9,000-word series on Identity & The Internet by American Reporter Webmaster Andy Oram.
What men daily do, not knowing what they do!
The previous section of this article explored the various identities that track you in real life. Now we can look at the traits that constitute your identity online. A little case study may show how fluid these are.
One day I drove from the Boston area a hundred miles west and logged into the wireless network provided by an Amherst coffee shop in Western Massachusetts. I visited the Yahoo! home page and noticed that I was being served news headlines from my hometown two hours away. This was a bit disconcerting because I had a Yahoo! account but I wasn't logged into it. Clearly, Yahoo! still knew quite a bit about me, thanks to a cookie it had placed on my browser from previous visits.
A cookie, in generic computer jargon, is a small piece of data that a program leaves on a system as a marker. The cookie has a special meaning that only the program understands, and can be retrieved later by the program to recall what was done earlier on the system. Browsers allow Web sites to leave cookies, and preserve security by serving each cookie only to the Web site that left it (we'll see in a later section how this limitation can be subverted by data gatherers).
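The rule that each cookie goes back only to the site that left it can be sketched in a few lines of Python. This is a toy model, not how any real browser stores cookies: the CookieJar class, the host names, and the "home_region" cookie are all invented for illustration, and real browsers apply further rules (paths, expiry dates, security flags) that are left out here.

```python
# Toy model of a browser's cookie store (illustrative only).

class CookieJar:
    def __init__(self):
        # Maps a site's host name to the cookies that site has set.
        self._cookies = {}

    def set_cookie(self, host, name, value):
        # Record a cookie under the host that set it.
        self._cookies.setdefault(host, {})[name] = value

    def cookies_for(self, host):
        # Return only the cookies that this same host set earlier.
        return dict(self._cookies.get(host, {}))

    def remove_site(self, host):
        # Equivalent to deleting a site's cookies in the browser's preferences.
        self._cookies.pop(host, None)


jar = CookieJar()
jar.set_cookie("www.yahoo.com", "home_region", "boston")  # hypothetical cookie
print(jar.cookies_for("www.yahoo.com"))  # the site that set it gets it back
print(jar.cookies_for("example.org"))    # another site sees nothing
jar.remove_site("www.yahoo.com")
print(jar.cookies_for("www.yahoo.com"))  # after removal, nothing is left
```

Once the cookie is removed, the site has nothing to go on except whatever the network itself reveals.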
As an experiment, I removed the Yahoo! cookie (it's easy to do if you hunt around in your browser's Options or Preferences menu) and revisited the Yahoo! home page. This time, news headlines for Western Massachusetts were displayed. Yahoo! had no idea who I was, but knew I was logging in from an Internet service provider (ISP) in or near Amherst.
What Yahoo! had on me was a minimal Internet identity: an IP address provided by the Internet Protocol. These addresses, which usually appear in human-readable form as a series of four numbers of one to three digits each, separated by periods, like 18.104.22.168, bear no intrinsic geographic association. But they are handed out in a hierarchical (top-down) fashion, which allows a pretty good match-up with location. At the top of the address allocation system stand five registries that cover areas the size of continents. These give out huge blocks of addresses to smaller regions, which further subdivide the blocks of addresses and give them out on a smaller and smaller scale, until local organizations get ranges of addresses for their own use.
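Python's standard ipaddress module makes the top-down structure easy to see. The sketch below takes the example address from the text and checks it against three nested blocks. The block boundaries are invented for illustration and do not reflect real registry allocations.

```python
import ipaddress

# The example address quoted in the text.
addr = ipaddress.ip_address("18.104.22.168")

# Behind the dotted form, the address is a single 32-bit number.
print(int(addr))

# Hypothetical allocation chain, from a huge block down to a small one.
# These prefixes are assumptions made for this sketch, not registry data.
blocks = [
    ipaddress.ip_network("18.0.0.0/8"),      # handed to a large region
    ipaddress.ip_network("18.104.0.0/16"),   # subdivided for a smaller one
    ipaddress.ip_network("18.104.22.0/24"),  # a local organization's range
]

for block in blocks:
    # Each smaller block sits inside the larger one and holds fewer addresses.
    print(block, "contains address:", addr in block,
          "size:", block.num_addresses)
```

Because each block is carved out of the one above it, knowing which organization holds the smallest block is usually enough to place an address geographically.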
Yahoo! simply had to look up the ISP associated with my particular IP address to determine I was in Western Massachusetts. But the technology is a bit more complicated than that. I was actually associated with three IP addresses - a complexity that shows how the fuzziness of identity on the Internet extends even to the lowest technological levels.
First, when I logged in to the coffee shop's wireless hub, or "hot spot," it gave me a randomly chosen IP address that was meaningful only on its own local network. In other words, this IP address could be used only by the local hub and anyone logged into it.
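Addresses that are meaningful only on a local network typically come from ranges reserved for private use, and Python's ipaddress module can tell them apart from globally routable ones. In this small sketch, 192.168.1.23 is an assumed stand-in for the kind of address a hot spot might hand out. The article does not record the actual one.

```python
import ipaddress

# 192.168.1.23 is a made-up hot-spot address (an assumption);
# 18.104.22.168 is the example address quoted earlier in the article.
for text in ("192.168.1.23", "18.104.22.168"):
    addr = ipaddress.ip_address(text)
    if addr.is_private:
        scope = "meaningful only on a local network"
    else:
        scope = "routable on the wider Internet"
    print(text, "->", scope)
```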
The hub used an aged but still vigorous technology known as Network Address Translation to send data from my system out to its ISP. As my traffic emanated from the coffee shop, it bore a new address associated with the coffee shop's wireless hot spot, not with me personally. All the people in the coffee shop can share a single address, because the hub associates other unique identifiers - port numbers - with our different streams of traffic.
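The bookkeeping a hub does for Network Address Translation can be sketched as a pair of lookup tables. This is a simplified model under stated assumptions, not the workings of any particular device: the Nat class, the addresses, and the starting port number are all hypothetical.

```python
import itertools

class Nat:
    """Toy port-translating NAT: many private hosts share one outside address."""

    def __init__(self, outside_ip, first_port=40000):
        self.outside_ip = outside_ip
        self._next_port = itertools.count(first_port)
        self._out = {}   # (private_ip, private_port) -> outside_port
        self._back = {}  # outside_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        # Give each private stream its own outside port, reusing it if known.
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._next_port)
            self._out[key] = port
            self._back[port] = key
        return self.outside_ip, self._out[key]

    def inbound(self, outside_port):
        # Map a reply arriving on an outside port back to the private stream.
        return self._back.get(outside_port)


hub = Nat("203.0.113.7")                      # documentation-range address
mine = hub.outbound("192.168.1.23", 51000)    # my laptop
yours = hub.outbound("192.168.1.42", 51000)   # another customer's laptop
print(mine, yours)             # same outside address, different ports
print(hub.inbound(mine[1]))    # replies find their way back to my laptop
```

Seen from outside, both laptops appear as the same address. Only the hub's table, keyed by port number, can tell the two streams apart.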
But the ISP treats the coffee shop hot spot as the coffee shop treats me. The coffee shop's own address is itself a temporary address that is meaningful to the local network run by the ISP. A second translation occurs to give my traffic an identity associated with the ISP. This third address, finally, is meaningful on a world scale. It is the only one of the three addresses seen by Yahoo!.
However, an investigator with a subpoena could ask an ISP for the identity of any of its customers, submitting the global IP address and port numbers along with the date and time of access. The coffee shop didn't require any personal information before logging me in to its hot spot and therefore could not fulfill an investigator's request (unless, for instance, the coffee shop's surveillance cameras showed me opening my laptop to log on at that date and time). But a person doing illegal file transfers or other socially disapproved activity from a home or office would be known to the hub system and could therefore be identified - so long as logfiles with this information had not been deleted from the hub.
The combination of IP address, port numbers, and date and time allows the Recording Industry Association of America to catch people who offer or download copyrighted music without authorization. And this technological mechanism underlies the European Union requirement that ISPs keep the information they log about customer use, as mentioned in Part 1 of this article.
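Why all three pieces are needed becomes clearer with a sketch of the lookup an ISP might run against its logs. Everything here is hypothetical: the record layout, the addresses, and the subscriber labels are invented, and real logging systems differ widely.

```python
from datetime import datetime

# Hypothetical log records:
# (outside_ip, outside_port, start, end, subscriber)
LOG = [
    ("203.0.113.7", 40000,
     datetime(2010, 1, 8, 9, 0), datetime(2010, 1, 8, 9, 30), "subscriber-A"),
    ("203.0.113.7", 40000,
     datetime(2010, 1, 8, 10, 0), datetime(2010, 1, 8, 10, 30), "subscriber-B"),
]

def who_had(ip, port, when, log=LOG):
    """Return the subscriber using ip:port at the given moment, if logged."""
    for rec_ip, rec_port, start, end, subscriber in log:
        if rec_ip == ip and rec_port == port and start <= when < end:
            return subscriber
    return None  # no matching record, e.g. the logfile was deleted


print(who_had("203.0.113.7", 40000, datetime(2010, 1, 8, 9, 15)))
print(who_had("203.0.113.7", 40000, datetime(2010, 1, 8, 10, 15)))
print(who_had("203.0.113.7", 40000, datetime(2010, 1, 8, 11, 0)))
```

The same address and port can belong to different customers an hour apart, which is why the date and time matter as much as the numbers themselves.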
If I want to hide this minimal Internet identity - the IP address - I have to use another Internet account as a proxy. In the case of my visit to Western Massachusetts, I was protected by logging in anonymously to a coffee shop, but in some countries I'd be required to use a credit card to gain access, and therefore to bind all my Web surfing to a strong real-world identity. Many European countries require this form of identification, outlawing open wireless networks.
To generalize from my Amherst experiment, the information we provide as we use the Internet is very limited, and can be limited even further through simple measures such as removing cookies (a topic covered further in a later section of this article). But what the Internet still allows can be used in a supple manner to respond instantly with ads and other material - such as the nearest coffee shop or geographically relevant weather reports - that are hopefully of greater value than the corresponding material in print publications we peruse.
This section has explored the use of IP addresses metaphorically, as well as illustratively, to show how our Internet identity is context-sensitive and can change utterly from one setting to another. Usually, we provide more of a handle to the people we communicate with over email, instant messaging, forums, and so forth. Here too we have multiple identities and spend hours collecting each other's handles.
Email, the oldest form of personal online communication, ironically has one of the better hacks for combining identities. Your email accounts can be set up to forward mail, so that mail to the address you kept from your alma mater goes automatically to your work address.
In contrast, you can't use your AIM instant message account to contact someone on MSN, so you need a separate account on each IM service and no one will know they all represent you unless you tell them. Twitter is experimenting with ways to assure users that accounts with well-known names are truly associated with the people for whom they're named.
If IM services all agreed to use XMPP (or some other protocol) you would be able to reduce all your IM accounts to one. And if every social network supported OpenSocial, you could do a lot of networking while maintaining an account on just one service.
A widely adopted protocol called OpenID allows one identity to support another: if you have an account on Yahoo! or Blogger you can use it to back up your assertion of identity on another site that accepts their OpenID tokens. OpenID and related technologies such as Information Card don't validate your existence or authenticate the personal traits you have outside the Internet, but allow the identity you've built up on one site to be transferable to others.
The next section shows how the minimal elements of our online identity have been expanded by advertisers and other companies, who combine the various retrievable glimpses of our identity. Following that, we'll see how we ourselves manipulate our identities and forge new ones.
Next: Tracking You Through The Wild Internet