Good (and bad) news: the general consensus in the web developer community is that any and every website should be HTTPS by default. Why? HTTP by itself isn't encrypted, leaving it open to eavesdropping, message tampering, and man-in-the-middle attacks. HTTPS, if you use it consistently, prevents these issues.
So how can that possibly be bad news? HTTPS is confusing one of the core metadata tools of the Internet: HTTP Referrers. HTTP Referrers disappear when going from HTTPS to HTTP, but, more worryingly, sensitive HTTPS Referrers still get carried when going from HTTPS to HTTPS. Most secure applications aren't aware of where their HTTP Referrers do or don't go. Don't worry though: there's hope. Or at least meta hope.
Imagine you're on Reddit and click on a link to homakov.blogspot.ru. The server at homakov.blogspot.ru knows that you came from reddit.com. It even knows what specific webpage you came from -- in this case, a Reddit search page on /r/netsec searching for homakov.
So what does this look like? When your browser wants a page, it sends a GET request with a number of HTTP headers. One of these headers is the HTTP Referrer and informs server B that you came from a link at server A.
GET http://homakov.blogspot.ru/2013/05/csrf-tool.html HTTP/1.1
Referer: http://www.reddit.com/r/netsec/search?q=homakov
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.63 Safari/537.31
Connection: keep-alive
Cookie: session="x1yc3OWM2ZTdhNNzk1YWY5NDk0MTczNTEKc=="; csrftoken="6QjAl18WY3NyZgpNsHpEKotZNfEtzSLocHRm";
Host: homakov.blogspot.ru
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.8
Accept-Encoding: gzip,deflate,sdch
The second line, Referer: http://www.reddit.com/r/netsec/search?q=homakov, is the HTTP Referrer.
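To make that concrete, here's a rough sketch of how a server (or a curious reader) might pull the Referer out of a raw request like the one above. The function names `parse_headers` and `referring_page` are made up for illustration, and the request is trimmed to a few headers; real servers use a proper HTTP parser rather than string splitting.

```python
from typing import Dict, Optional

# A trimmed copy of the request shown above (illustrative only).
RAW_REQUEST = (
    "GET http://homakov.blogspot.ru/2013/05/csrf-tool.html HTTP/1.1\r\n"
    "Referer: http://www.reddit.com/r/netsec/search?q=homakov\r\n"
    "Host: homakov.blogspot.ru\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)


def parse_headers(raw_request: str) -> Dict[str, str]:
    """Return the request headers keyed by lower-cased header name."""
    # Everything before the first blank line is the request line plus headers.
    head = raw_request.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the request line, not a header
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def referring_page(raw_request: str) -> Optional[str]:
    """Return the referring URL, or None if the client sent no Referer."""
    # Note the single-r spelling: that is how the header is actually named.
    return parse_headers(raw_request).get("referer")
```

Calling `referring_page(RAW_REQUEST)` hands back the Reddit search URL, which is exactly the value an analytics tool would record.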
It's these HTTP Referrer fields that are tracked by analytics tools such as Google Analytics and ChartBeat.
HTTPS turns off HTTP Referrers to HTTP websites. Why? According to RFC 2616 (HTTP 1.1), this is due to the possibility of sensitive information being encoded in a referring URL:
Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.
This seems quite reasonable at first glance. Unfortunately, this hard-set rule doesn't transition well into a future where almost every website uses HTTPS.
Why? The HTTPS Referrer, which may contain sensitive data, will be sent from any HTTPS website to any other HTTPS website by default. It's only when a connection goes from HTTPS to HTTP that the referrer is dropped.
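This leaves two problematic situations: a link from HTTPS to HTTP drops the referrer entirely, and a link from HTTPS to HTTPS sends the full referrer, sensitive or not.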
The first situation means we lose any understanding of where traffic is coming from; the second potentially leads to security vulnerabilities or information leaks. Essentially, if an HTTPS website links to another HTTPS website, the author of the secure page is lending the destination extra trust just because it's HTTPS. In most cases, this is not what was intended.
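The default rule is small enough to write down. Here's a minimal sketch of it, assuming a browser that follows the RFC 2616 recommendation to the letter; `default_sends_referrer` is a hypothetical helper name, not anything a browser exposes.

```python
from urllib.parse import urlsplit


def default_sends_referrer(from_url: str, to_url: str) -> bool:
    """True if a browser following the RFC 2616 recommendation sends a Referer."""
    source_secure = urlsplit(from_url).scheme == "https"
    target_secure = urlsplit(to_url).scheme == "https"
    # The only case that drops the header is secure -> non-secure.
    return not (source_secure and not target_secure)
```

Note that the HTTPS to HTTPS case returns True no matter who owns the destination: the rule looks only at the scheme, never at whether the other site deserves the information.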
So what are the possible situations with our HTTP Referrers? If we could somehow tell the HTTP Referrer to act in a particular way, what different behaviours would you like? What do we want to be possible? We'll label these different situations with names.
These cases are covered under a new HTML5 feature called the meta referrer.
Now a simple tag can be used, such as <meta name="referrer" content="always">, to specify the exact behaviour of the HTTP Referrer regardless of whether we're using HTTP or HTTPS.
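As a rough model of what a supporting browser does with that tag, here's a sketch. It assumes the four values from the early meta referrer draft (never, default, origin, always); only always and never are named in this post, so treat the other two as my assumption. `referrer_for` is a hypothetical helper, and it ignores details such as credentials embedded in the URL.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def referrer_for(policy: str, from_url: str, to_url: str) -> Optional[str]:
    """Return the Referer value to send, or None to send no header at all."""
    source = urlsplit(from_url)
    target = urlsplit(to_url)
    # A "downgrade" is a move from a secure page to a non-secure one.
    downgrade = source.scheme == "https" and target.scheme != "https"
    # The fragment (#...) is never part of a referrer, so drop it.
    full_url = urlunsplit((source.scheme, source.netloc, source.path, source.query, ""))
    origin = f"{source.scheme}://{source.netloc}/"

    if policy == "never":
        return None          # no referrer, ever
    if policy == "origin":
        return origin        # who sent the traffic, but not which page
    if policy == "always":
        return full_url      # full URL, even on HTTPS -> HTTP
    if policy == "default":
        return None if downgrade else full_url  # the pre-existing behaviour
    raise ValueError(f"unknown policy (assumed value set): {policy!r}")
```

The interesting middle ground is origin: the destination still learns that you sent it traffic, but nothing from the path or query string leaks out.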
With that said, not all web browsers support this newfangled HTML5 meta referrer. If a web browser doesn't support the tag, it'll just fall back to the standard behaviour for HTTP or HTTPS. Which browsers support it?
|Web Browser||Meta Referrer Support|
|Firefox||No (In Progress)|
Well, okay, that's fine, the world improves. Let's say in a year they'll get it all sorted. What popular websites are using meta referrer?
|No: HTTP and no meta referrer|
|Hacker News||No: HTTPS with no meta referrer|
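If you'd like to check a site yourself, something along these lines would do. It's only a sketch: it takes the page's HTML as a string (fetching it is left out), uses Python's standard `html.parser`, and the names `MetaReferrerFinder` and `find_meta_referrer` are invented for this example.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaReferrerFinder(HTMLParser):
    """Remembers the content of the first <meta name="referrer"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.policy: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "referrer" and self.policy is None:
            self.policy = (attributes.get("content") or "").strip().lower()


def find_meta_referrer(html: str) -> Optional[str]:
    """Return the declared meta referrer value, or None if the page has none."""
    finder = MetaReferrerFinder()
    finder.feed(html)
    return finder.policy
```

A result of None means the page is relying on the browser's default behaviour, which is the situation both rows above describe.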
So, here's my little request.
If you run a website over HTTPS, add in an appropriate meta referrer. If it's a secure application, nix the referrers by setting it to _Never_. If the Internet would benefit from knowing you sent them traffic, allow those referrers for everyone.
Considering Google Analytics is used for 57.23% of the top million websites on the Internet (572,300 in case your math failed you), they're actually in an excellent position to track and understand the flow of the Internet even without these referrers in place. Assuming that a user is only browsing amongst the top million websites, not only do they know how a user interacts on a website with Google Analytics but they're likely able to track when a user goes from site A to site B as site B likely has Google Analytics. Even if it didn't, they know the user didn't go to one of the other Google Analytics backed websites, so the user either ended up closing the tab or going to one of the links on the page that isn't backed by Google Analytics. This is all valuable information they could use as it's a form of limited PageRank -- in PageRank you assume the user is a random link clicking bot, whilst in this situation you've removed a great deal of the ambiguity.
Combine this with Google's stranglehold on how people reach web pages (Google Search) and their
The correct spelling is referrer (two arrs) but you'll see it everywhere as referer (minimal arrs). Why? A small accident of history. Every reference to referrer is spelt referer in RFC 1945 (HTTP 1.0). By the time it was picked up, it was a little too late.