Over the past few years, Google have moved closer and closer towards implementing SSL on all their products. In January 2010, Gmail became HTTPS by default. In October 2011, Google Search became HTTPS by default. In September 2013, Google AdSense added support for serving through HTTPS.
In December 2013, it was discovered that the NSA uses Google cookies to pinpoint targets for hacking. This is only possible because Google Analytics does not use HTTPS on pages served over HTTP. More worryingly, the heavy usage of Google Analytics means that most of your web browsing habits can be easily intercepted.
Google, make early 2014 the moment when Google Analytics becomes HTTPS by default.
With a flick of a switch, Google can add security against unwanted surveillance for billions of users and webpages across the Internet.
Before I begin, let me be clear about a few things. This is not a statement against Google, Google Analytics, referrer tracking, or analytics in general. Additionally, while I use results from work I did while a student at Harvard University, these opinions are my own.
Half a year ago, I wrote an article about HTTP referrers. As a minor tangent to the article, I posed the question "how much of the traffic around the Internet is Google in a place to observe?", primarily because Google Analytics (GA) is one of the biggest users of HTTP referrers. Recently I had the opportunity to investigate this question further through research conducted for a final project at Harvard University.
Just how pervasive is Google Analytics? 65.7% of the top 10k sites, 64.2% of the top 100k, and 50.8% of the top million use Google Analytics. In practical terms, this means that you're either on a domain that is using Google Analytics or your next click will likely land you on one that does.
|N domains||Source||Percentage using Google Analytics|
|Random 48.5 million||C. Hornbaker, S. Merity (2013)||29.96%|
While the percentage of websites using Google Analytics has been calculated for a specific set of domains, this is not a true measure of how widely the impact is felt. Some domains have far more webpages than others, and certain domains are also more heavily trafficked. Most importantly, if you land on a web page with Google Analytics, a multitude of details are recorded, including the HTTP referrer, the web page that led you to the current one. Thus, if you happen to land on a web page using Google Analytics on every second link click, Google has enough information to reconstruct your entire path.
This line of discussion motivates a simple question.
While only 29.96% of the 48.5 million domains surveyed used Google Analytics, 39.69% of the 535 million pages across those domains had Google Analytics, and of the 42 billion links on those pages, 48.96% of them end in a page using Google Analytics, thus reporting a user's browsing history for both that page and the page before it.
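To make the difference between these three measurements concrete, here is a minimal sketch on a made-up crawl. The `PAGES` data, the `.example` URLs and the `coverage` function are all hypothetical and are not the code used in our project; they only show how domain-level, page-level and link-level coverage are counted differently.

```python
from urllib.parse import urlsplit

# Hypothetical toy crawl: page URL -> (page has Google Analytics?, outgoing links)
PAGES = {
    "http://a.example/": (True, ["http://a.example/1", "http://a.example/2"]),
    "http://a.example/1": (True, ["http://c.example/"]),
    "http://a.example/2": (True, ["http://a.example/"]),
    "http://b.example/": (False, ["http://a.example/", "http://a.example/1"]),
    "http://c.example/": (False, ["http://a.example/2", "http://b.example/"]),
}


def domain(url):
    """Return the host part of a URL."""
    return urlsplit(url).hostname


def coverage(pages):
    """Fraction of domains, pages and links that report to Google Analytics."""
    domains = {domain(url) for url in pages}
    ga_domains = {domain(url) for url, (has_ga, _) in pages.items() if has_ga}
    ga_pages = [url for url, (has_ga, _) in pages.items() if has_ga]
    links = [(src, dst) for src, (_, outs) in pages.items() for dst in outs]
    # A link is observed when it ends on a page with GA: the referrer
    # reported by that page reveals the page the user came from.
    observed = [(src, dst) for src, dst in links if dst in pages and pages[dst][0]]
    return {
        "domains": len(ga_domains) / len(domains),
        "pages": len(ga_pages) / len(pages),
        "links": len(observed) / len(links),
    }


result = coverage(PAGES)
```

In this toy crawl only one of three domains uses Google Analytics, yet that domain holds most of the pages and attracts most of the links, so the link-level figure ends up well above the domain-level one. The real numbers above follow the same pattern.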
Our work was an approximation as it only used domains instead of the pages themselves, simplifying the computational requirements substantially. The full details are on the project website. The important take-away is that thanks to referrers, Google Analytics gets information from pages that don't have Google Analytics on them, and the "worst case" is that the entire browsing history of a user could be reconstructed, even if not all the pages use Google Analytics.
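The "every second click" case can be sketched in a few lines. This is an illustration under simplifying assumptions (every click sends a referrer, and the hits are seen in order); the function names and the `.example` URLs are hypothetical and nothing here reflects how Google Analytics stores data internally.

```python
def observed_hits(path, ga_pages):
    """Return the (referrer, page) pairs reported along a click path.

    Only pages in ga_pages report anything; each report includes the
    page that was visited immediately before it.
    """
    hits = []
    for i, page in enumerate(path):
        if page in ga_pages:
            referrer = path[i - 1] if i > 0 else None
            hits.append((referrer, page))
    return hits


def reconstruct(hits):
    """Rebuild the visible part of the path from the ordered hits."""
    seen = []
    for referrer, page in hits:
        for url in (referrer, page):
            # Skip a URL that simply repeats the last one we recorded.
            if url is not None and (not seen or seen[-1] != url):
                seen.append(url)
    return seen


# Hypothetical click path where only every second page uses GA.
path = [
    "http://news.example/story",
    "http://blog.example/post",
    "http://shop.example/item",
    "http://forum.example/thread",
]
ga_pages = {"http://blog.example/post", "http://forum.example/thread"}

hits = observed_hits(path, ga_pages)
recovered = reconstruct(hits)
```

Here only two of the four pages carry Google Analytics, but the two hits they send are enough to recover all four pages in order.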
Imagine if the NSA wanted to know the browsing history for the majority of users on the Internet. To do this, the NSA would need to perform a man-in-the-middle attack on each of the connections it's interested in. This means either observing all connections as they leave a user or observing all connections as they reach a server of interest. This doesn't scale well considering the number of users and servers on the Internet.
With Google Analytics, the NSA can sidestep this, as it only has to eavesdrop on all connections to and from the Google Analytics servers. Not that that would ever happen.
We are already aware that the NSA uses Google cookies to pinpoint targets for hacking. Even though Google Analytics doesn't have full coverage of the Internet, it has substantial coverage once you add in referrers. Given the exact same information the NSA has already collected, it can reconstruct partial browsing histories for millions of users.
Hence, the NSA has an economical way to tap a large portion of web traffic, especially considering they already have the required data on hand.
While this might be exciting for the NSA, it could be even more exciting for other smaller spy agencies. The tracking information sent to the Google Analytics servers is routed to the nearest server for latency reasons. This means that if I were in control of a small country and was interested in tracking my own populace, I would only need to eavesdrop on all connections to the nearest Google Analytics server. This is quite a simple task, especially if you have influence over the local infrastructure providers.
An important note is that, if the website uses HTTPS, Google Analytics will use HTTPS for communicating the tracking information too. This prevents trivial eavesdropping as the tracking information is no longer sent in the clear.
For any website that uses HTTP, however, eavesdropping is entirely trivial, as the tracking information is sent in cleartext.
Google have previously reacted to potential invasions of their own and their customers' privacy. The most recent example was when they encrypted inter-datacenter communications after it was discovered that the NSA had tapped their private fiber networks.
In this situation, they could again act to protect their customers and lessen the impact that Google Analytics has on an end-user's privacy.
Currently, if you use Google Analytics on a website served over HTTP, there is nothing you can do to make it resilient to eavesdropping. As previously stated, if your website is HTTPS, then the information is sent encrypted and can only be read by the Google servers.
You might ask, why not just load the HTTPS version of the Google Analytics script? Good question, but Google Analytics decides how to send the tracking information by looking at whether the current page is HTTP or HTTPS. This means that, even if you load the Google Analytics script over HTTPS instead of HTTP, the tracking information will be sent to the Google servers in the clear if the page you load is HTTP.
Therefore, if your website is served over HTTP, there is nothing you can do to prevent eavesdropping when using Google Analytics.
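The behaviour described above can be modelled in a few lines. This is a simplified model of what I have described, not Google's actual code; `beacon_scheme` and `is_eavesdroppable` are hypothetical names and the URLs are placeholders.

```python
from urllib.parse import urlsplit


def beacon_scheme(page_url):
    """Scheme used to send tracking data: it mirrors the scheme of the page."""
    return "https" if urlsplit(page_url).scheme == "https" else "http"


def is_eavesdroppable(page_url, script_url):
    """Return True if the tracking data travels in cleartext.

    script_url is deliberately ignored: in this model, how the script
    itself was loaded has no effect on how the data is sent back.
    """
    return beacon_scheme(page_url) == "http"


# Loading the script over HTTPS does not help an HTTP page.
http_page = is_eavesdroppable("http://shop.example/", "https://cdn.example/ga.js")
# An HTTPS page is protected regardless of how the script was loaded.
https_page = is_eavesdroppable("https://shop.example/", "http://cdn.example/ga.js")
```

The only input that matters in this model is the scheme of the page, which is exactly why a website owner on HTTP has no lever to pull.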
Google could counter this potential privacy invasion for those exposed to Google Analytics on HTTP websites by forcing the reply to be sent over HTTPS. There are two options for this: opt in or opt out.
An opt in is the least obstructive method and would enable concerned website owners to tell Google Analytics to send tracking information back over HTTPS even if the website is served over HTTP. This would likely do little to secure the billions of websites and end users across the Internet, however, as the majority of website owners would be unlikely to adopt it, simply due to laziness or lack of understanding.
Opt out, or HTTPS by default, is the simplest and most effective way to prevent potential privacy invasions. The main cost is serving far more SSL/TLS connections, but Google are well aware of how to provide SSL/TLS at scale and have been working on the topic for some time. The largest potential impact might be worse battery life for clients on mobile devices. In most situations, the overhead for the end user would be negligible.
Google, protect your users: enable HTTPS on Google Analytics by default.
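The difference between the two options comes down to what happens when a website owner does nothing. A small sketch, using a hypothetical `Policy` enum and a hypothetical `owner_changed_default` flag (neither is a real Google Analytics setting), makes this explicit:

```python
from enum import Enum


class Policy(Enum):
    MIRROR_PAGE = "mirror page"  # the current behaviour described earlier
    OPT_IN = "opt in"            # owner must ask for HTTPS
    OPT_OUT = "opt out"          # HTTPS unless the owner asks otherwise


def scheme_under_policy(page_scheme, policy, owner_changed_default=False):
    """Scheme used to send tracking data under each hypothetical policy."""
    if page_scheme == "https":
        # HTTPS pages already send tracking data over HTTPS.
        return "https"
    if policy is Policy.OPT_IN:
        return "https" if owner_changed_default else "http"
    if policy is Policy.OPT_OUT:
        return "http" if owner_changed_default else "https"
    return "http"


# An HTTP website whose owner never touches the setting.
outcomes = {policy: scheme_under_policy("http", policy) for policy in Policy}
```

For the owner who never touches the setting, only the opt out policy ends with tracking data sent over HTTPS, which is the whole argument for making it the default.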
I'm Stephen Merity, better known in most places as Smerity.
BIT (University Medal + First Class Honours)
MS in CSE (starting August 2013)