What Type of Information Is Leaked by Your Browser?

by Karthikeyan KC — updated on January 30, 2018 · in Computing

Photo of the 'tracking protection' text from Firefox's private browsing window.

In this data-driven era of technology, internet users often need a reminder to stay safe online. In this Geekswipe edition, we explore the kinds of personal data that modern web browsers can reveal with just a single visit to a website, and find out how individual anonymized pieces of data fit together to form a unique fingerprint that can be tracked.

Some general data

When you type a web address into the address bar and hit enter, a TCP connection is initiated. Right after a successful TCP handshake (and, for an HTTPS address, a TLS handshake), your browser sends an HTTP request to the server on the other end of the connection. The HTTP request header contains a few fields that help the server respond with the right content for you. For example, if you make a request to https://geekswipe.net/ from the address bar, the request header sent by your browser would look something like this.

Host: geekswipe.net
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en;q=0.5

The server responds to your browser’s request with a response payload containing the HTML content you requested. With this one simple HTTP request, however, information about your web browser and operating system has already been revealed. This is by design: it lets the server identify the software stack in order to tailor content and for other analytical purposes.^[1] On their own, these details are benign and harmless, as long as they are not associated with your browsing history over time.
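To make this concrete, here is a minimal sketch of how a server-side script could read the software stack out of the example headers shown earlier. The helper names (parse_headers, describe_client) are made up for illustration, and the regular expressions are an assumption that only covers Firefox-style User-Agent strings.

```python
import re

# The example request headers from this article, as a server would receive them.
RAW_HEADERS = """\
Host: geekswipe.net
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en;q=0.5
"""


def parse_headers(raw: str) -> dict[str, str]:
    """Split 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers


def describe_client(headers: dict[str, str]) -> dict[str, str]:
    """Pull the platform, browser, and preferred language out of the headers.

    Hypothetical helper: the patterns below only handle Firefox-style strings.
    """
    ua = headers.get("user-agent", "")
    platform = re.search(r"\(([^)]*)\)", ua)      # text inside the first parentheses
    browser = re.search(r"(Firefox)/([\d.]+)", ua)  # product name and version
    language = headers.get("accept-language", "").split(",")[0]
    return {
        "platform": platform.group(1) if platform else "unknown",
        "browser": f"{browser.group(1)} {browser.group(2)}" if browser else "unknown",
        "language": language or "unknown",
    }


client = describe_client(parse_headers(RAW_HEADERS))
print(client)
```

Running this on the example headers yields the operating system and architecture, the browser name and version, and the preferred language, all without a single line of JavaScript running in the browser.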

Location information

Your approximate location can be revealed with just your IP address, thanks to geolocation services. Your IP address is the first thing that a server receives, even before the HTTP headers. Geolocation is designed to help the server deliver content localized to your area. And by area, in most cases, it means an approximate position trimmed to the city level. However, more precise positioning can be achieved in the case of GPS-enabled devices, or when Wi-Fi information is shared.

But how do they know your physical location with just the IP address? In most cases, the IP address is dynamically assigned to you by your ISP. ISPs usually obtain these addresses in blocks from a regional internet registry, which in turn receives its allocations from the IANA (Internet Assigned Numbers Authority), a function of ICANN (Internet Corporation for Assigned Names and Numbers) that manages global IP allocations. As this information is part of the public record of these registries, it’s the ISP’s city or address that will be known and not that of a particular user. Geolocation service companies aggregate this information and offer their service to companies that wish to use geolocation to deliver regional content. Though it sounds benign, it contributes a significant level of uniqueness when combined with the dataset that you are about to explore.
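The lookup itself is conceptually simple. Here is a rough sketch, assuming a small prefix table of the kind a geolocation service might compile from registry records. The address ranges below are reserved for documentation, and the ISP names and cities are invented.

```python
import ipaddress

# Hypothetical prefix table. The ranges are documentation-only blocks and
# the ISP names and cities are made up for illustration.
PREFIX_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example ISP A", "Chennai"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example ISP B", "London"),
    (ipaddress.ip_network("203.0.113.0/24"), "Example ISP C", "Sydney"),
]


def locate(ip: str):
    """Return (isp, city) for the block containing ip, or None if unlisted."""
    address = ipaddress.ip_address(ip)
    for network, isp, city in PREFIX_TABLE:
        if address in network:
            return isp, city
    return None


print(locate("198.51.100.42"))  # falls inside the second block
print(locate("10.0.0.1"))       # not in the table
```

Note that the result describes the block the address belongs to, which is why the answer is the ISP’s registered location rather than your own.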

Information that can be accessed from the DOM

Time to go a little deeper with JavaScript! Once the response payload is rendered, what you have is a window object with the potential to query specific information about your computer. The following are some of the details that are easy to obtain with just a few lines of JavaScript.

Screen/Viewport resolution
Time zone
Installed languages
Processor information (Virtual cores)
Operating System version and architecture
Browser version
Referrer info
Installed extensions
Cookie and notification preferences
List of fonts
Ad blocker presence

Though these data-revealing components and methods were designed with good intent, in reality they are abused for monetization, especially by advertising companies. They aggregate all these small pieces of information and build them up into a unique, identifiable, and trackable ‘fingerprint’ of your browsing environment. This is otherwise known as stateless fingerprinting, where no storage of identifiable information (like cookies) is needed to track a user.
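As a minimal sketch of the aggregation step, assume a tracker has already collected a handful of the attributes listed above. The attribute names and values here are hypothetical, and hashing with SHA-256 is just one simple way to turn them into a single identifier.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values for illustration only.
visit_monday = {
    "screen": "1920x1080",
    "timezone": "Asia/Kolkata",
    "languages": ["en-GB", "en"],
    "cpu_cores": 8,
    "fonts": ["Arial", "Montserrat", "Ubuntu"],
    "ad_blocker": True,
}
visit_friday = dict(visit_monday)                            # same browser, cookies cleared
other_user = dict(visit_monday, timezone="Europe/London")    # one attribute differs

print(fingerprint(visit_monday) == fingerprint(visit_friday))  # same identifier
print(fingerprint(visit_monday) == fingerprint(other_user))    # different identifier
```

Nothing is stored on the visitor’s machine here. The identifier can be recomputed on every visit from what the browser reveals, which is what makes the technique stateless.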

Apart from this data, other HTML components like the canvas element can also reveal unique hashes and hardware data with the help of the HTML5 and WebGL APIs. Even if the headers seen above can be spoofed using an extension, the way your computer renders images and text is unique and hard to spoof, which leads to a different level of tracking that we will explore later in this article.

More personal data mining

Now let’s explore something called cross-origin resource sharing. At Geekswipe, we use the fonts Ubuntu and Montserrat for aesthetic purposes. The fonts are a shared resource here. We could very well serve the fonts from our own server, but as Google offers a faster option (and chances are the fonts are already cached in your browser), the Geekswipe page has your browser request them from Google’s servers as the page loads. In other words, Geekswipe uses this particular resource from Google for better performance. This is cross-origin resource sharing done with good intent.

But cross-origin requests also allow a site to request an arbitrary resource from any other website during a user’s session, possibly with a different intent, such as finding out whether the visitor is also logged in to a competitor’s site or another site of interest. For instance, if you are visiting example.com, and example.com wants to know whether you are logged in to example.net, its page can simply make a cross-origin request from your session at example.com for an image at example.net that can only be accessed by logged-in users. If you are logged in, example.net responds with the expected image. If not, the response contains an error code. By using this simple technique, a site can identify whether the user is logged in to a certain social network or a website of interest.
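The logic of this probe can be simulated in a few lines. This is only a sketch of the reasoning, not browser code: in a real page the request would be an image load made by the visitor’s browser, which attaches its example.net cookies on its own. The session store, cookie name, and status codes below are assumptions.

```python
# Simulation only. Names, cookie keys, and status codes are hypothetical.

LOGGED_IN_SESSIONS = {"session-abc"}  # pretend session store at example.net


def example_net_image(cookies: dict) -> int:
    """example.net's side: serve the members-only image to valid sessions only."""
    if cookies.get("session") in LOGGED_IN_SESSIONS:
        return 200  # image served
    return 403      # error response


def probe_login_state(browser_cookies_for_example_net: dict) -> bool:
    """example.com's side: it only learns whether the image loaded or failed."""
    status = example_net_image(browser_cookies_for_example_net)
    return status == 200


print(probe_login_state({"session": "session-abc"}))  # visitor is logged in
print(probe_login_state({}))                          # visitor is not logged in
```

The probing site never sees the cookies themselves. The success or failure of the image load is enough to leak the login state.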

Cross-browser tracking

Most users don’t stick to just one browser. Web tracking has evolved accordingly, and trackers don’t stick to one browser either. As mentioned above, there are ample methods and exploits that can reveal and fingerprint a user from inside a browser. Cross-browser tracking simply makes use of hardware information to follow you across browsers.

The canvas element is one such component that enables cross-browser fingerprinting. The way a browser renders a 2D or 3D image depends heavily on the underlying GPU and the renderer installed. So when the same lossless PNG image is rendered in two different browsers on the same machine, the WebGL render attributes can be correlated independently of the browser, as the GPU and CPU data remain the same. This also extends to audio processing, keystroke dynamics (typing cadence), mouse movement analysis, extensive 3D model analysis^[2], and even battery level metadata.
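Here is a rough sketch of the idea, assuming a tracker keeps only the attributes that come from the hardware and ignores everything browser-specific. The attribute names and values are invented, and which features are truly stable across browsers is an assumption made for illustration.

```python
import hashlib
import json

# Hypothetical set of hardware-level attributes that do not depend on the browser.
HARDWARE_KEYS = ("gpu_renderer", "cpu_cores", "screen", "audio_hash")


def hardware_fingerprint(profile: dict) -> str:
    """Hash only the hardware-level attributes of a browser profile."""
    stable = {key: profile.get(key) for key in HARDWARE_KEYS}
    canonical = json.dumps(stable, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two browsers on the same machine: browser-level values differ,
# hardware-level values are identical. All values are made up.
firefox_profile = {
    "browser": "Firefox 58",
    "extensions": ["uBlock Origin"],
    "gpu_renderer": "Example GPU 1000",
    "cpu_cores": 8,
    "screen": "1920x1080",
    "audio_hash": "a1b2c3",
}
chrome_profile = {
    "browser": "Chrome 64",
    "extensions": [],
    "gpu_renderer": "Example GPU 1000",
    "cpu_cores": 8,
    "screen": "1920x1080",
    "audio_hash": "a1b2c3",
}

same_machine = hardware_fingerprint(firefox_profile) == hardware_fingerprint(chrome_profile)
print(same_machine)
```

Because the browser name and extensions never enter the hash, switching browsers does not change the identifier.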

Protection methods

When you are on the internet, it is safe to assume that you are always being targeted for private information. To protect your private data, the components discussed above can be controlled, and the data they send can be spoofed, by means of extensions and modifications to the browser. For example, the popular extension NoScript blocks scripts altogether. And modern browsers like Firefox offer built-in tracking protection that averts tracking cookies and cross-site exploits.

While browsers have been hardened against most of these exploits and vulnerabilities, the threat of fingerprinting and tracking still exists due to other weak components like the canvas element and hardware mappings. The canvas element can be disabled too, but that would break the functionality of some sites. This is why the Tor browser blocks sites from reading canvas image data by default and prompts users when a site requests it. On the hardware mapping front, despite being a tracking vector, it has also proved to be a helpful metric for authenticating a user.

As of now, the recommended way to protect your information is to use extensions like Privacy Badger, uBlock Origin, and NoScript that block scripts and tracking cookies. Using private tabs is a good way to defend against supercookies. You can also disallow third-party cookies, as this will stop most cookie tracking. Or if you want to take it up a notch for hardened privacy right from the IP level, you can use the Tor browser, which is a modified version of Firefox with better privacy features.

Firefox Lightbeam is a good tool to visualize which third-party sites are tracking you over a period of time. The Electronic Frontier Foundation also offers a tool to check what sort of information is leaked by your browser to these third parties (which are mostly Google and AddThis). It is also worth noting that the use of all such protective layers can itself contribute to a unique fingerprinting pattern.

Footnotes

[1] RFC 7231 – Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. (2018). ietf.org. Retrieved 30 January 2018, from https://tools.ietf.org/html/rfc7231#section-5.5.3
[2] (Cross-)Browser Fingerprinting via OS and Hardware Level Features. (2018). Retrieved from http://yinzhicao.org/TrackingFree/crossbrowsertracking_NDSS17.pdf