What Type of Information Is Leaked by Your Browser?
In this data-driven era, internet users often need a reminder to stay safe online. In this Geekswipe edition, we explore the kinds of personal data that modern web browsers can reveal with just a single visit to a website, and find out how individually anonymous pieces of data combine to form a unique fingerprint that can be tracked.
Some general data
When you type a web address into the address bar and hit Enter, your browser opens a TCP connection to the server. Once the TCP handshake succeeds (followed, for HTTPS, by a TLS handshake), your browser sends an HTTP request to the server. The request headers contain a few fields that help the server respond with the right content for you. For example, if you request https://geekswipe.net/ from the address bar, the headers sent by your browser would look something like this.
Host: geekswipe.net
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en;q=0.5
The server answers with a response whose payload contains the HTML you requested. But even this simple request has revealed your browser and operating system to the server. This is by design: the headers identify your software stack so that the server can tailor content and gather analytics. On their own they are benign, since nothing ties them to your browsing history over time.
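To show how a server might use one of these headers to tailor content, here is a minimal Python sketch that parses the Accept-Language value from the example request and picks a language to serve. The function names and the set of available languages are hypothetical. Real servers usually also fall back from a regional tag such as en-GB to its base language en; this sketch only does exact matches.

```python
from typing import List, Optional, Set, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language value into (language-tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # the weight defaults to 1 when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip(), q))
    # sorted() is stable, so equal weights keep the client's original order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def pick_language(header: str, available: Set[str]) -> Optional[str]:
    """Return the most preferred tag the server can serve (exact match only)."""
    for tag, q in parse_accept_language(header):
        if q > 0 and tag in available:
            return tag
    return None


if __name__ == "__main__":
    header = "en-GB,en;q=0.5"  # the value from the example request
    print(parse_accept_language(header))        # [('en-GB', 1.0), ('en', 0.5)]
    print(pick_language(header, {"en", "de"}))  # en
```

The same header that lets the server pick English for you also tells it which languages you read, which is one of the small pieces that feed into fingerprinting.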
Your approximate location can be revealed by your IP address alone, thanks to geolocation services. Your IP address is the first thing a server receives, even before the HTTP headers, because it is carried in every packet of the connection, starting with the TCP handshake. Geolocation is meant to help the server deliver content localized to your area, and in most cases that area is an approximate position at the city level. More precise positioning is possible on GPS-enabled devices or by sharing Wi-Fi information.
But how can a service tell your physical location from the IP address alone? In most cases, your ISP assigns your IP address dynamically from a block of addresses it holds. ISPs obtain these blocks from regional internet registries, which in turn receive their allocations from IANA (Internet Assigned Numbers Authority), the function under ICANN (Internet Corporation for Assigned Names and Numbers) that coordinates global IP allocation. Because these allocation records are public, what they reveal is the ISP’s city or address, not that of an individual user. Geolocation companies aggregate this information and offer it as a service to companies that want to deliver regional content. Though this sounds benign, it adds a significant amount of uniqueness when combined with the data you are about to explore.
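A geolocation lookup of this kind boils down to finding the most specific registered block that contains an address. The Python sketch below assumes a hypothetical lookup table built from registry records; the ISP names and cities are invented, and the networks are the reserved documentation ranges rather than real allocations.

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical table a geolocation provider might build from registry records:
# network block -> (ISP holding the block, city on record).
# These are reserved documentation ranges, not real allocations.
GEO_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): ("Example ISP A", "City A"),
    ipaddress.ip_network("198.51.100.128/25"): ("Example ISP A", "City B"),
    ipaddress.ip_network("203.0.113.0/24"): ("Example ISP B", "City C"),
}


def locate(ip: str) -> Optional[Tuple[str, str]]:
    """Return (ISP, city) for the most specific block containing ip, if any."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in GEO_TABLE if addr in net]
    if not matches:
        return None
    # Longest prefix wins: the /25 is more specific than the /24 around it.
    best = max(matches, key=lambda net: net.prefixlen)
    return GEO_TABLE[best]


if __name__ == "__main__":
    print(locate("198.51.100.7"))    # ('Example ISP A', 'City A')
    print(locate("198.51.100.200"))  # ('Example ISP A', 'City B')
    print(locate("192.0.2.1"))       # None
```

Note that the answer is whatever city is on record for the ISP’s block, which is why IP-based positioning usually stops at the city level.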
Information that can be accessed from the DOM
- Screen/Viewport resolution
- Time zone
- Installed languages
- Processor information (virtual cores)
- Operating system version and architecture
- Browser version
- Referrer info
- Installed extensions
- Cookie and notification preferences
- List of fonts
- Ad blocker presence
Though these components and methods are designed with good intent, in practice they are abused for monetization, especially by advertising companies. They aggregate all these small pieces of information into a unique, identifiable, and trackable ‘fingerprint’ of your browsing environment. This is known as stateless fingerprinting, because no identifying information (like cookies) needs to be stored on your machine to track you.
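The aggregation step can be sketched in a few lines of Python: serialise the collected attributes in a fixed order and hash them, so the same environment yields the same identifier on every visit with nothing stored in the browser. The attribute names and values below are hypothetical, and SHA-256 is simply a convenient choice for the sketch, not necessarily what any real tracker uses.

```python
import hashlib
import json
from typing import Dict


def fingerprint(attributes: Dict[str, object]) -> str:
    """Derive a stable ID from browser attributes without storing anything client-side."""
    # sort_keys makes the serialisation independent of the order in which
    # attributes were collected, so the same environment gives the same digest.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values a script might read from the browser.
visitor = {
    "screen": "1920x1080",
    "timezone": "Asia/Kolkata",
    "languages": ["en-GB", "en"],
    "cores": 8,
    "platform": "Win64",
    "fonts": ["Arial", "Calibri", "Ubuntu"],
    "adblock": True,
}

if __name__ == "__main__":
    reordered = dict(reversed(list(visitor.items())))
    print(fingerprint(visitor) == fingerprint(reordered))                # True
    print(fingerprint(visitor) == fingerprint(dict(visitor, cores=4)))   # False
```

Reordering the attributes leaves the ID unchanged, while changing any single one of them produces a completely different ID. This is also why a tracker would prefer attributes that rarely change.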
Apart from these data points, HTML5 elements like canvas, together with the WebGL API, can also reveal hardware details and produce output that hashes to a distinctive value. Even if the headers and properties above are spoofed with an extension, the way your computer renders images and text is distinctive and hard to spoof, which enables a different level of tracking that we explore later in this article.
More personal data mining
Now let’s explore cross-origin resource sharing. At Geekswipe, we use the Ubuntu and Montserrat fonts for aesthetic purposes, and the fonts are a shared resource here. We could serve them from our own server, but Google offers a faster option (and chances are the fonts are already cached in your browser), so Geekswipe requests them from Google’s servers once the page is loaded. In other words, Geekswipe uses a resource shared by Google for better performance. This is cross-origin resource sharing with good intent.
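An origin is the combination of scheme, host, and port, and a request is cross-origin whenever any of the three differs from the page’s. The short Python sketch below (the function names are illustrative, not a standard API) shows why Geekswipe’s font requests count as cross-origin: Google Fonts serves its font files from fonts.gstatic.com. For web fonts, browsers additionally require the font server to opt in with an Access-Control-Allow-Origin response header.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports for the two schemes this sketch handles.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # An omitted port means the scheme's default, so https://a/ and https://a:443/ match.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, parts.hostname or "", port  # urlsplit lowercases the hostname


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(a) == origin(b)


if __name__ == "__main__":
    page = "https://geekswipe.net/"
    print(same_origin(page, "https://geekswipe.net/about/"))  # True
    print(same_origin(page, "https://fonts.gstatic.com/"))    # False: different host
    print(same_origin(page, "http://geekswipe.net/"))         # False: different scheme
```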
But cross-origin requests also let a site request arbitrary resources from other websites during a user’s session, and that can serve a different intent, such as finding out whether the visitor is also logged into a competitor’s site or any other site of interest. For instance, suppose you are visiting example.com, and example.com wants to know whether you are logged in to example.net. It can simply embed a cross-origin request for an image at example.net that is only accessible to logged-in users. If you are logged in, example.net responds with the image; if not, it responds with an error code. The page cannot read that response directly, but it can observe whether the image loaded or failed to load. With this simple technique, a site can infer whether the user is logged into a certain social network or website of interest.
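In a real page, this probe would be an image element whose load and error events report the outcome. The Python sketch below only models the decision logic with a simulated server. The path /private.png, the session cookie name, and all function names are hypothetical, and the third_party_cookies flag stands in for a browser that blocks third-party cookies.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Response:
    status: int
    content_type: str


def example_net(path: str, cookies: Dict[str, str]) -> Response:
    """Hypothetical example.net: serves /private.png only to logged-in users."""
    if path == "/private.png" and cookies.get("session") == "valid":
        return Response(200, "image/png")
    return Response(403, "text/html")


def image_loads(resp: Response) -> bool:
    """All the embedding page learns: the image either loads or errors."""
    return resp.status == 200 and resp.content_type.startswith("image/")


def probe_login(example_net_cookies: Dict[str, str], third_party_cookies: bool = True) -> bool:
    """Simulate example.com embedding an image from example.net."""
    # The browser attaches example.net's cookies to this cross-site request
    # unless third-party cookies are blocked.
    sent = example_net_cookies if third_party_cookies else {}
    return image_loads(example_net("/private.png", sent))


if __name__ == "__main__":
    print(probe_login({"session": "valid"}))                             # True
    print(probe_login({}))                                               # False
    print(probe_login({"session": "valid"}, third_party_cookies=False))  # False
```

The last case shows why blocking third-party cookies defeats this particular probe: without the cookie, example.net treats every visitor as logged out.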
Most users don’t stick to just one browser, and web tracking has evolved so that it doesn’t either. As mentioned above, there are ample methods and exploits that can reveal and fingerprint a user from inside a browser; cross-browser tracking additionally relies on hardware-level information to follow you from one browser to another.
The canvas element is one such component that enables cross-browser fingerprinting. How a browser renders a 2D or 3D image depends heavily on the underlying GPU and graphics stack. So when a tracker renders the same content in two different browsers on the same machine and exports each result as a lossless PNG image, the WebGL rendering attributes can be correlated independently of the browser, because the GPU and CPU are the same in both. The same approach extends to audio processing, keystroke dynamics (typing cadence), mouse movement analysis, extensive 3D model analysis, and even battery level metadata.
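A rough sketch of the correlation idea, under a big simplifying assumption: the tracker has already found rendering tasks whose output is identical across browsers on a given machine, which is the hard part in practice. It then drops features that vary by browser, such as the user agent, and hashes the rest, so two browsers on one machine map to the same identifier. The feature names, the browser-level versus hardware-level split, and the render hash value are all hypothetical.

```python
import hashlib
import json
from typing import Dict

# Hypothetical split: features that change with the browser build, as opposed
# to ones assumed to reflect the machine (OS, CPU, GPU) underneath it.
BROWSER_LEVEL = {"user_agent", "plugins"}


def cross_browser_id(features: Dict[str, object]) -> str:
    """Hash only the features assumed to be identical in every browser on a machine."""
    stable = {k: v for k, v in features.items() if k not in BROWSER_LEVEL}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative observations of one machine from two browsers. The render hash is a
# made-up placeholder for the digest of a browser-independent WebGL task, and the
# Chrome user agent is shortened.
firefox = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0",
    "plugins": [],
    "cpu_cores": 8,
    "screen": "1920x1080",
    "webgl_render_hash": "placeholder-7e41",
}
chrome = dict(firefox, user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/64.0",
              plugins=["Chrome PDF Viewer"])

if __name__ == "__main__":
    # Different user agents and plugins, same hardware: the IDs match.
    print(cross_browser_id(firefox) == cross_browser_id(chrome))  # True
```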
When you are on the internet, it is safest to assume that any private information you expose can be targeted. To protect your private data, the components discussed above can be restricted, and the data they send can be spoofed, by means of extensions and browser modifications. For example, the popular NoScript extension blocks scripts altogether, and modern browsers like Firefox offer built-in tracking protection that blocks many known trackers and cross-site tracking cookies.
While many of these exploits and vulnerabilities have since been hardened against, the threat of fingerprinting and tracking remains because of weaker components such as the canvas element and hardware mappings. The canvas element can be disabled too, but doing so breaks functionality on some sites. This is why Tor Browser blocks canvas image extraction by default and prompts users when a site tries to read canvas data. Hardware mapping, meanwhile, has proved to be a useful signal for authenticating users, despite also being a tracking vector.
As of now, the recommended way to protect your information is to use extensions like Privacy Badger, uBlock Origin, and NoScript, which block scripts and tracking cookies. Private tabs are a good way to defend against supercookies. You can also disallow third-party cookies, which stops most cookie-based tracking. And if you want to take privacy up a notch, all the way down to the IP level, you can use Tor Browser, a modified version of Firefox with stronger privacy features.
Firefox Lightbeam is a good tool to visualize which third-party sites track you over a period of time. The Electronic Frontier Foundation also offers a tool to check what sort of information your browser leaks to these third parties (which are mostly Google and AddThis). It is also worth noting that using all such protective layers can itself contribute to a unique fingerprinting pattern.
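To make that last point concrete: fingerprinting tests commonly express uniqueness in bits of identifying information, where a value shared by one in N browsers carries log2(N) bits, and singling out one person among the world’s several billion takes only about 33 bits. The Python sketch below uses hypothetical shares, and it assumes attributes are independent, which overstates the total when they are correlated (screen size and operating system often travel together).

```python
import math
from typing import List


def surprisal_bits(matching: int, total: int) -> float:
    """Bits of identifying information for a value shared by `matching` of `total` browsers."""
    return math.log2(total / matching)


def combined_bits(shares: List[float]) -> float:
    """Total bits for several attributes, assuming they are independent."""
    return sum(-math.log2(p) for p in shares)


if __name__ == "__main__":
    print(surprisal_bits(1, 1024))  # 10.0
    # Hypothetical shares: 1 in 2 browsers match on time zone, 1 in 4 on screen size,
    # and only 1 in 1024 on this exact combination of privacy extensions.
    print(combined_bits([0.5, 0.25, 1 / 1024]))  # 13.0
```

A rare privacy setup is exactly such a low-share value: the more unusual the combination of protections, the more bits it contributes.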