Browser Fingerprint

Updated by guanguan

User Agent

What is a User Agent?

A user agent is any software that interacts with web servers on behalf of an Internet user. It can be seen as a bridge between you and the Internet.

Any software that sends requests to web servers is a user agent, whether it works independently of human interaction, as automation tools and bots do, or accepts direct commands from humans, as web browsers and similar software do.

For instance, to access content online you use a web browser. The browser is the user agent that retrieves and renders the content and lets you interact with it.

In a client-server network protocol, the client that communicates with the server on the user's behalf is the user agent. Your email reader, for example, is a mail user agent.

User agents do not stop there: your gaming console can be a user agent, and so can your smart TV and other Internet-enabled devices. In the Hypertext Transfer Protocol (HTTP), clients (user agents) identify themselves using the User-Agent header.
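As a minimal sketch of how a client identifies itself, the following Python snippet builds (but does not send) an HTTP request that carries a User-Agent header. The URL and the user agent string are placeholder assumptions chosen for illustration.

```python
from urllib.request import Request

# Hypothetical values used for illustration only.
EXAMPLE_URL = "https://example.com/"
EXAMPLE_UA = "ExampleBot/1.0 (+https://example.com/bot-info)"

# Build the request object; nothing is sent over the network here.
request = Request(EXAMPLE_URL, headers={"User-Agent": EXAMPLE_UA})

# urllib stores header names in capitalized form, i.e. "User-agent".
print(request.get_header("User-agent"))
```

A server that receives such a request can read the header value and decide how to respond.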

Uses of User Agents

You might be wondering why client software identifies itself and what web servers need that information for. User agents have two major uses: content negotiation, and granting or blocking access.

Content Negotiation

Many web pages are served in several variants, chosen according to the capabilities of the requesting device. For instance, the structure of the Google search results page varies depending on the browser or platform you use to access it. By looking at the user agent string, Google can serve the version best suited to your browser and device.

Many other sites use the user agent string to provide a better user experience. Without a user agent, you are at best served the generic version of a page, which may or may not render well in your browser. Bot developers exploit this to avoid JavaScript-heavy sites: by sending a mobile browser user agent, they can get some web servers to return a lighter version of the page that relies less on JavaScript.
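The idea of serving a variant based on the user agent can be sketched as below. This is a simplified heuristic under stated assumptions, not how any particular site decides: the function name, the keyword list, and the variant names are all hypothetical, and the user agent strings are only illustrative samples.

```python
def choose_variant(user_agent: str) -> str:
    """Pick a page variant from a User-Agent string (simplified heuristic)."""
    ua = user_agent.lower()
    if not ua:
        # No User-Agent at all: fall back to the generic page.
        return "generic"
    if "mobile" in ua or "android" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"


# Illustrative sample strings, not an exhaustive or authoritative list.
DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
)

for sample in (DESKTOP_UA, MOBILE_UA, ""):
    print(choose_variant(sample))
```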

Access Negotiation and Blocks

Perhaps the most common use of the user agent string is to decide whether a particular client is allowed to access certain content. Web servers use the User-Agent header of an HTTP request to exclude crawlers, scrapers, and other bots from their platform.

Many popular websites frown on bot traffic and, as such, deny access to user agents other than those of popular browsers. While they enforce this internally, they can also give web crawlers signals through the robots.txt file and expect crawlers to follow the directives in it. Generally, web servers only want to allow traffic that originates from a human user and tend to block traffic from automated sources, unless the automated traffic benefits them.
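As a rough sketch of how a well-behaved crawler might honor robots.txt directives, the snippet below parses an in-memory robots.txt with Python's standard urllib.robotparser module. The robots.txt content, the crawler name ExampleBot, and the URLs are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether the (hypothetical) crawler may fetch each URL.
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/data")

print(allowed_public, allowed_private)
```

Note that robots.txt is only a signal: the server still has to enforce any blocking itself.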

Cookie

A cookie is a small piece of text data (sometimes encrypted) that a website stores on the user’s local device (the client side) in order to identify the user.

HTTP is a stateless protocol: the server does not know what the user did in previous requests, which severely hinders the implementation of interactive web applications. In a typical online shopping scenario, a user browses several pages and buys a box of biscuits and two bottles of drinks. At the final checkout, because HTTP is stateless, the server cannot know what the user bought without some additional means. Cookies are one of the “extra means” used to work around the statelessness of HTTP. The server can set or read the information contained in cookies to maintain the state of its conversation with the user.

In the shopping scenario above, when the user purchases the first item, the server sends a cookie along with the web page, recording the information about that item. When the user visits another page, the browser sends the cookie back to the server, so the server knows what the user bought before. When the user goes on to purchase drinks, the server adds the new product information to the original cookie. At checkout, the server simply reads the cookie it receives.

Another typical use of cookies is staying logged in. A website often asks the user to enter a user name and password, and the user can check “Automatic login next time”. If the box is checked, the user will find on the next visit that they are already logged in without entering the user name and password. This is because, during the previous login, the server sent the browser a cookie containing login credentials (for example, an encrypted form of the user name and password), which the browser saved to the user’s hard disk. On the next visit, if the cookie has not expired, the browser sends the cookie, the server verifies the credentials, and the user is logged in without entering anything.
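The shopping scenario can be sketched with Python's standard http.cookies module. This is a simplified illustration, not a recommended cart design: the cookie name cart and the "|" separator are assumptions made for the example.

```python
from http.cookies import SimpleCookie

# Server side: record the first item in a cookie (hypothetical name "cart").
response_cookie = SimpleCookie()
response_cookie["cart"] = "biscuits"
# This string is what would follow "Set-Cookie: " in the response.
set_cookie_header = response_cookie["cart"].OutputString()

# Browser side: on the next request the stored cookie is sent back
# in the Cookie request header.
cookie_request_header = "cart=biscuits"

# Server side: parse the incoming cookie and append the two drinks.
incoming = SimpleCookie()
incoming.load(cookie_request_header)
items = incoming["cart"].value.split("|")
items.extend(["drink", "drink"])

updated = SimpleCookie()
updated["cart"] = "|".join(items)

# At checkout, the server just reads the cookie it receives.
print(updated["cart"].value.split("|"))
```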

Language

Language is one of the browser's basic fingerprints. Basic fingerprints are characteristics that every browser exposes, including screen resolution, hardware type, operating system, user agent, system fonts, language, browser plug-ins, browser extensions, browser settings, and time zone. Like a person's height or age, each of these values is shared by many users, so collisions are highly likely and they can only serve as auxiliary identifiers.
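The collision problem can be illustrated with a small sketch that hashes a few basic attributes into one identifier. The attribute names, the sample values, and the choice of SHA-256 are assumptions for the example, not a description of any specific fingerprinting product.

```python
import hashlib
import json


def basic_fingerprint(attributes: dict) -> str:
    """Hash a set of basic browser attributes into one identifier."""
    # Sorting the keys makes the result independent of dictionary order.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two different users who happen to share the same common settings.
user_a = {"language": "en-US", "resolution": "1920x1080", "timezone": "UTC-05:00"}
user_b = {"language": "en-US", "resolution": "1920x1080", "timezone": "UTC-05:00"}
user_c = {"language": "de-DE", "resolution": "1920x1080", "timezone": "UTC+01:00"}

print(basic_fingerprint(user_a) == basic_fingerprint(user_b))  # collision
print(basic_fingerprint(user_a) == basic_fingerprint(user_c))  # distinct
```

Because users A and B are indistinguishable here, basic attributes alone cannot identify a user and are normally combined with stronger signals.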

Resolution

Screen resolution describes how much detail of text and images the screen can display. Larger monitors usually support higher resolutions. Whether the resolution can be increased depends on the size and capabilities of the monitor and the type of video card used. Strictly speaking, “resolution” is pixel density, the number of pixels per unit of length (for example, pixels per inch), not the total number of pixels, although in everyday use and in browser fingerprinting it usually means the pixel dimensions of the screen, such as 1920×1080.
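To make the difference between pixel count and pixel density concrete, the sketch below computes density in the commonly quoted unit of pixels per inch (PPI) along the screen diagonal. The screen sizes are hypothetical examples.

```python
import math


def pixels_per_inch(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixel density: diagonal length in pixels divided by diagonal length in inches."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_in


# The same 1920x1080 pixel count on two hypothetical screens.
monitor_ppi = pixels_per_inch(1920, 1080, 24.0)
laptop_ppi = pixels_per_inch(1920, 1080, 15.6)

print(round(monitor_ppi, 2), round(laptop_ppi, 2))
```

The smaller screen has the higher density even though both report the same pixel dimensions.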

Timezone

A time zone is a region of the globe that observes a uniform standard time for legal, commercial, and social purposes. Time zones tend to follow the borders of countries and their subdivisions rather than strictly following lines of longitude, because it is convenient for areas in close commercial or other contact to keep the same time.

Most time zones on land are offset from Coordinated Universal Time (UTC) by a whole number of hours, ranging from UTC-11:00 (UTC-12:00 is uninhabited) to UTC+14:00, but some are offset by an additional 30 or 45 minutes (for example, Newfoundland Standard Time is UTC-03:30, Nepal Standard Time is UTC+05:45, India Standard Time is UTC+05:30, and Myanmar Standard Time is UTC+06:30).
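The fractional offsets mentioned above can be checked with a short sketch using fixed-offset time zones from Python's datetime module. The date is arbitrary, and fixed offsets are used here as a simplification (they ignore daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the examples in the text.
NEPAL = timezone(timedelta(hours=5, minutes=45), "NPT")
NEWFOUNDLAND = timezone(timedelta(hours=-3, minutes=-30), "NST")

# An arbitrary instant: noon UTC.
noon_utc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

nepal_time = noon_utc.astimezone(NEPAL)
newfoundland_time = noon_utc.astimezone(NEWFOUNDLAND)

print(nepal_time.strftime("%H:%M"), newfoundland_time.strftime("%H:%M"))
```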

MoreLogin displays Greenwich Mean Time when you open the browsers.

Audio

The Web Audio API, which HTML5 provides for JavaScript programming, lets developers manipulate raw audio stream data directly in code: they can generate, process, and re-create audio freely, for example by adjusting timbre, changing pitch, or splitting audio into segments. It has even been called a web version of Adobe Audition.

The principle of AudioContext fingerprint is roughly as follows:

Method 1: Generate an audio signal (a triangle wave), apply a fast Fourier transform (FFT) to it, and compute a SHA hash of the result as the fingerprint.

Method 2: Generate an audio signal (a sine wave), apply dynamic range compression to it, and compute an MD5 hash of the result.

In both methods, the audio is discarded before it reaches the audio device, so the user is fingerprinted without noticing anything.

Basic principles of AudioContext fingerprinting:

Subtle differences in the hardware or software of the host or browser cause differences in how audio signals are processed. The same browser on the same machine produces the same audio output, while different machines or different browsers produce different output.

As the above shows, AudioContext and Canvas fingerprints are very similar in principle: both exploit differences in hardware or software. The former generates audio and the latter generates images, and each then computes a hash value of the output to use as an identifier.
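The first method can be imitated offline with a small simulation. This is only a sketch under stated assumptions: it runs in plain Python rather than in a browser's AudioContext, a naive discrete Fourier transform stands in for the FFT (it produces the same spectrum, just more slowly), SHA-256 is assumed as the SHA variant, and the tiny scaling factor is a made-up stand-in for the differences between audio stacks.

```python
import cmath
import hashlib


def triangle_wave(n_samples, period):
    """Triangle wave in [-1, 1] with the given period in samples."""
    samples = []
    for i in range(n_samples):
        phase = (i % period) / period  # position within the cycle, 0..1
        samples.append(4 * abs(phase - 0.5) - 1)
    return samples


def dft_magnitudes(samples):
    """Magnitude spectrum via a naive DFT (stand-in for an FFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(samples))
        magnitudes.append(abs(acc))
    return magnitudes


def audio_fingerprint(samples):
    """Hash the rounded spectrum into a hexadecimal identifier."""
    text = ",".join(f"{m:.6f}" for m in dft_magnitudes(samples))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


wave = triangle_wave(64, 16)
machine_a = audio_fingerprint(wave)
# Hypothetical second machine: its audio stack deviates very slightly.
machine_b = audio_fingerprint([x * 1.001 for x in wave])

print(machine_a == audio_fingerprint(triangle_wave(64, 16)))  # stable on one machine
print(machine_a == machine_b)                                 # differs across machines
```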

LocalStorage

localStorage is one of the APIs of the HTML5 Web Storage feature. It is mainly used to store data on the client, which generally means the user’s computer. Most mobile browsers also support Web Storage, so web browsers on smartphones running Android or iOS can use this feature normally.

Data saved in localStorage is generally persistent: once written, it stays on the user’s client. Even if the user closes the web browser and restarts it, the data still exists. Its life cycle ends only when the user or the program explicitly deletes it.

In terms of security, localStorage is scoped to the domain (more precisely, to the origin: scheme, host, and port). Any page in the same origin can access that origin's localStorage data. Note, however, that each browser keeps its own independent storage: data stored with localStorage in Firefox cannot be read from Chrome. Similarly, because localStorage data is kept on the user’s device, data saved by the same application on different devices is separate.
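The isolation rules can be modeled with a toy simulation. This is not the browser API itself: SimulatedBrowser and its methods are hypothetical names, and the model assumes that storage is keyed by origin (scheme, host, and port), which is how browsers scope localStorage in practice.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Return the (scheme, host, port) origin of a URL."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


class SimulatedBrowser:
    """Toy model: each browser keeps one private key-value store per origin."""

    def __init__(self):
        self._stores = {}

    def set_item(self, page_url, key, value):
        self._stores.setdefault(origin_of(page_url), {})[key] = value

    def get_item(self, page_url, key):
        return self._stores.get(origin_of(page_url), {}).get(key)


firefox = SimulatedBrowser()
chrome = SimulatedBrowser()

firefox.set_item("https://shop.example.com/cart", "theme", "dark")

same_origin = firefox.get_item("https://shop.example.com/checkout", "theme")
other_origin = firefox.get_item("https://other.example.com/", "theme")
other_browser = chrome.get_item("https://shop.example.com/cart", "theme")

print(same_origin, other_origin, other_browser)
```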

Geo

The Geolocation API is part of the W3C HTML5 standard. It provides a simple, high-level JavaScript API that allows websites to request your physical location, which can compromise your privacy. The Geolocation API test at the source below is intended to confirm that no location information is accessed through this API without your explicit permission.

Source:https://browserleaks.com/geo

Fonts

Font fingerprinting is based on which fonts you have and how they are drawn. By measuring the dimensions of HTML elements filled with text, it is possible to build an identifier that can be used to track the same browser over time.

Font metric-based fingerprinting overlaps heavily with canvas fingerprinting. It is probably the weaker technique, since canvas yields not only bounding boxes but also pixel data. On the other hand, font fingerprinting is much more difficult to defend against.

Text rendering is a subtle and complex part of a web browser. Even in the Latin alphabet, layout is more than simply stacking boxes together: considerations such as ligatures, kerning, and combining characters come into play. Some other writing systems are even more complex, causing browsers to rely on OS-provided libraries for text layout. These libraries, including Pango on GNU/Linux, Graphics Device Interface (GDI) or DirectWrite on Windows, and Core Text on Mac OS X, are independent code bases and do not behave identically. Browsers additionally impose their own customizations atop the base text rendering.

Source:https://browserleaks.com/fonts
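The measurement idea can be sketched as a simulation. In a real page the widths would come from measuring rendered HTML elements; here they are hard-coded, hypothetical numbers. The sketch assumes the common approach of comparing the width of a test string rendered in a candidate font against its width in a fallback font: if the two differ, the candidate font is taken to be installed.

```python
import hashlib


def detect_fonts(fallback_width, measured_widths):
    """A font counts as installed if its text width differs from the fallback width."""
    return sorted(name for name, width in measured_widths.items()
                  if width != fallback_width)


def font_fingerprint(fallback_width, measured_widths):
    """Hash the installed fonts together with their measured widths."""
    installed = detect_fonts(fallback_width, measured_widths)
    payload = ";".join(f"{name}={measured_widths[name]}" for name in installed)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical measurements of one test string, in pixels.
FALLBACK_WIDTH = 412.0
MEASURED = {"Arial": 398.5, "Calibri": 377.25, "Noto Sans": 412.0}

print(detect_fonts(FALLBACK_WIDTH, MEASURED))
print(font_fingerprint(FALLBACK_WIDTH, MEASURED))
```

Because the exact widths also depend on the text rendering libraries described above, two machines with the same font list can still produce different hashes.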

Canvas

Canvas is part of HTML5 and allows scripts to dynamically render bitmap images. You can use JavaScript to draw on the canvas element. Common applications include drawing graphics and text, image processing, games, and animation.

Source:https://browserleaks.com/canvas

WebGL

WebGL is a JavaScript API for rendering interactive 2D and 3D graphics in any compatible web browser without plug-ins. WebGL is fully integrated with the browser's other web standards, so GPU-accelerated image processing and effects can be used as part of the page's canvas. WebGL elements can be combined with other HTML elements and composited with other parts of the page or the page background. A WebGL program consists of control code written in JavaScript and shader code written in OpenGL Shading Language (GLSL), a language similar to C or C++, which runs on the computer's graphics processing unit (GPU).

Source:https://browserleaks.com/webgl

WebRTC

WebRTC is a technology built into browsers that is typically used by web applications requiring a fast, direct connection. Because WebRTC usually establishes its connections over UDP, its traffic may not be routed through the proxy servers you use in the browser. Websites may exploit this fact to reveal your real public and local IP addresses even if you are using a proxy. The same technology can also be used to track your media devices.

What WebRTC leaks

  1. Public IP address(es)
  2. Local IP address(es)
  3. Media device numbers and hashes

Source:https://browserleaks.com/webrtc
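The distinction between local and public addresses in such a leak can be sketched with Python's standard ipaddress module. The address list is a made-up example of what a page might collect, and classify_addresses is a hypothetical helper name.

```python
import ipaddress


def classify_addresses(candidates):
    """Split leaked addresses into local (private-range) and public ones."""
    result = {"local": [], "public": []}
    for text in candidates:
        ip = ipaddress.ip_address(text)
        result["local" if ip.is_private else "public"].append(text)
    return result


# Hypothetical addresses gathered from connection candidates.
leaked = classify_addresses(["192.168.1.23", "10.0.0.5", "8.8.8.8"])
print(leaked)
```

If the public address found this way differs from the proxy's address, the proxy has been bypassed.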

Media devices

WebRTC enables audio and video communication inside a web page through a direct peer-to-peer connection, without the need to install plug-ins or other native applications. To make this work, WebRTC provides access to your media devices, such as microphones, cameras, and headphones. Websites can exploit this mechanism in two ways:

  1. Device enumeration
  2. Media devices ID tracking

Do Not Track

Do Not Track (DNT) is an HTTP header field. When the user enables this feature, the browser adds the header DNT: 1 to its HTTP requests. This field tells the web server that the user does not want to be tracked. Websites that honor the header will then refrain from tracking users’ personal information for ad targeting.
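On the server side, honoring the header could look like the following sketch. The function name tracking_allowed is hypothetical, and the request headers are modeled as a plain dictionary rather than any particular web framework's request object.

```python
def tracking_allowed(headers):
    """Return False when the request carries a Do Not Track signal."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


print(tracking_allowed({"User-Agent": "ExampleBrowser/1.0", "DNT": "1"}))
print(tracking_allowed({"User-Agent": "ExampleBrowser/1.0"}))
```

Whether a site actually respects the signal is up to the site; the header itself does not enforce anything.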

SSL

Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are security protocols designed to provide security and data integrity for Internet communications. When Netscape launched the first version of its web browser, Netscape Navigator, in 1994, it introduced the HTTPS protocol with SSL encryption, which is the origin of SSL. The IETF standardized SSL and published the TLS 1.0 standard (RFC 2246) in 1999. TLS 1.1 (RFC 4346, 2006), TLS 1.2 (RFC 5246, 2008), and TLS 1.3 (RFC 8446, 2018) followed. The protocol is widely used in applications such as browsers, e-mail, instant messaging, VoIP, and Internet fax. Many websites, such as Google, Facebook, and Wikipedia, also use it to create secure connections and send information. It has become the industry standard for confidential communication on the Internet.

SSL includes a record layer and a transport layer. The record layer protocol determines how transport layer data is encapsulated. The transport layer security protocol uses X.509 certificates and asymmetric encryption to authenticate the communicating party, and then exchanges a symmetric key to serve as the session key. This session key encrypts the data exchanged by the two parties, ensuring the confidentiality and reliability of the communication between the two applications so that it cannot be eavesdropped on by an attacker.
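As a small illustration of the client side, the sketch below creates a TLS context with Python's standard ssl module and inspects its settings without opening any connection. Requiring TLS 1.2 as the minimum version is an assumption made for the example, not something the text prescribes.

```python
import ssl

# A client context with the library's default security settings.
context = ssl.create_default_context()

# Assumption for this example: refuse anything older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

# The default context requires a valid X.509 certificate from the server
# and checks that the certificate matches the requested host name.
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

Passing such a context to a client socket or HTTP library makes the handshake fail if the server's certificate cannot be verified.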
