A Uniform Resource Locator (URL) is a standardized address for locating a unique resource on the internet, such as a file or an app. Users recognize it as the string of text shown in the browser address bar for every web page, or as the target of a link to another internet location. A URL is a type of Uniform Resource Identifier (URI).
It all began with the search for a method that would allow the sharing of information on computers. Three events highlight the roadmap towards the introduction of URLs:
- In 1969, the Advanced Research Projects Agency Network (ARPANET) sent the first node-to-node transmission from UCLA to the Stanford Research Institute, using the ARPANET 1822 Protocol. The network initially consisted of the Physical, Data, and Network layers; a Transport layer was added in 1970 with the introduction of the Network Control Protocol (NCP).
- In 1983, ARPANET switched from NCP to the Transmission Control Protocol/Internet Protocol (TCP/IP).
- In 1989, Tim Berners-Lee created the World Wide Web (WWW), introducing the markup language, resource identifiers, and protocols for retrieving resources.
When Berners-Lee introduced the WWW to the world, data transmissions were stabilizing. By the early 1990s, the internet had advanced enough to serve up files, mail, and data, using protocols such as Gopher, FTP, and Telnet.
The time was therefore ripe, in 1992, for Berners-Lee to define the URL as a tool for linking to the location of any internet resource a user may require. Although his focus was on URLs for the web and HTTP, his vision incorporated most internet protocols.
Components of a URL
The four main components of URLs are the protocol, domain, path, and query.
Let us have a closer look at the different URL components, using the following example:
https://www.example.com/category-A/subcategory-A1/model-123.html
The protocol or scheme of a URL indicates the method that will be used for transmitting or exchanging data. The most familiar schemes are the Hypertext Transfer Protocol (HTTP) and Hypertext Transfer Protocol Secure (HTTPS), which are used to transfer web pages. FTP (for files) and mailto (for email addresses) are examples of other schemes.
In the example URL above, https is the scheme, indicating the secure HTTPS protocol; the :// that follows separates it from the domain.
The domain or hostname of a URL is a user-friendly expression of the Internet Protocol (IP) address of a website. It points to the location of the website's host server.
In the example above, the domain is www.example.com.
The path that follows the domain name inside a URL points to a specific file or other resource location. It can be followed by a query string.
In our example URL, /category-A/subcategory-A1/model-123.html is the path, which here leads to a product page.
The query string is frequently used for internal searches and is introduced by a question mark (?). It should not be confused with the fragment identifier, which is introduced by a hash sign (#) and points to a location within the resource.
A URL with such a query string is, for example, the result of a user entering the search term “Model 123” on the subcategory A1 page. The landing page in this case is either the product page of model 123 or a list of search results that contain the term “Model 123”.
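As a minimal sketch of how these components can be separated in practice, Python's standard urllib.parse module splits a URL into scheme, domain, path, and query. The search URL below and its parameter name q are hypothetical, chosen only to fit the article's example paths.

```python
from urllib.parse import urlsplit

# Hypothetical search URL; the parameter name "q" is an assumption.
url = "https://www.example.com/category-A/subcategory-A1/?q=Model+123"

parts = urlsplit(url)

print(parts.scheme)  # protocol / scheme
print(parts.netloc)  # domain / hostname
print(parts.path)    # path
print(parts.query)   # query string, without the leading "?"
```

For this input, the four printed values are https, www.example.com, /category-A/subcategory-A1/, and q=Model+123.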
Characters allowed in URLs
The characters that are allowed in URLs are specified in RFC 3986, the generic URI syntax standard published by the Internet Engineering Task Force (IETF). The specification defines sets of unreserved and reserved characters.
- Unreserved characters: These are not reserved and can be used freely in URIs and URLs. They include all uppercase and lowercase letters, all decimal digits, hyphens, underscores, tildes, and periods.
- Reserved characters: These serve special purposes within a URI/URL. They include / ; : ? @ & , + $ =. RFC 3986 additionally reserves # [ ] ! ' ( ) *.
Reserved characters are used for delimiting or other special purposes inside a URL and cannot be used in any other way within a URL unless they are URL-encoded (percent-encoded). This means that a literal question mark or plus sign inside a URL string must be written in its encoded form: %3F for "?" and %2B for "+".
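The encoding rule can be illustrated with a short sketch using Python's standard urllib.parse module. The sample strings are made up for illustration; passing safe="" asks quote to encode every reserved character, including the forward slash it would otherwise leave alone.

```python
from urllib.parse import quote, unquote

# Made-up sample containing two reserved characters, "+" and "?".
raw = "a+b?"

encoded = quote(raw, safe="")   # percent-encode all reserved characters
decoded = unquote(encoded)      # reverse the encoding

print(encoded)  # a%2Bb%3F
print(decoded)  # a+b?

# Unreserved characters (letters, digits, "-", "_", ".", "~") pass through unchanged.
unreserved = "Model-123_v2.0~x"
print(quote(unreserved, safe="") == unreserved)  # True
```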
Examples of the usage of reserved characters inside a URL
In our example URL:
- Forward slash: "/" is used to delineate parts of a URL, for example, by separating the file path from the domain.
- Question mark: "?" is used at the start of a query.
- Equal sign: "=" is used between a parameter name and the value given for that name.
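To show how "?" and "=" work together, the following sketch builds a query string from name/value pairs and parses it back, using Python's standard urllib.parse module. The parameter names q and sort and the base URL are assumptions made for illustration.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical parameters; the names "q" and "sort" are assumptions.
params = {"q": "Model 123", "sort": "price"}

# urlencode joins each name and value with "=" and separates pairs with "&".
query = urlencode(params)
print(query)  # q=Model+123&sort=price

# "?" marks the start of the query; "/" separates the path segments before it.
url = "https://www.example.com/category-A/subcategory-A1/?" + query
print(url)

# parse_qs reverses the process; each name maps to a list of values.
print(parse_qs(query))  # {'q': ['Model 123'], 'sort': ['price']}
```

Note that urlencode writes the space in "Model 123" as "+", which is one reason a literal plus sign has to be encoded as %2B inside a query.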
Differences between URL and domain
Although a URL can consist of little more than a domain, as in the case of a homepage, most URLs contain much more information than just the domain name. URLs can also contain resource addresses, file paths, and queries.
The domain name is a user-friendly method of listing an IP address, making it easy to remember.
Components of a domain
A domain is divided into three hierarchical levels, which are read from right to left within the domain component of a URL.
- Top-level domain: domain name extension, such as .com, .net, .biz, .de.
- Mid-level domain: the component with the most flexibility. It can be an organization’s name or a phrase. It is also known as the second-level domain.
- Prefix: the www prefix in front of the mid-level domain, which is technically a subdomain. It is not required, but it initially distinguished web URLs from other URL schemes.
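The three levels can be sketched by splitting a hostname on its periods and reading the labels from right to left. The helper split_domain below is a hypothetical illustration, not a standard function, and it is deliberately naive: it assumes a single-label top-level domain, so it would mislabel multi-part suffixes such as .co.uk.

```python
def split_domain(hostname: str) -> dict:
    """Naively split a hostname into the three levels described above.

    Assumes a single-label top-level domain (e.g. .com, .de).
    """
    labels = hostname.lower().split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a mid-level and a top-level domain")
    return {
        "top_level": labels[-1],          # rightmost label
        "mid_level": labels[-2],          # second-level domain
        "prefix": ".".join(labels[:-2]),  # everything to the left, may be empty
    }


print(split_domain("www.example.com"))
# {'top_level': 'com', 'mid_level': 'example', 'prefix': 'www'}
```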
Absolute vs relative URLs
If a website contains a number of pages branching off from its home or category pages, links to other web pages can use relative URLs if the resources are on the same server as the referencing page. Relative URLs are less cumbersome than absolute URLs, but the webmaster must be very careful to avoid broken links if resources are moved frequently. It may be better to err on the side of caution with the consistent use of absolute URLs.
On the web page https://www.example.com/category-A/, a link points to a product in a subcategory:
This relative URL will send the user to the web page describing the product “Model 123” if all the product pages in subcategory-A1 are on the same server as the Category A page.
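How a browser resolves a relative URL against the referencing page can be sketched with urljoin from Python's standard urllib.parse module. The relative link subcategory-A1/model-123.html is an assumption, chosen to be consistent with the example paths used in this article.

```python
from urllib.parse import urljoin

base = "https://www.example.com/category-A/"
relative = "subcategory-A1/model-123.html"  # assumed relative link

# Resolved against the directory of the referencing page.
print(urljoin(base, relative))
# https://www.example.com/category-A/subcategory-A1/model-123.html

# Without the trailing slash, "category-A" is treated as a file name and
# is replaced, which is one way relative links end up broken.
print(urljoin("https://www.example.com/category-A", relative))
# https://www.example.com/subcategory-A1/model-123.html

# A root-relative link starts from the domain, wherever the page lives.
print(urljoin(base, "/category-A/subcategory-A1/model-123.html"))
# https://www.example.com/category-A/subcategory-A1/model-123.html
```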