Introduction
Parsing URLs is a common task in web development, and one that seems simple but can quickly get complex. It's not often that you come across a module that makes parsing so easy that you barely have to think about it yourself.
Despite being relatively young (published in June 2021), with almost 3 million weekly downloads, parse-url is one of the top modules for parsing URLs in JavaScript.
In this article, we'll be taking a look at all of its possibilities and how it makes our lives easier.
Parts of a URL
A URL has many parts, so being able to break it into pieces without resorting to manual string parsing is a powerful tool to have. Every URL has the same main parts, while other parts are optional depending on the query or action.
The constituent elements of a URL are:
- Scheme - used to identify the protocol used to fetch the resource on the Internet
  - Some of the more popular schemes are: HTTP, HTTPS, FTP, SSH, Git, etc.
- Host - the name of the host that has the resource we're getting (www.somehost.com)
- Path - the path to the resource located on the host (www.somehost.com/path/to/index.html)
- Query string - a string containing key-value pairs (www.somehost.com/index?key=value&key2=value2)
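For comparison, Node's built-in WHATWG `URL` class splits a URL into the same main parts. It is a separate API and not part of parse-url. The quick sketch below uses the example host from the list above:

```javascript
// Node's built-in WHATWG URL class (a global in modern Node), shown only for comparison with parse-url
const url = new URL('https://www.somehost.com/path/to/index.html?key=value&key2=value2');

console.log(url.protocol);                 // https:  (the scheme, with a trailing colon)
console.log(url.host);                     // www.somehost.com
console.log(url.pathname);                 // /path/to/index.html
console.log(url.search);                   // ?key=value&key2=value2
console.log(url.searchParams.get('key2')); // value2
```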
These are the main chunks of a URL, but we'll see that parse-url can retrieve even more, in a readable and easily parsable format.
Installing and Setting up the parse-url Module
We start by creating a folder for our mini project called parse_url. In the folder, we can install the module using npm:
$ npm i parse-url
To use the module in our code (in the index.js file), we must require it:
const parseUrl = require('parse-url');
That's it, we're good to go! Let's see what this module offers.
Parsing the URL
To start, let's take a simple URL: https://www.stackabuse.com. The parseUrl function takes two parameters, string_url and normalize, with normalize being optional.
By default, normalize is set to false, and the URLs being supplied are assumed to be already normalized. When it's true, parseUrl transforms a non-normalized URL into a normalized one. For example:
someRandomUrl.com:80 --> http://someRandomUrl.com
This is called URL normalization. The parse-url module bases its normalization on the normalize-url module, which works exactly as shown above.
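The following sketch makes the idea concrete by reproducing the kind of transformation shown above with Node's built-in `URL` class. The `normalizeSimple` helper is hypothetical and far simpler than normalize-url. It only adds `http://` when no scheme is present, and it relies on `URL` to drop the default port and lowercase the host. Because of the lowercasing, its output is `http://somerandomurl.com` rather than matching the example's casing, and in general it is not guaranteed to match what parse-url produces.

```javascript
// Hypothetical helper for illustration only; normalize-url does considerably more.
function normalizeSimple(input) {
  // Assume http:// when the string doesn't already start with "scheme://"
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  const url = new URL(hasScheme ? input : `http://${input}`);

  // URL lowercases the host and drops default ports (80 for http, 443 for https)
  let href = url.href;

  // URL gives a bare origin a "/" path; strip it when nothing follows it
  if (url.pathname === '/' && href.endsWith('/')) {
    href = href.slice(0, -1);
  }
  return href;
}

console.log(normalizeSimple('someRandomUrl.com:80'));
// http://somerandomurl.com
console.log(normalizeSimple('stackabuse.com:3000/path/to/index.html#anchor'));
// http://stackabuse.com:3000/path/to/index.html#anchor
```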
Let's parse a URL:
const url = 'https://www.stackabuse.com/';
const parsedUrl = parseUrl(url);
console.log(parsedUrl)
The output is a plain JavaScript object containing the elements of that URL:
{
protocols: [ 'https' ],
protocol: 'https',
port: null,
resource: 'www.stackabuse.com',
user: '',
pathname: '',
hash: '',
search: '',
href: 'https://www.stackabuse.com',
query: [Object: null prototype] {}
}
As you can see, a lot of things were extracted, although some are empty since the URL we've provided is pretty bare. Let's take a look at the fields of this object:
- protocols - list of protocols used in the URL (there can be more than one)
- protocol - the first of protocols
- port - a port (if supplied)
- resource - the host
- user - user at the host's server (user@host)
- pathname - path to the resource
- hash - if supplied, the info after the # (hash), usually an anchor on a web page
- search - a query string
- href - the full URL
- query - the query string parsed into key-value pairs
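To see where each of these fields comes from, the sketch below rebuilds a similarly shaped object from Node's built-in `URL`. `toParseUrlShape` is a hypothetical helper. It is not how parse-url is implemented and it differs in edge cases. For example, `URL` reports a pathname of `/` where parse-url gave `''`, and it returns a regular object for `query` rather than a null-prototype one. Splitting the scheme on `+` shows how a Git-style URL such as the one used later in this article can yield more than one protocol.

```javascript
// Hypothetical helper, for illustration only: NOT parse-url's implementation.
function toParseUrlShape(input) {
  const u = new URL(input);
  const scheme = u.protocol.slice(0, -1);     // drop the trailing ':' -> 'git+ssh'
  const protocols = scheme.split('+');        // 'git+ssh' -> ['git', 'ssh']

  return {
    protocols,
    protocol: protocols[0],                    // first of protocols
    port: u.port ? Number(u.port) : null,      // URL stores '' for a missing or default port
    resource: u.hostname,                      // the host
    user: u.username,                          // the part before '@'
    pathname: u.pathname,
    hash: u.hash.replace(/^#/, ''),            // without the leading '#'
    search: u.search.replace(/^\?/, ''),       // without the leading '?'
    href: u.href,
    query: Object.fromEntries(u.searchParams), // key-value pairs from the query string
  };
}

console.log(toParseUrlShape('git+ssh://git@somehost.com:30/path/to/resource.git?key1=value1&key2=value2#anchor'));
```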
GitHub links make for an interesting example, since they were one of the reasons this module was created in the first place. They can get pretty complex and convoluted compared to the URLs you see on a daily basis, and can include multiple protocols and users:
const url = 'git+ssh://git@somehost.com/path/to/resource.git';
const parsedUrl = parseUrl(url);
console.log(parsedUrl)
This results in:
{
protocols: [ 'git', 'ssh' ],
protocol: 'git',
port: null,
resource: 'somehost.com',
user: 'git',
pathname: '/path/to/resource.git',
hash: '',
search: '',
href: 'git+ssh://git@somehost.com/path/to/resource.git',
query: [Object: null prototype] {}
}
The protocols list now has two entries, since the URL uses multiple protocols. The protocol field, however, only holds the first one. We can also see that pathname is now filled with the path to the resource.
One of the selling points of parse-url is how well it works with Git URLs.
Let's make the URL more complex by adding a port, a hash and a couple of key-value query parameters:
const url = 'git+ssh://git@somehost.com:30/path/to/resource.git?key1=value1&key2=value2#anchor';
const parsedUrl = parseUrl(url);
console.log(parsedUrl)
This URL differs only slightly from the previous one, just enough to fill in the values that were empty before. The output will be:
{
protocols: [ 'git', 'ssh' ],
protocol: 'git',
port: 30,
resource: 'somehost.com',
user: 'git',
pathname: '/path/to/resource.git',
hash: 'anchor',
search: 'key1=value1&key2=value2',
href: 'git+ssh://git@somehost.com:30/path/to/resource.git?key1=value1&key2=value2#anchor',
query: [Object: null prototype] { key1: 'value1', key2: 'value2' }
}
The port, hash and query are now present, and we've even got the query's keys and values parsed for us! Having the parsed data structured in a human-readable, easily processed format is a real help when working with URLs.
However, this is only the pretty-printed output of the returned object. What really lets us work with these parsed elements is that they're all fields of the returned object, which we can easily access:
console.log("The protocols used in the URL are " + parsedUrl.protocols);
console.log("The port used in the URL is " + parsedUrl.port);
console.log("The resource in the URL is " + parsedUrl.resource);
console.log("The user in the URL is " + parsedUrl.user);
console.log("The pathname in the URL is " + parsedUrl.pathname);
console.log("The hash in the URL is " + parsedUrl.hash);
console.log("The search part in the URL is " + parsedUrl.search);
console.log("Full URL is " + parsedUrl.href);
Running this code results in:
The protocols used in the URL are git,ssh
The port used in the URL is 30
The resource in the URL is somehost.com
The user in the URL is git
The pathname in the URL is /path/to/resource.git
The hash in the URL is anchor
The search part in the URL is key1=value1&key2=value2
Full URL is git+ssh://git@somehost.com:30/path/to/resource.git?key1=value1&key2=value2#anchor
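When several fields are needed at once, object destructuring combined with a template literal can be tidier than repeated string concatenation. In the sketch below, the relevant fields from the output above are copied into a literal so that the snippet runs without parse-url installed. That copy is an assumption of this example and not something the module does.

```javascript
// Literal copy of some fields parse-url returned above (so this snippet runs standalone)
const parsedUrl = {
  protocols: ['git', 'ssh'],
  port: 30,
  resource: 'somehost.com',
  user: 'git',
  pathname: '/path/to/resource.git',
};

// Pull out only the fields we need
const { protocols, port, resource, user, pathname } = parsedUrl;

// A template literal instead of the "+" concatenation used above
const summary = `${user}@${resource}:${port}${pathname} (${protocols.join('+')})`;
console.log(summary); // git@somehost.com:30/path/to/resource.git (git+ssh)
```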
Finally, let's see the results of URL normalization. Suppose we pass an unnormalized URL, such as stackabuse.com:3000/path/to/index.html#anchor, as the URL string:
const url = 'stackabuse.com:3000/path/to/index.html#anchor';
const parsedUrl = parseUrl(url, true);
console.log(parsedUrl);
This results in:
{
protocols: [ 'http' ],
protocol: 'http',
port: 3000,
resource: 'stackabuse.com',
user: '',
pathname: '/path/to/index.html',
hash: 'anchor',
search: '',
href: 'http://stackabuse.com:3000/path/to/index.html#anchor',
query: [Object: null prototype] {}
}
We can see that the parser automatically assigned http as the protocol and filled out the href property correctly. The missing parts are not filled in, since they weren't supplied to begin with.
If we leave normalization disabled (the default) while providing a non-normalized URL, the results are off, and the whole string is treated as a file path:
{
protocols: [],
protocol: 'file',
port: null,
resource: '',
user: '',
pathname: 'stackabuse.com:3000/path/to/index.html',
hash: 'anchor',
search: '',
href: 'stackabuse.com:3000/path/to/index.html#anchor',
query: [Object: null prototype] {}
}
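Node's built-in `URL` class runs into a similar problem with scheme-less input, which may help explain the misreading. The following sketch uses `URL`, not parse-url, so the details differ. Because `stackabuse.com` is a syntactically valid scheme name, the parser takes everything before the first colon as the protocol and never finds a host:

```javascript
// A scheme-less string fed straight to the built-in URL parser (not parse-url)
const raw = 'stackabuse.com:3000/path/to/index.html#anchor';
const misread = new URL(raw);

console.log(misread.protocol); // stackabuse.com:  (mistaken for the scheme)
console.log(misread.hostname); // (empty string: no host was recognized)
console.log(misread.pathname); // 3000/path/to/index.html
console.log(misread.hash);     // #anchor
```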
Note: If you set normalize to true and supply an already-normalized URL, it's still parsed correctly, although normalize-url may apply small changes, such as stripping a leading www. Given this, you'll typically want to set the parameter to true.
Since parsedUrl is an object, its properties can be changed. We can simply access any property and change it:
console.log(parsedUrl.port) // 3000
parsedUrl.port = 4000
console.log(parsedUrl.port) // 4000
However, this is generally not a good idea. The module is meant only for parsing URLs, and the returned object is plain data whose fields aren't kept in sync, so changing port leaves href pointing at the old port. Only alter the parsedUrl object like this when you're confident in the value of the property you're setting; otherwise, you might be shooting yourself in the foot.
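The following sketch shows the sync problem, assuming (as the printed output suggests) that parse-url returns a plain data object. A literal stands in for that object, and Node's built-in `URL` is shown for contrast because it recomputes href when a component is set.

```javascript
// A plain object (standing in for parse-url's result) does not keep its fields in sync
const parsed = { port: 3000, href: 'http://stackabuse.com:3000/path/to/index.html#anchor' };
parsed.port = 4000;
console.log(parsed.href); // http://stackabuse.com:3000/path/to/index.html#anchor  (still the old port)

// The built-in URL class updates href whenever a component is set
const url = new URL(parsed.href);
url.port = '4000';
console.log(url.href); // http://stackabuse.com:4000/path/to/index.html#anchor
```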
Conclusion
We've seen that parse-url lets us parse URLs without any additional processing, and makes the process extremely simple and readable.
It splits everything up as desired and returns a parsedUrl object whose fields can be accessed, and changed, just like any other object's. The module is as simple as they come, with neat output and straightforward syntax, producing quick and precise results.