Easily Parse URLs in JavaScript with parse-url

Introduction

Parsing URLs is a common task in web development, and one that looks simple but can get complex. It's not often that you come across a module that makes parsing easy enough that you barely have to think about it.

Despite being relatively young (published in June 2021), parse-url has almost 3 million weekly downloads, making it one of the top modules for parsing URLs in JavaScript.

In this article, we'll take a look at what it can do and how it makes our lives easier.

Parts of a URL

A URL has many parts, so being able to break it into pieces without hand-written string parsing is valuable. Every URL has the same main parts, with other parts being optional depending on the query or action.

The constituent elements of a URL are:

Scheme - identifies the protocol used to fetch the resource on the Internet
- Some of the more common schemes are: http, https, ftp, ssh, mailto, file, etc.
Host - the name of the host that has the resource we're getting (www.somehost.com)
Path - the path to the resource located on the host (www.somehost.com/path/to/index.html)
Query string - string containing key-value pairs (www.somehost.com/index?key=value&key2=value2)

These are the main chunks of the URL, but we'll see that parse-url lets us retrieve even more, in a format that is both readable and easy to process.
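Before bringing in parse-url, the four parts listed above can be seen with Node's built-in WHATWG URL class. This is only a point of comparison, not how parse-url works internally; the URL below is an illustrative one assembled from the list's examples, and the `parts` object is a hypothetical name.

```javascript
// Uses the global URL class that ships with Node (no install needed), not parse-url.
const example = new URL('https://www.somehost.com/path/to/index.html?key=value&key2=value2');

// Collect the four main parts named in the list above.
const parts = {
  scheme: example.protocol.replace(/:$/, ''), // URL reports 'https:', so drop the colon
  host: example.hostname,
  path: example.pathname,
  query: Object.fromEntries(example.searchParams), // key-value pairs as a plain object
};

console.log(parts);
```

Note that the built-in class names things slightly differently (for example `protocol` includes the trailing colon), which is one reason a dedicated parser with a flatter output can be more convenient.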

Installing and Setting up the parse-url Module

We start by creating a folder for our mini project called parse_url. In the folder, we can install the module using npm:

$ npm i parse-url

To use the module in our code (in the index.js file), we must require it:

const parseUrl = require('parse-url');

That's it, we're good to go! Let's see what this module offers.

Parsing the URL

To start, let's take a simple URL: https://www.stackabuse.com. The parseUrl function takes two parameters, string_url and normalize, with normalize being optional.

By default, normalize is set to false, and the URL being supplied is assumed to be normalized already. When it is true, a non-normalized URL is transformed into a normalized one before parsing. For example:

someRandomUrl.com:80 --> http://someRandomUrl.com

This is called URL normalization. The parse-url module delegates its normalization to the normalize-url module, which performs the kind of transformation shown above.
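To make the idea concrete, here is a minimal sketch of this kind of normalization using only Node's built-in URL class. It is not normalize-url's implementation, which handles many more cases; `roughNormalize` is a hypothetical helper name, and the default of `http` for a missing scheme is an assumption taken from the example above.

```javascript
// A rough, illustrative normalizer. NOT the normalize-url algorithm.
function roughNormalize(input) {
  // Does the string already start with something like "https://" or "git+ssh://"?
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);

  // Assume http when no scheme was given, then let the URL class clean up the rest.
  const url = new URL(hasScheme ? input : `http://${input}`);

  // Drop the trailing slash when nothing follows the host.
  const isBareRoot = url.pathname === '/' && url.search === '' && url.hash === '';
  return isBareRoot ? url.href.replace(/\/$/, '') : url.href;
}

console.log(roughNormalize('someRandomUrl.com:80'));
// http://somerandomurl.com
console.log(roughNormalize('stackabuse.com:3000/path/to/index.html#anchor'));
// http://stackabuse.com:3000/path/to/index.html#anchor
```

The URL class drops port 80 because it is the default for http, and it also lowercases the host name, which is why the first result is all lowercase.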

Let's parse a URL:

const url = 'https://www.stackabuse.com';
const parsedUrl = parseUrl(url);

console.log(parsedUrl);

The output is a JavaScript object containing the elements of that URL:

{
  protocols: [ 'https' ],
  protocol: 'https',
  port: null,
  resource: 'www.stackabuse.com',
  user: '',
  pathname: '',
  hash: '',
  search: '',
  href: 'https://www.stackabuse.com',
  query: [Object: null prototype] {}
}

As you can see, a lot of things were extracted, although some are empty since the URL we've provided is pretty bare. Let's take a look at the fields of this object:

protocols - list of protocols used in the URL (can be more than one)
protocol - the first entry of protocols
port - the port (if supplied)
resource - the host
user - the user at the host's server (user@host)
pathname - the path to the resource
hash - the fragment after the # (if supplied), usually an anchor on a web page
search - the query string
href - the full URL
query - the query string parsed into an object of key-value pairs

An interesting example comes from GitHub links, which were one of the reasons this module was created in the first place. GitHub links can get pretty complex and convoluted compared to other URLs you see on a daily basis, and can include multiple protocols and users:

const url = 'git+ssh://git@somehost.com/path/to/resource.git';
const parsedUrl = parseUrl(url);

console.log(parsedUrl);

This results in:

{
  protocols: [ 'git', 'ssh' ],
  protocol: 'git',
  port: null,
  resource: 'somehost.com',
  user: 'git',
  pathname: '/path/to/resource.git',
  hash: '',
  search: '',
  href: 'git+ssh://git@somehost.com/path/to/resource.git',
  query: [Object: null prototype] {}
}

The protocols list has changed here, since there are multiple protocols in use. However, only the first one is stored in the protocol field. We can also see that pathname is now filled with the path to the resource.

One of the selling points of parse-url is the fact that it works so well with Git URLs.
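Because `protocol` only holds the first entry, code that cares about the transport should look at the whole `protocols` array. The sketch below uses an object literal copied from the output above as a stand-in for the value returned by `parseUrl`; `usesSsh` and `schemeOf` are hypothetical helper names, not part of parse-url.

```javascript
// Stand-in for the parsed Git URL shown above (only the fields we need).
const parsedGitUrl = {
  protocols: ['git', 'ssh'],
  protocol: 'git',
  resource: 'somehost.com',
  user: 'git',
  pathname: '/path/to/resource.git',
};

// Check the full list, since `protocol` alone would report only 'git'.
const usesSsh = (parsed) => parsed.protocols.includes('ssh');

// Rebuild the combined scheme as it appeared in the original URL.
const schemeOf = (parsed) => parsed.protocols.join('+');

console.log(usesSsh(parsedGitUrl));  // true
console.log(schemeOf(parsedGitUrl)); // git+ssh
```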

Let's extend the URL further and include a port, a hash, and a couple of key-value query parameters:

const url = 'git+ssh://git@somehost.com:30/path/to/resource.git?key1=value1&key2=value2#anchor';
const parsedUrl = parseUrl(url);

console.log(parsedUrl);

This example differs only slightly from the previous one, just enough to fill in the values that were empty before. The output will be:

{
  protocols: [ 'git', 'ssh' ],
  protocol: 'git',
  port: 30,
  resource: 'somehost.com',
  user: 'git',
  pathname: '/path/to/resource.git',
  hash: 'anchor',
  search: 'key1=value1&key2=value2',
  href: 'git+ssh://git@somehost.com:30/path/to/resource.git?key1=value1&key2=value2#anchor',
  query: [Object: null prototype] { key1: 'value1', key2: 'value2' }
}

The port, hash and query are present now - and we've even got the keys and values for the query! Having the parsed data in a structured, human-readable form that is also easy to process programmatically is a real help when working with URLs.

This is only the pretty-printed output of the returned object, though. What really lets us work with these parsed elements is the fact that they're all fields of the returned object, which we can easily access:

console.log("The protocols used in the URL are " + parsedUrl.protocols);
console.log("The port used in the URL is " + parsedUrl.port);
console.log("The resource in the URL is " + parsedUrl.resource);
console.log("The user in the URL is " + parsedUrl.user);
console.log("The pathname in the URL is " + parsedUrl.pathname);
console.log("The hash in the URL is " + parsedUrl.hash);
console.log("The search part in the URL is " + parsedUrl.search);
console.log("Full URL is " + parsedUrl.href);

Running this code results in:

The protocols used in the URL are git,ssh
The port used in the URL is 30
The resource in the URL is somehost.com
The user in the URL is git
The pathname in the URL is /path/to/resource.git
The hash in the URL is anchor
The search part in the URL is key1=value1&key2=value2
Full URL is git+ssh://git@somehost.com:30/path/to/resource.git?key1=value1&key2=value2#anchor
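The query field deserves a closer look. The pretty-printed output labels it `[Object: null prototype]`, which means it does not inherit the usual methods from `Object.prototype`. The sketch below recreates such an object by hand as a stand-in for `parsedUrl.query` and shows a safe way to read it; `hasKey` is a hypothetical helper name.

```javascript
// Stand-in for parsedUrl.query: an object with no prototype, as the output suggests.
const query = Object.assign(Object.create(null), { key1: 'value1', key2: 'value2' });

// Inherited helpers such as hasOwnProperty are not available on it.
console.log(typeof query.hasOwnProperty); // undefined

// Object.entries works regardless of the prototype.
for (const [key, value] of Object.entries(query)) {
  console.log(`${key} = ${value}`);
}

// Borrow hasOwnProperty from Object.prototype when a key check is needed.
const hasKey = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

console.log(hasKey(query, 'key1'));    // true
console.log(hasKey(query, 'missing')); // false
```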

Finally, let's see the results of URL normalization. Here we pass a non-normalized URL, stackabuse.com:3000/path/to/index.html#anchor, and set normalize to true:

const url = 'stackabuse.com:3000/path/to/index.html#anchor';
const parsedUrl = parseUrl(url, true);
console.log(parsedUrl);

This results in:

{
  protocols: [ 'http' ],
  protocol: 'http',
  port: 3000,
  resource: 'stackabuse.com',
  user: '',
  pathname: '/path/to/index.html',
  hash: 'anchor',
  search: '',
  href: 'http://stackabuse.com:3000/path/to/index.html#anchor',
  query: [Object: null prototype] {}
}

We can see that the parser automatically assigned http as the protocol and filled out the href property correctly. The remaining fields, such as user and search, stay empty, since the URL didn't contain them to begin with.

If we were to disable the normalization feature, while providing a non-normalized URL, the results would be off:

{
  protocols: [],
  protocol: 'file',
  port: null,
  resource: '',
  user: '',
  pathname: 'stackabuse.com:3000/path/to/index.html',
  hash: 'anchor',
  search: '',
  href: 'stackabuse.com:3000/path/to/index.html#anchor',
  query: [Object: null prototype] {}
}

Note: If you set normalize to true and supply an already-normalized URL, nothing really changes, and it's parsed correctly. Given this, you'll typically want to set the parameter to true.

Since parsedUrl is an object, its properties can be changed. We can simply access any property and change it:

console.log(parsedUrl.port); // 3000
parsedUrl.port = 4000;
console.log(parsedUrl.port); // 4000

However, this is not the intended use and generally shouldn't be done, since the module exists solely to parse URLs. Only alter the parsedUrl object like this when you are confident in the new value of the property; otherwise, you might be shooting yourself in the foot.
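One concrete risk is that the fields can drift out of sync. The sketch below assumes the returned value is a plain data object, as the printed output suggests, and uses an object literal as a stand-in for it. Changing `port` leaves `href` untouched, whereas Node's built-in URL class (used here only for contrast) keeps the two consistent.

```javascript
// Stand-in for the normalized result shown earlier (assumed to be plain data).
const parsedUrl = {
  protocol: 'http',
  port: 3000,
  resource: 'stackabuse.com',
  pathname: '/path/to/index.html',
  hash: 'anchor',
  href: 'http://stackabuse.com:3000/path/to/index.html#anchor',
};

// Mutating one field does not update the others.
parsedUrl.port = 4000;
const staleHref = parsedUrl.href;
console.log(staleHref); // http://stackabuse.com:3000/path/to/index.html#anchor

// If a modified URL is needed, one option is to rebuild it with the URL class.
const rebuilt = new URL(parsedUrl.href);
rebuilt.port = '4000';
console.log(rebuilt.href); // http://stackabuse.com:4000/path/to/index.html#anchor
```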

Conclusion

We've seen that parse-url lets us parse URLs without any additional processing, and keeps the process simple and readable.

It splits everything up as desired and creates a parsedUrl object that can be accessed, and changed, just like any other object. The module is as simple as they come, with neat output and straightforward syntax, and gives quick and precise results.

Last Updated: October 21st, 2021