Web Browser Automation with Selenium and Java

Introduction

Several tools can drive a web browser the way a real user would: navigating to different pages, interacting with the elements of a page, and capturing data. This process is called web browser automation. What you do with it depends only on your imagination and needs.

Some common use cases of web browser automation are:

  • Automating manual tests of a web application
  • Automating repetitive tasks like scraping information from websites
  • Filling out HTML forms, doing administrative jobs, etc.

In this tutorial we'll explore one of the most popular web browser automation tools - Selenium. We'll learn about its features, the API and how we can use it with Java to automate any website.

What is Selenium?

Selenium is a collection of tools that includes Selenium IDE, Selenium RC, and Selenium WebDriver.

Selenium IDE is purely a record-and-playback tool that ships as a Firefox plugin and a Chrome extension. Selenium RC is the legacy tool and is now deprecated. Selenium WebDriver is the latest and most widely used tool.

Note: The terms Selenium, Selenium WebDriver, or simply WebDriver, are used interchangeably to refer to Selenium WebDriver.

It is important to note here that Selenium is built to interact with web components only. So if you encounter any desktop-based components like a Windows dialog, Selenium on its own cannot interact with them. There are other types of tools like AutoIt or Automa that can be integrated with Selenium for these purposes.

Why use Selenium?

Selenium is one of the most popular browser automation tools. It is not tied to a particular programming language and supports Java, Python, C#, Ruby, PHP, Perl, etc. You can also write your own bindings for a language that isn't already supported.

In this tutorial, we'll learn how to use the Java bindings of Selenium WebDriver. We'll also explore the WebDriver API.

Selenium's success can also be attributed to the fact that the WebDriver specification has become a W3C Recommendation for browsers.

Prerequisites:

WebDriver provides bindings for all popular languages, as described in the previous section. Since we are using Java, we need to download the Java bindings and include them in the build path. Also, nearly every popular browser provides a driver that can be used with Selenium to drive that browser.

In this tutorial, we'll drive Google Chrome.

WebDriver

Before moving forward, it is useful to clear up a point that often confuses beginners: WebDriver is not a class, it is an interface.

All browser-specific drivers, such as ChromeDriver, FirefoxDriver, and InternetExplorerDriver, are Java classes that implement the WebDriver interface. This matters because running your program against a different browser does not require changing much of your code; you only need to swap in the driver class for the browser you want.

First, let us specify the path to the browser driver. Next, we'll instantiate the "right driver" for that browser, ChromeDriver in our case:

System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
WebDriver driver = new ChromeDriver();

As we can see, the driver variable holds a reference to a ChromeDriver instance and can therefore be used to drive the browser. When the above statement executes, you should see a new browser window open on your system. The browser has not yet opened any website, though. We need to instruct it to do so.

Note: To use a different WebDriver you need to specify the driver path in the file system and then instantiate it. For example, if you want to use IE then here is what you'd need to do:

System.setProperty("webdriver.ie.driver", "path/to/IEDriver");
WebDriver driver = new InternetExplorerDriver();
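The only browser-specific pieces so far are the system property key and the driver class. As a minimal sketch of the first part, the helper below maps a browser name to its property key using plain Java only. The class and method names are hypothetical, and the firefox entry (webdriver.gecko.driver, used by geckodriver) is an addition that does not appear in the examples above.

```java
import java.util.Locale;

// Sketch: look up the system property that tells Selenium where a driver binary lives.
// DriverProperties and propertyFor are hypothetical names used for illustration only.
final class DriverProperties {

    private DriverProperties() {}

    /** Returns the system property key for the given browser name. */
    static String propertyFor(String browser) {
        switch (browser.toLowerCase(Locale.ROOT)) {
            case "chrome":
                return "webdriver.chrome.driver";
            case "ie":
                return "webdriver.ie.driver";
            case "firefox":
                // Firefox is driven through geckodriver, hence the different key.
                return "webdriver.gecko.driver";
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
    }

    public static void main(String[] args) {
        // With Selenium on the classpath, this would be followed by, for example:
        //   System.setProperty(propertyFor("chrome"), "path/to/chromedriver");
        //   WebDriver driver = new ChromeDriver();
        System.out.println(propertyFor("chrome"));
    }
}
```

Keeping the lookup in one place means the rest of the program can depend on the WebDriver interface alone.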

From here onwards the code will exactly be the same for all browsers. To keep our learnings focused, we'll automate stackabuse.com.

As mentioned above, we first need to navigate to our target website. To do this, we simply send a GET request to the URL of the website:

driver.get("http://stackabuse.com");

WebElement

The first step in web browser automation is to locate the elements on the web page that we want to interact with, like a button, input, dropdown list, etc.

The Selenium representation of such HTML elements is the WebElement. Like WebDriver, WebElement is also a Java interface. Once we get hold of a WebElement we can perform any operation on it that an end user can do, like clicking, typing, selecting, etc.

It is obvious that attempting to perform invalid operations, like trying to enter text in a button element, will result in an exception.

We can use the HTML attributes of an element like id, class, and name to locate an element. If there are no such attributes present we can use some advanced locating techniques like CSS Selectors and XPath.

To check the HTML attributes of any element, we can open the website in Chrome (other browsers also support this), right-click the element we want to select, and click Inspect. This should open the Developer Tools and display the HTML attributes of that element:

Inspecting an element

As we can see, the element has an <input> tag and multiple attributes like id, class, etc.

WebDriver supports 8 different locators to locate elements:

  • id
  • className
  • name
  • tagName
  • linkText
  • partialLinkText
  • cssSelector
  • xpath

Let us explore all of them one by one by automating the different elements in our target website.

Locating Elements via id

If we inspect the newsletter input box of our target website we can find it has an id attribute:

<input type="email" id="email" value name="email" class="required email input-lg" placeholder="Enter your email...">

We can locate this element by using the id locator:

WebElement newsletterEmail = driver.findElement(By.id("email"));

Locating Elements via className

If we inspect the same input box we can see that it also has a class attribute.

We can locate this element by using the className locator:

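WebElement newsletterEmail = driver.findElement(By.className("email"));

Note: The locator name is className, not class, although the HTML attribute is class. By.className accepts a single class name, so we pass one of the element's classes (here, email) rather than the whole space-separated attribute value. Passing several classes at once results in an exception.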
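When an element has to be matched on several classes at once, a CSS selector of the form tag.class1.class2 can express that. The following is a small sketch in plain Java that builds such a selector from a class attribute value. The class and method names are hypothetical, and the resulting string is meant to be passed to By.cssSelector.

```java
// Sketch: build a CSS selector that requires every class in a class attribute.
// ClassSelectors and toCssSelector are hypothetical names used for illustration only.
final class ClassSelectors {

    private ClassSelectors() {}

    /** Turns a tag name and a space-separated class attribute into a CSS selector. */
    static String toCssSelector(String tag, String classAttribute) {
        StringBuilder selector = new StringBuilder(tag);
        for (String className : classAttribute.trim().split("\\s+")) {
            if (!className.isEmpty()) {
                selector.append('.').append(className);
            }
        }
        return selector.toString();
    }

    public static void main(String[] args) {
        // With Selenium on the classpath, the result could be used as:
        //   driver.findElement(By.cssSelector(selector));
        String selector = toCssSelector("input", "required email input-lg");
        System.out.println(selector); // prints "input.required.email.input-lg"
    }
}
```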

Locating Elements via name

For this example, let's imagine a drop-down list, where a user should select their age range. The drop-down list has a name attribute, which we can search for:

<select name="age">
    <option value="Yet to born">Not Born</option>
    <option value="Under 20">Under 20</option>
    <option value="20 to 29">Under 30</option>
    <option value="30 to 39">Under 40</option>
    <option value="40 to 50">Under 50</option>
    <option value="Over 50">Above 60</option>
    <option value="Ghost">Not Defined</option>
</select>

We can locate this element by using the name locator:

WebElement age = driver.findElement(By.name("age"));

Locating Elements via xpath

Sometimes, though, these approaches are not enough, as there are multiple elements with the same attribute:

<p>
    <input name="gender" type="Radio" value="Female">Female<br>
    <input name="gender" type="Radio" value="Male">Male<br>
    <input name="gender" type="Radio" value="donotknow">Still Exploring
</p>

In this example we can see that all three input elements have the same name attribute, "gender", but different values. When basic attributes like id, class, or name are not unique, we need a way to define exactly which element we'd like to fetch.

In these cases, we can use XPath locators. XPaths are very powerful locators and they are a complete topic on their own. The following example can give you an idea of how to construct an XPath for the above HTML snippets:

WebElement gender = driver.findElement(By.xpath("//input[@value='Female']"));
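To see why the name attribute alone is ambiguous here while the value attribute is not, the expressions can be tried outside the browser. The sketch below uses the JDK's built-in XPath engine (javax.xml.xpath), not Selenium, and assumes the radio-button snippet has been rewritten with closed tags so that it parses as XML. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: evaluate XPath expressions against the radio-button snippet with the JDK alone.
final class XPathSketch {

    // The radio-button snippet, with tags closed so that it parses as XML.
    static final String SNIPPET =
            "<p>"
            + "<input name=\"gender\" type=\"Radio\" value=\"Female\"/>Female<br/>"
            + "<input name=\"gender\" type=\"Radio\" value=\"Male\"/>Male<br/>"
            + "<input name=\"gender\" type=\"Radio\" value=\"donotknow\"/>Still Exploring"
            + "</p>";

    private XPathSketch() {}

    /** Counts the nodes in SNIPPET that the given XPath expression selects. */
    static int countMatches(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SNIPPET)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches("//input[@name='gender']"));  // prints 3
        System.out.println(countMatches("//input[@value='Female']")); // prints 1
    }
}
```

An expression that selects exactly one node is what we want to hand to findElement, since findElement returns only the first match.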

Locating Elements via cssSelector

Again, let's imagine a list of checkboxes where the user selects their preferred programming language:

<p>
    <input name="language_java" type="Checkbox" value="java">Java<br>
    <input name="language_python" type="Checkbox" value="python">Python<br>
    <input name="language_c#" type="Checkbox" value="c#">C#<br>
    <input name="language_c" type="Checkbox" value="c">C<br>
    <input name="language_vbs" type="Checkbox" value="vbscript">Vbscript
</p>

Technically, we could simply use the name locator for this HTML snippet, since each checkbox has a distinct name. However, in this example we'll use CSS selectors, which are used extensively on the front end with libraries like jQuery.

The following example can give you an idea how to construct CSS selectors for the previous HTML snippet:

WebElement languageC = driver.findElement(By.cssSelector("input[value=c]"));
WebElement languageJava = driver.findElement(By.cssSelector("input[value=java]"));

Evidently, it's very similar to the XPath approach.

Locating Elements via linkText

If the element is a link, i.e. it has an <a> tag, we can locate it by using its text. For example, the link "Stack Abuse":

<a href="/">Stack Abuse</a>

We can locate the link using its text:

WebElement homepageLink = driver.findElement(By.linkText("Stack Abuse"));

Locating Elements via partialLinkText

Say, we have a link with the text - "random-text-xyz-i-wont-change-random-digit-123". As previously shown, we can locate this element by using linkText locator.

However, the WebDriver API also provides the partialLinkText locator. It is useful when a portion of the link text is dynamic and changes every time you reload the page, for instance "Order #XYZ123".

In these cases we can use the partialLinkText locator:

WebElement iWontChangeLink = driver.findElement(By.partialLinkText("i-wont-change"));

The code above will successfully select our link "random-text-xyz-i-wont-change-random-digit-123" since our selector contains a substring of the link.
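The difference between the two link locators comes down to exact matching versus substring matching. The following plain-Java model illustrates that rule on a list of link texts. It is a simplified stand-in, not Selenium code, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: a plain-Java model of how linkText and partialLinkText pick a link.
final class LinkTextMatching {

    private LinkTextMatching() {}

    /** Models linkText: the whole link text must match. Returns null if nothing matches. */
    static String findByLinkText(List<String> linkTexts, String text) {
        for (String candidate : linkTexts) {
            if (candidate.equals(text)) {
                return candidate;
            }
        }
        return null;
    }

    /** Models partialLinkText: the first link whose text contains the fragment wins. */
    static String findByPartialLinkText(List<String> linkTexts, String fragment) {
        for (String candidate : linkTexts) {
            if (candidate.contains(fragment)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
                "Stack Abuse",
                "random-text-xyz-i-wont-change-random-digit-123");

        // The exact-match rule finds nothing for a fragment of the text.
        System.out.println(findByLinkText(links, "i-wont-change"));        // prints null
        // The substring rule finds the link.
        System.out.println(findByPartialLinkText(links, "i-wont-change"));
    }
}
```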

Locating Elements via tagName

We can also locate an element by its tag name, e.g. <a>, <div>, <input>, <select>, etc. Use this locator with caution: there may be multiple elements with the same tag name, and findElement always returns the first matching element on the page:

WebElement tagNameElem = driver.findElement(By.tagName("select"));

This way of finding an element is usually more useful when you're calling the findElement method on another element and not the entire HTML document. This narrows down your search and allows you to find elements using simple locators.

Interacting with Elements

So far we have located the HTML elements on the page and we're able to get the corresponding WebElement. However, we have not yet interacted with those elements like an end user would do - clicking, typing, selecting, etc. We'll explore some of these simple actions in the next few sections.

Clicking Elements

We perform a click with the click() method. We can call it on any WebElement that is clickable; otherwise it throws an exception.

In this case, let's click the homepageLink:

homepageLink.click();

Since this actually performs the click on the page, your web browser will follow the link that was programmatically clicked.

Inputting Text

Let's enter some text into the newsletterEmail input box:

newsletterEmail.sendKeys("[email protected]");

Selecting Radio Buttons

Since radio buttons are simply clicked, we use the click() method to select one:

gender.click();

Selecting Checkboxes

The same goes for selecting checkboxes. The difference is that we can select several checkboxes at once, whereas selecting another radio button deselects the previous one:

languageC.click();
languageJava.click();

Selecting Items from a Dropdown

To select an item from the dropdown list we would need to do two things:

First, we need to instantiate Select and pass it the element from the page:

Select select = new Select(age);

It is important to note here that Select is a Java class that implements the ISelect interface.

Next, we can select an item by using its:

Displayed Text:

select.selectByVisibleText("Under 30");

Value (the value attribute):

select.selectByValue("20 to 29");

Index (starts with 0):

select.selectByIndex(2);
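All three calls are meant to pick the same option, which is easy to get wrong because the value attribute and the displayed text differ. As a rough check, the plain-Java model below looks up the options of the age drop-down shown earlier in this tutorial. It is not Selenium code, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: a plain-Java model of the age drop-down, used to cross-check the three lookups.
final class DropdownModel {

    // {value attribute, displayed text} pairs copied from the age <select> shown earlier.
    static final List<String[]> OPTIONS = Arrays.asList(
            new String[] {"Yet to born", "Not Born"},
            new String[] {"Under 20", "Under 20"},
            new String[] {"20 to 29", "Under 30"},
            new String[] {"30 to 39", "Under 40"},
            new String[] {"40 to 50", "Under 50"},
            new String[] {"Over 50", "Above 60"},
            new String[] {"Ghost", "Not Defined"});

    private DropdownModel() {}

    /** Index of the option whose value attribute equals the argument, or -1. */
    static int indexOfValue(String value) {
        return indexOf(0, value);
    }

    /** Index of the option whose displayed text equals the argument, or -1. */
    static int indexOfVisibleText(String text) {
        return indexOf(1, text);
    }

    private static int indexOf(int column, String wanted) {
        for (int i = 0; i < OPTIONS.size(); i++) {
            if (OPTIONS.get(i)[column].equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfVisibleText("Under 30")); // prints 2
        System.out.println(indexOfValue("20 to 29"));       // prints 2
        System.out.println(indexOfValue("20 to 30"));       // prints -1, no such value
    }
}
```

In this model a missing value returns -1, whereas Selenium's Select throws a NoSuchElementException when no option matches.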

If the application supports multi-select we can call one or more of these methods multiple times to select different items.

To check if the application allows multiple selections we can run:

select.isMultiple();

There are lots of other useful operations that we can perform on the dropdown list:

  • Getting the list of options:
java.util.List<WebElement> options = select.getOptions();
  • Getting the list of selected options:
java.util.List<WebElement> selectedOptions = select.getAllSelectedOptions();
  • Getting the first selected option:
WebElement firstSelected = select.getFirstSelectedOption();
  • Deselecting all options:
select.deselectAll();
  • Deselecting by displayed text:
select.deselectByVisibleText("Under 30");
  • Deselecting by value:
select.deselectByValue("20 to 29");
  • Deselecting by index:
select.deselectByIndex(2);

Note: We can also combine the two steps of finding an element and interacting with it into a single statement via chaining. For instance, we can find and click the Submit button like this:

driver.findElement(By.id("submit_htmlform")).click();

We can also do this with Select:

Select select = new Select(driver.findElement(By.name("age")));

Getting Attribute Values

To get the value of a particular attribute in an element:

String classValue = driver.findElement(By.id("some-id")).getAttribute("class");

Setting Attribute Values

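We can also set the value of a particular attribute in an element, which can be useful when we want to enable or disable an element. WebElement has no method for this, so we run a small piece of JavaScript through the driver instead (see Executing JavaScript below):

((JavascriptExecutor) driver).executeScript("arguments[0].setAttribute('class', 'enabled');", driver.findElement(By.id("some-id")));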

Interacting with the Mouse and Keyboard

The WebDriver API has provided the Actions class to interact with the mouse and the keyboard.

First, we need to instantiate Actions and pass it the WebDriver instance:

Actions builder = new Actions(driver);

Moving the Mouse

Sometimes we may need to hover over a menu item that makes the submenu item appear:

WebElement elem = driver.findElement(By.id("some-id"));
builder.moveToElement(elem).build().perform();

Drag and Drop

Dragging an element over another element:

WebElement sourceElement = driver.findElement(By.id("some-id"));
WebElement targetElement = driver.findElement(By.id("some-other-id"));
builder.dragAndDrop(sourceElement, targetElement).build().perform();

Dragging an element by some pixels (e.g. 200 px horizontal and 0px vertical):

WebElement elem = driver.findElement(By.id("some-id"));
builder.dragAndDropBy(elem, 200, 0).build().perform();

Pressing Keys

Hold down a key, such as Shift, while typing some text:

WebElement elem = driver.findElement(By.id("some-id"));
builder.keyDown(Keys.SHIFT)
    .sendKeys(elem,"some value")
    .keyUp(Keys.SHIFT)
    .build()
    .perform();

Perform operations like Ctrl+a, Ctrl+c, Ctrl+v, and TAB:

// Select all and copy
builder.sendKeys(Keys.chord(Keys.CONTROL,"a"),Keys.chord(Keys.CONTROL,"c")).build().perform();

// Press the tab to focus on the next field
builder.sendKeys(Keys.TAB).build().perform();

// Paste in the next field
builder.sendKeys(Keys.chord(Keys.CONTROL,"v")).build().perform();

Interacting with the Browser

Getting the Page Source

Most likely, you'll use this for web scraping needs:

driver.getPageSource();

Getting the Page Title

driver.getTitle();

Maximizing the Browser

driver.manage().window().maximize();

Quitting the Driver

It is important to quit the driver at the end of the program:

driver.quit();

Note: The WebDriver API also provides a close() method, and this sometimes confuses beginners. The close() method closes only the window that currently has focus; the session stays alive while other windows remain open. The quit() method closes every window and ends the WebDriver session, so it is the appropriate call when you no longer need the browser.

Taking Screenshots

First, we need to cast WebDriver to the TakesScreenshot type, which is an interface. Next, we can call getScreenshotAs() and pass OutputType.FILE.

Finally, we can copy the file into the local file system. The screenshot is PNG data, so *.png is the appropriate extension.

File fileScreenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);

// Copy the screenshot to the local file system with a *.png extension
// (FileUtils comes from Apache Commons IO: org.apache.commons.io.FileUtils)
FileUtils.copyFile(fileScreenshot, new File("path/MyScreenshot.png"));
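If adding Apache Commons IO just for FileUtils is not desirable, the copy step can also be done with the JDK's java.nio.file API. The following is a minimal sketch under that assumption. The class and method names are hypothetical, and an empty temporary file stands in for the real screenshot so that the example runs without a browser.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: copy a screenshot file to its destination using only the JDK.
final class ScreenshotFiles {

    private ScreenshotFiles() {}

    /** Copies the screenshot to target, creating parent folders and replacing an old file. */
    static Path save(File screenshot, Path target) {
        try {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            return Files.copy(screenshot.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Saves an empty stand-in file and reports whether the copy exists afterwards. */
    static boolean copyStandInScreenshot() {
        try {
            Path folder = Files.createTempDirectory("screenshots");
            File standIn = Files.createFile(folder.resolve("capture.tmp")).toFile();
            Path target = folder.resolve("nested").resolve("MyScreenshot.png");
            return Files.exists(save(standIn, target));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With Selenium on the classpath, the real file would come from:
        //   File shot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        //   save(shot, java.nio.file.Paths.get("path/MyScreenshot.png"));
        System.out.println(copyStandInScreenshot()); // prints true
    }
}
```

Copying promptly is a sensible habit, because the file returned for OutputType.FILE is a temporary file.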

Executing JavaScript

We can inject or execute any valid piece of JavaScript through Selenium WebDriver as well. This is very useful as it allows you to do many things that aren't built directly into Selenium.

First, we need to cast WebDriver to the type JavascriptExecutor:

JavascriptExecutor js = (JavascriptExecutor) driver;

There are several use cases for JavascriptExecutor:

  • Performing an operation directly in the page when the WebDriver API call fails, like a click() or sendKeys():
js.executeScript("document.getElementById('some-id').click();");

We can also first find the element by using WebDriver locators and pass that element to executeScript() as the second argument. It is the more natural way to use JavascriptExecutor:

// First find the element by using any locator
WebElement element = driver.findElement(By.id("some-id"));

// Pass the element to js.executeScript() as the 2nd argument
js.executeScript("arguments[0].click();", element);

To set the value of an input field:

String value = "some value";
WebElement element = driver.findElement(By.id("some-id"));
js.executeScript("arguments[0].value=arguments[1];", element, value);
  • Scrolling the page to the bottom:
js.executeScript("window.scrollTo(0, document.body.scrollHeight);");
  • Scrolling an element into the viewport:
WebElement element = driver.findElement(By.id("some-id"));

// true aligns the top of the element with the top of the viewport,
// false aligns the bottom of the element with the bottom of the viewport
js.executeScript("arguments[0].scrollIntoView(true);", element);
  • Altering the page (adding or removing some attributes of an element):
WebElement element = driver.findElement(By.id("some-id"));
js.executeScript("arguments[0].setAttribute('myattr','myvalue')", element);

Accessing Cookies

Since many websites use cookies to store user state or other data, it may be useful to access them programmatically using Selenium. Some common cookie operations are outlined below.

Get all cookies:

driver.manage().getCookies();

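Get a specific cookie by name (targetCookie is the cookie's name, a String):

driver.manage().getCookieNamed(targetCookie);

Add a cookie (mySavedCookie is a Cookie object):

driver.manage().addCookie(mySavedCookie);

Delete a cookie by name (deleteCookie() takes a Cookie object instead):

driver.manage().deleteCookieNamed(targetCookie);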

Conclusion

We have covered all the major features of the Selenium WebDriver that we may need to use while automating a web browser. Selenium WebDriver has a very extensive API and covering everything is beyond the scope of this tutorial.

You may have noticed that Selenium WebDriver has lots of useful methods to simulate nearly all user interactions. That said, modern web applications can restrict automated usage in various ways, for example with a CAPTCHA, which Selenium cannot bypass. Please use this tool while keeping the Terms of Use of the target website in mind.

A Software Engineer who is passionate about writing programming articles. Founder of CosmoCode.io - Free Coding tutorials