Several tools can drive a web browser the way a real user would: navigating between pages, interacting with the elements of a page, and capturing data. This process is called web browser automation. What you do with it is limited only by your imagination and needs.
Some common use cases of web browser automation are:
- Automating the manual tests on a web application
- Automating repetitive tasks like scraping information from websites
- Filling out HTML forms, doing some administrative jobs, etc.
In this tutorial we'll explore one of the most popular web browser automation tools - Selenium. We'll learn about its features, the API and how we can use it with Java to automate any website.
What is Selenium?
Selenium is a collection of tools that includes Selenium IDE, Selenium RC, and Selenium WebDriver.
Selenium IDE is purely a record-and-playback tool that is available as a Firefox plugin and a Chrome extension. Selenium RC is the legacy tool and is now deprecated. Selenium WebDriver is the latest and most widely used tool.
Note: The terms Selenium, Selenium WebDriver, and simply WebDriver are used interchangeably to refer to Selenium WebDriver.
It is important to note here that Selenium is built to interact with web components only. So if you encounter any desktop-based components like a Windows dialog, Selenium on its own cannot interact with them. There are other types of tools like AutoIt or Automa that can be integrated with Selenium for these purposes.
Why use Selenium?
Selenium is one of the most popular browser automation tools. It does not depend on a particular programming language: bindings exist for Java, Python, C#, Ruby, PHP, Perl, etc. You can also write your own implementation for a language that isn't already supported.
In this tutorial, we'll learn how to use the Java bindings of Selenium WebDriver. We'll also explore the WebDriver API.
Selenium's success can also be attributed to the fact that the WebDriver specification has become a W3C Recommendation for browsers.
WebDriver provides bindings for all popular languages, as described in the previous section. Since we are using the Java environment, we need to download the Java bindings and include them in the build path. Also, nearly every popular browser provides a driver that can be used with Selenium to drive that browser.
In this tutorial, we'll drive Google Chrome.
Before moving forward, it is useful to understand a few concepts that often confuse beginners.
WebDriver is not a class, it is an interface.
All browser-specific drivers, like ChromeDriver and InternetExplorerDriver, are Java classes that implement the WebDriver interface. This is important because running your program against a different browser does not require changing a bunch of your code: you just need to swap in the WebDriver implementation for whichever browser you want.
First, let us specify the path to the browser driver. Next, we'll instantiate the "right driver" for that browser, ChromeDriver in our case:
System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
WebDriver driver = new ChromeDriver();
As we can see, the driver variable holds a reference to the ChromeDriver and can therefore be used to drive the browser. When the above statements execute, you should see a new browser window open on your system. But the browser has not yet opened any website. We need to instruct the browser to do so.
Note: To use a different WebDriver, you need to specify the driver path in the file system and then instantiate it. For example, if you want to use IE, then here is what you'd need to do:
System.setProperty("webdriver.ie.driver", "path/to/IEDriver");
WebDriver driver = new InternetExplorerDriver();
From here onwards, the code will be exactly the same for all browsers. To keep our learning focused, we'll automate stackabuse.com.
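The idea of coding against an interface and swapping only the concrete class can be sketched without Selenium on the classpath. In the example below, BrowserDriver, ChromeStandIn, IeStandIn, and DriverFactory are all hypothetical stand-ins invented for illustration; they are not Selenium types. In a real project the factory would presumably return new ChromeDriver() or new InternetExplorerDriver() typed as WebDriver.

```java
import java.util.Locale;

// Hypothetical stand-in for the WebDriver interface (NOT the real Selenium type).
interface BrowserDriver {
    String browserName();
}

// Hypothetical stand-in for ChromeDriver.
final class ChromeStandIn implements BrowserDriver {
    @Override
    public String browserName() {
        return "chrome";
    }
}

// Hypothetical stand-in for InternetExplorerDriver.
final class IeStandIn implements BrowserDriver {
    @Override
    public String browserName() {
        return "internet explorer";
    }
}

final class DriverFactory {
    private DriverFactory() {
    }

    // The only place that knows which concrete class gets instantiated.
    static BrowserDriver create(String browser) {
        switch (browser.toLowerCase(Locale.ROOT)) {
            case "chrome":
                return new ChromeStandIn();
            case "ie":
                return new IeStandIn();
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
    }

    public static void main(String[] args) {
        // Everything after this line depends on the interface only.
        BrowserDriver driver = create(args.length > 0 ? args[0] : "chrome");
        System.out.println("Driving " + driver.browserName());
    }
}
```

With this shape, switching browsers means changing one argument rather than touching the automation code itself.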
Navigating to the Website
As mentioned above, we first need to navigate to our target website. To do this, we pass the URL of the website to the driver's get() method, which makes the browser load that page.
The first step in web browser automation is to locate the elements on the web page that we want to interact with, like a button, input, dropdown list, etc.
The Selenium representation of such HTML elements is the WebElement. Like WebDriver, WebElement is also a Java interface. Once we get hold of a WebElement, we can perform any operation on it that an end user can, like clicking, typing, selecting, etc.
Naturally, attempting to perform an invalid operation, like trying to enter text in a button element, will result in an exception.
We can use the HTML attributes of an element, like id, class, and name, to locate it. If no such attributes are present, we can use more advanced locating techniques like CSS selectors and XPath.
To check the HTML attributes of any element, we can open the website in Chrome (other browsers also support this), right-click the element we want to select, and click Inspect. This should open the Developer Tools and display the HTML attributes of that element.
In our case, the element has an <input> tag and multiple attributes.
WebDriver supports 8 different locators to locate elements: id, className, name, xpath, cssSelector, linkText, partialLinkText, and tagName.
Let us explore all of them one by one by automating the different elements in our target website.
Locating Elements via id
If we inspect the newsletter input box of our target website, we can see that it has an id attribute:
<input type="email" id="email" value name="email" class="required email input-lg" placeholder="Enter your email...">
We can locate this element by using the id locator:
WebElement newsletterEmail = driver.findElement(By.id("email"));
Locating Elements via className
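If we inspect the same input box, we can see that it also has a class attribute.
We can locate this element by using the className locator. By.className() accepts a single class name and rejects compound values that contain spaces, so we pass one of the element's classes:
WebElement newsletterEmail = driver.findElement(By.className("input-lg"));
Note: The locator name is className, but the HTML attribute is class.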
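When all of an element's classes are needed to identify it, one option is to turn the class attribute into a CSS selector and use that with By.cssSelector() instead. The following is a minimal sketch in plain Java, assuming simple class names that need no CSS escaping; ClassSelectors and fromClassAttribute are hypothetical names, not part of Selenium.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class ClassSelectors {
    private ClassSelectors() {
    }

    // Turns a tag name plus a space-separated class attribute into "tag.class1.class2...".
    // Assumption: the class names contain no characters that require CSS escaping.
    static String fromClassAttribute(String tag, String classAttribute) {
        String trimmed = classAttribute.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("class attribute must not be empty");
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .map(name -> "." + name)
                .collect(Collectors.joining("", tag, ""));
    }

    public static void main(String[] args) {
        System.out.println(fromClassAttribute("input", "required email input-lg"));
    }
}
```

The resulting string, input.required.email.input-lg, matches only input elements that carry all three classes, and could be passed to By.cssSelector().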
Locating Elements via name
For this example, let's imagine a drop-down list where a user selects their age range. The drop-down list has a name attribute, which we can search for:
<select name="age">
  <option value="Yet to born">Not Born</option>
  <option value="Under 20">Under 20</option>
  <option value="20 to 29">Under 30</option>
  <option value="30 to 39">Under 40</option>
  <option value="40 to 50">Under 50</option>
  <option value="Over 50">Above 60</option>
  <option value="Ghost">Not Defined</option>
</select>
We can locate this element by using the name locator:
WebElement age = driver.findElement(By.name("age"));
Locating Elements via xpath
Sometimes, though, these approaches are not enough, as there are multiple elements with the same attribute:
<p>
  <input name="gender" type="Radio" value="Female">Female<br>
  <input name="gender" type="Radio" value="Male">Male<br>
  <input name="gender" type="Radio" value="donotknow">Still Exploring
</p>
In this example, we can see that all three input elements have the same name attribute, "gender", but each has a different value. When basic attributes like name are not unique, we need a way to define exactly which element we'd like to fetch.
In these cases, we can use XPath locators. XPath is a very powerful locator strategy and a complete topic on its own. The following example gives an idea of how to construct an XPath for the above HTML snippet:
WebElement gender = driver.findElement(By.xpath("//input[@value='Female']"));
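An XPath expression can be tried out before it goes into an automation script. The sketch below evaluates a few expressions with the XPath engine bundled with the JDK rather than with Selenium, against a well-formed copy of the radio-button snippet (the tags are self-closed so that an XML parser accepts it). XPathTryout and countMatches are hypothetical names, and a browser's handling of real HTML may differ in details.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathTryout {
    private XPathTryout() {
    }

    // Well-formed (XML-style) copy of the radio-button snippet.
    static final String MARKUP =
            "<p>"
            + "<input name=\"gender\" type=\"Radio\" value=\"Female\"/>Female<br/>"
            + "<input name=\"gender\" type=\"Radio\" value=\"Male\"/>Male<br/>"
            + "<input name=\"gender\" type=\"Radio\" value=\"donotknow\"/>Still Exploring"
            + "</p>";

    // Returns how many nodes in the markup match the XPath expression.
    static int countMatches(String markup, String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The name alone is ambiguous: it matches all three radio buttons.
        System.out.println(countMatches(MARKUP, "//input[@name='gender']"));
        // The value narrows the match down to a single element.
        System.out.println(countMatches(MARKUP, "//input[@value='Female']"));
        // Conditions can be combined with "and".
        System.out.println(countMatches(MARKUP, "//input[@name='gender' and @value='Male']"));
    }
}
```

An expression that matches exactly one node here is a reasonable candidate for By.xpath(), since findElement() returns only the first match.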
Locating Elements via cssSelector
Again, let's imagine a list of checkboxes where the user selects their preferred programming language:
<p>
  <input name="language_java" type="Checkbox" value="java">Java<br>
  <input name="language_python" type="Checkbox" value="python">Python<br>
  <input name="language_c#" type="Checkbox" value="c#">C#<br>
  <input name="language_c" type="Checkbox" value="c">C<br>
  <input name="language_vbs" type="Checkbox" value="vbscript">Vbscript
</p>
Technically, for this HTML snippet, we could easily use the name locator, as the names are distinct. However, in this example we'll locate the elements with CSS selectors, which are used extensively in the front-end with libraries like jQuery.
The following example gives an idea of how to construct CSS selectors for the previous HTML snippet:
WebElement languageC = driver.findElement(By.cssSelector("input[value=c]"));
WebElement languageJava = driver.findElement(By.cssSelector("input[value=java]"));
Evidently, it's very similar to the XPath approach.
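One difference from XPath is worth keeping in mind: an unquoted attribute value in a CSS selector must be a valid CSS identifier. input[value=c] satisfies that, but the c# value from the checkbox snippet does not, so it has to be quoted. Below is a small sketch of a helper that always quotes the value; AttributeSelectors and attributeEquals are hypothetical names, not Selenium API.

```java
final class AttributeSelectors {
    private AttributeSelectors() {
    }

    // Builds tag[attribute="value"], escaping backslashes and double quotes in the value.
    static String attributeEquals(String tag, String attribute, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return tag + "[" + attribute + "=\"" + escaped + "\"]";
    }

    public static void main(String[] args) {
        // "c#" is not a valid CSS identifier, so the quoted form is required here.
        System.out.println(attributeEquals("input", "value", "c#"));
        System.out.println(attributeEquals("input", "value", "java"));
    }
}
```

The returned strings can be handed to By.cssSelector() in the same way as the hand-written selectors.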
Locating Elements via linkText
If the element is a link, i.e. has an <a> tag, we can locate it by using its text. For example, the link "Stack Abuse":
<a href="/">Stack Abuse</a>
We can locate the link using its text:
WebElement homepageLink = driver.findElement(By.linkText("Stack Abuse"));
Locating Elements via partialLinkText
Say we have a link with the text "random-text-xyz-i-wont-change-random-digit-123". As previously shown, we can locate this element by using the linkText locator.
However, the WebDriver API provides another method, partialLinkText. Sometimes a portion of the link text is dynamic and changes every time you reload the page, for instance "Order #XYZ123".
In these cases we can use the partialLinkText locator:
WebElement iWontChangeLink = driver.findElement(By.partialLinkText("i-wont-change"));
The code above will successfully select our link "random-text-xyz-i-wont-change-random-digit-123", since the text we passed is a substring of the link text.
Locating Elements via tagName
We can also locate an element by using its tag name, e.g. <select>. You should use this locator with caution, as there may be multiple elements with the same tag name and the command always returns the first matching element in the page:
WebElement tagNameElem = driver.findElement(By.tagName("select"));
This way of finding an element is usually more useful when you're calling the findElement method on another element and not on the entire HTML document. This narrows down your search and allows you to find elements using simple locators.
Interacting with Elements
So far we have located the HTML elements on the page and obtained the corresponding WebElement. However, we have not yet interacted with those elements like an end user would: clicking, typing, selecting, etc. We'll explore some of these simple actions in the next few sections.
We perform a click operation by using the click() method. We can use this on any WebElement if it's clickable. If not, it'll throw an exception.
In this case, let's click the homepageLink we located earlier by calling homepageLink.click().
Since this actually performs the click on the page, your web browser will then follow the link that was programmatically clicked.
Let's enter some text into the newsletterEmail input box. This is done with the sendKeys() method, which takes the text to type.
Selecting Radio Buttons
Since radio buttons are simply clicked, we use the click() method to select one, for example on the gender element located earlier.
The same goes for selecting checkboxes, though in this case we can select multiple checkboxes. If we select another radio button, the previous one will be deselected.
Selecting Items from a Dropdown
To select an item from the dropdown list we would need to do two things:
First, we need to instantiate Select and pass it the element from the page:
Select select = new Select(age);
It is important to note here that Select is a Java class that implements the ISelect interface.
Next, we can select an item by using its:
- Displayed text, with select.selectByVisibleText()
- Value:
select.selectByValue("20 to 29");
- Index (starts with 0), with select.selectByIndex()
If the application supports multi-select, we can call one or more of these methods multiple times to select different items.
To check if the application allows multiple selections, we can call select.isMultiple().
There are lots of other useful operations that we can perform on the dropdown list:
- Getting the list of options:
java.util.List<WebElement> options = select.getOptions();
- Getting the list of selected options:
java.util.List<WebElement> options = select.getAllSelectedOptions();
- Getting the first selected option:
WebElement option = select.getFirstSelectedOption();
- Deselect all options, with select.deselectAll()
- Deselect by displayed text, with select.deselectByVisibleText()
- Deselect by value:
select.deselectByValue("20 to 29");
- Deselect by index, with select.deselectByIndex()
Note: We can also combine the two steps of finding an element and interacting with it into a single statement via chaining. For instance, we can find the Submit button and click it by calling click() directly on the result of driver.findElement().
We can also do this with Select:
Select select = new Select(driver.findElement(By.name("age")));
Getting Attribute Values
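To get the value of a particular attribute of an element, we can call the element's getAttribute() method with the attribute name.
Setting Attribute Values
We can also set the value of a particular attribute of an element, which is useful when we want to enable or disable an element. WebElement has no dedicated method for this, so it is done by running JavaScript against the element with setAttribute().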
Interacting with the Mouse and Keyboard
The WebDriver API provides the Actions class to interact with the mouse and the keyboard.
First, we need to instantiate Actions and pass it the WebDriver instance:
Actions builder = new Actions(driver);
Moving the Mouse
Sometimes we may need to hover over a menu item to make its submenu appear:
WebElement elem = driver.findElement(By.id("some-id"));
builder.moveToElement(elem).build().perform();
Drag and Drop
Dragging an element over another element:
WebElement sourceElement = driver.findElement(By.id("some-id"));
WebElement targetElement = driver.findElement(By.id("some-other-id"));
builder.dragAndDrop(sourceElement, targetElement).build().perform();
Dragging an element by some pixels (e.g. 200 px horizontally and 0 px vertically):
WebElement elem = driver.findElement(By.id("some-id"));
builder.dragAndDropBy(elem, 200, 0).build().perform();
Hold a particular key while typing some text, like the Shift key:
WebElement elem = driver.findElement(By.id("some-id"));
builder.keyDown(Keys.SHIFT)
    .sendKeys(elem, "some value")
    .keyUp(Keys.SHIFT)
    .build()
    .perform();
Perform operations like select all, copy, and paste:
// Select all and copy
builder.sendKeys(Keys.chord(Keys.CONTROL, "a"), Keys.chord(Keys.CONTROL, "c")).build().perform();
// Press the tab to focus on the next field
builder.sendKeys(Keys.TAB).build().perform();
// Paste in the next field
builder.sendKeys(Keys.chord(Keys.CONTROL, "v")).build().perform();
Interacting with the Browser
Getting the Page Source
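The getPageSource() method returns the HTML of the current page as a string. Most likely, you'll use this for web scraping needs.
Getting the Page Title
The getTitle() method returns the title of the current page.
Maximizing the Browser
The browser window can be maximized with driver.manage().window().maximize().
Quitting the Driver
It is important to quit the driver at the end of the program by calling driver.quit().
Note: The WebDriver API also provides a close() method, and this sometimes confuses beginners. The close() method closes only the current browser window, while quit() closes every window and ends the WebDriver session. The quit() method is therefore more appropriate when you no longer need the browser.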
First, we need to cast WebDriver to the TakesScreenshot type, which is an interface. Next, we can call getScreenshotAs() and pass OutputType.FILE.
Finally, we can copy the file into the local file system with an appropriate extension like *.jpg, *.png, etc. The FileUtils class used here comes from Apache Commons IO:
File fileScreenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
// Copy screenshot in local file system with *.png extension
FileUtils.copyFile(fileScreenshot, new File("path/MyScreenshot.png"));
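If adding Apache Commons IO only for this copy is not desirable, the JDK's java.nio.file.Files can do the same job. The sketch below is an alternative under stated assumptions, not the tutorial's own approach: ScreenshotFiles, save, and fakeScreenshot are hypothetical names, and fakeScreenshot() merely writes a few bytes to a temporary file to stand in for the File that getScreenshotAs(OutputType.FILE) returns.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ScreenshotFiles {
    private ScreenshotFiles() {
    }

    // Stand-in for the temporary screenshot file; NOT a real screenshot.
    static Path fakeScreenshot() {
        try {
            Path temp = Files.createTempFile("screenshot", ".png");
            Files.write(temp, new byte[] {(byte) 0x89, 'P', 'N', 'G'});
            return temp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies the screenshot into targetDir under fileName, replacing any previous copy.
    static Path save(Path screenshot, Path targetDir, String fileName) {
        try {
            Files.createDirectories(targetDir);
            return Files.copy(screenshot, targetDir.resolve(fileName),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path source = fakeScreenshot();
        Path saved = save(source, source.resolveSibling("screenshots"), "MyScreenshot.png");
        System.out.println("Saved to " + saved);
    }
}
```

With a real driver, the first argument would be fileScreenshot.toPath().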
First, we need to cast WebDriver to the type JavascriptExecutor. The examples below keep the result in a variable named js.
There are several use cases for the JavascriptExecutor:
- Performing operations through JavaScript when the WebDriver API fails to do so, like a click() or sendKeys().
We can also first find the element by using WebDriver locators and pass that element to executeScript() as the second argument, where the script can read it as arguments[0]. This is the more natural way to use JavascriptExecutor:
// First find the element by using any locator
WebElement element = driver.findElement(By.id("some-id"));
// Pass the element to js.executeScript() as the 2nd argument
js.executeScript("arguments[0].click();", element);
To set the value of an input field:
String value = "some value";
WebElement element = driver.findElement(By.id("some-id"));
js.executeScript("arguments[0].value=arguments[1];", element, value);
- Scrolling the page to the bottom, for example by passing window.scrollTo(0, document.body.scrollHeight) to executeScript().
- Scrolling the element to bring it into the viewport:
WebElement element = driver.findElement(By.id("some-id"));
// true aligns the top of the element with the top of the viewport, false aligns its bottom with the bottom
js.executeScript("arguments[0].scrollIntoView(true);", element);
- Altering the page (adding or removing some attributes of an element):
WebElement element = driver.findElement(By.id("some-id"));
js.executeScript("arguments[0].setAttribute('myattr','myvalue')", element);
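Get all cookies: driver.manage().getCookies()
Get a specific cookie: driver.manage().getCookieNamed() with the cookie name
Add a cookie: driver.manage().addCookie() with a Cookie object
Delete a cookie: driver.manage().deleteCookieNamed() with the cookie name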
We have covered all the major features of the Selenium WebDriver that we may need to use while automating a web browser. Selenium WebDriver has a very extensive API and covering everything is beyond the scope of this tutorial.