Several tools can drive a web browser the way a real user would: navigating to different pages, interacting with the elements of a page, and capturing data. This process is called web browser automation. What you do with it is limited only by your imagination and needs.
Some common use cases of web browser automation are:
- Automating the manual tests on a web application
- Automating repetitive tasks like scraping information from websites
- Filling in HTML forms, doing some administrative jobs, etc.
In this tutorial we'll explore one of the most popular web browser automation tools - Selenium. We'll learn about its features, the API and how we can use Selenium with Java to automate any website.
What is Selenium?
Selenium is a collection of tools that includes Selenium IDE, Selenium RC, and Selenium WebDriver.
Selenium IDE is purely a record-and-playback tool that is available as a Firefox plugin and a Chrome extension. Selenium RC is the legacy tool and is now deprecated. Selenium WebDriver is the latest and most widely used tool.
Note: The terms Selenium, Selenium WebDriver, and simply WebDriver are used interchangeably to refer to Selenium WebDriver.
It is important to note here that Selenium is built to interact with web components only. So if you encounter any desktop-based components like a Windows dialog, Selenium on its own cannot interact with them. There are other types of tools like AutoIt or Automa that can be integrated with Selenium for these purposes.
Why use Selenium?
Selenium is one of the most popular browser automation tools. It is not tied to a particular programming language: you can use it with Java, Python, C#, Ruby, PHP, Perl, etc. You can also write your own bindings for a language that isn't already supported.
In this tutorial, we'll learn how to use the Java bindings of Selenium WebDriver. We'll also explore the WebDriver API.
Selenium's success can also be attributed to the fact that the WebDriver specifications have become the W3C recommendation for browsers.
Prerequisites:
- Java environment and your favourite Java IDE
- Selenium-java client
- Google Chrome Driver
WebDriver provides bindings for all popular languages, as described in the previous section. Since we are using Java, we need to download the Java bindings and include them in the build path. Also, nearly every popular browser provides a driver that can be used with Selenium to drive that browser.
In this tutorial, we'll drive Google Chrome.
WebDriver
Before moving forward, it is useful to clear up a few concepts that often confuse beginners. WebDriver is not a class, it is an interface.
All browser-specific drivers like ChromeDriver, FirefoxDriver, and InternetExplorerDriver are Java classes that implement the WebDriver interface. This matters because, to run your program against a different browser, you do not need to change much of your code: you just swap in the WebDriver implementation for whichever browser you want.
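To make the interface idea concrete, here is a minimal sketch that runs without Selenium on the classpath. Browser, FakeChrome, and FakeFirefox are hypothetical stand-ins for WebDriver, ChromeDriver, and FirefoxDriver; they are not Selenium types and only illustrate how code written against an interface stays the same when the implementation is swapped.

```java
public class InterfaceSketch {
    // Hypothetical stand-in for Selenium's WebDriver interface (not the real type).
    interface Browser {
        String get(String url);
    }

    // Hypothetical stand-in for ChromeDriver.
    static class FakeChrome implements Browser {
        @Override
        public String get(String url) {
            return "Chrome opened " + url;
        }
    }

    // Hypothetical stand-in for FirefoxDriver.
    static class FakeFirefox implements Browser {
        @Override
        public String get(String url) {
            return "Firefox opened " + url;
        }
    }

    // The automation logic depends only on the interface, never on a concrete browser class.
    static String openHomePage(Browser browser) {
        return browser.get("http://stackabuse.com");
    }

    public static void main(String[] args) {
        // Swap in new FakeFirefox() here and nothing else has to change.
        Browser browser = new FakeChrome();
        System.out.println(openHomePage(browser));
    }
}
```

With the real library, the same shape applies: methods accept a WebDriver, and only the line that constructs the driver names a specific browser.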
First, let us specify the path to the browser driver. Next, we'll instantiate the "right driver" for that browser, ChromeDriver in our case:
System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
WebDriver driver = new ChromeDriver();
As we can see, driver holds a reference to a ChromeDriver instance and can therefore be used to drive the browser. When the above statement executes, you should see a new browser window open on your system. The browser has not yet opened any website, though. We need to instruct it to do so.
Note: To use a different WebDriver implementation, you need to specify that driver's path in the file system and then instantiate it. For example, if you want to use IE, here is what you'd need to do:
System.setProperty("webdriver.ie.driver", "path/to/IEDriver");
WebDriver driver = new InternetExplorerDriver();
From here onwards, the code will be exactly the same for all browsers. To keep our learning focused, we'll automate stackabuse.com.
Navigating to the Website
As mentioned above, we first need to navigate to our target website. To do this, we simply send a GET request to the URL of the website:
driver.get("http://stackabuse.com");
WebElement
The first step in web browser automation is to locate the elements on the web page that we want to interact with, like a button, input, dropdown list, etc.
The Selenium representation of such HTML elements is the WebElement. Like WebDriver, WebElement is also a Java interface. Once we get hold of a WebElement, we can perform any operation on it that an end user can do, like clicking, typing, selecting, etc.
It is obvious that attempting to perform invalid operations, like trying to enter text in a button element, will result in an exception.
We can use the HTML attributes of an element, like id, class, and name, to locate it. If no such attributes are present, we can use more advanced locating techniques like CSS selectors and XPath.
To check the HTML attributes of any element we can open the website in our Chrome browser (other browsers also support this), right click over the element you want to select, and click Inspect Element. This should open the Developer Tools and display the HTML attributes of that element:
As we can see, the element has an <input> tag and multiple attributes like id, class, etc.
WebDriver supports 8 different locators to locate elements:
id
className
name
tagName
linkText
partialLinkText
cssSelector
xpath
Let us explore all of them one by one by automating the different elements in our target website.
Locating Elements via id
If we inspect the newsletter input box of our target website, we can see that it has an id attribute:
<input type="email" id="email" value name="email" class="required email input-lg" placeholder="Enter your email...">
We can locate this element by using the id locator:
WebElement newsletterEmail = driver.findElement(By.id("email"));
Locating Elements via className
If we inspect the same input box, we can see that it also has a class attribute.
We can locate this element by using the className locator:
WebElement newsletterEmail = driver.findElement(By.className("email"));
Note: The locator name is className, not class, but the HTML attribute is class. The className locator accepts a single class name, so we pass just one of the element's classes rather than the whole attribute value.
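The class attribute of the newsletter input holds a whitespace-separated list of class names, while By.className expects one name and, in the Java bindings, rejects compound values. The sketch below uses plain Java string handling (no Selenium calls) to show the individual names and one way to combine them into a CSS selector. ClassNameSketch, classNames, and toCssSelector are hypothetical helper names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ClassNameSketch {
    // Splits a class attribute value into its individual class names.
    static List<String> classNames(String classAttribute) {
        return Arrays.asList(classAttribute.trim().split("\\s+"));
    }

    // Builds a CSS selector that requires the tag to carry every class in the attribute.
    static String toCssSelector(String tag, String classAttribute) {
        return tag + classNames(classAttribute).stream()
                .map(name -> "." + name)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String classAttribute = "required email input-lg";
        System.out.println(classNames(classAttribute));             // [required, email, input-lg]
        System.out.println(toCssSelector("input", classAttribute)); // input.required.email.input-lg
    }
}
```

Any single name from the list is a valid argument for By.className, and the combined string is the kind of value that By.cssSelector accepts when all of the classes should match.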
Locating Elements via name
For this example, let's imagine a drop-down list where a user should select their age range. The drop-down list has a name attribute, which we can search for:
<select name="age">
<option value="Yet to born">Not Born</option>
<option value="Under 20">Under 20</option>
<option value="20 to 29">Under 30</option>
<option value="30 to 39">Under 40</option>
<option value="40 to 50">Under 50</option>
<option value="Over 50">Above 60</option>
<option value="Ghost">Not Defined</option>
</select>
We can locate this element by using the name locator:
WebElement age = driver.findElement(By.name("age"));
Locating Elements via xpath
Sometimes, though, these approaches are not enough, as there are multiple elements with the same attribute:
<p>
<input name="gender" type="Radio" value="Female">Female<br>
<input name="gender" type="Radio" value="Male">Male<br>
<input name="gender" type="Radio" value="donotknow">Still Exploring
</p>
In this example we can see that all three input elements have the same name attribute, "gender", but each has a different value. When the basic attributes like id, class, or name are not unique, we need a way to define exactly which element we'd like to fetch.
In these cases, we can use XPath locators. XPaths are very powerful locators and they are a complete topic on their own. The following example can give you an idea of how to construct an XPath for the above HTML snippets:
WebElement gender = driver.findElement(By.xpath("//input[@value='Female']"));
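To see what this expression matches without starting a browser, we can evaluate it with the XPath engine that ships with the JDK (javax.xml.xpath). This is only a sketch, not Selenium itself: the snippet is rewritten as XHTML with self-closing tags so that an XML parser accepts it, and XPathSketch and countMatches are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathSketch {
    // XHTML version of the radio-button snippet (tags closed so an XML parser accepts it).
    static final String HTML =
            "<p>"
          + "<input name=\"gender\" type=\"Radio\" value=\"Female\"/>Female<br/>"
          + "<input name=\"gender\" type=\"Radio\" value=\"Male\"/>Male<br/>"
          + "<input name=\"gender\" type=\"Radio\" value=\"donotknow\"/>Still Exploring"
          + "</p>";

    // Returns how many nodes in the snippet match the given XPath expression.
    static int countMatches(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(HTML.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The name attribute is shared by all three radio buttons.
        System.out.println(countMatches("//input[@name='gender']"));  // 3
        // The value attribute singles out exactly one of them.
        System.out.println(countMatches("//input[@value='Female']")); // 1
    }
}
```

The name-based expression is ambiguous, while the value-based one identifies a single element, which is why the XPath above filters on value.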
Locating Elements via cssSelector
Again, let's imagine a list of checkboxes where the user selects their preferred programming language:
<p>
<input name="language_java" type="Checkbox" value="java">Java<br>
<input name="language_python" type="Checkbox" value="python">Python<br>
<input name="language_c#" type="Checkbox" value="c#">C#<br>
<input name="language_c" type="Checkbox" value="c">C<br>
<input name="language_vbs" type="Checkbox" value="vbscript">Vbscript
</p>
Technically, for this HTML snippet, we could easily use the name locator, as the elements have distinct names. However, in this example we'll use the cssSelector locator. CSS selectors are used extensively on the front end, for instance with libraries like jQuery.
The following example can give you an idea how to construct CSS selectors for the previous HTML snippet:
WebElement languageC = driver.findElement(By.cssSelector("input[value=c]"));
WebElement languageJava = driver.findElement(By.cssSelector("input[value=java]"));
Evidently, it's very similar to the XPath approach.
Locating Elements via linkText
If the element is a link, i.e. has an <a> tag, we can locate it by using its text. For example, the link "Stack Abuse":
<a href="/">Stack Abuse</a>
We can locate the link using its text:
WebElement homepageLink = driver.findElement(By.linkText("Stack Abuse"));
Locating Elements via partialLinkText
Say we have a link with the text "random-text-xyz-i-wont-change-random-digit-123". As previously shown, we can locate this element by using the linkText locator.
However, the WebDriver API provides another method, partialLinkText. Sometimes a portion of the link text is dynamic and changes every time you reload the page, for instance "Order #XYZ123".
In these cases we can use the partialLinkText locator:
WebElement iWontChangeLink = driver.findElement(By.partialLinkText("i-wont-change"));
The code above will successfully select our link "random-text-xyz-i-wont-change-random-digit-123", since the text we passed to the locator is a substring of the link text.
Locating Elements via tagName
We can also locate an element by using its tag name, e.g. <a>, <div>, <input>, <select>, etc. You should use this locator with caution, as there may be multiple elements with the same tag name and the command always returns the first matching element on the page:
WebElement tagNameElem = driver.findElement(By.tagName("select"));
This way of finding an element is usually more useful when you're calling the findElement method on another element and not the entire HTML document. This narrows down your search and allows you to find elements using simple locators.
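The effect of scoping a tag-name search can be sketched with the DOM API bundled with the JDK, where getElementsByTagName can be called on the whole document or on a single element. This is an analogy rather than Selenium code. The two-form page, its ids, and the names TagNameScopeSketch, firstSelectInDocument, and firstSelectInForm are all assumptions made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TagNameScopeSketch {
    // Hypothetical page with two forms, each containing a <select>.
    static final String HTML =
            "<body>"
          + "<form id=\"search\"><select name=\"category\"/></form>"
          + "<form id=\"survey\"><select name=\"age\"/></form>"
          + "</body>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(HTML.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample page", e);
        }
    }

    // Searching the whole document returns the first <select> on the page.
    static String firstSelectInDocument() {
        Element select = (Element) parse().getElementsByTagName("select").item(0);
        return select.getAttribute("name");
    }

    // Searching inside one form returns the first <select> within that form only.
    static String firstSelectInForm(String formId) {
        NodeList forms = parse().getElementsByTagName("form");
        for (int i = 0; i < forms.getLength(); i++) {
            Element form = (Element) forms.item(i);
            if (formId.equals(form.getAttribute("id"))) {
                Element select = (Element) form.getElementsByTagName("select").item(0);
                return select.getAttribute("name");
            }
        }
        throw new IllegalArgumentException("No form with id " + formId);
    }

    public static void main(String[] args) {
        System.out.println(firstSelectInDocument());     // category
        System.out.println(firstSelectInForm("survey")); // age
    }
}
```

In Selenium the equivalent move is to locate the parent element first and then call findElement(By.tagName("select")) on that WebElement instead of on the driver.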
Interacting with Elements
So far we have located the HTML elements on the page and we're able to get the corresponding WebElement. However, we have not yet interacted with those elements like an end user would do - clicking, typing, selecting, etc. We'll explore some of these simple actions in the next few sections.
Clicking Elements
We perform a click operation by using the click() method. We can use this on any WebElement if it's clickable. If not, it'll throw an exception.
In this case, let's click the homepageLink:
homepageLink.click();
Since this actually performs the click on the page, your web browser will then follow the link that was programmatically clicked.
Inputting Text
Let's enter some text into the newsletterEmail input box:
newsletterEmail.sendKeys("[email protected]");
Selecting Radio Buttons
Since radio buttons are simply clicked, we use the click() method to select one:
gender.click();
Selecting Checkboxes
The same goes for selecting checkboxes, though in this case we can select multiple checkboxes. With radio buttons, by contrast, selecting another button deselects the previous one:
languageC.click();
languageJava.click();
Selecting Items from a Dropdown
To select an item from the dropdown list we would need to do two things:
First, we need to instantiate Select and pass it the element from the page:
Select select = new Select(age);
It is important to note here that Select is a Java class that implements the ISelect interface.
Next, we can select an item by using its:
Displayed Text:
select.selectByVisibleText("Under 30");
Value (the value attribute):
select.selectByValue("20 to 29");
Index (starts with 0):
select.selectByIndex(2);
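Displayed text, value, and index are three ways of pointing at the same option element. As a rough check that does not need a browser, the sketch below evaluates XPath expressions with the JDK (javax.xml.xpath) against the age drop-down shown earlier. SelectSketch, visibleTextForValue, and indexForValue are hypothetical names, and this is an illustration of how the three relate, not a description of how Select is implemented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class SelectSketch {
    // The age drop-down from the earlier snippet.
    static final String HTML =
            "<select name=\"age\">"
          + "<option value=\"Yet to born\">Not Born</option>"
          + "<option value=\"Under 20\">Under 20</option>"
          + "<option value=\"20 to 29\">Under 30</option>"
          + "<option value=\"30 to 39\">Under 40</option>"
          + "<option value=\"40 to 50\">Under 50</option>"
          + "<option value=\"Over 50\">Above 60</option>"
          + "<option value=\"Ghost\">Not Defined</option>"
          + "</select>";

    // Evaluates an XPath expression against the snippet and returns the result as the requested type.
    static Object eval(String expression, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(HTML.getBytes(StandardCharsets.UTF_8)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, resultType);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    // Displayed text of the option that has the given value attribute.
    static String visibleTextForValue(String value) {
        String option = "//select[@name='age']/option[@value='" + value + "']";
        return (String) eval(option, XPathConstants.STRING);
    }

    // Zero-based index of that option, counted as the number of options before it.
    static int indexForValue(String value) {
        String option = "//select[@name='age']/option[@value='" + value + "']";
        Double preceding = (Double) eval("count(" + option + "/preceding-sibling::option)",
                XPathConstants.NUMBER);
        return preceding.intValue();
    }

    public static void main(String[] args) {
        System.out.println(visibleTextForValue("20 to 29")); // Under 30
        System.out.println(indexForValue("20 to 29"));       // 2
    }
}
```

So the displayed text "Under 30", the value "20 to 29", and the index 2 all identify the third option in the list.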
If the application supports multi-select we can call one or more of these methods multiple times to select different items.
To check if the application allows multiple selections we can run:
select.isMultiple();
There are lots of other useful operations that we can perform on the dropdown list:
- Getting the list of options:
java.util.List<WebElement> options = select.getOptions();
- Getting the list of selected options:
java.util.List<WebElement> options = select.getAllSelectedOptions();
- Getting the first selected option:
WebElement firstSelected = select.getFirstSelectedOption();
- Deselect all options:
select.deselectAll();
- Deselect by displayed text:
select.deselectByVisibleText("Under 30");
- Deselect by value:
select.deselectByValue("20 to 29");
- Deselect by index:
select.deselectByIndex(2);
Note: We can also combine the two steps of finding an element and interacting with it into a single statement via chaining. For instance, we can find and click on the Submit button like this:
driver.findElement(By.id("submit_htmlform")).click();
We can also do this with Select:
Select select = new Select(driver.findElement(By.name("age")));
Getting Attribute Values
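To get the value of a particular attribute of an element:
String cssClass = driver.findElement(By.id("some-id")).getAttribute("class");
Setting Attribute Values
WebElement has no method for setting attributes, but we can set the value of a particular attribute by running JavaScript through the driver. This can be useful when we want to enable or disable an element:
WebElement element = driver.findElement(By.id("some-id"));
((JavascriptExecutor) driver).executeScript("arguments[0].setAttribute('class', 'enabled');", element);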
Interacting with the Mouse and Keyboard
The WebDriver API provides the Actions class to interact with the mouse and the keyboard.
First, we need to instantiate Actions and pass it the WebDriver instance:
Actions builder = new Actions(driver);
Moving the Mouse
Sometimes we may need to hover over a menu item to make its submenu appear:
WebElement elem = driver.findElement(By.id("some-id"));
builder.moveToElement(elem).build().perform();
Drag and Drop
Dragging an element over another element:
WebElement sourceElement = driver.findElement(By.id("some-id"));
WebElement targetElement = driver.findElement(By.id("some-other-id"));
builder.dragAndDrop(sourceElement, targetElement).build().perform();
Dragging an element by some pixels (e.g. 200 px horizontal and 0px vertical):
WebElement elem = driver.findElement(By.id("some-id"));
builder.dragAndDropBy(elem, 200, 0).build().perform();
Pressing Keys
Hold a particular key, like the Shift key, while typing some text:
WebElement elem = driver.findElement(By.id("some-id"));
builder.keyDown(Keys.SHIFT)
    .sendKeys(elem, "some value")
    .keyUp(Keys.SHIFT)
    .build()
    .perform();
Perform operations like Ctrl+a, Ctrl+c, Ctrl+v, and TAB:
// Select all and copy
builder.sendKeys(Keys.chord(Keys.CONTROL,"a"),Keys.chord(Keys.CONTROL,"c")).build().perform();
// Press the tab to focus on the next field
builder.sendKeys(Keys.TAB).build().perform();
// Paste in the next field
builder.sendKeys(Keys.chord(Keys.CONTROL,"v")).build().perform();
Interacting with the Browser
Getting the Page Source
Most likely, you'll use this for web scraping needs:
driver.getPageSource();
Getting the Page Title
driver.getTitle();
Maximizing the Browser
driver.manage().window().maximize();
Quitting the Driver
It is important to quit the driver at the end of the program:
driver.quit();
Note: The WebDriver API also provides a close() method, and this sometimes confuses beginners. The close() method closes only the current browser window, so the session stays alive as long as other windows remain open. The quit() method closes every window and ends the WebDriver session, which makes it more appropriate when you no longer need the browser.
Taking Screenshots
First, we need to cast WebDriver to the TakesScreenshot type, which is an interface. Next, we can call getScreenshotAs() and pass OutputType.FILE.
Finally, we can copy the file into the local file system with the appropriate extension like *.jpg, *.png, etc.
File fileScreenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
// Copy screenshot in local file system with *.png extension
FileUtils.copyFile(fileScreenshot, new File("path/MyScreenshot.png"));
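FileUtils in the snippet above appears to be the Apache Commons IO class, which is a separate dependency from Selenium. If you would rather not add it, the copy step can be done with java.nio from the JDK. The sketch below shows only that step; a temporary file stands in for the screenshot, and ScreenshotCopySketch, saveScreenshot, and demo are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ScreenshotCopySketch {
    // Copies the screenshot to the target path, creating missing parent folders
    // and overwriting any existing file with the same name.
    static Path saveScreenshot(Path source, Path target) {
        try {
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the copy on a three-byte temporary file and returns the size of the saved copy.
    static long demo() {
        try {
            // Stand-in for the temporary file returned by getScreenshotAs(OutputType.FILE).
            Path temp = Files.createTempFile("screenshot", ".png");
            Files.write(temp, new byte[] {1, 2, 3});
            Path saved = saveScreenshot(temp, temp.resolveSibling("MyScreenshot.png"));
            return Files.size(saved);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```

With a real screenshot, the call would look like saveScreenshot(fileScreenshot.toPath(), Paths.get("path/MyScreenshot.png")), where Paths comes from java.nio.file.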
Executing JavaScript
We can inject or execute any valid piece of JavaScript through Selenium WebDriver as well. This is very useful, as it allows you to do many things that aren't built directly into Selenium.
First, we need to cast WebDriver to the type JavascriptExecutor:
JavascriptExecutor js = (JavascriptExecutor) driver;
There are several use cases for the JavascriptExecutor:
- Performing an operation, like a click() or sendKeys(), through JavaScript when the WebDriver API call fails:
js.executeScript("document.getElementById('some-id').click();");
We can also first find the element by using WebDriver locators and pass that element to executeScript() as the second argument. It is the more natural way to use JavascriptExecutor:
// First find the element by using any locator
WebElement element = driver.findElement(By.id("some-id"));
// Pass the element to js.executeScript() as the 2nd argument
js.executeScript("arguments[0].click();", element);
To set the value of an input field:
String value = "some value";
WebElement element = driver.findElement(By.id("some-id"));
js.executeScript("arguments[0].value=arguments[1];", element, value);
- Scrolling the page to the bottom:
js.executeScript("window.scrollTo(0, document.body.scrollHeight);");
- Scrolling an element into the viewport:
WebElement element = driver.findElement(By.id("some-id"));
// true aligns the top of the element with the top of the viewport, false aligns its bottom with the bottom
js.executeScript("arguments[0].scrollIntoView(true);", element);
- Altering the page (adding or removing some attributes of an element):
WebElement element = driver.findElement(By.id("some-id"));
js.executeScript("arguments[0].setAttribute('myattr','myvalue')", element);
Accessing Cookies
Since many websites use cookies to store user state or other data, it may be useful to access them programmatically using Selenium. Some common cookie operations are outlined below.
Get all cookies:
driver.manage().getCookies();
Get a specific cookie:
driver.manage().getCookieNamed(targetCookie);
Add a cookie:
driver.manage().addCookie(mySavedCookie);
Delete a cookie:
driver.manage().deleteCookie(targetCookie);
Conclusion
We have covered all the major features of the Selenium WebDriver that we may need to use while automating a web browser. Selenium WebDriver has a very extensive API and covering everything is beyond the scope of this tutorial.
You may have noticed that Selenium WebDriver has lots of useful methods to simulate nearly all user interactions. Having said that, modern web applications are really smart. If they want to restrict their automated usage there are various ways to do so, like using captcha. Unfortunately, Selenium cannot bypass captcha. Please use this tool while keeping the Terms of Use of the target website in mind.