Getting Started with Selenium and Python

Introduction

Web Browser Automation is gaining popularity, and many frameworks and tools have arisen to offer automation services to developers.

Web Browser Automation is often used for testing purposes in development and production environments, though it's also often used for web scraping data from public sources, analysis, and data processing.

Really, what you do with automation is up to you. Just make sure that what you're doing is legal, as "bots" created with automation tools can often infringe laws or a site's terms of service.

Selenium is one of the most widely used tools for Web Browser Automation, and offers a lot of functionality and control over a browser.

It has official bindings for Java, Python, C#, Ruby, and JavaScript, and community-maintained bindings exist for languages such as PHP and Perl, though for the sake of this tutorial, we'll be using it with Python on Windows.

What is Selenium?

Selenium is a great tool that allows developers to simulate end-users with only a few lines of code. Using the tools it offers, it's easy to interact with web pages the way a person would, though truly replicating human behavior is much harder.

To combat bots that try to pass as humans, many sites use sophisticated systems that recognize human behavior, which is borderline impossible to replicate using programming tools.

If you're building an application with Selenium, make sure that you adhere to all laws associated with Web Browser Automation, or simply use it for testing purposes in your own production environment.

Some of the most popular tasks accomplished with Selenium include, but are not limited to:

  • Clicking buttons
  • Inputting text
  • Extracting text
  • Accessing Cookies
  • Pressing keys

Prerequisites

Before we get started, we'll need to do a few things to get set up:

  • Install Google Chrome on your computer. We'll be simulating a user in Google Chrome; Selenium can drive other browsers too, but this article sticks to Chrome.
  • Download chromedriver.exe, matching the version of your installed Chrome. Selenium controls Chrome through this executable, and the examples below pass its path explicitly. (Selenium 4.6 and later can also find and download a matching driver on their own through Selenium Manager.)
  • Install the selenium package using pip install selenium on the command line.

The Basics

Alright, now we're all set to start working with Selenium. The first thing you'll need to do is start the browser:

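from selenium import webdriver
from selenium.webdriver.chrome.service import Service

EXE_PATH = r'path\to\chromedriver.exe'
# Selenium 4 takes the driver path through a Service object (executable_path was removed)
driver = webdriver.Chrome(service=Service(EXE_PATH))
driver.get('https://google.com')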

Running this will open Google Chrome and navigate it to https://google.com.

Note that the page is opened with the get(URL) method of the driver object.

As you might have noticed, driver is the Selenium WebDriver object; you use it to access the browser programmatically, for example:

print(driver.page_source)

The code above prints the HTML of the entire page as the browser currently holds it, which is very useful if you need to collect data.
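
The examples above leave Chrome running after the script finishes. Calling driver.quit() closes the browser and stops the chromedriver process. Here is a minimal sketch that guarantees this with try/finally; fetch_page_source is a hypothetical helper name, and the driver is created by a factory you pass in, so the function doesn't depend on a particular browser:

```python
def fetch_page_source(make_driver, url):
    """Open url in a fresh browser, return its HTML, and always quit the browser.

    fetch_page_source is a hypothetical helper. make_driver is any zero-argument
    callable returning a Selenium WebDriver, for example
    lambda: webdriver.Chrome(service=Service(EXE_PATH)).
    """
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() closes every window and ends the driver process,
        # even if get() raised an exception
        driver.quit()
```

Usage would look like html = fetch_page_source(lambda: webdriver.Chrome(service=Service(EXE_PATH)), 'https://google.com').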

Locating Elements

Usually, you don't need the contents of an entire page, but rather specific elements.

In order to do so, you'll first need to identify your target on the page, and for that you can use the Inspect Element tool in Google Chrome.

For example, to find an element's ID, do the following in a regular Google Chrome session:

  • Right click on the element
  • Choose "Inspect"
  • In the DevTools panel that opens, look at the element's HTML; the ID is the value after id=.
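
Inspecting works well for one element at a time. Since driver.page_source is just a string, you can also list every id on the page at once. A minimal sketch using the standard library's html.parser; list_ids is a hypothetical helper, and it only sees the HTML as it was when page_source was read:

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Collects (tag, id) pairs for every start tag that carries a non-empty id."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None or ''
        for name, value in attrs:
            if name == "id" and value:
                self.found.append((tag, value))


def list_ids(html):
    """Return (tag, id) pairs in document order, e.g. from driver.page_source."""
    parser = _IdCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```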

Once we've located the elements we need, we can perform different kinds of operations on them.

Getting Elements by ID

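If you have the exact ID of the element you're looking for, it's easy to retrieve it (Selenium 4 removed the older find_element_by_id() style helpers in favor of find_element() with a By locator):

from selenium.webdriver.common.by import By
element = driver.find_element(By.ID, 'element_id')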

Getting Elements by Name

Similar to the previous approach:

from selenium.webdriver.common.by import By
element = driver.find_element(By.NAME, 'element_name')

Getting Elements by Class

And again, similar to the previous approach:

from selenium.webdriver.common.by import By
element = driver.find_element(By.CLASS_NAME, 'element_class_name')
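
In Selenium 4, every locator is a (strategy, value) pair passed to find_element or find_elements, and the By constants are plain strings (By.ID is "id", By.NAME is "name", By.CLASS_NAME is "class name"). That makes it easy to try several locators in order when a page's markup varies. A sketch with a hypothetical find_first helper; it uses find_elements because that returns an empty list instead of raising NoSuchElementException:

```python
def find_first(driver, locators):
    """Return the first element matched by any (by, value) locator, or None.

    find_first is a hypothetical helper. driver can be a WebDriver or a
    WebElement, since both provide find_elements(by, value).
    """
    for by, value in locators:
        matches = driver.find_elements(by, value)  # [] when nothing matches
        if matches:
            return matches[0]
    return None
```

For example: find_first(driver, [(By.ID, 'search'), (By.NAME, 'q'), (By.CLASS_NAME, 'search-box')]).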

Getting Elements by HTML Tag

In some cases, you might want to get all elements by a certain tag:

from selenium.webdriver.common.by import By
links = driver.find_elements(By.TAG_NAME, 'a')

In this case, links is a list of every a element on the page, that is, each link. This can be useful for web-crawling purposes.
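
For crawling, you usually want each link's URL rather than the elements themselves. A minimal sketch that keeps unique http(s) links pointing at one host and drops #fragments; crawlable_links and allowed_host are hypothetical names. It relies on get_attribute('href'), which returns the href property, i.e. the resolved absolute URL:

```python
from urllib.parse import urldefrag, urlparse


def crawlable_links(link_elements, allowed_host):
    """Return unique absolute http(s) URLs on allowed_host, in page order.

    link_elements would typically be driver.find_elements(By.TAG_NAME, 'a');
    anything with a get_attribute(name) method works.
    """
    seen = set()
    result = []
    for element in link_elements:
        href = element.get_attribute("href")
        if not href:
            continue  # e.g. <a> tags used as buttons, with no href
        url, _fragment = urldefrag(href)  # '#section' links point at the same page
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.hostname == allowed_host:
            if url not in seen:
                seen.add(url)
                result.append(url)
    return result
```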

Getting Elements by XPath

Not all elements have an ID, and you may not want every a tag on the page. XPath lets you target very specific elements, for example by tag name and attribute value:

from selenium.webdriver.common.by import By
tag_list = driver.find_elements(By.XPATH, "//tag[@attr='val']")

tag_list now contains every tag element whose attr attribute is set to val, such as:

<tag attr='val'>Foo</tag>

You can now iterate over tag_list and interact with each Selenium WebElement in it.

You can read more about locating elements with XPath in the Selenium documentation.
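
To get a feel for what the [@attr='val'] predicate matches without starting a browser, you can try it on a small snippet with Python's built-in xml.etree.ElementTree, which supports a limited subset of XPath. This is only an illustration: ElementTree needs well-formed XML and uses .//tag instead of //tag, whereas the browser evaluates full XPath 1.0 against the live page.

```python
import xml.etree.ElementTree as ET

snippet = """<root>
  <tag attr='val'>Foo</tag>
  <tag attr='other'>Bar</tag>
  <div><tag attr='val'>Baz</tag></div>
</root>"""

root = ET.fromstring(snippet)
# './/tag' searches all descendants, like '//tag' in Selenium;
# "[@attr='val']" keeps only elements whose attr equals 'val'
matches = root.findall(".//tag[@attr='val']")
texts = [element.text for element in matches]
print(texts)  # ['Foo', 'Baz']
```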

Selenium WebElement

A Selenium WebElement represents an HTML element on the page. You can perform operations on these elements much like an end-user would.

These operations include:

  • Accessing simple properties of the element, like the text inside (element.text)
  • Accessing the parent element, which is also a WebElement (element.find_element(By.XPATH, '..')); note that element.parent refers to the WebDriver instance, not the parent element
  • Accessing specific attributes, like the href of an a tag (element.get_attribute('href'))
  • Searching within it, the same way you'd search in driver (element.find_elements(By.TAG_NAME, 'a'))
  • Clicking it (element.click())
  • Inputting text, if the element accepts input (element.send_keys('Input Text'))
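
Several of these operations combine naturally. A minimal sketch that searches within a container element for links and returns each link's visible text and href; link_table is a hypothetical helper, and the string "tag name" is the value of Selenium's By.TAG_NAME constant, used here so the function doesn't need to import Selenium:

```python
TAG_NAME = "tag name"  # same value as selenium.webdriver.common.by.By.TAG_NAME


def link_table(container):
    """Return (text, href) pairs for every <a> inside container.

    container is a WebElement (or the driver itself); link_table is a
    hypothetical helper.
    """
    rows = []
    for link in container.find_elements(TAG_NAME, "a"):  # search within the element
        text = link.text.strip()           # visible text of the link
        href = link.get_attribute("href")  # None if the tag has no href
        rows.append((text, href))
    return rows
```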

Selenium WebDriver

WebDriver is similar to WebElement; the main difference is their scope. A WebElement's scope is the element itself, whereas a WebDriver's scope is the whole page.

You can do plenty of things with a Selenium WebDriver object as well, practically anything you could do as a human with a normal browser.

Some other very useful things are:

  • Executing JavaScript: driver.execute_script("script")
  • Saving a screenshot: driver.save_screenshot('image.png')
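  • Starting the browser in "headless mode", where it runs without a visible window:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument('--headless=new')  # replaces options.headless, removed in newer Selenium 4 releases
driver = webdriver.Chrome(service=Service(EXE_PATH), options=options)
driver.set_window_size(1440, 900)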

Note that the window size is set to 1440x900. Headless Chrome starts with a fairly small default window, so without this, responsive layouts may hide or rearrange elements, and some elements may not load or be found properly.

Any other reasonably large resolution works too; what matters is changing it from the default when running headless.

Accessing Cookies

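You might need to add or remove browser cookies:

ck = {'name': 'foo', 'value': 'bar'}  # 'name' and 'value' are required keys
driver.add_cookie(ck)

This adds a cookie to the browser, which can be helpful if you need to add authentication or preference cookies, for example. The cookie must be a dict with at least name and value keys (optional keys include path, domain, secure, and expiry), and the browser must already be on a page from the domain the cookie belongs to. To remove cookies, use driver.delete_cookie('foo') for a single cookie or driver.delete_all_cookies() for all of them.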
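
If you have cookies as a raw Cookie header (for example, copied from the Network tab in Chrome DevTools), they need converting into the dicts add_cookie expects. A sketch using the standard library's http.cookies.SimpleCookie; cookie_dicts is a hypothetical helper, it only fills in name and value, and SimpleCookie is fairly strict about cookie syntax:

```python
from http.cookies import SimpleCookie


def cookie_dicts(header):
    """Turn a 'name=value; name2=value2' Cookie header into add_cookie() dicts.

    cookie_dicts is a hypothetical helper; only the required 'name' and
    'value' keys are filled in.
    """
    jar = SimpleCookie()
    jar.load(header)
    return [{"name": name, "value": morsel.value} for name, morsel in jar.items()]
```

After driver.get() on the right domain, you could then loop over cookie_dicts('theme=dark; session=abc123') and pass each dict to driver.add_cookie().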

It's also very easy to retrieve the cookies from the browser:

cookies = driver.get_cookies()
for ck in cookies:
    print(ck)

The code above prints each cookie in the browser.
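
Each cookie comes back as a dict with keys such as name, value, domain, and path. A common follow-up is handing the browser session to a plain HTTP client, for example the third-party requests library, whose cookies= parameter accepts a name-to-value dict. A minimal sketch that builds such a mapping, optionally keeping only cookies for one host; cookie_jar_dict is a hypothetical helper:

```python
def cookie_jar_dict(cookies, host=None):
    """Map cookie name -> value, e.g. from driver.get_cookies().

    If host is given, keep only cookies whose domain is host itself or a
    parent domain of it (a leading '.' on the cookie domain is ignored).
    cookie_jar_dict is a hypothetical helper.
    """
    result = {}
    for ck in cookies:
        domain = ck.get("domain", "").lstrip(".")
        if host is not None and not (host == domain or host.endswith("." + domain)):
            continue
        result[ck["name"]] = ck["value"]
    return result
```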

Altering the HTML

Sometimes you might need to change a certain element's attribute.

As mentioned before, you can use a Selenium WebDriver to execute JavaScript, and changing attributes of elements happens to be very easy to do with JavaScript:

driver.execute_script("arguments[0].setAttribute('attr','value')", element)

Here element is the element to alter, attr is the attribute to change and value is the new value.
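
execute_script forwards any extra arguments to the script as arguments[0], arguments[1], and so on. Passing the attribute name and value that way, instead of formatting them into the script string, avoids quoting problems when the value contains quotes. A sketch of a small wrapper; set_attribute is a hypothetical name:

```python
SET_ATTRIBUTE_JS = "arguments[0].setAttribute(arguments[1], arguments[2]);"


def set_attribute(driver, element, name, value):
    """Set an HTML attribute on element via JavaScript.

    set_attribute is a hypothetical helper. Inside the script, element is
    arguments[0], name is arguments[1], and value is arguments[2].
    """
    driver.execute_script(SET_ATTRIBUTE_JS, element, name, value)
```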

Downloading Files

Sometimes you might need to download a file from a website:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_experimental_option("prefs", {
    "download.default_directory": r"path\to\directory",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
})

driver = webdriver.Chrome(service=Service(EXE_PATH), options=options)

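download.default_directory sets the folder downloaded files are saved to, such as path\to\directory (give an absolute path), and setting download.prompt_for_download to False stops Chrome from asking where to save each file.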
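
Clicking a download link returns immediately, while Chrome keeps writing the file in the background under a temporary .crdownload name until it finishes. A minimal polling sketch, assuming the download directory was empty beforehand; wait_for_download, timeout, and poll are hypothetical names:

```python
import os
import time


def wait_for_download(directory, timeout=30.0, poll=0.5):
    """Return the path of a finished download in directory.

    wait_for_download is a hypothetical helper. It assumes the directory was
    empty before the download started, treats Chrome's in-progress
    '.crdownload' files as unfinished, and raises TimeoutError on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        finished = [n for n in names if not n.endswith(".crdownload")]
        # done once at least one file exists and nothing is still in progress
        if finished and len(finished) == len(names):
            return os.path.join(directory, sorted(finished)[0])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download in {directory!r} after {timeout}s")
        time.sleep(poll)
```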

Pressing Keys

from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys

action = ActionChains(driver)
for _ in range(3):
    action.send_keys(Keys.ARROW_DOWN)
    action.pause(0.1)  # queue a short pause after each key press
action.perform()  # nothing is sent to the browser until perform() runs

This code presses the down arrow (Keys.ARROW_DOWN) 3 times, with a short pause after each press. This is recommended to make sure all the keys register.

Note that ActionChains only queues actions and sends them all when perform() is called, so the pause has to be part of the chain (action.pause()); a time.sleep() inside the loop would finish before any key is pressed. If we simply fired off several key presses with no pauses, some of them might not register.

Keys contains constants for all the special keys on the keyboard, so you can also use this method to tab (Keys.TAB) between elements on the page, making it easier to interact with it (Keys.RETURN and Keys.SPACE are very useful as well).

Clicking Buttons

Note that you can use key presses to navigate between elements on a page; for example, you can use Tab and Space to tick checkboxes, and the arrow keys to move between dropdown menu items.

Of course, a more natural way to select checkboxes and dropdown items would be to simply retrieve the element using the driver and click it:

from selenium.webdriver.common.by import By
checkbox = driver.find_element(By.ID, 'checkbox')
checkbox.click()
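
click() toggles a checkbox, so clicking one that's already ticked unticks it. A sketch that checks the current state first with is_selected(); set_checkbox is a hypothetical helper:

```python
def set_checkbox(checkbox, checked=True):
    """Click checkbox only if its state differs from the wanted one.

    Returns True if a click was needed. set_checkbox is a hypothetical helper;
    checkbox is a WebElement for an <input type="checkbox">.
    """
    if checkbox.is_selected() != checked:
        checkbox.click()
        return True
    return False
```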

Inputting Forms

You can also simulate key presses within elements themselves:

element.send_keys(Keys.CONTROL, 'a')  # select all existing text
element.send_keys(value)  # value holds the new text to type

This way, the keys are sent to the element itself, which is how you would fill in a textarea, for example.

The first line uses a keyboard shortcut (CTRL + A) to select all text inside the element, and the second line replaces the selected text with value. On macOS the shortcut is COMMAND + A, so use Keys.COMMAND there.

To send a keyboard shortcut, pass all of its keys as arguments to a single send_keys call; modifier keys such as Keys.CONTROL stay pressed until that call ends.
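
Since the select-all modifier differs by platform, a small sketch can pick it and then replace a field's text. select_all_modifier_name and replace_text are hypothetical helpers; the first returns the name of the Keys constant (so you'd write getattr(Keys, select_all_modifier_name())), and it assumes the browser runs on the same machine as the script:

```python
import sys


def select_all_modifier_name(platform=sys.platform):
    """Name of the Keys constant for the select-all shortcut on platform.

    macOS ('darwin') uses COMMAND + A; Windows and Linux use CONTROL + A.
    """
    return "COMMAND" if platform == "darwin" else "CONTROL"


def replace_text(element, text, modifier):
    """Select everything in element, then type text over it.

    replace_text is a hypothetical helper; modifier would be Keys.CONTROL or
    Keys.COMMAND. Both keys go in one send_keys call so they act as a shortcut.
    """
    element.send_keys(modifier, "a")
    element.send_keys(text)
```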

Scrolling

Sometimes parts of the page load only after you scroll down (like an Instagram feed or any other infinite scrolling page). This is easily done by executing a short JavaScript snippet:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

The code above uses JavaScript to scroll to the bottom of the page; once the new content has loaded, you can read driver.page_source again to get it.
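
On an infinite-scrolling page, one scroll is rarely enough, because each scroll triggers another batch of content. A common approach, sketched here, is to keep scrolling until document.body.scrollHeight stops growing; execute_script hands back the value of a JavaScript return statement. scroll_to_end, max_rounds, and the fixed pause are assumptions, and a fixed pause is only a rough stand-in for waiting on the actual content:

```python
import time

SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"
HEIGHT_JS = "return document.body.scrollHeight;"


def scroll_to_end(driver, pause=1.0, max_rounds=50, sleep=time.sleep):
    """Scroll down until the page height stops changing or max_rounds is hit.

    scroll_to_end is a hypothetical helper that returns the number of scrolls
    made. sleep is injectable so the loop can be tested without waiting.
    """
    last_height = driver.execute_script(HEIGHT_JS)
    for rounds in range(1, max_rounds + 1):
        driver.execute_script(SCROLL_JS)
        sleep(pause)  # give the page time to fetch and render more items
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds
```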

Conclusion

Selenium is one of the most widely used tools for Web Browser Automation, and offers a lot of functionality and control over a browser.

It's mainly used for testing and automation in production or integration environments, though it can also be used as a web scraper for research purposes and the like. If you scrape public content, make sure you stay within the law and the site's terms of service.

About Ely Shaffir