Getting Started with Python PyAutoGUI

Introduction

In this tutorial, we're going to learn how to use the PyAutoGUI library in Python 3. PyAutoGUI provides cross-platform support for controlling the mouse and keyboard through code, which makes it possible to automate tasks. The library is also available for Python 2; however, we will use Python 3 throughout this tutorial.

A tool like this has many applications, including taking screenshots, automating GUI testing (as Selenium does for browsers), and automating tasks that can only be done through a GUI.

Before you go ahead with this tutorial, please note that there are a few prerequisites. You should have a basic understanding of Python's syntax, and/or have done at least beginner level programming in some other language. Other than that, the tutorial is quite simple and easy to follow for beginners.

Installation

The installation process for PyAutoGUI is fairly simple on all operating systems. However, Mac and Linux have a few dependencies that need to be installed before PyAutoGUI itself can be installed and used in programs.

Windows

For Windows, PyAutoGUI has no dependencies. Simply run the following command in your command prompt and the installation will be done.

$ pip install PyAutoGUI

Mac

For Mac, the pyobjc-core and pyobjc modules must be installed first, in that order. Run the following commands in your terminal, in sequence:

$ pip3 install pyobjc-core
$ pip3 install pyobjc
$ pip3 install pyautogui

Linux

For Linux, the only dependency is python3-xlib (for Python 3). To install that, followed by pyautogui, run the two commands mentioned below in your terminal:

$ pip3 install python3-xlib
$ pip3 install pyautogui

Basic Code Examples

In this section, we are going to cover some of the most commonly used functions from the PyAutoGUI library.

Generic Functions

The position() Function

Before we can use any PyAutoGUI functions, we need to import the library into our program:

import pyautogui as pag

The position() function returns the current position of the mouse cursor on the screen:

print(pag.position())

Output:

Point(x=643, y=329)

The onScreen() Function

The onScreen() function tells us whether the point with coordinates x and y exists on the screen:

print(pag.onScreen(500, 600))
print(pag.onScreen(0, 10000))

Output:

True
False

Here we can see that the first point exists on the screen, but the second point falls beyond the screen's dimensions.
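
Conceptually, the check compares the point against the screen's width and height. The sketch below mirrors that logic in plain Python so it can be run without PyAutoGUI. Note that is_on_screen is a hypothetical helper written for illustration, not part of the library, and the 1440 x 900 resolution is an assumed example value.

```python
def is_on_screen(x, y, width, height):
    """Return True if (x, y) lies inside a screen of the given size.

    Valid coordinates run from 0 to width - 1 and from 0 to height - 1.
    """
    return 0 <= x < width and 0 <= y < height


# Assumed example resolution; in practice it would come from the real screen.
WIDTH, HEIGHT = 1440, 900

print(is_on_screen(500, 600, WIDTH, HEIGHT))   # True
print(is_on_screen(0, 10000, WIDTH, HEIGHT))   # False
print(is_on_screen(1440, 0, WIDTH, HEIGHT))    # False: the last column is x = 1439
```

The last call shows a detail that is easy to miss: because coordinates start at 0, a point whose x value equals the screen width is already off screen.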

The size() Function

The size() function returns the width and height (resolution) of the screen:

print(pag.size())

Output:

Size(width=1440, height=900)

Your output may be different and will depend on your screen's size.
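
One practical use of the screen size is keeping computed coordinates on screen. The following is a minimal sketch of that idea; clamp_to_screen is a hypothetical helper, not a PyAutoGUI function, and the numbers reuse the example resolution from the output above.

```python
def clamp_to_screen(x, y, width, height):
    """Return the on-screen point closest to (x, y)."""
    clamped_x = max(0, min(x, width - 1))
    clamped_y = max(0, min(y, height - 1))
    return clamped_x, clamped_y


# Usage with the example resolution of 1440 x 900:
print(clamp_to_screen(0, 10000, 1440, 900))   # (0, 899)
print(clamp_to_screen(-5, 450, 1440, 900))    # (0, 450)

# With PyAutoGUI available, the real size could be passed in instead:
# width, height = pag.size()
# print(clamp_to_screen(0, 10000, width, height))
```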

Common Mouse Operations

In this section, we are going to cover PyAutoGUI functions for mouse manipulation, which includes both moving the position of the cursor as well as clicking buttons automatically through code.

The moveTo() Function

The syntax of the moveTo() function is as follows:

pag.moveTo(x_coordinate, y_coordinate)

The value of x_coordinate increases from left to right on the screen, and the value of y_coordinate increases from top to bottom. The value of both x_coordinate and y_coordinate at the top left corner of the screen is 0.

Look at the following script:

pag.moveTo(0, 0)
pag.PAUSE = 2  # wait 2 seconds after every PyAutoGUI call from here on
pag.moveTo(100, 500)
pag.moveTo(500, 500)

In the code above, the main focus is the moveTo() function that moves the mouse cursor on the screen based on the coordinates we provide as parameters. The first parameter is the x-coordinate and the second parameter is the y-coordinate. It is important to note that these coordinates represent the absolute position of the cursor.

The code above also introduces the PAUSE property, which sets how many seconds PyAutoGUI waits after each of its function calls. Because the setting applies to every subsequent call, it only needs to be assigned once. It is used here so that you can watch each move happen; without it, the calls would complete in a split second and you would not see the cursor travel from one location to the next.

Alternatively, you can pass the duration of each move as the third argument, e.g. moveTo(x, y, time_in_seconds), which makes the cursor glide to the target over that many seconds.

Executing the above script may result in the following error:

Note: Possible Error

Traceback (most recent call last):
  File "a.py", line 5, in <module>
    pag.moveTo (100, 500)
  File "/anaconda3/lib/python3.6/site-packages/pyautogui/__init__.py", line 811, in moveTo
    _failSafeCheck()
  File "/anaconda3/lib/python3.6/site-packages/pyautogui/__init__.py", line 1241, in _failSafeCheck
    raise FailSafeException ('PyAutoGUI fail-safe triggered from mouse moving to a corner of the screen. To disable this fail-safe, set pyautogui.FAILSAFE to False. DISABLING FAIL-SAFE IS NOT RECOMMENDED.')
pyautogui.FailSafeException: PyAutoGUI fail-safe triggered from mouse moving to a corner of the screen. To disable this fail-safe, set pyautogui.FAILSAFE to False. DISABLING FAIL-SAFE IS NOT RECOMMENDED.

If moveTo() raises an error like the one shown above, PyAutoGUI's fail-safe has been triggered: the script moved the cursor to a corner of the screen, here (0, 0), and the next PyAutoGUI call detected it. To disable the fail-safe, add the following line at the start of your code:

pag.FAILSAFE = False

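This feature is enabled by default so that you can easily stop a runaway PyAutoGUI program by manually moving the mouse to a corner of the screen, such as the upper-left corner. Once the mouse is there, the next PyAutoGUI call raises a FailSafeException and, unless the exception is caught, the script exits. As the error message itself says, disabling the fail-safe is not recommended.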
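
Rather than disabling the fail-safe, a script can leave it on and catch the exception so that it stops cleanly. The sketch below shows one way this might look. The move_through helper and its gui parameter are hypothetical names; gui is assumed to be the pyautogui module, which exposes the FailSafeException seen in the traceback above.

```python
def move_through(gui, points, seconds_per_move=0.5):
    """Move the cursor through points; return False if the fail-safe fired.

    gui is expected to be the pyautogui module (or an object that looks like it).
    """
    try:
        for x, y in points:
            gui.moveTo(x, y, seconds_per_move)
    except gui.FailSafeException:
        print("Fail-safe triggered; stopping.")
        return False
    return True


# Usage (requires a desktop session, so it is commented out here):
# import pyautogui as pag
# move_through(pag, [(100, 500), (500, 500)])
```

Passing the module in as an argument is only a design choice for this sketch: it keeps the helper importable on machines without a display.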

The moveRel() Function

The coordinates of the moveTo() function are absolute. However, if you want to move the mouse position relative to the current mouse position, you can use the moveRel() function.

What this means is that the reference point for this function is not the top left point of the screen (0, 0), but the current position of the mouse cursor. So, if your mouse cursor is currently at point (100, 100) on the screen and you call the moveRel() function with the parameters (100, 100, 2), the new position of your mouse cursor will be (200, 200).

You can use the moveRel() function as shown below:

pag.moveRel(100, 100, 2)

The above script will move the cursor 100 pixels to the right and 100 pixels down over 2 seconds, relative to the current cursor position.
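
Because each call is relative to wherever the cursor currently is, a sequence of offsets that sums to zero brings the cursor back to its starting point. The sketch below illustrates this with a square. The helpers square_offsets and trace_relative are hypothetical, and gui is assumed to be the pyautogui module.

```python
def square_offsets(side):
    """Return four (dx, dy) steps that trace a square and end at the start."""
    return [(side, 0), (0, side), (-side, 0), (0, -side)]


def trace_relative(gui, offsets, seconds_per_move=0.5):
    """Apply each offset with moveRel(), relative to the current position."""
    for dx, dy in offsets:
        gui.moveRel(dx, dy, seconds_per_move)


print(square_offsets(100))
# [(100, 0), (0, 100), (-100, 0), (0, -100)]

# Usage (requires a desktop session, so it is commented out here):
# import pyautogui as pag
# trace_relative(pag, square_offsets(100))
```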

The click() Function

The click() function is used to imitate mouse click operations. The syntax for the click() function is as follows:

pag.click(x, y, clicks, interval, button)

The parameters are explained as follows:

  • x: the x-coordinate of the point to reach
  • y: the y-coordinate of the point to reach
  • clicks: the number of clicks that you would like to do when the cursor gets to that point on screen
  • interval: the amount of time in seconds between each mouse click i.e. if you are doing multiple mouse clicks
  • button: specify which button on the mouse you would like to press when the cursor gets to that point on screen. The possible values are right, left, and middle.

Here is an example:

pag.click(x=100, y=100, clicks=5, interval=2, button='right')  # five right-clicks at (100, 100), 2 seconds apart

You can also execute specific click functions as follows:

pag.rightClick(x, y)
pag.doubleClick(x, y)
pag.tripleClick(x, y)
pag.middleClick(x, y)

Here the x and y represent the x and y coordinates, just like in the previous functions.

You can also gain finer control over mouse clicks by choosing when the button is pressed and when it is released, using the mouseDown() and mouseUp() functions, respectively.

Here is a short example:

pag.mouseDown(x=x, y=y, button='left')
pag.mouseUp(x=x, y=y, button='left')

The above code is equivalent to just doing a pag.click(x, y) call.
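
Separating the press from the release becomes useful when something has to happen in between, such as moving the cursor while the button is held. The following is a minimal sketch of a drag built only from mouseDown(), moveTo(), and mouseUp(). The drag helper is a hypothetical name, and gui is assumed to be the pyautogui module.

```python
def drag(gui, start, end, seconds=1.0, button="left"):
    """Press at start, glide to end with the button held, then release."""
    start_x, start_y = start
    end_x, end_y = end
    gui.mouseDown(x=start_x, y=start_y, button=button)
    gui.moveTo(end_x, end_y, seconds)
    gui.mouseUp(x=end_x, y=end_y, button=button)


# Usage (requires a desktop session, so it is commented out here):
# import pyautogui as pag
# drag(pag, (200, 200), (400, 300))
```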

The scroll() Function

The last mouse function we are going to cover is scroll(), which scrolls the mouse wheel up or down. The syntax for the scroll() function is as follows:

pag.scroll(amount_to_scroll, x=x_coordinate, y=y_coordinate)

To scroll up, pass a positive value for the amount_to_scroll parameter; to scroll down, pass a negative value. The optional x and y arguments give the position the cursor is moved to before scrolling. Here is an example:

pag.scroll(100, 120, 120)

Alright, this was it for the mouse functions. By now, you should be able to control your mouse's buttons as well as movements through code. Let's now move to keyboard functions. There are plenty, but we will cover only those that are most frequently used.

Common Keyboard Operations

Before we move to the functions, it is important that we know which keys can be pressed through code in pyautogui, as well as their exact naming convention. To do so, run the following script:

print(pag.KEYBOARD_KEYS)

Output:

['\t', '\n', '\r', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~', 'accept', 'add', 'alt', 'altleft', 'altright', 'apps', 'backspace', 'browserback', 'browserfavorites', 'browserforward', 'browserhome', 'browserrefresh', 'browsersearch', 'browserstop', 'capslock', 'clear', 'convert', 'ctrl', 'ctrlleft', 'ctrlright', 'decimal', 'del', 'delete', 'divide', 'down', 'end', 'enter', 'esc', 'escape', 'execute', 'f1', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f2', 'f20', 'f21', 'f22', 'f23', 'f24', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'final', 'fn', 'hanguel', 'hangul', 'hanja', 'help', 'home', 'insert', 'junja', 'kana', 'kanji', 'launchapp1', 'launchapp2', 'launchmail', 'launchmediaselect', 'left', 'modechange', 'multiply', 'nexttrack', 'nonconvert', 'num0', 'num1', 'num2', 'num3', 'num4', 'num5', 'num6', 'num7', 'num8', 'num9', 'numlock', 'pagedown', 'pageup', 'pause', 'pgdn', 'pgup', 'playpause', 'prevtrack', 'print', 'printscreen', 'prntscrn', 'prtsc', 'prtscr', 'return', 'right', 'scrolllock', 'select', 'separator', 'shift', 'shiftleft', 'shiftright', 'sleep', 'space', 'stop', 'subtract', 'tab', 'up', 'volumedown', 'volumemute', 'volumeup', 'win', 'winleft', 'winright', 'yen', 'command', 'option', 'optionleft', 'optionright']

The typewrite() Function

The typewrite() function is used to type something in a text field. Syntax for the function is as follows:

pag.typewrite(text, interval)

Here text is what you wish to type in the field and interval is time in seconds between each keystroke. Here is an example:

pag.typewrite('Junaid Khalid', 1)

Executing the script above will enter the text Junaid Khalid in the field that is currently selected with a pause of 1 second between each key press.

Another way this function can be used is by passing in a list of keys that you'd like to press in a sequence. To do that through code, see the example below:

pag.typewrite(['j', 'u', 'n', 'a', 'i', 'd', 'e', 'backspace', 'enter'])

In the above example, the text junaide would be entered, followed by the removal of the trailing e. The input in the text field will be submitted by pressing the Enter key.
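
Since a string behaves like a sequence of single-character keys, a key list can also be built from ordinary text and extended with named keys. The sketch below shows one way to do that; keys_with_enter is a hypothetical helper, not part of PyAutoGUI.

```python
def keys_with_enter(text):
    """Split text into single-character keys and append the 'enter' key."""
    return list(text) + ["enter"]


print(keys_with_enter("junaid"))
# ['j', 'u', 'n', 'a', 'i', 'd', 'enter']

# Usage (requires a desktop session, so it is commented out here):
# import pyautogui as pag
# pag.typewrite(keys_with_enter("junaid"), 0.1)
```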

The hotkey() Function

You may have noticed that the key list above says nothing about key combinations such as Ctrl + C for the copy command. You might think you could achieve this by passing the list ['ctrl', 'c'] to the typewrite() function, but that would not work: typewrite() presses the keys one after the other, not simultaneously. To execute the copy command, you need to press the C key while holding down the Ctrl key.

To press two or more keys simultaneously, you can use the hotkey() function, as shown here:

pag.hotkey('shift', 'enter')
pag.hotkey('shift', '2')  # For the @ symbol (on a US keyboard layout)
pag.hotkey('ctrl', 'c')  # For the copy command
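
To make the difference from typewrite() concrete, the sketch below spells out the press-and-hold behavior by hand: every key is pressed down in order, and the keys are then released in reverse order. It relies on keyDown() and keyUp(), which PyAutoGUI provides but this tutorial does not otherwise cover. The press_combo helper is a hypothetical name, and gui is assumed to be the pyautogui module.

```python
def press_combo(gui, *keys):
    """Hold down each key in order, then release them in reverse order."""
    for key in keys:
        gui.keyDown(key)
    for key in reversed(keys):
        gui.keyUp(key)


# Usage (requires a desktop session, so it is commented out here):
# import pyautogui as pag
# press_combo(pag, 'ctrl', 'c')   # intended to behave like pag.hotkey('ctrl', 'c')
```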

The screenshot() Function

If you would like to take a screenshot at any point, the screenshot() function is the one you are looking for. Let's see how we can do that using PyAutoGUI:

screen_shot = pag.screenshot()  # returns a PIL (Pillow) Image object

This will store a PIL Image object containing the screenshot in a variable.

If, however, you want to save the screenshot directly to your computer, you can pass a filename to the screenshot() function instead:

pag.screenshot('ss.png')

This will save the screenshot to a file with the given name on your computer.
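
When only part of the screen is needed, screenshot() also accepts a region argument of the form (left, top, width, height), which is not covered elsewhere in this tutorial. The sketch below computes such a tuple for a box centered on the screen. The centered_region helper is hypothetical, and the 1440 x 900 resolution and the filename center.png are assumed example values.

```python
def centered_region(screen_width, screen_height, width, height):
    """Return (left, top, width, height) for a box centered on the screen."""
    left = (screen_width - width) // 2
    top = (screen_height - height) // 2
    return left, top, width, height


print(centered_region(1440, 900, 400, 300))   # (520, 300, 400, 300)

# Usage (requires a desktop session, so it is commented out here):
# import pyautogui as pag
# region = centered_region(*pag.size(), 400, 300)
# pag.screenshot('center.png', region=region)
```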

The confirm(), alert(), and prompt() Functions

The last set of functions that we are going to cover in this tutorial are the message box functions. Here is a list of the message box functions available in PyAutoGUI:

  1. Confirmation Box: Displays information and gives you two options, i.e. OK and Cancel
  2. Alert Box: Displays some information with a single OK button, which the user clicks to acknowledge having read it
  3. Prompt Box: Requests some information from the user, who enters it and then clicks the OK button

Now that we have seen the types, let's see how we can display these message boxes on the screen in the same sequence as above:

pag.confirm("Are you ready?")
pag.alert("The program has crashed!")
pag.prompt("Please enter your name: ")

Running this code displays three message boxes one after another: a confirmation box, an alert box, and a prompt box.
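
These functions also return the user's response, so a script can branch on it. The sketch below chains the three boxes together. It assumes that confirm() returns the text of the clicked button ('OK' or 'Cancel' with the default buttons) and that prompt() returns the entered text, or None if the user cancels. The greet helper is a hypothetical name, and gui is assumed to be the pyautogui module.

```python
def greet(gui):
    """Ask for confirmation, then a name, then show a greeting.

    Returns the greeting that was shown, or None if the user backed out.
    """
    if gui.confirm("Are you ready?") != "OK":
        return None
    name = gui.prompt("Please enter your name: ")
    if not name:  # None when cancelled, empty string when left blank
        return None
    message = f"Hello, {name}!"
    gui.alert(message)
    return message


# Usage (requires a desktop session, so it is commented out here):
# import pyautogui as pag
# greet(pag)
```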

Conclusion

In this tutorial, we learned how to use the PyAutoGUI automation library in Python. We started with the prerequisites and the installation process for different operating systems, then covered some of the library's general functions. After that, we looked at the functions for mouse movement, mouse clicks, keyboard input, screenshots, and message boxes.

After following this tutorial, you should be able to use PyAutoGUI to automate GUI operations for repetitive tasks in your own application.

Last Updated: August 29th, 2023

© 2013-2024 Stack Abuse. All rights reserved.
