Introduction to Python Data Types
In this article, we'll be diving into the Basic Data Types in Python. These form some of the fundamental ways you can represent data.
One way to categorize these basic data types is in one of four groups:
- Numeric: int, float and the less frequently encountered complex
- Sequence: str (string), list and tuple
- Boolean: bool (True or False)
- Dictionary: dict (dictionary), consisting of (key, value) pairs
It's important to point out that Python usually doesn't require you to specify what data type you are using; it infers the type from the value you assign to a variable.
An equally important thing to point out is that Python is a dynamically typed programming language, meaning that a variable can be rebound to a value of a different type over the course of the program's execution, which isn't the case in statically typed languages (such as Java or C++). Python is often described as "loosely" or "weakly" typed for this reason, but strictly speaking it is strongly typed: it won't silently combine unrelated types, so 1 + "1" raises a TypeError.
So something that was an int can end up being a str easily, if you assign it a string value.
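As a minimal sketch of this behavior (the variable names here are only for illustration), the same name can hold an int and later a str, while mixing the two types in one expression is still rejected:

```python
# A name can be rebound to a value of a different type.
value = 5
first_type = type(value)

value = "five"
second_type = type(value)

# Python does not silently combine unrelated types.
try:
    result = 1 + "1"
except TypeError as error:
    result = f"TypeError: {error}"

print(first_type, second_type)
print(result)
```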
In our examples we will occasionally use a function called type(variable) which returns, well, the type of the variable we passed to it.
We will also be using the Python shell so we don't have cumbersome code to print everything we want to show.
Numeric Data Types
These data types are fairly straightforward and represent numeric values. These can be integers, floating point values or even complex numbers.
Integer Data Type - int
The int data type deals with integer values. This means values like 0, 1, -2 and -15, and not numbers like 0.5, 1.01, -10.8, etc.
If you give Python the following code, it will conclude that x is an integer and will assign the int data type to it:
>>> x = 5
>>> type(x)
<class 'int'>
We could have been more explicit and written something along these lines to make sure Python understood our 5 as an integer, though it does exactly the same thing under the hood:
>>> x = int(5)
>>> type(x)
<class 'int'>
It's worth noting that Python treats any sequence of digits (without a prefix) as a decimal number. The length of this sequence, in fact, isn't constrained.
That is to say, unlike in some other languages like Java, an int doesn't have a maximum value - it's unbounded.
The name sys.maxsize may seem counterintuitive then, since it suggests a maximum value for an integer, though it isn't one. It is the maximum value of the platform's Py_ssize_t type, which bounds the size of containers such as lists and strings:
>>> import sys
>>> x = sys.maxsize
>>> x
2147483647
This is the largest 32-bit signed integer, which is what a 32-bit build of Python reports (a 64-bit build reports 9223372036854775807 instead). Let's see what happens if we go past this value:
>>> x = sys.maxsize
>>> x+1
2147483648
In fact, we can even go as far as:
>>> y = sys.maxsize + sys.maxsize
>>> y
4294967294
The only real limit to how big an integer can be is the memory of the machine you're running Python on.
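To make this concrete without depending on what sys.maxsize happens to be on your machine, here is a small sketch. The names beyond_maxsize and huge are made up for the example:

```python
import sys

# sys.maxsize depends on the platform, so we only compare against it.
beyond_maxsize = sys.maxsize + 1

# Integers grow as needed; 2 ** 200 is still an exact int.
huge = 2 ** 200

print(type(beyond_maxsize), beyond_maxsize > sys.maxsize)
print(huge.bit_length())
```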
Prefixing Integers
What happens when you'd like to write a numeric value in a different base? You can prefix a sequence of digits to tell Python to read it in a different number system.
More specifically, the prefixes:
- 0b or 0B - the literal is read as binary
- 0o or 0O - the literal is read as octal
- 0x or 0X - the literal is read as hexadecimal
Whichever prefix you use, the result is an ordinary int, which the shell prints in decimal.
So, let's try these out:
# Decimal value of 5
>>> x = 5
>>> x
5
# Binary value of 1
>>> x = 0b001
>>> x
1
# Octal value of 5
>>> x = 0o5
>>> x
5
# Hexadecimal 10, which is 16 in decimal
>>> x = 0x10
>>> x
16
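Going the other way is also possible. As a short sketch, the built-in functions bin(), oct() and hex() turn an int into a prefixed string, and int() with a base argument parses such digits back into a number:

```python
number = 255

# These built-ins return strings, not numbers.
as_binary = bin(number)
as_octal = oct(number)
as_hex = hex(number)

# int() parses a string of digits in the given base.
from_binary = int("11111111", 2)
from_hex = int("ff", 16)

print(as_binary, as_octal, as_hex)
print(from_binary, from_hex)
```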
Floating Point Data Type - float
The float data type represents floating point numbers with roughly 15 to 17 significant decimal digits of precision. This means that it can cover numbers such as 0.3, -2.8, 5.542315467, etc. but also whole numbers such as 5.0.
Digits beyond that precision are rounded rather than stored exactly. Within that limit, Python has no difficulty correctly understanding the following as a float:
>>> y = 2.3
>>> type(y)
<class 'float'>
>>> y = 5/4
>>> type(y)
<class 'float'>
However, as previously mentioned, if we only say 5, Python will consider it an int data type. If, for some reason, we wanted a float variable that has the value 5, we'd need to explicitly let Python know:
>>> y = 5.0
>>> type(y)
<class 'float'>
>>> y = float(5)
>>> type(y)
<class 'float'>
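The limited precision of float is easy to run into in ordinary arithmetic. The sketch below shows the classic case and one common way of dealing with it, math.isclose from the standard library:

```python
import math

total = 0.1 + 0.2

# Neither 0.1 nor 0.2 can be stored exactly in binary floating point,
# so the sum is not exactly equal to 0.3.
exactly_equal = total == 0.3

# math.isclose compares within a tolerance instead.
close_enough = math.isclose(total, 0.3)

print(total, exactly_equal, close_enough)
```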
This data type can also represent some special values like NaN ("Not a Number") and +/-infinity, and it accepts exponent notation:
>>> y = float('-infinity')
>>> y
-inf
>>> y = float(5e-3)
>>> y
0.005
>>> y = float('nan')
>>> y
nan
One interesting side-note here is how NaN behaves. Namely, running y == float('nan') would return False, even though y is, well, not a number.
In fact, its behavior can be seen as strange by comparing the values and references:
>>> x = float('nan')
>>> x == float('nan')
False
>>> x == x
False
>>> x is x
True
This of course happens because NaN was meant to behave this way (the IEEE 754 floating point standard defines NaN as unequal to everything, including itself), but it's still interesting.
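Since == can't be used to detect NaN, a check like the following sketch, using math.isnan from the standard library, is the usual alternative:

```python
import math

x = float("nan")

# Equality never succeeds for NaN, even against itself.
equal_to_itself = x == x

# math.isnan is a reliable way to test for NaN.
is_nan = math.isnan(x)

print(equal_to_itself, is_nan)
```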
If you're unfamiliar with the difference between the == and is operators, check out our guide to Object Comparison in Python - == vs is!
Complex Numbers - complex
The last numeric type we need to cover is the complex type. It's a rarely used data type, and its job is to represent complex numbers, each stored as a pair of a real and an imaginary part.
The character j is used to express the imaginary part of the number, unlike the i more commonly used in math.
This is because Python follows the electrical engineering convention, rather than the mathematical one, for naming the imaginary unit.
Let's see how we can declare complex numbers in Python:
# Assigning a value with real part 0 and imaginary part 1 to `com`
>>> com = 1j
# Printing the value of `com`
>>> com
1j
# Multiplying complex numbers
>>> com*com
(-1+0j)
# Assigning a value to a new `com` number
>>> com2 = 3 + 1j
# Adding two complex numbers
>>> com+com2
(3+2j)
# Assigning a new value to `com`
>>> com = complex(1 + 2j)
# Printing `com`
>>> com
(1+2j)
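Once we have a complex number, its two parts can be read back separately. This is a brief sketch with variable names chosen just for the example:

```python
com = complex(1, 2)  # the same value as 1 + 2j

# Both parts are stored as floats.
real_part = com.real
imaginary_part = com.imag

# The conjugate flips the sign of the imaginary part.
conjugate = com.conjugate()

# abs() returns the magnitude of a complex number.
magnitude = abs(3 + 4j)

print(real_part, imaginary_part, conjugate, magnitude)
```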
Sequence Data Types
Sequence Data Types are used to represent collections of some sort. These collections of elements can consist of elements of the same type, or of completely different types.
str
Strings are sequences of characters, written between either single or double quotes. This includes empty strings (without any characters between the quotes).
In a similar fashion to integers, strings don't really have a hard length limit set in place. You can make a string as long as your computer's memory allows you to, technically.
Strings are very common as they're the most basic way to represent a sequence of characters - or words:
>>> my_string = 'some sequence of characters'
>>> my_string
'some sequence of characters'
>>> type(my_string)
<class 'str'>
They can also contain special sequences, such as \n if we want the string, when printed, to have a new line. If we want to use characters that otherwise have a special meaning, like \, ' or ", we need to add a backslash before them, e.g. \\, \' or \".
Adding a backslash before them is called escaping the special characters, as we don't want their special meaning to be taken into consideration - we want their literal values to be used.
>>> my_string = "adding a new line \n and some double quotes \" to the string"
>>> print(my_string)
adding a new line
and some double quotes " to the string
Another way of not worrying about adding a backslash before every ' or " is to use ''' (triple quotes) instead. The string then contains the quotes exactly as written, and the shell adds a backslash only when displaying it:
>>> my_string = '''No need to worry about any ' or " we might have'''
>>> my_string
'No need to worry about any \' or " we might have'
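To check that both spellings really produce the same string, we can compare them directly. This is a small sketch and the variable names are only illustrative:

```python
# The same text written with an escaped quote and with triple quotes.
escaped = "a \" and a ' in one string"
triple_quoted = '''a " and a ' in one string'''

# The backslash is part of the source code only, not of the string itself.
same_text = escaped == triple_quoted
contains_backslash = "\\" in escaped

print(same_text, contains_backslash)
```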
We can demonstrate the dynamically typed nature of Python by converting a float to a str and rebinding the same name:
# Assigning float value to `z`
>>> z = 5.2
# Checking for the type of `z`
>>> type(z)
<class 'float'>
# Printing the value of `z`
>>> z
5.2
# Changing `z` into a string
>>> z = str(z)
# Checking the type of `z`
>>> type(z)
<class 'str'>
# Printing the value of `z`
>>> z
'5.2'
We can see that z changed its type without much issue.
list
Unlike strings, which hold only characters, lists are ordered sequences that can contain any data type, even multiple different data types within the same list.
They are created by providing the elements of the list between [], e.g. [element1, element2], or by simply writing [] and adding the elements later.
There are built-in methods for reversing, sorting, clearing and extending a list, as well as appending (inserting at the end), and inserting or removing elements at specific positions, among others.
Elements can be accessed by their index in the list, with the index starting at 0.
In order to see the first element of the list (if it isn't empty) for a list named some_list, you can use some_list[0], and the same applies to all other elements of the list.
These elements can also be changed at an index i by writing some_list[i] = new_value.
Let's make a list and perform some operations on it:
# Making an empty list
>>> some_list = []
# Printing the value of the list
>>> some_list
[]
# Checking the type of the list
>>> type(some_list)
<class 'list'>
# Appending an integer to the list
>>> some_list.append(5)
# Printing the value of the list
>>> some_list
[5]
# Inserting an element at the `0`th index
>>> some_list.insert(0, 'some string')
# Printing the value of the list
>>> some_list
['some string', 5]
# Printing the value of the element on the `1`st index
>>> some_list[1]
5
# Appending another element to the list
>>> some_list.append(123)
# Printing the value of the list
>>> some_list
['some string', 5, 123]
# Assigning a new value to the third element (index 2), which already exists
>>> some_list[2] = 'a'
# Printing the value of the list
>>> some_list
['some string', 5, 'a']
However, if you try sorting a list with mismatching types:
>>> some_list.sort()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'str'
Since an int cannot be compared to a str with the < operator - an error is raised. Though, if we had:
>>> some_list = [1, 6, 4, 2, 8, 7]
>>> some_list
[1, 6, 4, 2, 8, 7]
>>> some_list.sort()
>>> some_list
[1, 2, 4, 6, 7, 8]
We could've sorted it.
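If a mixed list does need sorting, one possible workaround is to give sort a key that makes the elements comparable. The sketch below compares everything by its string form, which is only an assumption about the ordering you might want:

```python
mixed = ["some string", 5, "a"]

# Sorting the mixed list directly fails, as shown earlier.
try:
    mixed.sort()
    outcome = "sorted"
except TypeError:
    outcome = "TypeError"

# sorted() returns a new list; key=str compares the elements as text.
by_text = sorted(mixed, key=str)

print(outcome, by_text)
```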
tuple
The tuple data type is very similar to a list, the main differences being that it's immutable and that it's created using () instead of []. This means that once you create a tuple, you can't change the values it contains.
They are in most cases slightly faster than lists and are used to protect data from being changed:
# Creating a tuple
>>> some_tuple = ("some string", 5, True, float('nan'))
# Printing the value of a tuple
>>> some_tuple
('some string', 5, True, nan)
# Accessing an element of a tuple
>>> some_tuple[0]
'some string'
# Accessing elements from given index to the end of the tuple
>>> some_tuple[1:]
(5, True, nan)
# Accessing elements from given index to another given index
>>> some_tuple[1:3]
(5, True)
# Trying to assign a new value to the element at the `0`th index
>>> some_tuple[0] = 'some other string' # Causes an error
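The error in question is a TypeError. As a short sketch, we can catch it and, if a changed tuple is really needed, build a new one from the old one instead:

```python
some_tuple = ("some string", 5, True)

# Item assignment is not supported by tuples.
try:
    some_tuple[0] = "some other string"
except TypeError as error:
    message = str(error)

# To "change" a tuple, build a new one from its parts.
updated = ("some other string",) + some_tuple[1:]

print(message)
print(updated)
```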
Boolean Type - bool
The bool data type is used to represent boolean values - True or False. The data type can't contain any other value.
However, Python will again without much issue convert most things to bool. Namely, if you happen to say bool(5), Python will consider that True, while bool(0) will be considered False.
Basically, 0 is false and any other number is true, including negative numbers and fractions. A similar thing goes on for strings: an empty string is treated as False, while any non-empty string is treated as True.
This booleanification (also called truthiness in Python) is done implicitly in any context where Python expects a bool value. For example, saying if(5) has the same effect as if(bool(5)), i.e. if(True).
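To get a feel for truthiness, we can run a handful of values through bool(). The selection of values in this sketch is arbitrary:

```python
candidates = [0, 1, -3, 0.0, "", "text", [], [0]]

# Map each value's printed form to the boolean Python converts it to.
truthiness = {repr(value): bool(value) for value in candidates}

for shown, result in truthiness.items():
    print(shown, result)
```

Note that [0] is treated as True: a list only counts as false when it is empty, regardless of what it contains.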
Let's see how we can declare and use booleans:
# Declaring a boolean
>>> some_bool = True
# Printing a boolean's value
>>> some_bool
True
# Checking a boolean's type
>>> type(some_bool)
<class 'bool'>
# Converting an empty string to a boolean
>>> some_bool = bool('')
# Checking the boolean's value
>>> some_bool
False
Note that True and False are keywords and are case-sensitive, so you can't say true or false:
>>> some_bool = false
# Raises a NameError, since the name false is not defined
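Outside of the shell, the same mistake can be observed by catching the exception, as in this small sketch:

```python
# Lowercase false is just an undefined name, not the boolean False.
try:
    some_bool = false
except NameError as error:
    message = str(error)

print(message)
```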
Dictionary Type - dict
Unlike the Sequence group of data types, dicts (dictionaries) are not indexed by position. They are collections of (key, value) pairs, and since Python 3.7 they keep the order in which the pairs were inserted. What this means is that, unlike with lists for example, values are associated with keys and not with integer indexes.
A dictionary has the following structure:
{
key1 : value1,
key2 : value2,
....
keyN : valueN
}
It's important to note that keys have to be unique, while values don't. When you'd like to look up a value - you pass its key in and retrieve the value associated with it.
Dictionaries can be created by either adding key: value pairs between {} (remember, [] is for lists and () is for tuples), or simply writing an empty {} and adding the pairs later.
Keys and values can be of varying data types, though keys must be hashable, which rules out mutable types such as list:
# Checking the type of a dict
>>> type({})
<class 'dict'>
# Assigning keys and values to a dict
>>> some_dict = { 5 : 'five', 4 : 'four'}
# Printing the value of a dict
>>> some_dict
{5: 'five', 4: 'four'}
# Accessing a dict element via its key
>>> some_dict[5]
'five'
# Assigning a new value to a key
>>> some_dict[6] = 'six'
# Printing the value of the dict
>>> some_dict
{5: 'five', 4: 'four', 6: 'six'}
# Removing the (key, value) pair that has the key 5 (this also returns the value)
>>> some_dict.pop(5)
'five'
# Trying to access an element with the key 5
>>> some_dict[5] # Raises an error since the key 5 no longer exists
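The error raised here is a KeyError. The sketch below shows it, along with two common ways of avoiding it: checking for the key with in, or using get() with a fallback value ("unknown" is just a placeholder):

```python
some_dict = {4: "four", 6: "six"}

# Looking up a missing key with [] raises a KeyError.
try:
    some_dict[5]
except KeyError as error:
    missing_key = error.args[0]

# Membership tests and get() don't raise for missing keys.
has_five = 5 in some_dict
fallback = some_dict.get(5, "unknown")

print(missing_key, has_five, fallback)
```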
Conclusion
Python was designed to make code as easy to write as possible, without making the code too ambiguous.
However, its easy-to-write, dynamically typed nature can lead to confusion when someone else looks at your code or when you are revisiting it a while after writing it. It's good practice to state what exact type something is supposed to be wherever there's a chance of ambiguity, and to avoid re-using variable names with different types.
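One way to follow this advice is with type hints. The following is only a sketch, and the function and variable names are made up for illustration. Keep in mind that Python doesn't enforce hints at run time; they serve as documentation for readers and for external tools such as type checkers:

```python
def describe(count: int, label: str) -> str:
    """Return a short description such as '3 apples'."""
    return f"{count} {label}"


# Annotated variables state the intended type next to the name.
item_count: int = 3
item_label: str = "apples"
summary: str = describe(item_count, item_label)

print(summary)
```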