Data can take many shapes and forms, and it's often represented as strings.
Whether it comes from a CSV file or from input text, we frequently split strings to obtain lists of features or elements.
In this guide, we'll take a look at how to split a string into a list in Python, with the split() method.
Split String into List in Python
The split() method of the string class is fairly straightforward. It splits the string, given a delimiter, and returns a list consisting of the elements split out from the string.
By default, the delimiter is None, which means that if you omit the delimiter argument, the string is split on runs of whitespace (spaces, tabs, newlines), and any leading or trailing whitespace is ignored.
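To make the default behavior concrete, here is a minimal sketch on a made-up string that contains repeated spaces, a tab and a newline, compared with explicitly passing a single space as the delimiter:

```python
text = "  Age   University\tName\nGrades  "

# With no argument, split() treats any run of whitespace (spaces, tabs,
# newlines) as one separator and drops leading/trailing whitespace.
words = text.split()
print(words)  # ['Age', 'University', 'Name', 'Grades']

# Passing ' ' explicitly splits on every single space, so two consecutive
# spaces produce an empty string between them.
parts = "a  b".split(' ')
print(parts)  # ['a', '', 'b']
```

So omitting the delimiter and passing ' ' are not interchangeable when the input may contain irregular spacing.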
Let's take a look at the behavior of the split() method:
string = "Age,University,Name,Grades"
lst = string.split(',')
print(lst)
print('Element types:', type(lst[0]))
print('Length:', len(lst))
Our string had elements delimited with a comma, as in a CSV (comma-separated values) file, so we've set the delimiter appropriately.
This results in a list of elements of type str, no matter what other type they can represent:
['Age', 'University', 'Name', 'Grades']
Element types: <class 'str'>
Length: 4
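One detail worth knowing for CSV-like input: an explicit delimiter is never collapsed, so a missing value between two commas shows up as an empty string. A short sketch, using a hypothetical row with one empty field:

```python
row = "Age,,Name,Grades"

# Consecutive commas yield an empty string for the missing field, so the
# list always has exactly (number of commas + 1) elements.
fields = row.split(',')
print(fields)  # ['Age', '', 'Name', 'Grades']

# If empty fields carry no meaning in your data, filter them out.
non_empty = [f for f in fields if f]
print(non_empty)  # ['Age', 'Name', 'Grades']
```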
Split String into List, Trim Whitespaces and Change Capitalization
Not all input strings are clean, so you won't always have a perfectly formatted string to split. Sometimes, strings may contain whitespace that shouldn't be in the "final product", or a mix of capitalized and non-capitalized letters.
Thankfully, it's pretty easy to process this list and each element in it, after you've split it:
# Contains whitespaces after commas, which will stay after splitting
string = "age, uNiVeRsItY, naMe, gRaDeS"
lst = string.split(',')
print(lst)
This results in:
['age', ' uNiVeRsItY', ' naMe', ' gRaDeS']
No good! Every element after the first starts with a space, and none of them are consistently capitalized. Applying a function to each element of a list is easily done with a simple for loop or list comprehension, so we'll apply strip() (to get rid of the whitespace) and a capitalization function.
Since we're not only looking to capitalize the first letter but also keep the rest lowercase (to enforce conformity), let's define a helper function for that:
def capitalize_word(string):
    # Capitalize the first character and lowercase the rest
    return string[:1].capitalize() + string[1:].lower()
The function takes a string, slices off its first character and capitalizes it, converts the rest of the string to lowercase, and then concatenates the two. (The built-in str.capitalize() method does the same thing in a single call.)
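As a quick sanity check, the sketch below compares the helper against Python's built-in str.capitalize(), which uppercases the first character and lowercases the rest. It is only run on plain ASCII words like the ones in this guide (plus the empty string):

```python
def capitalize_word(string):
    # Capitalize the first character and lowercase the rest
    return string[:1].capitalize() + string[1:].lower()

# For plain ASCII words, the helper and str.capitalize() agree.
for word in ["uNiVeRsItY", "naMe", "gRaDeS", ""]:
    assert capitalize_word(word) == word.capitalize()

print("uNiVeRsItY".capitalize())  # University
```

Either form works; the helper simply makes each step explicit.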
We can now apply this function to each element, right after stripping the whitespace:
string = "age, uNiVeRsItY, naMe, gRaDeS"
lst = string.split(',')
lst = [s.strip() for s in lst]
lst = [capitalize_word(s) for s in lst]
print(lst)
print('Element types:', type(lst[0]))
print('Length:', len(lst))
This results in a clean list:
['Age', 'University', 'Name', 'Grades']
Element types: <class 'str'>
Length: 4
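The three steps above can also be folded into a single list comprehension. This sketch swaps the helper for the built-in str.capitalize(), which behaves the same on these words:

```python
string = "age, uNiVeRsItY, naMe, gRaDeS"

# Split, trim and fix capitalization in one pass: strip() removes the
# surrounding whitespace, capitalize() uppercases the first letter and
# lowercases the rest.
lst = [s.strip().capitalize() for s in string.split(',')]
print(lst)  # ['Age', 'University', 'Name', 'Grades']
```

The separate comprehensions are easier to read step by step; the combined version avoids building intermediate lists.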
Split String into List and Convert to Integer
What happens if you're working with a string-represented list of integers? After splitting, you won't be able to perform integer operations on the elements, since they're still strings.
Thankfully, we can use the same kind of list comprehension as before to convert the elements into integers:
string = "1,2,3,4"
lst = string.split(',')
lst = [int(s) for s in lst]
print(lst)
print('Element types:', type(lst[0]))
print('Length:', len(lst))
This now results in:
[1, 2, 3, 4]
Element types: <class 'int'>
Length: 4
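An equivalent approach uses map(). Since int() ignores surrounding whitespace, comma-and-space separated numbers convert without a separate strip() step. The sketch below also shows that a non-numeric element makes int() raise ValueError; the input strings are made up for illustration:

```python
string = "1, 2, 3, 4"

# int() tolerates the leading spaces left by split(','), so no strip()
# is needed; map() applies int to every element and list() collects them.
numbers = list(map(int, string.split(',')))
print(numbers)       # [1, 2, 3, 4]
print(sum(numbers))  # 10

# A non-numeric element cannot be converted and raises ValueError.
try:
    [int(s) for s in "1,two,3".split(',')]
except ValueError as err:
    print("Could not convert:", err)
```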
Split String into List with Limiter
Besides the delimiter, the split() method accepts a maxsplit argument: the maximum number of splits to perform.
It's an integer and is passed after the delimiter:
string = "Age, University, Name, Grades"
lst = string.split(',', 2)
print(lst)
Here, two splits occur, on the first and second comma, and no splits happen after that:
['Age', ' University', ' Name, Grades']
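The limit can also be passed as the keyword argument maxsplit, and the resulting list then has at most maxsplit + 1 elements. A common use is splitting only on the first separator, as in this sketch with a hypothetical key=value string whose value itself contains '=':

```python
string = "Age, University, Name, Grades"

# Keyword form: at most one split, so at most two elements.
lst = string.split(',', maxsplit=1)
print(lst)  # ['Age', ' University, Name, Grades']

# Split only on the first '=' so the value keeps its own '=' characters.
key, value = "query=a=b".split('=', 1)
print(key, value)  # query a=b
```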
Conclusion
In this short guide, you've learned how to split a string into a list in Python.
You've also learned how to trim whitespace and fix capitalization as a simple processing step alongside splitting a string into a list, how to convert the resulting elements to integers, and how to limit the number of splits.