Remove Quotes From All Rows in DataFrame Column
When string-based columns contain quotes, we'll often want to get rid of them, in large part because "string" (with the quote characters included) is technically a different string from string (without them), which more often than not isn't a distinction we want to make.
Whether you're performing NLP and tokenizing words (in which case you'll end up with different tokens for the same word because some occurrences are "glued" to a quote) or doing any other form of manipulation, removing the quotes is usually a necessary cleanup step.
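As a small illustration of the tokenization problem, the sketch below counts whitespace-separated tokens before and after removing quotes. The sample sentence is made up for this example, and splitting on whitespace is only a stand-in for a real tokenizer:

```python
from collections import Counter

# Hypothetical sample text: the same word appears with and without quotes
text = 'the "cat" saw the cat'

# Naive whitespace tokenization keeps the quotes attached to the word
raw_counts = Counter(text.split())

# Removing the quotes first lets both occurrences collapse into one token
clean_counts = Counter(text.replace('"', '').split())

print(raw_counts['cat'], raw_counts['"cat"'])  # counted as two separate tokens
print(clean_counts['cat'])                     # counted as one token
```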
There are quite a few ways to remove quotes in a Pandas DataFrame.
DataFrame.applymap (lambda...)
To remove all quotes from all rows and columns of an entire DataFrame, you can use applymap() with a lambda function:
# applymap() works on entire DataFrame
df = df.applymap(lambda x: x.replace('"', ''))
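Note: This applies the lambda function to every element of the DataFrame, and raises an AttributeError if any value is not of str type (for example, in a numeric column or where a value is missing). In pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map().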
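One way to work around the type restriction is to guard the replacement with a type check. The following is a minimal sketch, assuming a small made-up DataFrame with one string column and one numeric column; the helper name strip_quotes is hypothetical, and the hasattr check is only there so the same code runs on pandas versions with and without DataFrame.map():

```python
import pandas as pd

# Hypothetical sample data: "age" is numeric and would break x.replace()
df = pd.DataFrame({
    "name": ['"Alice"', '"Bob"'],
    "age": [30, 25],
})

def strip_quotes(value):
    # Only strings have .replace(); return anything else unchanged
    return value.replace('"', '') if isinstance(value, str) else value

# DataFrame.map() exists from pandas 2.1 on; fall back to applymap() otherwise
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
cleaned = elementwise(strip_quotes)

print(cleaned)
```

The numeric column passes through untouched, while the quotes are removed from the string column.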
To remove all quotes from all rows in a single column, just apply the function to a single column:
# apply() works on column
df['ColumnName'] = df['ColumnName'].apply(lambda x: x.replace('"', ''))
These two approaches are generic: they can apply any function to the values, not just one that leverages replace().
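To illustrate that any function can be plugged in, here is a sketch that swaps replace() for Python's built-in str.strip(), which trims quotes only from the ends of each value and leaves inner ones alone. The sample values are made up for the example:

```python
import pandas as pd

# Hypothetical sample data with surrounding quotes and an inner apostrophe
df = pd.DataFrame({"ColumnName": ['"it\'s"', "'quoted'", "plain"]})

# strip() removes the listed characters from both ends only,
# so the apostrophe inside "it's" is preserved
df["ColumnName"] = df["ColumnName"].apply(lambda x: x.strip("\"'"))

print(df["ColumnName"].tolist())
```

Whether inner quotes should be kept depends on the data, so this is an alternative to replace() rather than a substitute for it.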
Series' str Accessor - str.replace()
Each Series exposes the str accessor, which provides vectorized string methods such as replace() for manipulating the strings in every row of a single column:
df['ColumnName'] = df['ColumnName'].str.replace(r'"', '')
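A practical difference from the lambda-based approach is that the str methods skip missing values instead of raising an error. The sketch below assumes a made-up column containing a None entry, and passes regex=False explicitly so the pattern is treated as a literal string regardless of the pandas version's default:

```python
import pandas as pd

# Hypothetical sample data with a missing value in the middle
df = pd.DataFrame({"ColumnName": ['"a"', None, 'b"c']})

# regex=False treats the pattern as a literal string, not a regular expression
df["ColumnName"] = df["ColumnName"].str.replace('"', '', regex=False)

# The missing entry stays missing; the other rows lose their quotes
print(df["ColumnName"].isna().tolist())
print(df["ColumnName"].dropna().tolist())
```

With apply() and a plain lambda, the missing entry would instead raise an AttributeError, since it has no replace() method.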
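replace() with RegEx
To use Regular Expressions with the Series' own replace() method (as opposed to str.replace()), you pass in regex=True. With inplace=False, which is the default, the call returns a new Series rather than modifying the column, so assign the result back:
df['ColumnName'] = df['ColumnName'].replace(to_replace=r'"', value='', regex=True)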
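A regular expression pays off once there is more than one character to remove. As a sketch under assumed sample data, the pattern below uses a character class to match both double and single quotes, and calls replace() on the whole DataFrame; the variable names are illustrative only:

```python
import pandas as pd

# Hypothetical sample data mixing double quotes, single quotes and numbers
df = pd.DataFrame({
    "name": ['"Alice"', "'Bob'"],
    "age": [30, 25],
})

# Character class matching either a double or a single quote
pattern = "[\"']"

# With regex=True, only string values are matched; numeric columns are left as-is
cleaned = df.replace(to_replace=pattern, value="", regex=True)

print(cleaned)
```

Unlike the applymap() approach, this does not need a type check, because regex replacement is only attempted on string values.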