In this article, we will look at several ways to handle missing data in pandas.
We will learn how to:
- Check if there are NA values using isna()
- Drop the NA values using dropna()
- Fill and replace the NA values using fillna() and replace()
# Import the necessary libraries
import pandas as pd
import numpy as np
Let's create a DataFrame
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})
print(df)
isna() - Checks whether NA values are present
Return : Boolean object of the same size as the input, with True wherever the value is NA (NaN or None)
df.isna()
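A boolean table is hard to scan on larger data, so a common follow-up is to count the True values. The sketch below rebuilds the same DataFrame and sums the result of isna(); the variable names missing_per_column and total_missing are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
    "Age": [23, np.nan, 45, np.nan, 56, 77, 34],
})

# isna() returns a boolean DataFrame; True counts as 1 when summed,
# so sum() gives the number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Summing again gives the number of missing cells in the whole DataFrame.
total_missing = int(missing_per_column.sum())
print(total_missing)  # 4
```

Both np.nan and None are counted, which is why the Name column reports two missing values.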
notna() - Opposite of isna()
Return : Boolean object of the same size as the input, with True wherever the value is not NA
df.notna()
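Because notna() returns booleans, it can also be used as a row filter. The following is a minimal sketch, assuming we only want the rows where Age is known; the name known_age is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
    "Age": [23, np.nan, 45, np.nan, 56, 77, 34],
})

# Keep only the rows whose Age is not NA.
# The Name column is not checked, so a row with a missing Name can remain.
known_age = df[df["Age"].notna()]
print(known_age)
```

The original row labels are kept, so the result still shows the index values of the rows that survived the filter.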
dropna() - Removes missing values
Return : DataFrame with the NA entries dropped
axis - Whether to drop rows or columns; default is 0
0 or 'index' drops rows
1 or 'columns' drops columns
how - Whether a row or column is dropped when it has at least one NA or only when all of its values are NA; default is 'any'
any - drop the row or column if any NA value is present
all - drop the row or column only if all of its values are NA
The following drops the rows in which all values are NA
df.dropna(axis=0, how ="all" )
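To see the effect on the sample data, the sketch below rebuilds the DataFrame and compares it before and after the call. Only the row at index 3 has NA in both Name and Age, so it is the only one removed. The name all_dropped is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
    "Age": [23, np.nan, 45, np.nan, 56, 77, 34],
})

# how="all" removes a row only when every value in it is NA.
all_dropped = df.dropna(axis=0, how="all")
print(all_dropped)

# dropna() returns a new DataFrame; df itself still has all 7 rows.
print(len(df), len(all_dropped))  # 7 6
```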
The following drops the rows that contain at least one NA value
df.dropna(axis=0, how ="any" )
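With how="any" the filter is stricter: a single NA is enough to remove a row. The sketch below shows which rows of the sample data survive; the names complete_rows and renumbered are illustrative, and the reset_index() step is an optional extra that the original example does not use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
    "Age": [23, np.nan, 45, np.nan, 56, 77, 34],
})

# how="any" removes every row that has at least one NA value.
# Rows 1 (Age missing), 3 (both missing) and 5 (Name missing) are dropped.
complete_rows = df.dropna(axis=0, how="any")
print(list(complete_rows.index))  # [0, 2, 4, 6]

# Optional: renumber the remaining rows from 0.
renumbered = complete_rows.reset_index(drop=True)
print(renumbered)
```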
To drop all the columns that contain NA values
df.dropna(axis='columns')
This returns no columns, because every column contains at least one NA value
fillna() - Fills the NA values with a specified value or method
Return : DataFrame with the NA values filled
To fill all the NA values with 4
df.fillna(4)
To propagate non-null values forward or backward, use ffill() or bfill(); the older fillna(method="ffill") form is deprecated since pandas 2.1
df.ffill()
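The difference between filling forward and filling backward is easiest to see side by side. The sketch below uses the DataFrame methods ffill() and bfill() on the sample data; the names forward and backward are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
    "Age": [23, np.nan, 45, np.nan, 56, 77, 34],
})

# Forward fill: each NA takes the last valid value above it.
forward = df.ffill()
print(forward)

# Backward fill: each NA takes the next valid value below it.
backward = df.bfill()
print(backward)
```

In the forward-filled result the missing Age at index 1 becomes 23, while in the backward-filled result it becomes 45. An NA in the first row cannot be forward filled and an NA in the last row cannot be backward filled, because there is no neighbouring value to copy.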
To replace all the NaN elements in column Name with "Radha" and in column Age with 99
values = {"Name" : "Radha", "Age" : 99}
df.fillna(value=values)
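The dictionary keys are matched against column names, so each column gets its own replacement. The sketch below runs the call on the sample data and confirms that nothing is left missing; the name filled is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
    "Age": [23, np.nan, 45, np.nan, 56, 77, 34],
})

# One fill value per column: text for Name, a number for Age.
values = {"Name": "Radha", "Age": 99}
filled = df.fillna(value=values)
print(filled)

# No NA values remain after the fill.
print(int(filled.isna().sum().sum()))  # 0
```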
To replace only the first NA element in each column, use the limit parameter
df.fillna(value=values, limit=1)
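With a dictionary of values, the limit is applied column by column, so one NA is filled in each column and the later ones are left alone. The following sketch checks this on the sample data; the name limited is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
    "Age": [23, np.nan, 45, np.nan, 56, 77, 34],
})

values = {"Name": "Radha", "Age": 99}

# limit=1 fills at most one NA per column, starting from the top.
limited = df.fillna(value=values, limit=1)
print(limited)

# Count the NA values that are still left in each column.
print(limited.isna().sum())
```

In the Age column the NA at index 1 becomes 99 while the one at index 3 stays missing.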