Tuesday, 24 September 2019

Python: Missing Data Handling


In this article, we will learn various ways to handle missing data.

We will learn how to:
  • Check if there are NA values using isna()
  • Drop the NA values using dropna()
  • Fill and replace the NA values using fillna() and replace()


#Import necessary libraries
import pandas as pd
import numpy as np


Let's create a DataFrame:

df = pd.DataFrame({"Name" : ["Rita","Meera","Leena",np.nan,"Sheena", None ,"Diya"],
                   "Age"  : [23,np.nan,45,np.nan,56,77,34]})
print(df)


isna() - Checks whether NA values are present
Return - A boolean object of the same size, with True wherever the value is NA (NaN or None)

df.isna()
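Since isna() returns booleans, the result can be summed to count the missing values. The sketch below rebuilds the article's DataFrame so it runs on its own; the variable names na_per_column and rows_with_na are only illustrative.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# True counts as 1 when summed, so this gives the number of NA values per column
na_per_column = df.isna().sum()
print(na_per_column)

# any(axis=1) flags the rows that contain at least one NA value
rows_with_na = df.isna().any(axis=1)
print(rows_with_na.sum())
```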


notna() - The opposite of isna()
Return - A boolean object of the same size, with True wherever the value is not NA

df.notna()
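One possible use of notna() is as a row filter. The sketch below keeps only the rows where Name is present; the name complete_names is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# The boolean Series from notna() selects the rows whose Name is not missing
complete_names = df[df["Name"].notna()]
print(complete_names)
```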


dropna() - Removes the missing values
Return - A DataFrame with the NA entries dropped from it


axis - Indicates whether rows or columns are dropped; default is 0
       0 represents rows
       1 represents columns
how - Determines whether a row or column is removed when it has at least one NA value or only when all of its values are NA; default is "any"
           any - drop the row or column if any NA value is present
           all - drop the row or column only if all of its values are NA

The call below drops a row only if all of its values are NA:

df.dropna(axis=0, how ="all" )
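To see which rows survive how="all", we can look at the remaining index. This is a small self-contained sketch on the article's DataFrame; dropped_all is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# Only row 3 has both Name and Age missing, so it is the only row removed
dropped_all = df.dropna(axis=0, how="all")
print(dropped_all.index.tolist())
```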



The call below drops a row if any of its values is NA:

df.dropna(axis=0, how ="any" )
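With how="any", a single missing value is enough to remove a row. The sketch below also uses the subset parameter of dropna(), which the article does not cover, to restrict the check to one column; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# Rows 1, 3 and 5 each contain at least one NA value and are removed
dropped_any = df.dropna(axis=0, how="any")
print(dropped_any.index.tolist())

# subset limits the NA check to the listed columns (here only Name)
dropped_by_name = df.dropna(subset=["Name"])
print(dropped_by_name.index.tolist())
```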


To drop all the columns that contain NA values:

df.dropna(axis='columns')
This returns no columns, because every column contains at least one NA value.
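The effect is easier to see with a column that has no missing values. In the sketch below, the Id column is a hypothetical addition that is not part of the article's data.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# Both columns contain NA values, so no column is left (the 7 rows remain)
no_columns = df.dropna(axis="columns")
print(no_columns.shape)

# Hypothetical extra column without missing values: it is the only one kept
df_with_id = df.assign(Id=range(1, 8))
kept = df_with_id.dropna(axis="columns")
print(kept.columns.tolist())
```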


fillna() - Fills the NA values with the specified value or method
Return - The filled DataFrame


To fill all the NA values, in every column, with 4:

df.fillna(4)
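A single fill value is applied to every column, including the text column Name. When that is not what we want, one option is to fill one column at a time. The sketch below fills only Age with its mean; choosing the mean is an assumption made for illustration, not something the article prescribes.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# mean() skips NA values by default
age_mean = df["Age"].mean()

# Fill only the Age column; Name is left untouched
age_filled = df["Age"].fillna(age_mean)
print(age_filled)
```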


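To propagate non-null values forward or backward, use the method parameter ("ffill" or "bfill"):

# In pandas 2.1 and later the method parameter is deprecated; df.ffill() and df.bfill() do the same
df.fillna(method="ffill")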
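The same propagation is also available through the dedicated ffill() and bfill() methods of a DataFrame. The sketch below applies both to the article's data; forward and backward are illustrative names.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# Forward fill: each NA takes the last valid value above it
forward = df.ffill()
print(forward)

# Backward fill: each NA takes the next valid value below it
backward = df.bfill()
print(backward)
```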


To replace all the NaN elements in column Name with "Radha" and in column Age with 99:

values = {"Name" : "Radha", "Age" : 99}
df.fillna(value=values)
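Note that fillna() returns a new DataFrame and leaves the original unchanged unless the result is assigned. The sketch below checks both points on the article's data; filled is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# Each key names a column, each value is the fill value for that column
values = {"Name": "Radha", "Age": 99}
filled = df.fillna(value=values)
print(filled)

# The original DataFrame still has its 4 missing values
print(df.isna().sum().sum())
```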


To replace only the first NA element in each column, use the limit parameter:

df.fillna(value=values, limit=1)
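When the fill values are given as a dictionary, the limit is applied to each column separately. The sketch below, on the article's data, shows that one NA value is filled per column and the later ones remain; limited is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the article
df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

values = {"Name": "Radha", "Age": 99}

# limit=1 fills at most one NA value in each column
limited = df.fillna(value=values, limit=1)
print(limited)

# One NA value is still left in each column
print(limited.isna().sum())
```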


I hope you like the article.

Thanks
Neeru

Thursday, 19 September 2019

Python - String Case Handling



In this article, we will learn various ways to handle the letter case of strings.

Series.str can be used to access the string functions of a Series.


#Import necessary libraries
import pandas as pd
import numpy as np

#Create a Series 
s_case = pd.Series(['apple','Pomogranate',np.nan,'strawberry','the charlie and the chocolate factory'])
print(s_case)

#Capitalize the first character of the string
print("Use capitalize")
print(s_case.str.capitalize())
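The Series contains a missing value, so it is worth checking how the string functions treat it. In this sketch on the article's Series, the NaN entry stays NaN and only the first character of each string is capitalized; capitalized is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same Series as in the article
s_case = pd.Series(['apple', 'Pomogranate', np.nan, 'strawberry',
                    'the charlie and the chocolate factory'])

capitalized = s_case.str.capitalize()

# Only the first letter is upper-cased; the rest of the string becomes lower case
print(capitalized[4])

# Missing values are passed through unchanged
print(pd.isna(capitalized[2]))
```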


#Converts the characters to lowercase
print("Use lowercase")
print(s_case.str.lower())


#Converts the characters to uppercase
print("Use uppercase")
print(s_case.str.upper())
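A typical use of lower() or upper() is a comparison that ignores case. The sketch below normalizes the article's Series to lower case before comparing; the search term and the name matches are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Same Series as in the article
s_case = pd.Series(['apple', 'Pomogranate', np.nan, 'strawberry',
                    'the charlie and the chocolate factory'])

# Lower-casing both sides makes the comparison case-insensitive
search_term = "POMOGRANATE"
matches = s_case.str.lower() == search_term.lower()
print(matches)
```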


#Swaps the case from lower to upper and vice versa
print("Use swapcase")
print(s_case.str.swapcase())


#Converts the string into title case
print("Use title")
print(s_case.str.title())
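The difference between title() and capitalize() only shows on strings with several words. The sketch below compares the two on the last element of the article's Series.

```python
import numpy as np
import pandas as pd

# Same Series as in the article
s_case = pd.Series(['apple', 'Pomogranate', np.nan, 'strawberry',
                    'the charlie and the chocolate factory'])

# title() capitalizes every word, capitalize() only the first character
titled = s_case.str.title()
capitalized = s_case.str.capitalize()
print(titled[4])
print(capitalized[4])
```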



#Check if the string is lower case
print("Use islower()")
print(s_case.str.islower())
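The result of islower() can be used to select the lower-case strings. Because the Series has a missing value, the sketch below compares the result with True first, so that the missing entry counts as False; this handling is one possible choice rather than the only one.

```python
import numpy as np
import pandas as pd

# Same Series as in the article
s_case = pd.Series(['apple', 'Pomogranate', np.nan, 'strawberry',
                    'the charlie and the chocolate factory'])

# eq(True) turns the result into a plain boolean mask; the missing entry becomes False
mask = s_case.str.islower().eq(True)

# Keep only the strings that are entirely lower case
lowercase_only = s_case[mask]
print(lowercase_only)
```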


Likewise, there are many other functions, such as isupper(), isnumeric() and isalpha().
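A short sketch of these three checks follows. The Series s_check is hypothetical sample data chosen for this example, not part of the article.

```python
import pandas as pd

# Hypothetical sample data covering upper case, lower case, digits and a mix
s_check = pd.Series(["ABC", "abc", "123", "abc123"])

# True only if all cased characters are upper case and there is at least one
print(s_check.str.isupper())

# True only if every character is numeric
print(s_check.str.isnumeric())

# True only if every character is a letter
print(s_check.str.isalpha())
```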