Thursday, 31 October 2019

Convert Date Of Birth to Age in Dataframe



To convert Date of Birth (DOB) to Age, first check the data type of each column:

train.info()



Method 1:


Here we can see that the DOB column is of object type, i.e. the dates are stored as strings.

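Step 1: Convert the DOB column to datetime type using the code below. The format string '%d-%b-%y' describes dates written as day, abbreviated month name and two-digit year (e.g. 15-Jan-84):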

train['DOB']=pd.to_datetime(train['DOB'], format='%d-%b-%y')
train.info()

Now the DOB column has been converted to datetime type.
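A caveat about %y: pandas maps two-digit years to a century using the usual strptime rule, so 69 to 99 become 1969 to 1999 and 00 to 68 become 2000 to 2068. A DOB written as 03-Mar-65 is therefore read as 2065, which would give a negative age. The sketch below shows one possible correction. The sample dates are made up, and it assumes that nobody in the data was born in the future or is 100 or older, so any date after today is moved back one century.

```python
import pandas as pd

# Hypothetical sample values in the same '%d-%b-%y' format as the post
dob = pd.to_datetime(pd.Series(['15-Jan-84', '03-Mar-65', '20-Jul-01']),
                     format='%d-%b-%y')
print(dob.dt.year.tolist())  # '65' falls in the 00-68 range, so it becomes 2065

# Assumption: no one is born in the future or is 100+ years old,
# so any date later than today is shifted back by 100 years.
today = pd.Timestamp('today')
dob_fixed = dob.where(dob <= today, dob - pd.DateOffset(years=100))
print(dob_fixed.dt.year.tolist())
```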

Step 2: Calculate the age from the DOB column as below:

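import pandas as pd

now = pd.to_datetime('now')
dob = train['DOB']
# True where this year's birthday has not happened yet (later month, or same month but later day)
birthday_pending = (dob.dt.month > now.month) | ((dob.dt.month == now.month) & (dob.dt.day > now.day))
# Difference in years, minus one for people who have not had their birthday yet
train['Age'] = (now.year - dob.dt.year) - birthday_pending.astype(int)
train.head()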


We now have a new column "Age" in the dataset.
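To check the age logic on known values, here is a minimal sketch that wraps it in a function and uses a fixed reference date (the post's date) instead of 'now', so the result is reproducible. The function name age_in_years and the sample dates are illustrative, not part of pandas.

```python
import pandas as pd

def age_in_years(dob: pd.Series, on: pd.Timestamp) -> pd.Series:
    """Whole years completed between each date in `dob` and the date `on`."""
    # True where the birthday in the year of `on` has already happened
    had_birthday = (dob.dt.month < on.month) | (
        (dob.dt.month == on.month) & (dob.dt.day <= on.day))
    # Subtract one year for people whose birthday is still to come
    return on.year - dob.dt.year - (~had_birthday).astype(int)

# Hypothetical rows in the post's '%d-%b-%y' format
train = pd.DataFrame({'DOB': ['15-Jan-84', '31-Oct-90', '01-Nov-95']})
train['DOB'] = pd.to_datetime(train['DOB'], format='%d-%b-%y')
# A fixed reference date keeps the result the same whenever this runs
train['Age'] = age_in_years(train['DOB'], pd.Timestamp('2019-10-31'))
print(train)
```

The second person has their birthday on the reference date itself and is counted as having turned 29; the third person's November birthday is still to come, so they are 23.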


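Method 2:

This method works on the original string DOB column, i.e. before the Step 1 conversion (a Timestamp cannot be sliced with x[-2:]). It takes the two-digit year from the end of each string and assumes that everyone was born in the 1900s and that the current year is 2019, since 119 - yy = 2019 - (1900 + yy). Month and day are ignored, so the result can be one year too high for people whose birthday has not yet come this year.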

train['Age'] = train['DOB'].apply(lambda x: 119 - int(x[-2:]))
train['Age'].head()
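Method 2 can also be written without apply, using the .str accessor on the string column. This sketch keeps the same assumptions as the original (births in the 1900s, month and day ignored) but makes the current year an explicit variable instead of the hard-coded 119. The sample DOB strings are made up.

```python
import pandas as pd

# Hypothetical DOB strings, still in their original text form
train = pd.DataFrame({'DOB': ['15-Jan-84', '31-Oct-90', '01-Nov-95']})

current_year = 2019  # assumption: the year the post was written
# Same idea as Method 2: the last two characters are the year,
# a 19xx birth year is assumed, and month/day are ignored
birth_year = 1900 + train['DOB'].str[-2:].astype(int)
train['Age'] = current_year - birth_year
print(train)
```

Because month and day are ignored, the third person is reported as 24 for all of 2019, including the days before their November birthday.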

Thank You!!


Wednesday, 30 October 2019

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 2: invalid start byte

While reading the file with read_csv, you may receive the error below:

train = pd.read_csv("Train.csv")
train.head()


"UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 2: invalid start byte"


To fix it, pass the encoding parameter to read_csv as below:

train = pd.read_csv("Train.csv",encoding='latin-1')


train.head()

Now you will be able to read the file.
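If you do not know a file's encoding in advance, one option is to try a few encodings in turn. The sketch below is my own helper (read_csv_with_fallback is not a pandas function), and the encoding list is an assumption you may need to adjust. 'latin-1' is tried last because it maps every byte to some character and so never raises UnicodeDecodeError; a file read that way can still contain wrong characters, so it is worth checking the loaded text.

```python
import os
import tempfile

import pandas as pd

def read_csv_with_fallback(path, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Return the DataFrame from the first encoding that decodes without error."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    raise last_error

# Demo: a small file saved in Windows-1252, which is not valid UTF-8
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write('Name,City\nJosé,München\n'.encode('cp1252'))
df = read_csv_with_fallback(tmp.name)
os.remove(tmp.name)
print(df)
```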




Tuesday, 24 September 2019

Python : Missing data Handling


In this article, we will learn various ways to handle missing data.

We will learn how to:
  • Check if there are NA values using isna()
  • Drop the NA values using dropna()
  • Fill and replace the NA values using fillna() and replace()


#Import necessary libraries
import pandas as pd
import numpy as np


Let's create a DataFrame:

df = pd.DataFrame({"Name" : ["Rita","Meera","Leena",np.nan,"Sheena", None ,"Diya"],
                   "Age"  : [23,np.nan,45,np.nan,56,77,34]})
print(df)

isna() - checks whether NA values are present
Return - a boolean object of the same size, with True where the value is NA (NaN or None)

df.isna()
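A common follow-up to isna() is to count the missing values in each column, or to list the rows that contain any of them. A minimal sketch using the same DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# True counts as 1, so summing the boolean frame counts NA values per column
na_counts = df.isna().sum()
print(na_counts)

# any(axis=1) flags the rows that contain at least one NA
rows_with_na = df[df.isna().any(axis=1)]
print(rows_with_na)
```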

notna() - the opposite of isna()
Return : a boolean object of the same size, with True where the value is not NA

df.notna()

dropna() - removes missing values
Return : a new DataFrame with the NA entries dropped (df itself is unchanged unless inplace=True)


axis - whether to drop rows or columns; default is 0
       0 or 'index' - drop rows that contain NA values
       1 or 'columns' - drop columns that contain NA values
how - whether a row or column is dropped when it has at least one NA or only NA values; default is 'any'
       any - drop the row or column if any NA values are present
       all - drop the row or column only if all of its values are NA

The code below drops only the rows in which all values are NA:

df.dropna(axis=0, how="all")


The code below drops every row that contains at least one NA value:

df.dropna(axis=0, how="any")

To drop all columns that contain NA values:

df.dropna(axis='columns')

This returns a DataFrame with no columns, because every column contains at least one NA value.
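The three dropna() calls above can be compared side by side by looking at which rows and columns survive. A minimal sketch, rebuilding the same DataFrame so it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

all_na = df.dropna(axis=0, how="all")  # only row 3 has NA in every column
any_na = df.dropna(axis=0, how="any")  # rows 1, 3 and 5 each contain an NA
no_cols = df.dropna(axis="columns")    # both columns contain an NA

print(all_na.index.tolist())
print(any_na.index.tolist())
print(no_cols.shape)  # 7 rows, 0 columns
print(df.shape)       # df itself still has 7 rows and 2 columns
```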

fillna() - fills NA values with a specified value or method
Return - the filled DataFrame


To fill all NA values with 4:

df.fillna(4)

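To propagate non-null values forward or backward, use the method parameter with "ffill" or "bfill". Since pandas 2.1, fillna(method=...) is deprecated in favour of the equivalent df.ffill() and df.bfill():

df.ffill()   # same result as df.fillna(method="ffill")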
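To see the difference between the two directions, this sketch applies both to the same DataFrame. With ffill a missing value takes the last valid value above it; with bfill it takes the next valid value below it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

forward = df.ffill()   # fill from the previous valid value
backward = df.bfill()  # fill from the next valid value
print(forward)
print(backward)
```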

To replace the NaN values in column Name with "Radha" and in column Age with 99, pass a dictionary that maps column names to fill values:

values = {"Name" : "Radha", "Age" : 99}
df.fillna(value=values)

To fill only the first NA value in each column, use the limit parameter:

df.fillna(value=values, limit=1)
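With the default axis, limit counts per column, so only the first NA in Name and the first NA in Age are filled. A minimal sketch to check which values change:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

values = {"Name": "Radha", "Age": 99}
first_only = df.fillna(value=values, limit=1)
print(first_only)
# Name row 3 and Age row 1 are filled; Name row 5 and Age row 3 stay missing
```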


I hope you liked the article.

Thanks
Neeru

Thursday, 19 September 2019

Python - String Case Handling



In this article, we will learn various ways to handle the case of letters in strings.

Series.str gives access to vectorized string functions that are applied to each element of a Series.


#Import necessary libraries
import pandas as pd
import numpy as np

#Create a Series 
s_case = pd.Series(['apple','Pomegranate',np.nan,'strawberry','the charlie and the chocolate factory'])
print(s_case)

#Capitalize the first character of each string and lowercase the rest
print("Use capitalize")
print(s_case.str.capitalize())

#Convert all characters to lowercase
print("Use lowercase")
print(s_case.str.lower())

#Convert all characters to uppercase
print("Use uppercase")
print(s_case.str.upper())

#Swaps the case from lower to upper and vice versa
print("Use swapcase")
print(s_case.str.swapcase())

#Convert each string to title case (first letter of every word in uppercase)
print("Use title")
print(s_case.str.title())
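capitalize() and title() are easy to confuse: capitalize() uppercases only the first character of the whole string and lowercases the rest, while title() uppercases the first letter of every word. A small sketch with made-up strings:

```python
import pandas as pd

s = pd.Series(['the charlie and the chocolate factory', 'pOMEGRANATE'])
capitalized = s.str.capitalize()  # only the very first character is uppercased
titled = s.str.title()            # the first letter of every word is uppercased
print(capitalized.tolist())
print(titled.tolist())
```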


#Check if the string is lower case
print("Use islower()")
print(s_case.str.islower())

Likewise, there are many other check functions, such as isupper(), isnumeric() and isalpha().
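A quick way to compare these check functions is to run them on the same sample strings (made up for illustration) and collect the results in one DataFrame:

```python
import pandas as pd

s = pd.Series(['APPLE', 'apple', '2019', 'apple pie'])
checks = pd.DataFrame({
    'isupper': s.str.isupper(),
    'isalpha': s.str.isalpha(),     # a space is not a letter
    'isnumeric': s.str.isnumeric(),
})
print(checks)
```

'2019' is not considered upper case because it contains no cased characters, and 'apple pie' is not alphabetic because of the space.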

Thursday, 1 August 2019

Python - String Pattern Matching


In this article, we will match patterns in the strings of a pandas Series.

Series.str gives access to vectorized string functions that are applied to each element of a Series.

#Import all necessary libraries
import pandas as pd
import numpy as np
import re

#Create a Series
s_pat = pd.Series(['Parrot','pigeon','Eagle','sparrow',np.nan])
print(s_pat)


startswith() : checks whether the start of each string matches the pattern
Return :  A Series of Boolean values

#Match the pattern that starts with
print("Use startswith")
print(s_pat.str.startswith("P"))

To treat NaN as False, use na=False:

print(s_pat.str.startswith("P",na=False))
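One practical reason for na=False is filtering: with the object dtype used here, a mask that contains NaN cannot be used for boolean indexing (pandas raises a ValueError), while an all-boolean mask can. A minimal sketch that keeps only the strings starting with "P":

```python
import numpy as np
import pandas as pd

s_pat = pd.Series(['Parrot', 'pigeon', 'Eagle', 'sparrow', np.nan])

# na=False makes the mask purely boolean, so it can select elements directly
starts_with_p = s_pat[s_pat.str.startswith("P", na=False)]
print(starts_with_p.tolist())
```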

Similarly,
endswith() : checks whether the end of each string matches the pattern
Return :  A Series of Boolean values

print(s_pat.str.endswith("on"))


contains(): checks whether the pattern or regular expression is contained in each string
Return: a boolean Series indicating, for each string in the Series, whether the pattern or regular expression was found

#Check if the pattern or Regular expression is contained in the string of Series
print("Use contains")
print(s_pat.str.contains("ro"))


To make the match case-insensitive, use the parameter case=False:

print(s_pat.str.contains("p",case=False))
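By default contains() treats the pattern as a regular expression. This sketch shows a regex quantifier and the regex=False parameter, which makes the pattern a plain string; na=False is added so that the NaN entry is reported as False:

```python
import numpy as np
import pandas as pd

s_pat = pd.Series(['Parrot', 'pigeon', 'Eagle', 'sparrow', np.nan])

# 'r{2}' is a regular expression meaning two consecutive 'r' characters
double_r = s_pat.str.contains("r{2}", na=False)
# With regex=False the '.' is a literal dot, not "any character"
literal_dot = s_pat.str.contains(".", regex=False, na=False)
print(double_r.tolist())
print(literal_dot.tolist())
```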

findall(): finds all occurrences of the pattern or regular expression in each string of the Series
Return : a Series containing a list of the matching strings for each element

#Find all the pattern in the string
print("Use findall")
print(s_pat.str.findall("Parrot"))

When the pattern matches more than once in a string, the list contains every match.


print(s_pat.str.findall("r"))
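If you only need the number of matches rather than the matched strings themselves, str.count() takes the same pattern. A short sketch comparing the two:

```python
import numpy as np
import pandas as pd

s_pat = pd.Series(['Parrot', 'pigeon', 'Eagle', 'sparrow', np.nan])

matches = s_pat.str.findall("r")  # a list of the matched strings per element
counts = s_pat.str.count("r")     # the number of matches per element
print(matches.tolist())
print(counts.tolist())
```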


To ignore case, pass the flags parameter with re.IGNORECASE; this is why we imported re, Python's regular expression module:

print(s_pat.str.findall("PARROT",flags=re.IGNORECASE))
