Work with Python

Thursday, 31 October 2019

Convert Date Of Birth to Age in Dataframe

To convert a date of birth to an age, first check the type of the DOB column:

train.info()

Method 1:

The output shows that the DOB column is of object type, meaning the dates are stored as strings.

Step 1: Convert the DOB column to datetime type using the code below

train['DOB']=pd.to_datetime(train['DOB'], format='%d-%b-%y')
train.info()

The output now shows that the DOB column has been converted to datetime type.
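One thing to watch with the %y format: two-digit years from 00 to 68 are read as 2000 to 2068, so a birth year such as 65 becomes 2065. Below is a minimal sketch of one way to correct this. It assumes that any parsed date later than a chosen reference date belongs to the previous century. The sample values and the reference date are made up for illustration.

```python
import pandas as pd

# Hypothetical sample in the same day-month-year format as the DOB column
dob = pd.Series(["15-Aug-65", "02-Jan-95"])
parsed = pd.to_datetime(dob, format="%d-%b-%y")

# Assumption: nobody in the data was born after this reference date
reference = pd.Timestamp("2019-10-31")

# Keep dates up to the reference date, shift later ones back by 100 years
fixed = parsed.where(parsed <= reference, parsed - pd.DateOffset(years=100))
print(fixed)
```

Whether a 100-year shift is right depends on the data, so it is worth checking the minimum and maximum of the parsed column first.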

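Step 2: Compute Age from the DOB column using the code below

import pandas as pd

now = pd.Timestamp.now()
dob = train['DOB']
# True where this year's birthday has not happened yet
before_birthday = (now.month < dob.dt.month) | ((now.month == dob.dt.month) & (now.day < dob.dt.day))
train['Age'] = now.year - dob.dt.year - before_birthday.astype(int)
train.head()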

We get a new column "Age" in the dataset
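The age calculation can also be wrapped in a small function that takes the reference date as an argument, which makes the result reproducible. This is only a sketch: the function name age_on and the sample dates are assumptions for illustration, and the comparison uses both month and day so that a birthday later in the current month is not counted yet.

```python
import pandas as pd

def age_on(dob: pd.Series, on: pd.Timestamp) -> pd.Series:
    """Return completed years between each date of birth and `on`."""
    # True where the birthday in the year of `on` has not happened yet
    before_birthday = (on.month < dob.dt.month) | (
        (on.month == dob.dt.month) & (on.day < dob.dt.day)
    )
    return on.year - dob.dt.year - before_birthday.astype(int)

# Hypothetical dates of birth, already parsed to datetime
dob = pd.to_datetime(pd.Series(["1990-10-31", "1990-11-01", "2000-02-29"]))
ages = age_on(dob, pd.Timestamp("2019-10-31"))
print(ages.tolist())
```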

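Method 2:

This shortcut works on the original string column, before it is converted to datetime in Step 1 of Method 1. It assumes that every DOB string ends in a two-digit year from the 1900s and that the current year is 2019, since 119 = 2019 - 1900.

# Take the last two characters as the birth year and subtract from 119
train['Age'] = train['DOB'].apply(lambda x: 119 - int(x[-2:]))
train['Age'].head()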

Thank You!!

Wednesday, 30 October 2019

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 2: invalid start byte

While reading a CSV file with read_csv, you may receive the error below

train = pd.read_csv("Train.csv")
train.head()

"UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 2: invalid start byte"

To fix it, pass the encoding parameter to the read_csv function as below

train = pd.read_csv("Train.csv",encoding='latin-1')

train.head()

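Now you will be able to read the file. Note that latin-1 maps every possible byte to a character, so it never raises this error; if the file actually uses a different encoding, some characters may come out wrong.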
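The error and the fix can be reproduced without a file on disk. The sketch below builds a small CSV in memory, encoded as latin-1 (the column names and values are made up for illustration), shows that decoding it as UTF-8 fails, and then reads it with the encoding parameter.

```python
import io
import pandas as pd

# Hypothetical CSV content containing non-ASCII characters, stored as latin-1 bytes
raw = "name,city\nJosé,Málaga\n".encode("latin-1")

# Decoding the same bytes as UTF-8 fails, which is what read_csv does by default
try:
    raw.decode("utf-8")
    message = ""
except UnicodeDecodeError as err:
    message = str(err)
print(message)

# Passing the matching encoding lets pandas read the data correctly
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(df)
```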

Tuesday, 24 September 2019

Python : Missing data Handling

In this article, we will learn various ways to handle missing data.

We will learn how to:

Check whether there are NA values using isna()
Drop NA values using dropna()
Fill and replace NA values using fillna() and replace()

#Import necessary libraries

import pandas as pd

import numpy as np

Let's create a DataFrame

df = pd.DataFrame({"Name" : ["Rita","Meera","Leena",np.nan,"Sheena", None ,"Diya"],
                   "Age" : [23,np.nan,45,np.nan,56,77,34]})

print(df)

Output:

isna() - Checks whether NA values are present

Return - a boolean object of the same size, with True where the value is NaN or None

df.isna()

Output:
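The boolean result of isna() is often summarised rather than read cell by cell. As a small sketch on the same DataFrame, summing it gives the number of missing values per column, and any(axis=1) selects the rows that contain at least one missing value. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# True counts as 1, so the sum is the number of missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Rows where at least one column is missing
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```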

notna() - The opposite of the isna() function

Return : a boolean object of the same size, with True where the value is not NA

df.notna()

Output:

dropna() - Removes the missing values

Return : DataFrame with the NA elements dropped from it

axis - Indicates rows or columns; default is 0

0 represents rows
1 represents columns

how - Determines whether a row or column is removed when it has at least one NA value or only when all of its values are NA

any - drop the row or column if any of its values is NA

all - drop the row or column only if all of its values are NA

The call below drops the rows in which all values are NA

df.dropna(axis=0, how ="all" )

Output:

The call below drops the rows in which any value is NA

df.dropna(axis=0, how ="any" )

Output:

To drop all the columns that contain NA values

df.dropna(axis='columns')

This returns no columns, because every column contains at least one NA value

Output:
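To make the difference between the three dropna() calls concrete, here is a short sketch on the same DataFrame that records which rows or columns survive each call. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# how="all": only row 3 is removed, because both of its values are NA
all_na_dropped = df.dropna(axis=0, how="all")

# how="any": rows 1, 3 and 5 are removed, because each has at least one NA
any_na_dropped = df.dropna(axis=0, how="any")

# axis="columns": both columns contain NA, so no column is left
cols_dropped = df.dropna(axis="columns")

print(list(all_na_dropped.index))
print(list(any_na_dropped.index))
print(cols_dropped.shape)
```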

fillna() - Fills the NA values with a specified value or method

Return - the filled DataFrame

To fill all the NA values with 4

df.fillna(4)

Output:

To propagate non-null values forward or backward, use ffill() or bfill(). Older pandas versions also accept fillna(method="ffill"), but recent versions deprecate the method parameter.

df.ffill()

Output:

To replace all the NaN elements in column Name with "Radha" and in column Age with 99

values = {"Name" : "Radha", "Age" : 99}

df.fillna(value=values)

Output:

To replace only the first NA element in each column, use the limit parameter

df.fillna(value=values, limit=1)

Output:
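The fill options can be compared side by side on the same DataFrame. This sketch uses ffill() and bfill() for forward and backward propagation, and a dictionary of per-column values together with limit=1. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Rita", "Meera", "Leena", np.nan, "Sheena", None, "Diya"],
                   "Age": [23, np.nan, 45, np.nan, 56, 77, 34]})

# Forward fill: each NA takes the last valid value above it
forward = df.ffill()

# Backward fill: each NA takes the next valid value below it
backward = df.bfill()

# Per-column fill values; limit=1 fills only the first NA in each column
values = {"Name": "Radha", "Age": 99}
limited = df.fillna(value=values, limit=1)

print(forward)
print(backward)
print(limited)
```

With limit=1 the second NA in each column is left in place, so the result still contains missing values.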

I hope you like the article.

Thanks
Neeru

Thursday, 19 September 2019

Python - String's Case Handling

In this article, we will learn various ways to change and check the letter case of strings

Series.str gives access to the string functions of a Series

#Import necessary libraries
import pandas as pd
import numpy as np

#Create a Series
s_case = pd.Series(['apple','Pomegranate',np.nan,'strawberry','the charlie and the chocolate factory'])
print(s_case)

Output:

#Capitalize the first character of the string
print("Use capitalize")
print(s_case.str.capitalize())

Output:

#Convert all characters to lowercase
print("Use lowercase")
print(s_case.str.lower())

Output:

#Convert all characters to uppercase
print("Use uppercase")
print(s_case.str.upper())

Output:

#Swaps the case from lower to upper and vice versa
print("Use swapcase")
print(s_case.str.swapcase())

Output:

#Convert the string to title case
print("Use title")
print(s_case.str.title())

Output:

#Check if the string is in lowercase

print("Use islower()")

print(s_case.str.islower())

Output:

Likewise, there are many other check functions, such as isupper(), isnumeric() and isalpha()
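As a brief sketch of those check functions, the example below applies isupper(), isnumeric() and isalpha() to a small Series. The sample values are made up for illustration. Missing values stay missing in the result instead of becoming True or False.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: upper case, lower case, digits, mixed, and a missing value
s_check = pd.Series(["APPLE", "apple", "123", "abc1", np.nan])

is_upper = s_check.str.isupper()      # True only when all cased characters are upper case
is_numeric = s_check.str.isnumeric()  # True only when every character is numeric
is_alpha = s_check.str.isalpha()      # True only when every character is a letter

print(is_upper)
print(is_numeric)
print(is_alpha)
```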

Thursday, 1 August 2019

Python - String Pattern Matching

In this article, we will match patterns against the strings of a Pandas Series.

Series.str gives access to the string functions of a Series

#Import all necessary libraries

import pandas as pd

import numpy as np

import re

#Create a Series

s_pat = pd.Series(['Parrot','pigeon','Eagle','sparrow',np.nan])

print(s_pat)

Output:

startswith() : Checks if the start of each string matches the pattern

Return : A Series of Boolean values

#Match the pattern that starts with

print("Use startswith")

print(s_pat.str.startswith("P"))

Output:

If you want NaN to be shown as False, use na=False

print(s_pat.str.startswith("P",na=False))

Output:
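A common use of na=False is to turn the result into a clean Boolean mask for filtering, since a mask that still contains NaN cannot be used for Boolean indexing. A minimal sketch on the same Series (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

s_pat = pd.Series(["Parrot", "pigeon", "Eagle", "sparrow", np.nan])

# na=False makes the missing value count as "no match", so the mask is purely Boolean
mask = s_pat.str.startswith("P", na=False)

# Keep only the elements that start with an upper-case "P"
starts_with_p = s_pat[mask]
print(starts_with_p)
```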

Similarly,

endswith() : Checks if the end of each string matches the pattern

Return : A Series of Boolean values

print(s_pat.str.endswith("on"))

Output:

contains(): Checks if the pattern or regular expression is contained in each string

Return: A Boolean Series indicating whether the pattern or regular expression is contained in each string of the Series

#Check if the pattern or regular expression is contained in the strings of the Series

print("Use contains")

print(s_pat.str.contains("ro"))

Output:

To make the match case-insensitive, use the parameter case=False

print(s_pat.str.contains("p",case=False))

Output:
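Because contains() treats the pattern as a regular expression by default, characters such as "." or "^" have a special meaning. The sketch below contrasts a regular expression with a literal search using regex=False. The patterns are chosen only for illustration.

```python
import numpy as np
import pandas as pd

s_pat = pd.Series(["Parrot", "pigeon", "Eagle", "sparrow", np.nan])

# Regular expression: strings that begin with "P" or "p"
begins_with_p = s_pat.str.contains(r"^[Pp]", na=False)

# As a regular expression, "." matches any character, so every string matches
dot_as_regex = s_pat.str.contains(".", na=False)

# With regex=False, "." is a literal dot, which none of the strings contain
dot_as_literal = s_pat.str.contains(".", regex=False, na=False)

print(begins_with_p.tolist())
print(dot_as_regex.tolist())
print(dot_as_literal.tolist())
```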

findall(): Finds all occurrences of the pattern or regular expression in each string of the Series

Return : A Series in which each element is a list of the matched strings

#Find all occurrences of the pattern in each string

print("Use findall")

print(s_pat.str.findall("Parrot"))

When the pattern matches more than once in a string, a list with multiple strings is returned for that element.

print(s_pat.str.findall("r"))

Output:

To ignore case, add the flags parameter. This requires importing re, the regular expression module

print(s_pat.str.findall("PARROT",flags=re.IGNORECASE))

Output:
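When only the number of matches is needed rather than the matches themselves, str.count() can be used alongside findall(). A short sketch on the same Series (the variable names are only illustrative):

```python
import re

import numpy as np
import pandas as pd

s_pat = pd.Series(["Parrot", "pigeon", "Eagle", "sparrow", np.nan])

# One list of matches per element; an empty list means no match
found = s_pat.str.findall("p", flags=re.IGNORECASE)

# Number of times "r" occurs in each string; NaN stays NaN
counts = s_pat.str.count("r")

print(found)
print(counts)
```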
