EXNO-3-DS

AIM:

To read the given data, perform the Feature Encoding and Feature Transformation processes on it, and save the resulting data to a file.

ALGORITHM:

STEP 1: Read the given data.

STEP 2: Clean the data set using the data cleaning process.

STEP 3: Apply Feature Encoding to the features in the data set.

STEP 4: Apply Feature Transformation to the features in the data set.

STEP 5: Save the data to a file.
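The five steps above can be sketched end to end on a tiny in-memory table. This is a minimal illustration rather than the notebook's code: the toy columns, the use of `dropna()` as the cleaning step, and the temporary output path are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# STEP 1: read the data (a small hypothetical stand-in for pd.read_csv)
raw = pd.DataFrame({
    "ord_2": ["Hot", "Warm", None, "Cold", "Warm"],
    "value": [1.0, 10.0, 3.0, 100.0, 1000.0],
})

# STEP 2: clean the data; here we simply drop rows with missing values
clean = raw.dropna().reset_index(drop=True)

# STEP 3: encode the ordinal column using an explicit category order
enc = OrdinalEncoder(categories=[["Hot", "Warm", "Cold"]])
clean["ord_2_enc"] = enc.fit_transform(clean[["ord_2"]]).ravel()

# STEP 4: log-transform the right-skewed numeric column
clean["value_log"] = np.log(clean["value"])

# STEP 5: save to a file (a temporary path chosen only for illustration)
out_path = os.path.join(tempfile.gettempdir(), "encoded_transformed.csv")
clean.to_csv(out_path, index=False)
```

In the real notebook, the same `to_csv` call on the encoded or transformed DataFrame would cover the final step of the algorithm.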

FEATURE ENCODING:

Ordinal Encoding: Ordinal encoding maps each unique label to an integer value. It is only appropriate when there is a known order among the categories (for example Hot, Warm, Cold). Such an order exists for some of the variables in our dataset, and it should be preserved when preparing the data.
Label Encoding: Label encoding is a simple and straightforward approach that converts each value in a categorical column into a numerical value. Each distinct value in a categorical column is called a label. Scikit-learn's LabelEncoder assigns the integers in sorted (alphabetical) order, so the codes do not reflect any natural order.
Binary Encoding: Binary encoding first converts each category to an integer and then writes that integer in binary digits. Each binary digit becomes one feature column, so n unique categories need only about log₂(n) columns (rounded up) instead of the n columns produced by one hot encoding.
One Hot Encoding: This technique is used when the feature is nominal (its categories have no order). For each category of the feature, a new binary variable is created that contains 1 when the row belongs to that category (presence) and 0 otherwise (absence).
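As a rough from-scratch illustration of the four encodings, the sketch below encodes a toy `ord_2`-style column with plain pandas. The toy values and the bit-column names are hypothetical, and the binary codes here start at 0; library encoders such as category_encoders' BinaryEncoder may number categories differently, so their column count can differ slightly.

```python
import math

import pandas as pd

# Hypothetical toy column with the same categories as ord_2
temps = pd.Series(["Hot", "Warm", "Cold", "Warm", "Hot"], name="ord_2")

# Ordinal encoding: integers follow a known, meaningful order
order = {"Hot": 0, "Warm": 1, "Cold": 2}
ordinal = temps.map(order)

# Label encoding: integers follow sorted (alphabetical) order, no meaning implied
labels = {cat: i for i, cat in enumerate(sorted(temps.unique()))}
label_enc = temps.map(labels)

# One hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(temps, prefix="ord_2", dtype=int)

# Binary encoding: write each label code in base 2, one column per bit (MSB first)
n_bits = max(1, math.ceil(math.log2(len(labels))))
binary = pd.DataFrame(
    {f"ord_2_bit{b}": (label_enc // 2 ** (n_bits - 1 - b)) % 2 for b in range(n_bits)}
)
```

With three categories, one hot encoding produces three columns while binary encoding needs only two.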

Methods Used for Data Transformation:

1. FUNCTION TRANSFORMATION

• Log Transformation
• Reciprocal Transformation
• Square Root Transformation
• Square Transformation
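To see how the function transformations change skewness, the sketch below applies each one to a synthetic right-skewed (lognormal) sample and compares `Series.skew()`. The sample is an assumption for illustration. The log and square root pull a right-skewed sample toward symmetry, while squaring stretches the right tail further, which is why squaring is usually reserved for left-skewed data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic right-skewed, strictly positive sample
x = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000))

transforms = {
    "original": x,
    "log": np.log(x),                # requires x > 0
    "reciprocal": np.reciprocal(x),  # requires x != 0; reverses the order of values
    "sqrt": np.sqrt(x),              # requires x >= 0; milder than the log
    "square": np.square(x),          # stretches large values even further
}
skews = {name: s.skew() for name, s in transforms.items()}
for name, value in skews.items():
    print(f"{name:>10}: skew = {value:.2f}")
```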

2. POWER TRANSFORMATION

• Box-Cox method
• Yeo-Johnson method
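Both power transformations have closed forms in a parameter λ. The sketch below writes them out by hand and compares them with `scipy.stats.boxcox` and `scipy.stats.yeojohnson` at a few fixed λ values; the helper names `box_cox` and `yeo_johnson` and the sample arrays are hypothetical. Box-Cox requires strictly positive data, while Yeo-Johnson also handles zero and negative values. Called without `lmbda`, both SciPy functions also return the λ that best normalizes the data, which is what the `parameters` variable receives in the CODING AND OUTPUT section.

```python
import numpy as np
from scipy import stats


def box_cox(x, lmbda):
    """Box-Cox transform; defined only for strictly positive x."""
    x = np.asarray(x, dtype=float)
    if lmbda == 0:
        return np.log(x)
    return (x ** lmbda - 1) / lmbda


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform; also accepts zero and negative x."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative values: Box-Cox applied to x + 1
    if lmbda == 0:
        out[pos] = np.log1p(x[pos])
    else:
        out[pos] = ((x[pos] + 1) ** lmbda - 1) / lmbda
    # Negative values: mirrored Box-Cox on 1 - x with power 2 - lambda
    if lmbda == 2:
        out[~pos] = -np.log1p(-x[~pos])
    else:
        out[~pos] = -((1 - x[~pos]) ** (2 - lmbda) - 1) / (2 - lmbda)
    return out


positive = np.array([0.5, 1.0, 2.0, 5.0, 20.0])
mixed = np.array([-3.0, -0.5, 0.0, 1.5, 8.0])

for lam in (0.0, 0.5, 2.0):
    bc_gap = np.max(np.abs(box_cox(positive, lam) - stats.boxcox(positive, lmbda=lam)))
    yj_gap = np.max(np.abs(yeo_johnson(mixed, lam) - stats.yeojohnson(mixed, lmbda=lam)))
    print(f"lambda={lam}: Box-Cox gap={bc_gap:.1e}, Yeo-Johnson gap={yj_gap:.1e}")

# Without lmbda, SciPy estimates the lambda that best normalizes the data
transformed, best_lambda = stats.boxcox(positive)
```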

CODING AND OUTPUT:

import pandas as pd
df = pd.read_csv("/content/Encoding Data.csv")
df

from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
# Explicit order of the ordinal categories: Hot=0, Warm=1, Cold=2
pm = ['Hot', 'Warm', 'Cold']
e1 = OrdinalEncoder(categories=[pm])
e1.fit_transform(df[["ord_2"]])

# Store the ordinal codes in a new column
df['bo2'] = e1.fit_transform(df[["ord_2"]])
df

# LabelEncoder assigns codes alphabetically, ignoring any natural order
le = LabelEncoder()
dfc = df.copy()
dfc['ord_2'] = le.fit_transform(dfc['ord_2'])
dfc
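The difference between the two encoders used above shows up in the codes they assign. A minimal sketch on a hypothetical four-row column: `LabelEncoder` sorts the classes alphabetically, whereas `OrdinalEncoder` with `categories=` follows the order supplied. Scikit-learn's documentation also notes that `LabelEncoder` is intended for target labels (`y`) rather than input features.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy column using the same category names as ord_2
toy = pd.DataFrame({"ord_2": ["Hot", "Warm", "Cold", "Warm"]})

# LabelEncoder: classes sorted alphabetically, so Cold=0, Hot=1, Warm=2
le = LabelEncoder()
label_codes = le.fit_transform(toy["ord_2"])
print(list(le.classes_), label_codes)

# OrdinalEncoder: follows the order we pass in, so Hot=0, Warm=1, Cold=2
oe = OrdinalEncoder(categories=[["Hot", "Warm", "Cold"]])
ordinal_codes = oe.fit_transform(toy[["ord_2"]]).ravel()
print(ordinal_codes)
```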

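from sklearn.preprocessing import OneHotEncoder
# sparse_output replaces the older sparse argument (scikit-learn >= 1.2)
ohe = OneHotEncoder(sparse_output=False)
df2 = df.copy()
# Name the new columns after the categories, e.g. nom_0_<category>
enc = pd.DataFrame(ohe.fit_transform(df2[['nom_0']]), columns=ohe.get_feature_names_out())
df2 = pd.concat([df2, enc], axis=1)
df2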

pd.get_dummies(df2,columns=["nom_0"])

pip install --upgrade category_encoders

from category_encoders import BinaryEncoder
df = pd.read_csv("/content/data.csv")
be = BinaryEncoder()
nd = be.fit_transform(df['Ord_2'])
# Attach the binary digit columns to the original data and display the result
dfb = pd.concat([df, nd], axis=1)
dfb

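from category_encoders import TargetEncoder
te = TargetEncoder()
cc = df.copy()
# Replace each City with a smoothed mean of Target for that city;
# the suffix keeps the encoded column from clashing with the original City column
new = te.fit_transform(X=cc["City"], y=cc["Target"]).add_suffix("_te")
cc = pd.concat([cc, new], axis=1)
cc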
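Target encoding replaces each category with a statistic of the target for that category. The sketch below, on a hypothetical six-row table, shows the plain per-category mean and one common smoothed variant (the m-estimate, with an assumed m = 2). This is not the exact smoothing formula that category_encoders' `TargetEncoder` uses; it only illustrates how rare categories are pulled toward the overall mean. In practice the encoder should be fitted on training rows only, because using a row's own target to encode it leaks information.

```python
import pandas as pd

# Hypothetical data: a City feature and a binary Target
cc = pd.DataFrame({
    "City": ["A", "A", "B", "B", "B", "C"],
    "Target": [1, 0, 1, 1, 0, 1],
})

# Plain mean target encoding: each city -> mean Target of its rows
city_means = cc.groupby("City")["Target"].mean()
cc["City_mean_target"] = cc["City"].map(city_means)

# m-estimate smoothing (illustrative variant): blend the city mean with
# the global mean, weighting the city by its row count
m = 2.0
global_mean = cc["Target"].mean()
city_stats = cc.groupby("City")["Target"].agg(["mean", "count"])
smoothed = (city_stats["count"] * city_stats["mean"] + m * global_mean) / (city_stats["count"] + m)
cc["City_smoothed"] = cc["City"].map(smoothed)
print(cc)
```

City C appears only once, so its smoothed value moves noticeably from 1.0 toward the global mean, while city B (three rows) barely moves.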

import pandas as pd
from scipy import stats
import numpy as np
df = pd.read_csv("/content/Data_to_Transform.csv")
df

df.skew()
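`df.skew()` reports the sample-size adjusted Fisher-Pearson skewness of each column. The sketch below computes it from the central moments for a small hypothetical series and compares the result with pandas and with `scipy.stats.skew(..., bias=False)`. Positive values indicate a long right tail, negative values a long left tail.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical right-skewed values
x = pd.Series([1.0, 2.0, 2.0, 3.0, 4.0, 10.0, 25.0])

n = len(x)
m2 = ((x - x.mean()) ** 2).mean()  # biased second central moment
m3 = ((x - x.mean()) ** 3).mean()  # biased third central moment
g1 = m3 / m2 ** 1.5                # Fisher-Pearson coefficient
G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)  # sample-size adjusted version

print(G1, x.skew(), stats.skew(x, bias=False))
```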

np.log(df["Highly Positive Skew"])

np.reciprocal(df["Moderate Positive Skew"])

np.sqrt(df["Highly Positive Skew"])

np.square(df["Highly Negative Skew"])

df["Highly Positive Skew_boxcox"],parameters=stats.boxcox(df["Highly Positive Skew"])

df["Moderate Negative Skew_yeojohnson"], parameters = stats.yeojohnson(df["Moderate Negative Skew"])
df.skew()

df["Highly Negative Skew_yeojohnson"], parameters = stats.yeojohnson(df["Highly Negative Skew"])
df.skew()

import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import scipy.stats as stats
sm.qqplot(df["Moderate Negative Skew"], line='45')
plt.show()
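A Q-Q plot compares the ordered sample with the quantiles of a normal distribution; points close to a straight line suggest approximate normality. `scipy.stats.probplot` computes the same points and also returns the correlation `r` of the least-squares line through them, which gives a single number to compare before and after a transformation. The samples below are synthetic. Note also that `line='45'` draws the y = x line, which matches the points only when the data are on a standard normal scale; statsmodels' `line='s'` is an alternative for unstandardized columns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_like = rng.normal(size=500)
# Mirror of a right-skewed sample, giving a left-skewed one
left_skewed = -rng.lognormal(sigma=0.8, size=500)

# probplot returns (theoretical quantiles, ordered sample) and (slope, intercept, r)
_, (_, _, r_normal) = stats.probplot(normal_like, dist="norm")
_, (_, _, r_skewed) = stats.probplot(left_skewed, dist="norm")
print(f"r (normal sample): {r_normal:.3f}, r (skewed sample): {r_skewed:.3f}")
```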

sm.qqplot(np.reciprocal(df["Moderate Negative Skew"]), line='45')
plt.show()

from sklearn.preprocessing import QuantileTransformer
# n_quantiles should not exceed the number of rows in df
qt = QuantileTransformer(output_distribution='normal', n_quantiles=891)
df["Moderate Negative Skew"] = qt.fit_transform(df[["Moderate Negative Skew"]])
sm.qqplot(df["Moderate Negative Skew"], line='45')
plt.show()
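The `n_quantiles` argument should not exceed the number of rows; in recent scikit-learn versions a larger value triggers a warning and is reduced to the number of samples. The sketch below, on a synthetic 300-row left-skewed column, sets `n_quantiles` from the data length and checks that the normal-output mapping removes the skew. Because the mapping is rank-based, it reshapes any continuous column toward a normal shape, at the cost of discarding the original spacing between values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(1)
# Synthetic left-skewed column with 300 rows
col = pd.DataFrame({"x": -rng.exponential(scale=2.0, size=300)})

# Tie n_quantiles to the row count instead of hard-coding it
qt = QuantileTransformer(output_distribution="normal", n_quantiles=len(col))
col["x_qt"] = qt.fit_transform(col[["x"]]).ravel()

print(col["x"].skew(), col["x_qt"].skew())
```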

RESULT:

Thus, the Feature Encoding and Feature Transformation processes were performed successfully on the given data.

Harsayazheni/Introduction-to-data-science-3
