Data Frames

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

    import pandas as pd

    data = {
        "Marks": [80, 75, 90],
        "Sub": ['Python', 'Java', 'Database']
    }

    # Load data into a DataFrame object:
    df = pd.DataFrame(data)

    print(df)
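To make the "rows and columns" idea concrete, the short sketch below inspects the same Marks/Sub table. It only uses the standard shape, columns and index attributes; the variable names n_rows and n_cols are chosen here for illustration.

```python
import pandas as pd

data = {"Marks": [80, 75, 90], "Sub": ["Python", "Java", "Database"]}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Column labels come from the dictionary keys
print(list(df.columns))

# With no index given, rows are labelled 0, 1, 2, ...
print(list(df.index))
```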
Data Frames
Locate Row

As you can see from the result above, the DataFrame is like a table with rows and columns. Pandas uses the loc attribute to return one or more specified rows.

    import pandas as pd

    data = {
        "Marks": [80, 75, 90],
        "Sub": ['Python', 'Java', 'Database']
    }

    # Load data into a DataFrame object:
    df = pd.DataFrame(data)

    # Return row 0:
    print(df.loc[0])

    # Pass a list of labels to return several rows:
    # print(df.loc[[0, 1]])
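One detail worth seeing in practice is that loc returns a different type depending on what is passed in. The sketch below, using the same table, suggests that a single label gives a Series while a list of labels gives a DataFrame. The names row and rows are illustrative only.

```python
import pandas as pd

data = {"Marks": [80, 75, 90], "Sub": ["Python", "Java", "Database"]}
df = pd.DataFrame(data)

# A single label returns one row as a Series
row = df.loc[0]
print(type(row).__name__)

# A list of labels returns the selected rows as a DataFrame
rows = df.loc[[0, 1]]
print(type(rows).__name__)
print(rows.shape)
```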
Data Frames
Named Index

The index argument of pd.DataFrame() lets you give the rows your own labels.

    import pandas as pd

    data = {
        "Marks": [80, 75, 90],
        "Sub": ['Python', 'Java', 'Database']
    }

    # Load data into a DataFrame object with named rows:
    df = pd.DataFrame(data, index=["day1", "day2", "day3"])

    print(df)

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

Example: return "day2":

    # Refer to the named index:
    print(df.loc["day2"])
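Named indexes also work with lists and slices of labels. The following is a minimal sketch on the same day1..day3 table; note that, unlike ordinary Python slices, a label slice in loc includes both end points.

```python
import pandas as pd

data = {"Marks": [80, 75, 90], "Sub": ["Python", "Java", "Database"]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# Several named rows at once
picked = df.loc[["day1", "day3"]]
print(picked)

# A label slice includes BOTH ends: day1 and day2
sliced = df.loc["day1":"day2"]
print(sliced)

# Row label plus column label returns a single value
mark = df.loc["day2", "Marks"]
print(mark)
```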
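Data Frames
Load Files Into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame.

    import pandas as pd

    df = pd.read_csv('data.csv')

    print(df)

When a DataFrame has more rows than the display.max_rows option allows, print() shows a truncated view. You can check the current limit:

    import pandas as pd

    print(pd.options.display.max_rows)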
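Since 'data.csv' is not included here, the sketch below reads a small made-up CSV from an in-memory string instead (the csv_text contents are an assumption for illustration). It also shows one way to change display.max_rows temporarily with pd.option_context.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = "Duration,Pulse\n60,110\n60,117\n45,109\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)

# Temporarily lower the row limit; the setting is restored after the block
with pd.option_context("display.max_rows", 2):
    print(df)
```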
Data Frames
Read JSON

Big data sets are often stored or extracted as JSON. JSON is plain text, but has the format of an object, and is well known in the world of programming, including Pandas. In our examples we will be using a JSON file called 'data.json'. Use to_string() to print the entire DataFrame.
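As 'data.json' itself is not shown, here is a rough sketch that reads a small JSON document from an in-memory string with pd.read_json and prints it with to_string(). The json_text contents are an assumption that mirrors the column layout used on the next slide.

```python
import io
import pandas as pd

# Hypothetical JSON content standing in for data.json
json_text = """
{
  "Duration": {"0": 60, "1": 60, "2": 45},
  "Pulse":    {"0": 110, "1": 117, "2": 103}
}
"""

# read_json accepts a path or a file-like object
df = pd.read_json(io.StringIO(json_text))

# to_string() renders every row, ignoring the display row limit
print(df.to_string())
```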
Data Frames

    import pandas as pd

    data = {
        "Duration": {"0": 60, "1": 60, "2": 60, "3": 45, "4": 45, "5": 60},
        "Pulse": {"0": 110, "1": 117, "2": 103, "3": 109, "4": 117, "5": 102},
        "Maxpulse": {"0": 130, "1": 145, "2": 135, "3": 175, "4": 148, "5": 127},
        "Calories": {"0": 409, "1": 479, "2": 340, "3": 282, "4": 406, "5": 300}
    }

    df = pd.DataFrame(data)

    print(df)
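With a dictionary of dictionaries like the one above, the outer keys become column names and the inner keys become row labels. The sketch below uses a shortened copy of that data to show that the row labels stay strings ("0", "1", ...), so loc needs the string form.

```python
import pandas as pd

# Shortened copy of the slide's data (first three rows, two columns)
data = {
    "Duration": {"0": 60, "1": 60, "2": 60},
    "Pulse": {"0": 110, "1": 117, "2": 103},
}
df = pd.DataFrame(data)

# Inner keys are kept as string row labels
print(df.index.tolist())

# So rows are looked up with "1", not the integer 1
print(df.loc["1", "Pulse"])
```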
Viewing the Data

• One of the most used methods for getting a quick overview of the DataFrame is the head() method.
• The head() method returns the headers and a specified number of rows, starting from the top.

    import pandas as pd

    df = pd.read_csv('data.csv')

    # Print the first 10 rows of the DataFrame:
    print(df.head(10))

    # Print the first 5 rows of the DataFrame (the default when no number is given):
    print(df.head())
• There is also a tail() method for viewing the last rows of the DataFrame.
• The tail() method returns the headers and a specified number of rows, starting from the bottom.

Example

Print the last 5 rows of the DataFrame:

    print(df.tail())
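The head() and tail() behaviour described above can be tried without a CSV file. This sketch builds a 20-row table with a single made-up column n and checks how many rows each call returns.

```python
import pandas as pd

# Illustrative 20-row table; the column name "n" is arbitrary
df = pd.DataFrame({"n": range(20)})

first_five = df.head()     # default: 5 rows from the top
first_ten = df.head(10)    # explicit row count
last_three = df.tail(3)    # 3 rows from the bottom

print(len(first_five), len(first_ten), len(last_three))
print(last_three["n"].tolist())
```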
    import pandas as pd

    # Make a DataFrame from a CSV file, using the "Name" column as the index
    data = pd.read_csv("nba.csv", index_col="Name")

    # Retrieve a row by position with iloc.
    # Positions start at 0, so iloc[3] is the fourth row.
    row2 = data.iloc[3]

    print(row2)
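The difference between iloc (position) and loc (label) is easier to see on a small table. Because nba.csv is not available here, the sketch below uses a made-up stand-in with placeholder names and numbers; only the "Name" index mirrors the slide.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: placeholder names and values
data = pd.DataFrame(
    {"Number": [4, 7, 11, 23], "Position": ["PG", "SG", "SF", "C"]},
    index=pd.Index(["Player A", "Player B", "Player C", "Player D"], name="Name"),
)

# iloc selects by integer position (0-based): the fourth row
by_position = data.iloc[3]

# loc selects by index label: the row named "Player D"
by_label = data.loc["Player D"]

# Both refer to the same row here
print(by_position.equals(by_label))
```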
    import pandas as pd
    import numpy as np

    # Dictionary of lists; np.nan marks a missing value.
    # (Named "data" rather than "dict" to avoid shadowing the built-in.)
    data = {'First Score': [100, 90, np.nan, 95],
            'Second Score': [30, np.nan, 45, 56],
            'Third Score': [52, 40, 80, 98],
            'Fourth Score': [np.nan, np.nan, np.nan, 65]}

    # Create a DataFrame from the dictionary
    df = pd.DataFrame(data)

    print(df)
• Dropping missing values using dropna():
• To drop null values from a DataFrame, we use the dropna() function. It can drop rows or columns that contain null values in different ways.

    import pandas as pd
    import numpy as np

    # Dictionary of lists; np.nan marks a missing value
    data = {'First Score': [100, 90, np.nan, 95],
            'Second Score': [30, np.nan, 45, 56],
            'Third Score': [52, 40, 80, 98],
            'Fourth Score': [np.nan, np.nan, np.nan, 65]}

    # Create a DataFrame from the dictionary
    df = pd.DataFrame(data)

    print(df)
• Now we drop rows with at least one NaN value (null value).

    import pandas as pd
    import numpy as np

    # Dictionary of lists; np.nan marks a missing value
    data = {'First Score': [100, 90, np.nan, 95],
            'Second Score': [30, np.nan, 45, 56],
            'Third Score': [52, 40, 80, 98],
            'Fourth Score': [np.nan, np.nan, np.nan, 65]}

    # Create a DataFrame from the dictionary
    df = pd.DataFrame(data)

    # dropna() returns a new DataFrame without the rows that contain NaN
    print(df.dropna())
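The "different ways" that dropna() can drop rows or columns are controlled by its axis, how and thresh parameters. The sketch below applies each of them to the same score table; the result variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = {
    "First Score": [100, 90, np.nan, 95],
    "Second Score": [30, np.nan, 45, 56],
    "Third Score": [52, 40, 80, 98],
    "Fourth Score": [np.nan, np.nan, np.nan, 65],
}
df = pd.DataFrame(data)

# Default: drop every row that has at least one NaN
complete_rows = df.dropna()

# axis=1: drop every COLUMN that has at least one NaN
complete_cols = df.dropna(axis=1)

# how="all": drop a row only if all of its values are NaN
not_empty = df.dropna(how="all")

# thresh=3: keep rows that have at least 3 non-NaN values
mostly_filled = df.dropna(thresh=3)

print(complete_rows.shape, complete_cols.shape)
print(not_empty.shape, mostly_filled.index.tolist())
```

Note that dropna() returns a new DataFrame and leaves df unchanged.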
• Iterating over rows and columns
• Iteration is a general term for taking each item of something, one after another. A Pandas DataFrame consists of rows and columns, so iterating over a DataFrame works much like iterating over a dictionary.
• Iterating over rows:
• To iterate over rows, we can use the iterrows() and itertuples() functions. The related items() function (called iteritems() before pandas 2.0, where the old name was removed) iterates over columns rather than rows.
    import pandas as pd

    # Dictionary of lists
    # (named "data" rather than "dict" to avoid shadowing the built-in)
    data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
            'degree': ["MBA", "BCA", "M.Tech", "MBA"],
            'score': [90, 40, 80, 98]}

    # Create a DataFrame from the dictionary
    df = pd.DataFrame(data)

    print(df)
Now we apply the iterrows() function to get each row in turn.

    import pandas as pd

    # Dictionary of lists
    data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
            'degree': ["MBA", "BCA", "M.Tech", "MBA"],
            'score': [90, 40, 80, 98]}

    # Create a DataFrame from the dictionary
    df = pd.DataFrame(data)

    # Iterate over rows using iterrows():
    # i is the index label, j is the row as a Series
    for i, j in df.iterrows():
        print(i, j)
        print()
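For comparison with iterrows(), here is a brief sketch of the other two iteration functions on the same table: itertuples() yields one named tuple per row, and items() yields one (column name, Series) pair per column. It assumes a pandas version where items() is available on DataFrame.

```python
import pandas as pd

data = {
    "name": ["aparna", "pankaj", "sudhir", "Geeku"],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score": [90, 40, 80, 98],
}
df = pd.DataFrame(data)

# itertuples(): each row is a named tuple; index=False leaves out the index
names = []
for row in df.itertuples(index=False):
    print(row.name, row.degree, row.score)
    names.append(row.name)

# items(): iterates over COLUMNS as (column name, Series) pairs
column_names = []
for column_name, column in df.items():
    print(column_name, column.tolist())
    column_names.append(column_name)
```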
Data Frame Data structure in Python pandas.pptx
