
PANDAS LIBRARY

Pandas is an open-source Python library that offers high-performance data manipulation and analysis tools. Its name is derived from "Panel Data", an econometrics term for multidimensional data sets. Data analysis typically requires processing steps such as merging, cleansing, and restructuring, all of which can be done with Pandas.

 

Advantages of Pandas:

  • It handles missing data easily.

  • It provides an easy way to slice data.

  • It provides flexible ways to merge, concatenate, or reshape data.

  • Column and row operations are simple in Pandas.
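The points above can be illustrated with a minimal sketch. The names and sample marks below are made up for illustration; fillna(), iloc and concat() are standard Pandas features, chosen here as one possible way to show each advantage.

```python
import numpy as np
import pandas as pd

# Hypothetical marks data with one missing value (NaN).
marks = pd.Series([78.0, np.nan, 91.0], index=["Asha", "Ravi", "Meena"])

# Missing data: count the missing values, then replace them with a default.
missing_count = int(marks.isna().sum())
filled = marks.fillna(0)

# Slicing: take the first two rows by position.
first_two = filled.iloc[:2]

# Concatenating: join two Series end to end.
more_marks = pd.Series([65.0], index=["Kiran"])
combined = pd.concat([filled, more_marks])

print(missing_count)
print(first_two)
print(combined)
```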

To install the Pandas library, run the following command at the command prompt (not inside the Python interpreter):

pip install pandas

To import the Pandas library:

>>> import pandas as pd

Common data structures used in Pandas are:

  1. Series (one-dimensional structure)

  2. DataFrame (two-dimensional structure)

  3. Panel (three-dimensional structure; removed from Pandas in version 0.25)

*Note: As per the CBSE curriculum, Panel is not part of the course.

Creating Series

A Series is a one-dimensional labeled array that can store data of any type.

  • The term "index" refers to the row labels of a Series.

  • A list, tuple, or dictionary is easily converted into a Series using the pd.Series() constructor.

  • A Series cannot have multiple columns because it has a 1-D structure.

  • Data in series is mutable (i.e. changeable), but its size is immutable.
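The properties listed above can be checked with a short sketch. The labels and values are arbitrary and used only for illustration.

```python
import pandas as pd

# A tuple is converted to a Series in the same way as a list.
s = pd.Series((10, 20, 30), index=["x", "y", "z"])

# The row labels are available through the index attribute.
labels = list(s.index)

# Values are mutable: an existing element can be overwritten.
s["y"] = 99

print(labels)
print(s)
```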

For creating empty Series :

>>> s1=pd.Series()

A Series can be created from any of the following:

  1. Lists

  2. Array (using numpy)

  3. Dictionary

Creating Series Using Lists

>>> import pandas as pd

>>> list1=[1,2,3,"a","b","c"]
>>> s1=pd.Series(list1)

>>> print(s1)

Output:

0    1

1    2

2    3

3    a

4    b

5    c

dtype: object

Creating Series Using Dictionary

>>> dict1={"One":1,"Two":2,"Three":3,"Four":4,"Five":5}
>>> s2=pd.Series(dict1)

>>> print(s2)

Output:

One      1

Two      2

Three    3

Four     4

Five     5

dtype: int64

Note: Here, keys are used as index for the series.

Creating Series Using Array (Numpy)

NumPy is a Python library used for working with arrays. Its name stands for "Numeric Python" or "Numerical Python". It is one of the most commonly used packages for scientific computing and mathematical operations in Python. NumPy was created in 2005 by Travis Oliphant.


Installing NumPy

NumPy can be installed by running the following command at the command prompt (not inside the Python interpreter):

pip install numpy


NumPy arrays are used to store collections of numerical data. The array is a versatile and efficient data structure. The NumPy array is officially called ndarray but is commonly known as an array.

Difference between List and array:
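Two commonly cited differences can be sketched as follows. This is only an illustration with arbitrary values, not a complete comparison: arithmetic on a list and on an array behaves differently, and an array stores all its elements with one common data type.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# Multiplying a list repeats its contents...
list_result = py_list * 2
# ...while multiplying an array works element by element.
array_result = np_array * 2

# A list can hold mixed types; an array converts its elements to one common type.
mixed_list = [1, "a", 2.5]
mixed_array = np.array([1, 2.5])

print(list_result)
print(array_result)
print(mixed_array.dtype)
```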

Methods to create ndarray:

  1. Using array()

>>> import numpy as np     #np is used as an alias for numpy

>>> arr1=np.array([2,3,5])

>>> print(arr1)

Output:  [2 3 5]

>>> arr2=np.array([[10,20,30],[80,90,50]])

>>> print(arr2)

Output: [[10 20 30]

         [80 90 50]]

2. Using arange() - This method returns an array of evenly spaced values within the given range. The stop value is not included.

Syntax: np.arange([start,] stop[, step], dtype=None)

>>> import numpy as np
>>> arr4=np.arange(5,10)
>>> print(arr4)
Output: [5 6 7 8 9]

>>> arr4=np.arange(5,10,2)

>>> print(arr4)

Output: [5 7 9]

 

Creating Series Using Array (numpy) - contd.

>>> array1=np.array([1,2,3,4,5])
>>> s3=pd.Series(array1)

>>> print(s3)

Output:

0    1

1    2

2    3

3    4

4    5

dtype: int32

Series Attributes 

Series Methods
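A few commonly used attributes and methods are sketched below. This is an illustrative selection, assuming a small example Series with made-up values; it is not a complete list.

```python
import pandas as pd

s = pd.Series([5, 10, 15, 20, 25], index=["a", "b", "c", "d", "e"], name="marks")

# Attributes are read without parentheses.
print(s.index)   # row labels
print(s.values)  # underlying NumPy array
print(s.size)    # number of elements
print(s.name)    # name given to the Series
print(s.empty)   # False, because the Series has elements

# Methods are called with parentheses.
print(s.head(2))  # first two rows
print(s.tail(2))  # last two rows
print(s.count())  # number of non-NaN values
```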

Accessing Elements of a Series

Elements can be accessed either by indexing or slicing.

  1. Indexing: Indexing is used to extract an element stored in the Series by providing its index. Indices can be of two types:
    a. Positional index
    b. Labeled index

A positional index is an integer that corresponds to the element's position in the Series, starting from 0, whereas a labeled index is any user-defined label.

Example :

2. Slicing: It is used to extract a sequence of values from the Series.

Syntax - Series[start:end:step]

Start - the position from which the range starts.

End - the position at which the range stops. The end position is not included in the result.

Step - the gap between selected positions; use it to skip values in between or, with a negative step, to select in reverse order.

{Note: start, end, and step are all optional}

 

Example:

 

3. Using iloc and loc

  • iloc : The attribute .iloc takes integer values, i.e. positional value(s), for accessing particular Series element(s).

  • loc : The attribute .loc takes row labels, i.e. user-defined indexes, for accessing particular Series element(s).

Example:

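Positional Index

>>> ser=pd.Series([1,2,3,4])    #default index 0,1,2,3 matches the positions

>>> print(ser[2])

Output:    3

>>> print(ser[[2,3]])

Output: 2    3

        3    4

        dtype: int64

Note: if the Series has user-defined integer labels such as index=[10,20,30,40], ser[2] looks for the label 2 and raises a KeyError; use ser.iloc[2] to access by position in that case.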

Labeled Index

>>> ser=pd.Series([1,2,3,4], index=[10,20,30,40])

>>> print(ser[20])

Output:    2

>>> print(ser[[20,40]])

Output: 20    2

        40    4

        dtype: int64

>>> ser=pd.Series(['Unity','Integrity','Loyalty','Devotion'])

>>> ser[0:3]

0        Unity

1    Integrity

2      Loyalty

dtype: object

>>> ser[:2]

0        Unity

1    Integrity

dtype: object

 

>>> ser[0:3:2]

0      Unity

2    Loyalty

dtype: object

>>> import numpy as np

>>> ser=pd.Series(np.arange(5,30,5), index=['a','b','c','d','e'])

>>> ser

a     5

b    10

c    15

d    20

e    25

dtype: int32

 

>>> ser.loc['a':'d']

a     5

b    10

c    15

d    20

dtype: int32

 

>>> ser.iloc[1:3]

b    10

c    15

dtype: int32

Note: while using .loc, both values are included in the result but in .iloc, end position is excluded from the result.
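The difference described in the note can be verified with a small sketch, assuming the same five-element Series with labels 'a' to 'e':

```python
import pandas as pd

ser = pd.Series([5, 10, 15, 20, 25], index=["a", "b", "c", "d", "e"])

by_label = ser.loc["b":"d"]   # labels b, c and d: the end label is included
by_position = ser.iloc[1:3]   # positions 1 and 2: the end position is excluded

print(list(by_label.index))
print(list(by_position.index))
```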

Condition-based extraction from the Series

Elements can also be selected by applying a condition to the data values.

Example:

Mathematical Operations on Series

Pandas allows us to perform mathematical operations on two Series.

When two Series are combined, values are matched by index label. A label that is present in only one of the Series produces NaN in the result by default.

Example:

For mathematical operations on Series, the following methods can be used in place of operators:

  • add()

  • sub()

  • mul()

  • div()

The fill_value argument of these methods supplies a value to use in place of a missing entry when an index label is present in only one Series, as demonstrated below:

 

>>> ser

a     5

b    10

c    15

d    20

e    25

dtype: int32

 

>>> ser>20  #if we only give condition, we get boolean result 

a    False

b    False

c    False

d    False

e     True

dtype: bool

 

 #but if we give the condition inside [ ], then we get only the rows that satisfy the condition.

>>> ser[ser>15]

d    20

e    25

dtype: int32

>>> import numpy as np

>>> serA=pd.Series(np.arange(5,30,5), index=['a','b','c','d','e'])

>>> serA

a     5

b    10

c    15

d    20

e    25

dtype: int32

>>> serB=pd.Series(np.arange(6,36,6), index=['x','y','c','d','e'])

>>> serB

x     6

y    12

c    18

d    24

e    30

dtype: int32

>>> print(serA+serB)

a     NaN

b     NaN

c    33.0

d    44.0

e    55.0

x     NaN

y     NaN

dtype: float64

>>> print(serA*serB)

a      NaN

b      NaN

c    270.0

d    480.0

e    750.0

x      NaN

y      NaN

dtype: float64

>>>print(serA,serB)

a     5

b    10

c    15

d    20

e    25

dtype: int32 

x     6

y    12

c    18

d    24

e    30

dtype: int32

 

>>> serA.add(serB,fill_value=0)

a     5.0

b    10.0

c    33.0

d    44.0

e    55.0

x     6.0

y    12.0

dtype: float64

>>> serA.sub(serB,fill_value=10)

a   -5.0

b    0.0

c   -3.0

d   -4.0

e   -5.0

x    4.0

y   -2.0

dtype: float64

>>> serA.mul(serB,fill_value=2)

a     10.0

b     20.0

c    270.0

d    480.0

e    750.0

x     12.0

y     24.0

dtype: float64
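The div() method listed earlier works the same way. A minimal sketch, reusing serA and serB as defined above and choosing fill_value=1 only for illustration:

```python
import numpy as np
import pandas as pd

serA = pd.Series(np.arange(5, 30, 5), index=["a", "b", "c", "d", "e"])
serB = pd.Series(np.arange(6, 36, 6), index=["x", "y", "c", "d", "e"])

# A label present in only one Series is paired with fill_value (here 1)
# instead of producing NaN, e.g. 'a' gives 5/1 and 'x' gives 1/6.
quotient = serA.div(serB, fill_value=1)
print(quotient)
```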

Pandas Basics (Continued)

Creating Dataframe

A DataFrame is a two-dimensional labeled data structure that can store several types of data. The term "index" refers to the row labels of a DataFrame. Lists, tuples, and dictionaries are easily converted into a DataFrame using the pd.DataFrame() constructor. A DataFrame is like a table with rows and columns, which makes it easy to access any element in the table.
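As a rough illustration of the table-like structure, the sketch below builds a small DataFrame and reads a column and a single cell. The names and marks are sample values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Maths": [90, 92], "Science": [91, 81]},
    index=["Vaibhav", "Akshitaa"],
)

science_marks = df["Science"]            # a whole column, returned as a Series
one_value = df.loc["Akshitaa", "Maths"]  # one cell by row label and column label
same_value = df.iloc[1, 0]               # the same cell by row and column position

print(df)
print(one_value, same_value)
```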

For creating empty Dataframe:

>>> df1=pd.DataFrame()

A DataFrame can be created from any of the following:

  1. Arrays

  2. List of Dictionaries

  3. Dictionary of Lists

  4. Series

  5. Dictionary of Series

Creating DataFrame Using Arrays

>>> import numpy as np

>>> array1 = np.array([10,20,30])

>>> array2 = np.array([100,200,300])

>>> array3 = np.array([-100,-200,-300, -400])

>>> df1 = pd.DataFrame(array1) #From Single Array

>>> df2 = pd.DataFrame([array1,array2,array3],columns=["A","B","C","D"]) #From Multiple Array

Creating DataFrame Using List of Dictionaries

>>> listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]

>>> df1 = pd.DataFrame(listDict)
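To see what this produces, the same statements can be run as a script. Each dictionary becomes one row and the keys become column labels; a key missing from a dictionary leaves NaN in that row.

```python
import pandas as pd

listDict = [{"a": 10, "b": 20}, {"a": 5, "b": 10, "c": 20}]
df1 = pd.DataFrame(listDict)

# The first dictionary has no key 'c', so row 0 has NaN in column 'c'.
print(df1)
```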

Creating DataFrame Using Dictionary of Lists

>>> dict1= {'State': ['Goa', 'Maharashtra', 'Delhi'], 'Population': [98438, 5481, 56835] , 'Pollution' : [27, 6.72,16]}   #dict1 is used because dict is the name of a Python built-in

>>> df1= pd.DataFrame(dict1)

Creating DataFrame Using Series

>>> series1 = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])

>>> series2 = pd.Series ([100,200,-300,-400,-1000], index = ['a', 'b', 'c', 'd', 'e'])

>>> series3 = pd.Series([12,80,10,-30,10], index = ['z', 'y', 'a', 'c', 'e'])

>>> df1 = pd.DataFrame(series1) #From Single Series

>>> df1 = pd.DataFrame([series1,series2,series3]) #From Multiple Series
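When a list of Series is passed, each Series becomes one row and the index labels become column labels. A short sketch using series1 and series3 from above shows that labels missing from a Series are filled with NaN:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
series3 = pd.Series([12, 80, 10, -30, 10], index=["z", "y", "a", "c", "e"])

df = pd.DataFrame([series1, series3])

# The columns are the union of both sets of labels.
# series1 has no 'z' or 'y'; series3 has no 'b' or 'd'.
print(df)
```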

Creating DataFrame Using Dictionary of Series

>>> Result={ 'Vaibhav': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),

... 'Akshitaa': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']),

... 'Keshav': pd.Series([81, 71, 67], index=['Maths','Science','Hindi']),

... 'Nikhil': pd.Series([94, 95, 99], index=['Maths','Science','Hindi'])}

>>> ResultDF = pd.DataFrame(Result)

Attributes of Dataframe
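A few commonly used DataFrame attributes are sketched below on a reduced version of the ResultDF example. This is an illustrative selection, not a complete list.

```python
import pandas as pd

ResultDF = pd.DataFrame(
    {"Vaibhav": [90, 91, 97], "Akshitaa": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

print(ResultDF.index)    # row labels
print(ResultDF.columns)  # column labels
print(ResultDF.shape)    # (number of rows, number of columns)
print(ResultDF.size)     # total number of cells
print(ResultDF.dtypes)   # data type of each column
print(ResultDF.T)        # transpose: rows and columns swapped
```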

Handling CSV (Comma Separated Value) Files

Importing CSV Files to Dataframe

To import a CSV file into a Dataframe:

Syntax: df1 = pd.read_csv(<path to the CSV file>, <parameters>)

Parameter
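The sketch below shows a few frequently used read_csv() parameters. It reads from an in-memory string so that it runs without a file on disk; the CSV contents are made-up sample data, and a file path can be passed in place of the StringIO object.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; read_csv also accepts a file path.
csv_text = "RollNo;Name;Marks\n1;Asha;78\n2;Ravi;85\n3;Meena;91\n"

df1 = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                    # field separator (the default is a comma)
    header=0,                   # row number to use for the column names
    usecols=["Name", "Marks"],  # read only these columns
    nrows=2,                    # read only the first two data rows
)
print(df1)
```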


Exporting Dataframe to CSV Files 

To export a Dataframe to a CSV file:

Syntax: dataframename.to_csv(<path where the file is to be saved>, <parameters>)

Parameter
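The sketch below shows a few frequently used to_csv() parameters on made-up sample data. To avoid writing a file, no path is passed, in which case to_csv() returns the CSV text as a string; passing a file path as the first argument writes the same text to that file.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Marks": [78, 85]})

# index=False leaves out the row labels.
csv_text = df.to_csv(index=False)

# sep changes the separator; header=False leaves out the column names.
csv_no_header = df.to_csv(sep=";", header=False, index=False)

print(csv_text)
print(csv_no_header)
```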

For further information, please read the cheat sheet of the Pandas library.
