- pandas is an open source, BSD-licensed library providing high-performance,
easy-to-use data structures and data analysis tools for the Python programming language
- Support for a powerful table structure, the DataFrame, built on top of NumPy
- Tools for reading and writing between many formats (it can also interact with HTML files and SQL databases)
- Intelligent selection of data based on indexing, logic, subsetting, etc.
- Handle missing data
- Adjust and restructure data
- Main Documentation Link : https://pandas.pydata.org/docs/
- Credit : Jose Portilla (Head of Data Science at Pierian Training) @Udemy
- SERIES -> one-dimensional ndarray with axis labels
- A Series is a data structure in the pandas library that holds an array of information along with a named index
- How to install pandas -> pip install pandas
- In case of the error "ModuleNotFoundError: No module named 'pandas'", open Jupyter, go to Terminal -> Run Terminal, and type pip install pandas. A message such as "Successfully installed pandas-1.5.3 pytz-2022.7.1" confirms the installation. After installation, restart your Jupyter kernel
- myScoreSeries = pd.Series(data=[55, 35.0, 'SeventyFive'], index=['Sachin', 'Dhoni', 'Kohli'])
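The mixed-type Series from the last bullet can be checked with a short sketch. Because the values mix an integer, a float and a string, pandas is expected to store them with the generic object dtype. The snake_case variable name is only an illustrative choice.

```python
import pandas as pd

# Values of mixed types: an int, a float and a string
my_score_series = pd.Series(
    data=[55, 35.0, "SeventyFive"],
    index=["Sachin", "Dhoni", "Kohli"],
)

# The values do not share one numeric type, so the dtype is object
print(my_score_series.dtype)

# Label-based lookup through the named index
print(my_score_series["Dhoni"])
print(list(my_score_series.index))
```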
import numpy as np
import pandas as pd
# ############# pandas Series using the Series() constructor
#help(pd.Series) #Upper case S
myIndex = ['Sachin', 'Dhoni', 'Kohli']
myData = [55,35,75]
mySeries = pd.Series(data=myData)
print(type(mySeries))
# RES -> <class 'pandas.core.series.Series'>
print(mySeries) # By default, the index is integer-based
""" RES ->
0 55
1 35
2 75
dtype: int64
"""
mySeries = pd.Series(data=myData, index=myIndex)
print(mySeries)
""" RES ->
Sachin 55
Dhoni 35
Kohli 75
dtype: int64
"""
print(mySeries.iloc[0]) # Positional access; mySeries[0] on a labeled index is deprecated in recent pandas
# RES -> 55
print(mySeries['Sachin'] , mySeries.shape)
# RES -> 55 (3,) ---> 3 elements along a single axis (a Series has no columns)
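Label-based and position-based access can be made explicit with .loc and .iloc. The following is a minimal sketch that rebuilds the same data under an illustrative name (my_series); it is one way to avoid ambiguity, not the only one.

```python
import pandas as pd

my_series = pd.Series(data=[55, 35, 75], index=["Sachin", "Dhoni", "Kohli"])

first_by_position = my_series.iloc[0]       # element at position 0
first_by_label = my_series.loc["Sachin"]    # element with label "Sachin"
print(first_by_position, first_by_label)

# Slicing: iloc excludes the end position, loc includes the end label
by_position = my_series.iloc[0:2]
by_label = my_series.loc["Sachin":"Dhoni"]
print(by_position)
print(by_label)
```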
# Series using Python Dictionary
myDict = {"India" : "Best", "Australia" : "Better"}
mySer = pd.Series(myDict)
print(mySer)
""" RES ->
India Best
Australia Better
dtype: object
"""
print(mySer.keys())
# RES -> Index(['India', 'Australia'], dtype='object')
print(mySer.values) # Use as attribute
# RES -> ['Best' 'Better']
ser1 = {"India" : 44, "Japan" : 40, "USA" : 65 }
ser2 = {"India" : 40, "Pak" : 24, "Nepal" : 20}
sales_q1 = pd.Series(ser1)
sales_q2 = pd.Series(ser2)
# Look what happens with a normal list
print([1, 2] * 3)
# RES -> [1, 2, 1, 2, 1, 2]
# Broadcasting -> on a Series, the same operator is applied element-wise instead of repeating the data
print(sales_q1 * 2)
""" RES ->
India 88
Japan 80
USA 130
dtype: int64
"""
print(sales_q1 + sales_q2) # Keys that do not appear in both series end up as NaN
""" RES ->
India 84.0
Japan NaN
Nepal NaN
Pak NaN
USA NaN
dtype: float64
"""
# For a meaningful operation on series, use the methods add, sub, mul, div - with fill_value, a key missing from one series is treated as 0.0 instead of producing NaN
print(sales_q1.add(sales_q2, fill_value=0.0))
""" RES ->
India 84.0
Japan 40.0
Nepal 20.0
Pak 24.0
USA 65.0
dtype: float64
"""
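The same fill_value idea applies to sub, mul and div. Below is a minimal sketch with the same quarterly data, where the name diff is only illustrative. Note that fill_value substitutes for a key missing from one of the two series; a key missing from both would still give NaN.

```python
import pandas as pd

sales_q1 = pd.Series({"India": 44, "Japan": 40, "USA": 65})
sales_q2 = pd.Series({"India": 40, "Pak": 24, "Nepal": 20})

# Q1 minus Q2, treating a country missing from one quarter as 0 sales
diff = sales_q1.sub(sales_q2, fill_value=0)
print(diff)

# The result covers the union of the keys of both series
print(sorted(diff.index))
```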
# Traversing the series - we will have another post, as it is not that straightforward
# (iterating over a Series directly yields its values, not its index labels)
#for key, value in sales_q1.items():
#    print(key, value)
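Iterating over a Series directly yields its values, not its index labels. As a small preview, here is a minimal sketch with the sales_q1 data that uses Series.items() to walk label/value pairs; the names values and pairs are illustrative.

```python
import pandas as pd

sales_q1 = pd.Series({"India": 44, "Japan": 40, "USA": 65})

# Direct iteration walks the values only
values = [v for v in sales_q1]
print(values)

# items() yields (index label, value) pairs
pairs = list(sales_q1.items())
for country, sales in pairs:
    print(country, sales)
```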