Python Pandas – 01 – Series – A quick reference

  1. pandas is an open source, BSD-licensed library providing high-performance,
    easy-to-use data structures and data analysis tools for the Python programming language
  2. Support for the extremely powerful table structure, i.e. the DATAFRAME, built on top of NumPy
  3. Tools for reading/writing data in many formats (can interact with HTML files and SQL databases too!)
  4. Intelligent selection of data based on indexing, logical conditions, subsets, etc.
  5. Handle missing data
  6. Adjust and restructure data structures
  7. Main Documentation Link : https://pandas.pydata.org/docs/
  8. Acknowledgement: Jose Portilla (Head of Data Science at Pierian Training) @Udemy
  9. SERIES -> a one-dimensional ndarray with axis labels
  10. A Series is a data structure in the pandas library that holds an array of values along with a named index (its labels)
  11. How to install pandas -> pip install pandas
  12. In case of the error ModuleNotFoundError: No module named 'pandas', open Jupyter, go to Terminal -> Run Terminal and type pip install pandas. A message such as "Successfully installed pandas-1.5.3 pytz-2022.7.1" confirms the install; after installation, restart your Jupyter kernel
  13. myScoreSeries = pd.Series(data=[55, 35.0, 'SeventyFive'], index=['Sachin', 'Dhoni', 'Kohli'])
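Item 13 mixes an int, a float and a string in one Series. A minimal standalone sketch of what that does to the dtype; the numbers-only Series is added purely for contrast:

```python
import pandas as pd

# Mixed int/float/str values: pandas falls back to the generic object dtype
myScoreSeries = pd.Series(data=[55, 35.0, 'SeventyFive'],
                          index=['Sachin', 'Dhoni', 'Kohli'])
print(myScoreSeries.dtype)   # object

# Numbers only (int + float): the ints are upcast to float64
numeric = pd.Series([55, 35.0, 75], index=['Sachin', 'Dhoni', 'Kohli'])
print(numeric.dtype)         # float64
print(numeric['Sachin'])     # 55.0
```

Keeping one type per Series lets pandas use a fast numeric dtype instead of object.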
import numpy as np
import pandas as pd

# ############# PANDAS SERIES using the Series() constructor
#help(pd.Series)  # Note the upper-case S in Series
myIndex = ['Sachin', 'Dhoni', 'Kohli']
myData = [55,35,75]


mySeries = pd.Series(data=myData)
print(type(mySeries))   
# RES -> <class 'pandas.core.series.Series'>
print(mySeries)  # By default the labels are the integer positions 0, 1, 2 (a RangeIndex)
""" RES -> 
0    55
1    35
2    75
dtype: int64
"""
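When no index is passed, pandas still attaches one. A small standalone sketch that inspects it:

```python
import pandas as pd

mySeries = pd.Series(data=[55, 35, 75])

# Without index=, pandas creates a RangeIndex 0 .. n-1
print(mySeries.index)          # RangeIndex(start=0, stop=3, step=1)
print(list(mySeries.index))    # [0, 1, 2]
```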

mySeries = pd.Series(data=myData, index=myIndex)
print(mySeries)
""" RES -> 
Sachin    55
Dhoni     35
Kohli     75
dtype: int64
"""
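Item 10 describes a Series as values plus a named index. A minimal sketch of giving names to both parts, using the constructor's `name=` parameter and the `index.name` attribute; the names 'Score' and 'Player' are illustrative only:

```python
import pandas as pd

myIndex = ['Sachin', 'Dhoni', 'Kohli']
myData = [55, 35, 75]

# name= labels the values; index.name labels the index itself
# ('Score' and 'Player' are hypothetical names chosen for this example)
mySeries = pd.Series(data=myData, index=myIndex, name='Score')
mySeries.index.name = 'Player'
print(mySeries)   # the printout now shows the index name and "Name: Score"
```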

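print(mySeries.iloc[0])  # Positional access; plain mySeries[0] on a label index is deprecated in recent pandas
# RES -> 55
print(mySeries['Sachin'], mySeries.shape)
# RES -> 55 (3,)   ---> 1-D: 3 elements, no column dimension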
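Label-based and position-based access have dedicated accessors, `.loc` and `.iloc`. A standalone sketch showing both, plus the size attributes of a one-dimensional Series:

```python
import pandas as pd

mySeries = pd.Series([55, 35, 75], index=['Sachin', 'Dhoni', 'Kohli'])

print(mySeries.loc['Sachin'])    # 55 -> by label
print(mySeries.iloc[0])          # 55 -> by position
print(mySeries.iloc[-1])         # 75 -> last element
# Label slices include the end label
print(mySeries.loc['Sachin':'Dhoni'].tolist())      # [55, 35]
print(mySeries.shape, mySeries.ndim, mySeries.size) # (3,) 1 3
```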

# Series using Python Dictionary
myDict = {"India" : "Best", "Australia" : "Better"}
mySer = pd.Series(myDict)
print(mySer)
""" RES -> 
India          Best
Australia    Better
dtype: object
"""
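The dictionary keys become the index. A standalone sketch of the optional `index=` argument, which selects and reorders keys; 'England' is just an example of a label that is not in the dict:

```python
import pandas as pd

myDict = {"India": "Best", "Australia": "Better"}

# Keys become the index, in the dict's insertion order
mySer = pd.Series(myDict)
print(list(mySer.index))   # ['India', 'Australia']

# index= picks and reorders keys; a label missing from the dict gets NaN
picked = pd.Series(myDict, index=["Australia", "England"])
print(picked)
```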

print(mySer.keys())  
# RES -> Index(['India', 'Australia'], dtype='object')
print(mySer.values) # An attribute (no parentheses); returns a NumPy array
# RES -> ['Best' 'Better']
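A few related conversions, sketched standalone: `keys()` returns the same Index as `.index`, `to_numpy()` is the alternative the pandas docs recommend over `.values`, and `to_dict()` goes back to a plain dict:

```python
import pandas as pd

mySer = pd.Series({"India": "Best", "Australia": "Better"})

# keys() is a method returning the same Index as the .index attribute
print(mySer.keys().equals(mySer.index))   # True
# NumPy array of the values, converted to a list for display
print(mySer.to_numpy().tolist())          # ['Best', 'Better']
# Back to a plain Python dict
print(mySer.to_dict())                    # {'India': 'Best', 'Australia': 'Better'}
```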

ser1 = {"India" : 44, "Japan" : 40, "USA" : 65 } 
ser2 = {"India" : 40, "Pak" : 24, "Nepal" : 20}

sales_q1 = pd.Series(ser1)
sales_q2 = pd.Series(ser2)

# Look what happens with a normal list
print([1, 2] * 3)
# RES -> [1, 2, 1, 2, 1, 2]
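For contrast, a short sketch of element-wise multiplication: a plain list needs a comprehension, while a NumPy array (the structure a Series is built on) does it directly:

```python
import numpy as np

myList = [1, 2]

print(myList * 3)               # [1, 2, 1, 2, 1, 2] -> the list is repeated
print([x * 3 for x in myList])  # [3, 6] -> element-wise needs a comprehension
print(np.array(myList) * 3)     # [3 6] -> NumPy multiplies each element
```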

# Broadcasting -> with a Series the same operator is applied element-wise instead of repeating the data
print(sales_q1 * 2)
""" RES -> 
India     88
Japan     80
USA      130
dtype: int64
"""
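The same broadcasting applies to the other arithmetic operators and to comparisons. A standalone sketch; the threshold 42 is arbitrary:

```python
import pandas as pd

sales_q1 = pd.Series({"India": 44, "Japan": 40, "USA": 65})

print((sales_q1 + 10).tolist())   # [54, 50, 75]
print((sales_q1 / 2).tolist())    # [22.0, 20.0, 32.5] -> true division gives float64
# A comparison also broadcasts and returns a boolean Series, usable as a filter
mask = sales_q1 > 42
print(sales_q1[mask].index.tolist())   # ['India', 'USA']
```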
print(sales_q1 + sales_q2)  # Values are aligned by label; labels found in only one series give NaN
""" RES -> 
India    84.0
Japan     NaN
Nepal     NaN
Pak       NaN
USA       NaN
dtype: float64
"""
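The result of `+` is indexed by the union of both indexes, and only shared labels get a number. A standalone sketch that makes this visible and keeps only the computable rows:

```python
import pandas as pd

sales_q1 = pd.Series({"India": 44, "Japan": 40, "USA": 65})
sales_q2 = pd.Series({"India": 40, "Pak": 24, "Nepal": 20})

# Result index = union of both indexes (sorted here)
print(sales_q1.index.union(sales_q2.index).tolist())
# ['India', 'Japan', 'Nepal', 'Pak', 'USA']

# Labels present in both series
print(sales_q1.index.intersection(sales_q2.index).tolist())   # ['India']

# Keep only the labels where the sum could actually be computed
total = sales_q1 + sales_q2
print(total.dropna())   # India 84.0
```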
# For a meaningful result use the methods add, sub, mul, div with fill_value: a label missing from one series is treated as 0.0 before the operation
print( sales_q1.add(sales_q2, fill_value = 0.0) )
""" RES -> 
India    84.0
Japan    40.0
Nepal    20.0
Pak      24.0
USA      65.0
dtype: float64
"""
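`fill_value` fills a missing label *before* the arithmetic, which differs from filling the NaN *after* a plain operator. A standalone sketch contrasting the two:

```python
import pandas as pd

sales_q1 = pd.Series({"India": 44, "Japan": 40, "USA": 65})
sales_q2 = pd.Series({"India": 40, "Pak": 24, "Nepal": 20})

# fill_value treats a missing label as 0 BEFORE subtracting
diff = sales_q1.sub(sales_q2, fill_value=0)
print(diff.tolist())    # [4.0, 40.0, -20.0, -24.0, 65.0]

# Filling AFTER the plain + operator: every NaN sum simply becomes 0.0
after = (sales_q1 + sales_q2).fillna(0)
print(after.tolist())   # [84.0, 0.0, 0.0, 0.0, 0.0]
```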

# Traversing the series (more on this in another post): iterating a Series directly
# yields its values, not its labels, so use items() to get (label, value) pairs
for key, value in sales_q1.items():
    print(key, value)
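Iterating a Series is easy to get wrong because plain iteration yields values, not labels. A short standalone sketch contrasting iteration over the values, over the labels, and a loop-free alternative:

```python
import pandas as pd

sales_q1 = pd.Series({"India": 44, "Japan": 40, "USA": 65})

# Iterating the Series yields its VALUES
print([v for v in sales_q1])           # [44, 40, 65]

# Iterate the index to get the labels
print([k for k in sales_q1.index])     # ['India', 'Japan', 'USA']

# Often no loop is needed: vectorised methods do the work
print(sales_q1.sum())                  # 149
print(sales_q1.idxmax())               # USA
```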