Pandas

Introduction to Pandas :

Pandas is a powerful open-source data analysis and manipulation tool built on top of the Python programming language. It provides data structures and functions needed to work with structured data seamlessly and efficiently.

Why Use Pandas?

Easy handling of missing data.
Size mutability: columns can be inserted and deleted from DataFrames.
Automatic and explicit data alignment.
Powerful, flexible group by functionality for performing split-apply-combine operations on datasets.
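
The features listed above can be sketched in a few lines. This is a minimal illustration, assuming Pandas and NumPy are already installed; the column names (team, score, bonus) and all values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; np.nan marks a missing score.
df = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b'],
    'score': [1.0, np.nan, 3.0, 5.0],
})

# Missing data: count the gaps, then fill them with 0.
n_missing = int(df['score'].isna().sum())
filled = df['score'].fillna(0.0)

# Size mutability: insert a column, then delete it again.
df['bonus'] = 10
df = df.drop(columns=['bonus'])

# Alignment: values are matched by index label, not by position.
s1 = pd.Series([1, 2], index=['x', 'y'])
s2 = pd.Series([10, 20], index=['y', 'z'])
aligned = s1 + s2  # only 'y' exists in both, so 'x' and 'z' become NaN

# Split-apply-combine: mean score per team (NaN is skipped by default).
team_means = df.groupby('team')['score'].mean()

print(n_missing)
print(aligned)
print(team_means)
```

Each step returns a new object or updates the DataFrame in an obvious way, so the pieces can be tried one at a time in an interactive session.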

Getting Started :

To start using Pandas, you'll need to install it first:

pip install pandas

Pandas DataFrame Table

[Figure: a Pandas DataFrame table, showing how data is organized in rows and columns.]

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R.

Basic Example :

Here's a quick example to show how Pandas works:

import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]}
df = pd.DataFrame(data)

print(df)

Reading and Writing Data

Let's explore some fundamental operations you can perform with Pandas.

How to read and write tabular data? :

[Figure: the methods Pandas offers to read from and write to different file formats.]

Reading and writing files is the starting point for most data manipulation and analysis. Whether you’re dealing with CSV, Excel, JSON, or SQL databases, Pandas provides functions to import data into your workspace and to export your analysis results. 📊

Pandas provides the read_csv() function to read data stored in a CSV file into a DataFrame. It supports many file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, …), each with a reader function prefixed read_*:

import pandas as pd

# Reading a CSV file
df = pd.read_csv('filename.csv')

print(df.head())  # Display the first 5 rows
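
Writing works the same way in the other direction: DataFrames have to_* methods such as to_csv(). The sketch below writes a small DataFrame to a temporary directory and reads it back, so it runs without any existing file. The file name people.csv and the sample values are assumptions made for the example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24, 27]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.csv')  # hypothetical file name
    # index=False keeps the row labels out of the file
    df.to_csv(path, index=False)
    # Read the same file back into a new DataFrame
    df_back = pd.read_csv(path)

print(df_back)
```

Without index=False, the row labels would be written as an extra unnamed column and would come back as data on the next read.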

Data Inspection :

It’s essential to understand your data. Pandas provides several methods to inspect the data:

# Display the first few rows
print(df.head())

# Display summary statistics for the numeric columns
print(df.describe())

# Display information about the DataFrame
# (info() prints its report itself and returns None, so no print() is needed)
df.info()
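
A few attributes are also handy for a quick look at a DataFrame's size and types. This is a small sketch on a made-up DataFrame, not an exhaustive list of inspection tools.

```python
import pandas as pd

# Hypothetical sample data, matching the earlier basic example
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # data type of each column
print(df.tail(2))        # the last 2 rows
```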

Filtering Data :

You can filter data based on certain conditions:

# Filter rows where age is greater than 25
filtered_df = df[df['Age'] > 25]

print(filtered_df)

Selecting Data Subsets

[Figure: selecting specific columns from a Pandas DataFrame.] 📊

Selecting Specific Columns:

You can select any column by its name. Selecting a single column returns a Series. For example, to select the 'Age' column:

import pandas as pd

# Example DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Select the 'Age' column
age_column = df['Age']
print(age_column)
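
To select several columns at once, pass a list of column names inside the brackets. The sketch below reuses the same example data; the variable name name_and_city is only illustrative.

```python
import pandas as pd

# Example DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# A list of names selects several columns and returns a DataFrame
name_and_city = df[['Name', 'City']]

print(type(df['Age']).__name__)      # Series
print(type(name_and_city).__name__)  # DataFrame
print(name_and_city)
```

Note the double brackets: the outer pair is the selection operator and the inner pair is an ordinary Python list.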

Selecting Specific Rows and Columns:

We can select specific rows and columns from a DataFrame. For example:

import pandas as pd

# Example DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Select specific rows and columns (e.g., rows 0 and 2, and columns 'Name' and 'Age')
selected_data = df.loc[[0, 2], ['Name', 'Age']]
print(selected_data)
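
loc selects by label, while iloc selects by integer position. With the default index (0, 1, 2, ...) the two look alike, so the sketch below assumes hypothetical string row labels ('a', 'b', 'c') to make the difference visible.

```python
import pandas as pd

# Example DataFrame with hypothetical string row labels
df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    },
    index=['a', 'b', 'c'],
)

# Label-based: rows 'a' and 'c', columns 'Name' and 'Age'
by_label = df.loc[['a', 'c'], ['Name', 'Age']]

# Position-based: first and third row, first and second column
by_position = df.iloc[[0, 2], [0, 1]]

# Slices differ: loc includes the end label, iloc excludes the end position
label_slice = df.loc['a':'b']   # rows 'a' and 'b'
position_slice = df.iloc[0:1]   # row at position 0 only

print(by_label)
print(by_position)
```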

Filtering Specific Rows from a DataFrame :

You can filter rows in a DataFrame based on specific conditions. For example, to select rows where the age is greater than 30:

import pandas as pd

# Example DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Filter rows where Age is greater than 30
filtered_data = df[df['Age'] > 30]
print(filtered_data)
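
Conditions can also be combined. The sketch below uses & (and), | (or) and isin() on the same example data; each condition needs its own parentheses because & and | bind more tightly than the comparison operators. The variable names are only illustrative.

```python
import pandas as pd

# Example DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Both conditions must hold: at least 30 years old and not in Chicago
both = df[(df['Age'] >= 30) & (df['City'] != 'Chicago')]

# Either condition may hold: younger than 30 or living in Chicago
either = df[(df['Age'] < 30) | (df['City'] == 'Chicago')]

# Membership test against a list of allowed values
in_cities = df[df['City'].isin(['New York', 'Chicago'])]

print(both)
print(either)
print(in_cities)
```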

Creating Plots in Pandas

Pandas provides a convenient way to create plots through the plot() method of DataFrames and Series. It uses Matplotlib under the hood by default, so Matplotlib must be installed. You can visualize your data with just a few lines of code. For example, to create a simple line plot:

import pandas as pd
import matplotlib.pyplot as plt

# Example DataFrame
data = {
    'Year': [2020, 2021, 2022, 2023],
    'Sales': [150, 200, 250, 300]
}
df = pd.DataFrame(data)

# Create a line plot
df.plot(x='Year', y='Sales', kind='line', marker='o')
plt.title('Sales Over Years')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid()
plt.show()