Data Analysis with Pandas & NumPy

Learn how data analysts and machine learning engineers use NumPy and Pandas to clean, analyze, and understand data.

1️⃣ What is Data Analysis?

Data Analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decision-making.

Real-life example:

An e-commerce company analyzes sales data to find best-selling products and customer behavior.

NumPy is used for fast numerical computation. It works with multi-dimensional arrays and matrices.

import numpy as np

# Create a one-dimensional array from a Python list
data = np.array([10, 20, 30, 40, 50])

print("Mean:", np.mean(data))  # 30.0
print("Sum:", np.sum(data))    # 150
print("Max:", np.max(data))    # 50
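The array above is one-dimensional. As a minimal sketch of the multi-dimensional case, the example below uses a small 2-D array and the axis argument to aggregate along rows or columns. The sales figures and the names sales, per_store, and per_day are made up for illustration.

```python
import numpy as np

# Hypothetical data: each row is a store, each column is a day of sales.
sales = np.array([[10, 20, 30],
                  [40, 50, 60]])

print("Shape:", sales.shape)      # (number of rows, number of columns)

# axis=1 aggregates across the columns, giving one total per store (row).
per_store = sales.sum(axis=1)

# axis=0 aggregates down the rows, giving one average per day (column).
per_day = sales.mean(axis=0)

print("Total per store:", per_store)
print("Average per day:", per_day)
```

Choosing the axis is usually the only change needed to switch between per-row and per-column summaries.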

Pandas is built on top of NumPy and is used for working with structured data like tables.

import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 55000]
}

df = pd.DataFrame(data)
print(df)
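Because Pandas is built on top of NumPy, a DataFrame column can be handed to NumPy functions. The sketch below rebuilds the same small table and pulls the Salary column out as a NumPy array; the variable name salaries is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 55000],
})

# to_numpy() returns the column's values as a NumPy array.
salaries = df["Salary"].to_numpy()

print(type(salaries))
print("Mean salary:", np.mean(salaries))
```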

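Pandas provides two main data structures: Series (a one-dimensional labeled array) and DataFrame (a two-dimensional table of rows and columns).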

series = pd.Series([100, 200, 300])
print(series)

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 20, 30]
})
print(df)
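The two structures are closely related: selecting one column of a DataFrame gives a Series. The sketch below shows this with the same product table, plus a Series that uses custom labels instead of the default 0, 1, 2 index. The names prices, subset, and labeled are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 20, 30],
})

# A single column name returns a Series.
prices = df["Price"]
print(type(prices).__name__)

# A list of column names returns a DataFrame.
subset = df[["Product", "Price"]]
print(type(subset).__name__)

# A Series can carry its own labels, which are then used for lookup.
labeled = pd.Series([10, 20, 30], index=["A", "B", "C"])
print(labeled["B"])
```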

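Real-world data is messy and often has missing values. Pandas provides methods to find and handle them.

# Each call returns a new object; df itself is not modified.
print(df.isnull())       # True wherever a value is missing
cleaned = df.dropna()    # drop rows that contain any missing value
filled = df.fillna(0)    # or replace missing values with 0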
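The df built earlier has no missing values, so these calls have nothing to act on. As a small sketch with made-up data, the example below puts one missing price into a table and shows what each method does. The table raw and the choice to fill with the column mean are assumptions for illustration, not a recommended strategy for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing price (NaN).
raw = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10.0, np.nan, 30.0],
})

# Count the missing values in each column.
missing_per_column = raw.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that contains a missing value.
dropped = raw.dropna()
print(dropped)

# Option 2: fill the missing price with the mean of the known prices.
# mean() skips NaN by default.
filled = raw.fillna({"Price": raw["Price"].mean()})
print(filled)
```

Both dropna() and fillna() return new DataFrames here, so raw still contains the missing value afterwards.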

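print(df.describe())                # summary statistics for the numeric columns
print(df["Price"].mean())           # average price
print(df.groupby("Product").sum())  # total of each numeric column per product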
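Grouping is more informative when a group has several rows. Returning to the e-commerce example from the start, the sketch below finds the best-selling product in a small order log. The orders table and its Quantity column are invented for illustration.

```python
import pandas as pd

# Hypothetical order log: one row per order line.
orders = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B", "A"],
    "Quantity": [2, 1, 3, 5, 4, 1],
})

# Total units sold per product.
units_sold = orders.groupby("Product")["Quantity"].sum()

# Sort so the product with the most units comes first.
ranked = units_sold.sort_values(ascending=False)

print(ranked)
print("Best seller:", ranked.index[0])
```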

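# Multiply each price by its own factor, element by element.
# The array must have one value per row of df.
df["Price"] = df["Price"] * np.array([1.1, 1.2, 1.3])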
print(df)