Data Analysis with Pandas & NumPy

Learn how data analysts and machine learning engineers use NumPy and Pandas to clean, analyze, and understand data.

1️⃣ What is Data Analysis?

Data Analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decision-making.

Real-life example:

An e-commerce company analyzes sales data to find best-selling products and customer behavior.

2️⃣ NumPy in Data Analysis

NumPy is used for fast numerical computation. It works with multi-dimensional arrays and matrices.

import numpy as np

data = np.array([10, 20, 30, 40, 50])

print("Mean:", np.mean(data))
print("Sum:", np.sum(data))
print("Max:", np.max(data))
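
The example above uses a one-dimensional array. As a minimal sketch of the multi-dimensional case, the snippet below builds a 2-D array and aggregates along each axis. The sales figures and the store/day layout are made up for illustration.

```python
import numpy as np

# Hypothetical data: each row is a store, each column is a day of sales
sales = np.array([[10, 20, 30],
                  [40, 50, 60]])

print("Shape:", sales.shape)                    # (rows, columns) = (2, 3)
print("Total per store:", sales.sum(axis=1))    # sums across columns: 60 and 150
print("Mean per day:", sales.mean(axis=0))      # averages down rows: 25, 35, 45
```

Here `axis=0` aggregates down the rows and `axis=1` aggregates across the columns.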
      

3️⃣ Pandas in Data Analysis

Pandas is built on top of NumPy and is used for working with structured data like tables.

import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 55000]
}

df = pd.DataFrame(data)
print(df)
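
To illustrate what "built on top of NumPy" can mean in practice, here is a small sketch that pulls a numeric column out of the same table as a NumPy array. It assumes the column holds plain integers, as in the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 55000],
})

# Extract the numeric column as a NumPy array
salaries = df["Salary"].to_numpy()

print(type(salaries))      # a numpy.ndarray
print(salaries.mean())     # 55000.0
```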
      

4️⃣ Pandas Series & DataFrame

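Pandas provides two main data structures: Series, a one-dimensional labeled array, and DataFrame, a two-dimensional labeled table whose columns are Series.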

series = pd.Series([100, 200, 300])
print(series)

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 20, 30]
})
print(df)
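
The following sketch shows how the two structures relate: a Series can carry its own labels, and selecting one column of a DataFrame returns a Series. The index labels used here are chosen only for illustration.

```python
import pandas as pd

# A Series with explicit labels instead of the default 0, 1, 2 index
prices = pd.Series([10, 20, 30], index=["A", "B", "C"], name="Price")
print(prices["B"])              # look up a value by label: 20

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 20, 30],
})

# Selecting a single column gives back a Series
column = df["Price"]
print(type(column).__name__)    # Series
print(df.shape)                 # (3, 2): three rows, two columns
```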
      

5️⃣ Data Cleaning

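Real-world data is often messy, with missing or inconsistent values. Pandas provides methods to detect, drop, or fill missing values.

print(df.isnull())          # True wherever a value is missing
df_dropped = df.dropna()    # new DataFrame without rows that contain missing values
df_filled = df.fillna(0)    # new DataFrame with missing values replaced by 0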
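
The DataFrame used so far has no missing values, so these methods have nothing to act on. As a sketch of their effect, the example below uses a small made-up table with two missing prices. Filling with the column mean is only one possible strategy, chosen here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical table with two missing prices
raw = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Price": [10.0, np.nan, 30.0, np.nan],
})

# Count missing values in each column
missing_per_column = raw.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that contains a missing value
dropped = raw.dropna()
print(dropped)

# Option 2: fill missing prices with the mean of the known prices
filled = raw.fillna({"Price": raw["Price"].mean()})
print(filled)
```

Both `dropna()` and `fillna()` return new DataFrames, so `raw` itself is left unchanged.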
      

6️⃣ Data Analysis Operations

print(df.describe())
print(df["Price"].mean())
print(df.groupby("Product").sum())
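
In the table above each product appears only once, so grouping does not change anything. The sketch below uses a made-up order table with repeated products to show how `groupby` can find a best-selling product, in the spirit of the e-commerce example. The column names and values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical order data: the same product appears in several orders
orders = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B", "A"],
    "Price": [10, 20, 10, 30, 20, 10],
    "Quantity": [1, 2, 3, 1, 1, 4],
})

# Revenue of each order line
orders["Revenue"] = orders["Price"] * orders["Quantity"]

# Total revenue per product
revenue_by_product = orders.groupby("Product")["Revenue"].sum()
print(revenue_by_product)

# Label of the product with the highest total revenue
best_seller = revenue_by_product.idxmax()
print("Best seller:", best_seller)
```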
      

7️⃣ Using NumPy with Pandas

# Multiply each price by its own factor (element-wise, one factor per row)
df["Price"] = df["Price"] * np.array([1.1, 1.2, 1.3])
print(df)
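
NumPy functions can also be applied directly to Pandas columns. The sketch below shows two common patterns: `np.where` for a conditional column and `np.log` for an element-wise transformation. The "Tier" and "LogPrice" column names and the threshold of 20 are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 20, 30],
})

# Label each row depending on a condition on the Price column
df["Tier"] = np.where(df["Price"] >= 20, "premium", "standard")

# Apply a NumPy function element-wise to the whole column
df["LogPrice"] = np.log(df["Price"])

print(df)
```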
      

8️⃣ Real-World Use Cases

9️⃣ Typical Data Analysis Workflow

  1. Load data (CSV, Excel, API)
  2. Inspect & clean data
  3. Analyze patterns
  4. Visualize insights
  5. Prepare data for ML
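
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the CSV content is made up and read from an in-memory string rather than a real file, missing quantities are treated as 0, and the visualization step is left as a comment because it would need a plotting library such as matplotlib.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would come from a file or API
csv_text = """product,price,quantity
A,10,2
B,20,
C,30,1
A,10,3
"""

# 1. Load data
df = pd.read_csv(io.StringIO(csv_text))

# 2. Inspect and clean data
missing = df.isnull().sum()
print(missing)
df["quantity"] = df["quantity"].fillna(0)   # assumption: missing means 0 sold

# 3. Analyze patterns
df["revenue"] = df["price"] * df["quantity"]
summary = df.groupby("product")["revenue"].sum()
print(summary)

# 4. Visualize insights (omitted here; e.g. a bar chart of `summary`)

# 5. Prepare data for ML: numeric features as a NumPy array
features = df[["price", "quantity"]].to_numpy()
print(features.shape)
```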
