Learn how data analysts and machine learning engineers use NumPy and Pandas to clean, analyze, and understand data.
Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decision-making.
Real-life example:
An e-commerce company analyzes sales data to find best-selling products and customer behavior.
NumPy is used for fast numerical computation. It works with multi-dimensional arrays and matrices.
import numpy as np
# Build a one-dimensional array from a Python list
data = np.array([10, 20, 30, 40, 50])
print("Mean:", np.mean(data))  # Mean: 30.0
print("Sum:", np.sum(data))    # Sum: 150
print("Max:", np.max(data))    # Max: 50
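The example above uses a one-dimensional array. As a minimal sketch of the multi-dimensional case, the block below builds a small two-dimensional array and aggregates along each axis. The sales figures and the name `sales` are made up for illustration.

```python
import numpy as np

# Hypothetical data: each row is a store, each column is a month
sales = np.array([
    [10, 20, 30],
    [40, 50, 60],
])

print("Shape:", sales.shape)                   # (rows, columns)
print("Total per store:", sales.sum(axis=1))   # axis=1 sums across each row
print("Mean per month:", sales.mean(axis=0))   # axis=0 averages down each column
```

Here `axis=0` collapses the rows and `axis=1` collapses the columns, so the result always has one dimension fewer than the input.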
Pandas is built on top of NumPy and is used for working with structured data like tables.
import pandas as pd
# Each dictionary key becomes a column; each list holds that column's values
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 55000]
}
df = pd.DataFrame(data)
print(df)
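One practical consequence of Pandas being built on NumPy is that a numeric column can be handed to NumPy functions directly. The following sketch reuses the salary table from above. The variable name `salaries` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 55000],
})

# Extract the column's values as a NumPy array
salaries = df["Salary"].to_numpy()
print(type(salaries))

# NumPy functions work on the extracted array
print("Mean salary:", np.mean(salaries))
```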
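Pandas provides two main data structures: the Series, a one-dimensional labeled array, and the DataFrame, a two-dimensional table made up of labeled columns.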
# Series: a one-dimensional labeled array
series = pd.Series([100, 200, 300])
print(series)
# DataFrame: a two-dimensional table of labeled columns
df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 20, 30]
})
print(df)
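The two structures are closely related: selecting a single column from a DataFrame gives back a Series. The sketch below illustrates this with the same product table. The names `prices` and `subset` are introduced here only for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 20, 30],
})

prices = df["Price"]     # a single column label returns a Series
subset = df[["Price"]]   # a list of labels returns a DataFrame

print(type(prices).__name__)
print(type(subset).__name__)
print(prices.index.tolist())  # the Series keeps the DataFrame's row labels
```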
Real-world data is often messy, with missing or inconsistent values. Pandas provides methods to detect and handle them.
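print(df.isnull())     # Boolean table: True wherever a value is missing
cleaned = df.dropna()  # Returns a new DataFrame without rows that contain missing values
filled = df.fillna(0)  # Returns a new DataFrame with missing values replaced by 0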
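The product table used so far has no gaps, so these methods have nothing to act on. As a rough sketch of how they behave on incomplete data, the block below uses a made-up table named `raw` with two missing prices. Filling with the column mean is just one possible strategy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing prices
raw = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Price": [10.0, np.nan, 30.0, np.nan],
})

# Count missing values in each column
missing_per_column = raw.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that has a missing value
dropped = raw.dropna()
print(dropped)

# Option 2: fill missing prices with the mean of the known prices
filled = raw.fillna({"Price": raw["Price"].mean()})
print(filled)
```

Both `dropna` and `fillna` return new DataFrames by default, so `raw` still contains its missing values afterwards.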
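print(df.describe())                # Summary statistics for the numeric columns
print(df["Price"].mean())           # Average price
print(df.groupby("Product").sum())  # Total of the numeric columns for each product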
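Grouping becomes more informative when a key appears in several rows. The sketch below returns to the e-commerce scenario and looks for the best-selling product. The `orders` table, its `Quantity` column, and the derived `Revenue` column are invented for illustration.

```python
import pandas as pd

# Hypothetical order lines: the same product can appear more than once
orders = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B", "A"],
    "Price": [10, 20, 10, 30, 20, 10],
    "Quantity": [1, 2, 3, 1, 1, 4],
})

# Revenue for each order line
orders["Revenue"] = orders["Price"] * orders["Quantity"]

# Aggregate revenue per product in several ways at once
summary = orders.groupby("Product")["Revenue"].agg(["sum", "mean", "count"])
print(summary.sort_values("sum", ascending=False))

# Product with the highest total revenue
best_seller = summary["sum"].idxmax()
print("Best seller:", best_seller)
```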
# Scale each price by its own factor (element-wise); the array needs one value per row
df["Price"] = df["Price"] * np.array([1.1, 1.2, 1.3])
print(df)
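Overwriting the Price column discards the original values. A variation that may be easier to work with writes the result to a new column instead. The sketch below does that and also shows multiplication by a single number, which applies to every row. The column names `AdjustedPrice` and `WithTax` and the 20% rate are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Price": [10, 20, 30],
})

# One factor per row: the array is multiplied element by element
factors = np.array([1.1, 1.2, 1.3])
df["AdjustedPrice"] = (df["Price"] * factors).round(2)

# A single number is applied to every row (hypothetical flat 20% rate)
df["WithTax"] = (df["Price"] * 1.2).round(2)

print(df)
```

Rounding to two decimals keeps small floating-point artifacts out of the printed table.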