22 Pandas Data Cleaning Examples with Output

Real data is messy.

Missing values.
Wrong data types.
Duplicates.
Inconsistent text.

That’s why data cleaning is one of the most important skills for data analysts, and Pandas is the tool most people use.

Below are 22 practical Pandas data cleaning examples, exactly the kind you’ll use in real projects and jobs.

Why Data Cleaning Matters in Pandas

Before analysis, visualization, or AI:

Bad data = bad insights
Clean data = reliable decisions

In many real-world projects, more time goes into cleaning data than into modeling it.

22 Pandas Data Cleaning Examples (With Output)

Assume we’re working with a DataFrame called df and that pandas is imported as pd.
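To follow along without a real dataset, a small frame can be built by hand. The sketch below uses made-up names, cities, and sales figures (all hypothetical) that show the usual problems: missing values, stray whitespace, mixed case, and a repeated row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; none of these values come from a real dataset.
df = pd.DataFrame({
    "name": [" Ana Silva", "Ben Okoro ", "Ben Okoro ", "Chen Wei"],
    "city": ["Lisbon", "LAGOS", "LAGOS", None],
    "sales": [120.0, np.nan, np.nan, 80.0],
})

print(df.shape)  # (4, 3)
```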

1. View the first rows

df.head()

Output: Displays the first 5 rows.

2. Check dataset structure

df.info()

Output: Column names, non-null counts, data types, and memory usage.

3. Count missing values

df.isnull().sum()

Output: Number of missing values per column.
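As a rough illustration, the same check can report the share of missing rows as well as the count. The column names and values here are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing sales and one missing city.
df = pd.DataFrame({
    "sales": [120.0, np.nan, 95.5, np.nan],
    "city": ["Lisbon", "Lagos", None, "Pune"],
})

missing_counts = df.isnull().sum()   # number of missing values per column
missing_share = df.isnull().mean()   # fraction of rows missing per column

print(missing_counts["sales"])  # 2
print(missing_share["city"])    # 0.25
```

The share is often easier to act on than the raw count, since a column that is mostly empty calls for a different decision than one with a few gaps.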

4. Drop rows with missing values

df = df.dropna()

Output: Rows containing any null removed. dropna() returns a new DataFrame, so assign the result to keep the change.
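Dropping every row that has any null can remove more data than intended. A minimal sketch, using an invented three-column frame, of restricting the check to the columns that actually matter:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "notes" is mostly empty but not required for analysis.
df = pd.DataFrame({
    "customer": ["A", "B", None, "D"],
    "sales": [100.0, np.nan, 50.0, 75.0],
    "notes": [None, None, None, "priority"],
})

all_complete = df.dropna()                     # keeps only rows with no nulls at all
has_customer = df.dropna(subset=["customer"])  # only requires a customer value

print(len(all_complete))  # 1
print(len(has_customer))  # 3
```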

5. Fill missing values

df = df.fillna(0)

Output: Nulls replaced with 0. The result is a new DataFrame, so assign it back.

6. Fill missing values with column mean

df['sales'] = df['sales'].fillna(df['sales'].mean())

Output: Missing sales filled with the column average.
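A small worked example, with made-up sales figures, shows what the fill does. Note that mean() skips missing values when computing the average.

```python
import numpy as np
import pandas as pd

# Hypothetical sales column with two gaps.
df = pd.DataFrame({"sales": [100.0, np.nan, 200.0, np.nan]})

mean_sales = df["sales"].mean()               # NaN values are ignored: (100 + 200) / 2
df["sales"] = df["sales"].fillna(mean_sales)  # assign back to keep the result

print(df["sales"].tolist())  # [100.0, 150.0, 200.0, 150.0]
```

If the column has extreme values, filling with median() instead may distort the data less; which one fits depends on the dataset.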

7. Remove duplicate rows

df = df.drop_duplicates()

Output: Duplicate rows removed; the first occurrence of each is kept.
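Rows are sometimes duplicates on a key column even when other columns differ. The sketch below assumes a hypothetical customer_id key and uses the subset and keep parameters of drop_duplicates to keep the latest row per key.

```python
import pandas as pd

# Hypothetical data: customer 3 appears twice with different emails.
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "email": ["a@x.com", "a@x.com", "b@x.com", "c@x.com", "c@y.com"],
})

exact = df.drop_duplicates()                                     # drops fully identical rows
by_id = df.drop_duplicates(subset=["customer_id"], keep="last")  # one row per id, last wins

print(len(exact))               # 4
print(by_id["email"].tolist())  # ['a@x.com', 'b@x.com', 'c@y.com']
```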

8. Convert data types

df['date'] = pd.to_datetime(df['date'])

Output: Column converted to datetime.
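Real columns often contain values that cannot be converted. One common approach, sketched here with invented values, is errors="coerce", which turns unparseable entries into missing values instead of raising an error. pd.to_numeric accepts the same option for numbers stored as text.

```python
import pandas as pd

# Hypothetical raw data: one impossible date, one non-date, one non-number.
df = pd.DataFrame({
    "date": ["2024-01-15", "2024-02-30", "not a date"],
    "amount": ["10", "20.5", "n/a"],
})

df["date"] = pd.to_datetime(df["date"], errors="coerce")     # bad dates become NaT
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # bad numbers become NaN

print(df["date"].isna().sum())    # 2
print(df["amount"].isna().sum())  # 1
```

Counting the new missing values right after conversion is a quick way to see how many entries failed to parse.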

9. Rename columns

df = df.rename(columns={'Total Sales': 'total_sales'})

Output: Cleaner column names.

10. Strip whitespace

df['name'] = df['name'].str.strip()

Output: Leading and trailing spaces removed.

11. Convert text to lowercase

df['city'] = df['city'].str.lower()

Output: Consistent text values.

12. Replace values

df['gender'] = df['gender'].replace('M', 'Male')

Output: Standardized categories.
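Stripping, lowercasing, and replacing are often chained so that many spellings collapse into one label. The sketch below assumes a hypothetical set of gender spellings and a hand-written mapping; a real column may need more entries.

```python
import pandas as pd

# Hypothetical messy category values, including one missing entry.
df = pd.DataFrame({"gender": [" M", "f ", "Male", "FEMALE", None]})

# Normalize whitespace and case first so the mapping needs fewer keys.
normalized = df["gender"].str.strip().str.lower()

# Assumed mapping from normalized spellings to the final labels.
mapping = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}
df["gender"] = normalized.replace(mapping)

print(df["gender"].dropna().tolist())  # ['Male', 'Female', 'Male', 'Female']
```

Values not listed in the mapping are left unchanged by replace(), and the missing entry stays missing.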

13. Filter invalid values

df = df[df['age'] > 0]

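Output: Rows with zero or negative ages removed. Rows with a missing age are dropped too, because a comparison against NaN is False.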

14. Remove outliers

df = df[df['salary'] < 200000]

Output: Rows with a salary of 200,000 or more excluded. The cutoff is only an example; choose one that fits your data.
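When no fixed cutoff is known, one widely used alternative is the interquartile range (IQR) rule, which keeps values within 1.5 IQR of the quartiles. This is a sketch of that rule on invented salaries, not the method the example above uses, and the 1.5 multiplier is a convention rather than a requirement.

```python
import pandas as pd

# Hypothetical salaries with one extreme value.
df = pd.DataFrame({"salary": [40000, 45000, 50000, 55000, 60000, 1000000]})

q1 = df["salary"].quantile(0.25)
q3 = df["salary"].quantile(0.75)
iqr = q3 - q1

lower = q1 - 1.5 * iqr  # values below this are treated as outliers
upper = q3 + 1.5 * iqr  # values above this are treated as outliers

trimmed = df[df["salary"].between(lower, upper)]

print(len(trimmed))             # 5
print(trimmed["salary"].max())  # 60000
```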

15. Split a column

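df[['first_name', 'last_name']] = df['name'].str.split(' ', n=1, expand=True)

Output: Name split into two columns at the first space. Without n=1, a name with more than two parts produces extra columns and the assignment raises an error.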
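Names do not always have exactly two parts. A minimal sketch with made-up names shows how splitting at the first space only (n=1) behaves for a three-part name and a single-word name.

```python
import pandas as pd

# Hypothetical names: two parts, three parts, and one part.
df = pd.DataFrame({"name": ["Ana Silva", "Mary Ann Lee", "Cher"]})

# n=1 splits at the first space only, so there are at most two columns.
parts = df["name"].str.split(" ", n=1, expand=True)
df["first_name"] = parts[0]
df["last_name"] = parts[1]   # missing when the name has no space

print(df["first_name"].tolist())  # ['Ana', 'Mary', 'Cher']
print(df["last_name"].iloc[1])    # Ann Lee
```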

16. Create a new calculated column

df['revenue'] = df['price'] * df['quantity']

Output: New column added.

17. Change column order

df = df[['date', 'customer', 'revenue']]

Output: Columns in the listed order. Any column not in the list is dropped.

18. Remove unwanted columns

df = df.drop(columns=['unnecessary_column'])

Output: Column removed.

19. Handle inconsistent categories

df['status'] = df['status'].str.title()

Output: Consistent formatting.

20. Reset index

df = df.reset_index(drop=True)

Output: Clean index starting from 0.

21. Detect duplicates

df.duplicated()

Output: Boolean Series that is True for each row repeating an earlier one.
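By default the first occurrence is not flagged. To inspect every copy before deciding what to remove, keep=False marks all of them. The order data below is invented for illustration.

```python
import pandas as pd

# Hypothetical orders where order 102 was recorded twice.
df = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "amount": [20, 35, 35, 50],
})

mask = df.duplicated()                      # True only for the second copy
all_copies = df[df.duplicated(keep=False)]  # both copies, for inspection

print(mask.tolist())    # [False, False, True, False]
print(len(all_copies))  # 2
```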

22. Save cleaned data

df.to_csv('clean_data.csv', index=False)

Output: Clean dataset saved.
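A quick way to confirm the export is to read it back and compare. This sketch writes to an in-memory buffer instead of a file so it leaves nothing on disk; the frame contents are hypothetical.

```python
import io

import pandas as pd

# Hypothetical cleaned data.
df = pd.DataFrame({"customer": ["Ana", "Ben"], "revenue": [120.5, 80.0]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False avoids writing an extra index column
buffer.seek(0)                  # rewind so the buffer can be read from the start

restored = pd.read_csv(buffer)

print(list(restored.columns))  # ['customer', 'revenue']
```

Without index=False, the index is written as an unnamed first column and shows up as "Unnamed: 0" when the file is read back.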

Common Pandas Cleaning Mistakes

Skipping info() and isnull()
Cleaning without understanding the data
Removing too much data
Not documenting cleaning steps

Cleaning should be intentional, not random.
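One way to keep cleaning intentional and documented is to put the steps in a single function, with a comment for each decision. The function name clean_orders, the columns, and the data below are all hypothetical; treat this as a sketch of the idea rather than a recommended pipeline.

```python
import numpy as np
import pandas as pd


def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning routine; each step records why it exists."""
    df = raw.copy()                                  # leave the raw data untouched
    df["city"] = df["city"].str.strip().str.lower()  # same city, one spelling
    df = df.drop_duplicates()                        # remove exact repeats
    df = df.dropna(subset=["sales"])                 # sales is required downstream
    return df.reset_index(drop=True)                 # tidy index after row removal


# Hypothetical raw input.
raw = pd.DataFrame({
    "city": [" Lagos", "lagos", "Pune ", "Pune "],
    "sales": [100.0, 100.0, np.nan, 50.0],
})

clean = clean_orders(raw)

print(len(raw), len(clean))  # 4 2
```

Comparing row counts before and after makes it obvious when a step removes more data than expected.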

Why Interviewers Care About This Skill

Interviewers often ask:

“How do you handle missing data?”
“How do you clean messy datasets?”

These examples answer those questions directly.

If you can clean data with Pandas, you can:

Trust your analysis
Build better dashboards
Work with real datasets

Data cleaning is not optional; it’s the foundation.

FAQs

1. Why is data cleaning important in Pandas?

Because messy data leads to incorrect analysis and decisions.

2. Are these Pandas examples beginner-friendly?

Yes. They focus on fundamentals used in real projects.

3. Do interviewers ask about Pandas data cleaning?

Very often. It’s a core data analyst skill.

4. Should I clean data before visualization?

Always. Visualization depends on clean data.

5. Is Pandas enough for real-world data cleaning?

Yes, for most analyst and junior roles.