Real data is messy.
Missing values.
Wrong data types.
Duplicates.
Inconsistent text.
That’s why data cleaning is one of the most important skills for data analysts, and in Python, Pandas is the go-to library for it.
Below are 22 practical Pandas data cleaning examples, exactly the kind you’ll use in real projects and jobs.
Why Data Cleaning Matters in Pandas
Before analysis, visualization, or AI:
- Bad data = bad insights
- Clean data = reliable decisions
Most real-world analysis time is spent cleaning, not modeling.
22 Pandas Data Cleaning Examples (With Output)
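Assume pandas is imported as pd and we’re working with a DataFrame called df. Most Pandas methods return a new DataFrame rather than changing df in place, so assign the result back (df = ...) to keep the change.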
1. View the first rows
df.head()
Output: Displays the first 5 rows.
2. Check dataset structure
df.info()
Output: Column names, data types, and non-null counts (a non-null count lower than the row count means missing values).
3. Count missing values
df.isnull().sum()
Output: Number of missing values per column.
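Raw counts are easier to judge next to each column’s share of missing values. A minimal sketch on a small made-up DataFrame (the columns sales and region are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Small illustrative dataset with a few gaps
df = pd.DataFrame({
    "sales": [100.0, np.nan, 250.0, np.nan],
    "region": ["North", "South", None, "East"],
})

# Missing values per column, and the same as a percentage of rows
missing_count = df.isnull().sum()
missing_pct = df.isnull().mean() * 100

summary = pd.DataFrame({"missing": missing_count, "pct": missing_pct})
print(summary)
```

A column that is 50% empty usually calls for a different decision (drop the column, or investigate the source) than one with a single gap.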
4. Drop rows with missing values
df = df.dropna()
Output: Rows with nulls removed.
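With no arguments, dropna() removes a row if any column in it is null, which can throw away far more data than intended. The subset and how parameters narrow that down. A small sketch with made-up columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "customer": ["A", "B", None, None],
    "sales": [100.0, np.nan, 50.0, np.nan],
    "notes": [np.nan, "vip", np.nan, np.nan],
})

# Default: drop any row containing a null (here, every row has one)
any_null_dropped = df.dropna()

# Only drop rows where the column you actually need is missing
needs_sales = df.dropna(subset=["sales"])

# Only drop rows that are completely empty
all_null_dropped = df.dropna(how="all")

print(len(df), len(any_null_dropped), len(needs_sales), len(all_null_dropped))
```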
5. Fill missing values
df = df.fillna(0)
Output: Nulls replaced with 0.
6. Fill missing values with column mean
df['sales'] = df['sales'].fillna(df['sales'].mean())
Output: Missing sales filled with average.
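Different columns usually need different fill rules, and fillna() also accepts a dictionary that maps column names to fill values. A minimal sketch; the columns and the 'Unknown' placeholder are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sales": [100.0, np.nan, 300.0],
    "region": ["North", None, "East"],
})

# Numeric column: fill with the mean; text column: fill with a placeholder label
df = df.fillna({"sales": df["sales"].mean(), "region": "Unknown"})
print(df)
```

For skewed columns such as salaries, the median is often a safer fill value than the mean, because a few extreme values pull the mean upward.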
7. Remove duplicate rows
df = df.drop_duplicates()
Output: Duplicate rows removed.
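By default, drop_duplicates() only treats rows as duplicates when every column matches. When a key column identifies a record (order_id here is a hypothetical example), subset and keep decide which copy survives. This sketch assumes rows are already in chronological order, so the last copy is the most recent:

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "status": ["pending", "shipped", "pending", "pending"],
})

# Full-row duplicates: none here, because the two order 1 rows differ in status
print(len(df.drop_duplicates()))  # 4

# One row per order_id, keeping the last entry for each order
latest = df.drop_duplicates(subset=["order_id"], keep="last")
print(latest)
```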
8. Convert data types
df['date'] = pd.to_datetime(df['date'])
Output: Column converted to datetime.
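Real columns often contain values that won’t parse. With errors='coerce', pd.to_datetime and pd.to_numeric turn unparseable entries into missing values instead of raising an error, so you can count and handle them like any other gap. A small sketch with made-up values; the date format string is an assumption about the data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2024-01-05", "2024-02-30", "not a date"],
    "price": ["19.99", "$5", "7"],
})

# Invalid dates (Feb 30, free text) become NaT instead of raising
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Strip a currency symbol first, then convert; anything still invalid becomes NaN
df["price"] = pd.to_numeric(df["price"].str.replace("$", "", regex=False), errors="coerce")

print(df.dtypes)
print(df.isnull().sum())
```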
9. Rename columns
df = df.rename(columns={'Total Sales': 'total_sales'})
Output: Cleaner column names.
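Renaming one column at a time gets tedious on wide datasets. A common pattern is to normalize every header at once with the .str methods on df.columns. A sketch, assuming snake_case is the naming convention you want:

```python
import pandas as pd

df = pd.DataFrame(columns=[" Total Sales", "Customer Name ", "Order-Date"])

# Trim spaces, lowercase, and replace runs of spaces/hyphens with one underscore
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(r"[\s\-]+", "_", regex=True)
)
print(list(df.columns))
```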
10. Strip whitespace
df['name'] = df['name'].str.strip()
Output: Extra spaces removed.
11. Convert text to lowercase
df['city'] = df['city'].str.lower()
Output: Consistent text values.
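In practice, stripping and lowercasing are usually chained, often together with a regex that collapses repeated inner spaces, so that variants like ' New  York' and 'new york ' end up identical. A minimal sketch with made-up city values:

```python
import pandas as pd

df = pd.DataFrame({"city": [" New  York", "new york ", "NEW YORK", "Boston"]})

# Trim the ends, lowercase, and collapse runs of inner whitespace to one space
df["city"] = (
    df["city"].str.strip()
    .str.lower()
    .str.replace(r"\s+", " ", regex=True)
)
print(df["city"].unique())
```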
12. Replace values
df['gender'] = df['gender'].replace('M', 'Male')
Output: Standardized categories.
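replace() also accepts a dictionary, which standardizes several spellings in one pass. Values that don’t appear in the mapping are left unchanged. A sketch with hypothetical category codes:

```python
import pandas as pd

df = pd.DataFrame({"gender": ["M", "F", "male", "Female", "m"]})

# Map every variant we expect to a single canonical label
mapping = {
    "M": "Male", "m": "Male", "male": "Male",
    "F": "Female", "f": "Female", "female": "Female",
}
df["gender"] = df["gender"].replace(mapping)
print(df["gender"].value_counts())
```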
13. Filter invalid values
df = df[df['age'] > 0]
Output: Invalid ages removed.
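One subtlety: a comparison like df['age'] > 0 evaluates to False for missing ages, so this filter also silently drops rows where age is null. If you want to keep those rows for separate handling, say so explicitly. A small sketch with made-up ages:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, -3, np.nan, 40, 0]})

# Drops negatives and zero, but also drops the NaN row
strict = df[df["age"] > 0]

# Keeps missing ages so they can be handled later
lenient = df[(df["age"] > 0) | df["age"].isnull()]

print(len(strict), len(lenient))
```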
14. Remove outliers
df = df[df['salary'] < 200000]
Output: Extreme values excluded.
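A hard-coded cutoff like 200,000 only works if you already know the data. A common data-driven alternative (a different technique from the fixed threshold above) is the interquartile range (IQR) rule, which treats values more than 1.5 times the IQR below the first quartile or above the third quartile as outliers. A sketch on made-up salaries:

```python
import pandas as pd

df = pd.DataFrame({"salary": [42000, 48000, 51000, 55000, 60000, 62000, 950000]})

# Quartiles and interquartile range
q1 = df["salary"].quantile(0.25)
q3 = df["salary"].quantile(0.75)
iqr = q3 - q1

# Keep rows inside the usual 1.5 * IQR fences (between() is inclusive)
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df = df[df["salary"].between(lower, upper)]
print(df["salary"].tolist())
```

The IQR rule is a starting point, not a verdict: check flagged rows before deleting them, since some extreme values are real.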
15. Split a column
df[['first_name', 'last_name']] = df['name'].str.split(' ', n=1, expand=True)
Output: Name split into two columns.
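Passing n=1 limits the split to the first space. Without it, a name with a middle name produces three columns and the assignment to two columns fails. With n=1, the middle name stays with the last name, and a single-word name gets a missing last_name. A sketch with made-up names:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada Lovelace", "Mary Ann Evans", "Plato"]})

# Split on the first space only, so the result has at most two columns
df[["first_name", "last_name"]] = df["name"].str.split(" ", n=1, expand=True)
print(df)
```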
16. Create a new calculated column
df['revenue'] = df['price'] * df['quantity']
Output: New column added.
17. Change column order
df = df[['date', 'customer', 'revenue']]
Output: Columns reordered; any column not listed is dropped.
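To move a few columns to the front without dropping the rest, build the new order from the existing column list. A sketch with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame(columns=["customer", "revenue", "region", "date"])

# Columns to put first; everything else keeps its original relative order
front = ["date", "customer"]
df = df[front + [c for c in df.columns if c not in front]]
print(list(df.columns))
```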
18. Remove unwanted columns
df = df.drop(columns=['unnecessary_column'])
Output: Column removed.
19. Handle inconsistent categories
df['status'] = df['status'].str.title()
Output: Consistent formatting.
20. Reset index
df = df.reset_index(drop=True)
Output: Clean index starting from 0, useful after filtering or dropping rows leaves gaps.
21. Detect duplicates
df.duplicated()
Output: Boolean mask showing duplicates.
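By default, duplicated() marks only the second and later copies (keep='first'). Passing keep=False flags every copy, which makes it easy to inspect duplicates before deleting anything. A sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "plan": ["free", "pro", "free"],
})

print(df.duplicated().tolist())            # [False, False, True]
print(df.duplicated(keep=False).tolist())  # [True, False, True]

# All rows that have at least one duplicate, for manual review
dupes = df[df.duplicated(keep=False)]
print(dupes)
```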
22. Save cleaned data
df.to_csv('clean_data.csv', index=False)
Output: Clean dataset saved.
Common Pandas Cleaning Mistakes
- Skipping info() and isnull()
- Cleaning without understanding the data
- Removing too much data
- Not documenting cleaning steps
Cleaning should be intentional, not random.
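One way to keep cleaning intentional and documented is to wrap the steps in a function that records what each one did. A minimal sketch; the function name clean_sales, the columns, and the rules are hypothetical:

```python
import pandas as pd


def clean_sales(raw: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Apply cleaning steps in a fixed order and record what each step did."""
    log: list[str] = []
    df = raw.copy()

    # Step 1: exact duplicate rows
    before = len(df)
    df = df.drop_duplicates()
    log.append(f"drop_duplicates: {before - len(df)} row(s) removed")

    # Step 2: rows without a sales figure
    before = len(df)
    df = df.dropna(subset=["sales"])
    log.append(f"dropna(sales): {before - len(df)} row(s) removed")

    # Step 3: standardize region labels (no rows removed)
    df = df.assign(region=df["region"].str.strip().str.title())
    log.append("region: stripped whitespace and title-cased")

    return df.reset_index(drop=True), log


raw = pd.DataFrame({
    "sales": [100.0, 100.0, None, 250.0],
    "region": [" north", " north", "south", "EAST "],
})
clean, log = clean_sales(raw)
for entry in log:
    print(entry)
print(clean)
```

Because the steps live in one function, rerunning them on next month’s data gives the same result, and the log doubles as a record of how much data each rule removed.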
Why Interviewers Care About This Skill
Interviewers often ask:
- “How do you handle missing data?”
- “How do you clean messy datasets?”
These examples answer those questions directly.
If you can clean data with Pandas, you can:
- Trust your analysis
- Build better dashboards
- Work with real datasets
Data cleaning is not optional; it’s the foundation.
FAQs
1. Why is data cleaning important in Pandas?
Because messy data leads to incorrect analysis and decisions.
2. Are these Pandas examples beginner-friendly?
Yes. They focus on fundamentals used in real projects.
3. Do interviewers ask about Pandas data cleaning?
Very often. It’s a core data analyst skill.
4. Should I clean data before visualization?
Always. Visualization depends on clean data.
5. Is Pandas enough for real-world data cleaning?
Yes, for most analyst and junior roles.