Pandas Filter Rows Based on Condition

If you are working with data in Python, filtering rows from a DataFrame is one of the most fundamental and frequently used operations you will perform.

Whether you are removing outliers, isolating specific customer segments, filtering sales data for a particular date range, or extracting records that meet business criteria — filtering is how you go from a full dataset to the specific subset you actually need.

In this guide, we will break down every major way to filter rows in pandas based on conditions with clear, practical examples you can apply immediately.

Setting Up the Example Dataset

We will use one consistent dataset throughout this guide.

python

import pandas as pd
import numpy as np

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'HR', 'Sales', 'Marketing', 'Engineering'],
    'salary': [95000, 62000, 88000, 55000, 71000, 67000, 91000],
    'experience': [8, 4, 6, 3, 5, 7, 9],
    'status': ['Active', 'Active', 'Inactive', 'Active', 'Inactive', 'Active', 'Active']
}

df = pd.DataFrame(data)
print(df)

Output:

      name   department  salary  experience    status
0    Alice  Engineering   95000           8    Active
1      Bob    Marketing   62000           4    Active
2  Charlie  Engineering   88000           6  Inactive
3    Diana           HR   55000           3    Active
4      Eve        Sales   71000           5  Inactive
5    Frank    Marketing   67000           7    Active
6    Grace  Engineering   91000           9    Active

Method 1: Boolean Indexing

Boolean indexing is the most fundamental filtering technique in pandas. You create a boolean Series (True/False values) and pass it inside square brackets to select only the True rows.

python

# Step 1: Create a boolean condition
condition = df['salary'] > 80000
print(condition)

# Output:
# 0     True   ← Alice
# 1    False   ← Bob
# 2     True   ← Charlie
# 3    False   ← Diana
# 4    False   ← Eve
# 5    False   ← Frank
# 6     True   ← Grace

# Step 2: Apply the condition to filter rows
high_earners = df[condition]
print(high_earners)

Output:

      name   department  salary  experience    status
0    Alice  Engineering   95000           8    Active
2  Charlie  Engineering   88000           6  Inactive
6    Grace  Engineering   91000           9    Active

In practice, you combine both steps into one line:

python

high_earners = df[df['salary'] > 80000]

Common Single Condition Filters

python

# Equality
engineering = df[df['department'] == 'Engineering']

# Inequality
not_hr = df[df['department'] != 'HR']

# Greater than / Less than
senior = df[df['experience'] >= 7]
junior = df[df['experience'] < 4]

# Filter on string column
active = df[df['status'] == 'Active']

Method 2: Multiple Conditions — AND, OR, NOT

Real-world filtering almost always involves more than one condition. Pandas uses & for AND, | for OR, and ~ for NOT.

AND Conditions — Both Must Be True

python

# Engineering employees with salary above 90000
result = df[(df['department'] == 'Engineering') & (df['salary'] > 90000)]
print(result)

Output:

    name   department  salary  experience  status
0  Alice  Engineering   95000           8  Active
6  Grace  Engineering   91000           9  Active

OR Conditions — At Least One Must Be True

python

# Employees in Marketing OR Sales
result = df[(df['department'] == 'Marketing') | (df['department'] == 'Sales')]
print(result)

Output:

    name department  salary  experience    status
1    Bob  Marketing   62000           4    Active
4    Eve      Sales   71000           5  Inactive
5  Frank  Marketing   67000           7    Active

NOT Condition — Invert the Filter

python

# All employees who are NOT in Engineering
not_engineering = df[~(df['department'] == 'Engineering')]

# Equivalent but cleaner
not_engineering = df[df['department'] != 'Engineering']

Combining Three or More Conditions

python

# Active Engineering employees with more than 7 years experience
result = df[
    (df['department'] == 'Engineering') &
    (df['status'] == 'Active') &
    (df['experience'] > 7)
]
print(result)

Output:

    name   department  salary  experience  status
0  Alice  Engineering   95000           8  Active
6  Grace  Engineering   91000           9  Active

Important: Always wrap each condition in parentheses when combining with & or |. These operators bind more tightly than comparisons such as > and ==, so without parentheses Python parses the expression in the wrong order and typically raises an error.

python

# Wrong — & binds tighter than >, so this raises an error
df[df['salary'] > 80000 & df['status'] == 'Active']

# Correct — each condition wrapped in parentheses
df[(df['salary'] > 80000) & (df['status'] == 'Active')]

Method 3: Using loc[] for Label-Based Filtering

loc[] filters rows by label or condition and optionally selects specific columns at the same time.

python

# Filter rows AND select specific columns in one step
result = df.loc[df['salary'] > 80000, ['name', 'salary', 'department']]
print(result)

Output:

      name  salary   department
0    Alice   95000  Engineering
2  Charlie   88000  Engineering
6    Grace   91000  Engineering

python

# Multiple conditions with loc
result = df.loc[
    (df['department'] == 'Engineering') & (df['status'] == 'Active'),
    ['name', 'salary', 'experience']
]

loc[] is the preferred method when you need to filter rows AND select specific columns simultaneously. It is more explicit and readable than chaining.
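
Because loc[] names the row condition and the column label explicitly, it is also the idiomatic way to update a filtered subset in place. A minimal sketch (the two-row frame here is a made-up stand-in for the article's dataset):

python

import pandas as pd

# Hypothetical cut-down frame with the article's columns
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'salary': [95000.0, 62000.0],
})

# A condition on the row axis plus a column label on the column
# axis writes back into the original DataFrame directly
df.loc[df['salary'] > 80000, 'salary'] = df['salary'] * 1.1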

Method 4: isin() — Filter by a List of Values

isin() is the clean way to filter rows where a column’s value matches any item in a list — replacing long chains of OR conditions.

python

# Filter multiple departments — clean way
target_departments = ['Engineering', 'Sales']
result = df[df['department'].isin(target_departments)]
print(result)

Output:

      name   department  salary  experience    status
0    Alice  Engineering   95000           8    Active
2  Charlie  Engineering   88000           6  Inactive
4      Eve        Sales   71000           5  Inactive
6    Grace  Engineering   91000           9    Active

python

# Negate isin() — exclude specific values
excluded = ['HR', 'Marketing']
result = df[~df['department'].isin(excluded)]

Compare the clean isin() version to the messy alternative:

python

# Without isin() — verbose and hard to maintain
result = df[
    (df['department'] == 'Engineering') |
    (df['department'] == 'Sales')
]

# With isin() — clean and scalable
result = df[df['department'].isin(['Engineering', 'Sales'])]

Method 5: between() — Filter Numeric Ranges

between() filters rows where a column’s value falls within an inclusive range.

python

# Salaries between 65000 and 90000 (inclusive)
result = df[df['salary'].between(65000, 90000)]
print(result)

Output:

      name   department  salary  experience    status
2  Charlie  Engineering   88000           6  Inactive
4      Eve        Sales   71000           5  Inactive
5    Frank    Marketing   67000           7    Active

python

# Experience between 5 and 8 years
mid_experience = df[df['experience'].between(5, 8)]

# Equivalent using comparison operators
mid_experience = df[(df['experience'] >= 5) & (df['experience'] <= 8)]

Both approaches produce identical results. between() is cleaner for range filters.
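
between() also accepts an inclusive parameter when the endpoints should not count; the string options ('both', 'left', 'right', 'neither') were added in pandas 1.3, so this assumes a reasonably recent version:

python

import pandas as pd

s = pd.Series([65000, 67000, 88000, 90000])

# Default behaviour: both endpoints are included
default = s.between(65000, 90000)

# Exclude both endpoints
strict = s.between(65000, 90000, inclusive='neither')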

Method 6: String Filtering With str Methods

Pandas provides string methods through the .str accessor for filtering text columns.

python

# Contains a substring (case-sensitive)
result = df[df['department'].str.contains('ing')]
# Returns: Engineering and Marketing rows (both contain 'ing')

# Case-insensitive match
result = df[df['name'].str.contains('a', case=False)]
# Returns: Alice, Charlie, Diana, Frank, Grace

# Starts with
result = df[df['department'].str.startswith('En')]
# Returns: Engineering rows

# Ends with
result = df[df['name'].str.endswith('e')]
# Returns: Alice, Charlie, Eve, Grace

# Filter by string length
result = df[df['name'].str.len() <= 3]
# Returns: Bob, Eve
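
Two details worth knowing about str.contains(): missing values propagate as NaN, which breaks boolean indexing unless you pass na=False, and the pattern is treated as a regular expression by default. A small sketch:

python

import pandas as pd

s = pd.Series(['Engineering', None, 'Marketing'])

# na=False treats missing values as non-matches instead of NaN
mask = s.str.contains('ing', na=False)

# Patterns are regular expressions by default
regex_mask = s.str.contains('^Eng|^Mark', na=False)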

Method 7: query() — SQL-Style Filtering

The query() method lets you filter using a SQL-like string expression — making complex filters much more readable.

python

# Simple condition
result = df.query('salary > 80000')

# Multiple conditions
result = df.query('salary > 70000 and department == "Engineering"')

# OR condition
result = df.query('department == "Marketing" or department == "Sales"')

# Using Python variables in query strings
min_salary = 70000
dept = 'Engineering'
result = df.query('salary > @min_salary and department == @dept')
print(result)

Output:

    name   department  salary  experience  status
0  Alice  Engineering   95000           8  Active
6  Grace  Engineering   91000           9  Active

The @ prefix references external Python variables inside a query string.
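
The @ prefix also works with lists, and query() supports an in operator that behaves like isin(). A quick sketch with a cut-down frame:

python

import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'HR', 'Sales'],
    'salary': [95000, 55000, 71000],
})

depts = ['Engineering', 'Sales']

# 'in' inside a query string behaves like isin()
result = df.query('department in @depts')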

query() vs Boolean Indexing — When to Use Each

python

# Boolean indexing — verbose but explicit, works with complex logic
result = df[(df['salary'] > 70000) & (df['department'] == 'Engineering')]

# query() — cleaner for readable conditions, great for exploration
result = df.query('salary > 70000 and department == "Engineering"')

Use query() for readable, exploratory filtering. Use boolean indexing for programmatic filtering where conditions are built dynamically.
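
When conditions are assembled at runtime (say, from user-selected filters), boolean masks can be collected in a list and folded into one. A sketch using functools.reduce over a made-up three-row frame:

python

from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Marketing', 'Engineering'],
    'salary': [95000, 62000, 88000],
    'status': ['Active', 'Active', 'Inactive'],
})

# Masks built programmatically, then AND-ed together pairwise
conditions = [
    df['department'] == 'Engineering',
    df['salary'] > 80000,
    df['status'] == 'Active',
]
combined = reduce(operator.and_, conditions)
result = df[combined]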

Method 8: Filtering With notna() and isna()

Handling missing values is a critical part of real-world data filtering.

python

# Add some NaN values for demonstration
df_with_nan = df.copy()
df_with_nan.loc[1, 'salary'] = None
df_with_nan.loc[4, 'experience'] = None

# Keep only rows where salary is NOT null
clean_salary = df_with_nan[df_with_nan['salary'].notna()]

# Keep only rows WHERE salary IS null (to inspect missing data)
missing_salary = df_with_nan[df_with_nan['salary'].isna()]

# Drop rows with ANY null values
df_complete = df_with_nan.dropna()

# Drop rows with nulls only in specific columns
df_clean = df_with_nan.dropna(subset=['salary', 'experience'])

Comparison Table: Filtering Methods

Method            Best For                           Syntax Style                  Selects Columns Too
Boolean indexing  Simple to complex conditions       df[df['col'] > val]           No
loc[]             Conditions plus column selection   df.loc[condition, cols]       Yes
isin()            Matching multiple specific values  df[df['col'].isin(list)]      No
between()         Numeric or date ranges             df[df['col'].between(a, b)]   No
str methods       Text pattern matching              df[df['col'].str.contains()]  No
query()           Readable SQL-style filtering       df.query('col > val')         No
notna()/isna()    Missing value filtering            df[df['col'].notna()]         No

Real-World Use Cases

HR Analytics — Active Senior Employees

python

senior_active = df[
    (df['status'] == 'Active') &
    (df['experience'] >= 7)
][['name', 'department', 'salary', 'experience']]

print(senior_active)

Salary Band Analysis

python

salary_bands = {
    'High': df[df['salary'] >= 85000],
    'Medium': df[df['salary'].between(65000, 84999)],
    'Low': df[df['salary'] < 65000]
}

for band, data in salary_bands.items():
    print(f"{band}: {len(data)} employees")

Department Performance Filter

python

# Engineering employees who are active and well-compensated
top_engineering = df.query(
    'department == "Engineering" and status == "Active" and salary > 88000'
)
print(top_engineering[['name', 'salary', 'experience']])

Common Mistakes to Avoid

  • Forgetting parentheses with & and | — This is the most common pandas filtering mistake. Always wrap each condition: (df['a'] > 1) & (df['b'] == 'x')
  • Using Python and/or instead of &/| — df[df['a'] > 1 and df['b'] == 'x'] raises a ValueError because the truth value of a Series is ambiguous. Always use & and | for element-wise operations in pandas
  • Modifying filtered results directly — Filtering may return a view or a copy, and pandas cannot always tell which, so modifying the result in place can raise a SettingWithCopyWarning. Use .copy() when you need to modify filtered results:

python

# Safe — explicit copy
result = df[df['salary'] > 80000].copy()
result['salary'] = result['salary'] * 1.1  # No warning

  • Using == to check for NaN — df[df['salary'] == None] never works, because comparisons against NaN always return False. Always use isna() or notna() for null checks
  • Overwriting the original DataFrame accidentally — Always assign filtered results to a new variable unless intentional: filtered_df = df[condition]
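
The NaN pitfall is easy to demonstrate: comparing against None with == yields False for every row, so the "filter" silently matches nothing, while isna() finds the missing value:

python

import numpy as np
import pandas as pd

s = pd.Series([95000.0, np.nan, 62000.0])

# == against None/NaN is False everywhere
wrong = s == None  # noqa: E711 (deliberately the wrong way)
correct = s.isna()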

Filtering rows in pandas is one of those skills you will use in virtually every data analysis project. Once you are comfortable with all the methods, choosing the right one becomes intuitive.

Here is the simplest decision guide:

  • Single condition → Boolean indexing
  • Filter AND select specific columns → loc[]
  • Match against a list of values → isin()
  • Numeric or date range → between()
  • Text pattern matching → str methods
  • Readable, exploratory filtering → query()
  • Handle missing values → isna() / notna()

Start with boolean indexing — it is the foundation everything else builds on. Then add isin(), between(), and loc[] to your toolkit. Once those feel natural, query() makes complex filters significantly more readable.

Master these methods and transforming raw data into exactly the subset you need becomes one of the fastest steps in your entire workflow.

FAQs

How do I filter rows in pandas based on a condition?

Use boolean indexing: df[df['column'] > value]. Create a condition that produces True/False values, then pass it inside square brackets to select only the True rows.

How do I filter with multiple conditions in pandas?

Use & for AND and | for OR, with each condition wrapped in parentheses: df[(df['salary'] > 80000) & (df['status'] == 'Active')].

What is the difference between loc[] and boolean indexing?

Boolean indexing filters rows only. loc[] filters rows AND selects specific columns simultaneously: df.loc[df['salary'] > 80000, ['name', 'salary']].

How do I filter rows where a column value is in a list?

Use isin(): df[df['department'].isin(['Engineering', 'Sales'])]. To exclude values from a list use ~df['department'].isin(['HR']).

How do I filter out rows with missing values in pandas?

Use notna() to keep non-null rows: df[df['salary'].notna()]. Or use dropna() to remove all rows with any missing value.

What is the pandas query() method?

query() lets you filter rows using a readable SQL-style string: df.query('salary > 80000 and department == "Engineering"'). Use @variable_name to reference external Python variables inside the query string.
