Pandas Filter Rows Based on Condition

If you are working with data in Python, filtering rows from a DataFrame is one of the most fundamental and frequently used operations you will perform.

Whether you are removing outliers, isolating specific customer segments, filtering sales data for a particular date range, or extracting records that meet business criteria — filtering is how you go from a full dataset to the specific subset you actually need.

In this guide, we will break down every major way to filter rows in pandas based on conditions with clear, practical examples you can apply immediately.

Setting Up the Example Dataset

We will use one consistent dataset throughout this guide.

python

import pandas as pd
import numpy as np

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'HR', 'Sales', 'Marketing', 'Engineering'],
    'salary': [95000, 62000, 88000, 55000, 71000, 67000, 91000],
    'experience': [8, 4, 6, 3, 5, 7, 9],
    'status': ['Active', 'Active', 'Inactive', 'Active', 'Inactive', 'Active', 'Active']
}

df = pd.DataFrame(data)
print(df)

Output:

      name   department  salary  experience    status
0    Alice  Engineering   95000           8    Active
1      Bob    Marketing   62000           4    Active
2  Charlie  Engineering   88000           6  Inactive
3    Diana           HR   55000           3    Active
4      Eve        Sales   71000           5  Inactive
5    Frank    Marketing   67000           7    Active
6    Grace  Engineering   91000           9    Active

Method 1: Boolean Indexing

Boolean indexing is the most fundamental filtering technique in pandas. You create a boolean Series (True/False values) and pass it inside square brackets to select only the True rows.

python

# Step 1: Create a boolean condition
condition = df['salary'] > 80000
print(condition)

# Output:
# 0     True   ← Alice
# 1    False   ← Bob
# 2     True   ← Charlie
# 3    False   ← Diana
# 4    False   ← Eve
# 5    False   ← Frank
# 6     True   ← Grace

# Step 2: Apply the condition to filter rows
high_earners = df[condition]
print(high_earners)

Output:

      name   department  salary  experience    status
0    Alice  Engineering   95000           8    Active
2  Charlie  Engineering   88000           6  Inactive
6    Grace  Engineering   91000           9    Active

In practice, you combine both steps into one line:

python

high_earners = df[df['salary'] > 80000]

Common Single Condition Filters

python

# Equality
engineering = df[df['department'] == 'Engineering']

# Inequality
not_hr = df[df['department'] != 'HR']

# Greater than / Less than
senior = df[df['experience'] >= 7]
junior = df[df['experience'] < 4]

# Filter on string column
active = df[df['status'] == 'Active']

Method 2: Multiple Conditions — AND, OR, NOT

Real-world filtering almost always involves more than one condition. Pandas uses & for AND, | for OR, and ~ for NOT.

AND Conditions — Both Must Be True

python

# Engineering employees with salary above 90000
result = df[(df['department'] == 'Engineering') & (df['salary'] > 90000)]
print(result)

Output:

    name   department  salary  experience  status
0  Alice  Engineering   95000           8  Active
6  Grace  Engineering   91000           9  Active

OR Conditions — At Least One Must Be True

python

# Employees in Marketing OR Sales
result = df[(df['department'] == 'Marketing') | (df['department'] == 'Sales')]
print(result)

Output:

    name department  salary  experience    status
1    Bob  Marketing   62000           4    Active
4    Eve      Sales   71000           5  Inactive
5  Frank  Marketing   67000           7    Active

NOT Condition — Invert the Filter

python

# All employees who are NOT in Engineering
not_engineering = df[~(df['department'] == 'Engineering')]

# Equivalent but cleaner
not_engineering = df[df['department'] != 'Engineering']

Combining Three or More Conditions

python

# Active Engineering employees with more than 7 years experience
result = df[
    (df['department'] == 'Engineering') &
    (df['status'] == 'Active') &
    (df['experience'] > 7)
]
print(result)

Output:

    name   department  salary  experience  status
0  Alice  Engineering   95000           8  Active
6  Grace  Engineering   91000           9  Active

Important: Always wrap each condition in parentheses when combining with & or |. These operators bind more tightly than comparisons such as > and ==, so without parentheses Python parses the expression in the wrong order and typically raises an error.

python

# Wrong — & binds tighter than >, so this raises an error
df[df['salary'] > 80000 & df['status'] == 'Active']

# Correct — each condition wrapped in parentheses
df[(df['salary'] > 80000) & (df['status'] == 'Active')]

Method 3: Using loc[] for Label-Based Filtering

loc[] filters rows by label or condition and optionally selects specific columns at the same time.

python

# Filter rows AND select specific columns in one step
result = df.loc[df['salary'] > 80000, ['name', 'salary', 'department']]
print(result)

Output:

      name  salary   department
0    Alice   95000  Engineering
2  Charlie   88000  Engineering
6    Grace   91000  Engineering

python

# Multiple conditions with loc
result = df.loc[
    (df['department'] == 'Engineering') & (df['status'] == 'Active'),
    ['name', 'salary', 'experience']
]

loc[] is the preferred method when you need to filter rows AND select specific columns simultaneously. It is more explicit and readable than chaining.
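
Because loc[] names the row condition and the column label explicitly, it is also the idiomatic way to update a filtered subset in place. A minimal sketch (the two-row frame here is a made-up stand-in for the article's dataset):

python

import pandas as pd

# Hypothetical cut-down frame with the article's columns
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'salary': [95000.0, 62000.0],
})

# A condition on the row axis plus a column label on the column
# axis writes back into the original DataFrame directly
df.loc[df['salary'] > 80000, 'salary'] = df['salary'] * 1.1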

Method 4: isin() — Filter by a List of Values

isin() is the clean way to filter rows where a column’s value matches any item in a list — replacing long chains of OR conditions.

python

# Filter multiple departments — clean way
target_departments = ['Engineering', 'Sales']
result = df[df['department'].isin(target_departments)]
print(result)

Output:

      name   department  salary  experience    status
0    Alice  Engineering   95000           8    Active
2  Charlie  Engineering   88000           6  Inactive
4      Eve        Sales   71000           5  Inactive
6    Grace  Engineering   91000           9    Active

python

# Negate isin() — exclude specific values
excluded = ['HR', 'Marketing']
result = df[~df['department'].isin(excluded)]

Compare the clean isin() version to the messy alternative:

python

# Without isin() — verbose and hard to maintain
result = df[
    (df['department'] == 'Engineering') |
    (df['department'] == 'Sales')
]

# With isin() — clean and scalable
result = df[df['department'].isin(['Engineering', 'Sales'])]

Method 5: between() — Filter Numeric Ranges

between() filters rows where a column’s value falls within an inclusive range.

python

# Salaries between 65000 and 90000 (inclusive)
result = df[df['salary'].between(65000, 90000)]
print(result)

Output:

      name   department  salary  experience    status
2  Charlie  Engineering   88000           6  Inactive
4      Eve        Sales   71000           5  Inactive
5    Frank    Marketing   67000           7    Active

python

# Experience between 5 and 8 years
mid_experience = df[df['experience'].between(5, 8)]

# Equivalent using comparison operators
mid_experience = df[(df['experience'] >= 5) & (df['experience'] <= 8)]

Both approaches produce identical results. between() is cleaner for range filters.
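
between() also accepts an inclusive parameter when the endpoints should not count; the string options ('both', 'left', 'right', 'neither') were added in pandas 1.3, so this assumes a reasonably recent version:

python

import pandas as pd

s = pd.Series([65000, 67000, 88000, 90000])

# Default behaviour: both endpoints are included
default = s.between(65000, 90000)

# Exclude both endpoints
strict = s.between(65000, 90000, inclusive='neither')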

Method 6: String Filtering With str Methods

Pandas provides string methods through the .str accessor for filtering text columns.

python

# Contains a substring (case-sensitive)
result = df[df['department'].str.contains('ing')]
# Returns: Engineering and Marketing rows (both contain 'ing')

# Case-insensitive match
result = df[df['name'].str.contains('a', case=False)]
# Returns: Alice, Charlie, Diana, Frank, Grace

# Starts with
result = df[df['department'].str.startswith('En')]
# Returns: Engineering rows

# Ends with
result = df[df['name'].str.endswith('e')]
# Returns: Alice, Charlie, Eve, Grace

# Filter by string length
result = df[df['name'].str.len() <= 3]
# Returns: Bob, Eve
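
Two details worth knowing about str.contains(): missing values propagate as NaN, which breaks boolean indexing unless you pass na=False, and the pattern is treated as a regular expression by default. A small sketch:

python

import pandas as pd

s = pd.Series(['Engineering', None, 'Marketing'])

# na=False treats missing values as non-matches instead of NaN
mask = s.str.contains('ing', na=False)

# Patterns are regular expressions by default
regex_mask = s.str.contains('^Eng|^Mark', na=False)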

Method 7: query() — SQL-Style Filtering

The query() method lets you filter using a SQL-like string expression — making complex filters much more readable.

python

# Simple condition
result = df.query('salary > 80000')

# Multiple conditions
result = df.query('salary > 70000 and department == "Engineering"')

# OR condition
result = df.query('department == "Marketing" or department == "Sales"')

# Using Python variables in query strings
min_salary = 70000
dept = 'Engineering'
result = df.query('salary > @min_salary and department == @dept')
print(result)

Output:

    name   department  salary  experience  status
0  Alice  Engineering   95000           8  Active
6  Grace  Engineering   91000           9  Active

The @ prefix references external Python variables inside a query string.
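
The @ prefix also works with lists, and query() supports an in operator that behaves like isin(). A quick sketch with a cut-down frame:

python

import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'HR', 'Sales'],
    'salary': [95000, 55000, 71000],
})

depts = ['Engineering', 'Sales']

# 'in' inside a query string behaves like isin()
result = df.query('department in @depts')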

query() vs Boolean Indexing — When to Use Each

python

# Boolean indexing — verbose but explicit, works with complex logic
result = df[(df['salary'] > 70000) & (df['department'] == 'Engineering')]

# query() — cleaner for readable conditions, great for exploration
result = df.query('salary > 70000 and department == "Engineering"')

Use query() for readable, exploratory filtering. Use boolean indexing for programmatic filtering where conditions are built dynamically.
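
When conditions are assembled at runtime (say, from user-selected filters), boolean masks can be collected in a list and folded into one. A sketch using functools.reduce over a made-up three-row frame:

python

from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Marketing', 'Engineering'],
    'salary': [95000, 62000, 88000],
    'status': ['Active', 'Active', 'Inactive'],
})

# Masks built programmatically, then AND-ed together pairwise
conditions = [
    df['department'] == 'Engineering',
    df['salary'] > 80000,
    df['status'] == 'Active',
]
combined = reduce(operator.and_, conditions)
result = df[combined]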

Method 8: Filtering With notna() and isna()

Handling missing values is a critical part of real-world data filtering.

python

# Add some NaN values for demonstration
df_with_nan = df.copy()
df_with_nan.loc[1, 'salary'] = None
df_with_nan.loc[4, 'experience'] = None

# Keep only rows where salary is NOT null
clean_salary = df_with_nan[df_with_nan['salary'].notna()]

# Keep only rows WHERE salary IS null (to inspect missing data)
missing_salary = df_with_nan[df_with_nan['salary'].isna()]

# Drop rows with ANY null values
df_complete = df_with_nan.dropna()

# Drop rows with nulls only in specific columns
df_clean = df_with_nan.dropna(subset=['salary', 'experience'])

Comparison Table: Filtering Methods

Method            Best For                           Syntax Style                  Selects Columns Too
Boolean indexing  Simple to complex conditions       df[df['col'] > val]           No
loc[]             Conditions plus column selection   df.loc[condition, cols]       Yes
isin()            Matching multiple specific values  df[df['col'].isin(list)]      No
between()         Numeric or date ranges             df[df['col'].between(a, b)]   No
str methods       Text pattern matching              df[df['col'].str.contains()]  No
query()           Readable SQL-style filtering       df.query('col > val')         No
notna()/isna()    Missing value filtering            df[df['col'].notna()]         No

Real-World Use Cases

HR Analytics — Active Senior Employees

python

senior_active = df[
    (df['status'] == 'Active') &
    (df['experience'] >= 7)
][['name', 'department', 'salary', 'experience']]

print(senior_active)

Salary Band Analysis

python

salary_bands = {
    'High': df[df['salary'] >= 85000],
    'Medium': df[df['salary'].between(65000, 84999)],
    'Low': df[df['salary'] < 65000]
}

for band, data in salary_bands.items():
    print(f"{band}: {len(data)} employees")

Department Performance Filter

python

# Engineering employees who are active and well-compensated
top_engineering = df.query(
    'department == "Engineering" and status == "Active" and salary > 88000'
)
print(top_engineering[['name', 'salary', 'experience']])

Common Mistakes to Avoid

  • Forgetting parentheses with & and | — This is the most common pandas filtering mistake. Always wrap each condition: (df['a'] > 1) & (df['b'] == 'x')
  • Using Python and/or instead of &/| — df[df['a'] > 1 and df['b'] == 'x'] raises a ValueError because the truth value of a Series is ambiguous. Always use & and | for element-wise operations in pandas
  • Modifying filtered results directly — Filtering may return a view or a copy, and pandas cannot always tell which, so modifying the result in place can raise a SettingWithCopyWarning. Use .copy() when you need to modify filtered results:

python

# Safe — explicit copy
result = df[df['salary'] > 80000].copy()
result['salary'] = result['salary'] * 1.1  # No warning

  • Using == to check for NaN — df[df['salary'] == None] never works, because comparisons against NaN always return False. Always use isna() or notna() for null checks
  • Overwriting the original DataFrame accidentally — Always assign filtered results to a new variable unless intentional: filtered_df = df[condition]
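
The NaN pitfall is easy to demonstrate: comparing against None with == yields False for every row, so the "filter" silently matches nothing, while isna() finds the missing value:

python

import numpy as np
import pandas as pd

s = pd.Series([95000.0, np.nan, 62000.0])

# == against None/NaN is False everywhere
wrong = s == None  # noqa: E711 (deliberately the wrong way)
correct = s.isna()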

Filtering rows in pandas is one of those skills you will use in virtually every data analysis project. Once you are comfortable with all the methods, choosing the right one becomes intuitive.

Here is the simplest decision guide:

  • Single condition → Boolean indexing
  • Filter AND select specific columns → loc[]
  • Match against a list of values → isin()
  • Numeric or date range → between()
  • Text pattern matching → str methods
  • Readable, exploratory filtering → query()
  • Handle missing values → isna() / notna()

Start with boolean indexing — it is the foundation everything else builds on. Then add isin(), between(), and loc[] to your toolkit. Once those feel natural, query() makes complex filters significantly more readable.

Master these methods and transforming raw data into exactly the subset you need becomes one of the fastest steps in your entire workflow.

FAQs

How do I filter rows in pandas based on a condition?

Use boolean indexing: df[df['column'] > value]. Create a condition that produces True/False values, then pass it inside square brackets to select only the True rows.

How do I filter with multiple conditions in pandas?

Use & for AND and | for OR, with each condition wrapped in parentheses: df[(df['salary'] > 80000) & (df['status'] == 'Active')].

What is the difference between loc[] and boolean indexing?

Boolean indexing filters rows only. loc[] filters rows AND selects specific columns simultaneously: df.loc[df['salary'] > 80000, ['name', 'salary']].

How do I filter rows where a column value is in a list?

Use isin(): df[df['department'].isin(['Engineering', 'Sales'])]. To exclude values from a list use ~df['department'].isin(['HR']).

How do I filter out rows with missing values in pandas?

Use notna() to keep non-null rows: df[df['salary'].notna()]. Or use dropna() to remove all rows with any missing value.

What is the pandas query() method?

query() lets you filter rows using a readable SQL-style string: df.query('salary > 80000 and department == "Engineering"'). Use @variable_name to reference external Python variables inside the query string.
