How to Drop Columns in Pandas DataFrame

If you are working with data in Python, removing unwanted columns from a DataFrame is something you will do in almost every project.

Whether you are cleaning a raw dataset, removing redundant features before building a machine learning model, dropping columns with too many missing values, or simply trimming a DataFrame to only the fields you need — knowing how to drop columns efficiently is a core pandas skill.

In this guide, we will cover every major way to drop columns in a pandas DataFrame with clear, practical examples you can apply immediately.

Setting Up the Example Dataset

We will use one consistent dataset throughout this guide.

python

import pandas as pd
import numpy as np

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'HR', 'Sales'],
    'salary': [95000, 62000, 88000, 55000, 71000],
    'experience': [8, 4, 6, 3, 5],
    'age': [34, 29, 41, 26, 33],
    'temp_id': [101, 102, 103, 104, 105],
    'notes': [None, None, None, None, None]
}

df = pd.DataFrame(data)
print(df)

Output:

name     department   salary  experience  age  temp_id  notes
Alice    Engineering  95000   8           34   101      None
Bob      Marketing    62000   4           29   102      None
Charlie  Engineering  88000   6           41   103      None
Diana    HR           55000   3           26   104      None
Eve      Sales        71000   5           33   105      None

Method 1: drop()

The drop() method is the most common and explicit way to remove columns in pandas. You pass the column name and set axis=1 to tell pandas you are targeting columns, not rows.

python

# Drop a single column
df_clean = df.drop('notes', axis=1)
print(df_clean)

Output:

name   department   salary  experience  age  temp_id
Alice  Engineering  95000   8           34   101
Bob    Marketing    62000   4           29   102

You can also use columns= as a keyword argument instead of axis=1. Many developers find this more readable:

python

# Same result, cleaner syntax
df_clean = df.drop(columns='notes')

Both approaches produce identical results. The columns= keyword is generally preferred for readability.
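As a quick check of that equivalence, here is a minimal sketch on a cut-down stand-in for the example dataset (the two-row frame below is illustrative, not the full dataset from this guide):

```python
import pandas as pd

# Small stand-in for the guide's dataset (values are illustrative)
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'salary': [95000, 62000],
    'notes': [None, None],
})

by_axis = df.drop('notes', axis=1)      # label plus axis=1
by_keyword = df.drop(columns='notes')   # columns= keyword

# Both calls return a new DataFrame with the same contents
print(by_axis.equals(by_keyword))

# The original still has all three columns
print(list(df.columns))
```

Neither call touches df itself, which is why the result is assigned to a new name in both cases.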

Method 2: Dropping Multiple Columns at Once

To drop more than one column, pass a list of column names to drop().

python

# Drop multiple columns
df_clean = df.drop(columns=['temp_id', 'notes'])
print(df_clean)

Output:

name     department   salary  experience  age
Alice    Engineering  95000   8           34
Bob      Marketing    62000   4           29
Charlie  Engineering  88000   6           41
Diana    HR           55000   3           26
Eve      Sales        71000   5           33

This is far cleaner than dropping columns one at a time in separate steps. Always batch your drops into a single operation when possible.

Method 3: Using inplace=True

By default, drop() returns a new DataFrame and leaves the original unchanged. If you want to modify the original DataFrame directly, use inplace=True.

python

# Modify the original DataFrame directly
df.drop(columns=['temp_id', 'notes'], inplace=True)
print(df)

After this operation, df itself no longer contains the temp_id and notes columns — no need to reassign.

When to use inplace vs reassignment:

python

# Option A — reassignment (recommended for most cases)
df_clean = df.drop(columns=['temp_id', 'notes'])

# Option B — inplace (modifies original, no return value)
df.drop(columns=['temp_id', 'notes'], inplace=True)

Most experienced pandas users prefer reassignment over inplace=True because it keeps the original DataFrame intact and makes code easier to debug. Use inplace=True only when you are certain you no longer need the original. It is not a reliable way to save memory, because pandas typically still builds the new result internally before swapping it in.
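The difference in return values is easy to trip over, so here is a small sketch that puts the two options side by side. The two-column frame and the names df_inplace and result are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'temp_id': [101, 102]})

# Reassignment: the result is a new DataFrame, df keeps temp_id
df_clean = df.drop(columns='temp_id')

# inplace=True: the call returns None and the frame itself is changed
df_inplace = df.copy()
result = df_inplace.drop(columns='temp_id', inplace=True)

print(result is None)            # the in-place call has no return value
print(list(df.columns))          # original is untouched
print(list(df_inplace.columns))  # the copy lost temp_id
```

One consequence worth remembering: writing df = df.drop(..., inplace=True) replaces df with None, so the two styles should not be mixed.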

Method 4: Drop Columns by Index Position

Sometimes you do not know the column name but you know its position. Index df.columns with zero-based positions to look up the matching labels, then pass those labels to drop().

python

# Drop the columns at zero-based positions 5 (temp_id) and 6 (notes)
# Assumes df still has all seven original columns
cols_to_drop = df.columns[[5, 6]]
df_clean = df.drop(columns=cols_to_drop)
print(df_clean)

You can also drop the last N columns using negative indexing:

python

# Drop the last 2 columns
df_clean = df.drop(columns=df.columns[-2:])

This is particularly useful when working with datasets where column names are not known in advance or are auto-generated.
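The sketch below shows the position-to-label lookup on a small illustrative frame, alongside an iloc-based alternative that selects the positions to keep instead. The variable names are assumptions for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'salary': [95000, 62000],
    'age': [34, 29],
    'temp_id': [101, 102],
})

positions = [1, 3]                  # zero-based: 'salary' and 'temp_id'
labels = df.columns[positions]      # translate positions into labels
df_clean = df.drop(columns=labels)

# Alternative: keep every position that is not in the list, using iloc
keep = [i for i in range(df.shape[1]) if i not in positions]
df_iloc = df.iloc[:, keep]

print(list(df_clean.columns))
print(df_clean.equals(df_iloc))
```

The two routes agree here. They can differ when column labels are duplicated, because drop() works on labels and removes every column that shares one, while iloc selects strictly by position.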

Method 5: Keep Specific Columns Instead of Dropping

When you want to keep only a small number of columns, it is often cleaner to select what you want rather than drop what you do not.

python

# Keep only specific columns
cols_to_keep = ['name', 'department', 'salary']
df_clean = df[cols_to_keep]
print(df_clean)

Output:

name     department   salary
Alice    Engineering  95000
Bob      Marketing    62000
Charlie  Engineering  88000

python

# Alternative using loc
df_clean = df.loc[:, ['name', 'department', 'salary']]

If you need to drop 10 columns but keep only 3, selecting the 3 you want is much cleaner than listing all 10 to drop.
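Unlike drop(), selection has no errors='ignore' option, so a missing name in the list raises a KeyError. A minimal sketch of one way to handle that, assuming a hypothetical 'bonus' column that is not in the data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'department': ['Engineering', 'Marketing'],
    'salary': [95000, 62000],
    'temp_id': [101, 102],
})

wanted = ['name', 'salary', 'bonus']  # 'bonus' is hypothetical and absent

# Selecting a list that contains a missing label raises KeyError
try:
    df[wanted]
    raised = False
except KeyError:
    raised = True
print(raised)

# Keep only the wanted names that are actually present
present = [c for c in wanted if c in df.columns]
df_clean = df[present].copy()  # .copy() makes the subset independent of df
print(list(df_clean.columns))
```

The .copy() call is optional, but it makes it explicit that later changes to df_clean are not meant to affect df.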

Method 6: Drop Columns With All or Most Missing Values

In real-world data cleaning, you often need to drop columns based on how many null values they contain rather than by name.

python

# Add more realistic missing data for demonstration
df['notes'] = None
df['temp_score'] = [None, None, 88, None, None]

# Drop columns where ALL values are null
df_clean = df.dropna(axis=1, how='all')
print(df_clean)

# Keep only columns with at least 60% non-null values
# (drops columns that are more than 40% null)
threshold = len(df) * 0.6
df_clean = df.dropna(axis=1, thresh=int(threshold))
print(df_clean)

The thresh parameter keeps only columns that have at least that many non-null values. This is one of the most useful techniques for cleaning messy real-world datasets automatically.
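Because thresh counts non-null values, it can be easier to reason in terms of the null fraction per column. The sketch below computes that fraction with isna().mean() and keeps columns at or below a chosen limit. The 'bonus' column and the 0.6 limit are assumptions made for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'notes': [None, None, None, None, None],              # 100% null
    'temp_score': [np.nan, np.nan, 88, np.nan, np.nan],   # 80% null
    'bonus': [1000, np.nan, 1500, np.nan, 1200],          # 40% null (hypothetical)
})

# Fraction of null values in each column
null_fraction = df.isna().mean()
print(null_fraction)

# Keep columns whose null fraction is at most 60%
max_null = 0.6
df_clean = df.loc[:, null_fraction <= max_null]

# Same outcome with dropna: with 5 rows, "at most 60% null"
# means "at least 2 non-null values"
df_thresh = df.dropna(axis=1, thresh=2)

print(list(df_clean.columns))
```

Printing null_fraction before dropping anything is a cheap way to confirm which columns a given limit will remove.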

Method 7: Drop Columns Using a Filter or Pattern

When working with wide datasets that have many columns, you can drop columns whose names match a pattern using string methods combined with drop().

python

# Drop all columns that start with 'temp'
temp_cols = [col for col in df.columns if col.startswith('temp')]
df_clean = df.drop(columns=temp_cols)
print(df_clean)

# Drop all columns that contain 'id' in the name
id_cols = [col for col in df.columns if 'id' in col.lower()]
df_clean = df.drop(columns=id_cols)

# Drop columns using filter() — keeps columns matching a pattern
df_clean = df.filter(regex='^(?!temp)')  # Keep columns NOT starting with 'temp'

Pattern-based dropping is extremely useful when dealing with auto-generated datasets or database exports that include many system columns you need to clean out.
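The same idea can be written with a boolean mask built from the vectorized string methods on df.columns. This is a sketch on an illustrative frame, and it assumes every column name is a string, since the .str accessor does not work on non-string labels:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'temp_id': [101, 102],
    'temp_score': [88, 91],
    'salary': [95000, 62000],
})

# True for every column whose name starts with 'temp'
mask = df.columns.str.startswith('temp')
df_clean = df.loc[:, ~mask]   # keep the columns where the mask is False

# Equivalent: let filter() find the matching columns, then drop them
df_regex = df.drop(columns=df.filter(regex='^temp').columns)

print(list(df_clean.columns))
print(df_clean.equals(df_regex))
```

Building the mask as a separate step makes it easy to inspect exactly which columns match before anything is removed.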

Methods for Dropping Columns

Method                  | Best For                 | Modifies Original   | Syntax
------------------------|--------------------------|---------------------|------------------------------------------------
drop(columns='col')     | Single column by name    | No (returns new df) | df.drop(columns='col')
drop(columns=[...])     | Multiple columns by name | No (returns new df) | df.drop(columns=['a','b'])
drop(..., inplace=True) | Modifying in place       | Yes                 | df.drop(columns='col', inplace=True)
Index-based drop        | Drop by position         | No                  | df.drop(columns=df.columns[[5,6]])
Column selection        | Keeping a few columns    | No                  | df[['col1', 'col2']]
dropna(axis=1)          | Drop by missing values   | No                  | df.dropna(axis=1, how='all')
Pattern-based drop      | Drop by name pattern     | No                  | df.drop(columns=[c for c in df.columns if ...])

Real-World Use Cases

Data Cleaning — Remove System and Placeholder Columns

python

# Drop auto-generated or irrelevant columns after importing raw data
system_cols = ['temp_id', 'notes', 'created_by', 'last_modified']
df_clean = df.drop(columns=[c for c in system_cols if c in df.columns])
print(df_clean)

Machine Learning — Drop Non-Feature Columns Before Training

python

# Remove identifier and target columns before feeding into a model
df_features = df.drop(columns=['name', 'department'])
X = df_features.drop(columns=['salary'])  # features
y = df_features['salary']                 # target

Removing Highly Correlated Columns

python

# Drop one column from each highly correlated pair
corr_matrix = df[['salary', 'experience', 'age']].corr().abs()
upper = corr_matrix.where(
    # pd.np was removed from pandas; use the numpy import (np) directly
    np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
)
to_drop = [col for col in upper.columns if any(upper[col] > 0.90)]
df_clean = df.drop(columns=to_drop)
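To see the upper-triangle technique in action, here is a self-contained sketch on a small illustrative frame. The 'experience_months' column is hypothetical and is built as an exact multiple of 'experience' so that the pair is perfectly correlated:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'experience': [3, 4, 5, 6, 8],
    'experience_months': [36, 48, 60, 72, 96],  # hypothetical: 12 * experience
    'age': [41, 26, 33, 29, 34],
})

corr_matrix = df.corr().abs()

# Keep only the values above the diagonal so each pair is checked once
upper = corr_matrix.where(
    np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
)

# Flag the later column of any pair whose correlation exceeds 0.90
to_drop = [col for col in upper.columns if (upper[col] > 0.90).any()]
df_clean = df.drop(columns=to_drop)

print(to_drop)
print(list(df_clean.columns))
```

Because only the upper triangle is inspected, the first column of a correlated pair is kept and the later one is dropped. Which of the two is more useful to keep is a modelling decision this sketch does not make.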

Common Mistakes to Avoid

Not reassigning after drop(): drop() does not modify the original DataFrame by default. This is the most common beginner mistake:

python

# Wrong — df is unchanged
df.drop(columns='notes')

# Correct — capture the result
df_clean = df.drop(columns='notes')

Dropping a column that does not exist — This raises a KeyError. Always check first or use errors='ignore':

python

# Safe drop — no error if column doesn't exist
df_clean = df.drop(columns=['notes', 'missing_col'], errors='ignore')
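A minimal sketch contrasting the default behaviour with errors='ignore', on an illustrative two-column frame where 'missing_col' does not exist:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'notes': [None, None]})

# Default (errors='raise'): one missing label fails the whole call
try:
    df.drop(columns=['notes', 'missing_col'])
    raised = False
except KeyError as exc:
    raised = True
    print(exc)

# errors='ignore': existing labels are dropped, missing ones are skipped
df_clean = df.drop(columns=['notes', 'missing_col'], errors='ignore')
print(list(df_clean.columns))
```

The trade-off is that errors='ignore' also hides typos in column names, so it fits best where the input schema is genuinely expected to vary.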

Using axis=0 instead of axis=1: axis=0 targets rows, axis=1 targets columns. Mixing them up usually raises a KeyError, because pandas looks for a row labelled with the column name:

python

# Wrong — tries to drop a row labelled 'notes'
df.drop('notes', axis=0)

# Correct — drops the 'notes' column
df.drop('notes', axis=1)

Overusing inplace=True — While convenient, it makes code harder to debug and chain. Prefer explicit reassignment for cleaner, more predictable code.

Dropping columns without checking first — Always verify a column exists before dropping in production pipelines:

python

if 'notes' in df.columns:
    df_clean = df.drop(columns='notes')
else:
    df_clean = df  # nothing to drop, so df_clean is still defined

Dropping columns in pandas is a simple operation with several variations, and knowing which one to use in each situation keeps your code clean and your data pipelines robust.

Here is the quickest decision guide:

  • Drop one column by name → df.drop(columns='col')
  • Drop multiple columns → df.drop(columns=['col1', 'col2'])
  • Column might not exist → add errors='ignore'
  • Keep a few columns instead → df[['col1', 'col2']]
  • Drop by null count → df.dropna(axis=1, how='all')
  • Drop by name pattern → list comprehension + drop()
  • Modify original directly → inplace=True (use sparingly)

Start with drop(columns=...) — it is the clearest and most readable approach. Add errors='ignore' whenever you are working with data whose structure may vary. Once those feel natural, pattern-based dropping will handle even the messiest real-world datasets with ease.

FAQs

How do I drop a column in pandas?

Use df.drop(columns='column_name') and assign the result to a new variable: df_clean = df.drop(columns='notes'). By default, drop() returns a new DataFrame and leaves the original unchanged.

How do I drop multiple columns in pandas?

Pass a list of column names: df.drop(columns=['col1', 'col2']). This removes all listed columns in a single operation.

What is the difference between drop() with and without inplace=True?

Without inplace=True, drop() returns a new DataFrame. With inplace=True, it modifies the original DataFrame directly and returns None. Most developers prefer reassignment over inplace=True for cleaner, more debuggable code.

How do I drop a column that might not exist?

Use errors='ignore': df.drop(columns='col', errors='ignore'). This silently skips the column if it is not found instead of raising a KeyError.

How do I drop columns with missing values in pandas?

Use df.dropna(axis=1, how='all') to drop columns where every value is null. Use df.dropna(axis=1, thresh=n) to drop columns with fewer than n non-null values.

How do I drop columns by position instead of name?

Use df.drop(columns=df.columns[index]) to target a column by its integer position. For example, df.drop(columns=df.columns[-1]) drops the last column.
