If you have worked with pandas for any length of time, you have almost certainly encountered a situation where your DataFrame has a messy, non-sequential, or confusing index and you needed to reset it back to a clean 0, 1, 2, 3 sequence.
That is exactly what reset_index() does. But it does more than just fix messy row numbers — it is an essential tool for cleaning up after groupby operations, converting index columns back to regular columns, and managing MultiIndex DataFrames.
In this guide, we will walk through everything you need to know about reset_index() in pandas with clear examples, real-world use cases, and practical tips.
Why Does the Index Matter in Pandas?
Every pandas DataFrame has an index — a label for each row. By default, it is a sequential integer starting at 0. But the index changes in ways that are not always obvious:
- Filtering rows leaves gaps in the index
- Sorting reorders rows but keeps the original index values
- groupby() and agg() promote group columns into the index
- concat() may produce duplicate index values
- pivot() uses a column as the index
In all these situations, your DataFrame ends up with an index that no longer represents a clean sequential range and that is when reset_index() saves you.
Setting Up the Example Dataset
python
import pandas as pd
import numpy as np
data = {
'name': ['Alice', 'Bob', 'Charlie', 'Diana',
'Eve', 'Frank', 'Grace', 'Henry'],
'department': ['Engineering', 'Marketing', 'Engineering',
'HR', 'Sales', 'Marketing', 'Engineering', 'HR'],
'salary': [95000, 62000, 88000, 55000,
71000, 67000, 92000, 58000],
'years_exp': [8, 4, 6, 3, 5, 7, 9, 3]
}
df = pd.DataFrame(data)
print(df)
Output:
| name | department | salary | years_exp | |
|---|---|---|---|---|
| 0 | Alice | Engineering | 95000 | 8 |
| 1 | Bob | Marketing | 62000 | 4 |
| 2 | Charlie | Engineering | 88000 | 6 |
| 3 | Diana | HR | 55000 | 3 |
| 4 | Eve | Sales | 71000 | 5 |
| 5 | Frank | Marketing | 67000 | 7 |
| 6 | Grace | Engineering | 92000 | 9 |
| 7 | Henry | HR | 58000 | 3 |
The Core Problem — When Indexes Go Wrong
Problem 1: Gaps After Filtering
python
# Filter to Engineering employees only
engineering = df[df['department'] == 'Engineering']
print(engineering)
Output:
| name | department | salary | years_exp | |
|---|---|---|---|---|
| 0 | Alice | Engineering | 95000 | 8 |
| 2 | Charlie | Engineering | 88000 | 6 |
| 6 | Grace | Engineering | 92000 | 9 |
The index jumps from 0 to 2 to 6 — gaps from the rows that were filtered out. This causes problems when code expects a clean sequential index.
Problem 2: Index Becomes a Column After groupby()
python
grouped = df.groupby('department')['salary'].mean()
print(grouped)
print(type(grouped.index))
Output:
department
Engineering 91666.67
HR 56500.00
Marketing 64500.00
Sales 71000.00
Name: salary, dtype: float64
<class 'pandas.core.indexes.base.Index'>
Department is now the index — not a regular column. You cannot access it as grouped['department'] the way you would a normal column.
Problem 3: Duplicate Index After concat()
python
df1 = df.iloc[:4]
df2 = df.iloc[4:]
combined = pd.concat([df1, df2])
print(combined.index.tolist())
# Output: [0, 1, 2, 3, 4, 5, 6, 7] — looks fine here
# But if you had reset df2's index separately, you might get [0,1,2,3,0,1,2,3]
Basic reset_index() Usage
The simplest use — reset the index to a clean 0, 1, 2 sequence.
python
engineering = df[df['department'] == 'Engineering']
print("Before reset:")
print(engineering)
engineering_reset = engineering.reset_index()
print("\nAfter reset:")
print(engineering_reset)
After reset output:
| index | name | department | salary | years_exp | |
|---|---|---|---|---|---|
| 0 | 0 | Alice | Engineering | 95000 | 8 |
| 1 | 2 | Charlie | Engineering | 88000 | 6 |
| 2 | 6 | Grace | Engineering | 92000 | 9 |
The new index is 0, 1, 2. But notice — the old index values (0, 2, 6) are preserved as a new column called “index”. This is the default behavior.
The drop Parameter — Most Important Option
In most cases, you do not want the old index preserved as a column. Use drop=True to discard it.
python
engineering_clean = engineering.reset_index(drop=True)
print(engineering_clean)
Output:
| name | department | salary | years_exp | |
|---|---|---|---|---|
| 0 | Alice | Engineering | 95000 | 8 |
| 1 | Charlie | Engineering | 88000 | 6 |
| 2 | Grace | Engineering | 92000 | 9 |
Clean sequential index. Old index values discarded. This is the most common way to use reset_index().
drop=True → discard old index (most common — use this by default)
drop=False → keep old index as a column (default behavior)
The inplace Parameter
By default, reset_index() returns a new DataFrame — it does not modify the original.
python
# Creates a new DataFrame — df is unchanged
new_df = df.reset_index(drop=True)
# Modifies df directly — no assignment needed
df.reset_index(drop=True, inplace=True)
Best practice: Avoid inplace=True in most cases. Explicit assignment (df = df.reset_index(drop=True)) is clearer and easier to debug. The inplace parameter may also be deprecated in future pandas versions.
reset_index() After groupby() — The Most Common Use Case
This is where reset_index() is used most frequently — converting the grouped index back to a regular column after aggregation.
python
# groupby returns department as the index
result = df.groupby('department')['salary'].mean()
print(result)
print("\nIndex type:", type(result.index))
Output:
department
Engineering 91666.67
HR 56500.00
Marketing 64500.00
Sales 71000.00
Department is the index — not a column. You cannot pass this to most visualization tools or use it in further filtering without resetting.
python
# reset_index converts department back to a column
result_flat = df.groupby('department')['salary'].mean().reset_index()
print(result_flat)
Output:
| department | salary | |
|---|---|---|
| 0 | Engineering | 91666.67 |
| 1 | HR | 56500.00 |
| 2 | Marketing | 64500.00 |
| 3 | Sales | 71000.00 |
Now department is a regular column — easy to sort, filter, merge, and visualize.
Renaming the Aggregated Column
python
result = (
df.groupby('department')['salary']
.mean()
.reset_index()
.rename(columns={'salary': 'avg_salary'})
)
print(result)
Output:
| department | avg_salary | |
|---|---|---|
| 0 | Engineering | 91666.67 |
| 1 | HR | 56500.00 |
| 2 | Marketing | 64500.00 |
| 3 | Sales | 71000.00 |
reset_index() After Multiple groupby Columns
When grouping by multiple columns, all grouped columns become index levels — reset_index() brings them all back.
python
multi_grouped = df.groupby(['department', 'years_exp'])['salary'].mean()
print(multi_grouped)
Output:
department years_exp
Engineering 6 88000.0
8 95000.0
9 92000.0
HR 3 56500.0
Marketing 4 62000.0
7 67000.0
Sales 5 71000.0
Both department and years_exp are index levels. reset_index() brings both back as columns.
python
flat_result = multi_grouped.reset_index()
print(flat_result)
Output:
| department | years_exp | salary | |
|---|---|---|---|
| 0 | Engineering | 6 | 88000.0 |
| 1 | Engineering | 8 | 95000.0 |
| 2 | Engineering | 9 | 92000.0 |
| 3 | HR | 3 | 56500.0 |
| 4 | Marketing | 4 | 62000.0 |
| 5 | Marketing | 7 | 67000.0 |
| 6 | Sales | 5 | 71000.0 |
reset_index() After sort_values()
Sorting reorders rows but preserves the original index values — producing a non-sequential index.
python
sorted_df = df.sort_values('salary', ascending=False)
print(sorted_df)
Output:
| name | department | salary | years_exp | |
|---|---|---|---|---|
| 0 | Alice | Engineering | 95000 | 8 |
| 6 | Grace | Engineering | 92000 | 9 |
| 2 | Charlie | Engineering | 88000 | 6 |
| 4 | Eve | Sales | 71000 | 5 |
| 5 | Frank | Marketing | 67000 | 7 |
| 1 | Bob | Marketing | 62000 | 4 |
| 7 | Henry | HR | 58000 | 3 |
| 3 | Diana | HR | 55000 | 3 |
The index reflects original positions (0, 6, 2, 4, 5, 1, 7, 3) — not the sort order. Reset to get a clean 1-through-8 ranking index:
python
sorted_clean = df.sort_values('salary', ascending=False).reset_index(drop=True)
sorted_clean.index = sorted_clean.index + 1 # Start rank from 1 instead of 0
print(sorted_clean)
Output:
| name | department | salary | years_exp | |
|---|---|---|---|---|
| 1 | Alice | Engineering | 95000 | 8 |
| 2 | Grace | Engineering | 92000 | 9 |
| 3 | Charlie | Engineering | 88000 | 6 |
| 4 | Eve | Sales | 71000 | 5 |
| 5 | Frank | Marketing | 67000 | 7 |
| 6 | Bob | Marketing | 62000 | 4 |
| 7 | Henry | HR | 58000 | 3 |
| 8 | Diana | HR | 55000 | 3 |
reset_index() After concat()
Concatenating DataFrames often produces duplicate index values.
python
df_engineering = df[df['department'] == 'Engineering'].copy()
df_hr = df[df['department'] == 'HR'].copy()
# Both retain their original indexes
combined = pd.concat([df_engineering, df_hr])
print(combined)
print("\nIndex:", combined.index.tolist())
Output:
Index: [0, 2, 6, 3, 7]
The indexes from both DataFrames are preserved — no duplicates here, but in cases where both DataFrames start from 0, you would get [0,1,2,0,1,2].
python
combined_clean = pd.concat([df_engineering, df_hr]).reset_index(drop=True)
print(combined_clean.index.tolist())
# Output: [0, 1, 2, 3, 4]
Best practice: Always use reset_index(drop=True) after pd.concat() to ensure a clean sequential index in the result.
set_index() and reset_index() — Working Together
set_index() promotes a column to the index. reset_index() reverses this. They are complementary operations.
python
# Set name as the index
df_indexed = df.set_index('name')
print(df_indexed)
Output:
| name | department | salary | years_exp |
|---|---|---|---|
| Alice | Engineering | 95000 | 8 |
| Bob | Marketing | 62000 | 4 |
| … | … | … | … |
Now name is the index — useful for fast lookups by employee name.
python
# Reverse — bring name back as a column
df_restored = df_indexed.reset_index()
print(df_restored)
The original DataFrame structure is restored — name is back as a regular column with a fresh integer index.
Using set_index() for Fast Lookups
python
# Set index for fast lookup
df_by_name = df.set_index('name')
# Fast O(1) lookup by name
alice_data = df_by_name.loc['Alice']
print(alice_data)
Reset the index when you are done with lookups and need to work with the DataFrame in its standard form again.
MultiIndex reset_index() — Selective Level Reset
When a DataFrame has a MultiIndex (multiple index levels), you can reset just one level at a time using the level parameter.
python
# Create a MultiIndex DataFrame
df_multi = df.groupby(['department', 'years_exp']).agg(
avg_salary=('salary', 'mean'),
headcount=('name', 'count')
)
print(df_multi)
print("\nIndex names:", df_multi.index.names)
Output:
avg_salary headcount
department years_exp
Engineering 6 88000.0 1
8 95000.0 1
9 92000.0 1
HR 3 56500.0 2
Marketing 4 62000.0 1
7 67000.0 1
Sales 5 71000.0 1
Two index levels: department (level 0) and years_exp (level 1).
python
# Reset only the years_exp level (level 1)
partial_reset = df_multi.reset_index(level='years_exp')
print(partial_reset)
Output:
| years_exp | avg_salary | headcount | |
|---|---|---|---|
| Engineering | 6 | 88000.0 | 1 |
| Engineering | 8 | 95000.0 | 1 |
| Engineering | 9 | 92000.0 | 1 |
| HR | 3 | 56500.0 | 2 |
| … | … | … | … |
years_exp is now a column. Department remains as the index.
python
# Reset all levels — both become columns
full_reset = df_multi.reset_index()
print(full_reset)
Resetting Index to a Named Column
When the existing index has a meaningful name, reset_index() uses that name for the new column automatically.
python
df_named = df.set_index('name')
print(df_named.index.name) # Output: name
restored = df_named.reset_index()
print(restored.columns.tolist())
# Output: ['name', 'department', 'salary', 'years_exp']
The index name becomes the column name automatically.
Common Patterns and When to Use Them
python
# Pattern 1: After filtering — most common
filtered = df[df['salary'] > 70000].reset_index(drop=True)
# Pattern 2: After groupby — essential for flat output
summary = df.groupby('department')['salary'].mean().reset_index()
# Pattern 3: After sort — for clean ranked output
ranked = df.sort_values('salary', ascending=False).reset_index(drop=True)
# Pattern 4: After concat — avoid duplicate indexes
combined = pd.concat([df1, df2]).reset_index(drop=True)
# Pattern 5: After pivot — clean up the MultiIndex columns
pivoted = df.pivot_table(
index='department', columns='years_exp', values='salary'
).reset_index()
Comparison Table: When to Use drop=True vs drop=False
| Situation | drop=True | drop=False |
|---|---|---|
| After filtering rows | Use this — old index not meaningful | Creates confusing ‘index’ column |
| After groupby() | Use this — grouped cols become columns | N/A — grouped cols are the index |
| After sort_values() | Use this — old position not needed | Creates confusing ‘index’ column |
| After concat() | Use this — avoid duplicate index artifacts | Rarely useful |
| After set_index(‘name’) | Would lose name values | Use this — brings name back as column |
| After set_index with unique ID | Only if ID not needed as column | Use this — restores ID column |
Rule of thumb: If the index contains meaningful data you want to keep as a column → drop=False. If the index is just positional garbage → drop=True.
Common Mistakes to Avoid
- Forgetting drop=True after filtering — The most common mistake. Without
drop=True, the old index values become a column named “index” — cluttering your DataFrame with a column you did not want - Not chaining reset_index() after groupby() — Without reset_index(), grouped columns stay as index levels. Functions like merge(), plot(), and to_excel() often behave unexpectedly with non-default indexes
- Using inplace=True carelessly —
inplace=Truemodifies the DataFrame in place and returns None. Assigning the result —df = df.reset_index(inplace=True)— sets df to None. Either use inplace=True without assignment or use assignment without inplace - Resetting index when set_index would serve better — If you frequently look up rows by a column value, keeping that column as the index with
set_index()is more efficient. Only reset when you need the standard integer index for other operations - Forgetting reset_index() after concat() produces duplicate indexes — Duplicate index values can cause unexpected behavior in loc-based indexing and merge operations. Always reset after concatenation
Quick Reference Cheat Sheet
| Task | Code |
|---|---|
| Reset to clean 0,1,2 (discard old index) | df.reset_index(drop=True) |
| Reset and keep old index as column | df.reset_index() |
| Reset after groupby | df.groupby('col')['val'].mean().reset_index() |
| Reset after filtering | df[condition].reset_index(drop=True) |
| Reset after sort | df.sort_values('col').reset_index(drop=True) |
| Reset after concat | pd.concat([df1, df2]).reset_index(drop=True) |
| Reset one level of MultiIndex | df.reset_index(level='level_name') |
| Set a column as index | df.set_index('column_name') |
| Restore column from index | df_indexed.reset_index() |
| Start index from 1 | df.reset_index(drop=True); df.index += 1 |
reset_index() is one of those pandas functions that seems trivial at first but becomes essential once you understand all the situations where the index gets into a messy state.
Here is the simplest summary:
- Use
reset_index(drop=True)after filtering, sorting, or concatenating — to get a clean sequential index - Use
reset_index()(without drop) after groupby operations — to convert grouped columns back into regular columns - Use
reset_index(level=...)to selectively reset one level of a MultiIndex - Pair
set_index()andreset_index()together for lookup-optimized workflows - Always chain
reset_index()after groupby for cleaner, more portable DataFrame output
Once you develop the habit of resetting the index at the right moments, your DataFrames stay clean and predictable — and the downstream code that depends on them works correctly without mysterious index-related bugs.
FAQs
What does reset_index() do in pandas?
reset_index() resets the DataFrame index to the default sequential integer index (0, 1, 2…). It optionally converts the existing index into a column (drop=False, the default) or discards it entirely (drop=True).
When should I use drop=True in reset_index()?
Use drop=True when the current index contains no meaningful data — like after filtering rows, sorting, or concatenating. This discards the old index values instead of adding them as a new column called “index”.
Why does groupby() in pandas require reset_index()?
groupby() promotes the grouping columns into the DataFrame index — making them inaccessible as regular columns. reset_index() converts them back to regular columns, producing a flat DataFrame that is easier to work with in further operations, visualizations, and exports.
What is the difference between reset_index() and set_index()?
promotes a column to become the index — removing it from regular columns. reset_index() reverses this — converting the index back to a regular column and restoring the default integer index. They are complementary operations.
Does reset_index() modify the original DataFrame?
By default, no. reset_index() returns a new DataFrame without modifying the original. To modify in place, use inplace=True — but be careful never to assign the result when using inplace=True, as it returns None.
How do I reset only one level of a MultiIndex?
Use the level parameter: df.reset_index(level='level_name') or df.reset_index(level=0) to reset only the specified level. The other levels remain as index levels.