If you work with data in Python, you will regularly face a common challenge — your data is structured one way but you need it structured another way entirely.
Two pandas functions handle this reshaping problem:
- melt() — Converts data from wide format to long format
- pivot() — Converts data from long format to wide format
They are opposites of each other. Understanding both and knowing when to use each is one of the most practical data wrangling skills in pandas.
In this guide, we will break down melt and pivot clearly with practical examples, real-world use cases, and a complete comparison.
Wide Format vs Long Format — The Core Concept
Before comparing melt and pivot, you need to understand the two data shapes they convert between.
Wide Format
Each row represents one entity. Each variable gets its own column. The table is wide — many columns.
name Q1_sales Q2_sales Q3_sales Q4_sales
Alice 45000 52000 48000 61000
Bob 38000 41000 44000 39000
Charlie 51000 55000 58000 63000
Long Format
Each row represents one observation. A single column holds the variable name. Another column holds the value. The table is long — many rows.
name quarter sales
Alice Q1_sales 45000
Alice Q2_sales 52000
Alice Q3_sales 48000
Alice Q4_sales 61000
Bob Q1_sales 38000
Bob Q2_sales 41000
...
Neither format is better — they serve different purposes. Wide format is easier for humans to read at a glance. Long format is required by many visualization libraries (Seaborn, Plotly), statistical tools, and machine learning pipelines.
Setting Up the Example Datasets
We will use two datasets — one in each format.
python
import pandas as pd
# Wide format dataset
wide_df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie'],
'department': ['Engineering', 'Marketing', 'Sales'],
'Q1_sales': [45000, 38000, 51000],
'Q2_sales': [52000, 41000, 55000],
'Q3_sales': [48000, 44000, 58000],
'Q4_sales': [61000, 39000, 63000]
})
# Long format dataset
long_df = pd.DataFrame({
'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
'quarter': ['Q1', 'Q2', 'Q1', 'Q2', 'Q1', 'Q2'],
'sales': [45000, 52000, 38000, 41000, 51000, 55000],
'units': [188, 217, 158, 171, 213, 229]
})
print("Wide format:")
print(wide_df)
print("\nLong format:")
print(long_df)
Wide format output:
| name | department | Q1_sales | Q2_sales | Q3_sales | Q4_sales |
|---|---|---|---|---|---|
| Alice | Engineering | 45000 | 52000 | 48000 | 61000 |
| Bob | Marketing | 38000 | 41000 | 44000 | 39000 |
| Charlie | Sales | 51000 | 55000 | 58000 | 63000 |
Long format output:
| name | quarter | sales | units |
|---|---|---|---|
| Alice | Q1 | 45000 | 188 |
| Alice | Q2 | 52000 | 217 |
| Bob | Q1 | 38000 | 158 |
| Bob | Q2 | 41000 | 171 |
pandas melt() — Wide to Long
melt() takes a wide DataFrame and converts it to long format by unpivoting columns into rows.
Syntax
python
pd.melt(
frame,
id_vars=None, # Columns to keep as identifier variables
value_vars=None, # Columns to unpivot (default: all non-id columns)
var_name='variable', # Name for the new variable column
value_name='value' # Name for the new value column
)
Example 1: Basic melt()
Convert our wide sales DataFrame to long format.
python
melted = pd.melt(
wide_df,
id_vars=['name', 'department'],
value_vars=['Q1_sales', 'Q2_sales', 'Q3_sales', 'Q4_sales'],
var_name='quarter',
value_name='sales'
)
print(melted.sort_values(['name', 'quarter']).reset_index(drop=True))
Output:
| name | department | quarter | sales |
|---|---|---|---|
| Alice | Engineering | Q1_sales | 45000 |
| Alice | Engineering | Q2_sales | 52000 |
| Alice | Engineering | Q3_sales | 48000 |
| Alice | Engineering | Q4_sales | 61000 |
| Bob | Marketing | Q1_sales | 38000 |
| Bob | Marketing | Q2_sales | 41000 |
| Bob | Marketing | Q3_sales | 44000 |
| Bob | Marketing | Q4_sales | 39000 |
| Charlie | Sales | Q1_sales | 51000 |
| Charlie | Sales | Q2_sales | 55000 |
| Charlie | Sales | Q3_sales | 58000 |
| Charlie | Sales | Q4_sales | 63000 |
Three rows became twelve. Each quarter is now a separate row instead of a separate column.
Example 2: melt() Without Specifying value_vars
If you omit value_vars, pandas melts all columns that are not in id_vars.
python
melted = pd.melt(
wide_df,
id_vars=['name'],
var_name='metric',
value_name='value'
)
print(melted)
This melts both the department column and all four quarterly sales columns — melting every column except name.
Example 3: Cleaning the Quarter Labels After melt()
The quarter values contain “_sales” — clean them up with str.replace():
python
melted = pd.melt(
wide_df,
id_vars=['name', 'department'],
value_vars=['Q1_sales', 'Q2_sales', 'Q3_sales', 'Q4_sales'],
var_name='quarter',
value_name='sales'
)
melted['quarter'] = melted['quarter'].str.replace('_sales', '')
print(melted.head())
Now the quarter column contains clean values: Q1, Q2, Q3, Q4.
When to Use melt()
- Preparing data for Seaborn or Plotly which expect long format for grouped charts
- Feeding data into statistical models that require one observation per row
- Normalizing wide survey data where each question is a column
- Preparing data for GROUP BY style aggregations across variable types
- Tidying data that violates the one-observation-per-row principle
pandas pivot() — Long to Wide
pivot() takes a long DataFrame and converts it to wide format by spreading row values into columns.
Syntax
python
df.pivot(
index=None, # Column to use as row labels
columns=None, # Column whose values become new column headers
values=None # Column whose values fill the new columns
)
Example 1: Basic pivot()
Convert our long sales DataFrame back to wide format.
python
pivoted = long_df.pivot(
index='name',
columns='quarter',
values='sales'
)
print(pivoted)
Output:
| quarter | Q1 | Q2 |
|---|---|---|
| name | ||
| Alice | 45000 | 52000 |
| Bob | 38000 | 41000 |
| Charlie | 51000 | 55000 |
Each unique value in the quarter column became a new column. Each row represents one person.
Cleaning the pivot() Output
After pivoting, you often want to clean the column axis name and reset the index:
python
pivoted = long_df.pivot(
index='name',
columns='quarter',
values='sales'
)
pivoted.columns.name = None # Remove 'quarter' label from column axis
pivoted = pivoted.reset_index() # Make name a regular column again
print(pivoted)
Output:
| name | Q1 | Q2 |
|---|---|---|
| Alice | 45000 | 52000 |
| Bob | 38000 | 41000 |
| Charlie | 51000 | 55000 |
Example 2: Pivoting Multiple Value Columns
When you need to pivot multiple value columns simultaneously, use pivot_table() or set values to None to get a MultiIndex result.
python
# Pivot both sales and units
pivoted_multi = long_df.pivot(
index='name',
columns='quarter'
# No values specified — pivots all remaining columns
)
print(pivoted_multi)
Output:
| sales | units | |||
|---|---|---|---|---|
| quarter | Q1 | Q2 | Q1 | Q2 |
| name | ||||
| Alice | 45000 | 52000 | 188 | 217 |
| Bob | 38000 | 41000 | 158 | 171 |
| Charlie | 51000 | 55000 | 213 | 229 |
This creates a MultiIndex column structure — sales and units are the top level, Q1 and Q2 are the second level.
The Duplicate Values Problem in pivot()
pivot() fails when there are duplicate combinations of index and columns values.
python
# This will raise ValueError: Index contains duplicate entries
df_with_dupes = pd.DataFrame({
'name': ['Alice', 'Alice', 'Alice'],
'quarter': ['Q1', 'Q1', 'Q2'], # Alice has Q1 twice!
'sales': [45000, 47000, 52000]
})
df_with_dupes.pivot(index='name', columns='quarter', values='sales')
# ValueError: Index contains duplicate entries, cannot reshape
When duplicates exist, use pivot_table() with an aggregation function instead:
python
# pivot_table handles duplicates by aggregating
pivoted = df_with_dupes.pivot_table(
index='name',
columns='quarter',
values='sales',
aggfunc='mean' # or sum, max, min, etc.
)
print(pivoted)
When to Use pivot()
- Converting long-format reporting data back to wide format for Excel exports or human-readable tables
- Creating cross-tabulation style summaries from long transactional data
- Restructuring data for time series analysis where each timestamp is a column
- Preparing data for comparison across categories side by side
- Creating heatmap-ready matrices where rows and columns both carry meaning
pivot() vs pivot_table() — Quick Clarification
Both reshape long to wide but they handle data differently:
| Feature | pivot() | pivot_table() |
|---|---|---|
| Handles duplicates | Raises error | Aggregates with aggfunc |
| Aggregation function | Not supported | mean, sum, count, etc. |
| Missing values | Returns NaN | Returns NaN (or fill_value) |
| Performance | Faster | Slightly slower |
| Best for | Unique index+column pairs | Any real-world data with duplicates |
When in doubt — use pivot_table(). It handles all the edge cases that pivot() cannot.
Full Comparison Table: melt() vs pivot()
| Feature | melt() | pivot() |
|---|---|---|
| Direction | Wide → Long | Long → Wide |
| Rows after | More rows | Fewer rows |
| Columns after | Fewer columns | More columns |
| Key parameters | id_vars, value_vars, var_name, value_name | index, columns, values |
| Handles duplicates | Yes — creates multiple rows | No — use pivot_table() |
| Inverse operation | pivot() | melt() |
| Common use | Visualization prep, tidy data | Reporting, human-readable tables |
| Output index | Integer range index | Provided index column |
Real-World Use Cases
HR Analytics — melt() for Visualization
An HR team has quarterly headcount data in wide format — one column per quarter. Seaborn’s grouped bar chart requires long format.
python
import seaborn as sns
import matplotlib.pyplot as plt
hr_wide = pd.DataFrame({
'department': ['Engineering', 'Marketing', 'Sales'],
'Q1': [45, 20, 35],
'Q2': [48, 22, 33],
'Q3': [52, 21, 37],
'Q4': [55, 25, 40]
})
# Melt for Seaborn
hr_long = pd.melt(
hr_wide,
id_vars=['department'],
var_name='quarter',
value_name='headcount'
)
sns.barplot(data=hr_long, x='quarter', y='headcount', hue='department')
plt.title('Headcount by Department and Quarter')
plt.show()
Financial Reporting — pivot() for Excel Export
A finance team has transactional data in long format from their database. The CFO wants a pivot-style report with months as columns.
python
finance_long = pd.DataFrame({
'cost_center': ['Operations', 'Operations', 'IT', 'IT'],
'month': ['Jan', 'Feb', 'Jan', 'Feb'],
'spend': [125000, 118000, 87000, 92000]
})
report = finance_long.pivot_table(
index='cost_center',
columns='month',
values='spend',
aggfunc='sum'
).reset_index()
report.columns.name = None
report.to_excel('monthly_spend_report.xlsx', index=False)
Survey Data — melt() for Analysis
Survey results have one column per question. Melting produces one row per respondent-question combination enabling question-by-question analysis.
python
survey = pd.DataFrame({
'respondent_id': [1, 2, 3],
'Q1_satisfaction': [4, 5, 3],
'Q2_recommendation': [5, 4, 4],
'Q3_ease_of_use': [3, 5, 4]
})
survey_long = pd.melt(
survey,
id_vars=['respondent_id'],
var_name='question',
value_name='rating'
)
# Average rating per question
print(survey_long.groupby('question')['rating'].mean())
Common Mistakes to Avoid
- Using pivot() when duplicates exist — pivot() raises an error if any combination of index and column values is not unique. Always check for duplicates first or use pivot_table() with an aggfunc
- Forgetting to clean up the column axis name after pivot() — After pivoting, the column axis often has a name (the column you pivoted on). Set it to None with
df.columns.name = Nonefor cleaner output - Not specifying var_name and value_name in melt() — Without these, the new columns are named “variable” and “value” by default — not meaningful names. Always name them clearly
- Melting identifier columns accidentally — If you omit id_vars, pandas melts every column including identifiers like customer_id or date. Always explicitly specify id_vars to protect your identifier columns
- Expecting melt() and pivot() to be perfectly inverse — While they are conceptually opposite, you may need additional cleaning (sorting, renaming, resetting index) after applying one to restore the exact shape produced by the other
Quick Reference Cheat Sheet
| Task | Code |
|---|---|
| Wide to long | pd.melt(df, id_vars=['id_col'], var_name='variable', value_name='value') |
| Long to wide (unique) | df.pivot(index='id', columns='var_col', values='val_col') |
| Long to wide (duplicates) | df.pivot_table(index='id', columns='var_col', values='val_col', aggfunc='sum') |
| Clean column axis name | df.columns.name = None |
| Clean quarter labels | df['col'].str.replace('_sales', '') |
| Reset index after pivot | df.reset_index() |
| Melt multiple value cols | pd.melt(df, id_vars=['id'], value_vars=['col1','col2']) |
melt() and pivot() are two sides of the same coin — they reshape DataFrames between wide and long format, which is one of the most common data wrangling tasks in any data science workflow.
Here is the simplest summary:
- melt() — Wide to long. More rows, fewer columns. Use for visualization and statistical analysis
- pivot() — Long to wide. Fewer rows, more columns. Use for reporting and human-readable tables
- pivot_table() — Like pivot() but handles duplicate values through aggregation. Use this when your data is not guaranteed to have unique index-column combinations
The moment you recognize which format your data is in and which format you need — choosing between melt() and pivot() becomes an immediate, obvious decision.
FAQs
What is the difference between melt and pivot in pandas?
melt() converts wide format to long format — turning columns into rows. pivot() converts long format to wide format — turning row values into columns. They are opposite reshaping operations.
When should I use melt() in pandas?
Use melt() when your data has multiple columns that represent the same type of measurement across different categories (like Q1, Q2, Q3, Q4 sales) and you need them in a single column for visualization, statistical analysis, or machine learning.
Why does pivot() raise a ValueError about duplicate entries?
pivot() requires that each combination of index and column values is unique. If your data has duplicates — the same person appearing twice for the same quarter — use pivot_table() with an aggregation function like aggfunc=’sum’ or aggfunc=’mean’ instead.
What is the difference between pivot() and pivot_table() in pandas?
pivot() performs a simple reshape and fails if duplicates exist. pivot_table() aggregates duplicate values using a function you specify (sum, mean, count, etc.) and also supports margins for row and column totals. For real-world data, pivot_table() is almost always safer.
Can I use melt() and pivot() together?
Yes. They are inverse operations. melt() followed by pivot() (with the same parameters in reverse) should restore your original DataFrame shape — though you may need to clean up column names and reset the index after pivoting.
Does melt() work on all column types?
Yes. melt() works on numeric, string, boolean, and datetime columns. The melted values column will take on whatever dtype the original columns had — if they were mixed types, pandas will upcast to object dtype.