How to Create a Star Schema in Power BI

How to Create a Star Schema in Power BI

If your Power BI reports are slow, your DAX measures are returning wrong numbers, or your slicers are not filtering the way you expect, the problem is almost certainly not the DAX. It is not the visuals. It is the data model underneath everything. And the most common data model problem in Power BI is one flat table trying to do everything at once.

A star schema fixes this. It is the data modeling approach that Power BI was designed around and the one Microsoft explicitly recommends for every semantic model. Reports built on a proper star schema load faster, produce correct DAX results, and scale cleanly when new data or new dimensions are added. Reports built on a flat table hit performance and logic walls that more DAX cannot solve.

This guide walks you through how to create a star schema in Power BI from a flat file, step by step, using Power Query to split the data and Model View to connect the pieces.

What a Star Schema Is and Why Power BI Needs One

A star schema organizes data into two types of tables. A fact table sits at the center and stores measurable events: individual sales transactions, website sessions, support tickets, or any other record of something that happened. Fact tables tend to be long and narrow, meaning they have many rows and only a few columns, mostly numeric measures and foreign key IDs that point to dimension tables.

Dimension tables surround the fact table like points on a star. Each one stores descriptive context about one aspect of the business. A customer dimension stores customer names, segments, and locations. A product dimension stores product names, categories, and prices. A date dimension stores every date with attributes like day of week, month, quarter, and year. Dimension tables tend to be shorter and wider, meaning they have fewer rows but more descriptive columns.

The relationship between the two is always one-to-many. One customer can appear in many transactions. One product can appear in many orders. One date can appear across many rows. Power BI’s Vertipaq engine, the compression and query processing engine that runs underneath every report, is specifically optimized for this pattern. It compresses dimension table values using dictionary encoding, which produces compression ratios of ten to one or better on columns that repeat values across many rows. That compression is the reason star schema models are faster. The engine was built for them.

A flat table that contains customer name, product name, category, region, order date, revenue, and quantity all in one place cannot be compressed as efficiently because every column is wide relative to the number of unique values and the Vertipaq engine has no clean dimension boundaries to work against. The result is a larger model file, slower queries, and DAX that has to work harder to produce correct results.

The Dataset We Will Use

For this guide we are working with the Sample Superstore dataset, which is available free on Kaggle. It is a flat file with 9,994 rows and the following columns: Row ID, Order ID, Order Date, Ship Date, Ship Mode, Customer ID, Customer Name, Segment, Country, City, State, Postal Code, Region, Product ID, Category, Sub-Category, Product Name, Sales, Quantity, Discount, and Profit.

This is a classic flat file. Everything about the order, the customer, the product, the location, and the financial outcome is stored together in one row. The first step of building a star schema is identifying which columns belong in the fact table and which belong in dimension tables.

Fact table columns are the ones that measure something: Sales, Quantity, Discount, Profit. The foreign key IDs that connect to dimensions also belong in the fact table: Order ID, Order Date, Customer ID, Product ID.

Dimension table columns are the ones that describe something: Customer Name, Segment, Region for the customer dimension. Category, Sub-Category, Product Name for the product dimension. City, State, Country, Postal Code for the location dimension. Ship Mode for a shipping dimension.

Step 1: Load the Data Into Power BI and Open Power Query

Open Power BI Desktop. Go to Home and click Get Data, then select Text/CSV and load the Superstore CSV file. When the preview appears, click Transform Data instead of Load. This opens the Power Query Editor, which is where all the table splitting happens before any data touches the model.

You will see the flat file as a single query in the left panel. The first thing to do before creating any dimension tables is to rename this query to Fact Sales by right-clicking it in the left panel and selecting Rename. Using a naming convention consistently from the start makes the model easier to read and maintain.

Step 2: Create the Customer Dimension Table

In Power Query, every dimension table starts as a reference of the original flat file rather than a copy. Using a reference means both tables stay connected to the same source, so if the data source changes, both tables update automatically without you maintaining two separate connections.

Right-click on Fact Sales in the left panel and select Reference. A new query appears. Rename it Dim Customer.

With Dim Customer selected, go to Home and click Choose Columns. Select only the columns that belong in this dimension: Customer ID, Customer Name, and Segment. Click OK. You now have a table with three columns but many duplicate rows because each customer appears in every transaction row they were part of.

Remove the duplicates by going to Home and clicking Remove Rows, then Remove Duplicates. Now you have one row per unique customer.

The last step for this dimension is making sure Customer ID is the primary key, meaning every value in it is unique and no nulls exist. Click the Customer ID column header, go to Transform, and click Remove Duplicates one more time to be safe. Then right-click the column and confirm there are no null values shown in the column quality bar at the top of the preview.

Step 3: Create the Product Dimension Table

Right-click Fact Sales again and select Reference. Rename the new query Dim Product.

Choose Columns and select only Product ID, Category, Sub-Category, and Product Name. Remove Duplicates on the full table since a product might have appeared in many transactions but it should only appear once in the dimension.

One thing to verify before moving on: check that Product ID is truly unique after deduplication. If the same Product ID appears with different product names in different rows of the source data, you have a data quality issue to resolve before building the relationship. Go to Transform, select the Product ID column, and click Statistics, then Count Distinct Values. Compare that number to the row count of the table. If they match, every Product ID is unique and the dimension is clean.

Step 4: Create the Location Dimension Table

Right-click Fact Sales and select Reference. Rename it Dim Location.

Choose Columns and select Country, City, State, Postal Code, and Region. Remove Duplicates on the full table to get one row per unique location combination.

For this dimension, there is no single natural surrogate key in the source data. Location is identified by the combination of City, State, and Country rather than a single ID column. In Power Query, add a surrogate key by going to Add Column and selecting Index Column, starting from 1. Rename this column Location ID by double-clicking the column header. This becomes the primary key for the dimension and the foreign key you will later add to the fact table.

Step 5: Create the Date Dimension Table

The date dimension is the most important dimension in any Power BI model and the one most often skipped by beginners. Without a proper date table, time intelligence DAX functions like DATESYTD, DATEADD, and SAMEPERIODLASTYEAR either return errors or produce incorrect results. Power BI requires a date table marked as such and connected to the fact table through a date column.

In Power Query, go to Home, click New Source, and select Blank Query. In the formula bar that appears, paste this M code to generate a complete date table spanning the years in your data:

let
    StartDate = #date(2014, 1, 1),
    EndDate = #date(2018, 12, 31),
    DateList = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1,0,0,0)),
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
    ChangedType = Table.TransformColumnTypes(DateTable, {{"Date", type date}}),
    AddYear = Table.AddColumn(ChangedType, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth = Table.AddColumn(AddYear, "Month Number", each Date.Month([Date]), Int64.Type),
    AddMonthName = Table.AddColumn(AddMonth, "Month Name", each Date.ToText([Date], "MMMM"), type text),
    AddQuarter = Table.AddColumn(AddMonthName, "Quarter", each "Q" & Text.From(Date.QuarterOfYear([Date])), type text),
    AddWeekday = Table.AddColumn(AddQuarter, "Day of Week", each Date.ToText([Date], "dddd"), type text),
    AddWeekNum = Table.AddColumn(AddWeekday, "Week Number", each Date.WeekOfYear([Date]), Int64.Type)
in
    AddWeekNum

Rename this query Dim Date. Adjust the StartDate and EndDate values to cover the full range of dates in your dataset.

Step 6: Clean Up the Fact Table

Go back to Fact Sales in the left panel. Now that the dimension tables exist, remove all the descriptive columns from the fact table and keep only the measures and the foreign key IDs that connect to each dimension.

Click Choose Columns and keep only: Order ID, Order Date, Customer ID, Product ID, Ship Mode, Sales, Quantity, Discount, and Profit. Remove everything else.

For the Location ID that you created as a surrogate key in Dim Location, you need to bring it into the fact table using a merge. Go to Home, click Merge Queries, and select Fact Sales as the primary table. Select Dim Location as the table to merge with. Match on the combination of City, State, and Country by holding Ctrl and clicking each column in both tables. Choose Left Outer as the join kind and click OK.

Expand the merged column by clicking the double arrow icon at the top of the new column. Select only Location ID. This brings the surrogate key from the dimension into the fact table as a foreign key.

Your fact table should now have only these columns: Order ID, Order Date, Customer ID, Product ID, Location ID, Ship Mode, Sales, Quantity, Discount, and Profit.

Step 7: Close Power Query and Configure Relationships in Model View

Click Close and Apply in Power Query. Power BI loads all five tables into the model.

Switch to Model View by clicking the relationship diagram icon in the left sidebar. You will see all five tables displayed as boxes. Power BI may have auto-detected some relationships based on matching column names. Do not trust these without verifying them. Right-click any auto-detected relationship line and check the cardinality and cross-filter direction before accepting it.

Manually create or verify each relationship by going to Home and clicking Manage Relationships, or by dragging from a column in one table to the matching column in another directly in the diagram.

The relationships you need are all one-to-many with a single-direction filter flowing from the dimension to the fact table:

Dim Customer to Fact Sales on Customer ID, one to many, single filter direction. Dim Product to Fact Sales on Product ID, one to many, single filter direction. Dim Location to Fact Sales on Location ID, one to many, single filter direction. Dim Date to Fact Sales on Date, one to many, single filter direction. For this one, you need to ensure the Order Date column in Fact Sales is formatted as a date type and the Date column in Dim Date is also a date type before the relationship will connect cleanly.

Once all four relationships are created, the Model View should show Fact Sales at the center with four lines radiating outward to the four dimension tables. That is the star shape the schema is named for.

Step 8: Mark the Date Table and Write Your First DAX Measures

Before writing any time intelligence DAX, you must mark Dim Date as an official date table. In Model View, right-click Dim Date and select Mark as Date Table. A dialog asks you to choose which column contains unique date values with no blanks. Select the Date column and click OK.

With the model now correctly structured, write your measures in the Fact Sales table:

Total Sales = SUM(FactSales[Sales])

Total Profit = SUM(FactSales[Profit])

Profit Margin % =
DIVIDE([Total Profit], [Total Sales])

YTD Sales =
CALCULATE([Total Sales], DATESYTD(DimDate[Date]))

Sales vs Prior Year =
CALCULATE([Total Sales], SAMEPERIODLASTYEAR(DimDate[Date]))

Every one of these measures works correctly because the date table is marked, the relationship between the date dimension and the fact table is properly defined, and the filter context flows in the right direction from the dimension to the fact table.

Why Star Schema Makes DAX Easier and Faster

Without a star schema, a measure like Sales vs Prior Year requires complex CALCULATE overrides to define what the prior year means in a flat table with no proper date context. With a star schema and a marked date table, SAMEPERIODLASTYEAR works exactly as intended in two lines.

The performance difference is also measurable. A flat table model with 500,000 rows and 25 columns sits at a significantly larger file size than the same data split into a fact table and four dimension tables. The Vertipaq engine compresses the repeated values in dimension columns such as customer segment, product category, and region from thousands of repeated text strings into a dictionary with a handful of unique entries and an integer column pointing to them. That compression reduces memory usage and query time across every visual on every page.

Star Schema Cheat Sheet for Power BI

ElementWhat Goes HereExample Columns
Fact tableMeasures and foreign key IDsSales, Profit, Quantity, Customer ID, Date
Dim CustomerCustomer descriptive attributesCustomer Name, Segment
Dim ProductProduct descriptive attributesProduct Name, Category, Sub-Category
Dim LocationLocation descriptive attributesCity, State, Region, Country
Dim DateDate attributes for time intelligenceDate, Month, Quarter, Year, Day of Week
RelationshipsOne-to-many from dimension to factDim Customer to Fact Sales on Customer ID
Filter directionSingle direction from dimension to factDimension filters fact, not vice versa
Date table markingRequired for time intelligence DAXMark Dim Date as Date Table in Model View

Common Mistakes to Avoid

Leaving descriptive columns in the fact table. A fact table that contains Customer Name, Product Name, and Region alongside Sales and Profit is a partial flat table pretending to be a fact table. Every descriptive column that is not a measure or a foreign key belongs in a dimension table. Clean it out.

Using bidirectional relationships everywhere. Power BI lets you set relationship filter direction to both, meaning filters flow from the fact table back to the dimension as well as from the dimension into the fact. This causes DAX ambiguity, slower queries, and filter context errors in complex measures. The default for every relationship should be single direction from dimension to fact. Only use bidirectional when you have a specific, well-understood reason and never as a default.

Skipping the date dimension. Using the raw date column from the fact table as the basis for time intelligence without a marked date table is one of the most common beginner mistakes in Power BI. Functions like DATESYTD and SAMEPERIODLASTYEAR require a properly marked date table to work correctly. Without one, those functions either error or produce silently wrong results. Create the date table, mark it, and connect it before writing any time intelligence DAX.

Building dimension tables with duplicate primary keys. A dimension table where the primary key column contains duplicate values causes Power BI to create a many-to-many relationship instead of one-to-many. Many-to-many relationships produce inflated or unpredictable aggregation results. Always verify that your dimension primary key columns have no duplicates after deduplication in Power Query.

Auto-detecting relationships without verifying them. Power BI’s auto-detect relationship feature matches columns with the same name across tables and creates relationships automatically. It gets most of them right most of the time but it does get them wrong and when it does the errors show up in your visuals rather than in an error message. Always open Manage Relationships after loading data and verify every detected relationship before building a single visual.

The star schema is not an advanced topic. It is the foundation. Every Power BI model you build should start with the question: what is the fact table, what are the dimensions, and how do they connect? Answer those three questions first and every decision that follows, the DAX, the visuals, the performance tuning, becomes significantly more straightforward.

FAQs

What is the difference between a star schema and a flat table in Power BI?

A flat table stores all data including measures, descriptions, and attributes in a single table. A star schema separates the measurable data into a fact table and the descriptive context into dimension tables connected through relationships. Flat tables are simpler to understand initially but produce larger model files, slower queries, more complex DAX, and break time intelligence functions. Star schemas require more setup but produce faster, more accurate, and more maintainable Power BI models.

Does Power BI require a star schema?

Power BI does not technically require a star schema but it is the design pattern the Vertipaq engine is optimized for. Microsoft’s official guidance explicitly recommends star schema for all Power BI semantic models. Models built on flat tables or snowflake schemas often encounter performance problems, DAX complexity, and incorrect filter behavior that a star schema resolves by design rather than by workaround.

What is a date dimension table in Power BI and why is it required?

A date dimension table is a table that contains one row for every calendar date within the range of your data, with additional attributes for each date such as month name, quarter, year, and day of week. It is required in Power BI for time intelligence DAX functions like DATESYTD, DATEADD, and SAMEPERIODLASTYEAR to work correctly. Without a properly marked date table connected to your fact table, those functions either return errors or silently produce wrong results.

What is the difference between a star schema and a snowflake schema in Power BI?

A star schema connects each dimension table directly to the fact table in a single layer. A snowflake schema normalizes dimension tables further by splitting them into sub-dimension tables. For example, a product dimension might split into a product table and a separate category table connected to it. Snowflake schemas reduce data redundancy but add join complexity and are generally slower in Power BI because the engine is optimized for the single-layer star pattern. Use star schema as your default and snowflake only when a specific dimension table is very large and contains many high-cardinality columns.

How do I fix a many-to-many relationship in Power BI?

A many-to-many relationship usually means the column you are using as the join key has duplicate values in one or both tables. In a properly structured star schema, dimension table primary keys must be unique. Go back to Power Query, select the dimension table that contains the duplicate key, and run Remove Duplicates on the primary key column. If duplicates remain after deduplication, you have a data quality issue in the source that needs to be resolved before the relationship can be correctly defined as one-to-many.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top