Problem:
It is a common situation to have a header/detail tables in SQL Server. Additionally sometimes you include a calculated value on the header table to make reporting easier. For Example lets say you had an SalesPerson table that had a column for total sales. Now you also have a SalesDetails table that lists every sale that the company made. The purpose of having the total sales column is that it eliminates the need to touch the larger SalesDetails table when you just need to get the total by sales person. This is a good thing until the user asks for a report that has a column for both the total sales of the sales person and a line for each of the SalesDetails. Maybe a picture will help explain.
So here is the setup. As you can see George has sold 5 items for $100 each for a total of $500.
Now when we are asked to produce the report that joins these two together we get the following result:
Now if someone who gets this report sums up the total sales column they get $5,000, the wrong answer. What we need to do is zero out the total sales on each row except for the first row. So our results will look something like this.
Now the user of this report will get the same answer if they sum up the total sales column and sale amount column. So now we know what we want how do we get there in SQL Server.
Well if we could number the SalesPerson Records by EmplID, then we would have the ability to zero out the records on all rows except for the first one. For example something like this. . .
Solution:
ROW_NUMBER() to the rescue. MSDN defines ROW_NUMBER() as “Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.” (http://msdn.microsoft.com/en-us/library/ms186734.aspx).
Here is my starting point, a standard join to put the two tables together.
As you can see the ROW_NUMBER function takes 2 arguments. The second is the ORDER BY, that argument tells ROW_NUMBER what order we want the results in. In this application this is irrelevant since each SalesPerson record will be the same. The first argument PARTITION BY tells SQL Server to restart the count every time it sees a change in a column or set of columns. In our case we want to restart the count for each SalesPerson.
So the next step is to zero out the TotalSales on all the rows except for the first one.
Yes! Now our users can sum up the TotalSales column or the SaleAmt column and get the same answer! Another happy report user = another happy developer!