Shawn Weisfeld

I find when I talk to myself nobody listens. - Shawn Weisfeld
posts - 356, comments - 173, trackbacks - 34

My Links

News

The views expressed in this blog are mine and mine alone, not that of my employer, Microsoft, or anyone else’s. No warrantee is given for the quality of any material on this site.

Archives

Post Categories

Using ROW_NUMBER() to eliminate the multiplicative effects of a join

Problem:

It is a common situation to have a header/detail tables in SQL Server. Additionally sometimes you include a calculated value on the header table to make reporting easier. For Example lets say you had an SalesPerson table that had a column for total sales. Now you also have a SalesDetails table that lists every sale that the company made. The purpose of having the total sales column is that it eliminates the need to touch the larger SalesDetails table when you just need to get the total by sales person. This is a good thing until the user asks for a report that has a column for both the total sales of the sales person and a line for each of the SalesDetails. Maybe a picture will help explain. 

So here is the setup. As you can see George has sold 5 items for $100 each for a total of $500.

image

Now when we are asked to produce the report that joins these two together we get the following result:

image

Now if someone who gets this report sums up the total sales column they get $5,000, the wrong answer. What we need to do is zero out the total sales on each row except for the first row. So our results will look something like this.

image

Now the user of this report will get the same answer if they sum up the total sales column and sale amount column. So now we know what we want how do we get there in SQL Server.

Well if we could number the SalesPerson Records by EmplID, then we would have the ability to zero out the records on all rows except for the first one. For example something like this. . .

image

Solution:

ROW_NUMBER() to the rescue. MSDN defines ROW_NUMBER() as “Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.” (http://msdn.microsoft.com/en-us/library/ms186734.aspx).

Here is my starting point, a standard join to put the two tables together.

image

As you can see the ROW_NUMBER function takes 2 arguments. The second is the ORDER BY, that argument tells ROW_NUMBER what order we want the results in. In this application this is irrelevant since each SalesPerson record will be the same. The first argument PARTITION BY tells SQL Server to restart the count every time it sees a change in a column or set of columns. In our case we want to restart the count for each SalesPerson.

image

So the next step is to zero out the TotalSales on all the rows except for the first one.

image

Yes! Now our users can sum up the TotalSales column or the SaleAmt column and get the same answer! Another happy report user = another happy developer!

Print | posted on Friday, July 03, 2009 12:33 PM | Filed Under [ SQL ]

Feedback

Gravatar

# re: Using ROW_NUMBER() to eliminate the multiplicative effects of a join

I have a problem somehow close to what you have done here. In my case I just wanted to display the ONLY 1 row from the table that contained the multiple records

SELECT M.PropertyId
,M.Email
,M.PropertySummary
,M.PropertyLocation
,M.PropertyType
,M.PropertyStatus
,M.BedRooms
,M.Price
,P.BathRooms
,P.PropertyAddress
,P.AvailableTitles
,PP.PictureName
FROM MetaData AS M INNER JOIN Property AS P ON M.PropertyId=P.PropertyId
INNER JOIN PropertyPictures AS PP ON P.PropertyId=PP.PropertyId WHERE PP.Email='s@sail.com'
9/22/2011 8:54 AM | kamal
Gravatar

# re: Using ROW_NUMBER() to eliminate the multiplicative effects of a join

Kamal: Yes, your issue is similar to the one I talk about in this article. What you could do is get your query to look like what I have in the second step (i.e. with the actual row numbers) then just add a where clause to "filter off" all records with a row number != 1. Hope that helps point you in the right direction. . . .
9/22/2011 9:03 AM | shawnweisfeld
Gravatar

# re: Using ROW_NUMBER() to eliminate the multiplicative effects of a join

Thanks Shawn.

I tried this out and this is the error displayed Msg 207, Level 16, State 1, Line 17 Invalid column name 'RN'.

Meanwhile when i removed the AND RN = 1, it displays all the duplaicate records and set the values in the RN columns to 1 for the records


SELECT M.PropertyId
,M.Email
,M.PropertySummary
,M.PropertyLocation
,M.PropertyType
,M.PropertyStatus
,M.BedRooms
,M.Price
,P.BathRooms
,P.PropertyAddress
,P.AvailableTitles
,PP.PictureName
,(SELECT ROW_NUMBER() OVER(PARTITION BY PP.PropertyId ORDER BY PP.Email)) AS RN
FROM MetaData M, Property P, PropertyPictures PP WHERE M.PropertyId=P.PropertyId AND
P.PropertyId=PP.PropertyId AND PP.Email='samson@gmail.com' AND RN = 1
9/22/2011 12:37 PM | kamal
Gravatar

# re: Using ROW_NUMBER() to eliminate the multiplicative effects of a join

Kamal we are getting closer :)

I am thinking that the reason you have 1's for all your row numbers is your Partition By value. Please try changing PP.PropertyId to something else, something that is unique for each group of records.

as for the where try nesting the queries. . . . something like this

SELECT *
FROM ( -- insert your query here without the RN filter --) AS A
WHERE A.RN = 1
9/22/2011 4:07 PM | shawnweisfeld
Gravatar

# re: Using ROW_NUMBER() to eliminate the multiplicative effects of a join

Thanks Shawn for your support.

Apparently, a friend came up with this and it works

SELECT M.PropertyId
,M.Email
,M.PropertySummary
,M.PropertyLocation
,M.PropertyType
,M.PropertyStatus
,M.BedRooms
,M.Price
,P.BathRooms
,P.PropertyAddress
,P.AvailableTitles
,PP.PictureName
,PP.PropertyId
FROM MetaData AS M INNER JOIN Property AS P ON M.PropertyId=P.PropertyId
INNER JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY PropertyId ORDER BY PropertyId) AS Rn,
* FROM PropertyPictures )AS PP
ON P.PropertyId=PP.PropertyId
AND PP.Rn=1 AND PP.Email='samn@gmail.com'
9/23/2011 9:17 AM | kamal
Gravatar

# re: Using ROW_NUMBER() to eliminate the multiplicative effects of a join

Kamal, happy it all worked out!
9/25/2011 11:53 AM | shawnweisfeld

Post Comment

Title  
Name  
Email
Url
Comment   
Please add 5 and 5 and type the answer here:

Powered by: