Chomper Stomping jQuery/JavaScript/CSS 3/HTML 5, Java/PHP/Python/ActionScript, Git, Chrome/Firefox Extensions, Wordpress/Game/iPhone App Development and other random techie tidbits I've collected

20May/088

How to ORDER BY before GROUP BY with MySQL

I've got a table that tracks inventory at multiple vendors for multiple products by timestamp. I want to only select the most recent inventory count per vendor by product.

One way to do this is with a bunch of while loops and hitting the DB over and over, or selecting a crap load of data and just dropping out everything but the most recent inventory count manually. Both of these are viable solutions with a lot of overhead, but what I really want is just a silver bullet SQL statement that does this for me. I think I have it.

What I initially envisioned was:

SELECT *
FROM INVENTORY
ORDER BY vendors_no,
timestamp DESC
GROUP BY vendors_no,
product_no

However, if you are familiar with MySQL, you will know that you cannot put ORDER BY before GROUP BY. You have to GROUP BY and then ORDER BY.

It appears that the way that GROUP BY works is that it takes the first row of each "group" and just drops the rest. So whichever row happens to be first in the DB when the SQL is executing will get grabbed and the rest will get dropped. If I had wanted to get the first inventory count for each vendor/product, this would have been ok, since those would have been the first row, but it's going to be the last row each time. So what to do? If only I could flip the whole table on it's head or something...

Actually, I kind of can! This is what I came up with:

SELECT *
FROM(
SELECT *
FROM INVENTORY
ORDER BY vendors_no,
timestamp DESC
) as inv
GROUP BY vendors_no,
product_no

If I understand GROUP BY correctly and ORDER BY correctly (which I very well may not) the sub-query will first basically create a temporary table with the results in reverse timestamp order. Then when GROUP BY is going through and finding the first vendor_no and dropping the rest it will be keeping the most recent one.

If anyone knows any reason why this is stupid or wrong, please let me know. It's the best solution I've been able to find so far.

***UPDATED: In response to Brian's suggestion***

Dang. I really wanted that to work, but it didn't.

Here is the original SQL:

Here is the SQL you suggested:

Did I do something wrong?

Here is the total data being queried:

Thanks for the suggestion!

Comments (8) Trackbacks (0)
  1. Have you tried the MAX function?

    Select MAX(timestamp) as latestDate, inventoryCount, vendor, product From Inventory Group By product, vendor

    I use the MAX/MIN functions a lot specifically for the reason you are referring to. I don’t know from your post exactly how your data is set up, and I usually like to test my SQL statements so you might have to tweak this a bit.

    I also use Group By quite a bit, and it’s for using aggregate functions like MAX/MIN. It’s not necessarily “dropping” the rows, it’s just not finding them since you did not use an aggregate in your From clause. If you use the SQL I posted w/o the Group By, you will get get a resultset with all the inventoryCounts, vendors, and products, but they will all have the same timestamp (the latest date in the entire table). That’s why you have to use the Group By because you’re telling it you want the latest date per product, per vendor.

    Your method will work too, but it is not as efficient – but that might not matter depending on how much data you have.

  2. Thanks for the suggestion!

    Updated post…

  3. superb awesome mysql query

  4. I’m not very advanced with MySQL and I searched for hours trying to GROUP BY before ORDER BY, and it seemed impossible to do without a sub query. I didn’t want to do that because of server load, so instead I created a temporary array in my php result loop that dumps the row if the vendor ID has been used before:

    // CHECK ROW
    if(@$tempArray[$row[VendorID]] == 1){
    // SKIP
    }else{
    $tempArray[$row[VendorID]] = 1;
    // DISPLAY TABLE ROW HERE
    }

  5. thanks for select :)

    @Brian: you can not use MAX if you order by date/time

  6. this is an obvious and straightforward solution, however if you run this on a large table (mine was only 350k rows) you get mysql error 28. I ran this on a pretty good dedicated server with some 100gb disk space available and still got the error .. need to find a work around

  7. A possibly simpler solution is to add an having clause that only keeps the records with a timestamp = max(timestamp), something like this:

    SELECT *
    FROM INVENTORY
    GROUP BY vendors_no, product_no
    HAVING timestamp=max(timestamp)
    ORDER BY vendors_no DESC

  8. See some related thing here ? use of group before order by:

    http://arfeen.net/2012/10/mysql-use-of-group-after-order-by/


Leave a comment

No trackbacks yet.