Create a Unique List in Excel - Feature Image

Create a Unique List in Excel based on Criteria

Nathan is working on a spreadsheet that contains a list of car models and owners. He needs to create a unique list of owners per car. “Maybe I can use an IF() formula,” he thinks to himself and decides to give it a go. However, after several failed attempts he decides to search the web to see how people create a unique list in Excel based on a specific condition. He comes across a formula that looks pretty complicated, but others had claimed it worked for them, so Nathan decides to give it a whirl. Then he copy/pastes the formula in a cell and adjusts the ranges for his worksheet. After that he copies the formula down to create the list. It works! “Perfect,” Nathan excitedly thinks to himself, “now I’ll just copy that to the right…” and when he does, the formula breaks. He tries modifying the formula, but it’s too advanced for him. One small change upsets the formula and returns errors. Frustrated and tired, he reluctantly decides he’ll just create the list by hand.

Does this sound familiar? We’ve all been here at one point or another. The problem with these complicated formulas is that most people don’t understand what they’re copying / pasting into Excel. This leads to many tries and retries, only to lead to frustration. Excel has literally hundreds of formulas and it’s easy to get lost if you have a lot of nested functions to work with. It’s great that people share their clever formulas online and the internet is full of helpful advice, but if you find a formula online and simply copy / paste it without understanding how it works, you limit yourself from the power it has, and therefore, you limit the power that you have.

In this post, I’d like to walk us through how to understand one of these complicated formulas: how to create a unique list in Excel based on criteria. Let’s get started.

The Problem

Let’s say you have this basic info:

How to Create a Unique List - Setup

Download this workbook to follow along.

On the left is an Excel Table named Cars (if you’re not familiar with Excel Tables, click here). This will be our data source we want to create a list from.

On the right side of the worksheet is where we will make that list, with each car in its own column. Note that there are duplicate cars owned by different people in the data source – this can make it difficult to create a unique list of car owners, which is why we need a more advanced formula.

We also want the formula needs to be dynamic, meaning that we can copy it down and to the right to create the unique lists for each car model.

How do you think you would construct this kind of formula?

The Solution

Here’s what the resulting formula will look like:

=IFERROR(INDEX(Cars,SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(1:1)),2),"")

Please note that this is an array formula and must be entered by pressing Ctrl+Shift+Enter. Click here to learn more about array formulas.

When we enter this in E2, we get Susan as the first answer.

How to Create a Unique List - Solution

Then copy down and to the right and we have our solution:

How to Create a Unique List - Solution Expanded

Before You Simply Copy / Paste!

Remember what we discussed in the intro? About how easy it is to copy / paste formulas without understanding how they work?

How easy is it to copy / paste answers like these?

Very easy.

And how much power does doing that have?

Very little.

Don’t you want to harness the power of building complex formulas?

fry - go on

How does this formula work?

Let’s break this formula down piece-by-piece to understand what’s really going on.

First, the main IFERROR() function is there to handle any errors and replaces them with a blank cell. Let’s now focus on the part of the formula that’s doing the real work.

The Index() Function’s Role

The INDEX() formula works like so:
INDEX(arrayrow, column)

Here is our INDEX() formula for the unique list we are trying to create:

INDEX(Cars,SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(1:1)),2)
      |__| |___________________________________________| |
       |                       |                         |
      array                   row                     column

If you’re not familiar with the INDEX() function, check out this link to learn more.

Basically, the INDEX() function will look at a table (the array), then based on the ROW and COLUMN you give it, a single value will be returned. This single value will be one of the items in our unique list. The secret here is to manipulate the ROW portion, which we will dig into shortly.

The array that we reference is the Cars table.

The ROW portion of the formula has a more complicated formula which allows us to get a list of non-repeating values from the data set. This is actually where the Array Formula (Ctrl+Shift+Enter) is required and where the unique list comes from.

The COLUMN portion simply says 2, which means “use the 2nd column from the array (table) I’ve given you.” While the 2nd column in our case is column B, the formula actually refers to the 2nd column of the array you gave it. If we had our table starting in cell E5, the 2nd column of the table would be in column F.

The Small() Function’s Role

Let’s break down the ROW portion of the INDEX() function a bit further.

The SMALL() function simply says:

SMALL(data_set, k-th_smallest_item)

Here is the formula:

SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(1:1))
      |___________________________| |______|
                    |                  |
                data_set        k-th_smallest_item

Here’s a quick sample formula to better understand how the SMALL() function works:

=SMALL({4,5,1,3}, 2)

The data set is an array of 4, 5, 1, and 3. The k-th_smallest_item of the formula is set to 2. This returns the second smallest item of the set, which is 3.

Both portions of our SMALL() function have another formula in them. Let’s dig into this even further (we’re almost there, I promise).

The If() Function’s Role in the data_set Portion

IF(Cars[Car]=E$1,ROW(Cars)-1)

This is the part that needs the array formula (Ctrl+Shift+Enter) and this is also where things get a little tricky, but it’s not as bad as you might think.

This function is looking in the Car column of the Cars table and looking for a match from cell E1, which is the Car’s make and model. If True, we want to return the row number of where that Car was found. We use the ROW() function to accomplish that. Right now, we’re looking to find the owners of the Honda Civic, which are Susan and Miranda. To get that, we first need to find all of the rows that match the car that we’re looking for, then we will simply move 1 cell to the right to get the name of the owner. All matching rows of cars will have the corresponding owners in the next cell and this is how we will get our unique list.

A Note about the word 'Rows'
I’d like to take a second and explain something about the word “Rows”.

When working with Excel Tables (or any “data array”) it’s important to realize that the first “row of data” is actually considered Row 1 regardless of whether the data starts at A1 or Z35. When using the INDEX() function and your data is in C5:D10 and you have headers in C5 and D5, then your data begins at C6 and ends at D10. Even though on the worksheet it’s the 6th row for where the data begins, it’s still considered Row 1 in your INDEX() function (and other functions that work with arrays).

Because of this confusion, I will refer to rows in an array as “data rows” and rows in the worksheet as “worksheet rows”.

The reason for the -1 is because if you take the Honda Civic as an example, that text first appears on row 2 of the worksheet. While the Excel Table’s data begins on Row 2, the first “row of data” is still considered “row 1” when working with the table (these are data rows). The ROW() function only returns rows of the worksheet, regardless of any data you may be referencing. And since ROW() will return 2 for Honda Civic, we need to adjust for how INDEX() is working with the data rows.

For finding Susan and Miranda, those would be data rows 1 and 4 because the data starts in A2 and ends in B8.

Here’s how this IF() function is broken down:

IF(Cars[Car]=E$1,ROW(Cars)-1)

Cars[Car] becomes:

IF({"Honda Civic";"Toyota Corolla";"Ford Mustang";"Honda Civic";"Honda CRV";"Toyota Corolla";"Honda CRV"}=E$1,ROW(Cars)-1)

E$1 becomes:

IF({"Honda Civic";"Toyota Corolla";"Ford Mustang";"Honda Civic";"Honda CRV";"Toyota Corolla";"Honda CRV"}="Honda Civic",ROW(Cars)-1)

Notice that I highlighted the matches that will be made. This will correspond to how the condition section is evaluated in the IF() function:

IF({TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE},ROW(Cars)-1)

Now, the ROW(Cars)-1 part gets evaluated:

IF({TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE},{1;2;3;4;5;6;7})

Also, since we didn’t include a value_if_false section for the IF statement, Excel gives us FALSE be default. The resulting IF formula is actually:

IF({TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE},{1;2;3;4;5;6;7},{FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE})

The TRUE matches in the first parameter of the IF statement will correspond to the same item in the value_if_true section, whereas if FALSE is found in the IF statement, we simply get FALSE as the IF formula is building up the resulting array. The resulting formula after the IF statement is evaluated is:

{1;FALSE;FALSE;4;FALSE;FALSE;FALSE}

What this says is for the Honda Civic, we found Honda Civic in data rows 1 and 4 (which correspond to the data rows where Susan and Miranda are).

The Row() Function’s Role in the k-th_smallest_item Portion

Coming back to our SMALL() function:

SMALL({1;FALSE;FALSE;4;FALSE;FALSE;FALSE},ROW(1:1))

The part:

ROW(1:1)

Will simply return 1 when we put the formula in cell E2, like we are. When we copy the formula down, it will become ROW(2:2) which will return 2.

Because this formula is sitting in the k-th_smallest_item portion of the SMALL() function, when we copy the formula down, instead of using the first match of who has a Honda Civic (i.e. Susan) we will use the 2nd owner (i.e. Miranda).

SMALL({1;FALSE;FALSE;4;FALSE;FALSE;FALSE},1)

This will now simply pick the 1st smallest item in the set, which is 1. Note that it ignores the FALSE items in the set.

The result for the SMALL() function is simply:

1

Putting it all back together

How to Create a Unique List - Solution

Coming back to the formula:

INDEX(Cars,SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(1:1)),2)

When this is in cell E2, the evaluation is:

INDEX(Cars,1,2)

Which means to say, “in the Cars table, return data row 1 and data column 2,” corresponding to Susan. That’s what we get for the first formula.

When we copy the formula down by one cell (in E3), here is the breakdown (with changes highlighted):

INDEX(Cars,SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(2:2)),2)
INDEX(Cars,SMALL({1;FALSE;FALSE;4;FALSE;FALSE;FALSE},2)),2)
INDEX(Cars,4,2)
Miranda

Notice here that since ROW(1:1) changed to ROW(2:2) that it’s simply adjusting our k-th_smallest_item in the SMALL() function, which is how we can avoid creating duplicates.

Also notice that when you copy to the right, the ONLY thing that changes is the column in the IF() function:

INDEX(Cars,SMALL(IF(Cars[Car]=F$1,ROW(Cars)-1),ROW(1:1)),2)

I didn’t lock this column so we can simply copy to the right and have the rest of the worksheet adapt correctly.

What happens if you copy down or to the right too far?

You’ll get a #NUM! error, which is why we put the IFERROR() portion in there (to make any error just show a blank string…essentially hiding the errors).

Are you ready to master Excel Dashboards?

Learn Excel Dashboard Course Power BI Webinar

Still not convinced? Check out my review of the course!

  • Amar Najal

    Hello Joseph. I am confused in the formula “IF(Cars[Car]=E$1,ROW(Cars)-1)”, there’s no condition set if the statement is false, how come it did not give a “FALSE” value in return? =)

    • Joseph

      Hi Amar,

      You’re right. I should have included that as part of the evaluation. I’ll update the post to show the full evaluation of the formula. Thanks for pointing this out!

      Joseph

  • Scottie

    Hi Joseph,
    I’ve found your article to be very good. But i was wanting to confirm if there is a way to add a secondary or even a third criteria to help define a more distinct list.

    • Scottie

      After doing some more research looks like it was because i was trying to do “and” in the array changing to * to simulate “and” has allowed me to define multiple criteria and return the correct results.

      • Hunter

        Can you provide more information on this? I also would like to do multiple criteria and am having trouble inserting the “AND” formula into the IF statement. Any help would be much appreciated.

        • Joseph

          Hi Hunter,

          Could you please provide the formula you are trying to use and at which part you are stuck?

          Thank you,
          Joseph

  • Deborah Good

    Is there a way to modify this formula so that the resulting list skips blanks?

    • Joseph

      Hi Deborah,

      Sorry I missed this, but in case anyone else is wondering, here’s how to do it:

      =IFERROR(INDEX(Cars,SMALL(IF(Cars[Car]=E$1,IF(Cars[Person]””,ROW(Cars)-1)),ROW(1:1)),2),””)

      Note that this will skip blanks in the second column of Cars[Person]. You could change this to Cars[Car] to skip entries that don’t have a car.

      Initially, I thought that I could use AND(...) in the IF statement inside the SMALL() function, but the AND() function doesn’t work well when comparing two arrays. That’s why you have to use a nested IF statement here (or at least, that’s one solution to this).

      • Tim Newman

        Thank you for this, Joseph. Very informative. Is there to change this to only return unique values? For instance, if cell C5 of your sheet was also “Susan”, your formulas in column E would only return Susan one time (followed by blanks).

        Currently I am using =IFERROR(INDEX(TB!$I$2:$I$92,MATCH(0,INDEX((LEN(TB!$I$2:$I$92)=0)+COUNTIF($B$7:B7,TB!$I$2:$I$92),0),0)),””) to create a dynamic list of unique values, but I’m not sure how to add in a lookup criteria for column H. I was trying to merge some of your logic in this post with my formula, but have been unsuccessful.

        Any help would be greatly appreciated.

        • Joseph

          Hi Tim,

          I’m glad you liked the post and thanks for the feedback. I do appreciate it.

          I think I understand your question. Using the example that you gave where we have 2 Susans in the list, here’s how you can change the formula (entered in E2 as an array formula):

          =IFERROR(INDEX(Cars[Person],MATCH(0,IF(Cars[Car]=$E$1,COUNTIF($E$1:$E1,Cars[Person])),0)),””)

          It’s very similar to what you had, but the main difference is that the COUNTIF() portion is wrapped around an IF statement that holds the criteria.

          I hope this helps.
          Joseph

          • Tim Newman

            Thank you, Joseph! I had to toy with it a bit to get it to work on my spreadsheet but was able to get it working. Thanks again!

          • Joseph

            Sure thing, Tim 🙂 glad to help!

  • Edd

    Hi, I’ve appropriated this formula to create a list of case numbers with an action due on a particular date. I’ve successfully done it for today or tomorrow (based on a cell with that value in), but would like to create a list of case actions coming up in the next few working days. I’ve tried inserting an AND(>2days time,D$14),(‘Complaints Log’!$AC$2:$AC$100<$P$14)),ROW('Complaints Log'!$A$2:$AJ$100)-1),ROW(1:1)),4)," ")

    It seems to only return the first case on the list (which actually has an action due today in my test scenario).

    If anyone can point out where I have gone wrong I would be very grateful.

    Kind regards,

    Edd

    • Edd

      =IFERROR(INDEX(‘Complaints Log’!$A$2:$AJ$500,SMALL(IF(((‘Complaints Log’!$AC$2:$AC$500>D$14)*(‘Complaints Log’!$AC$2:$AC$500<$P$14)),ROW('Complaints Log'!$A$2:$AJ$500)-1),ROW(1:1)),4)," ")

      Just in case anyone reads and could follow my problem above, I have solved it. I'm sure I had tried this before (a few times) without success, but it now works. First time I got it to work I'd reduced the data range to A2:A10, thinking it would help me follow the evaluate tool, but it worked. I used find and replace to change it to A2:AJ500 and it still worked. The range is large enough, fortunately we dont get THAT many complaints!

      • Joseph

        Hi Edd,

        Glad to see you got it sorted out. I think I know why you were running into those issues. I believe it’s because when you do:

        AND(bool1,bool2)

        It won’t work with arrays, so this would work return True:

        AND(TRUE,TRUE)

        While this would return just False instead of {TRUE;FALSE}

        AND({TRUE;FALSE},{TRUE;FALSE})

        The AND() function does not seem to work well with arrays. It probably evaluates the first array constant and when it finds false, it calls the whole thing false.

        Also, if you use this:

        ={TRUE;FALSE}*{TRUE;FALSE}

        And put that in cells A1 and A2 as an array formula, it will result in:

        1
        1

        But if you surround each array constant with parenthesis like this:

        =({TRUE;FALSE})*({TRUE;FALSE})

        A1 and A2 will have:

        1
        0

        Which is what I would expect. So it seems that operator precedence is making a difference in the array formula here, which is how I think you solved your problem, by using the boolean workaround with *, but adding the parenthesis.

        Pretty interesting stuff. I should probably write a post on this topic 🙂

  • Kazh

    Hi, I wanted to get a list (column) from headers in another Excel table, and it’s only thanks to this very detailed article that I could understand how to tweak it to my liking.

    I case it could help: (From an Excel Table to another Excel Table)

    =IFERROR(INDEX(Courses[#Headers],1,SMALL(IF(IFERROR(SEARCH(“2″,Courses[[#Headers];[Intermarché]:[Cf2]]),0)=0,COLUMN(Courses[[#Headers];[Intermarché]:[Cf2]])),ROW()-1)),””)

    Basically, from the headers in the “Courses” table, I wanted a list with all the headers that didn’t have “2” in it. The headers in question were between the columns “Intermarché” and “Cf2”. Considering the source data was in a row instead of a column, there was a need to switch the “SMALL” formula to the column value of the INDEX formula, and keep “1” for the row value (as I’m dealing with headers). “ROW()-1” works very well in an Excel table.

    Thanks so much !

    P.S.: Might be some syntax mistakes in the formula as it came from a French Excel

    • Joseph

      Wow, that’s an interesting solution to a complex problem there, nice work! I’m glad I was able to help you out 🙂