# Create a Unique List in Excel based on Criteria

February 23, 2017

Nathan is working on a spreadsheet that contains a list of car models and
owners. He needs to create a unique list of owners per car. “Maybe I can use an
`IF()`

formula,” he thinks to himself and decides to give it a go. However,
after several failed attempts he decides to search the web to see how people
create a unique list in Excel based on a specific condition. He comes across a
formula that looks pretty complicated, but others had claimed it worked for
them, so Nathan decides to give it a whirl. Then he copy/pastes the formula in a
cell and adjusts the ranges for his worksheet. After that he copies the formula
down to create the list. It works! “Perfect,” Nathan excitedly thinks to
himself, “now I’ll just copy that to the right…” and when he does, the formula
breaks. He tries modifying the formula, but it’s too advanced for him. One small
change upsets the formula and returns errors. Frustrated and tired, he
reluctantly decides he’ll just create the list by hand.

Does this sound familiar? We’ve all been here at one point or another. The problem with these complicated formulas is that most people don’t understand what they’re copying / pasting into Excel. This leads to many tries and retries, only to lead to frustration. Excel has literally hundreds of formulas and it’s easy to get lost if you have a lot of nested functions to work with. It’s great that people share their clever formulas online and the internet is full of helpful advice, but if you find a formula online and simply copy / paste it without understanding how it works, you limit yourself from the power it has, and therefore, you limit the power that you have.

In this post, I’d like to walk us through how to understand one of these complicated formulas: how to create a unique list in Excel based on criteria. Let’s get started.

## The Problem

Let’s say you have this basic info:

Download this workbook to follow along.

On the left is an Excel Table named **Cars** (if you’re not familiar with Excel
Tables, click here). This
will be our data source we want to create a list from.

On the right side of the worksheet is where we will make that list, with each car in its own column. Note that there are duplicate cars owned by different people in the data source - this can make it difficult to create a unique list of car owners, which is why we need a more advanced formula.

We also want the formula needs to be dynamic, meaning that we can copy it down and to the right to create the unique lists for each car model.

**How do you think you would construct this kind of formula?**

## The Solution

Here’s what the resulting formula will look like:

`=IFERROR(INDEX(Cars,SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(1:1)),2),"")`

Please note that this is an array formula and must be entered by
pressing `Ctrl+Shift+Enter`

.
Click here to
learn more about array formulas.

When we enter this in `E2`

, we get Susan as the first answer.

Then copy down and to the right and we have our solution:

### Before You Simply Copy / Paste!

Remember what we discussed in the intro? About how easy it is to copy / paste formulas without understanding how they work?

How easy is it to copy / paste answers like these?

*Very easy.*

And how much power does doing that have?

*Very little.*

Don’t you want to harness the power of building complex formulas?

### How does this formula work?

Let’s break this formula down piece-by-piece to understand what’s *really* going
on.

First, the main `IFERROR()`

function is there to handle any errors and replaces
them with a blank cell. Let’s now focus on the part of the formula that’s doing
the real work.

#### The Index() Function’s Role

The `INDEX()`

formula works like so: `INDEX(array, row, column)`

Here is our `INDEX()`

formula for the unique list we are trying to create:

```
INDEX(Cars,SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(1:1)),2)
|__| |_________________________________________| |
| | |
array row column
```

If you’re not familiar with the `INDEX()`

function, check out
this link
to learn more.

Basically, the `INDEX()`

function will look at a table (the `array`

), then based
on the `ROW`

and `COLUMN`

you give it, a single value will be returned. This
single value will be one of the items in our unique list. The secret here is to
manipulate the `ROW`

portion, which we will dig into shortly.

The `array`

that we reference is the Cars table.

The `ROW`

portion of the formula has a more complicated formula which allows us
to get a list of non-repeating values from the data set. This is actually where
the
Array Formula
(`Ctrl+Shift+Enter`

) is required and where the unique list comes from.

The `COLUMN`

portion simply says `2`

, which means “use the 2nd column from the
array (table) I’ve given you.” While the 2nd column in our case is column `B`

,
the formula actually refers to the 2nd column of the array you gave it. If we
had our table starting in cell `E5`

, the 2nd column of the table would be in
column `F`

.

#### The Small() Function’s Role

Let’s break down the `ROW`

portion of the `INDEX()`

function a bit further.

The `SMALL()`

function simply says:

`SMALL(data_set, k-th_smallest_item)`

Here is the formula:

```
SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(1:1))
|___________________________| |______|
| |
data_set k-th_smallest_item
```

Here’s a quick sample formula to better understand how the SMALL() function works:

`=SMALL({4,5,1,3}, 2)`

The data set is an array of `4, 5, 1, and 3`

. The `k-th_smallest_item`

of the
formula is set to `2`

. This returns the second smallest item of the set, which
is `3`

.

Both portions of our `SMALL()`

function have another formula in them. Let’s dig
into this even further (we’re almost there, I promise).

##### The If() Function’s Role in the data_set Portion

`IF(Cars[Car]=E$1,ROW(Cars)-1)`

This is the part that needs the array formula (`Ctrl+Shift+Enter`

) and this is
also where things get a little tricky, but it’s not as bad as you might think.

This function is looking in the Car column of the Cars table and looking for a
match from cell `E1`

, which is the Car’s make and model. If `True`

, we want to
return the row number of where that Car was found. We use the `ROW()`

function
to accomplish that. Right now, we’re looking to find the owners of the Honda
Civic, which are Susan and Miranda. To get that, we first need to find all of
the rows that match the car that we’re looking for, then we will simply move 1
cell to the right to get the name of the owner. All matching rows of cars will
have the corresponding owners in the next cell and this is how we will get our
unique list.

A Note about the word ‘Rows’I’d like to take a second and explain something about the word “Rows”.

When working with Excel Tables (or any “data array”) it’s important to realize that the first “row of data” is actually considered Row 1 regardless of whether the data starts at A1 or Z35. When using the

`INDEX()`

function and your data is in`C5:D10`

and you have headers in`C5`

and`D5`

, then your data begins at`C6`

and ends at`D10`

. Even though on the worksheet it’s the 6th row for where the data begins, it’s still considered Row`1`

in your`INDEX()`

function (and other functions that work with arrays).Because of this confusion, I will refer to rows in an array as “data rows” and rows in the worksheet as “worksheet rows”.

The reason for the `-1`

is because if you take the `Honda Civic`

as an example,
that text first appears on row `2`

of the worksheet. While the Excel Table’s
data begins on Row `2`

, the first “row of data” is still considered “row 1” when
working with the table (these are data rows). The `ROW()`

function only returns
rows of the worksheet, regardless of any data you may be referencing. And since
`ROW()`

will return `2`

for `Honda Civic`

, we need to adjust for how `INDEX()`

is working with the data rows.

For finding `Susan`

and `Miranda`

, those would be data rows `1`

and `4`

because
the data starts in `A2`

and ends in `B8`

.

Here’s how this `IF()`

function is broken down:

`IF(Cars[Car]=E$1,ROW(Cars)-1)`

`Cars[Car]`

becomes:

`IF({"Honda Civic";"Toyota Corolla";"Ford Mustang";"Honda Civic";"Honda CRV";"Toyota Corolla";"Honda CRV"}=E$1,ROW(Cars)-1)`

`E$1`

becomes:

```
IF({"Honda Civic";"Toyota Corolla";"Ford Mustang";"Honda Civic";"Honda CRV";"Toyota Corolla";"Honda CRV"}="Honda Civic",ROW(Cars)-1)
------------- ------------- -------------
```

Notice that I underlined the matches that will be made. This will
correspond to how the condition section is evaluated in the `IF()`

function:

```
IF({TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE},ROW(Cars)-1)
---- ----
```

Now, the `ROW(Cars)-1`

part gets evaluated:

```
IF({TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE},{1;2;3;4;5;6;7})
---- ---- - -
```

Also, since we didn’t include a `value_if_false`

section for the `IF`

statement,
Excel gives us `FALSE`

be default. The resulting `IF`

formula is actually:

`IF({TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE},{1;2;3;4;5;6;7},{FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE})`

The `TRUE`

matches in the first parameter of the `IF`

statement will correspond
to the same item in the `value_if_true`

section, whereas if `FALSE`

is found in
the `IF`

statement, we simply get `FALSE`

as the `IF`

formula is building up the
resulting array. The resulting formula after the `IF`

statement is evaluated is:

```
{1;FALSE;FALSE;4;FALSE;FALSE;FALSE}
- -
```

What this says is for the Honda Civic, we found Honda Civic in data rows `1`

and `4`

(which correspond to the data rows where Susan and Miranda are).

##### The Row() Function’s Role in the k-th_smallest_item Portion

Coming back to our `SMALL()`

function:

`SMALL({1;FALSE;FALSE;4;FALSE;FALSE;FALSE},ROW(1:1))`

The part:

`ROW(1:1)`

Will simply return `1`

when we put the formula in cell `E2`

, like we are. When
we copy the formula down, it will become `ROW(2:2)`

which will return `2`

.

Because this formula is sitting in the `k-th_smallest_item`

portion of the
`SMALL()`

function, when we copy the formula down, instead of using the first
match of who has a `Honda Civic`

(i.e. `Susan`

) we will use the 2nd owner (i.e.
`Miranda`

).

`SMALL({1;FALSE;FALSE;4;FALSE;FALSE;FALSE},1)`

This will now simply pick the 1st smallest item in the set, which is `1`

. Note
that it ignores the `FALSE`

items in the set.

The result for the SMALL() function is simply:

`1`

### Putting it all back together

Coming back to the formula:

`INDEX(Cars,SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(1:1)),2)`

When this is in cell `E2`

, the evaluation is:

`INDEX(Cars,1,2)`

Which means to say, “in the Cars table, return data row 1 and data column 2,”
corresponding to `Susan`

. That’s what we get for the first formula.

When we copy the formula down by one cell (in `E3`

), here is the breakdown (with
changes underlined):

```
INDEX(Cars,SMALL(IF(Cars[Car]=E$1,ROW(Cars)-1),ROW(2:2)),2)
--------
INDEX(Cars,SMALL({1;FALSE;FALSE;4;FALSE;FALSE;FALSE},2)),2)
- -
INDEX(Cars,4,2)
Miranda
```

Notice here that since `ROW(1:1)`

changed to `ROW(2:2)`

that it’s simply
adjusting our `k-th_smallest_item`

in the `SMALL()`

function, which is how we
can avoid creating duplicates.

Also notice that when you copy to the right, the ONLY thing that changes is the
column in the `IF()`

function:

`INDEX(Cars,SMALL(IF(Cars[Car]=F$1,ROW(Cars)-1),ROW(1:1)),2)`

I didn’t lock this column so we can simply copy to the right and have the rest of the worksheet adapt correctly.

#### What happens if you copy down or to the right too far?

You’ll get a `#NUM!`

error, which is why we put the `IFERROR()`

portion in there
(to make any error just show a blank string…essentially hiding the errors).