CSV stands for Comma Separated Values where the data lives in a simple text file that is split up by commas. You’ll know you’re dealing with CSV files if the file extension is .csv like myData.csv (this also depends on if you have your OS settings so that you can see what the file type is) or if you open the file in a text editor and notice that the data is separated by a comma. If you use Excel long enough, working with CSV files is inevitable.
Here’s a simple example of what a CSV file looks like if you open it with a text editor (preferably Sublime Text, the best text editor ever).
Name,Age Kyle,25 Leslie,30
When you open this file in Excel, the data gets transferred to the cells and it is split by the comma, which would look something like this:
This is the basic gist of CSV files, but there’s more to it than this, which we’ll cover shortly.
Working with CSV Files
Working with CSV files is pretty straightforward. However, depending on your workflow, there might be some caveats that you may want to watch out for.
Opening a CSV File
If you have a CSV file, you can open it in Excel without much trouble. Just open Excel, click Open and find the CSV file to work with (or right-click on the CSV file and choose Open in Excel). After you open the file, you’ll notice that the data is just plain text put into different cells.
Saving a CSV File
If you want to save your current workbook into a CSV file, you have to use File -> Save As… and choose CSV file. More often than not, you will get this warning:
What Excel is trying to say here is that CSV Files don’t save any kind of formatting at all. Column widths, font styles, colors, etc. will not be saved. Just your plain old data will be saved in a comma-separated file.
Note that even after you save it, Excel will still show the formats that you had, so don’t be fooled by this and think that when you open the workbook again that your formats will still be there. They won’t be.
Even after you open up a CSV file in Excel, if you apply any kind of formatting at all, like adjust the column widths to see the data, Excel will still warn you that you can’t save the formats that you added. You will get a warning like this one:
So beware that your formats will never be saved in CSV Files.
If this article helps solve your problem, please consider supporting me because it takes a lot of effort (and coffee!) to provide this content.
👇 There's a special gift for you in return for your support.
Enjoy the post!
Why CSV Files?
CSV files are used as a way to communicate data between different applications. Say you had a database application and you wanted to export the data to a file. If you want to export it to an Excel file, then the database application would need to support exporting to XLS* files.
However, since the CSV file format is extremely simple and lightweight (much more so than XLS* files), it’s easier for different applications to support it. In its basic usage, you have a line of text, with each column of data split up by a comma. That’s it. And because of this simplicity, it’s easy for developers to create Export / Import functionality with CSV files to transfer data between applications instead more complicated file formats.
While the format is simple, it is not yet an internet standard, even though there is a proposal for one (by the way, it’s the shortest RFC I’ve ever seen, which is pretty nice). There are some problems that need to be overcome with CSV files and some applications handle this in different ways.
For example, what if your data has a comma in it? How do you separate that out? What about line breaks (like Alt+Enter)?
Overcoming Data Complexity
Let’s discuss how to overcome the following two issues with CSV files: commas in the data and handling new lines.
What if there are commas in the data?
Imagine you had this data:
|Star Wars||A long time ago in a galaxy far, far away…|
|What color is your parachute?||In today’s challenging job-market, the time-tested advice of What Color Is Your Parachute? is needed more than ever.|
If you took that data and put it into a CSV file, it would look like this:
Book,Description Star Wars,A long time ago in a galaxy far, far away... What color is your parachute?,In today's challenging job-market, the time-tested advice of What Color Is Your Parachute? is needed more than ever.
However, if you imported that into Excel, it would look like this:
As you can see, even though we want the data in only columns A and B, there is some data that went into column C. This is because of the commas that exist in the data. So how do we get around this?
The commonly accepted solution is to wrap the data inside of quotation marks.
Here’s what the new data file looks like in a text editor, where the data items that have a comma are in quotes:
Book,Description Star Wars,"A long time ago in a galaxy far, far away..." What color is your parachute?,"In today's challenging job-market, the time-tested advice of What Color Is Your Parachute? is needed more than ever."
And when you open it in Excel:
There we go. Now the data is in columns A and B only, as we intended it to be.
What if there is a new line in one of the rows?
Let’s say that you have a new line in your data in an Excel file that looks something like this:
You can see that there is a new line in cell B2, but how would this look in a CSV file?
If you save the workbook as CSV and then open that file in a text editor, you can see what happens:
Data1,Data2 Some item,"Data with a new line in it" Some more,And more stuff
It might be hard to tell, but basically the quotation marks are used again in this example, but this time it’s to allow the newline character to be in the data and span across two rows.
What if there is a quotation mark in my data?
At this point you may be wondering “what happens if there is a quotation in my data?” This is a good question and the answer might be familiar to you if you ever had to put a quotation mark in your formula output.
If your data is not already surrounded by quotes, you can just add the quote. However, if you are using quotes to surround your data because it has a comma in it, then you have to “escape” the quotation mark by adding another quotation mark.
Here are some examples:
Data1,Data2 First,Item with a "quote" in it Second,Item with a "quote and , comma" in it Third,"Item with a ""quote and , comma"" in it" Fourth,"Another item, with a "" quote in it"
And here is the output in Excel:
Let’s break down each item:
These are just the headers. No big deal here.
First,Item with a "quote" in it
The first row of the data. In the second column, the “quote” is within the data and Excel understands to interpret this literally when it processes the file. That’s why you will see the quote actually make it to the cell in B2.
Second,Item with a "quote and , comma" in it
Here, in the second column, you will see that the comma within the quotes actually split out the data into a third column. This is because the entire data field was not encapsulated within quotes, so Excel interpreted this as two different columns of data rather than one.
Third,"Item with a ""quote and , comma"" in it"
Here, we show how to properly do what we were trying to do in the second row of data. We encapsulated the second column in quotes and we escaped the quotes that we wanted in our data by adding an extra quote. Excel interpreted this as we wanted it to, by having only two columns of data and for it to have quotes within the data.
Fourth,"Another item, with a "" quote in it"
This last example shows how to add a quote in the second column that also has a comma within it. So we need to add the encapsulating quotes for the comma so Excel doesn’t split the data up, and we also need to add the escaping quote so that Excel doesn’t think that the data is finished before the word “quote.”
Different Separators in CSV Files
CSV files don’t always have to use a comma to separate the values in the data. Many applications allow you to use a set of commonly used separators or you can specify your own. However, you can’t have multiple separators in a single CSV file. Once you pick a separator, that’s the separator for the entire file.
While there are applications out there that allow you to specify a different character separator, Excel doesn’t have a native way to change the separator. You can, however, change the default list separator in your OS which would work, but I think we’ll have to save that for another post.
I’ve seen the following as typical characters to use for the separator in other applications:
, (comma) TAB (the tab key) ; (semi-colon) | (pipe) ^ (carat)
For me, my favorite to use is the pipe | character. It’s easy to see and it’s almost never used in data (unless you’re saving PowerShell command snippets), making it a unique choice for saving to CSV.
If you are going to use another character other than the comma for separating values, there is an important line that you need to make sure is at the beginning of the file. For example, let’s say we had this data:
Header1|Header2 Item one|Item "two" Item 3|Item 4
The separator is clearly the pipe | character. But if you opened this in Excel, it would look like this:
Notice how all of the data is in column A? This is because Excel is not finding a comma for separating values, which is its default separator. In order to tell Excel to use another separator, you have to add a special line at the beginning of the file:
sep=| Header1|Header2 Item one|Item "two" Item 3|Item 4
By adding the line:
We are telling Excel to use the pipe | character as the separator (sep = separator).
Now when we open the file, it looks like this:
So if you ever run into an issue opening the CSV file, check the file in a text editor and see if that special line is in there.
Hopefully this post helps give you a better idea of how CSV files work, why they are useful, and how you can use them for your own needs.
If you’d like to understand more or think I may have missed something, please let me know in the comments below!