How do you get data out of your system and into a proper CSV format? Good question. If you’ve tried this, you know it can be harder than it first appears. We’ll talk about Excel first, because it’s easy. Relational databases can be trickier, so I’ll share some tips for getting good CSVs out of some of the most common relational databases.
<br>Use Excel to Create a CSV
Excel does a decent job of exporting proper CSV files, so if you are starting from an Excel file, you’re in luck. But Excel is limited to handling about a million rows.
Here’s how to use Excel to create a CSV:
Open the Excel file and choose Save As.
Select the format Comma Separated Values (.csv).
If you see a warning that the worksheet contains features that won’t work in CSV, click Continue to save it anyway.
<br>Get a CSV Out of Your Relational Database
As it turns out, most databases do not have good utilities for exporting proper CSV files. This is why there are so many malformed CSV files that exist in this world to cause confusion and pain to people who just want to load some data.
Most database vendors probably don’t spend a lot of effort on exporting perfect CSV files, because they prefer that you keep the data inside their database. So although their loading tools may be great, many of their export tools are lacking.
So how can you get the best possible starting point for your data load, when the source data comes from a relational database?
You may be able to use the database vendor’s SQL, but only if it has enough text processing commands to replace double quotes, and do any other processing you need. Typically, you’ll to use the enclosing character to place double quotes around all the fields, to be on the safe side. This protects you if the delimiter appears in the data values.
If you’re using Postgres, there is a COPY command that will export to CSV.
If you’re using SQL Server the logical approach would be to use bcp, their bulk command copier, but shockingly, you can’t. It just doesn’t work. You’re stuck using the Management Studio GUI, which works fine for small tables, but is not scalable to hundreds of GigaBytes.
In Oracle, there are a few options:
You can do the export with SQL*Plus, which is a bit of a pain because you have to set up all the formatting using SQL*Plus system variables.
You can use Oracle SQL Developer.
There are some helpful utilities, like TOAD, that you can buy from third parties.
Hopefully, this will be enough to get you started on the right track to a clean, perfect CSV export. If not, we’ll tackle the most common problems with CSV files and how to fix them in the final post in this series.
(By the way, if you missed the earlier segments of this blog series, you can check them out here and here.)