I was inspired to write my own blog post on generating CSV output in Postgresql after researching the topic and finding numerous posts with wrong answers. I hope that this post will be useful to others and also to myself the next time I want to create CSV output in Postgresql.
If you just want a CSV dump of an entire table and order is not important, then run:
psql -c "COPY TABLE_NAME TO STDOUT WITH CSV HEADER " > CSV_FILE.csv
where TABLE_NAME is the name of the table you want to dump.
If you can also dump the results of a more complicated query as follows:
psql -c "COPY ( QUERY ) TO STDOUT WITH CSV HEADER " > CSV_FILE.csv
where QUERY is the query you wish to dump. E.g.
psql -c "COPY ( SELECT * FROM TABLE ORDER BY id limit 10 ) TO STDOUT WITH CSV HEADER " > CSV_FILE.csv
What Not To Do
The typical naive suggestions involve attempting to generate CSV output using basic SQL. These approaches will generate broken CSV files in many cases such as when fields contain quotes and commas. E.g.
psql -A -F ‘,’ -c ‘SELECT * from TABLE limit 10’ > CSV_FILE.csv
A CSV file (despite the name) is not simply a bunch of values separated by values. Generating proper CSV output requires handling a number of complicated corner cases. For example, you must handle the cases in which the values contain commas. Typically this is done by quoting these values. We might try quoting every field, but then what about fields that contain quotes? The point is that, generating proper CSV output is not something you’re going to be able to using a query standard SQL without incredible complexity (if at all). Built in CSV output functionality exists for a reason. Use it!
Many of the blog posts I found that suggested incorrect Postgresql CSV generation techniques contained comments describing the correct approach.
One such comment is listed below: