
It is a simple problem. I have a CSV file with multiple columns. I would like to extract three columns and save the output to a text file.

Sample of my dataset:

page_id post_name link post_type likes_count
5550296508 Ben Carson www.cnn.com shared_story 192583
5830242058 John Smith www.abc.com news_story 467
9485676544 Sara John www.msc.com shared_story 462

I would like to select three columns and save them to a text file with a comma separator. The desired output (or any similar format that shows the columns neatly; it doesn't have to match this exactly):

"page_id","post_name","post_type"
"5550296508","Ben Carson","shared_story"
"5830242058","John Smith", "news_story"
"9485676544", "Sara John", "shared_story"

I tried to use awk:

awk -F',' '{print $1,$2,$4}' Data.csv > output.txt 

It returns this output with a blank space between the columns. I would like to replace the blank space with a comma:

page_id post_name post_type
5550296508 Ben Carson shared_story
5830242058 John Smith news_story
9485676544 Sara John shared_story

I tried printf, but I am not sure I am using the correct format string, because it doesn't return the output I want:

awk '{printf "%s,%s,%s", $1,$2,$4}' Data.csv > output.txt 

I also tried sed, but this only replaces the first blank with a comma:

awk -F',' '{print $2,$5,$10}' Data.csv | sed 's/ /,/' > output.txt 

2 Answers


You can use the command below to separate the fields with commas:

awk '{print $1","$2","$4}' Data.csv > output.txt 

The output will be:

page_id,post_name,post_type
5550296508,Ben,www.cnn.com
5830242058,John,www.abc.com
9485676544,Sara,www.msc.com
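Note that with awk's default whitespace splitting, a name such as Ben Carson counts as two fields, so $4 lands on the link column rather than post_type. A minimal sketch of one way around this, assuming the file is tab-separated (the file name sample.tsv and its contents are made up for illustration), sets both the input and the output separator:

```shell
# Build a small tab-separated sample (hypothetical file name and data).
printf 'page_id\tpost_name\tlink\tpost_type\tlikes_count\n' > sample.tsv
printf '5550296508\tBen Carson\twww.cnn.com\tshared_story\t192583\n' >> sample.tsv
printf '5830242058\tJohn Smith\twww.abc.com\tnews_story\t467\n' >> sample.tsv

# FS splits the input on tabs; OFS is what print places between
# its comma-separated arguments.
result=$(awk 'BEGIN { FS = "\t"; OFS = "," } { print $1, $2, $4 }' sample.tsv)
printf '%s\n' "$result"
```

If the real file uses a different delimiter, only the FS value should need to change.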

Your input file is not comma-separated. I am guessing that it is tab-separated. If that is the case, then try:

$ awk -F'\t' '{print "\""$1,$2,$4"\""}' OFS='","' Data.csv
"page_id","post_name","post_type"
"5550296508","Ben Carson","shared_story"
"5830242058","John Smith","news_story"
"9485676544","Sara John","shared_story"

If that is not quite it, then try:

awk -F'\t+' '{print "\""$1,$2,$4"\""}' OFS='","' Data.csv 

How it works

  • -F'\t' tells awk to use tab as the field separator. Alternatively, -F'\t+' tells awk to use any sequence of one or more tabs as a field separator.

  • print "\""$1,$2,$4"\"" tells awk to print a double-quote, followed by field 1, the output field separator, field 2, the output field separator, field 4, and a closing double-quote.

  • OFS='","' tells awk to use "," as the field separator on output.
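The same result can also be sketched with printf, which the question attempted: the format string needs literal quotes around each %s and a trailing \n, since printf does not add a newline by itself. This assumes comma-separated input, as mentioned in the comments, with no quoted fields or embedded commas; sample.csv and its contents are made up for illustration:

```shell
# Hypothetical comma-separated sample; fields are assumed to
# contain no embedded commas or quotes.
cat > sample.csv <<'EOF'
page_id,post_name,link,post_type,likes_count
5550296508,Ben Carson,www.cnn.com,shared_story,192583
9485676544,Sara John,www.msc.com,shared_story,462
EOF

# Each %s is wrapped in literal double quotes; \n ends the record.
quoted=$(awk -F',' '{ printf "\"%s\",\"%s\",\"%s\"\n", $1, $2, $4 }' sample.csv)
printf '%s\n' "$quoted"
```

For real CSV files with quoted fields that contain commas, a simple -F',' split would not be enough, and a CSV-aware tool would be a safer choice.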

  • It works too, and the output looks very neat. Thank you, John. I just had to replace the tab delimiter with a comma: -F','. Commented Sep 22, 2019 at 23:50
  • @leena Very good. Your observation is curious, though: your sample input, as shown in the question ("sample of my dataset"), has no commas in it at all. Commented Sep 23, 2019 at 0:21
  • Yes, I didn't reflect the sample dataset in the best way. Thanks for your help. Commented Sep 23, 2019 at 0:31
