I am working with Twitter text data in JSON format, which I have stored in a text file. I am not interested in retweets, and I created a parser that could extract most of the text, but somehow some retweets also came along. So I am looking for a quick solution to this problem, i.e. a way to remove the text that starts with RT.
A text in the file looks like
`"RT ...... RT ....."`, where "..." stands for the other words in the sentence. I would like to remove only the lines starting with the word "RT" and save the result in another file. The same word RT might appear in the middle of a text that doesn't start with RT; such texts should not be removed. I tried the following command, but I am not entirely sure it is correct:
`grep -v "RT" twitterDataset.txt > clean_RT.txt`
I would really appreciate a solution to this problem, and an explanation of the code would also be helpful.
`grep -v "^RT"`? (`grep` would also remove any key whose name contained `RT`.) Please include a representative sample of your data.
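The anchored pattern can be tried out with a small sketch like the one below. It assumes one tweet text per line and that a retweet line begins with the literal characters `RT` followed by a space; the sample lines are made up for illustration and are not taken from the real dataset.

```shell
# Hypothetical sample input: one tweet text per line (made up for illustration).
printf '%s\n' \
  'RT @someone: this is a retweet' \
  'I saw an RT of this earlier today' \
  'RTFM is not a retweet marker' \
  'RT another retweet' > twitterDataset.txt

# -v inverts the match, so every line that matches the pattern is dropped.
# ^ anchors the pattern to the start of the line, so an "RT" in the middle is kept.
# The trailing space keeps lines whose first word merely begins with RT (e.g. RTFM).
grep -v '^RT ' twitterDataset.txt > clean_RT.txt

cat clean_RT.txt
```

If the lines in the real file begin with a quotation mark, as the sample in the question suggests, the pattern would need to account for it, for example `'^"RT '`. Without the `^` anchor, `grep -v "RT"` drops every line that contains RT anywhere, which is why the original command removes too much.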