I have been looking into this for a while, but I haven't found an answer yet.

I've got a curl command that sends an HTTP POST request to a server, and I have put it in a script called "tmg.sh", which looks like this:

#!/bin/bash
echo "There you go:"
sleep 3s
curl "http://tmg.xunta.gal/consulta-tarxeta?blah_blah_blah&numero=$1"
echo "Thanks!"

First I run this command in the terminal:

chmod u+x ./tmg.sh

For some reason, even as root, if I don't do this I get: bash: ./tmg.sh: Permission denied. Anyway, after doing that, I run the following:

./tmg.sh NUMBER_GOES_HERE 

The number is the script's argument. I then get this output:

There you go:
<html code not relevant>
<div class="infoContido"><p>Non hai ningunha recarga para o n&uacute;mero de tarxeta introducido.</p></div>
<html code not relevant>
Thanks!

Here is my question: how can I get just one part of the HTML? I only want a piece of the page, something like this:

Non hai ningunha recarga para o número de tarxeta introducido. 

Also, I'd like to note that since I get the full page, it contains plenty of <p>, <div>, ... elements. Is this possible and, if so, how should I edit my script to get just this part?

Thank you so much and have a lovely day!

2
  • 1
    Please provide a working example, with the exact command you are running, the exact web-site address, the output, and the expected result. Please use a public website which everyone can access for the example. Commented Jun 18, 2017 at 12:13
  • 1
    This is a very simple task with an Xpath parser, but which one to use depends on your platform. Also, your terminology is off; you want to "extract" the text from an HTML "element". Google that (maybe add "div" and "class") and you should get plenty of hits. (No obvious duplicates on this site; this isn't really a Unix problem, anyway.) Commented Jun 18, 2017 at 13:54

3 Answers

0

You do not explain how the text you want should be identified.

If you just want the rendered text, try links instead of curl:

#!/bin/bash
echo "There you go:"
sleep 3s
links -dump "http://tmg.xunta.gal/consulta-tarxeta?blah_blah_blah&numero=$1"
echo "Thanks!"

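If the line can be identified by "infoContido", this might be the solution:

#!/bin/bash
echo "There you go:"
sleep 3s
# Field 3 is the text after <div ...> and <p>, assuming both tags share one line as in the question's output
curl -s "http://tmg.xunta.gal/consulta-tarxeta?blah_blah_blah&numero=$1" | grep infoContido | cut -d'>' -f3 | cut -d'<' -f1
echo "Thanks!"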
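
The same line-based idea can also be sketched with sed alone, which strips the closing tags as well. This is only a minimal sketch, assuming the div and its paragraph sit on one line exactly as in the output shown in the question. The function name extract_info and the inline sample are hypothetical, and only the &uacute; entity seen in that output is decoded.

```shell
#!/bin/bash
# Hypothetical helper: reads HTML on stdin and prints the text of the <p>
# directly inside <div class="infoContido">.
# Assumption: the opening and closing tags are all on the same line.
extract_info() {
    sed -n 's/.*<div class="infoContido"><p>\(.*\)<\/p><\/div>.*/\1/p' |
        sed 's/&uacute;/ú/g'   # decode the one entity seen in the sample
}

# Sample line copied from the question's output (not fetched from the server).
sample='<div class="infoContido"><p>Non hai ningunha recarga para o n&uacute;mero de tarxeta introducido.</p></div>'

printf '%s\n' "$sample" | extract_info
```

In tmg.sh, the curl output would be piped into the function instead of the sample, for example curl -s "http://tmg.xunta.gal/consulta-tarxeta?blah_blah_blah&numero=$1" | extract_info. If the server ever changes the markup or splits it across lines, this prints nothing.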
0

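If the class of the div element you want will always be infoContido, you can use pup with a command like:

curl -s "http://tmg.xunta.gal/consulta-tarxeta?blah_blah_blah&numero=$1" | pup "div.infoContido"

This prints the matching element with its tags. Using pup's text{} display function in the selector ("div.infoContido p text{}") should print only the text inside the paragraph.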
-2

Send the curl output to a new file. For example, in the case above:

curl "http://tmg.xunta.gal/consulta-tarxeta?blah_blah_blah&numero=$1" | tee -a /var/tmp/mycurl

Now all you need to do is grep your line from this new file. The pattern must be quoted, and it uses the &uacute; entity as it appears in the raw HTML:

grep -F 'Non hai ningunha recarga para o n&uacute;mero de tarxeta introducido.' /var/tmp/mycurl
2
  • Until something puts a line break between any of the words, which has no effect on the HTML but completely breaks your "parser". Commented Jun 19, 2017 at 17:52
  • With your logic, a simple echo would be enough: echo "Non hai ningunha recarga para o número de tarxeta introducido." The OP wants to "grep" whatever is enclosed between <p> and </p>, without knowing in advance what text is inside those tags. Commented Jun 19, 2017 at 20:22
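
The line-break problem raised in these comments can be reduced, though not eliminated, by flattening the page to a single line before matching on the tags rather than on the text. The following is a minimal sketch under stated assumptions: the div contains a single <p> with no nested tags, and extract_info_multiline is a hypothetical name. A real HTML parser remains the more robust choice.

```shell
#!/bin/bash
# Hypothetical helper: reads HTML on stdin and prints the text of the <p>
# inside <div class="infoContido">, even if the markup spans several lines.
# Assumption: the <p> holds plain text only (no nested tags).
extract_info_multiline() {
    tr '\r\n\t' '   ' |                                    # turn CR, LF and TAB into spaces
        sed -n 's/.*<div class="infoContido">[^<]*<p>\([^<]*\)<\/p>.*/\1/p' |
        sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//'         # squeeze and trim spaces
}

# Sample markup based on the question's output, with line breaks added.
printf '%s\n' \
    '<div class="infoContido">' \
    '<p>Non hai ningunha recarga para o' \
    'n&uacute;mero de tarxeta introducido.</p>' \
    '</div>' | extract_info_multiline
```

Entities such as &uacute; are left as they are here; they would still need decoding in a separate step.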
