For a bunch of URLs I'd like to extract a year, e.g. 2022, which appears between these tags:
<td class="text" style="border-right:0;"> 2022 </td>
How can I store '2022' locally without saving the whole webpage?
This is a simple sample script. It shows the idea; adjust the sed -n expression to match your own search key.
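For instance, the search key can be pulled out into a parameter so that only the opening tag changes between pages. This is a minimal sketch, not part of the original answer. It assumes the tag and its year sit on the same line and are followed by </td>. The helper name extract_year_after is hypothetical. The key is pasted into the sed pattern unescaped, so it must be a valid basic regular expression; escape characters such as /, . or * yourself.

```bash
#!/bin/bash

# Hypothetical helper: reads HTML on stdin and prints the first 4-digit
# number that follows the opening tag given in $1 and precedes </td>.
# $1 is inserted into the sed pattern as-is (it must be a valid BRE).
extract_year_after() {
  sed -n "s/.*$1[[:space:]]*\([0-9]\{4\}\)[[:space:]]*<\/td>.*/\1/p" | head -n 1
}

# The key from the question (whitespace around the year is allowed)
printf '%s\n' '<td class="text" style="border-right:0;"> 2022 </td>' \
  | extract_year_after '<td class="text" style="border-right:0;">'
# prints 2022

# A different, made-up key: only the argument changes
printf '%s\n' '<td class="year">1999</td>' \
  | extract_year_after '<td class="year">'
# prints 1999
```

Because the helper reads from stdin, it could also be fed directly from curl -s "$url".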
The index.html you mentioned in the post:
<td class="text" style="border-right:0;">2022</td>
Sample code in Bash:
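```bash
#!/bin/bash

# Pages to scan; a local file can be read through a file:// URL
urls=(
  "file:///home/<username>/index.html"
  # "URL2"
)

extract_year() {
  local url="$1"
  local html_content year

  # Keep the page in a variable only; it is never written to disk
  html_content=$(curl -s "$url")

  # Take the 4-digit year from the first line containing the matching <td>,
  # allowing optional whitespace around it (e.g. "> 2022 <")
  year=$(printf '%s\n' "$html_content" \
    | sed -n 's/.*<td class="text" style="border-right:0;">[[:space:]]*\([0-9]\{4\}\)[[:space:]]*<\/td>.*/\1/p' \
    | head -n 1)

  # Store only the year in a file
  if [ -n "$year" ]; then
    echo "$year" >> years.txt
  else
    echo "Year not found for $url" >&2
  fi
}

for url in "${urls[@]}"; do
  extract_year "$url"
done
```

Output: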