First of all, I am but a lowly web programmer, so I have very little experience with actual programming. I have been given a list of 30,000 URLs, and I am not going to waste my time clicking each one to check whether it is valid. Is there a way to read through the text file they are in and have a program check each line?

The code I currently have is in Java, as that's really all I know; if there's a better language for this, please let me know. Here is what I have so far:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.net.UnknownHostException;

    public class UrlCheck {
        public static void main(String[] args) throws IOException {
            try {
                URL url = new URL("http://www.google.com"); // Need to change this to make it read from a text file
                InputStream inp = null;
                try {
                    inp = url.openStream();
                } catch (UnknownHostException ex) {
                    System.out.println("Invalid");
                }
                if (inp != null) {
                    System.out.println("Valid");
                    inp.close();
                }
            } catch (MalformedURLException exc) {
                exc.printStackTrace();
            }
        }
    }
  • How are the URLs distributed? Can you post a sample of the .txt file? (commented Apr 10, 2014 at 8:19)

4 Answers

First, read the file line by line with a BufferedReader and check each line. The code below should work. It is up to you to decide what to do when you encounter an invalid URL: you could just print it, as shown, or write it to another file.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.net.UnknownHostException; // java.net, not java.rmi

    public class UrlCheck {
        public static void main(String[] args) throws IOException {
            // "_filename" is a placeholder: use the path of your text file (one URL per line)
            try (BufferedReader br = new BufferedReader(new FileReader("_filename"))) {
                String line;
                while ((line = br.readLine()) != null) {
                    if (checkUrl(line)) {
                        System.out.println("URL " + line + " was OK");
                    } else {
                        System.out.println("URL " + line + " was not VALID"); // handle error as you like
                    }
                }
            }
        }

        private static boolean checkUrl(String pUrl) {
            try {
                URL url = new URL(pUrl);
                try (InputStream inp = url.openStream()) {
                    return true; // the stream opened, so the URL answered
                }
            } catch (MalformedURLException exc) {
                exc.printStackTrace();
                return false;
            } catch (UnknownHostException ex) {
                return false; // host name does not resolve
            } catch (IOException ex) {
                return false; // connection refused, 404 (FileNotFoundException), timeout, ...
            }
        }
    }
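A list of 30,000 lines often contains stray spaces or blank lines, and readLine() passes those to checkUrl unchanged. A minimal sketch, assuming one URL per line in a UTF-8 file, that loads and cleans the list with java.nio.file.Files.lines before checking; the class name UrlFileReader and method readUrls are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UrlFileReader {
    // Reads one URL per line, trims surrounding whitespace and drops blank lines.
    static List<String> readUrls(Path file) throws IOException {
        // Files.lines is lazy and holds the file open, so close it with try-with-resources
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.map(String::trim)
                        .filter(line -> !line.isEmpty())
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java UrlFileReader urls.txt
        for (String file : args) {
            for (String url : readUrls(Paths.get(file))) {
                System.out.println(url);
            }
        }
    }
}
```

Each string in the returned list can then be passed to checkUrl in a plain for loop.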

The checkUrl method can also be simplified as follows:

    private static boolean checkUrl(String pUrl) {
        URL url = null;
        InputStream inp = null;
        try {
            url = new URL(pUrl);
            inp = url.openStream();
            return inp != null;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        } finally {
            try {
                if (inp != null) {
                    inp.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
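The answer mentions writing invalid URLs to another file instead of printing them. A minimal sketch of that, assuming one URL per line; InvalidUrlWriter and writeInvalid are hypothetical names, and the validity check is passed in as a Predicate so either checkUrl variant can be plugged in:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

class InvalidUrlWriter {
    // Copies every non-blank line of `input` that fails `isValid` to `output`
    // (overwriting it) and returns how many invalid URLs were written.
    static int writeInvalid(Path input, Path output, Predicate<String> isValid) throws IOException {
        int invalid = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String url = line.trim();
                if (url.isEmpty()) {
                    continue; // skip blank lines
                }
                if (!isValid.test(url)) {
                    out.write(url);
                    out.newLine();
                    invalid++;
                }
            }
        }
        return invalid;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java InvalidUrlWriter <input> <output>");
            return;
        }
        // Stand-in check for demonstration only; plug in a real network check here
        int n = writeInvalid(Paths.get(args[0]), Paths.get(args[1]), url -> url.startsWith("http"));
        System.out.println(n + " invalid URLs written");
    }
}
```

For example, if writeInvalid is placed inside the UrlCheck class, calling writeInvalid(Paths.get("urls.txt"), Paths.get("invalid.txt"), url -> checkUrl(url)) would leave only the failing URLs in invalid.txt (both file names are placeholders).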

Glad it helped (you could also accept it as the answer if you are satisfied :-))

You could use HttpURLConnection instead. If the URL is not valid, you won't get a successful response back: either an exception is thrown or getResponseCode() returns an error code.

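    // Needs: import java.io.IOException; import java.net.HttpURLConnection; import java.net.URL;
    HttpURLConnection connection = null;
    try {
        URL myurl = new URL("http://www.myURL.com");
        connection = (HttpURLConnection) myurl.openConnection();
        // Send a HEAD request (headers only, no body) to reduce load
        connection.setRequestMethod("HEAD");
        int code = connection.getResponseCode();
        System.out.println(code);
    } catch (IOException e) {
        // Handle invalid URL
    } finally {
        if (connection != null) {
            connection.disconnect();
        }
    }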
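With 30,000 URLs, one host that never answers can stall the whole run: HttpURLConnection waits as long as its connect and read timeouts allow, and a timeout of 0 means no limit. A sketch of the HEAD approach as a reusable method, assuming 5-second timeouts (arbitrary values) and using -1 to mean "no usable HTTP response"; HeadCheck and headStatus are hypothetical names:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

class HeadCheck {
    // Illustrative timeouts in milliseconds; 0 would mean "wait forever"
    static final int CONNECT_TIMEOUT_MS = 5000;
    static final int READ_TIMEOUT_MS = 5000;

    // Sends a HEAD request and returns the HTTP status code, or -1 if the URL is
    // malformed, is not http/https, or the connection fails or times out.
    static int headStatus(String address) {
        HttpURLConnection connection = null;
        try {
            URLConnection raw = new URL(address).openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return -1; // e.g. file: or ftp: URLs
            }
            connection = (HttpURLConnection) raw;
            connection.setRequestMethod("HEAD"); // headers only, no body
            connection.setConnectTimeout(CONNECT_TIMEOUT_MS);
            connection.setReadTimeout(READ_TIMEOUT_MS);
            return connection.getResponseCode();
        } catch (IOException e) {
            // Also covers MalformedURLException, UnknownHostException and SocketTimeoutException
            return -1;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java HeadCheck http://www.google.com
        for (String address : args) {
            System.out.println(address + " -> " + headStatus(address));
        }
    }
}
```

Some servers answer HEAD with 405 Method Not Allowed even though a normal GET works, so a 405 may be worth rechecking with GET before calling the URL invalid.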

My issue isn't checking whether the URL is valid; it's getting the 30,000 URLs into the place where the single URL is at the moment. The code I currently have reports whether the URL is valid well enough.
URL url = new URL(urlString); where urlString is a String containing the URL (for example, each line read from the file).
I don't understand what you mean by "it's getting 30,000 urls into where the url is at the moment". Can you please explain?

I am unsure of your experience, but a multi-threaded solution is possible here. As you read through the text file, store the URLs in a thread-safe structure such as a BlockingQueue and let a number of threads take URLs from it and attempt to open the connections. Each check spends most of its time waiting on the network, so checking several URLs at once is much faster than testing all 30,000 one after another, and the checks can start while you are still reading the file.

Check out a producer-consumer example if you are unsure:

http://www.journaldev.com/1034/java-blockingqueue-example-implementing-producer-consumer-problem
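A minimal sketch of the producer-consumer idea with java.util.concurrent.BlockingQueue, assuming the URLs have already been read into a list and the check is passed in as a Predicate that does not throw (for example a checkUrl method that catches its own exceptions). ParallelUrlCheck, checkAll, the queue capacity of 1000 and the poison-pill sentinel are illustrative choices, not taken from the linked article:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

class ParallelUrlCheck {
    // Sentinel telling a consumer thread that no more URLs are coming
    private static final String POISON_PILL = "\u0000END";

    // The calling thread acts as producer, putting URLs on a bounded queue;
    // `workers` consumer threads take them off and run `isValid` on each one.
    static Map<String, Boolean> checkAll(List<String> urls, Predicate<String> isValid, int workers)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
        Map<String, Boolean> results = new ConcurrentHashMap<>();
        Thread[] consumers = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            consumers[i] = new Thread(() -> {
                try {
                    while (true) {
                        String url = queue.take(); // blocks until a URL is available
                        if (url.equals(POISON_PILL)) {
                            return;
                        }
                        results.put(url, isValid.test(url));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers[i].start();
        }
        // Producer: in the real program these URLs would come straight from the file reader
        for (String url : urls) {
            queue.put(url); // blocks while the queue is full
        }
        for (int i = 0; i < workers; i++) {
            queue.put(POISON_PILL); // one pill per consumer
        }
        for (Thread consumer : consumers) {
            consumer.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in check for demonstration; swap in a real network check
        Map<String, Boolean> results = checkAll(
                Arrays.asList("https://example.com", "ftp://example.org", "https://example.net"),
                url -> url.startsWith("https://"), 2);
        results.forEach((url, ok) -> System.out.println(url + " -> " + (ok ? "OK" : "INVALID")));
    }
}
```

A fixed-size pool from Executors.newFixedThreadPool would be a shorter alternative; the explicit threads here just make the producer and consumer roles visible.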

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class UrlCheck {
        public static void main(String[] args) {
            try {
                URL url = new URL("http://www.google.com");
                // Open the HTTP connection
                HttpURLConnection connection = (HttpURLConnection) url.openConnection();
                // Get the HTTP response code
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // The response code is 200 OK, so the URL is valid
                    System.out.println("Valid");
                } else {
                    // Any other code is treated as invalid
                    System.out.println("Invalid");
                }
            } catch (MalformedURLException ex) {
                System.out.println("Invalid");
            } catch (IOException ex) {
                System.out.println("Invalid");
            }
        }
    }
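Treating only 200 OK as valid will also flag URLs that merely redirect. HttpURLConnection follows redirects by default, but not from http to https, so many http:// links come back as 301 or 302. A sketch of a less strict rule, assuming you want redirects reported separately rather than as failures; ResponseClassifier and Verdict are hypothetical names:

```java
import java.net.HttpURLConnection;

class ResponseClassifier {
    enum Verdict { VALID, REDIRECT, INVALID }

    // Buckets an HTTP status code instead of accepting only 200 OK.
    static Verdict classify(int code) {
        if (code >= HttpURLConnection.HTTP_OK && code < HttpURLConnection.HTTP_MULT_CHOICE) {
            return Verdict.VALID;    // 2xx: 200 OK, 204 No Content, ...
        }
        if (code >= HttpURLConnection.HTTP_MULT_CHOICE && code < HttpURLConnection.HTTP_BAD_REQUEST) {
            return Verdict.REDIRECT; // 3xx: e.g. an http:// URL that moved to https://
        }
        return Verdict.INVALID;      // 4xx, 5xx, or -1 for "no valid HTTP response"
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 301, 404, 500}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

Running main prints 200 -> VALID, 301 -> REDIRECT, 404 -> INVALID and 500 -> INVALID, one per line. Whether redirects should count as valid depends on what "valid" means for your list.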
