
I'm using the following code to validate URLs:

    private Boolean CheckURL(string url)
    {
        using (MyClient myclient = new MyClient())
        {
            try
            {
                myclient.HeadOnly = true; // fine, no content downloaded
                string s1 = myclient.DownloadString(url);
                statusCode = null;
                return true;
            }
            catch (WebException error)
            {
                if (error.Response != null)
                {
                    HttpStatusCode scode = ((HttpWebResponse)error.Response).StatusCode;
                    statusCode = scode.ToString();
                }
                else
                {
                    statusCode = "Unknown Error";
                }
                return false;
            }
        }
    }

    class MyClient : WebClient
    {
        public bool HeadOnly { get; set; }

        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest req = base.GetWebRequest(address);
            // req.Timeout = 3000;
            if (HeadOnly && req.Method == "GET")
            {
                req.Method = "HEAD";
            }
            return req;
        }
    }

This works fine in most cases, but for some URLs it returns false positive results: for valid URLs (which open fine when I browse with Chrome) the method returns Not Found. Also, for some URLs the method takes too long to process.
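For context, the timeout that is commented out in MyClient above could be enabled like this (the 3000 ms value is an arbitrary choice, not something I've tuned):

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest req = base.GetWebRequest(address);
        req.Timeout = 3000; // fail after 3 seconds instead of the 100-second default
        if (HeadOnly && req.Method == "GET")
        {
            req.Method = "HEAD";
        }
        return req;
    }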

What am I doing wrong? Please advise.

UPDATE:

I'm checking the URLs from multiple threads using Parallel.ForEach; does this cause the problem?

    public void StartThreads()
    {
        Parallel.ForEach(urllist, ProcessUrl);
    }

    private void ProcessUrl(string url)
    {
        Boolean valid = CheckURL(url);
        this.Invoke((MethodInvoker)delegate()
        {
            if (valid)
            {
                // URL is valid
            }
            else
            {
                // URL is invalid
            }
        });
    }
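For reference, the degree of parallelism could be capped so all the requests don't fire at once (a sketch; the limit of 4 is arbitrary):

    public void StartThreads()
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // arbitrary cap
        Parallel.ForEach(urllist, options, ProcessUrl);
    }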

I'm starting the threads from a BackgroundWorker to prevent the UI from freezing:

    private void worker_DoWork(object sender, DoWorkEventArgs e)
    {
        StartThreads();
    }
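The worker is started in the usual way (a sketch assuming a standard BackgroundWorker field named worker; this wiring is not from the original code):

    worker.DoWork += worker_DoWork;
    worker.RunWorkerAsync();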
  • A valid URL returning "not found" would be a false negative... but: this is impossible to diagnose without a lot more context, and ideally data from an HTTP capture (Fiddler, Wireshark, etc.) to see what actually got sent. Note that if you're running too many requests against the same provider, they might choose to block you in any way they choose as a defence against DoS attacks (or just misbehaving crawlers); it could also be a difference of opinion on how URLs are formed, especially with Unicode: what you see in a browser is not always the actual URL. Commented Jul 19, 2017 at 8:57
  • @MarcGravell Thanks for your response. I'm running the code from a desktop application, so IP blocking (i.e. multiple users hitting the same resource) is not an issue. When I type the URL in the browser I can view the contents of the page without any issue. Commented Jul 19, 2017 at 9:17
  • Instead of passing the URL directly as a string, try passing HttpUtility.UrlEncode(url) instead (see the sketch after these comments). Commented Jul 19, 2017 at 9:21
  • "so IP Blocking(ie:multiple users >same resource ) is not an issue" - um, desktop applications aren't immune from that, and ... well, I can only speak for myself, but : I'd still automatically block you if you if you were hitting us too often - although I might use 429 if we felt kind... or more likely: 418 Commented Jul 19, 2017 at 9:22
  • @LocEngineer Do you mean myclient.DownloadString(HttpUtility.UrlEncode(url))? Commented Jul 19, 2017 at 9:26
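For clarity, here is a sketch of what the comment's suggestion could look like in code. Note that HttpUtility.UrlEncode (from System.Web) escapes the entire string, including the "://" separator, so Uri.EscapeUriString may be closer to what was intended; the helper name PrepareUrl is hypothetical, and neither call is a confirmed fix:

    using System;
    using System.Web; // for HttpUtility

    static string PrepareUrl(string url)
    {
        // HttpUtility.UrlEncode escapes everything, which mangles the URL:
        //   HttpUtility.UrlEncode("http://example.com") -> "http%3a%2f%2fexample.com"
        // Uri.EscapeUriString leaves reserved URI characters (':', '/', '?') intact
        // and only escapes characters that are invalid in a URI:
        return Uri.EscapeUriString(url);
    }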
