2

I am writing macro to extract the href value from a website, example here is to extract the value: '/listedco/listconews/SEHK/2015/0429/LTN201504291355_C.pdf' from the html code below. The href is one of the attributes of the html tag 'a', I have add the code getElementbyTagName'a' but it did not work, my question is how to extract that href value to column L. Anyone could help? Thanks in advance!

 <a id="ctl00_gvMain_ctl03_hlTitle" class="news" href="/listedco/listconews/SEHK/2015/0429/LTN201504291355_C.pdf" target="_blank">二零一四年年報</a> 

Sub Download_From_HKEX() Dim internetdata As Object Dim div_result As Object Dim header_links As Object Dim link As Object Dim URL As String Dim IE As Object Dim i As Object Dim ieDoc As Object Dim selectItems As Variant Dim h As Variant Dim LocalFileName As String Dim B As Boolean Dim ErrorText As String Dim x As Variant 'Key Ratios For x = 1 To 1579 Set IE = New InternetExplorerMedium IE.Visible = True URL = "http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main_c.aspx" IE.navigate URL Do DoEvents Loop Until IE.readyState = 4 Application.Wait (Now + TimeValue("0:00:05")) Call IE.Document.getElementById("ctl00_txt_stock_code").setAttribute("value", Worksheets("Stocks").Cells(x, 1).Value) Set selectItems = IE.Document.getElementsByName("ctl00$sel_tier_1") For Each i In selectItems i.Value = "4" i.FireEvent ("onchange") Next i Set selectItems = IE.Document.getElementsByName("ctl00$sel_tier_2") For Each i In selectItems i.Value = "159" i.FireEvent ("onchange") Next i Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_d") For Each i In selectItems i.Value = "01" i.FireEvent ("onchange") Next i Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_m") For Each i In selectItems i.Value = "04" i.FireEvent ("onchange") Next i Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_y") For Each i In selectItems i.Value = "1999" i.FireEvent ("onchange") Next i Application.Wait (Now + TimeValue("0:00:02")) Set ieDoc = IE.Document With ieDoc.forms(0) Call IE.Document.parentWindow.execScript("document.forms[0].submit()", "JavaScript") .submit End With Application.Wait (Now + TimeValue("0:00:03")) 'Start here to extract the href value. Set internetdata = IE.Document Set div_result = internetdata.getElementById("ctl00_gvMain_ctl03_hlTitle") Set header_links = div_result.getElementsByTagName("a") For Each h In header_links Set link = h.ChildNodes.Item(0) Worksheets("Stocks").Cells(Range("L" & Rows.Count).End(xlUp).Row + 1, 12) = link.href Next Next x End Sub 
6
  • Is there anyone could help? Commented Sep 20, 2015 at 12:47
  • What is the problem you have encountered? It's not clear from your question. Edit your question to elaborate. Commented Sep 20, 2015 at 20:18
  • div_result.getElementsByClassName("a") >> div_result.getElementsByTagName("a") Commented Sep 21, 2015 at 0:26
  • @TimWilliams Hi Tim, thanks for your answer but sorry it is just a typo. I did tried getElementsByTagName, as well as getElementsByClassName("news"), they did not work, a solution may be get the attribute "href" after getting the TagName"a", since the "href" is one of the attributes of tag "a". But I don't know the code to get attributes, could you help? Commented Sep 21, 2015 at 1:02
  • Your anchor element with target href has id="ctl00_gvMain_ctl03_hlTitle, so you can retrirve url IE.document.getElementById("ctl00_gvMain_ctl03_hlTitle").href or simply IE.document.ctl00_gvMain_ctl03_hlTitle.href. Also try to retrieve the data you need via XHR instead of IE. Commented Sep 21, 2015 at 2:55

3 Answers 3

3
For Each h In header_links Worksheets("Stocks").Cells(Range("L" & Rows.Count).End(xlUp).Row + 1, 12) = h.href Next 

EDIT: The id attribute is supposed to be unique in the document: there should only be a single element with any given id. So

IE.Document.getElementById("ctl00_gvMain_ctl03_hlTitle").href 

should work.

Sign up to request clarification or add additional context in comments.

2 Comments

@Tim Williams, Hi Tim, your code works for extracting all the href links on the website but what I want to extract is only those with id="ctl00_gvMain_ctl03_hlTitle", the href attribute is one of those attributes in the tag <a>, I think something like getAttribute would work but I don't know the coding for that, could you help again? Thanks for your patience!
@Tim Hi Tim, I am currently using your firs answer which is h.href method, it works perfectly. But I did not test your updated answer. However, thanks a lot!
1
WB.Document.GetElementById("ctl00_gvMain_ctl04_hlTitle").GetAttribute("href").ToString 

Comments

1

Use a CSS selector to get the element then access its href attribute.

#ctl00_gvMain_ctl03_hlTitle 

The above is element with id ctl00_gvMain_ctl03_hlTitle. "#" means id.

Debug.Print IE.document.querySelector("#ctl00_gvMain_ctl03_hlTitle").href 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.