The scrape package also provides the helper functions Attr and Text. After grokking the concept, it's super easy to use and you get results fast.
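As a quick, minimal sketch of those two helpers (not from the original post): the HTML snippet, variable names and import paths here are my own assumptions, with the import paths assuming the yhat/scrape package and golang.org/x/net/html.

package main

import (
    "fmt"
    "strings"

    "github.com/yhat/scrape"
    "golang.org/x/net/html"
    "golang.org/x/net/html/atom"
)

func main() {
    // a made-up HTML snippet just for demonstration
    root, err := html.Parse(strings.NewReader(`<a class="storylink" href="http://example.com">Example story</a>`))
    if err != nil {
        panic(err)
    }
    // grab the first anchor tag (Find is explained below)
    link, ok := scrape.Find(root, func(n *html.Node) bool {
        return n.DataAtom == atom.A
    })
    if ok {
        fmt.Println(scrape.Text(link))         // text content: "Example story"
        fmt.Println(scrape.Attr(link, "href")) // attribute value: "http://example.com"
    }
}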
There are really only two things to understand:
1) The traversing functions Find and FindAll:
// start searching the node tree beginning at the node article (*html.Node)
titlenode, ok := scrape.Find(article, func(n *html.Node) bool {
    if n.DataAtom == atom.Td && scrape.Attr(n, "class") == "title" {
        return true
    }
    return false
})
if !ok {
    // ... do some error handling
}
2) The matcher function, which controls which nodes get returned:
A matcher function is passed as an argument to Find or FindAll. It receives a node as its input parameter and returns either true or false. If it returns true, the node n is included in the result of FindAll; in the case of Find, a true result causes Find to stop and return the node n. A usage sketch with FindAll follows the matcher definition below.
// define a matcher for <tr class="athing"> rows inside a <tbody>
matcher := func(n *html.Node) bool {
    if n.DataAtom == atom.Tr && n.Parent != nil && n.Parent.DataAtom == atom.Tbody {
        return scrape.Attr(n, "class") == "athing"
    }
    return false
}
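To see this matcher in action, here is a rough, self-contained sketch (not part of the original example) that fetches the Hacker News front page, parses it, and feeds the matcher to FindAll. The URL, error handling and variable names are assumptions for illustration, and the import paths assume the yhat/scrape package.

package main

import (
    "fmt"
    "net/http"

    "github.com/yhat/scrape"
    "golang.org/x/net/html"
    "golang.org/x/net/html/atom"
)

func main() {
    // fetch the Hacker News front page (URL chosen just for this sketch)
    resp, err := http.Get("https://news.ycombinator.com/")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // parse the response body into a node tree
    root, err := html.Parse(resp.Body)
    if err != nil {
        panic(err)
    }

    // the same matcher as above: <tr class="athing"> rows inside a <tbody>
    matcher := func(n *html.Node) bool {
        if n.DataAtom == atom.Tr && n.Parent != nil && n.Parent.DataAtom == atom.Tbody {
            return scrape.Attr(n, "class") == "athing"
        }
        return false
    }

    // FindAll collects every node for which the matcher returned true
    rows := scrape.FindAll(root, matcher)
    for _, row := range rows {
        // scrape.Text concatenates the text of the node and its children
        fmt.Println(scrape.Text(row))
    }
}

The difference to Find is simply that FindAll walks the whole tree and collects every accepted node, while Find stops at the first match.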
The full example code for this blog entry can be found at github.com/kimxilxyong/intogooglego/hackernewsCrawlerScrape