Web scraping with Golang

Web scraping is a technique for extracting information from a website by parsing its HTML output. Most online bots rely on the same technique to collect the information they need about a particular website or page.

We can parse an HTML page with an XML parser and get the required information. However, jQuery-style selectors are a more convenient way to query an HTML page. So, in this tutorial we will use goquery, a Go library that offers jQuery-like syntax, to parse the HTML document.
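To make the XML parser option concrete, here is a minimal sketch that uses only the standard library's encoding/xml decoder in its relaxed, HTML-tolerant mode. The sample markup and the extractLinks helper are made up for illustration, and real-world pages are often far less well-formed than this snippet.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// samplePage is a made-up, well-formed snippet used only for illustration.
const samplePage = `<div id="main">
  <article><h2 class="entry-title"><a href="/first-post/">First post</a></h2></article>
  <article><h2 class="entry-title"><a href="/second-post/">Second post</a></h2></article>
</div>`

// extractLinks is a hypothetical helper: it walks the token stream and
// collects the href attribute of every <a> element it sees.
func extractLinks(page string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(page))
	// Relax the decoder so common HTML quirks do not abort parsing.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var links []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return links, nil
		}
		if err != nil {
			return nil, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "a" {
			continue
		}
		for _, attr := range start.Attr {
			if attr.Name.Local == "href" {
				links = append(links, attr.Value)
			}
		}
	}
}

func main() {
	links, err := extractLinks(samplePage)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, link := range links {
		fmt.Printf("Link #%d: %s\n", i, link)
	}
}
```

Even for this small task we have to walk the tokens by hand and track element names ourselves. A selector such as ".entry-title a" states the same intent in a single expression, which is the main appeal of the selector-based approach.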

Project Setup and dependencies

As mentioned above, we will use the jQuery-like goquery library as the parser. Fetch the library with the following command (with Go modules, run it inside a module directory created with go mod init):

go get github.com/PuerkitoBio/goquery

Create a file named webscraper.go and open it in your favorite text editor.

Web scraper code to get posts from a website

package main

import (
	// import standard libraries
	"fmt"
	"log"

	// import third party libraries
	"github.com/PuerkitoBio/goquery"
)

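// postScrape fetches the home page and prints the title and link of each post.
func postScrape() {
	// NewDocument fetches and parses the page in one call. Recent goquery
	// releases mark it as deprecated in favor of NewDocumentFromReader.
	doc, err := goquery.NewDocument("http://code2succeed.com")
	if err != nil {
		log.Fatal(err)
	}

	// Select the post titles with a CSS selector found with the browser inspector.
	// Each calls the function once per match, passing its index and selection.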
	doc.Find("#main article .entry-title").Each(func(index int, item *goquery.Selection) {
		title := item.Text()
		linkTag := item.Find("a")
		link, _ := linkTag.Attr("href")
		fmt.Printf("Post #%d: %s - %s\n", index, title, link)
	})
}

func main() {
	postScrape()
}

Output

Post #0:
                                Getting started with ReactJs
                         - http://www.code2succeed.com/getting-started-with-reactjs/
Post #1:
                                Intro to React
                         - http://www.code2succeed.com/intro-to-react/
Post #2:
                                Caesar Decryption of string using javascript
                         - http://www.code2succeed.com/caesar-decryption-of-string-using-javascript/
Post #3:
                                Caesar encryption of string using JavaScript
                         - http://www.code2succeed.com/caesar-encryption-of-string-using-javascript/
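The titles in the output above carry the line breaks and indentation that surround them in the page's markup, because Text() returns the text exactly as it appears in the document. A small helper can normalize them before printing. This is only a sketch: cleanTitle is a hypothetical name, and the raw string below is an assumed example of what such a title might look like.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanTitle collapses every run of whitespace (spaces, tabs, newlines)
// into a single space and drops leading and trailing whitespace.
func cleanTitle(raw string) string {
	return strings.Join(strings.Fields(raw), " ")
}

func main() {
	// Assumed example of a title as returned by item.Text().
	raw := "\n\t\t\t\tGetting started with ReactJs\n\t\t\t"
	fmt.Printf("%q\n", cleanTitle(raw))
}
```

In the scraper, this would mean writing title := cleanTitle(item.Text()) inside the Each callback, so that each post prints on a single line.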

Stay tuned for more updates and tutorials!
