Building a scraper to search for ga.js references

While Google Classic Analytics reaches its end, still a lot of websites are using the ga.js trackingcode to send data to their Google Analytics account.

This ga.js trackingcode will stop working in the future. We always advise to upgrade to the analytics.js trackingmethod, as used by Google Universal Analytics.

 

How can I find out which websites are still using the old ga.js code?

When you're working for a big company or agency, you'll face the problem of many (hidden/forgotten) web properties that need to be checked and upgraded.

During my research for our talk at the Google Analytics User Conference in Brussels, I discovered +700 websites that needed to be checked.

 

 

Building a ga.js scraper using Google Docs

I always like to think how MacGyver would use Google Docs instead of a paperclip when trying to save the world.

One of the hidden functions in Google Docs is the importXML function, which you can use to build a webscraper.

You could scrape the Google SERP's (hello SEO people), your competitor's website or your own. Endless possibilities!

809078-macgyver.jpg

 

Let's go MacGyver!

You can combine the importXML function with Xpath code to loop through the website code you're processing in Google docs.

 

For example:

If you want to select all the H2 titles from a web page, you'll have to use the following code in a Google spreadsheet:

=IMPORTXML("http://blog.intracto.com", "//h2")

xpath_h2.jpg

 

Knowing this, it's only a 2 minute job to create a scraper that will check if the Google Classic Analytics ga.js codesnippet can be found on a website.

=IMPORTXML("http://www.intracto.com", "//script[contains(.,'ga.js')]")

Now, create a Google sheet with this formula and you can check a long list of domains at once.

ga.js_scraper.jpg

  

Remarks: 

1. Google spreadsheet has a limit of 50 ImportXML calls. You can always duplicate your spreadsheet tab and split up your long list into multiple sheets.

2. Google spreadsheet uses caching for about 2 hours when reading a url using importXML.

 the-possibilities-theyre.jpg

 

 

Meer weten? Contacteer ons!