Back in October 2016, I wrote about how you can use a Python script to determine whether a page has been indexed by Google in the SERPs. As it turns out, Google’s webmaster trends analyst Gary Illyes wasn’t too happy with the technique the script used, so I cannot endorse this method:
I'll just leave this here: https://t.co/NO4s6JbSfJ https://t.co/qRhIGXcG7g — Gary Illyes ᕕ( ᐛ )ᕗ (@methode) October 5, 2016
Shortly after, Sean Malseed and his team at Greenlane SEO built a similar tool based in Google Sheets (among other awesome tools like InfiniteSuggest), and Googler John Mueller expressed reservations:
@greenlaneseo Is this a blackhat tool or does it abide by the webmaster guidelines & robots.txt? (just curious) — John ☆.o(≧▽≦)o.☆ (@JohnMu) December 14, 2016
How could I learn which pages weren’t indexed by Google, and do it in a way that didn’t break Google’s rules? Google doesn’t indicate in Google Search Console whether a page has been indexed, won’t let us scrape search results to get the answer, and isn’t keen on us getting the answer indirectly from an undocumented API. (That undocumented API was Sean Malseed’s clever workaround for scraping.) Let’s explore some solutions.