Is Google Able To Detect Hacked Webpages Beyond Home Page?
In May, Google made 39 changes to its search algorithms to improve the quality of search results. The number of tweaks was down from more than 50 in April, but some of them are still worth noting. The first change that Google listed on its official blog is deeper detection of hacked pages.
Deeper detection of hacked pages: [launch codename "GPGB", project codename "Page Quality"] For some time now Google has been detecting defaced content on hacked pages and presenting a notice on search results reading, “This site may be compromised.” In the past, this algorithm has focused exclusively on homepages, but now we’ve noticed hacking incidents are growing more common on deeper pages on particular sites, so we’re expanding to these deeper pages.
Google warns users when they try to navigate to a compromised website, but these warnings used to focus exclusively on homepages. So hackers started placing their compromised content on deeper webpages to get past Google’s security warnings.
After the recent update, Google states that it checks deeper webpages for their security status, but we are not sure whether the update has rolled out worldwide. If it has, Google still needs to improve in this regard, because we have come across scenarios where this trick by hackers still works.
Example Where Hacked Webpages Have Not Been Detected by Google
My friend was using Google Search for the query “Shipping cost for Beijing to Bangalore” and he noticed some weird search results. He could see similar results for shipping cost queries even after changing the names of cities to different ones.
Google Search Results for the term – Shipping Cost From Beijing To Bangalore.
As you can see in the above screenshot, most of the organic results contained the target term, which has nothing to do with shipping or transport. I assumed at first that these results came from junk websites. But when I dug deeper, I found that these websites were not junk. In fact, many of them were on .edu and .org top-level domains. I ran extended tests on the NFDA, OECS and PodioBooks websites.
Bing Search Results for the term – Shipping Cost From Beijing To Bangalore.
I repeated the same search query in Bing and DuckDuckGo and found that the results were relevant this time.
How Hackers Might Have Used Cloaking
The above scenario is an example of cloaking, a trick in which the content presented to the search engine spider differs from the content presented to the user’s web browser. One trick the hackers could have used is an Apache rewrite rule that serves different content based on the User-Agent. As with the Apache web server, this type of cloaking can be set up on almost any web server by tampering with its configuration files or by inserting scripts into the webpages.
Hackers have used deeper webpages of these websites, or inserted new webpages into them, to pump up the target term in Google Search results while excluding other search engines. Clearly, the content served to the Googlebot user agent differs from the content served to the Google Chrome user agent. Cloaking can be based on one or more parameters, such as User-Agent, IP address, screen size and so on, with different content shown depending on which parameters match.
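To make the User-Agent check concrete, here is a minimal Python sketch of the kind of branching a cloaking rule performs. It is only an illustration of the logic, not the code found on the affected websites; the names `CRAWLER_TOKENS` and `pick_variant`, and the labels "injected" and "original", are assumptions made up for this example.

```python
# Hypothetical illustration of User-Agent based cloaking logic.
# CRAWLER_TOKENS and pick_variant are made-up names for this sketch.

CRAWLER_TOKENS = ("googlebot",)  # assumption: only Google's crawler is targeted


def pick_variant(user_agent: str) -> str:
    """Return which page variant a rule of this kind would serve."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return "injected"  # page stuffed with the target term
    return "original"      # page that a normal visitor sees


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    chrome = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"
    print(pick_variant(googlebot))
    print(pick_variant(chrome))
```

Because the check matches only the Googlebot token, other crawlers and ordinary browsers would get the original page, which would be consistent with the clean results seen in Bing and DuckDuckGo.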
How Can Webmasters Identify If Their Website Is Being Cloaked?
Cloaked Webpage With Web Browser User Agent.
Cloaked Webpage With Google Search Bot User Agent.
Although it is very hard to tell whether a website is being cloaked based on IP address, cloaking based on the User-Agent can be identified. You can use web browser plugins such as User Agent Switcher for Mozilla Firefox or User-Agent Switcher for Google Chrome to change the user agent and check whether there are any changes in the webpages.
In the above screenshots, you can see how the same webpage changes with different user agents: selecting the Googlebot 2.1 user agent shows the content containing the target term, which the hackers inserted into the webpage.
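The same check can be scripted instead of done by hand in the browser. The sketch below fetches a page twice, once with a browser User-Agent and once with a Googlebot User-Agent, and flags the page if the two responses differ a lot. The function names, the example User-Agent strings and the 0.9 threshold are assumptions chosen for illustration, and the script cannot detect cloaking that is based on IP address.

```python
import difflib
import sys
import urllib.request

# Example User-Agent strings (assumptions; any current browser string works).
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download a page while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(a: str, b: str) -> float:
    """Line-level similarity between two documents, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(None, a.splitlines(), b.splitlines(),
                                      autojunk=False)
    return matcher.ratio()


def looks_cloaked(browser_html: str, bot_html: str, threshold: float = 0.9) -> bool:
    """Flag the page when the two responses are less similar than the threshold."""
    return similarity(browser_html, bot_html) < threshold


# Usage: python check_cloaking.py <url>   (the file name is hypothetical)
if __name__ == "__main__" and len(sys.argv) == 2:
    url = sys.argv[1]
    as_browser = fetch(url, BROWSER_UA)
    as_bot = fetch(url, GOOGLEBOT_UA)
    print("similarity:", round(similarity(as_browser, as_bot), 3))
    print("possible cloaking" if looks_cloaked(as_browser, as_bot) else "no obvious difference")
```

Dynamic pages often differ slightly between requests (ads, timestamps, session tokens), which is why the comparison uses a threshold rather than an exact match. A flagged page still needs a manual look.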
Google needs to refine their algorithms with better pattern matching so that they can identify deeper hacked or cloaked webpages.