Search Engines – what are they good for?

Absolutely nothing in some cases!

One increasingly well-known fact is that paywalls (those barriers that some websites put in the way to try to get you to hand over money or register to gain access) are, these days, very porous.

They’re designed that way so that the site can make money from access, either by charging you or by selling your details, without closing itself off from the rest of the internet, i.e. search engines.

They still want search engines to scan their site, and therefore have to open themselves up to them whilst blocking everyone else. Google’s T&Cs even state that websites must not block access when referred from Google search results if they want to appear in those results.

That’s the loophole that apps like BreakthePaywall! (http://www.breakthepaywall.com) use to circumvent the paywalls – they impersonate a search engine.

For example: say you go to a site and click on an article, only to get a paywall blocking page or popup. If you copy that article’s heading and paste it into a search engine then, more often than not, clicking the link to the same article in the search results will display the article without any problems.

That’s because the website sees that you have come from a search engine’s results page, using the ‘Referer’ header that your browser sends to tell the website which page linked to it.
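To make the idea concrete, here is a minimal sketch in Python of the kind of check a paywalled site might run on the ‘Referer’ header. It is an illustration only, not any real site’s code: the names `ALLOWED_REFERRER_HOSTS`, `came_from_allowed_search` and `choose_page` are made up for this example, and the two hosts in the allow list are an assumption.

```python
from urllib.parse import urlparse

# Hypothetical allow list: a real site would choose its own hosts.
ALLOWED_REFERRER_HOSTS = {"www.google.com", "www.bing.com"}


def came_from_allowed_search(referer, allowed_hosts=ALLOWED_REFERRER_HOSTS):
    """Return True if the Referer header value points at an allowed host."""
    if not referer:
        # No Referer at all, e.g. the visitor typed the address directly.
        return False
    host = urlparse(referer).hostname  # already lower-cased by urlparse
    return host is not None and host in allowed_hosts


def choose_page(headers):
    """Decide which page to serve, given a dict of request headers."""
    referer = headers.get("Referer")
    return "article" if came_from_allowed_search(referer) else "paywall"


if __name__ == "__main__":
    print(choose_page({"Referer": "https://www.google.com/search?q=headline"}))
    print(choose_page({}))
```

Because the header is supplied by whatever sends the request, a check like this only works against visitors who don’t tamper with it.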

This is how paywalled websites keep themselves at the top of search results but also block people who have gone to the website directly and/or click on subsequent articles. This is usually referred to as opening themselves up a little bit in order to gain search engine traction but still make money out of the ignorant. It must be rather galling for loyal customers who stump up annual subscription fees to discover that others can get it for free. Hey, that’s the mad world of the internet!

Another technical aspect is that websites need to allow the search engine robots unfettered access. These robots go out onto the internet, scan websites and report back to the databases that collate the results. Each robot identifies itself with its own special name (sent in the ‘User-Agent’ header of every request), which websites can pick up and use to allow access.
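As a rough sketch of how a website might pick up those robot names, the Python below looks for a crawler token in the ‘User-Agent’ header. “Googlebot” and “bingbot” are the names Google’s and Bing’s robots announce themselves with; the function name `is_known_crawler` and the choice to allow only those two are assumptions made for this example.

```python
# Tokens that appear in the User-Agent strings of Google's and Bing's robots.
# Which robots a given site actually allows is an assumption here.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot")


def is_known_crawler(user_agent, tokens=KNOWN_CRAWLER_TOKENS):
    """Return True if the User-Agent string contains a known crawler token."""
    if not user_agent:
        return False
    ua = user_agent.lower()  # compare case-insensitively
    return any(token in ua for token in tokens)


if __name__ == "__main__":
    robot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    print(is_known_crawler(robot))
    print(is_known_crawler(browser))
```

Anyone can send a ‘User-Agent’ string of their choosing, so a name check on its own does not prove the visitor really is a search engine robot.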

So, search engines are useful after all, but interestingly, not all of them…

Here’s a list of the top search engines in the world according to Wikipedia:

Search engine    Market share in June 2014
Google           68.69%
Baidu            17.17%
Yahoo!           6.74%
Bing             6.22%
Excite           0.22%
Ask              0.13%
AOL              0.13%

https://en.wikipedia.org/wiki/Web_search_engine

As you would expect, Google is way out in front. Experimenting with the method described earlier (copy a headline, paste it into a search engine and see whether the resulting link to the paywalled website works), it’s not surprising that using Google as the search engine always works. Most websites want to be in Google’s search results, so they allow full access from Google. But what about the other search engines?

Baidu – is of course the Chinese-based search engine. Try a paywalled Western news organisation’s website on there and you don’t get anywhere. Consequently they don’t appear anywhere near the top of the search results (you have to put in the website domain to get anywhere near). Yes, it seems like some websites are not interested in a few billion Chinese customers.

Yahoo next – again, no joy at all. It turns out that Yahoo doesn’t actually produce its own search results. It currently uses Microsoft Bing’s results (it used Google’s up until 2004). So it seems some sites have excluded Yahoo from their allow lists, which seems strange as it is the 3rd largest search provider.

Microsoft Bing – works!

Excite, Ask, AOL and the other smaller search engines (including DuckDuckGo, which uses Bing as well as other sources for its search results) – do not work, or rather, are ignored by paywalled websites.

So paywalled websites want to expose themselves and effectively give free access but don’t want to do it for everybody – maybe they are just as ignorant of what they are doing as their customers.

But it also highlights the fact that, for non-Chinese internet users at least, there are really only two search engines: Google and Bing. What a stitch up!

Author: James

IT Manager - Network, Web coding, MS SQL and Online Mapping expert
