Google Competition & Google "Authority Sets"


By ezseonews

A bit of Google History

Back when the Internet was born, it was easy to get high rankings in the search engines. You just put the phrase you wanted to rank for several times on the page and spammed the hell out of your Meta tags. Bingo, a number one ranking - at least until the next guy added a couple more of the keyword phrases to his Meta tags :)

To combat this on-page manipulation, the search engines had to change, and change they did. Meta tags began to be ignored, and the on-page factors became secondary to inbound links, Page Rank and Page/Site “Trust”.

As the momentum shifted to off-page factors, webmasters started looking for ways to beat the system. It was quite easy – get as many links as possible to your site, increase your Page Rank, and bingo, you are back at the top. The search engines have been evolving rapidly, responding to the evolutionary factors created by the webmasters in their attempts to “manipulate” their way to the top.

Today it would seem that the on-page factors are on the rise again. For this to happen, the search engines would have to be very certain they could separate the wheat from the chaff. They would have to be confident that they could determine exactly which pages in their index provided quality information on any given topic... And I believe they can.

Not only that, but I believe they are getting better and better at determining the quality of a piece of content. To me, that means on-page factors are going to return as a major force in the ranking algorithm again, and in my view, this is exactly the way it should be – content being judged on the merits of what it says.

But how on earth can Google take a document and determine not only the topic, but also the quality? After all, to do that, Google’s algorithm would have to be thinking like a human.

As I discussed some ideas with Michael Campbell, he reminded me of something important - something that could provide the answer to that last question.

“People may not remember that Google didn't create AdSense. They bought the semantic technology from Applied Semantics back in 2003”.

You can read the press release here:

http://www.google.com/press/pressrel/applied.html

Pay particular attention to this paragraph:

"Applied Semantics is a proven innovator in semantic text processing and online advertising," said Sergey Brin, Google's co-founder and president of Technology. "This acquisition will enable Google to create new technologies that make online advertising more useful to users, publishers, and advertisers alike."

The press release goes on to say:

“Applied Semantics' products are based on its patented CIRCA technology, which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval”

Back in 2004/2005 I started looking at the idea of themeing.

I certainly wasn’t the first as Michael Campbell had written a white paper in 2000 on the subject of how search engines were able to determine the theme of a web page based on the words on the page. If they could do that in the year 2000, then the acquisition of Applied Semantics makes perfect sense – it was an evolutionary step towards a better search engine.

In 2004/2005, SEOs were heavily promoting Page Rank as the answer to all things Google, but I wasn’t convinced. While I could see Page Rank as an important part of the equation (and still do, despite other SEOs now saying Page Rank is dead), I think that it’s something that can be far too easily manipulated (and Google know this), and is therefore not something that is the be-all and end-all of search engine rankings.

The search engines want to show the BEST content for any search phrase, and that means they have to pay more attention to the factors that cannot be manipulated – the quality of the content. Of course the quality can be manipulated – you can make it better! However, this type of manipulation the search engines approve of.

Over the years I have carried out a lot of research, analyzing thousands of pages in the Google index to see if I could find evidence that the Applied Semantics technology (or some technology that evolved from Applied Semantics) was being used in the ranking algorithm.

The results of my research were the inspiration for my “Creating Fat Content Course” (which began life in 2006) with the included Fat Content Creator software (the predecessor of Web Content Studio).

Over the years, not everyone has been convinced by my research and those individuals are certainly entitled to their opinions. However, from the research I have done, I still believe that:

“With all other things being equal, the page that is the best themed for a search term, will rank higher than the less well themed pages”.

I think that statement was backed up by a report I released in July 2008 called the “Gestational Diabetes Report”.

It’s funny how things change. You would once have been burnt at the stake for suggesting the earth was round, yet today we all know it’s true.

Today in 2010, I think you’ll find a lot more people have now come to accept the conclusions I came to 5 years ago. These are the same conclusions that some SEOs would rather keep to themselves.

You only have to look at the technology that Google have acquired over the years (like Applied Semantics) to know that page context is determined by the words on the page. Haven’t you ever wondered how Google know the most relevant Adsense ads to show on your pages? If they can do that, then wouldn’t it be a waste if they did not use a similar technology in their ranking algorithm?

In the Gestational Diabetes report I mentioned earlier, I analyzed around 30 pages. Here, I’ll be analyzing over 300.


Latent Semantic Indexing

There is one thing I should mention before we begin and it relates to Latent Semantic Indexing. If you have heard of the term, great, if not, don’t let it worry you.

There are a lot of arguments over the term Latent Semantic Indexing (LSI) and its use in search engine technology, but many of the arguments are a result of not everyone being on the same page.

Not long ago, a well-respected search engine guru came out and said that search engines did not use LSI (you may have seen the video). This led to a lot of people questioning whether search engines actually used themes to help determine the topic of a web page.

Let me put your mind at rest.

The LSI that this guy talked about was the real, true, original meaning of LSI.

If that is the meaning you want to attribute to LSI, then you can go on believing that search engines do not use LSI.

However, when many marketers talk about LSI, we are not referring to that mathematical equation (most of us don’t understand the equation anyway and have no intention of trying to). What we are referring to is “themes”: the grouping of certain words on a page to help the search engines determine the topic and relevancy of a page in response to a searcher’s query.

In the context of this report, if I ever refer to LSI, I am referring to the themeing of a page and how that theme can help a search engine determine the topic of a page.

NOTE: I spell themeing with an “e” before the “ing” bit to emphasize the theme. I know I should probably drop that extra “e”, but please humor me and give me this one extravagance :)

Now, I know from some emails I have received that some people don’t believe that themeing a page is important. I think the evidence is clear enough to see, and I will show you plenty of evidence in this report. However, if you also think logically about this, Google needs to accurately determine the topic of a page if it is going to display Google Adsense on that page, correct?

If Google is not determining the theme from the words on the page, where is it getting the theme from?

As a final bit of evidence that themeing your content is important, head on over to Google and do a tilde search (a tilde search is using the “~” character immediately before your search term).

Here is a tilde search to try for the phrase “dog training”: type ~dog training into the Google search box.

Do you see how Google puts synonyms in bold?

Google Synonyms

Google Synonyms for the term Dog Training

As you can see, Google is aware of a number of words and phrases that are related to the term “dog training”.

Is it too far-fetched to believe that they are harnessing this knowledge to help them determine relevancy of a page?

Doesn’t it make sense that when Google is trying to find relevant pages in its index to show someone who has just searched for “dog training”, they will look for pages that contain words like these?

dog, dogs, dog training, pet education, obedience, training, train, pet, canine obedience, dog trainers, pets, dog training classes, dog obedience, schools, trainers, canine, pet training, train pet dog, pet learning, tutorial dog, course, canine training, canine learning, dog trainer, train dogs, dog train, puppy, leash, certification, breeds, barking, socialization.

I want to show you the evidence that themeing is alive and kicking, and that writing well-themed content should be one of your top priorities as you build your own sites. Not only can it help you rank better for your chosen phrase, but it could also help you rank for hundreds of long-tail phrases you hadn’t even considered!

The Problem

There are billions of web pages out there on a myriad of topics. A search engine needs to be able to retrieve the most relevant pages for ANY search phrase that is typed in, and it needs to provide those pages FAST...

While there are a lot of pieces of the puzzle in achieving this, I want to concentrate on just one aspect – how does a search engine know what is relevant to a search and, more importantly, how does it know which pages are the most relevant?

In a White Paper written by Michael Campbell in 2000 and entitled “Themes, Context and Topic How to "Theme Up" Your Web Site - Part 1”, Michael describes the measures that search engines have gone to in their attempts at providing relevant content to their searchers:

“... search engines tried everything to provide relevant results within a reasonably sized database, while cutting down on duplication and spammers.

They tried reducing the importance of META tags, stopped looking in all sorts of html tags filtered out invisible text and keyword repetition.

Then they added in link popularity, link quality (Page Rank), counting clicks, even length of visits with temporal tracking (Direct Hit), and yet the spammers kept on coming and the size of the web kept growing. They needed a way to store more pages, into the billions, and still maintain a high degree of relevancy on searches. This is where a new concept called "term vector databases" come in as a foundation, or building block for all these new technologies.”

NOTE: Members of Michael Campbell’s Vault can read the full document if they wish. Also as an aside, you’ll notice that this document title says “Part 1”. I asked Michael whether there was a Part 2, and he said “There is a part two. It evolved from a long article into Revenge of the Mininet in 2003”. Follow the Revenge link as this book is now free.

Before we go on, it’s important to realize that this document is a good few years old. In Internet time, you could say it is prehistoric. However, this document provides a glimpse of where the search engines were at during the “turn of the century”.

The problem of finding the best, most relevant articles for any search term is an evolving battle against spammers who flood the web with poor content. For a search engine to be able to provide good quality results, it MUST be able to analyze a web page and know whether that page is good quality or not.

Different search engines have varying levels of success in achieving this. In my opinion, the most relevant search engine results come from Google – an opinion shared by millions of searchers around the world who have even created a new verb in our language – “to Google”.

Let’s take a closer look at the Google search results and see just how good they are at filtering the wheat from the chaff...

Contact Lens

The Contact Lens Example

In the screenshot I have Google set up to show me 100 results at a time. You can see I searched for “contact lens” and that Google returned the results in 0.17 seconds – pretty fast, huh?

Google also tells me that it knows of around 29 million pages related to my search phrase “contact lens”.

How on earth does Google return the first 100 results so fast when there are 29 million other results that it could show??

Well, let’s take a closer look.

Scrolling to the bottom of the results page I see this:

It’s the first 10 pages of results.  If I click on the number 10, I should be able to view the results ranking between 900 – 1000. 

However, here is the screenshot of the 10th page.  Look specifically at the highlighted bit of the screenshot:

What’s going on? 

Google is actually only listing the top 811 pages for the term contact lens.   

We can get an answer if we scroll to the bottom of the page.  There Google tells us:

There you have it. Google has somehow managed to narrow down those 29 million pages into just 811 of the best, most relevant pages. Any page that was similar to these 811 documents was omitted.

They have done this to avoid showing you duplicate content.

At the same time, it probably makes their lives a lot easier having to only deal with 811 documents rather than 29 million!!

You could think of this as a duplicate content penalty, since if your content does not add anything new to the “chosen few pages”, it will be relegated to a lower division – a division that rarely gets to play with the big boys.

The fact that Google has narrowed down 29 million pages into just 811 should tell us that Google thinks these 811 are valuable and add something to the overall knowledge relating to the search phrase. But how does Google know this?

These 811 documents are what I call Google’s Authority Set for the term “contact lens”, and as you’ll see, no matter what search phrase you type in, Google has an Authority Set for each phrase.

So how does Google decide which documents to include in the authority set for a given search phrase?

In Michael Campbell’s white paper, he discussed something called the “vector database”. While I am not suggesting that the systems used in the year 2000 are still used today, it still gives us a valuable insight into how search engines could categorize your content. Michael gives a nice example:

“One fellow in the discussion boards noticed that he was top in the search engines for "exotic car engines". What happened was, his site talks about search engines, but he also had an ad on his site for exotic cars. The vector database examined his site and determined that it was not about search engines, but about exotic car engines.”

Hopefully search engines are better than 10 years ago at discerning the correct theme of a page, but I did find my own ezseonews.com site (about search engines and affiliate marketing) being found for a few curry related phrases after I used Balti food as an example in one section of a particular newsletter (despite not having any inbound links pointing to the page with link text related to Indian cuisine).

Since then, Google has dropped my newsletter page from the “Balti authority set”, which further goes to show that Google are on the ball, since this page really was not useful to the general curry enthusiast.

It is clear that the words on the page are very important.

Let’s take a topic that should contain a lot of niche-specific phrases.  A medical term is a good one to use as an illustration. 

The Astigmatism Example

If I type the word “astigmatism” into Google and hit the search button, at this very moment, Google finds 15 million documents on this condition.

If I scroll to the end of the search results, that number shrinks to just 566 documents that Google think are worthy to include.

To have any chance of coming up for the search term “astigmatism”, a page needs to be in that authority set of 566 documents.

Turning our attention to the words on the page, is it just a coincidence that all of the top 10 pages that rank for the term “astigmatism” have the following words within their content?

Astigmatism, contact, causes, vision, cornea, right, eye

That should not really be a surprise, but what if I told you that the following words appeared on at least 9 of the top 10 results for the search phrase Astigmatism:

Refractive, treatment, distance, special, surgery, person, health, lenses, light, focus, care, test

Also, the following words appear on 8 of the top 10 pages:

Information, sightedness, corneal, provide, glasses, retina, point, order, shape, cause, link, near, part, term, ear

.. And at least 7 of the top 10 pages had these words:

Procedure, corrected, problems, medical, correct, degree, treat, ratio, exam

I think it is safe to say that these top ranking pages are themed around a set of niche-specific keywords.

Here is the complete list of words that appear on at least 7 of the top 10 pages for the term astigmatism:

astigmatism, eye, ear, cornea, vision, lenses, correct, focus, contact, cause, shape, light, sightedness, ratio, treat, refractive, glasses, test, term, distance, health, surgery, treatment, information, part, provide, retina, corrected, person, degree, order, point, care, exam, near, causes, corneal, link, medical, procedure, right, special, problems
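For readers who like to tinker, the counting behind a list like this can be sketched in a few lines of Python. This is only an illustration and not a description of how Web Content Studio works internally: the function name `theme_words`, the simple tokenizer and the three tiny sample "pages" are all made up for the example. It matches whole words only.

```python
import re
from collections import Counter

def theme_words(pages, min_pages):
    """Return words that appear on at least `min_pages` of the given pages."""
    page_counts = Counter()
    for text in pages:
        # Use a set so each page counts a word once, however often it repeats.
        words = set(re.findall(r"[a-z]+", text.lower()))
        page_counts.update(words)
    return sorted(w for w, n in page_counts.items() if n >= min_pages)

# Three made-up "pages" standing in for real top 10 results (hypothetical text).
sample_pages = [
    "Astigmatism is a vision problem caused by the shape of the cornea.",
    "An irregular cornea means light does not focus properly, causing astigmatism.",
    "Glasses or contact lenses can correct astigmatism and sharpen vision.",
]

print(theme_words(sample_pages, min_pages=2))
# ['astigmatism', 'cornea', 'vision']
```

With ten real pages you would pass `min_pages=7` to mirror the "at least 7 of the top 10" cut-off used above.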

OK; so that is for a good article.

I had a ghost written article on the topic of astigmatism written for me. I checked it against this list of theme words and found that my ghost written article did not contain 27 of the words from my list above.

That’s 27 missing from my original list of 43 theme words.
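As a rough sketch, checking an article against a theme word list looks something like this in Python. The 43 words are the list given above; the function name `theme_coverage` and the sample sentence are invented for illustration, and whole-word matching is an assumption on my part rather than a statement of how any particular tool counts.

```python
import re

# The 43 astigmatism theme words listed in this report.
THEME_WORDS = [
    "astigmatism", "eye", "ear", "cornea", "vision", "lenses", "correct",
    "focus", "contact", "cause", "shape", "light", "sightedness", "ratio",
    "treat", "refractive", "glasses", "test", "term", "distance", "health",
    "surgery", "treatment", "information", "part", "provide", "retina",
    "corrected", "person", "degree", "order", "point", "care", "exam", "near",
    "causes", "corneal", "link", "medical", "procedure", "right", "special",
    "problems",
]

def theme_coverage(article, theme_words):
    """Return (found, missing, percent) for whole-word matches in `article`."""
    words = set(re.findall(r"[a-z]+", article.lower()))
    found = [w for w in theme_words if w in words]
    missing = [w for w in theme_words if w not in words]
    return found, missing, 100 * len(found) / len(theme_words)

# The arithmetic from the text: 27 of the 43 theme words were missing.
print(round(100 * (43 - 27) / 43))  # 37

# A made-up sentence, just to show the function in use.
found, missing, percent = theme_coverage(
    "Astigmatism affects vision when the cornea has an irregular shape.",
    THEME_WORDS,
)
print(found)  # ['astigmatism', 'cornea', 'vision', 'shape']
```

So an article missing 27 of the 43 words uses only 16 of them, or about 37% of the theme list.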

If you read my ghost written article, it’s total rubbish.

Here’s a thought.

If only I had given my ghost writer a list of theme words to include, they would have found it a lot more difficult to write such drivel, and I would have probably gotten a half decent article in return.

So, what is there about my article that would make Google include it in the authority set for this topic?

Absolutely nothing, nada, diddly squat! So where does my article rank for the phrase astigmatism? Here’s a screenshot of the rank checker in Market Samurai:

Bet you are not surprised by that.... Google doesn’t think my page is important enough to include in the main index for this particular search phrase. Just in case you are wondering, it’s an old page as well, first published a few years ago.

If I do a site search in Google, my page is found, so Google does know about the page. It’s just buried with the millions of other pages that just didn’t cut the mustard.

Ranking for a single word is quite difficult anyway and often not a very good plan of attack since even if you did rank in the top 10, you would attract very untargeted visitors. Maybe we should look at a longer phrase.

Let's look at another example.

The Insulin Example

The phrase “how does insulin work” has close to 3.7 million matches.  If we look at some theme words for this search phrase, 7 or more web pages in the top 10 of Google contain the following 29 words:

insulin, glucose, cell, body, blood, sugar, energy, pancreas, level, diabetes, fat, hormone, store, levels, eat, low, liver, form, test, age, gene, rate, muscle, sign, amount, release, site, acid, inject

If we look at the page ranking #1 from my own bloodsugardiabetic.com site, that page includes all 29 of these words and more.    Scrolling to the end of the search results, Google actually only lists 668 pages for that phrase.

The Sugar Spike Example

The search phrase “blood sugar spikes” has over 11 million matching pages.

However, scrolling to the end of the search results, Google actually only rates 592 articles.

So of 11 million pages, Google only finds 592 worth listing – at the time of writing this report, my blood sugar diabetic site is at #1.

What about the theme words for this search phrase?

Are there any?

Well 7 or more of the top 10 pages all have the following 38 words in them:

diabetic, foods, meals, after, food, test, diet, diabetes, disease, control, effect, eating, spikes, treat, spike, hour, care, help, body, own, information, medical, insulin, weight, health, levels, lower, sugar, rate, low, use, eat, glucose, cause, blood, high, ice, age

My page is currently first in Google and has all of these EXCEPT the word “hour”.

Before we go on, I am not for one minute suggesting that the only reason a page ranks well is because of the theme.

Google uses a lot of different metrics to come up with its search results.

We can see that if we analyze other pages in the top 10. For example, the #6 ranked page in Google for the term “blood sugar spikes” does not include the following words:

spikes, sugar, foods, body, diet, food

.. and the first two of those words are actually in the search phrase, but that site does have a domain PR of 6, and that page is a PR 4 (whereas my domain PR is 3 and page PR 0).

My page is competing with the big boys, despite not having quite the authority of the other players in the top 10.


A Note About Page Rank

BTW; I have received personal emails from people saying that “Guru X says Page Rank is not important”.  

To answer this, let me ask you a couple of questions. 

Question: How is Page Rank formed?

Answer: Inbound links

Question: What does a high Page Rank mean to Google?

Answer: It’s a measure of “Authority”

Therefore, anyone that says Page Rank is not important is saying that inbound links and authority are not important, which they very clearly are. 

With a well themed article, you should be able to beat higher PR / authority pages simply by having better content, since Google is very aware that good content is naturally themed.
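To make the "inbound links" answer a little more concrete, here is a toy version of the PageRank calculation as it was originally published. Google's live algorithm is not public, so treat this purely as a sketch: the four-page link graph, the damping factor of 0.85 and the number of iterations are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank: `links` maps each page to the pages it links out to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outlinks shares its rank with every page.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical mini web: three pages link to "hub", which links back to one.
toy_web = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph "hub" comes out on top because it has the most inbound links, and "a" beats "b" and "c" because its single inbound link comes from the strongest page.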

The Good, the Bad & the Themed

I thought it would be fun to extend the research we have seen so far by looking at how well themed the content is in the top, middle and bottom of the Google search results.

As you have seen, Google may know about 10 million pages on a topic, but you might only get to see 500 of those because Google values those 500 above the other 10 million pages in the index. 

If the idea of themeing is correct, I would expect to see that all 500 in the authority set were themed to a degree (because for Google to value them, most should be good content), though obviously I would expect the theme to drop off as we moved down from the top 10, towards the 490 – 500 range. 

This isn’t an exact science because as we have seen, pages can outrank others based on authority of the site and other factors that Google keeps close to its chest.  However, we can work with several sites in each group and take an average. 

Let’s do it....


Analysis of Pages Ranking for the Term Astigmatism

Remember the Astigmatism example mentioned earlier?

Let’s use that.

As I check today, there are 442 results in the main Google index. For the test, I want to take several “batches” of URLs, some from the top, some lower down, some at the end, etc.

Here are the groups of URLs I chose for the test:

Top 10 - I picked 9 of the top 10 results, excluding position #2 because it was a second listing from the site in the top slot.

Around the 100 position – I selected URLs in position 101 – 110 inclusive

Around the 200 position – I selected URLs 201 – 210 excluding 203 which was a PDF file.

Around the 300 position – I selected URLs 301 – 310 excluding 309 which was a PDF file.

Around the 400 position – I selected URLs 401 – 410 excluding 409 which was a second entry from the domain in position 408.
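The sampling plan above can be written down as a short Python sketch. The helper names `rank_group` and `group_average` are made up, and the three scores fed to the averaging step are placeholder numbers rather than results from the test.

```python
def rank_group(start, end, skip=()):
    """Positions start..end inclusive, minus any skipped positions."""
    return [pos for pos in range(start, end + 1) if pos not in skip]

groups = {
    "Top 10": rank_group(1, 10, skip={2}),           # second listing from the #1 site
    "Around 100": rank_group(101, 110),
    "Around 200": rank_group(201, 210, skip={203}),  # PDF file
    "Around 300": rank_group(301, 310, skip={309}),  # PDF file
    "Around 400": rank_group(401, 410, skip={409}),  # second entry from the #408 domain
}

def group_average(scores):
    """Average a list of per-page scores, rounded to the nearest 0.1."""
    return round(sum(scores) / len(scores), 1)

for name, positions in groups.items():
    print(name, len(positions), "pages")

# Placeholder scores, only to show the averaging step.
print(group_average([80.0, 75.5, 90.0]))  # 81.8
```

That gives 46 pages in total: 10 in the "Around 100" group and 9 in each of the others.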


OK; drum roll please....

Here are the results of the theme analysis carried out by Web Content Studio.

Remember the numbers in the table are averages rounded to the nearest 0.1.


NOTE:


1.       As we move down the rankings towards the bottom of the “authority set”, the Quality theme score awarded by Web Content Studio drops.  It’s not exact, but the trend is there. 

2.       Even the web pages that rank at the bottom of the “chosen few” have a good level of themeing including an average of about 50% + of the theme words we identified earlier.  No wonder my astigmatism ghost written article was not selected to be in the authority set – it only had 37% of the important theme words in it.

We can also look at a couple of other factors:

Be aware that while the average number of words per page looks very large, this is not equivalent to the number of words in the article. The words on the page will include navigation bars and other text.

I’ll leave you to draw your own conclusions from this table.

This example used a single word, “astigmatism”, and trying to rank for single words is much more difficult. Common sense tells me that if we choose a longer phrase, true competition will be less, so we’ll see less of this dramatic themeing.

The Diabetic Alert Dogs Example

The phrase I have chosen is “diabetic alert dogs”, and this search term has 66,500 known pages in Google, with 701 making up the authority set.

For this search phrase, there were fewer theme phrases to choose from.  In fact, I had to rely on just 20 theme words which were found on 6 or more of the top 10 pages.

Here is the summary table for that phrase, with all data squashed into a single table.

NOTE: Averages were based on 10 pages in each section but some pages were skipped if they were either PDFs or a site had a second listing in that section of results.  That is why some of the positions are the 11th place in that set.

I think that the % of chosen theme words used is a telling metric in the rankings of these pages. 

The High Blood Pressure Example

Let’s look at one final table for a really competitive search phrase – high blood pressure.

Google reports over 21 million pages for this term, although it only rates 691 of these pages.
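To put the size of these authority sets in perspective, here is a quick bit of Python arithmetic using the figures quoted in this report. The reported page counts are the rounded numbers Google displayed ("around", "over", "close to"), so the ratios are ballpark values only, and the function name `listed_share` is made up for the example.

```python
# (search phrase, approximate pages reported, pages actually listed),
# all figures as quoted in this report.
examples = [
    ("contact lens", 29_000_000, 811),
    ("astigmatism", 15_000_000, 566),
    ("how does insulin work", 3_700_000, 668),
    ("blood sugar spikes", 11_000_000, 592),
    ("diabetic alert dogs", 66_500, 701),
    ("high blood pressure", 21_000_000, 691),
]

def listed_share(reported, listed):
    """Percentage of the reported pages that Google actually lists."""
    return 100 * listed / reported

for phrase, reported, listed in examples:
    print(f"{phrase}: 1 page listed for every {reported // listed:,} reported "
          f"({listed_share(reported, listed):.4f}%)")
```

For the competitive phrases the listed pages are a tiny fraction of one percent of what Google says it knows about, while for "diabetic alert dogs" it is roughly one page in every 94.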

I set the Keyword Spider going in Web Content Studio and it came back with a list of 758 potential theme words.  A quick sort through these and I had my list narrowed down to 157.

I then ran a report in Web Content Studio to compare those 157 words against the top 10 in Google.  When that came back with the results, I got Web Content Studio to delete any theme words that did not appear in at least 8 out of the top 10 results in Google, and so my theme word list was cut to just 38 theme words.
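The last filtering step (keep only the words found on at least 8 of the top 10 pages) can be sketched like this in Python. It is not Web Content Studio's code: the function name, the ten stand-in "pages" and the short candidate list are invented for illustration, and this version only handles single words, not multi-word phrases.

```python
import re

def keep_common_theme_words(candidates, pages, min_pages=8):
    """Keep candidates found (as whole words) on at least `min_pages` pages."""
    page_word_sets = [set(re.findall(r"[a-z]+", p.lower())) for p in pages]
    kept = []
    for word in candidates:
        hits = sum(1 for words in page_word_sets if word in words)
        if hits >= min_pages:
            kept.append(word)
    return kept

# Made-up stand-ins: 10 short "pages" and a handful of candidate words.
pages = ["high blood pressure raises the risk of stroke"] * 8 + [
    "hypertension is another name for high blood pressure",
    "a low salt diet may help your blood pressure",
]
candidates = ["blood", "pressure", "stroke", "hypertension", "salt"]

print(keep_common_theme_words(candidates, pages))
# ['blood', 'pressure', 'stroke']
```

Raising or lowering `min_pages` is the knob that decides how strict the final theme list is.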

Here is the table of results based on those 38 in-demand theme words.


OK, what can we conclude from this table?

Well, look at the theme score. For this high competition phrase, theme scores range from 100 down to 71.6% on average. That is much higher than we have seen so far, but don’t forget that this is a much more competitive phrase than the earlier examples (in terms of pages Google knows about). We would therefore expect it to be more difficult to be included in the authority set for this phrase. I would suggest here that Google is selecting the better themed pages to include in the authority set, and because of the higher number of pages competing for this phrase, the average theme score is therefore higher than we have seen so far.

Look also at the % chosen theme words used.

Even the worst ranking pages in this group of pages (that Google has chosen to represent the best information on this topic) include 65% of the important theme words or higher.

You can draw your own conclusions from this, but my own conclusion is that it appears Google is making an initial selection based on how well themed an article is.

For less competitive phrases, there will be fewer articles competing and therefore not as many articles with a high percentage of important theme words included.

As competition for a phrase increases, there will be more articles competing, and therefore more articles that potentially include more of the important theme words.

Make sense?

I would therefore suggest that if you write your articles based around a core set of relevant theme words, you are more likely to get noticed by Google and included in the authority set for your chosen phrases.

Where do you get the relevant theme words from? Well why not let Google tell you?

Web Content Studio can analyze the top ranking pages and return suggestions for important theme words. Once you have made your selection from the suggestions, get Web Content Studio to check them against the top 10 in Google. That way you can be sure that the theme words you are targeting are the theme words that are most commonly found on the top 10 of Google. You can see this process in this tutorial I created for Web Content Studio owners.

Finding theme words for any article

If you theme your content, you don't need to optimize for the long tail

In the past, when I was setting up a new site, I would tend to write articles that targeted the long-tail phrases. The long-tail is something I am sure you have heard of, but if not, just be aware that these are the phrases that have more words in them, and are more specific. They also tend to have lower competition, and are therefore easier to rank for. The downside of the long-tail phrases is that they also tend to have fewer searches.

How I used to target the long-tail

To give you an example, if I was building a golfing site, I might want to have a section to sell Ping golf clubs. Rather than target “ping putter”, “ping driver”, “ping woods”, etc, I would look at long-tail phrases like this:

ping piper g 2 putter

ping zing 2 putter specs

ping g2i my day putter us golf

ping custom made golf clubs

used ping karsten i golf clubs

dot used ping eye2 golf clubs

ping golf clubs s 59 irons

ping golf isi becu irons & silver dot

left hand ping copper golf irons

used ping golf club driver

ping darby f isoforce putter

collegiate putter cover golf free shipping

See how these phrases are related to Ping clubs, but they have more words per phrase? These are long-tail phrases that I could easily rank in the top 10 for, but I wouldn’t get too much traffic because some of these may only get one or two searches a day.


Why I no longer target the long tail

OK, with the long-tail understood, what do I mean by the title of this section - If you “Theme”, you don’t need to target the long-tail?

Firstly I want to make an exception to this statement. When I am writing product reviews, I like to concentrate on the product name (which is often a buyer keyword rather than an info-seekers keyword), in the title, Meta description, h1 header and at least the opening paragraph. This ensures that my long-tail buyer phrase is on the page and stands out. The themeing that goes on around this phrase helps to build the picture of the page as one that is an authority on the product. The product review will still be themed, so still benefit from the themeing.

OK; with that proviso, let’s get back to the idea of themeing and the long-tail.

By its very nature, themeing an article will incorporate many of the words found in these long tail phrases into your article. That means, while your article may concentrate on “ping drivers”, your page would have a chance of ranking for a very wide number of long-tail phrases. Let’s look at some examples on some of my own sites.
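Here is a deliberately simplified Python sketch of that idea. It assumes a page can only be a candidate for a long-tail phrase if every word of the phrase appears somewhere on the page, which is my simplification for the example and certainly not the whole of how Google matches queries. The page snippet and the function name `matchable_queries` are made up; the queries come from the Ping list earlier in this section.

```python
import re

def matchable_queries(page_text, queries):
    """Return the queries whose every word appears somewhere on the page."""
    page_words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    return [q for q in queries
            if all(w in page_words for w in re.findall(r"[a-z0-9]+", q.lower()))]

# A made-up snippet of a themed page about Ping drivers.
page = ("Our review of Ping drivers covers custom made golf clubs, "
        "used Ping clubs, shaft options and what each golf club driver costs.")

queries = [
    "ping custom made golf clubs",
    "used ping golf club driver",
    "left hand ping copper golf irons",
]

print(matchable_queries(page, queries))
# ['ping custom made golf clubs', 'used ping golf club driver']
```

The more naturally themed words an article contains, the more long-tail phrases pass this kind of check without the phrase itself ever being written on the page.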

Google Analytics from my Blood Pressure Site

I have a site on blood pressure.  I have written all of the content myself using the techniques of themeing that I have been refining over the last few years.  Here are a few of the pages as seen in Google Analytics:

This first page is the most visited page on my site, and as you can see, in the last month, this page has been found 1538 times for 809 different keyword phrases. 

A few more examples from my Diabetes Site

Below I have a couple more screenshots showing how my pages are being found for a lot of long tail phrases (in a one month period), despite me not optimizing for any of them.  The content on my sites is just well themed!


Can you see the power of themeing?

Not only does it seem to give your page a better chance of making it into the real search results for the more competitive phrases, but it also means your page becomes visible for lots and lots of long-tail phrases that you couldn’t possibly work into a page if you tried.
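To illustrate why a themed page overlaps with so many long-tail searches, here is a minimal Python sketch. It simply checks which phrases have every one of their words somewhere in an article. The article text, the phrases and the function names (`tokenize`, `covered_phrases`) are all made up for illustration, and having the words on the page is of course no guarantee of ranking.

```python
import re

def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())

def covered_phrases(article_text, phrases):
    """Return the phrases whose every word occurs somewhere in the article."""
    vocabulary = set(tokenize(article_text))
    return [p for p in phrases if set(tokenize(p)) <= vocabulary]

# Hypothetical article text and long-tail phrases, invented for this example.
article = (
    "Ping drivers are known for forgiveness. The best Ping driver for a high "
    "handicap golfer has an adjustable loft, and used clubs are often a cheap "
    "way to buy one."
)
long_tail = [
    "best ping driver for high handicap",
    "cheap used ping drivers",
    "ping driver adjustable loft",
    "ping putter fitting chart",
]

matches = covered_phrases(article, long_tail)
for phrase in matches:
    print(phrase)
```

None of the three matching phrases was written into the article as a phrase. The words are there only because the article covers its topic.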

The Link between Themeing, Quality & Ranking

I have tried to show you a relationship between well-themed articles and their inclusion in an elite set of web pages that Google considers to be the “authority set” in response to a searcher’s query.

However, one thing I am not suggesting here is that the authors of the pages that appear in the top 10 (or in fact anywhere in the authority set) sat down and intentionally themed their articles (although they might have if they had been following my newsletter for the last few years).

The thing is, when an expert in any field writes a quality article on a topic they know, they will automatically use a group of niche-specific words to explain their ideas.

E.g. a doctor writing about high blood pressure will automatically use words like:

hypertension, pressure, blood, men, age, high, disease, man, heart, risk, health, treatment, kidney, systolic, control, people, diagnosis, diastolic, medication, test, condition, lifestyle, stroke, symptoms, years, information, medical, causes, doctor, prevent, diabetes, problem, arteries, attack

These words are NECESSARY to write a quality article about high blood pressure. It’s not a case of the doctor sitting down and finding the theme words and then writing the article. The doctor uses the theme words WITHOUT thinking, NATURALLY.

Following on from this argument, articles written by experts will be themed with a set of niche-specific words and phrases – something that would not have been missed by those PhDs at the Googleplex. It just makes sense, doesn’t it?

Think about this.

If you were a search engine, how would you determine which articles were quality (and written by experts) and which were spammy?

I would suggest that the answer is simple.... Think like a human!

You would look for a range of niche-specific keywords in the articles that you know are relevant to the searcher’s query. Any article worth its salt should contain a core set of niche-specific keywords, and any that doesn’t simply cannot be covering the topic in any depth.
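As a rough illustration of that kind of check, here is a minimal Python sketch that scores a page by the share of theme words it contains. This is not Google’s algorithm, which is not public. The function name `theme_coverage`, the simple whole-word matching and both sample texts are assumptions made purely for the example.

```python
import re

def theme_coverage(page_text, theme_words):
    """Return the fraction of theme words found in page_text as whole words."""
    words_on_page = set(re.findall(r"[a-z]+", page_text.lower()))
    theme = {w.lower() for w in theme_words}
    if not theme:
        return 0.0
    return len(theme & words_on_page) / len(theme)

# A handful of theme words for high blood pressure.
theme_words = ["hypertension", "pressure", "blood", "heart",
               "systolic", "diastolic", "stroke", "risk"]

# Hypothetical page texts, invented for this example.
expert_page = ("High blood pressure (hypertension) raises the risk of heart "
               "attack and stroke. Systolic and diastolic readings both matter.")
thin_page = "Blood pressure is important. Click here for great deals on blood pressure."

print(f"expert page: {theme_coverage(expert_page, theme_words):.0%}")
print(f"thin page:   {theme_coverage(thin_page, theme_words):.0%}")
```

The page that actually explains the topic contains every theme word in this small list, while the thin page only repeats the main phrase.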

So how could Google possibly be able to determine core sets of theme words for any search term?

Remember that Google acquired Applied Semantics in 2003.

“Applied Semantics' products are based on its patented CIRCA technology, which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval”

I think that quote tells us all we need to know.

The technology “extracts knowledge” in a way that “mimics human thought” for “more effective information retrieval”.

So the Good News:

If you are an expert in your field, the content you write is likely to be automatically themed as you write, and therefore much more likely to be deemed quality by Google and included in the authority set for your chosen topic.

Even More Good News:

If you are not an expert in your field, you can easily extract a set of niche-specific theme words for any search phrase, and create your content around that core set of words and phrases. These theme words will guide you and push you towards writing better content, as long as you use the words in the correct context and always remember – write for humans, not for search engines.

Three Random Examples

I wanted to show you that the results shown in this report are typical and the topics were not handpicked by me or anyone else. To prove this, I wanted to select three random search phrases and do the same keyword theme analysis on the three.

To randomly pick the topics, I went to the free Wordtracker tool and typed in “How”.

Here are the top 3 results returned:

So, there are my three search phrases:

  • How to cook a turkey
  • How to carve a pumpkin
  • How do hurricanes form

While I like to think of myself as a bit of a cook and I have carved the odd pumpkin, I am no expert in any of these three topics, so let’s see what happens as we select the phrases and analyze the pages in the “authority set” within Google’s search results.

Here is the list of theme words I chose (using Web Content Studio's Spider) for each of these phrases:

1. How to Cook a Turkey

turkey, cook, eat, ave, tin, cooking, recipe, time, oven, over, roast, breast, bird, meat, side, pan, stuffing, thin, cooked, minutes, pound, every, roasting, recipes, temperature, hours, water, family, stuffed, hot, heat, thanksgiving, thermometer, table, turkeys, butter, fresh, frozen, thaw, remove, dinner, friend, taste, tips, air, dish, thigh, giblets, holiday, serve, test, carving, check, carve, wrap

2. How to Carve a Pumpkin

lid, pumpkin, art, carving, carve, tin, cut, eat, how, sign, lantern, light, make, jack, pumpkins, knife, pattern, face, sit, large, thin, book, halloween, place, cat, draw, paper, stem, seeds, tool, video, carved, lines, clean, pieces, tradition

3. How do hurricanes form?

hurricane, form, storm, air, wind, tropical, warm, low, weather, ocean, water, pressure, winds, land, cyclone, sea, storms, red, area, cloud, moist, heat, met, move, cause, energy, rise, surface, high, clouds, earth, east, miles, mph, north, speed, atlantic, cyclones, south, develop, formation, national, atmosphere, eye, rises, space, system, west, atmospheric, category, equator, reach, astronomy, cool, feet, force, meteor, region, temperature, term, answers, call, coast, effect, help, hour, level, pacific, rain, science, warming, conditions, depression, higher, hit, hot, island, large, long, moisture, rising, satellite, disturbance, global, typhoon, universe, upper, vapor, waters

.. which Web Content Studio narrowed down to the following by comparing the list against the top 10 results in Google:

hurricane, form, storm, air, wind, tropical, warm, low, weather, ocean, water, pressure, winds, land, sea, storms, red, area, heat, met, move, cause, energy, surface, clouds, earth, east, north, speed, atlantic, south, atmosphere, eye, rises, system, west, atmospheric, equator, reach, cool, meteor, term, call, rain, science, long
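I am not describing the exact rule Web Content Studio uses to narrow the list. One plausible way to do this kind of filtering is to keep a candidate word only if it appears on a minimum share of the top-ranking pages. The Python sketch below works that way. The function name `narrow_theme_words`, the `min_share` threshold of 0.5 and the three tiny sample “pages” are all assumptions for illustration.

```python
import re

def narrow_theme_words(candidates, top_pages, min_share=0.5):
    """Keep candidate words that appear on at least min_share of the pages."""
    page_vocabularies = [set(re.findall(r"[a-z]+", page.lower()))
                         for page in top_pages]
    if not page_vocabularies:
        return []
    kept = []
    for word in candidates:
        # Count how many pages contain this word as a whole word.
        hits = sum(1 for vocab in page_vocabularies if word.lower() in vocab)
        if hits / len(page_vocabularies) >= min_share:
            kept.append(word)
    return kept

# Made-up snippets standing in for the text of top-ranking pages.
pages = [
    "A hurricane is a tropical storm that forms over warm ocean water.",
    "Warm, moist air rises from the ocean and the storm gains energy.",
    "Hurricane winds form around an eye of low pressure over the ocean.",
]
candidates = ["hurricane", "ocean", "storm", "universe", "astronomy", "warm"]

narrowed = narrow_theme_words(candidates, pages)
print(narrowed)
```

With these sample pages, off-topic candidates such as “universe” and “astronomy” drop out because none of the pages use them.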

OK, so I have my theme words. How did the pages in Google’s results pages fare against these lists?


How to Cook A Turkey

How To Carve A Pumpkin

How do Hurricanes form?

A Final Summary

I would like to leave one thought with you.

In all of the cases we have looked at in this report, the articles that Google has chosen as their “authority set” all include a core set of theme words related to the search term. Put another way, these pages include a lot of the words and phrases that Google would expect an authority article to include in that niche.

While the top pages include more of the theme words (80% + in the examples we have looked at), even those pages that rank at the end of the main index have a relatively high number of those phrases included in the pages (around 50% + in the examples we have seen).

You can think of these authority sets as the A-List celebrities in your niche. They are the pages that have been chosen by Google to represent their niche.

What makes them different from those pages not found in the main index?

Quite simply, they have put the work in to achieve their status, and while outward appearances can be deceptive, these A-listers certainly pass the theme test.

If you want to join the A-list, you need to put the groundwork in too.

To help those suffering in the aftermath of Google's latest changes, I have updated my Creating Fat Content Course (150+ meaty pages) and made it available for free. Download Creating Fat Content for 2011. This book shows you how and why you should create quality content.
