Blacklisted by Google... Again
Before getting into it, I should make one thing clear for anyone who hasn't read my previous Google-blacklisting article: only my most recent content ended up blacklisted, not my entire website.
My recent Lode Runner tutorial was removed from Google's index three days after its publication on my site. If you're not familiar with early '80s home computer gaming, that's the Lode Runner character in the slashed-G image. This time the blacklisted content wasn't as popular as the previous one, so not much traffic was lost as a result. Why this happens still seems pretty senseless to me.
The good news is that I think I've figured out the pattern. I ping Google with updates, so it usually doesn't take more than an hour or two for my latest content to show up in search results. It seems like Google checks back again later, reevaluates my site's content in greater depth, and identifies what it perceives as duplicate content. Google warns about duplicate content in their Webmaster Guidelines. I personally don't think an entry that appears on a site's home page and on the permalink page at the same time qualifies as a "substantial" amount of duplicate content, as Google spells out in their guide, but that might be why my content ends up temporarily removed from their index. (see update)
I use Movable Type 4 as my CMS. Its default template scheme (as with most blogging/CMS platforms) publishes the latest entry on both the home page as well as on the permalink page. I've modified my templates from the default, but my latest entry is still present on both the home page and on its own permalink page when published.
My recently published tutorial page was showing up on the first page of results (or the first few pages, depending on the search criteria), but it is currently missing from Google's search results entirely. No combination of words from the page itself results in it being listed on Google. It's showing up just fine on Yahoo!, and even at the top of the list. Yahoo doesn't have a problem with my latest content appearing on both the home page and the permalink page at the same time. The permalink page is returned in the search results, as used to be the case in Google's results prior to today. Yahoo's indexing process seems to be a little better engineered than Google's. I'm not saying that just because Yahoo is still displaying my latest content and Google has removed it. It's because Yahoo's process seems to be more webmaster-friendly and works, whereas Google's process needlessly dumps completely legitimate content. Google didn't even choose one page or the other - it removed both. What other legitimate content is Google removing from their search index, and why? I'm very curious about this.
As a result of my previous page getting unindexed by Google (the TR2N article), I made some changes to my sitemap. I lowered the home page's priority value and gave the permalink page a higher priority. When Google reads the sitemap, it should see that the permalink page has a higher value than the home page, and therefore it should ignore the home page's (redundant) content and index the permalink page, right? It didn't work. The logic made sense to me, but what do I know? I'm still trying to figure out exactly why my content initially shows up well on Google, then goes away, then shows up again after I've posted newer content.
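For anyone curious what that kind of sitemap change looks like, here is a minimal sketch in Python. The URLs and the priority values (0.5 and 0.9) are placeholders, not the ones actually used on this site. Keep in mind that the sitemap protocol describes `priority` only as a hint about the relative importance of URLs within one site, so a search engine is free to ignore it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string for a list of (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        # Priority is a value from 0.0 to 1.0, relative to the site's other URLs.
        ET.SubElement(node, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs: the home page is ranked below the permalink page.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", 0.5),
    ("http://www.example.com/2008/08/lode-runner-tutorial.html", 0.9),
])
print(sitemap_xml)
```

The output is a single `urlset` element with one `url` entry per page, which can be saved as sitemap.xml.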
I'm guessing that once I publish this entry, my currently blacklisted (as of writing this) page will once again show up in Google's search results after a day or so, but at a lower rank. In the day or two following that, it will probably crawl its way back up to where it was in the results previously. That's how it went with my previous Google-blacklisted page.
If the "duplicate" content is what's causing my pages to be temporarily removed from Google's index, do all blogs and websites get the same treatment? My site is one of millions that regularly display the most recent content on the home page and on the permalink page at the same time. Because this is such a widely used publishing format, I don't understand how Google could be in the practice of punishing webmasters who present their content in this way. If that is what's really going on. (see update)
Although I think I have discovered why Google is removing my content from their index, I'm not really certain how to prevent it from happening each time I publish new content. One thing I can try is to publish content more frequently - as in before Google reevaluates it and punishes me for "duplicate" material. I'd love to add new content more frequently. Unfortunately, there just aren't enough hours in the day. For now, my site is just a hobby and I can't devote as much time to creating new content as I'd like.
Another approach that might work is to display a shortened version of my latest content on my home page. This should prevent Google's system from labeling it as duplicate content and punishing me by removing both pages from their search index. There are a couple of problems with that idea, though. The main one is that I'm not yet proficient enough in Movable Type 4 to alter my home page template the way I want to. I'll have to invest more time and experiment to figure it out.
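The shortening idea itself is simple, whatever the CMS. The sketch below is plain Python, not Movable Type template code, and the function name `make_excerpt`, the 60-word default, and the "continued at" wording are all made up for illustration. It also assumes the entry body is plain text; an HTML body would need tag-aware truncation so that no tags are left unclosed.

```python
def make_excerpt(body, max_words=60, more_link=None):
    """Return the first max_words words of body, with an optional pointer
    to the full entry. Short bodies are returned unchanged."""
    words = body.split()
    if len(words) <= max_words:
        return body
    excerpt = " ".join(words[:max_words]) + "..."
    if more_link:
        excerpt += f" (continued at {more_link})"
    return excerpt


# Hypothetical entry text and permalink.
entry = "Lode Runner was one of the great home computer games of the early eighties."
print(make_excerpt(entry, max_words=5, more_link="http://www.example.com/lode-runner.html"))
```

With an excerpt on the home page and the full text on the permalink page, the two pages share only the opening words of the entry.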
Today I inserted "robots-nocontent" tags around the most recent post area of my home page. Yahoo introduced this tag in 2007, allowing webmasters to show search bots which areas of a page are less relevant - such as text found on every page of a site, like navigation links. As far as I know, Google's system does not recognize this tag, but I'm hoping it at least takes note of it somehow. I'm trying, G - work with me.
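As I understand Yahoo's documentation, "robots-nocontent" is applied as a class attribute value on an existing HTML element rather than as a tag of its own. Here is a rough sketch of the wrapping step in Python; the helper name `wrap_nocontent` and the sample markup are hypothetical, and in practice the wrapper would go straight into the home page template.

```python
def wrap_nocontent(html_fragment):
    """Wrap an HTML fragment in a div carrying Yahoo's robots-nocontent class."""
    return f'<div class="robots-nocontent">\n{html_fragment}\n</div>'


# Hypothetical markup for the latest-post area of a home page.
latest_post = "<h2>Lode Runner Tutorial</h2>\n<p>Full entry text here.</p>"
print(wrap_nocontent(latest_post))
```

Only crawlers that honor the class will treat the wrapped area differently; everything else sees an ordinary div.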
Even though I think I've figured out why Google keeps blacklisting (unindexing) my most recent content, I still have no hard evidence indicating such. Like I said, many websites also publish new content on their home page and permalink page simultaneously without punishment from the big G. What is it that other webmasters are doing right that I'm not? Hopefully, I'll figure it out in time. If you know the secret, please take a moment to share. I'd appreciate it.
There are so many rules webmasters have to follow to be included in Google's index. We all want a high page rank. We all want to be seen in the mighty search giant's results. I've been poking around the enormous amount of information in Google's webmaster knowledge base, but can't seem to find the specific rules for this issue logically spelled out. There are general guides and suggestions, but nothing that spells out "this [occurrence] will result in content being removed from the search index because of [whatever]."
I understand that some people were finding ways to trick Google's system into getting a higher page rank. This is bad for everyone - especially those who've worked hard on their content, investing their time and money to get seen legitimately. I invest good time and work hard on my own content - from the serious to the silly. I'm not playing any tricks, but I keep getting penalized by Google, presumably because of the checks and balances they put in place to prevent people from tricking the system.
At least I've managed to get seen on Yahoo and remain seen there. Although not as widely used as Google, it still points millions of people daily to the information they seek. Since first discovering my content blacklisted by Google, I've been working on making my site more visible on Yahoo in case it happened again. I wish it were as easy to stay listed in Google's index as it is in Yahoo's. It would be nice if I could spend my time creating new content rather than researching how to keep it from getting blacklisted on Google after a few days. Of course, in the grander scheme of life, what does it really matter anyway, right? Perhaps I should just forget about how Google plays the game and simply have fun with my website. People will still find their way to my content. It will just take a little longer for it to show up on the radar for Google users.
Update: August 14, 2008
It is unlikely that my small amount of duplicate content was what caused my recent pages to disappear from Google's index. See this follow-up post for my latest findings on the subject.