Comments (10)

    • Hi James,

      I’ll be totally honest with you – I’m not entirely sure. I have seen both methods recommended. It makes logical sense that the robots.txt option would be more effective, as it is a file made specifically for search engine bots, but then so is an XML sitemap.

      Implementing indexing and noindexing using the plugin mentioned above is a piece of cake, and as far as I am aware, works 100%. That being the case, I decided it was the best method to recommend.

      Thanks for the comment! :-)

      Tom

  1. Two points:

    1) A sitemap (XML or otherwise) is nothing more than a *clue* to Google (and other search engines) about where your content is located so it can be indexed. It will NOT *prevent* Google or other search engines from finding content in other ways (such as following links or even guessing at locations based on the engine in use on your site). Thus, the sitemap will help search engines find content, but it CANNOT prevent them from indexing anything at all.

    2) Even using robots.txt to prohibit these types of pages (categories, posts or dated archives) won’t necessarily help. Robots.txt tells search engines what content not to “visit”, not what content not to “index”. This distinction is HUGE: blocking a search engine from visiting a page (such as “/categories/josh-brolin/”) means that the page might show up in the SERPs anyway, but lacking any descriptive text.

    The best solution to this scenario is to ensure that these pages are not duplicates in the first place. Give each one its own unique content (“/tags/josh-brolin/” should have a brief bio or something in addition to the list of all content with that tag) and duplicate only a small portion of each post/page, effectively teasing the visitor toward the canonical (“real”) page.
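
    To make point 2 concrete, here is a minimal sketch using Python's standard-library robots.txt parser. The rules and the paths are hypothetical, chosen only to mirror the example above. Note that the only question the parser can answer is whether a bot may fetch a URL; robots.txt has no way to express whether a URL may be indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /categories/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(path: str, agent: str = "*") -> bool:
    """Return True if the rules above let `agent` fetch `path`.

    This is a crawl permission only. It does not say whether a search
    engine may list the URL in its results.
    """
    return parser.can_fetch(agent, path)


# The category archive is off limits to crawlers...
category_allowed = may_crawl("/categories/josh-brolin/")
# ...while a path that matches no Disallow rule may be fetched.
post_allowed = may_crawl("/2012/some-post/")
```

    Because a blocked page is never fetched, a search engine also cannot see a noindex directive placed on that page, which is one reason the two mechanisms should not be treated as interchangeable.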
