Google Sitemap

Ensure Google locates site content that might be missed during normal indexing.

Last reviewed: May 24, 2022

Introduction

In its simplest terms, an XML Sitemap (usually called a 'Sitemap', with a capital 'S') is a list of the pages on your website. Creating a Sitemap and submitting its URL to Google's Webmaster Tools site helps ensure that Google's crawlers know about all the pages on your site, including URLs that their normal indexing process may not discover.

Set Your Page as a Google Sitemap

  • For the URL of a standard UBCMS site's homepage, simply replace ".html" with ".sitemap.xml".
    • e.g., http://www.buffalo.edu/sustainability.html would become
      http://www.buffalo.edu/sustainability.sitemap.xml
  • For a UBCMS site which has a virtualhost (e.g. medicine.buffalo.edu), use the virtualhost plus the UBCMS Author page path (e.g. /content/medicine/...).
    • e.g., http://medicine.buffalo.edu (actually located under /content/medicine/...) would become
      http://medicine.buffalo.edu/content/medicine.sitemap.xml
  • Your Sitemap should be in the format described at www.sitemaps.org, as referenced by Google Webmaster Tools (see https://support.google.com/webmasters/answer/156184).
    • The Sitemap can be built from any page on your site, but it generally will be best to use your site's homepage.
    • You can experiment with this process in Author, but do not submit the Author link to Google since their crawler will not be able to log into our Author environment to read the Sitemap.
  • Once you are happy with your Sitemap, submit it to Google using Google Webmaster Tools
     (read more: https://support.google.com/webmasters/answer/156184).
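
The two URL rules above can be sketched in a few lines of Python. This is only an illustration, not part of the UBCMS: the function name sitemap_url and its content_path parameter are hypothetical, and the sketch assumes you already know the Author page path for a site served from a virtualhost.

```python
from urllib.parse import urlsplit, urlunsplit


def sitemap_url(homepage_url, content_path=None):
    """Return the Sitemap URL for a UBCMS homepage URL.

    content_path (hypothetical parameter) is the Author page path for a
    site served from a virtualhost, for example "/content/medicine".
    Leave it as None for a standard site whose URL ends in ".html".
    """
    parts = urlsplit(homepage_url)
    if content_path is not None:
        # Virtualhost case: keep the host, use the Author page path.
        path = content_path.rstrip("/")
    elif parts.path.endswith(".html"):
        # Standard case: drop ".html" before adding the new suffix.
        path = parts.path[: -len(".html")]
    else:
        raise ValueError("expected a URL ending in .html or a content_path")
    return urlunsplit((parts.scheme, parts.netloc, path + ".sitemap.xml", "", ""))


print(sitemap_url("http://www.buffalo.edu/sustainability.html"))
print(sitemap_url("http://medicine.buffalo.edu", content_path="/content/medicine"))
```

Run against the two examples in the list, this prints the same Sitemap URLs shown there. Remember to use the public URL, not the Author link, when submitting to Google.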

Additional Notes

  • The last-modified date is included.
  • The "Hide in Navigation" setting is honored.
  • Related Links are included, but not deeply; for example, if the related links go to a hidden part of your site, or another UBCMS site, these pages will be ignored.
  • Audience and Task nav (in the header under search, and at the right side of the top nav, respectively) are included deeply; for example, if these links go to a hidden part of your site, or another CMS site, all the pages under those areas will be included.
  • Only pages that actually exist in the hierarchy of your site, such as your top and left nav, are included. Links in text, references, external embeds, etc., are not included. There is no way to include shared content hosted pages at this time.
  • The target URL of redirect pages is included, which can result in external links in your sitemap. (We believe this is harmless/meaningless to Google.)
  • The order of links (which shouldn't really matter to Google) will match a depth-first traversal of your site in the site admin tool (the left column tree if all pages are expanded), plus the related, audience and task nav.
  • This page will not automatically be found by crawlers. Auto-detection expects it to be named “…/demo/sitemap.xml”, not “…/demo.sitemap.xml”, and/or to be referenced by a <meta> tag or in a robots.txt file.
    • You should still be able to submit it manually to Google Webmaster Tools and similar services.
    • If this proves successful, then after some feedback we may make it meet the auto-detect criteria automatically (or on an opt-in basis).
    • Authors cannot currently create/edit their own robots.txt file.
  • The sitemaps.org protocol/format (http://www.sitemaps.org/protocol.html) also supports “change frequency” and “priority” metadata in the sitemap. We may support these at a future date by adding fields to page properties, if this feature is successful and deemed worthwhile.
    • Knowing this can help debug why a group of pages is being included if you don't know where they're coming from.
    • Taking off “.sitemap.xml” and adding “/jcr:content.sitemap.txt” will switch to a text-only debugging output with indentation that may also help trace problems; for example:
      https://www.buffalo.edu/content/demo/jcr:content.sitemap.txt
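
If you want to check a generated Sitemap against the notes above, a short Python script can list its entries, flag external links (such as redirect targets), and build the text-only debugging URL. This is a minimal sketch, not a UBCMS tool: the sample XML, its page URLs and date, and the function names are all made up for illustration. It assumes the Sitemap follows the sitemaps.org format with loc and optional lastmod elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; a real Sitemap would be downloaded from your site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.buffalo.edu/demo.html</loc>
    <lastmod>2022-05-24</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/elsewhere.html</loc>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return (loc, lastmod) pairs in document order; lastmod may be None."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries


def external_entries(entries, site_host):
    """Return the locs whose host differs from site_host."""
    return [loc for loc, _ in entries if urlsplit(loc).hostname != site_host]


def debug_url(sitemap_url):
    """Swap ".sitemap.xml" for "/jcr:content.sitemap.txt"."""
    suffix = ".sitemap.xml"
    if not sitemap_url.endswith(suffix):
        raise ValueError("expected a URL ending in .sitemap.xml")
    return sitemap_url[: -len(suffix)] + "/jcr:content.sitemap.txt"


entries = read_sitemap(SAMPLE)
for loc, lastmod in entries:
    print(loc, lastmod)
print("external:", external_entries(entries, "www.buffalo.edu"))
print(debug_url("https://www.buffalo.edu/content/demo.sitemap.xml"))
```

Because entries are returned in document order, the output can be compared with the depth-first order of the site admin tree when tracing where an unexpected page comes from.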
