Posts tagged seo

Checklist for a Successful Website Redesign

We often receive questions when an agency conducts a major website upgrade, changes content management systems, or both. We created this checklist to help ensure your redesign is successful. The stages are:

Ready…

1. Let the Search.gov team know you are launching a new site

2. Develop a reindexing plan

Set…

3. Prepare XML sitemaps and SEO elements

4. Add a Search Page Alert

5. Prepare color scheme updates and new logo to add to Admin Center

6. Prepare updates to your other search features

Go!

7. Flip the new website live, and let us know

8. Implement the changes to the search site

9. Results begin to show

Victory lap

10. Alert Google and Bing that your website has been refreshed

Flow chart showing the steps involved in getting the search index ready to go on Search.gov for a website that’s being relaunched.

Ready…

1. Let the Search.gov team know you are launching a new site

Who: You, the agency web team

What: Send us an email or give us a call; either way, please let us know that you’re working on a redesign of your website. If we know ahead of time, we can help get your new search experience prepped and in good shape on the day of the relaunch. When you reach out, include the planned launch date.

It’s important to plan ahead: if your site structure changes, your existing search results will break, frustrating members of the public as they try to use your new site. This is true for our service as well as for Google and Bing. To avoid an avalanche of 404 Not Found errors from your search results, use 301 redirects wherever possible to send visitors from the old pages to the appropriate new pages. For more on 301 redirects, read tips from Bing (External link) and Google (External link). Also notify other websites that link to you of the changes.
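A redirect map can be sketched as a simple lookup from old paths to new ones. The paths below are hypothetical, and this is only an illustration of the behavior a 301 rule set should produce, not a particular server's configuration:

```python
# Hypothetical redirect map: old paths on the left, new paths on the right.
REDIRECTS = {
    "/old-about": "/about-us",
    "/press/2017-archive": "/newsroom/archive",
}

def resolve(path):
    """Return a (status, location) pair for an incoming request path."""
    if path in REDIRECTS:
        return (301, REDIRECTS[path])  # permanent redirect to the new page
    return (404, None)                 # no mapping: the request 404s
```

Any path missing from the map is exactly the kind of 404 the redirects are meant to prevent.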

2. Develop a reindexing plan

Who: Search.gov team, in consultation with you, the agency web team

What: We will ask you about your search needs for the new site, including what domains you need to include, how you can generate an XML sitemap, and what SEO supports you’re putting in place in your new templates, like metadata and other structured elements. Read more about our indexing process here.

As part of this discussion, we’ll make some recommendations and likely agree on some action items for your team to consider implementing prior to your launch. If there are any major SEO warning signs in your setup, we’ll let you know.

We’ll also ask you about the timeline for launch, so that we can reserve a time to coordinate Step 8 with you.

Set…

3. Prepare XML sitemaps and SEO elements

Who: You, the agency web team

What: Action items that usually come out of the planning discussions include

  • Ensure that each domain and subdomain you want to be searchable launches with an XML sitemap.
  • Add metadata blocks to the <head> of your page templates, and Semantic Markup to the <body>.
    • Sometimes these pieces are in place, but need to be modified or moved.
  • Talk with other web teams whose content your search includes, and ask them to put the above items in place on their sites, so you can leverage them when your site searches their content.
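A minimal XML sitemap can be generated from a list of URLs and last-modified dates. This sketch uses Python's standard library and the sitemaps.org `urlset`/`url`/`loc`/`lastmod` structure; the example URL and date are placeholders:

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """Build a minimal XML sitemap string from (url, lastmod) pairs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # feeds freshness scoring
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([("https://www.example.gov/topic1", "2018-09-28")])
```

In practice your CMS or a sitemap plugin would emit this file automatically; the point is that every searchable domain and subdomain needs one at launch.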

4. Add a Search Page Alert in the Admin Center

Who: You, the agency web team

What: Use our Search Page Alert feature to display a “pardon our dust” type message on your results page. For example:

  • We are launching a new example.gov. If your search does not return the content you expected, please check back soon for updated results.
Set the status of the alert to Inactive and wait for the relaunch.

5. Prepare color scheme updates and new logo to add to Admin Center

Who: You, the agency web team

What: Gather your new logo and color palette, if needed. Many sites find it helpful to mock up their redesigned results page in a non-production search site: you can either clone your existing site or use the Add Site button to create a totally new one.

Don’t implement these changes on your production site ahead of the actual relaunch (that comes in Step 8, below).

6. Prepare updates to your other search features

Who: You, the agency web team

What: When your URL structure changes, this will affect several of our search features. You’ll want to get your updates ready to go, but don’t implement them ahead of the relaunch, or people will end up in the wrong places:

  • Domains: make sure your Domains list includes the domains and subdomains that you want included as the default content to search.
  • Collections: make sure your Collections are searching for the right content in the new location.
  • Best Bets: make sure your Best Bet URLs are correct for the content’s new location.
  • Routed Queries: update the target of your Routed Queries, so searchers will end up on the correct page.
  • RSS feeds should be removed, and re-added from their new locations.

Go!

7. Flip the new website live, and let us know

Who: You, the agency web team

What: When your new website is publicly available, reach out to us by email or phone. This will be our signal to begin our part of Step 8.

8. Implement the changes to the search site

At this point, the work splits into two parallel tracks, with your team and ours working on related items at the same time.

Who: You, the agency web team

What: Add the updates you prepared in Steps 4, 5, and 6 to the Admin Center for your production search site:

  • Set your Search Page Alert to Active
  • Update your colors and logo
  • Update your Domains, Collections, Best Bets, Routed Queries, and RSS Feeds as necessary

Who: The Search.gov team

What: We complete several backend tasks:

  • Switch your production search site to use the new index, which will begin empty for your domain(s).
  • Tell our indexer to begin working on your domain(s)
    • The time it takes to get your content indexed depends on the number of items you have and whether you have a crawl delay declared in your /robots.txt file. Generally speaking, a few hundred items should be done in an hour or two, a few thousand items in several hours, and so on.

9. Results begin to show

What: Our indexer will first read your sitemap and collect the URLs, then work through them in the order they were collected. We work at the crawl delay set in your /robots.txt file, or 1 request per second, whichever is slower. This delay is the time between rendering one page and requesting the next.

Victory lap

10. Alert Google and Bing that your website has been refreshed

Who: You, the agency web team

What: Register for the commercial search engines’ webmaster tools, if you haven’t already done so.

If you’ve undergone a redesign, followed these steps, and your site search results are not what you’d expect, send us an email.

How Search.gov Ranks Your Search Results

Google and Bing hold their ranking algorithms closely as trade secrets, as a guard against people trying to game the system to ensure their own content comes out on top, regardless of whether that’s appropriate to the search. Search Engine Optimization (SEO) consulting has grown up as an industry to try to help websites get the best possible placement in search results. You may be interested in our webinars on technical SEO and best practices that will help you get your website into better shape for search, and we’re also available to advise federal web teams on particular search issues. Generally speaking, though, SEO is a lot like reading tea leaves.

We at Search.gov share our ranking factors because we want you to game our system. This helps ensure that the best, most appropriate content rises to the top of search results to help the American public find what they need.

This page will be updated as new ranking factors are added.

Guaranteed 1st Place Spot

For any pages you want always to appear in the top of search results, regardless of what the ranking algorithm might decide, use a Best Bet. Like an ad in the commercial engines, Best Bets allow you to pin recommended pages to the top of results. Text Best Bets are for single pages, and Graphics Best Bets allow you to boost a set of related items. Our Match Keywords Only feature allows you to put a tight focus on the terms you want a Best Bet to respond to. Read more here.

Ranking Factors

Each of the following ranking factors is calculated separately, and then multiplied together to create the final ranking score of a given item for a given search.
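The multiplicative combination described above can be sketched in a few lines. The factor values here are made up for illustration; only the structure (independent factors multiplied together) comes from the text:

```python
def final_score(file_type, freshness, popularity, relevance):
    """Combine independently calculated ranking factors by multiplication."""
    return file_type * freshness * popularity * relevance
```

Because the factors multiply, a strong demotion in any one factor (say, a non-HTML file type) pulls down the final score no matter how well the document does elsewhere.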

File Type

We prefer HTML documents over other file types. Non-HTML results are demoted significantly, to prevent, for instance, PDF files from crowding out their respective landing pages.

Freshness

We prefer documents that are fresh. Anything published or updated in the past 30 days is considered fresh. After that, we use a Gaussian decay function to demote documents, so that the older a document is, the more it is demoted. When documents are 5 years old or older, we consider them to be equally old and do not demote further. We use either the article:modified_time on an individual page, or that page’s <lastmod> date from the sitemap, whichever is more recent. If there is only an article:published_time for a given page, we use that date.

Documents with no date metadata at all are considered fresh and are not demoted. Read more about date metadata we collect and why it’s important to add metadata to your files.
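The freshness rules above can be sketched as a Gaussian decay with a 30-day fresh window and a 5-year floor. The decay width (`SCALE`) is a made-up parameter for illustration; the real value is internal to Search.gov:

```python
import math

FRESH_WINDOW = 30      # days with no demotion
MAX_AGE = 5 * 365      # ages past this are treated as equally old
SCALE = 500            # hypothetical decay width, in days

def freshness_factor(age_days):
    """1.0 for fresh or undated documents; Gaussian decay after 30 days."""
    if age_days is None or age_days <= FRESH_WINDOW:
        return 1.0  # fresh, or no date metadata at all
    age = min(age_days, MAX_AGE)  # 5-year floor: no further demotion
    return math.exp(-((age - FRESH_WINDOW) ** 2) / (2 * SCALE ** 2))
```

Note the consequence for undated pages: they are never demoted for age, which is why adding date metadata only helps you if your content is actually maintained.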

Page Popularity

We prefer documents that users interact with more. Currently we leverage our own search analytics to track the number of times a URL is clicked on from the results page. The more clicks, the more that URL is promoted, or boosted. We use a logarithmic function to determine how much to boost the relevance score for each URL. For sites new to our service, please expect this ranking factor to take 30 days to fully warm up after your search goes live.

Note: Sites using the search results API to present our results on their own websites will not be able to take advantage of our click data ranking.
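A logarithmic click boost with diminishing returns might look like the sketch below. The exact function Search.gov uses is not published; `1 + log1p(clicks)` is an assumed form chosen so that zero clicks means no boost:

```python
import math

def click_boost(clicks):
    """Hypothetical logarithmic boost: more clicks promote a URL,
    but each additional click matters less than the last."""
    return 1.0 + math.log1p(clicks)  # log1p(0) == 0, so no clicks == no boost
```

The logarithm is the design point: going from 10 to 20 clicks moves a URL more than going from 100 to 110, so a handful of very popular pages can't run away from the rest of the index.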

Core Ranking Algorithm

Our system is built on Elasticsearch, which itself is built on Apache Lucene. For the first several generations, Elasticsearch used Lucene’s default ranking, the Practical Scoring Function. This function starts with a basic Boolean match for single terms and adds in TF/IDF and a vector space model. Here are some high-level definitions of these technical terms:

  • Boolean matches are the AND / OR / NOT matches you’ve probably heard about.
    • This AND that
    • This OR that
    • This NOT that
    • This AND (that OR foo) NOT bar
    • Note that while the relevance ranking takes these into account, we do not currently use these operators if entered by a searcher. Support for user-entered Boolean operators is coming in 2019.
  • TF/IDF means term frequency / inverse document frequency. It counts the number of times a term appears in a document and compares that to how many documents contain the term. It aims to identify documents where the query terms appear frequently; documents containing terms that are rare across the whole set get a higher score, while documents full of common terms appearing in many documents get a lower score.
    • Lucene also tempers the TF/IDF score with a method called BM25, which attempts to balance the scores of documents that differ greatly in length. Among ten documents containing rare terms, the longest document with the most instances of the terms would otherwise score much higher than a short document with only a few instances. That makes intuitive sense, but consider a full PDF of a report versus the report’s summary: the full report isn’t much more relevant to the query than the summary is. BM25’s length normalization addresses that issue.
  • The vector space model allows the search engine to weight the individual terms in the query, so a common term in the query would receive a lower match score than a rare term in the query.
  • Read detailed technical documentation here (External link)
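The TF/IDF idea above can be made concrete with a toy scorer. This is a bare textbook sketch (documents as word lists, natural log, no BM25 length normalization), not Lucene's actual implementation:

```python
import math

def tf_idf(term, doc, corpus):
    """Toy TF/IDF: term frequency in the doc, weighted by rarity in the corpus."""
    tf = doc.count(term)                          # term frequency
    df = sum(1 for d in corpus if term in d)      # document frequency
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    ["passport", "renewal", "form"],
    ["passport", "photo", "rules"],
    ["tax", "form", "instructions"],
]
```

Here "tax" appears in one of three documents while "passport" appears in two, so a single occurrence of "tax" scores higher than a single occurrence of "passport": rare terms are treated as more informative.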

The latest versions of Elasticsearch take into account the context of terms within the document: whether they are in structured data fields or in unstructured fields, like body text.

  • Structured data fields, like dates, are treated with a Boolean match method - does the field value match, or not?
  • Unstructured data fields, like webpage body content, are considered for how well a document matches a query.
  • Read highly technical documentation here (External link)

What Search.gov Indexes From Your Website

Content

When we think about indexing pages for search, we usually think about indexing the primary content of the page. If a <main> element is present, we collect only its contents. If the page isn’t structured to tell the search engine where that content is found, we collect the <body> tag and filter out the <nav> and <footer> elements, if present. If none of these elements are present, we collect the full contents of the <body> tag. Learn more on our post about aiming search engines at the content you really want to be searchable, using the <main> element.
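The fallback logic above can be sketched as follows. This toy version matches bare tags with regular expressions purely for illustration; real extraction must handle attributes, nesting, and malformed markup, which regexes do not:

```python
import re

def extract_indexable(html):
    """Sketch: prefer <main> contents; else <body> minus <nav> and <footer>."""
    main = re.search(r"<main>(.*?)</main>", html, re.S)
    if main:
        return main.group(1)                      # only the primary content
    body = re.search(r"<body>(.*?)</body>", html, re.S)
    text = body.group(1) if body else html        # no <body>: take everything
    # strip repetitive chrome if it is marked up as <nav>/<footer>
    return re.sub(r"<(nav|footer)>.*?</\1>", "", text, flags=re.S)
```

The practical takeaway is the first branch: wrapping your primary content in <main> is the one signal that keeps navigation and footer text out of the index entirely.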

Metadata

You can read more detail on each of the following elements here.

Standard metadata elements

  • title
  • meta description
  • meta keywords
  • locale or language (from the opening <html> tag)
  • url
  • lastmod (collected from XML sitemaps)
  • og:description
  • og:title
  • article:published_time
  • article:modified_time

File formats

In addition to HTML pages with their various file extensions, Search.gov indexes the following file types:

  • PDFs
  • Word docs
  • Excel docs
  • TXT
  • Images can be indexed either using our Flickr integration, or by sending us an MRSS feed. Note that images are not indexed during web page indexing, so you’ll need to use one of these two methods.

Coming soon:

  • PowerPoint

Please note that at this time we cannot index JavaScript-rendered content, similar to most search engines (External link). We recommend your team add well-crafted, unique description text for each of your pages, or perhaps auto-generate description tag text from the first few lines of the article text. However the text is added, it should include the keywords you want the page to respond to in search, framed in plain language. This gives us, and other search engines, something to work with when matching and ranking results. See our discussion of description metadata for more information.
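Auto-generating a description from the start of the article might look like the sketch below. The 160-character cap follows the meta description guidance later on this page; the word-boundary handling is an assumption about what a sensible CMS hook would do:

```python
def auto_description(article_text, limit=160):
    """Hypothetical fallback: build a meta description from the opening text,
    capped at `limit` characters so search engines won't truncate it."""
    text = " ".join(article_text.split())    # collapse runs of whitespace
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit)          # break on a word boundary
    return text[:cut if cut > 0 else limit].rstrip()
```

A hand-written description that front-loads plain-language keywords will always beat this fallback; the generator only guarantees the tag is never empty.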

Metadata and tags you should include in your website

Search.gov, like other search engines, relies on structured data to help inform how we index your content and how it is presented in search results. You should also read up on the metadata and structured data used by Google (External link) and Bing (External link).

Including the following tags and metadata in each of your pages will improve the quality of your content’s indexing, as well as results ranking. We also encourage you to read about more HTML5 semantic markup (External link) you can include in your websites.

This page will be updated over time as we add more tag-based indexing functions and ranking factors to our service.

<title>
Detail: Unique title of the page. If you want to include the agency or section name, place that after the actual page title.
Used in: Query matching, term frequency scoring

<meta name="description" content="foo" />
Detail: Your well-crafted, plain language summary of the page content. This will often be used by search engines in place of a page snippet. Be sure to include the keywords you want the page to rank well for. It’s best to limit the description to 160 characters so it won’t be truncated. Read more here (External link).
Used in: Query matching, term frequency scoring

<meta name="keywords" content="foo bar baz" />
Detail: While not often used by commercial search engines due to keyword stuffing (External link), Search.gov indexes your keywords, if you have added them.
Used in: Query matching, term frequency scoring

<meta property="og:title" content="Title goes here" />
Detail: Usually duplicative of <title>, we use the og:title property as the result title if it appears to be more substantive than the <title> tag. Note, Open Graph elements are used to display previews of your content in Facebook and some other social media platforms.
Used in: Query matching, term frequency scoring

<meta property="og:description" content="Description goes here" />
Detail: Often duplicative of the meta description, we index this field as well, in case it has different content. This field is a good opportunity to include more keywords than you could write into the meta description. Note, Open Graph elements are used to display previews of your content in Facebook and some other social media platforms.
Used in: Query matching, term frequency scoring

<meta property="article:published_time" content="YYYY-MM-DD" />
Detail: Exact time is optional; read more here (External link).
Used in: Page freshness scoring.

<meta property="article:modified_time" content="YYYY-MM-DD" />
Detail: Exact time is optional; read more here (External link).
Used in: Page freshness scoring.

<meta name="robots" content="..., ..." />
Detail: Use the meta robots tag to block the search engine from indexing a particular page.
Used in: Indexing processing; does not affect relevance ranking.

<main>
Detail: Allows the search engine to target the actual content of the page and avoid headers, sidebars, and other page content not useful to search. Read more about the <main> element here.
Used in: Query matching, term frequency scoring

<lastmod>
Detail: This field is included in XML sitemaps to signal to search engines when a page was last modified. Search.gov collects this metadata in case there is no article:modified_time data included in the page itself.
Used in: Indexing processing, page freshness scoring.


Everything You Need to Know About Indexing with Search.gov

How does all this work?

Domain Level SEO Supports

Page Level SEO Supports

How to get search engines to index the right content for better discoverability

Website structure and content can have a significant impact on the ability of search engines to provide a good search experience. As a result, the Search Engine Optimization industry evolved to provide better understanding of these impacts and close critical gaps. Some elements on your website will actively hinder the search experience, and this post will show you how to target valuable content and exclude distractions.

We’ve written a post about robots.txt files, talking about high level inclusion and exclusion of content from search engines. There are other key tools you will want to employ on your website to further target the content on individual pages:


The <main> element

Targeting particular content on a page

A <main> element allows you to target content you want indexed by search engines. If a <main> element is present, the system will only collect the content inside the element. Be sure that the content you want indexed is inside of this element. If the element is closed too early, important content will not be indexed. Unless the system finds a <main> element demarcating where the primary content of the page is to be found, repetitive content such as headers, footers, and sidebars will be picked up by search engines as part of a page’s content.

The element is implemented as a stand-alone tag:

<body>
Redundant header code and navigation elements, sidebars, etc.
<main>
<h1>This is your page title</h1>
<p>This is the main text of your page
</main>
Redundant footer code
Various scripts, etc.
</body>

The element can also take the form of a <div> with the role of main, though this approach is now outdated:

<body>
Redundant header code and navigation elements, sidebars, etc.
<div role="main">
<h1>This is your page title</h1>
<p>This is the main text of your page
</div>
Redundant footer code
Various scripts, etc.
</body>

As mentioned above, if no <main> element is present, the entire page will be scraped. Full-page scraping is best reserved for non-HTML file types, though, including PDFs, DOCs, and PPTs.


Declare the ‘real’ URL for a page

There are two good reasons to declare the URL for a given page: CMS sites can easily become crawler traps, and list views can generate URLs that are unhelpful as search results.

A crawler trap occurs when the engine falls into a loop of visiting, opening, and “discovering” pages that seem new, but are modifications on existing URLs. These URLs may have appended parameters such as tags, referring pages, Google Tag Manager tokens, page numbers, etc. Crawler traps tend to occur when your site can generate an infinite number of URLs, because the crawler is ultimately unable to determine what constitutes the entirety of the site.

<link rel="canonical" href="https://www.example.gov/topic1" />

By using a canonical link, shown above, you tell the crawler the real URL for the page, regardless of any parameters present in the URL when the page is opened. In the example above, even if a crawler opens the page with a URL like https://example.gov/topic1?sortby=desc, only https://www.example.gov/topic1 will be captured by the search engine.

Another important use-case for canonical links is the dynamic list. If the example above is a dynamic list of pages about Topic 1, it’s likely there will be pagination at the bottom of the page. This pagination dynamically separates items into distinct pages and generates URLs like https://example.gov/topic1?page=3. As new items are added to or removed from the list, there’s no guarantee that existing items will remain on a particular page. This behavior may frustrate users when a particular page no longer contains the item they want.

Use a canonical link to limit the search engine to indexing only the first page of the list, which the user can then sort or move through as they choose. The individual items on the list are indexed separately and included in search results.
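From the crawler's side, honoring a canonical link collapses all the parameterized variants of a page to one indexed URL. This sketch shows the effect by stripping query strings with Python's standard library; a real crawler would read the URL out of the <link rel="canonical"> tag rather than deriving it:

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    """Sketch: collapse parameterized variants to the bare path,
    mirroring what a canonical link tells the crawler to do."""
    parts = urlsplit(url)
    # drop the query string and fragment, keep scheme, host, and path
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Every `?sortby=`, `?page=`, or tracking-token variant then maps to the same index entry instead of multiplying into a crawler trap.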


Robots meta tags

There are individual pages on your websites that do not make good search results: archived event pages, list views such as Recent Blog Posts, and so on. Blocking individual pages in the robots.txt file is difficult if you don’t have easy access to edit the file, and even if edits are easy, it can quickly lead to an unmanageably long robots.txt.

It’s also important to note that search engines will pay attention to Disallow directives in robots.txt when crawling, but may not when accessing your URLs from other sources, like links from other sites or your sitemap. Search.gov will rely on robots meta tags when working off your sitemap to know what content you want searchable, and what you don’t want searchable.

To achieve best results for blocking indexing of particular pages, you’ll want to employ meta robots tags in the <head> of the pages you want to exclude from the search index.

This example says not to index the page, but allows following the links on the page:

<meta name="robots" content="noindex" />

This example says to index the page, but not follow any of the links on the page:

<meta name="robots" content="nofollow" />

This example tells bots not to index the page, and not to follow any of the links on the page:

<meta name="robots" content="noindex, nofollow" />
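The three examples above reduce to two independent flags. This sketch shows how a crawler might parse the content attribute into index/follow decisions; it is an illustration of the semantics, not Search.gov's parser:

```python
def parse_robots(content):
    """Parse a robots meta content string into (index, follow) booleans."""
    directives = {d.strip().lower() for d in content.split(",")}
    index = "noindex" not in directives    # may the page enter the index?
    follow = "nofollow" not in directives  # may links on the page be crawled?
    return (index, follow)
```

Because the flags are independent, "noindex" alone still lets the crawler discover pages linked from a blocked list view, which is exactly the behavior you want for a Posts-tagged-XYZ page.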

You can also add an X-Robots-Tag to your HTTP response headers to control indexing for a given page. This requires deeper access to servers than our customers usually have themselves; if you are interested in learning more, you can do so here (External link).

If you have content that should be indexed when it’s fresh, but needs to be removed from the index once it’s outdated, you’ll want to take a few actions:

  • Once the page’s window of relevance is over, add a <meta name="robots" content="noindex" /> tag to the <head> of the page.
  • Make sure the modified_time on the page is updated.
  • Leave the item in the sitemap, so that search engines will see the page was updated, revisit it, and see that the item should be removed from the index.


Sample code structure

Dynamic list 1: Topic landing page

The following code sample is for a dynamically generated list of pages on your site, where you want the landing page for the list to appear in search results.

<head>
<title>Unique title of the page</title>
<meta name="description" content="Some multi-sentence description of various things a person will find on this page. This is a great place to use different terms for the same thing, which is hopefully both plain language and keyword stuffing at the same time." />
<meta property="og:title" content="Unique title of the page" />
<meta property="og:description" content="Some multi-sentence description of various things a person will find on this page. This is a great place to use different terms for the same thing, which is hopefully both plain language and keyword stuffing at the same time. This could be the same or slightly different than the regular meta description." />
<meta property="article:published_time" content="2018-09-28" />
<meta property="article:modified_time" content="2018-09-28" />
<link rel="canonical" href="https://www.example.gov/topic1" />
</head>

<body>
Redundant header code and navigation elements, sidebars, etc.
<main>
<h1>Unique title of the page</h1>
<p>This is the introductory text of the page. It tells people what they’ll find here, why the topic is important, etc. This text is within the main element, and so it will be used to retrieve this page in searches.
</main>
Dynamically generated list of relevant pages
Pagination
Redundant footer code
Various scripts, etc.
</body>

Dynamic list 2: Posts tagged XYZ

The following code sample is for a dynamically generated list of pages on your site, where you do not want the list to appear in search results. In the case of pages tagged with a particular term, the pages themselves would be good search results, but the list of them would be just another click between the user and the content.

Note: the description tags are still present in case someone links to this page in another system and that system wants to display a summary with the link.

<head>
<title>Unique title of the page</title>
<meta name="robots" content="noindex" />
<meta name="description" content="Some multi-sentence description of various things a person will find on this page. This is a great place to use different terms for the same thing, which is hopefully both plain language and keyword stuffing at the same time. Recommended max characters is 175." />
<meta property="og:title" content="Unique title of the page" />
<meta property="og:description" content="Some multi-sentence description of various things a person will find on this page. This is a great place to use different terms for the same thing, which is hopefully both plain language and keyword stuffing at the same time. Recommended max characters is 175. This could be the same or slightly different than the regular meta description." />
<meta property="article:published_time" content="2018-09-28" />
<meta property="article:modified_time" content="2018-09-28" />
<link rel="canonical" href="https://www.example.gov/posts-tagged-xyz" />
</head>

<body>
Redundant header code and navigation elements, sidebars, etc.
<h1>Unique title of the page</h1>
Dynamically generated list of relevant pages
Pagination
Redundant footer code
Various scripts, etc.
</body>

Event from last month

In the following example, an event page was published in June, and then updated the day after the event occurred. This update adds the meta robots tag, which declares the page should not be indexed, and links from the page should not be followed in future crawls. Again, the meta descriptions are retained in case of linking from other systems.

<head>
<title>Unique title of the page</title>
<meta name="robots" content="noindex, nofollow" />
<meta name="description" content="Some multi-sentence description of various things a person will find on this page. This is a great place to use different terms for the same thing, which is hopefully both plain language and keyword stuffing at the same time. Recommended max characters is 175." />
<meta property="og:title" content="Unique title of the page" />
<meta property="og:description" content="Some multi-sentence description of various things a person will find on this page. This is a great place to use different terms for the same thing, which is hopefully both plain language and keyword stuffing at the same time. Recommended max characters is 175. This could be the same or slightly different than the regular meta description." />
<meta property="article:published_time" content="2018-06-04" />
<meta property="article:modified_time" content="2018-08-13" />
<link rel="canonical" href="https://www.example.gov/events/august-12-title-of-event" />
</head>

<body>
Redundant header code and navigation elements, sidebars, etc.
<main>
<h1>Unique title of the page</h1>
<p>This is the introductory text of the page. It tells people what they’ll find here, why the topic is important, etc. This text is within the main element, and so it will be used to retrieve this page in searches.
Specifics about the event.
</main>
Redundant footer code
Various scripts, etc.
</body>

Resources

Government-managed Domains outside the .Gov and .Mil Top Level Domains

Overview

As the U.S. government’s official web portal, USA.gov (External link) searches across all federal, state, local, tribal, and territorial government websites. Most government websites end in .gov or .mil, but many end in .com, .org, .edu, or other top-level domains.

In support of USA.gov and M-17-06 - Policies for Federal Agency Public Websites and Digital Services (External link), Search.gov maintains a list of all government domains that don’t end in .gov or .mil.

How to Update the List

Federal agencies are required (External link) to submit to Search.gov all non-.gov websites for inclusion in the list. This includes subdomains of a second-level domain managed by a third party, and federally controlled subfolders of a domain managed by a third party.

State or local agencies can browse the list by level of government (External link). To sort by state, download the .csv file to your computer. Please email updates or additions to the Search team, or open an issue in GitHub (External link).

What’s Included in the List?

What’s Not Included in This List?

  • .gov URLs - these are managed by the .gov Registry (External link)
  • .mil URLs - these are managed by DOD (External link)
  • Subdomains or folders that are already covered by a higher-level domain
  • State institutions of higher education or their board of regents
  • K-12 school districts
  • Local fire, library, police, sheriff, etc. departments with separate websites
  • Local chambers of commerce or visitor bureaus
  • Nonprofit municipal leagues or councils of government officials
  • Nonprofit historical societies
  • Transit authorities

Search Engine Optimization for Government Websites

On June 10, 2014, the Metrics Community of Practice of the Federal Web Managers Council and DigitalGov University hosted an event to honor the memory of Joe Pagano, a former co-chair of the Web Metrics Sub-Council.

This third lecture honoring Joe focused on search engine optimization (SEO).

While commercial search engines do a remarkable job of helping the public find our government information, as web professionals, it’s also our job to help the public make sense of what they find.

Ammie Farraj Feijoo, our program manager, presented on SEO for government websites and specifically talked about:

  • What SEO is and why it is important;
  • SEO building blocks for writing content;
  • Conducting keyword research; and
  • Eliminating ROT (redundant, outdated, and trivial content).

Download the slide deck [PDF] and visit the resources below to learn more.

Webmaster Tools

A Few (of Many) SEO Resources

