Posts tagged go-live

How a Page on a Sitemap Becomes a Search Result

We often get questions about how sitemaps control the search results for a given site. The answer is: they don’t! This page describes the relationship between sitemaps, search indexes, and the search experiences you create through the Admin Center.

A frame for the relationships described below

Imagine a big lake. There are any number of tributaries feeding into the lake. There are fishing boats out on the lake, each loaded up with the gear they need and a guide to the kinds of fish they’re trying to catch.

The Big Search.gov Index: the Lake

Like a lake with its fish, the common search index has all the content from all the sites we index, ready to be brought up by any number of different search site configurations.

The main difference in the search site setup process is the source of the web results. When we index your content, we do what Google and Bing do: we collect every site’s web pages into a big, common index. All search sites using our index reference this same common data pool.

Sitemaps: the Tributaries

XML Sitemaps are like tributaries feeding into a lake. They do not feed into sitemap-specific indexes connected to particular search sites.

Sitemaps list the content available on websites in a machine-friendly format, so that search engines will know what to collect from the site. The content indexed from your website goes into the big index mentioned above, along with the content from all other websites. You can, in theory, pull content from any website we have indexed into your search experience. This supports portal search experiences.
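As a rough illustration of what "machine-friendly" means here, the sketch below parses a tiny sitemap with Python's standard library and lists the URLs an indexer would pick up. The example.gov pages and the list_sitemap_urls helper are made up for illustration; this is not Search.gov's indexer code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# A made-up two-page sitemap for a hypothetical site.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.gov/</loc>
    <lastmod>2019-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.gov/contact</loc>
  </url>
</urlset>
"""


def list_sitemap_urls(sitemap_xml):
    """Return every <loc> value listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


for url in list_sitemap_urls(EXAMPLE_SITEMAP):
    print(url)
```

Each URL found this way is a candidate for the common index; the sitemap itself says nothing about which search site will later return the page.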

Search Site Setup: the Fishing Boats

Like a fishing boat on the water, you’ve decided what fish you’re going after, you know what corners of the lake to go to, and you’ve collected the gear you need to get the fish.

Search.gov used to rely on the Bing web index for our main search results. Customers would log in to the Admin Center and use the Domains list to include the content they wanted to pull from Bing. Now that we’re building our index in house, all this remains the same. You log in to the Admin Center and configure what you want your search to return on the results page.

Tying it all together

We use sitemaps to inform what we index into our system. You use the Admin Center to determine what results will come out of the index when people search on your website. Tributaries feed into a lake, and fishers can go out to any part of the lake to get the particular kinds of fish that they want.

Following a particular page through this cycle looks like this:

  1. A page is posted to a website
  2. Its URL is added to the sitemap
  3. Search.gov’s indexer reads the sitemap and picks up the URL
  4. Search.gov’s indexer visits the page and scrapes the content
  5. The content is added to the index. Meanwhile, the search site has already been configured to include this content from the index.
  6. A member of the public searches on the website
  7. The query matches the page’s content
  8. The page is returned as a search result
  9. The searcher clicks on the URL on the results page
  10. The searcher is brought to the page on the website
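
The cycle above can be modeled in a few lines of Python. This is a toy sketch under simplifying assumptions, not a description of how Search.gov is implemented: the CommonIndex and SearchSite classes, the FAKE_WEB content, and the exact-hostname matching are all hypothetical, and real relevance ranking is far more involved than a substring match.

```python
from urllib.parse import urlparse


class CommonIndex:
    """One shared pool of indexed pages (the lake)."""

    def __init__(self):
        self.pages = {}  # URL -> page text

    def index_sitemap(self, sitemap_urls, fetch):
        """Steps 3 to 5: take the URLs from a sitemap, fetch each page, store it."""
        for url in sitemap_urls:
            self.pages[url] = fetch(url)


class SearchSite:
    """A search site (a fishing boat): a list of domains to pull from the index."""

    def __init__(self, index, domains):
        self.index = index
        self.domains = set(domains)

    def search(self, query):
        """Steps 6 to 8: return in-scope URLs whose text contains every query term."""
        terms = query.lower().split()
        return sorted(
            url
            for url, text in self.index.pages.items()
            if urlparse(url).hostname in self.domains
            and all(term in text.lower() for term in terms)
        )


# Hypothetical page content standing in for real websites.
FAKE_WEB = {
    "https://www.example.gov/budget": "Agency budget overview",
    "https://data.example.gov/budget-tables": "Budget tables and datasets",
}

index = CommonIndex()
index.index_sitemap(["https://www.example.gov/budget"], FAKE_WEB.get)
index.index_sitemap(["https://data.example.gov/budget-tables"], FAKE_WEB.get)

# Two search sites share the same index but are configured with different domains.
agency_site = SearchSite(index, ["www.example.gov"])
portal_site = SearchSite(index, ["www.example.gov", "data.example.gov"])

print(agency_site.search("budget"))
print(portal_site.search("budget"))
```

Both sitemaps feed the same index, yet the first search site returns only the www.example.gov page while the portal-style site returns both. The sitemaps decide what goes in; the search site configuration decides what comes out.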

    Diagram showing a large circle, representing the Search.gov website. To the left of the circle is an array of small blocks, each representing an individual sitemap. Arrows point from the sitemaps to the large circle. To the right of the circle is a set of pentagons representing search sites. To the right of these is a vertical bar representing the Public. Arrows flow from the circle, through the pentagon and end at the bar, representing the flow of search results from the central Search.gov index through the search sites to the members of the public who are searching.

Search Site Launch Guide

At Search.gov we aim to provide a self-service, plug and play search solution. This guide will walk you through everything you need to do, and let you know when to reach out to us. The basic steps are:

  1. Add a site
  2. Add Domains
  3. We will select the search index your site will use
  4. Add additional search features
  5. Turn on the search features
  6. Configure the branding of your results page
  7. Connect your website’s search box to your search site

Flow chart showing the steps involved in launching a search site on Search.gov.

1. Add Site

Who: You, the agency web team

What: After you’ve successfully opened an account with Search.gov, you’ll need to create a search site. A search site is where you configure the search experience for your website. Find the Add Site link at the top of the Admin Center, and enter some basic details about your site. Please note that our service is for publicly accessible, federal government content. More detailed information can be found on our Add Site help page.

Once you’ve created your site, note the actions available on the left-hand navigation of your Admin Center.

The Dashboard is where you can view a Site Overview, manage users, and update your site’s homepage or display name.

Analytics cover the past 13 months and report your top queries, clicks, and referrers (the pages people were on when they ran their searches), plus monthly rollup data.

Content management is where you define what your search experience will include: the default search scope, additional content sources, and alternative search views.

Display management is where you can configure the branding of your search results page.

Preview your search results page to see what your search experience will be like, before you go live.

And finally, the Activate section provides pre-formatted code snippets to help you go live. Don’t be afraid of entering this area; nothing is activated just by visiting it.


2. Add Domains

Who: You, the agency web team

What: In the content management section, the domains list defines the default search scope for your site. You can include one domain or several, or you can focus on particular subdomains of one domain. Read more here.


3. Web Index Selection

Who: Search.gov team, in consultation with you, the agency web team

What: By default, a new search site is connected to the Bing web index to receive web results. Websites with very low levels of search traffic can continue to use the Bing web index after they launch with our service. However, sites that will see more than 150,000 queries per year will need to be indexed directly by our service before going live. We monitor new sites established in our system, and will reach out if we think your site will need to be indexed by us, or if we need more information to make a determination.

Regardless of the index used to support your search, we can only serve publicly accessible content. You will not be able to use our service for secure content, including intranets, and we can never index or serve personally identifiable information (PII) or other confidential data.

(Jump to Step 4. Add Search Features if you don’t need the details of the indexing process at this time.)

If we will be indexing your content ourselves, we will follow these steps:


Indexing with Search.gov


A. Define Domains and Subdomains

Who: You, the agency web team, in consultation with the Search.gov team

What: The Admin Center Domains list controls what we pull out of our index for a search on your site. But we also need to know what to put into the index to begin with. We’ll work with you to confirm the domains and subdomains you want discoverable through search. For example, after discussing with you, we may plan to index all of your subdomains, or just a selection of the major sections:

www.example.gov
data.example.gov
archive.example.gov
www.subagencydomainexample.gov 


B. Sitemap for Each Subdomain

Who: You, the agency web team, in consultation with the Search.gov team

What: The easiest way for us to discover what URLs exist on your domain is via an XML sitemap. Each domain and subdomain identified above will need a separate sitemap. Please read our detailed discussion of XML sitemaps, and let us know if you have any questions. We understand it can be difficult for some legacy systems to generate sitemaps, so if this is the case, please reach out.

We do not crawl websites by default due to the high resource demand of crawling every page on every website all the time. One of the goals of our service is to contain the costs of search government-wide, and a crawling-first model would increase costs significantly.

C. Index Subdomains

Who: The Search.gov team

What: Once sitemaps are posted to your website, our system will index your content. Alert us when the sitemaps are posted, and we’ll add your domains to our list of domains that we monitor. Then, indexing will begin.

By default, we make 1 request per second to a domain. If a Crawl-delay is declared in your /robots.txt file, we will honor that delay while fetching your content for indexing. The time required to index a site can be estimated as: (number of items) x (crawl delay in seconds) / 3600 = hours to index.
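
To make the estimate concrete, here is a minimal sketch that reads a Crawl-delay from robots.txt text using Python's standard urllib.robotparser module and applies the formula above. The helper names and the "example-indexer" user agent are hypothetical, and the 1-second fallback simply mirrors the default described above; this is not Search.gov's own code.

```python
from urllib.robotparser import RobotFileParser

DEFAULT_DELAY_SECONDS = 1  # mirrors the default of 1 request per second


def crawl_delay_seconds(robots_txt, user_agent="example-indexer"):
    """Return the Crawl-delay that applies to user_agent, or the default."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return delay if delay is not None else DEFAULT_DELAY_SECONDS


def hours_to_index(number_of_items, crawl_delay):
    """(number of items) x (crawl delay in seconds) / 3600 = hours to index."""
    return number_of_items * crawl_delay / 3600


robots_txt = "User-agent: *\nCrawl-delay: 2\n"
delay = crawl_delay_seconds(robots_txt)
print(hours_to_index(7200, delay))
```

With a 2-second Crawl-delay, 7,200 items take about 4 hours; at the default rate of 1 request per second, 18,000 items take about 5 hours. Raising the Crawl-delay lengthens indexing time proportionally.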

If you use a firewall service, it’s possible our indexer will be blocked. We can provide our IP addresses for you to whitelist in your firewall.

Please note, we can only index domains that are publicly accessible. This means that if you have a password-protected staging environment, we will not be able to index it for you as part of your testing process. Please reach out and we can discuss options if you need to test our service pre-production.

D. Test Index

Who: Search.gov Team

What: For search sites switching from Bing: After your content is indexed, we’ll start up a parallel search site using your current site configuration and the new index, and run a number of test queries to ensure the index is performing satisfactorily. Our test will cover your live site’s most popular queries.

E. Review Index

Who: You, the agency web team

What: For sites switching from Bing: After we’re satisfied with the index, we’ll send you a link to the test search site, so you can review and provide feedback.

For brand new sites: You will be able to test the index using your regular search site(s).

F. Ready to Launch

Who: You, the agency web team, in collaboration with Search.gov

What: For brand new sites: Your index is ready to go. You can proceed with the rest of the site launch steps and go live without any further action from our team.

For sites switching from Bing: When you give us the green light to switch to the new index, there is no action needed on your part other than the approval. We will change a setting in our back end, which will point your existing search site’s web results module to our index, and the change is effective immediately. All other elements of your search site remain the same: search features, branding, etc.



4. Add Search Features

Who: You, the agency web team

What: We offer several additional search features you can configure to enhance your search experience.

  • Collections allow you to set up search scopes that differ from the Domains you declare for the main search. Often Collections point at particular subfolders or subdomains of the primary domain for the site. Sometimes they point at a different domain entirely. If you are indexed by Search.gov and you want a Collection to search another domain, check with us to see if we have that content already indexed.
  • Best Bets work like ads in Google, and allow you to pin certain results to the top of your search results. Use Text Best Bets to boost individual items, and Graphics Best Bets to boost a set of related items, such as a form, its instructions page, and other related material.
  • Routed queries allow you to bypass the results page entirely for a given query, where you know exactly the page you want a person to get to after running that query. This is helpful for always getting people to the landing page for a process, rather than their clicking to a mid-process page from a search results page.
  • RSS feeds can be indexed and searched either as separate tabs on the search results, or as an inline module promoting your latest content alongside your web results.
  • YouTube videos can also be searched.
  • Twitter
  • Flickr
  • Jobs are one of the most frequently searched topics on agency websites. Use our jobs module to show your agency’s postings from USAJOBS in your own website’s search results.
  • Federal Register rules and notices can be added to your search results in a separate module.


5. Toggle Search Features On

Who: You, the agency web team

What: In order to display any of the search features you just added above, you’ll need to toggle ON the display for each one, using the Display Overview page. If you want to show Jobs or Federal Register results and you don’t see those options on the Display Overview page, let us know and we can connect your search site to those features.


6. Configure Results Page

Who: You, the agency web team

What: To make the results page complement your website’s look and feel, upload your logo, set the font style, and customize the page colors. This gives your searchers a more seamless experience as they move from your site to ours, and back again. You can also add header and footer links to support navigation back to your website. See more details here.

Masking the domain for your results page is another way you can provide continuity to your searchers as they move back and forth between your site and our system.


7. Connect Your Search Box to Search.gov

Who: You, the agency web team, in collaboration with your deploy team, if different

What: Once you’re ready to go live with your search site, take a look at the Go-Live Checklist to make sure you’ve covered all your bases. Then you will need to modify the form code for the search box on your website. We provide simple pre-formatted code in the Admin Center, or you can include these same parameters in another style of search box. Read more and see required parameters here.
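
As a rough sketch of what the search box ultimately does, the snippet below builds the URL a browser would request when the form is submitted. The endpoint and the parameter names (affiliate for the site handle, query for the search terms) are assumptions made for illustration, as is the build_search_url helper; treat the pre-formatted code in the Admin Center as the authoritative source for the actual values.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the code provided in the Admin Center.
SEARCH_ENDPOINT = "https://search.usa.gov/search"


def build_search_url(site_handle, query):
    """Build the results-page URL for a site handle and a searcher's query."""
    # Parameter names are assumptions, not confirmed by this guide.
    params = {"affiliate": site_handle, "query": query}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# "example-agency" is a made-up site handle.
print(build_search_url("example-agency", "passport renewal"))
```

Whatever style of search box you use, the idea is the same: the form sends your site handle along with the searcher's query, so the results page knows which search site configuration to apply.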

You’re now live with Search.gov!

Setting up Search.gov for Federalist Sites

For 18 years, GSA’s Search.gov has transformed the public’s search experience on federal government websites. Agencies use our free, shared service to power over 2,000 search boxes on over 30% of federal domains. This page will walk you through the steps required to integrate Search.gov with your Federalist website.

On the Federalist side of things, part 1

  1. Confirm you have the jekyll-sitemap gem installed in your repo. Read the docs here.

On the Search.gov side of things

  1. Sign up for a user account.
  2. Read our Search Site Launch Guide if you’d like some direction.
  3. Request that your domain be indexed by emailing our team. Note: the site must be publicly available for our indexer to access your content.
  4. Create and configure a search site in our Admin Center. Note: you’ll give your site a display name and a site handle, and you’ll need to enter the site handle in the search box form code on your website.
  5. Preview your search results once the indexing is complete.
  6. Put finishing touches on your search site in the Admin Center - brand your results page, etc.

On the Federalist side of things, part 2

  1. Add your site handle from the Admin Center to the _config.yml file in your Federalist repo, on the searchgov_affiliate line.
  2. Include the _includes/searchgov/form.html search box in your <header> include.
  3. If you would like type-ahead search suggestions to appear in your website’s search box, include the _includes/searchgov/script.html block in your <footer> include.