Posts tagged manage-content

Robots.txt Files

A /robots.txt file is a text file that instructs automated web bots on how to crawl and/or index a website. Web teams use them to provide information about what site directories should or should not be crawled, how quickly content should be accessed, and which bots are welcome on the site.

What should my robots.txt file look like?

Please refer to the robots.txt protocol  (External link) for detailed information on how and where to create your robots.txt. Key points to keep in mind:

  • The file must be located at the root of the domain, and each subdomain needs its own file.
  • The robots.txt protocol is case sensitive.
  • It’s easy to accidentally block crawling of everything:
    • Disallow: / means disallow everything
    • Disallow: means disallow nothing, thus allowing everything
    • Allow: / means allow everything
    • Allow: means allow nothing, thus disallowing everything
  • The instructions in robots.txt are guidance for bots, not binding requirements.
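
The difference between the two Disallow forms above can be checked with a parser before you publish the file. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example page URL are assumptions made for illustration, and other parsers (including the one a given bot uses) may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if Python's robots.txt parser lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page, used only for illustration.
PAGE = "https://www.exampleagency.gov/reports/index.html"

# "Disallow: /" blocks everything for the matching user agents.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /"

# An empty "Disallow:" blocks nothing.
BLOCK_NOTHING = "User-agent: *\nDisallow:"

print(is_allowed(BLOCK_EVERYTHING, "usasearch", PAGE))
print(is_allowed(BLOCK_NOTHING, "usasearch", PAGE))
```

Running a check like this against a draft file is a quick way to catch a stray slash before bots see it.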

How can I optimize my robots.txt for Search.gov?

Crawl delay

A robots.txt file may specify a “crawl delay” directive for one or more user agents, which tells a bot how quickly it can request pages from a website. For example, a crawl delay of 10 specifies that a crawler should not request a new page more than once every 10 seconds. We recommend a crawl delay of 2 seconds for our usasearch user agent, and a higher crawl delay for all other bots. The lower the crawl delay, the faster Search.gov will be able to index your site. In the robots.txt file, it would look like this:

User-agent: usasearch  
Crawl-delay: 2

User-agent: *
Crawl-delay: 10
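
To confirm which delay a given bot would pick up from the example above, you could read the file back with a parser. This is a sketch using Python's standard urllib.robotparser module. The helper name crawl_delay_for and the bot name examplebot are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the text above.
ROBOTS_TXT = """\
User-agent: usasearch
Crawl-delay: 2

User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# usasearch has its own group; any other bot falls back to the "*" group.
print(crawl_delay_for(ROBOTS_TXT, "usasearch"))
print(crawl_delay_for(ROBOTS_TXT, "examplebot"))  # hypothetical bot name
```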

XML Sitemaps

Your robots.txt file should also list one or more of your XML sitemaps. For example:

Sitemap: https://www.exampleagency.gov/sitemap.xml
Sitemap: https://www.exampleagency.gov/independent-subsection-sitemap.xml
  • Only list sitemaps for the domain matching where the robots.txt file is. A different subdomain’s sitemap should be listed on that subdomain’s robots.txt.
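
The same-domain rule above lends itself to an automated check. The sketch below, which assumes Python 3.8 or later for RobotFileParser.site_maps(), lists any Sitemap entries whose host differs from the host serving the robots.txt file. The function name and the data.exampleagency.gov subdomain are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Example file with one sitemap on the right host and one on a
# different (hypothetical) subdomain.
EXAMPLE_ROBOTS_TXT = """\
Sitemap: https://www.exampleagency.gov/sitemap.xml
Sitemap: https://data.exampleagency.gov/sitemap.xml
"""


def sitemaps_on_other_hosts(robots_url: str, robots_txt: str) -> list:
    """Return Sitemap URLs whose host differs from the robots.txt host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    robots_host = urlsplit(robots_url).hostname
    sitemaps = parser.site_maps() or []  # site_maps() returns None when empty
    return [url for url in sitemaps if urlsplit(url).hostname != robots_host]


print(sitemaps_on_other_hosts(
    "https://www.exampleagency.gov/robots.txt", EXAMPLE_ROBOTS_TXT
))
```

Any URL the function returns belongs in the robots.txt file of its own subdomain instead.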

Allow only the content that you want searchable

We recommend disallowing any directories or files that should not be searchable. For example:

Disallow: /archive/
Disallow: /news-1997/
Disallow: /reports/duplicative-page.html
  • Note that if you disallow a directory after it’s been indexed by a search engine, this may not trigger a removal of that content from the index. You’ll need to go into the search engine’s webmaster tools to request removal.
  • Also note that search engines may index individual pages within a disallowed folder if the search engine learns about the URL from a non-crawl method, like a link from another site or your sitemap. To ensure a given page is not searchable, set a robots meta tag on that page.

Customize settings for different bots

You can set different permissions for different bots. For example, if you want us to index your archived content but don’t want Google or Bing to index it, you can specify that:

User-agent: usasearch  
Crawl-delay: 2
Allow: /archive/

User-agent: *
Crawl-delay: 10
Disallow: /archive/
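
One way to double-check per-bot rules like these is to ask a parser what each bot may fetch. The following sketch uses Python's standard urllib.robotparser module. The helper name may_crawl, the archived page URL, and the bot name examplebot are assumptions for illustration, and a real crawler's own parser has the final say.

```python
from urllib.robotparser import RobotFileParser

# The per-bot example from the text above.
ROBOTS_TXT = """\
User-agent: usasearch
Crawl-delay: 2
Allow: /archive/

User-agent: *
Crawl-delay: 10
Disallow: /archive/
"""

# Hypothetical archived page.
ARCHIVED_PAGE = "https://www.exampleagency.gov/archive/2001-report.html"


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# usasearch matches its own group; other bots fall back to the "*" group.
print(may_crawl(ROBOTS_TXT, "usasearch", ARCHIVED_PAGE))
print(may_crawl(ROBOTS_TXT, "examplebot", ARCHIVED_PAGE))
```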

Robots.txt checklist

1. A robots.txt file has been created in the site’s root directory (https://exampleagency.gov/robots.txt)

2. The robots.txt file disallows any directories and files that automated bots should not crawl

3. The robots.txt file lists one or more XML sitemaps

4. The robots.txt file format has been validated  (External link)

Additional Resources

Yoast SEO’s Ultimate Guide to Robots.txt  (External link)

Google’s “Learn about robots.txt files”  (External link)



How to Boost Certain Results using Best Bets

Search.gov Home > Admin Center > YourSite > Manage Content > Best Bets: Text or Graphics

Do you want to promote a specific page or resource? Create a Text Best Bet.

Do you want to promote a set of pages or resources? We recommend you create a Graphics Best Bet if you have more than two recommendations on a given topic.

Best Bets appear at the top of the results page when a searcher’s query matches the text of its title, description, or keywords.

Best Bets: Text

Text best bets have the same look as standard web results. They’re listed under the heading, Recommended by YourSite.

See the sample results page below that shows a text best bet displayed on TSA.gov for a search on razors.

Text best bets for 'razors' on TSA.gov

Add an Individual Text Best Bet

URL. Add the URL of the web page that you want to promote. Make sure the URL is properly formatted, and includes the protocol (either http:// or https://).

Title and Description. Add the title and description of the web page that you want to promote. Each field can have up to 255 characters. Titles and descriptions are visible to searchers.

Status and Publish Dates. By default, newly created Best Bets are Active. If you don’t want your Best Bet to display, set it to Inactive. The default start date is the day on which you create the best bet. The default end date is null, so it will stay up forever until you decide to take it down. You can opt to specify other start and end dates using the date pickers.

Match Keywords Only? You have the option of having your Best Bets display only for exact keyword matches, and not for title or description matches. If you select this option, be sure to include all query terms you want the Best Bet to display for, including terms you have used in the title and description.

Keywords. Keywords are optional and they’re not visible to searchers. Add specific words or phrases that aren’t already included in the visible title or description. Common keywords include synonyms, acronyms, compound words, plural variations, misspellings, slang, or other variants. Enter each keyword (word or phrase up to 255 characters) in a separate field. Use your search Analytics to inform your keyword lists. Keywords are not case sensitive, but are exact matches.

Add Multiple Best Bets via Bulk Upload

Create a comma-separated file with the following fields (in this order). Download our sample template for uploading best bets [CSV] to see the correct format.

Title, URL, Description, StartDate, EndDate, Keywords, Match_Keywords_Only, Status

  • Required fields:
    • Title
    • URL. Make sure the URL is properly formatted, and includes the protocol (either http:// or https://).
    • Description
  • Optional fields:
    • Start date
    • End date
    • Keywords
    • Match_Keywords_Only. Enter a 1 in this column if you want the Best Bet to respond only to the query terms and phrases you’ve specified in the Keywords column. Note that selecting this option means you need to list all terms or phrases you want the Best Bet to respond to.
    • Status. If you leave this column blank, the Best Bet will default to Active and will display to users. Enter 0 to set the Best Bet to Inactive.

Save the file with a .csv extension and upload it.
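
If your Best Bets live in a spreadsheet or database, a short script can produce the upload file. The sketch below writes the eight fields in the order listed above and checks the three required fields and the URL protocol. Whether the upload expects a header row is an assumption here, so compare the output with the sample template before uploading. The function name and the sample row are hypothetical.

```python
import csv
import io

# Field order taken from the list above.
FIELDS = [
    "Title", "URL", "Description", "StartDate",
    "EndDate", "Keywords", "Match_Keywords_Only", "Status",
]
REQUIRED = ("Title", "URL", "Description")


def best_bets_csv(rows) -> str:
    """Return CSV text for the given Best Bets; optional fields stay blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()  # assumption: the template includes a header row
    for row in rows:
        missing = [name for name in REQUIRED if not row.get(name)]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        if not row["URL"].startswith(("http://", "https://")):
            raise ValueError(f"URL needs a protocol: {row['URL']}")
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical Best Bet used only for illustration.
SAMPLE_ROWS = [{
    "Title": "Razors",
    "URL": "https://www.exampleagency.gov/razors",
    "Description": "What you can bring, and how to pack it.",
}]

print(best_bets_csv(SAMPLE_ROWS))
```

The csv module quotes any value that contains a comma, so descriptions with commas stay in one column.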

Bulk upload updates existing Best Bets (matching on the URL field) and adds new Best Bets. To turn off Best Bets, either use the Remove button in the Admin Center Best Bets list view, or set the Best Bet to Inactive in the edit view.

Best Bets: Graphics

Recommended items are displayed in two columns, and you have the option of including an image. We show the most relevant Graphics Best Bet for the query, with the heading, Recommended by YourSite.

See the sample results page below that shows a graphics best bet displayed on USGS.gov for a search on tsunamis.

Graphics best bet highlighting tsunami links on USGS.gov

See also

  • Two columns with a collection of links and an image displayed on USA.gov for a search on wildfires.
  • Two columns with a collection of links only displayed on USA.gov for a search on housing.
  • A single link to a specific web page and an image displayed on WhiteHouse.gov for a search on jobs.

Add a Graphics Best Bet

Title. Add the title (up to 255 characters) of the web page or collection of web pages that you want to promote. The title is visible to searchers.

Title URL. Add the URL of the primary web page that you want to promote. This field is optional.

Status and Publish Dates. By default, newly created Best Bets are Active. If you don’t want your Best Bet to display, set it to Inactive. The default start date is the day on which you create the best bet. The default end date is null, so it will stay up forever until you decide to take it down. You can opt to specify other start and end dates using the date pickers.

Image. You can opt to add an image. The file should be a GIF, JPG or PNG with a maximum size of 512 KB. The system will resize your image to fit.

Match Keywords Only? You have the option of having your Best Bets display only for exact keyword matches, and not for title and link title matches. If you select this option, be sure to include all query terms you want the Best Bet to display for, including terms you have used in the title and link titles.

Keywords. Keywords are optional and they’re not visible to searchers. Add specific words or phrases that aren’t already included in the visible title or link titles. Common keywords include synonyms, acronyms, compound words, plural variations, misspellings, slang, or other variants. Enter each keyword (word or phrase up to 255 characters) in a separate field. Use your search Analytics to inform your keyword lists. Keywords are not case sensitive, but are exact matches.

Links. Enter a title and URL for each link. There is no limit on the number of links. Use the list icons (“hamburger buttons”) on the left to rearrange the display order of the links. The two columns populate by rows, so if you have three links, you would have two links in the top row, and one link in the left column of the second row. The link titles are visible to searchers.

When Searchers See Your Best Bets: Graphics

For searchers to see a best bet on your site, it must match their query and be relevant and active.

It Matches Their Query

Searchers see your best bets when their query:

  • Matches any or all words in the title, description, or link titles, or
  • Matches a keyword exactly.

Matches are made within, but not across, fields.

A sample graphics best bet entry is below.

Title: Estate Tax  
Link title 1: Transfer Property After You Pass Away  
Link title 2: Estate Tax Rights  
Link title 3: Tax Rates  
Keyword 1: death tax  
Keyword 2: inheritance tax  
Keyword 3: fair market value  
Keyword 4: market value  

This best bet would display for searches on estate tax (exact title match), estate (partial title match), tax on the estate (title match that includes stopwords), estate taxes (title match for singular/plural variant), property tax (partial link title match), propertey tax (link title match with a slight misspelling), and death tax (exact keyword match), among other queries.

It would not display for searches on death, death property, taxes after death, or fair value, as keyword matches must be exact, and these queries are only partial keyword matches. It also would not display for estate property tax, as this is a partial match across multiple fields.

It Is Very Relevant

After we determine which best bets match the searcher’s query, we rank their relevance and display only the most relevant. We’ll display up to two text best bets, and up to one graphics best bet. Default relevance is determined first by title, then by description (text best bets) or link titles (graphics best bets), and lastly by keywords. Date is used as a tiebreaker if the entries’ scores based on the above three factors are equal. We display the newer Best Bet.

If you’ve selected Match Keywords Only, then only keywords are used to determine relevance.

It Is Active

We use color coding to indicate each entry’s status.

Color   Status
Green   Active
Yellow  Inactive

Active best bets are shown to searchers on your site. Inactive entries aren’t shown to searchers because they’ve been set to Inactive, have expired (passed the publish end date), or both.


Watch the recording of our February 2015 webinar Straight to the Top: Best Bets in DigitalGov Search (55 mins)

Did you know? Use the Search Page Alert feature to add a text message to your search results page, which will appear at all times above all search results, regardless of the query.

Did you know? Analyze the number of impressions and clicks and the click-through rate for each best bet on the Monthly Reports page. Use the data to inform your titles, descriptions, and keywords and your decision to deactivate or delete an entry.

Did you know? When you use the sitelimit parameter to scope the search to a subsection of your website, we’ll apply this sitelimit to your Best Bets so searchers see recommended pages from within that folder or subdomain only.

Getting Started with i14y

Search.gov Home > APIs for Developers > i14y API

Important Note: April, 2018 - For new implementations, the Search.gov team recommends you index your content with us not using the i14y API, but rather by publishing a comprehensive xml sitemap, which we can use to index your content. Read more.

i14y Github repo (External link)

Technical Documentation

What is i14y?

i14y is a content indexing API that allows you to send your content directly to our indexes. When your site is enabled for i14y, it gives you complete control over what searchers see. No commercial web results will be served: instead, searchers will see content exactly as you’ve sent it to us.

How Do I Use i14y?

We currently have a Drupal module (External link) that will hook your CMS into i14y. You can also check out our help docs for working with the Drupal module.

If you use a different CMS, or don’t have one at all, we recommend you focus on publishing a comprehensive xml sitemap, which we would leverage to index your content.

Important Note: i14y does not visit your content to do full-text scraping of your content. For new implementations, the Search.gov team recommends you index your content with us not using the i14y API, but rather by publishing a comprehensive xml sitemap, which we can use to index your content. Read more.

Checklist to Go Live with i14y

Step 1. Connect one of your sites to i14y

i14y needs to be enabled in the back end for a site to be able to receive content through it. You can:

  • Add a brand new search site,
  • Use the Clone Site tool to copy one of your existing site’s settings to a new site (YourSite > Dashboard > Clone Site), or
  • Use an existing site.

Email us and let us know which site you want enabled for i14y. Please note: once we turn on i14y for a site, we no longer serve search results from a commercial web index. Therefore, we recommend that you set up a test site for i14y, so that your customers will continue to get results until you are fully set up and ready to move your i14y index to your production search site.

Step 2. Add a Drawer

An i14y Drawer is an index receiving content via the i14y API. We’ve called them Drawers because, like drawers in a filing cabinet, multiple indexes can be included in a single configuration to scope the entirety of a site’s search.

After we have enabled i14y for your search site, a new page will appear: Admin Center > YourSite > Manage Content > i14y Drawers.

Click Add i14y Drawer in the upper right corner. Create a handle for your drawer - the handle must be all lowercase alphanumeric, all one word, and can include underscores but no other special characters (e.g., agency_drawer_handle2).
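
If you generate drawer handles in a script, the naming rule above can be checked before you submit the form. This is a minimal sketch. The pattern is derived only from the rule as worded here (lowercase letters, digits, and underscores), and the function name is hypothetical. The Admin Center may apply further limits, such as a maximum length, that are not covered.

```python
import re

# Lowercase letters, digits, and underscores only, with at least one character.
HANDLE_PATTERN = re.compile(r"[a-z0-9_]+")


def is_valid_drawer_handle(handle: str) -> bool:
    """Return True if handle follows the naming rule described above."""
    return HANDLE_PATTERN.fullmatch(handle) is not None


print(is_valid_drawer_handle("agency_drawer_handle2"))
print(is_valid_drawer_handle("Agency Drawer"))
```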

The Drawer Description is optional.

If you have more than one website or domain that will be sending content, add a separate Drawer for each of these sites. We also recommend setting up separate drawers for staging or test content, so you can easily remove that content from your search site when you are ready to go live.

Step 3. Fill your Drawer

After you have created your Drawer(s), click Show from the i14y Drawers list to find that Drawer’s secret API key. Use this secret key with your drawer handle in your API call or your CMS module to send your content to the right place.

You are now ready to add content to your drawer. You can do this in two ways:

  • Use i14y directly. View our Github repo (External link) or Technical Documentation for more information.
  • Use a module plugged into your CMS. At this time there is a Drupal module (External link) available. Help docs for the module are here.

Step 4. Review your index

You can view the number of documents indexed for each drawer on the main i14y Drawers list, and when the most recent document was received. Click Show to view documents within a particular drawer. We display the most recent 1,000 items that were sent to the drawer. You can also search for keywords in the documents’ text, titles, and descriptions (Note: you cannot currently search for URLs).

Note: We use the Domains section to scope search results - if the domain(s) of your i14y content are not listed in the Domains section, that content will not appear on your search results page.

We send success and / or failure codes in response to your API call, so if the number of documents in our index doesn’t match what you sent, check those response codes.

If you experience difficulty sending documents to i14y, it is possible your firewall is not letting you communicate with the i14y server. Check out our cURL test commands or view the full i14y documentation.

We can attach each i14y drawer to multiple search configurations: if you have a drawer that you’d like to use for multiple search sites, email us.

Caution: A pop-up message will appear when you hit Remove on a drawer: please review this pop-up message carefully. If you remove a drawer that is only associated with one search configuration, the drawer and its contents will be deleted from our system. If the drawer is attached to multiple search configurations, it will only be removed from the search configuration you are currently on. The pop-up message will indicate what type of drawer you have.

If you accidentally delete a drawer, you will need to set up a new drawer and resend the content - we are unable to retrieve deleted drawers. If you accidentally remove a shared drawer but it is still associated with other search configurations, we can re-attach it to your site. Contact us for assistance.

Once you have your index populated, you will set up the rest of your search as you would for a traditional Search.gov site:

If you have any RSS content that will not be sent to your i14y drawer, you can add those feeds as well.

Update your website’s search box form code to point to affiliate=youri14yenabledsitehandle.

Terms of Use

By accessing the i14y API, you agree to USA.gov’s Terms of Service for Developer Resources.


Did you know? i14y is hacker shorthand for “interoperability”, because there are 14 characters between the first and last letters. i14y can also be shorthand for Independence Day.

cURL Commands for i14y Testing

Important Note: April, 2018. i14y does not visit your content to do full-text scraping of your content. For new implementations, the Search.gov team recommends you index your content with us not using the i14y API, but rather by publishing a comprehensive xml sitemap, which we can use to index your content. Read more.

If you experience difficulty sending documents to our index via i14y, it is possible your firewall is not letting you communicate with the i14y server. Adding a test document to your i14y drawer can help you diagnose a firewall issue. Windows users may need to install cURL (External link) in order to run this test from the command line.

Adding the Test Document

From the command line, enter:

curl "https://i14y.usa.gov/api/v1/documents" -XPOST -H "Content-Type:application/json" -u your_drawer_handle:your_secret_token -d '{"document_id":"1", "title":"Test Document", "path": "http://www.gov.gov/cms/doc1.html", "created": "2015-05-12T22:35:09Z", "description":"The summary of the document goes here.", "content":"This is placeholder text, and in a real document would be paragraphs long.", "promote": false, "language" : "en", "tags" : "tag1, another tag"}'

Note: you need to replace your_drawer_handle with your i14y drawer handle, found in the i14y drawers section in the Search Admin Center (Search.gov Home > Admin Center > YourSite > Manage Content > i14y Drawers). You will also need to replace your_secret_token with the drawer’s secret token, which can be found by hitting “Show” in the i14y drawer list. The drawer handle and token should be separated by a colon (:) with no spaces on either side of the colon.

The above command returns JSON structured like this:

{
"status":200,
"developer_message":"OK",
"user_message":"Your document was successfully created."
}

If you do not see a 200 status, contact our team.

After successfully sending a document, you should see an increase (by 1) in the number of documents in your i14y drawer.
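
If you would rather script the test than type the cURL command, the same request can be assembled with Python's standard library. The sketch below builds the POST request with the endpoint, headers, and document fields from the cURL command above, and uses HTTP Basic authentication with the drawer handle and secret token, as cURL's -u option does. The function name is hypothetical, and the request is only built, not sent.

```python
import base64
import json
import urllib.request

I14Y_DOCUMENTS_URL = "https://i14y.usa.gov/api/v1/documents"

# The test document from the cURL command above.
TEST_DOCUMENT = {
    "document_id": "1",
    "title": "Test Document",
    "path": "http://www.gov.gov/cms/doc1.html",
    "created": "2015-05-12T22:35:09Z",
    "description": "The summary of the document goes here.",
    "content": "This is placeholder text, and in a real document would be paragraphs long.",
    "promote": False,
    "language": "en",
    "tags": "tag1, another tag",
}


def build_create_request(drawer_handle, secret_token, document):
    """Build (but do not send) the POST request that adds a document."""
    credentials = f"{drawer_handle}:{secret_token}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        I14Y_DOCUMENTS_URL,
        data=json.dumps(document).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {auth}",
        },
        method="POST",
    )


request = build_create_request("your_drawer_handle", "your_secret_token", TEST_DOCUMENT)
print(request.get_method(), request.full_url)
```

To send it, pass the request to urllib.request.urlopen. That call is left out so the sketch runs without network access or real credentials.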

Removing the Test Document

Once you have successfully added the test document to your drawer, you will need to delete it, or it will appear in your site’s search results.

From the command line, enter:

curl "https://i14y.usa.gov/api/v1/documents/1" -XDELETE -u your_drawer_handle:your_secret_token

The above command returns JSON structured like this:

{
"status":200,
"developer_message":"OK",
"user_message":"Your document was successfully deleted."
}
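
The cleanup step can be scripted in the same way. This sketch builds the DELETE request for a document ID and checks a response body for the 200 status shown above. Both function names are hypothetical, and nothing is sent over the network.

```python
import base64
import json
import urllib.request
from urllib.parse import quote

I14Y_DOCUMENTS_URL = "https://i14y.usa.gov/api/v1/documents"


def build_delete_request(drawer_handle, secret_token, document_id):
    """Build (but do not send) the DELETE request for one document."""
    credentials = f"{drawer_handle}:{secret_token}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    # Percent-encode the ID so it is safe to place in the URL path.
    url = f"{I14Y_DOCUMENTS_URL}/{quote(document_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {auth}"},
        method="DELETE",
    )


def is_success(response_body: str) -> bool:
    """Return True if the JSON response body reports status 200."""
    return json.loads(response_body).get("status") == 200


# The response body shown above.
SAMPLE_RESPONSE = (
    '{"status":200,"developer_message":"OK",'
    '"user_message":"Your document was successfully deleted."}'
)

request = build_delete_request("your_drawer_handle", "your_secret_token", "1")
print(request.get_method(), request.full_url)
print(is_success(SAMPLE_RESPONSE))
```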

Resources: Read tips on Getting Started With i14y, or view the full i14y technical documentation.

Route Queries to a Specific Page

Search.gov Home > Admin Center > YourSite > Manage Content > Routed Queries

Do you want to get searchers to a specific web page as quickly as possible? Create a Routed Query.

A routed query skips the search results page and automatically directs visitors to a web page of your choice for very specific queries. Use query routing to save visitors the extra step of reading through search results links by taking them directly to your content pages.

We recommend creating a routed query for top tasks that have a good content page but less-than-ideal search results.

Add a Routed Query

Routed Query URL. Add the URL of the web page that you want to direct visitors to.

Routed Query Description. Add a brief description to help you remember why you created this entry and what it does. Descriptions aren’t used for indexing or visible to searchers.

Keywords. Add the specific words or phrases used to trigger the routing. Searchers will only be directed to the URL above when their query term exactly matches one of the listed keywords. Common keywords include synonyms, acronyms, compound words, misspellings, slang, or other variants. Enter each keyword (word or phrase up to 255 characters) in a separate field.

Note: Any keyword that you add to a Routed Query will become a permanent type-ahead suggestion. This applies to all 3 ways that type-ahead suggestions are displayed from our system: the module that can be turned on in the Display Overview section, the JavaScript snippet, and the API. If you do not want certain keywords to appear as type-ahead suggestions, email us.

Examples of How It Works

Private industry has been using query routing for some time. If you go to Home Depot (External link) and search for a general term like carpet, you’re routed to their carpet navigation page. If you search for a more specific term like vanities, you get standard search results.

Using USA.gov as an example, every time someone goes to USA.gov and searches for any of the following terms (must be exact matches), they’ll automatically be directed to the USA.gov Unclaimed Money from the Government page.

  • missing money
  • unclaimed assets
  • unclaimed funds
  • unclaimed money
  • unclaimed money in my name
  • unclaimed property

If they get routed to the Unclaimed Money from the Government page and search again for one of these terms, they’ll get the standard list of search results. We won’t send people into an endless loop.

If they search for something not on the above list, like show me missing money, they’ll still get the normal search results.

Standard search results for 'I am looking for unclaimed money' on USA.gov
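
The exact-match behavior described above can be pictured as a simple lookup. The sketch below is an illustration only, not Search.gov's implementation. The destination URL is a placeholder, the function names are hypothetical, and because the text does not say how case or surrounding whitespace are handled, the query is compared as typed.

```python
# Placeholder destination; the real page URL is not given in the text.
UNCLAIMED_MONEY_URL = "https://www.example.gov/unclaimed-money"

# Keywords from the USA.gov example above.
ROUTED_QUERIES = {
    UNCLAIMED_MONEY_URL: [
        "missing money",
        "unclaimed assets",
        "unclaimed funds",
        "unclaimed money",
        "unclaimed money in my name",
        "unclaimed property",
    ],
}


def build_keyword_index(routed_queries: dict) -> dict:
    """Map each keyword to the URL it routes to."""
    return {
        keyword: url
        for url, keywords in routed_queries.items()
        for keyword in keywords
    }


def route(query: str, keyword_index: dict):
    """Return the routed URL for an exact keyword match, else None."""
    return keyword_index.get(query)


index = build_keyword_index(ROUTED_QUERIES)
print(route("unclaimed money", index))
print(route("show me missing money", index))
```

A query that only contains a keyword, like show me missing money, returns None here, which corresponds to showing the normal search results.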

How to Add Your Instagram Pictures to Our Index

Search.gov Home > Admin Center > YourSite > Manage Content > Instagram

ALERT: Instagram now requires accounts to grant permission to index their images via an integration between systems. Since this integration will not be possible for Search.gov in the foreseeable future, we’re exploring other options. Our Instagram index was last updated in June 2016. Any images in our index prior to that date will continue to be shown on your search results page, as long as you do not remove your Instagram account from the Admin Center. If you remove your account, any photos in our index will be permanently deleted from our system. This help manual page is for historical reference only.

Tell Us About Your Instagram Account

Provide us with the username for your Instagram Account.

When you’re logged into Instagram, you can see your username at https://instagram.com/accounts/login/?next=/accounts/edit/ (External link).

For example, the Instagram username for the Department of Labor is usdol (External link).

Opt to Display Your Instagram Pictures

When you add the username for your Instagram account, we’ll automatically index all of the pictures in your account.

On the image results page, we’ll display the pictures from your Instagram account by default. If you’d like to backfill them with the standard image results from your website, email us and we’ll turn on your web images for you.

See the sample results page below that shows image results displayed on DOL.gov for a search on minimum wage.

Image results from Flickr & Instagram


Did you know? Do you have a multimedia gallery on your website for your agency’s photos, images, videos, podcasts, or other multimedia content? Do you use a content management system, database, or media RSS (MRSS) feed to power this gallery? You can index MRSS feeds so that your multimedia content is automatically included in your search results.

Did you know? You can also tell us about your Flickr photostream. Note that searchers see interspersed results from both Flickr and Instagram. If you have a lot of duplicate images in the two services, consider listing only one in the Admin Center.

Troubleshooting tip: Flickr and Instagram results appear on the newly redesigned results page only. Email us at search@support.digitalgov.gov if you’re ready to turn on the new results page.

Filtering Tags

Search.gov Home > Admin Center > YourSite > Manage Content > Domains > Advanced > Filter Tags

Note: this feature is only available to sites that are enabled for i14y.

Tags are used in the full text searching of i14y documents. If a tag is added to a document, the document will appear in search results when a query matches the tag, even if the term does not appear in the document’s full text.

Use the Filter Tags feature to scope your search site based on tags. You can exclude documents that have a given tag from all search results. Or, you can require that a tag is present in order for a document to be shown.

For example:

  • You can have a “News Search” site that will only show results that have the tag “news”.
  • You can exclude all documents that have the “blog” tag from your search results, if you don’t want them to appear.

Add Your RSS Feeds to Our Index

Search.gov Home > Admin Center > YourSite > Manage Content > RSS

Would you like searchers to be able to search your news or multimedia content? Would you like them to be able to narrow results by date?

Tell us the locations and names of your RSS feeds. We index all new and updated content on your feeds within minutes.

Step 1. Tell Us About Your RSS Feeds

We love feeds! Tell us about all of your RSS feeds, even if you don’t opt to show them in step 2 below. Feeds are the fastest and most reliable way for us to learn about your new and updated content.

Select the option to add a new RSS feed.

Name

Create a name for the feed. Within this ‘bucket’ you can list a single RSS feed or many feeds. For example, you can opt to list all of your agency’s press releases separately (such as Police News, Fire News, EMS News, etc.) or you can list all three under one general name (such as News).

Feed type

Select the type of feed. The default option is RSS (for text content like press releases or blogs). Change it to media RSS (External link) for multimedia content like images and videos.

Each item within an RSS feed must include a title, link, description, and publication date.

Each item within a media RSS feed must also include both a media:content URL (External link) to specify a direct URL to the media object and a media:thumbnail URL (External link) to specify a URL to the object’s thumbnail.

Any items missing a required element won’t display in your search results.

Sample RSS Item with All Required Elements

<item>
<title>
Statement from Agriculture Secretary Tom Vilsack Regarding World Organization for Animal Health (OIE) Upgrade of United States&apos; BSE Risk Status
</title>
<pubDate>Wed, 29 May 2013 00:00:00 CDT</pubDate>
<link>
http://www.usda.gov/wps/portal/usda/usdahome?contentid=2013/05/0106.xml&amp;contentidonly=true
</link>
<description>
WASHINGTON, May 29, 2013–Agriculture Secretary Tom Vilsack made the following statement about notification received today from the World Organization for Animal Health (OIE) upgrading the United States&apos; risk classification for bovine spongiform encephalopathy (BSE) to negligible risk:
</description>
</item>
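
Before adding a feed, you may want to check its items for the four required elements yourself. This is a minimal sketch using Python's standard xml.etree.ElementTree module. It covers plain RSS items only, not the extra media RSS elements, and the function name and sample items are hypothetical.

```python
import xml.etree.ElementTree as ET

# Required elements for a plain RSS item, as listed above.
REQUIRED_ELEMENTS = ("title", "link", "description", "pubDate")


def missing_required_elements(item_xml: str) -> list:
    """Return the required element names that are absent or empty."""
    item = ET.fromstring(item_xml)
    return [
        name
        for name in REQUIRED_ELEMENTS
        if not (item.findtext(name) or "").strip()
    ]


# Hypothetical items used only for illustration.
COMPLETE_ITEM = """\
<item>
<title>Example press release</title>
<pubDate>Wed, 29 May 2013 00:00:00 CDT</pubDate>
<link>https://www.exampleagency.gov/news/1</link>
<description>A short summary of the release.</description>
</item>"""

INCOMPLETE_ITEM = """\
<item>
<title>Example press release</title>
<link>https://www.exampleagency.gov/news/2</link>
<description></description>
</item>"""

print(missing_required_elements(COMPLETE_ITEM))
print(missing_required_elements(INCOMPLETE_ITEM))
```

An empty element is treated the same as a missing one, in line with error messages such as Description can’t be blank.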

Sample Media RSS Item with All Required Elements

<item>
<title type="html">
<![CDATA[ Great Lakes Beach Health ]]>
</title>
<link>
http://gallery.usgs.gov/photos/05_24_2013_gkb4Erq11X_05_24_2013_0
</link>
<guid>
http://gallery.usgs.gov/photos/05_24_2013_gkb4Erq11X_05_24_2013_0
</guid>
<pubDate>Fri, 24 May 2013 00:00:00 EDT</pubDate>
<media:description type="html">
<![CDATA[ As schools close for the year and summer weather beckons, many recreationalists head to the Great Lakes' public beaches. However, these coastal areas can become contaminated with disease-causing bacteria that threaten public health, disrupt water recreation, and pay a toll on the Great Lakes economies that depend on summer tourism. ]]>
</media:description>
<media:thumbnail url="http://gallery.usgs.gov/images/05_24_2013/gkb4Erq11X_05_24_2013/thumbs/CoastalEco_KPrzybyla_kelly18.JPG"/>
<media:content type="" url="http://gallery.usgs.gov/images/05_24_2013/gkb4Erq11X_05_24_2013/large/CoastalEco_KPrzybyla_kelly18.JPG"/>
</item>

Step 2. Opt to Show as a Facet, Inline Module, or Both

Allow searchers to see inline results for recent, relevant RSS results across all of your RSS feeds by turning on the News module on the Display Overview page. When a searcher’s query matches the title of an RSS article published within the past four months, the article appears in the News module. Very recent news results (less than five days) appear at the top of the page and less recent news results appear at the bottom. Up to three articles are displayed. You can edit the default module title, News, on the Display Overview page.

Allow searchers to narrow results to a specific feed by turning on the option to show the facet on the Display Overview page. To ensure searchers don’t encounter too many dead ends, we recommend showing only feeds with a significant amount of content as a facet.

Step 3. Check Your Search Results Page

Module

If you opted to show your RSS feed(s) as a module, searchers will see inline news results. The module highlights the three most recent, relevant results across all of your RSS feeds.

RSS Inline Module Example

Facet

If you opted to show your RSS feed(s) as a facet, searchers can narrow their results to see only RSS-based content by clicking on the facet.

Within the RSS-based results, searchers can opt to limit results to the last hour, day, week, month, or year, or they can set a custom date range. They can also sort results by relevance (best match first) or by date (most recent first).

Results count

We show the number of results returned for searches against your feed(s).

RSS Facet Example with Results Count

Step 4. Check the Status of Your Feeds

We use color coding to indicate each feed’s status.

RSS status messages and colors

Color    Status
Green    Feed indexed, no errors
Yellow   Pending indexing
Red      Feed indexed, but error in one or more items

Click on the name of any feed with an error to see more detailed information about it. Possible error messages follow.

  • 404 Not Found
  • Feed looks empty
  • Description can’t be blank
  • Title can’t be blank
  • Missing link field
  • Missing pubDate field
  • Link is not a valid URL
  • Linked URL does not exist (HTTP 404)
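
Several of the item-level messages above can be caught before you submit a feed. The following is a minimal Python sketch, not the check that Search.gov itself runs. It assumes a plain RSS 2.0 layout (channel/item with title, link, description, and pubDate elements), and the function name check_feed is hypothetical. It does not fetch any URLs, so it cannot detect the two 404 cases.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def check_feed(rss_text):
    """Return (item_index, message) pairs for problems found in an RSS 2.0 feed.

    Hypothetical helper: it reuses the wording of the messages listed above,
    but the exact rules Search.gov applies are not known here.
    """
    root = ET.fromstring(rss_text)
    items = root.findall("./channel/item")
    if not items:
        return [(None, "Feed looks empty")]

    problems = []
    for index, item in enumerate(items):
        # findtext returns None for a missing element, so normalize to "".
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        link = (item.findtext("link") or "").strip()
        pub_date = (item.findtext("pubDate") or "").strip()

        if not title:
            problems.append((index, "Title can't be blank"))
        if not description:
            problems.append((index, "Description can't be blank"))
        if not link:
            problems.append((index, "Missing link field"))
        else:
            parsed = urlparse(link)
            # Assumption: only absolute http(s) URLs count as valid links.
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append((index, "Link is not a valid URL"))
        if not pub_date:
            problems.append((index, "Missing pubDate field"))
    return problems
```

An empty list means none of these particular problems were found; it does not guarantee that the feed will index without errors.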

Troubleshooting tip: We support RSS 2.0 and Atom feeds. Learn more and validate your feeds at:

Troubleshooting tip: We index the content on your RSS feeds from the time you input them in the Admin Center. To backfill historical content, temporarily modify your RSS feeds to return more results. Leave this larger feed in place for one hour. You can do this during off-hours and you don’t need to coordinate with us.

Did you know? You can set up a search box on your website that limits results to your feed.

Start with the standard form snippet on the Code Snippets page under the Activate Search tab. Change the form action to action="https://search.usa.gov/search/news" and add the following line to limit the results to your feed.

<input type="hidden" name="channel" value="###">

The value is the number for your feed ID, which is visible in the URL when you edit your feed in the Admin Center.

Did you know? You can click on the ‘Preview’ option to see the content we have indexed for each of your RSS feeds.

Did you know? When you provide us with your YouTube channel, we’ll automatically index the RSS feed for your YouTube channel.

Did you know? For any feeds that you’ve extended with a contributor, publisher, or subject Dublin Core (External link) property, searchers may narrow results by these facets in the sidebar on the results page.

Working with i14y Drawers

Search.gov Home > Admin Center > YourSite > Manage Content > i14y Drawers

An i14y Drawer is an index receiving content via the i14y API. We’ve called them Drawers because, like drawers in a filing cabinet, multiple indexes can be included in a single configuration to scope the entirety of a site’s search.

Information on working with our Drupal module is here.

Step 1. Make sure your site is connected to i14y

If your site’s Admin Center includes the page Admin Center > YourSite > Manage Content > i14y Drawers, your site is connected to i14y. If you don’t see this page in the Admin Center, email us. You can also read about Getting Started with i14y.

Step 2. Add one or more Drawers

On Admin Center > YourSite > Manage Content > i14y Drawers, click Add i14y Drawer in the upper right corner. If you have more than one website that will be sending content, add a separate Drawer for each of these sites.

Handle

Let us know what the handle for your drawer should be. The handle must be a single word made up of lowercase letters and numbers; it can include underscores but no other special characters (e.g., agency_drawer_handle2).
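
If you generate handles in a script, the rule above can be checked with a short pattern. This is a minimal Python sketch under the assumption that "alphanumeric" means ASCII letters and digits; the function name is_valid_handle is hypothetical and is not part of any Search.gov tool.

```python
import re

# The rule as stated above: lowercase letters, digits, and underscores only.
HANDLE_PATTERN = re.compile(r"[a-z0-9_]+")


def is_valid_handle(handle):
    """Return True if the whole handle matches the allowed character set."""
    return HANDLE_PATTERN.fullmatch(handle) is not None
```

For example, agency_drawer_handle2 passes, while handles containing capital letters, spaces, or hyphens do not.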

The Drawer Description is optional.

Step 3. Send Content to Fill your Drawer

After you have created your Drawer(s), click Show from the i14y Drawers list to find that Drawer’s secret API key. Use this secret key with your drawer handle in your API call or your CMS module to send your content to the right place.
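
As a rough illustration of pairing the drawer handle with its secret key, the Python sketch below builds (but does not send) an authenticated request. The endpoint URL, the use of HTTP Basic authentication, and the document field names are assumptions made for illustration; confirm each of them against the full i14y documentation before relying on them. The function name build_i14y_request is hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint; verify it in the i14y documentation.
I14Y_DOCUMENTS_URL = "https://i14y.usa.gov/api/v1/documents"


def build_i14y_request(handle, secret_key, document, url=I14Y_DOCUMENTS_URL):
    """Build a POST request that carries one document as JSON.

    Assumption: the API accepts HTTP Basic credentials in the form
    drawer_handle:secret_key.
    """
    credentials = f"{handle}:{secret_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# To send the request, pass it to urllib.request.urlopen(request).
# Keep the secret key out of source control, for example in an
# environment variable.
```

Building the request separately from sending it makes it easy to inspect the headers and body while you are troubleshooting.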

Step 4. Review your Index

You can view the number of documents indexed for each drawer on the main i14y Drawers list, and when the most recent document was received. Click Show to view documents within a particular drawer. We display the most recent 1,000 items that were sent to the drawer. You can also search for keywords in the documents’ text, titles, and descriptions (Note: you cannot currently search for URLs).

Note: We use the Domains section to scope search results. If the domain(s) of your i14y content are not listed in the Domains section, that content will not appear on your search results page.

We send success and/or failure codes in response to your API call, so if the number of documents in our index doesn’t match what you sent, check those response codes.
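
One way to act on those codes is to record the status returned for each document and list the ones that did not succeed. This is a minimal Python sketch that assumes you log a (document ID, HTTP status code) pair per call. It treats any 2xx status as success, which is the general HTTP convention; the specific codes i14y returns are not stated here. The function name failed_documents is hypothetical.

```python
def failed_documents(results):
    """Return the (document_id, status_code) pairs whose status is not 2xx."""
    return [
        (doc_id, status)
        for doc_id, status in results
        if not 200 <= status < 300
    ]
```

Resending only the documents in this list is usually enough to bring the drawer’s document count back in line with what you intended to send.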

Note: If you experience difficulty sending documents to i14y, it is possible your firewall is not letting you communicate with the i14y server. Check out our cURL test commands or view the full i14y documentation.


Caution: A pop-up message will appear when you hit Remove on a drawer. Please review this message carefully. If you remove a drawer that is associated with only one search configuration, the drawer and its contents will be deleted from our system. If the drawer is attached to multiple search configurations, it will be removed only from the search configuration you are currently on. The pop-up message will indicate what type of drawer you have.

If you accidentally delete a drawer, you will need to set up a new drawer and resend the content, because we are unable to retrieve deleted drawers. If you accidentally remove a shared drawer but it is still associated with other search configurations, we can re-attach it to your site. Contact us for assistance.


Did you know? We can attach each i14y drawer to multiple search configurations: if you have a drawer that you’d like to use for multiple search sites, email us.

Adding Supplemental Content to Your Search Configuration

Search.gov Home > Admin Center > YourSite > Manage Content > Domains > Advanced

Is content missing from your search results? Do you want to have RSS content appear in your main page search results?

We offer two ways for you to tell us about content that you want us to fetch and include in our web index: via RSS feed (Supplemental Feed) or manually (via the Supplemental URLs section).

Note for customers using Collections: We serve Collections results from commercial indexes; the instructions below only apply to your main search results.

Note to our i14y customers: Supplemental content will not appear for customers using our i14y content indexing API.

Adding Content via Supplemental Feed

You can use an RSS feed to add URLs. The feed is useful if you’d like to automate the process, add multiple URLs, or both.

Enter the URL of your RSS feed. Click Submit. We’ll fetch each URL in your feed and index the title, description (optional), and the full text of the document/webpage for the items you provide. Please be sure to follow our schema, and note that we will only index the items that are listed in the feed.
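
If you want to automate the feed, it can be generated from a list of pages. The Python sketch below builds a plain RSS 2.0 document with the standard library. The element set (title, link, optional description, pubDate) follows generic RSS 2.0 and is an assumption; where it differs from the schema mentioned above, the schema takes precedence. The function name and the keys of the page dictionaries are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def build_supplemental_feed(feed_title, feed_link, pages):
    """Return an RSS 2.0 document (as a string) with one item per page.

    Each page is a dict with "title", "url", and "published" (a timezone-aware
    datetime) keys, plus an optional "description" key.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    # RSS 2.0 expects a channel description; the title is reused here.
    ET.SubElement(channel, "description").text = feed_title

    for page in pages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = page["title"]
        ET.SubElement(item, "link").text = page["url"]
        if page.get("description"):
            ET.SubElement(item, "description").text = page["description"]
        # format_datetime produces an RFC 2822 style date, as RSS uses.
        ET.SubElement(item, "pubDate").text = format_datetime(page["published"])

    return ET.tostring(rss, encoding="unicode")
```

Because only the items listed in the feed are indexed, regenerate the feed from your complete list of pages each time rather than appending to a partial one.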

You can see the list of URLs we have indexed from your feed by viewing the Supplemental URLs section.

Delete a URL added via RSS feed by deleting the item from your RSS feed. We’ll remove the URL from our index the next time we fetch your RSS feed (we fetch saved feeds once per day, in the evening in Eastern Time; for brand new feeds, we fetch as soon as you hit Submit on the feed URL).

You can also delete all of the URLs added via the RSS feed by deleting the RSS feed itself.

Note: We can’t restore URLs that don’t exist within your current feed. Please don’t remove items from the feed unless you want them to be removed from the index. You can add a feed that contains up to 1,000 items on your own. If you have a feed that will include more than 1,000 items, please contact us.

Adding Content via the Supplemental URLs Section

You can also manually add a specific URL on the Supplemental URLs page. We’ll fetch each URL you add manually and we’ll index the title, description, and the full text of the document/webpage for the link you provide. Note: You can’t manually add a Supplemental URL if it has already been added via a Supplemental Feed.

Delete a manually added URL by selecting Remove within the list of supplemental URLs.

The source column in the crawl report shows whether you added the URL via an RSS feed or manually.

How Supplemental Content Appears on Your Search Results Page

We display titles as they are provided in the feed or as you entered them manually.

We display descriptions as follows:

  • If a searcher’s term matches terms in the description you provided: We’ll display the description exactly as you provided it.
  • If a searcher’s term only matches terms in the full text of the document: We’ll display snippets taken from the full text.
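
The two rules above can be pictured as a simple fallback. The Python sketch below is illustrative only: it uses case-insensitive substring matching and a fixed-width snippet, both of which are assumptions, and it does not reproduce how Search.gov actually matches terms or selects snippets. The function name displayed_description is hypothetical.

```python
def displayed_description(query, description, full_text, snippet_width=80):
    """Pick the text to show under a result title for a given query.

    Returns the provided description when a query term appears in it,
    otherwise a snippet of the full text around the first matching term.
    Returns "" when nothing matches (a case the rules above do not cover).
    """
    terms = query.lower().split()

    # Rule 1: a term matches the provided description, so show it unchanged.
    if any(term in description.lower() for term in terms):
        return description

    # Rule 2: fall back to a snippet taken from the full text.
    lowered = full_text.lower()
    for term in terms:
        position = lowered.find(term)
        if position != -1:
            start = max(0, position - snippet_width // 2)
            return full_text[start:start + snippet_width]

    return ""
```

The practical takeaway is that a description containing the terms searchers are likely to use is shown exactly as you wrote it.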

By default, supplemental content will appear after all Bing commercial search results have been shown. For example, a search on DigitalGov.gov for “serverless architecture” displays results from Bing’s index on page 1.

After hitting ‘next’ to view page 2, supplemental content is served.

Page 1:

Commercial Results are Displayed on Page One of the DigitalGov.gov Search Results Page

Page 2:

Supplemental Feed Results are Displayed on Page Two of the DigitalGov.gov Search Results Page

If there are no commercial results for a query but there are supplemental content results, we will display the supplemental content on page one.

Getting Supplemental Content To Always Appear First

If you would like your supplemental content to always appear first in search results, please contact our team. Searchers will first be served any supplemental content results and will then be offered the chance to “search again,” which will lead them to commercial index results.

Please note: if you have any regular RSS feeds set up in your search site, this regular RSS content will also appear first with the supplemental content, prior to commercial results. If content is present in both types of feed, duplicates will appear in your results.

In the example below, a Supplemental URL appears first, and clicking “Try your search again” will lead to a page of Bing results.

A Supplemental URL is Displayed on Page One of the DigitalGov.gov Search Results Page

Getting Supplemental Content To Be Your Only Search Results

If you would like your supplemental content to be the only results served from your site, please contact our team. Searchers will not be offered the chance to “search again” on a commercial index after they exhaust your supplemental content results (see above example).

Please note: if you have any regular RSS feeds set up in your search site, this regular RSS content will also appear with the supplemental content. If content is present in both types of feed, duplicates will appear in your results.


Troubleshooting tip: We support RSS 2.0 and Atom feeds. Learn more and validate your feeds at:

Troubleshooting tip: Are you seeing an error message in the crawled URLs report for your PDF that says, “No content found in document”? Your PDF is likely an image-only, non-searchable file that was created from a paper document using a scanner. See the resources below for more information on how to create searchable PDF files.

Troubleshooting tip: Are you seeing an error message in the crawled URLs report that says your page is taking too long to load? Use Pingdom Tools Full Page Test (External link) to test the load time of your page, analyze it, and find the bottlenecks.

Did you know? To help the public find your web pages when they search on Bing.com, we notify Bing about any URLs you add. While this helps with search engine optimization (SEO), it is not a cure-all. You should also register for and use commercial search engines’ webmaster tools.