Posts tagged release-notes

February 2019 Release Notes

Improvements

  • We’ve added click counts as a ranking factor for customers indexed by Search.gov. We look at the URLs that represent 75% of all clicks on search results, and give those a boost. This is the “fat head” that comes before the “long tail.” As always, following best practices will help results stay relevant:
    • Use Best Bets to promote frequently visited pages that are not bubbling to the top of results on their own
    • Periodically review and update Best Bets
    • Maintain an up-to-date and complete sitemap with updated dates
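
As a rough illustration of the “fat head” idea, the sketch below picks the most-clicked URLs that together account for 75% of all clicks. The function name, the input shape (a mapping of URL to click count), and the sample numbers are assumptions made for this example; it is not a description of how Search.gov computes or applies the boost.

```python
from collections import Counter


def head_urls(click_counts, share=0.75):
    """Return the most-clicked URLs that together cover `share` of all clicks."""
    total = sum(click_counts.values())
    if total == 0:
        return []
    head, running = [], 0
    # Walk URLs from most to least clicked until the running total covers the share.
    for url, clicks in Counter(click_counts).most_common():
        if running >= share * total:
            break
        head.append(url)
        running += clicks
    return head


# Hypothetical click counts for one site.
clicks = {"/passports": 60, "/visas": 20, "/contact": 12, "/about": 8}
print(head_urls(clicks))  # ['/passports', '/visas']
```

Everything outside the returned list would be the “long tail” in this picture.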

Fixes, Upgrades, Misc

  • Fixed an issue with the type-ahead feature on customer search boxes
  • Fixed an issue with disappearing search icon on search boxes
  • Continued with Rails upgrade for our applications

Page last reviewed or updated:

January 2019 Release Notes

Improvements

  • After integrating directly with the new USAJOBS API, we worked on additional tuning of trigger words to avoid false positive job-related searches
  • We have improved search performance by caching repeat queries made to our data store
  • We updated our content parser to accept some non-standard HTML tags, and to ignore any content within <nav> and <footer> elements
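
To make the parser change concrete, here is a minimal sketch of text extraction that skips anything inside <nav> and <footer> and tolerates unknown tags. It uses Python's standard html.parser module purely for illustration; the class and function names are hypothetical, and the actual Search.gov parser may work quite differently.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect page text, ignoring anything nested inside <nav> or <footer>."""

    SKIPPED = {"nav", "footer"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<body><nav><a href='/'>Home</a></nav>"
    "<custom-tag>Report summary</custom-tag>"
    "<footer>Contact us</footer></body>"
)
print(extract_text(page))  # Report summary
```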

Fixes, Upgrades, Misc

  • We upgraded Ruby on our search-gov repo
  • We increased the processing power on the servers that support our primary web index
  • We reindexed our primary index into more Elasticsearch shards
  • We decreased the cookie timeout for Admin Center sessions
  • We made the failed password reset alert language more ambiguous, so people will no longer be able to tell whether the email address has an account
  • We fixed a bug in our MRSS photo indexer

December 2018 Release Notes

Improvements

  • We integrated with Bing v7 and transitioned our customers to this newer version.
  • We now index content on a domain even if the root of that domain lists a different domain as the canonical domain. For example, https://publications.sampleagency.gov may list https://www.sampleagency.gov as the canonical domain, but still serve content from https://publications.sampleagency.gov/reports/first_report.pdf. We can now index https://publications.sampleagency.gov/reports/first_report.pdf.
  • We updated our job search location feature to show more job openings, and cleaned up how we send job queries to the USAJOBS API to get more results.
  • We now automatically review URLs for reindexing, checking for 404s and 301s. We’re doing this every 30 days to begin with, and will adjust that timeframe as needed.
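
The periodic URL review above could be sketched roughly as follows. Only the 30-day interval and the 404 and 301 status codes come from the notes; the function names and the actions taken for each status are assumptions for illustration.

```python
from datetime import datetime, timedelta

REVIEW_INTERVAL = timedelta(days=30)  # starting interval, expected to be tuned


def due_for_review(last_checked, now, interval=REVIEW_INTERVAL):
    """A URL is due if it was never checked or was last checked an interval ago."""
    return last_checked is None or now - last_checked >= interval


def action_for_status(status):
    """Map an HTTP status code to a hypothetical indexing action."""
    if status == 404:
        return "remove"           # assumed: drop pages that no longer exist
    if status == 301:
        return "follow_redirect"  # assumed: index the redirect target instead
    return "keep"


now = datetime(2018, 12, 1)
print(due_for_review(datetime(2018, 11, 1), now))   # True
print(due_for_review(datetime(2018, 11, 15), now))  # False
```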

Fixes

  • We upgraded the Ruby version on the repos for our search.gov website and for asis, our image indexing repo.
  • We upgraded the activejob Ruby gem across repos.

October 2018 Release Notes

Improvements

  • We made several updates to our Chef cookbooks to further harden our operating system, including backend password policies, package configuration, and OS configuration.
  • We shifted our model for supporting domain masks for hosted search results pages to leverage CAA records.

Fixes

  • We fixed a gnarly bug in Elasticsearch that made queries containing very common words, like “the”, behave as if there were no results.

November 2018 Release Notes

Improvements

  • We integrated directly with the new USAJOBS API. This means that we
    • are now querying their system at query time, rather than building an index of their job postings within our own system and querying that at query time.
    • have reconfigured what information our jobs searches include in the full query that we send to USAJOBS.
    • have increased the geographic radius we’ll look at when a user searches on a jobs-related term. The radius is now 75 miles from the user’s general location.
    • are now always providing a link to USAJOBS.gov if someone has searched for a jobs-related term, even if there are no jobs located near the searcher.
  • We now support indexing TXT files. There are more TXT files on government websites than you would have thought!

Fixes

  • We fixed a link in the Jobs module that led searchers to a broken USAJOBS.gov page.
  • We now deduplicate sitemap URLs so we will not try to index the same content more than once.
  • We updated Ruby gems: Loofah, Rack, FFI.
  • We upgraded Ruby.
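
For readers curious about the sitemap deduplication fix above, a minimal sketch might look like the following. It assumes a standard sitemaps.org <urlset> document and uses a hypothetical function name; the real indexer may normalize URLs in ways not shown here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def unique_sitemap_urls(sitemap_xml):
    """Return each <loc> URL once, in the order it first appears."""
    root = ET.fromstring(sitemap_xml)
    seen = set()
    urls = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.sampleagency.gov/a</loc></url>
  <url><loc>https://www.sampleagency.gov/b</loc></url>
  <url><loc>https://www.sampleagency.gov/a</loc></url>
</urlset>"""
print(unique_sitemap_urls(sample))
```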

September 2018 Release Notes

Highlights

  • We began work on using click data in our relevancy ranking, starting with
    • Recording the domain of the clicked-on URL separately, so we can manage all the clicks for a particular domain.
    • Calculating the top N clicked-on URLs for a given domain.
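
The two steps above can be sketched together: pull the domain out of each clicked URL, then count clicks per URL within each domain and keep the top N. The function name and the flat list of clicked URLs are assumptions for this example, not the data model Search.gov uses.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def top_clicked_urls(clicked_urls, n):
    """Group clicks by domain and return the n most-clicked URLs for each."""
    per_domain = defaultdict(Counter)
    for url in clicked_urls:
        domain = urlsplit(url).hostname  # recorded separately from the full URL
        if domain:
            per_domain[domain][url] += 1
    return {
        domain: [url for url, _ in counts.most_common(n)]
        for domain, counts in per_domain.items()
    }


# Hypothetical click log: one entry per click.
log = (
    ["https://www.sampleagency.gov/a"] * 3
    + ["https://www.sampleagency.gov/b"] * 2
    + ["https://www.sampleagency.gov/c"]
    + ["https://publications.sampleagency.gov/r.pdf"]
)
print(top_clicked_urls(log, 2))
```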

Chores

  • We indexed a lot of content for agencies.
  • We got our new developers set up and ready to work on great stuff.

Bug Fixes

  • None

August 2018 Release Notes

Highlights

  • We began work on leveraging click data in our relevancy scoring. This will allow us to use the relative popularity of pages as a ranking signal.

Chores

  • We now record the domain of a URL that has been clicked in addition to recording the full click. This way we can compare the click volume of URLs within a given domain.
  • We resolved security vulnerabilities in grape & sprockets
  • We configured RSpec to run specs in random order.

Bug Fixes

  • None

June-July 2018 Release Notes

Highlights

  • We added support for XML sitemaps that are located in non-standard locations within a domain.
  • We added sort_by support to our Results API

Chores

  • We finished migrating to CircleCI for our continuous integration monitoring.
  • We improved our internal tracking of queries to the Bing API.
  • We improved how we handle indexing domains that time out.
  • We began indexing the last-modified date of a page, if provided
  • Our SitemapIndexer now processes one sitemap at a time, and we created an automated queue for indexing jobs and url fetching.
  • We improved the management of Searchgov domain states. Now each Searchgov domain has an “indexing activity”. States might include: indexing sitemaps, fetching new URLs (such as after bulk import), and crawling.
  • We now follow client-side redirects.
  • We improved our ability to avoid certain crawler traps.
  • We now index documents up to 15 MB in size. The previous limit was 10 MB.
  • We finalized our compliance with BOD 18-01.
  • We cleaned up how we handle temp files during indexing.
  • We tidied up our internal errors on indexing jobs, as well as our test suite.
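
One common kind of client-side redirect is the meta refresh tag. The sketch below shows how a crawler might detect one and resolve its target. It covers only that case and is an assumption about what “client-side redirects” includes; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Find the first <meta http-equiv="refresh"> target in a page."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=/new-page".
        _, _, rest = (attributes.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value.strip():
            self.target = value.strip().strip("'\"")


def client_side_redirect(html, base_url):
    """Return the absolute redirect target, or None if the page has none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return urljoin(base_url, finder.target) if finder.target else None


page = '<html><head><meta http-equiv="refresh" content="0; url=/new-home"></head></html>'
print(client_side_redirect(page, "https://www.sampleagency.gov/old"))
# https://www.sampleagency.gov/new-home
```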

Bug Fixes

  • We fixed a bug that was not showing diacritics properly in non-English searches.

May 2018 Release Notes

Highlights

  • We continue to make good progress towards our indexing system, and our work continues to be highly focused on the back end of our system. See below for more details.

Chores

  • We created back-end interfaces allowing the Search.gov team to manage indexed domains & URLs.
  • We added a delay method to SearchGov Domain, to honor the crawl delay settings in a given site’s robots.txt file.
  • We created a SearchGov Domain Indexer job that will enqueue URLs in need of fetching, to allow bulk indexing tasks to be automated without overloading anyone’s servers, and we added support for resque-scheduler to our configuration baseline.
  • We set the sitemap indexer to reject URLs from other domains to avoid erroneous attempts to index content from beta sites, old domains, etc.
  • We now check the protocol of a domain, and whether the site is responding to us. We also set our URL fetcher to throw an error if the domain is unavailable or blocking our indexer.
  • We re-indexed the searchgov indices.
  • We upgraded MySQL in demo environments, and streamlined the scenario data for our test suite.
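
Two of the indexing chores above, honoring a site’s crawl delay and rejecting sitemap URLs from other domains, can be illustrated with Python’s standard library. This is a sketch under assumptions: the function names and the "example-bot" user agent are hypothetical, and the Search.gov code itself is written in Ruby.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def crawl_delay_seconds(robots_txt, user_agent, default=0):
    """Read the Crawl-delay that applies to user_agent from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return default if delay is None else delay


def same_domain_urls(urls, domain):
    """Keep only URLs whose host matches the domain being indexed."""
    return [url for url in urls if urlsplit(url).hostname == domain]


robots = "User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n"
print(crawl_delay_seconds(robots, "example-bot"))  # 5

sitemap_urls = [
    "https://www.sampleagency.gov/reports",
    "https://beta.sampleagency.gov/reports",
]
print(same_domain_urls(sitemap_urls, "www.sampleagency.gov"))
```

A fetch loop would then wait for the returned number of seconds between requests to the same domain.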

Bug Fixes

  • We fixed a bug that sent searchers back to page 1 results when changing the time scope in a Collection search.
  • We mitigated SSL certificate problems with some sites.
  • We made our redirection check more strict to avoid filling our database and indexes with domains and web pages that don’t need to be searchable.

April 2018 Release Notes

Highlights

  • We’re making good progress towards our indexing system, but all our work in April was in the back end of our system. See below for more information.

Chores

  • We have updated the jQuery version.
  • We configured our analytics alerts to send emails via SES instead of Mandrill.
  • We upgraded Ruby to version 2.3.7.
  • We computed filename extensions for documents in our primary index.
  • We improved how we handle email bounces for our notifications, and complaints that may come in.

Bug Fixes

  • We fixed an error with our S3 backups for Logstash.
