May 2018 Release Notes
- We continue to make good progress towards our indexing system, and continues to be highly focused on the back end of our system. See below for more details.
- We created back-end interfaces allowing the Search.gov team to manage indexed domains & urls.
- We added a delay method to
SearchGov Domain, to honor the crawl delay settings in a given site’s
- We created a
SearchGov Domain Indexerjob that will enqueue urls in need of fetching, to allow bulk indexing tasks to be automated without overloading anyone’s servers, and we added support for
resque-schedulerto our configuration baseline.
- We set the sitemap indexer to reject urls from other domains to avoid erroneous attempts to index content from beta sites, old domains, etc.
- We now check the protocol of a domain, and whether the site is responding to us. We also set our url fetcher to throw an error if the domain is unavailable or blocking our indexer.
- We re-indexed the searchgov indices.
- We upgraded mySQL in demo environments, and streamlined the scenario data for our test suite.
- We fixed bug that sent searchers back to page 1 results when changing the time scope in a Collection search.
- We mitigated SSL certificate problems with some sites.
- We made our redirection check more strict to avoid filling our database and indexes with domains and web pages that don’t need to be searchable.
June 20, 2018