Posts tagged release-notes
- We integrated directly with the new USAJOBS API. This means that we:
  - now query their system at search time, rather than building an index of their job postings within our own system and querying that.
  - have reconfigured what information our jobs searches include in the full query that we send to USAJOBS.
  - have increased the geographic radius we’ll look at when a user searches on a jobs-related term. The radius is now 75 miles from the user’s general location.
  - now always provide a link to USAJOBS.gov when someone has searched for a jobs-related term, even if there are no jobs located near the searcher.
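The radius change above shapes the outbound query we build. A minimal sketch, assuming the host and parameter names of USAJOBS's public search API (`Keyword`, `LocationName`, `Radius`); treat them as illustrative, not authoritative:

```ruby
require "uri"

# Build a USAJOBS search URL with a geographic radius (default 75 miles).
# Host and parameter names are assumptions based on the public USAJOBS API.
def usajobs_search_url(keyword:, location:, radius_miles: 75)
  params = {
    "Keyword"      => keyword,
    "LocationName" => location,
    "Radius"       => radius_miles
  }
  URI::HTTPS.build(
    host:  "data.usajobs.gov",
    path:  "/api/search",
    query: URI.encode_www_form(params)
  ).to_s
end
```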
- We now support indexing TXT files. There are more TXT files on government websites than you might think!
- We fixed a link in the Jobs module that led searchers to a broken USAJOBS.gov page.
- We now deduplicate sitemap URLs so we will not try to index the same content more than once.
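Sitemap deduplication boils down to normalizing each URL before comparing. A simplified sketch (the normalization key here ignores query strings, and the helper name is ours, not the production code's):

```ruby
require "set"
require "uri"

# Keep only the first occurrence of each sitemap URL, treating trivially
# different spellings (host case, trailing slash) as the same page.
def dedupe_sitemap_urls(urls)
  seen = Set.new
  urls.select do |url|
    uri = URI.parse(url)
    key = "#{uri.scheme}://#{uri.host&.downcase}#{uri.path.chomp('/')}"
    seen.add?(key) # Set#add? returns nil if the key was already present
  end
end
```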
- We updated the Loofah, Rack, and FFI Ruby gems.
- We upgraded Ruby.
- We began work on using click data in our relevancy ranking, starting with:
  - recording the domain of the clicked-on URL separately, so we can manage all the clicks for a particular domain.
  - calculating the top N clicked-on URLs for a given domain.
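Calculating the top N clicked-on URLs for a domain can be sketched as a simple tally. The click-record shape below (hashes with `:domain` and `:url`) is illustrative, not the production schema:

```ruby
# Return the N most-clicked URLs within one domain, most popular first.
def top_clicked_urls(clicks, domain, n: 3)
  clicks.select { |c| c[:domain] == domain }
        .map { |c| c[:url] }
        .tally                          # => { url => click_count }
        .max_by(n) { |_url, count| count }
        .map(&:first)
end
```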
- We indexed a lot of content for agencies.
- We got our new developers set up and ready to work on great stuff.
- We began work on leveraging click data in our relevancy scoring. This will allow us to use the relative popularity of pages as a ranking signal.
- We now record the domain of a URL that has been clicked in addition to recording the full click. This way we can compare the click volume of URLs within a given domain.
- We resolved security vulnerabilities in the grape and sprockets gems.
- We configured RSpec to run specs in random order.
- We added support for XML sitemaps that are located in non-standard locations within a domain.
- We added sort_by support to our Results API.
- We finished migrating to CircleCI for our continuous integration monitoring.
- We improved our internal tracking of queries to the Bing API.
- We improved how we handle indexing domains that time out.
- We began indexing the last-modified date of a page, if provided.
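The last-modified date may arrive as an HTTP `Last-Modified` header or a sitemap `<lastmod>` value. A hedged sketch of tolerant parsing (the helper name is ours; `Time.parse` accepts both common formats):

```ruby
require "time"

# Parse a page's last-modified date if provided, returning nil when the
# value is missing or unparseable rather than failing the indexing job.
def parse_last_modified(value)
  return nil if value.nil? || value.empty?
  Time.parse(value) # handles "Mon, 01 Apr 2019 12:00:00 GMT" and "2019-04-01"
rescue ArgumentError
  nil
end
```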
- Our SitemapIndexer now processes one sitemap at a time, and we created an automated queue for indexing jobs and URL fetching.
- We improved the management of Searchgov domain states. Now each Searchgov domain has an “indexing activity”. States might include: indexing sitemaps, fetching new URLs (such as after bulk import), and crawling.
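The per-domain "indexing activity" states above could be modeled roughly as below. The state names come from the release note (plus an assumed `idle` default); the class itself is illustrative, not the production model:

```ruby
# Minimal sketch of tracking one "indexing activity" per Searchgov domain.
class SearchgovDomain
  ACTIVITIES = %w[idle indexing_sitemaps fetching_urls crawling].freeze

  attr_reader :host, :activity

  def initialize(host)
    @host = host
    @activity = "idle" # assumed default state
  end

  def activity=(state)
    raise ArgumentError, "unknown activity: #{state}" unless ACTIVITIES.include?(state)
    @activity = state
  end
end
```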
- We now follow client-side redirects.
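Client-side redirects typically mean a `<meta http-equiv="refresh">` tag pointing at a new URL. The production code likely uses an HTML parser; a regex keeps this sketch dependency-free:

```ruby
# Extract the target URL of a meta-refresh redirect, or nil if none.
META_REFRESH = /<meta[^>]+http-equiv=["']refresh["'][^>]*content=["'][^"']*url=([^"']+)["']/i

def client_side_redirect(html)
  html[META_REFRESH, 1]
end
```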
- We improved our ability to avoid certain crawler traps.
- We now index documents up to 15 MB in size. The previous limit was 10 MB.
- We finalized our compliance with BOD 18-01.
- We cleaned up how we handle temp files during indexing.
- We tidied up our internal errors on indexing jobs, as well as our test suite.
- We fixed a bug that was not showing diacritics properly in non-English searches.
- We continue to make good progress on our indexing system, and remain highly focused on the back end of our system. See below for more details.
- We created back-end interfaces allowing the Search.gov team to manage indexed domains and URLs.
- We added a delay method to SearchGov Domain, to honor the crawl-delay setting in a given site’s robots.txt file.
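Honoring a site's crawl delay starts with reading the `Crawl-delay` directive out of robots.txt. A simplified sketch (per-user-agent sections are elided, and the fallback of 1 second is our own assumption, not a documented default):

```ruby
DEFAULT_CRAWL_DELAY = 1 # seconds; assumed fallback when no directive is present

# Parse the Crawl-delay directive from a robots.txt body.
def crawl_delay(robots_txt)
  value = robots_txt[/^\s*crawl-delay:\s*(\d+)/i, 1]
  value ? value.to_i : DEFAULT_CRAWL_DELAY
end
```

A fetcher would then `sleep crawl_delay(robots_txt)` between requests to the same domain.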
- We created a SearchGov Domain Indexer job that will enqueue URLs in need of fetching, to allow bulk indexing tasks to be automated without overloading anyone’s servers, and we added support for resque-scheduler to our configuration baseline.
- We set the sitemap indexer to reject URLs from other domains to avoid erroneous attempts to index content from beta sites, old domains, etc.
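The cross-domain rejection above amounts to a host comparison before enqueueing each sitemap entry. A minimal sketch (the helper name is illustrative):

```ruby
require "uri"

# True only when the URL's host matches the domain being indexed, so
# sitemap links to beta sites or old domains are skipped.
def same_domain?(url, domain)
  URI.parse(url).host&.downcase == domain.downcase
rescue URI::InvalidURIError
  false
end
```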
- We now check the protocol of a domain, and whether the site is responding to us. We also set our URL fetcher to raise an error if the domain is unavailable or blocking our indexer.
- We re-indexed the searchgov indices.
- We upgraded MySQL in demo environments, and streamlined the scenario data for our test suite.
- We fixed a bug that sent searchers back to page 1 of results when changing the time scope in a Collection search.
- We mitigated SSL certificate problems with some sites.
- We made our redirection check more strict to avoid filling our database and indexes with domains and web pages that don’t need to be searchable.
- We’re making good progress on our indexing system, but all our work in April was on the back end of our system. See below for more information.
- We have updated the jQuery version.
- We configured our analytics alerts to send emails via SES instead of Mandrill.
- We upgraded Ruby to version 2.3.7.
- We computed filename extensions for documents in our primary index.
- We improved how we handle bounces and complaints that may come in for our email notifications.
- We fixed an error with our S3 backups for Logstash.
- We have updated how we create and update the search.gov sitemap.
- We transitioned our i14y repo to CircleCI.
- We increased storage volumes used by Elasticsearch 5.6.4.
- We open-sourced the usasearch application, which is now known as search-gov.
- We are now sending our email notifications via AWS Simple Email Service (SES).
- We set up a metrics dashboard for SES emails.
- We removed all RSpec/VCR dependencies on actual private keys.
- We removed the link to Search.gov website from the “Powered by Search.gov” logo that appears at the bottom of our hosted SERPs.
- Our new indexing system has been in production since December! In February, we continued to release several features that are building on and improving our new system:
- We have made our indexing rake task more efficient.
- We began implementing indexing from XML sitemaps. Check out our new XML Sitemaps page to learn more about them.
- We made our title parsing more flexible, drawing from either <title> or <og:title> tags, depending on where a site’s quality title metadata may be.
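Flexible title parsing means preferring one source and falling back to the other. Production code would use an HTML parser (e.g. Loofah/Nokogiri); regexes keep this sketch self-contained, and the function name is ours:

```ruby
# Prefer the <title> tag, falling back to the Open Graph og:title meta tag.
def extract_title(html)
  html[%r{<title[^>]*>([^<]+)</title>}i, 1] ||
    html[/<meta[^>]+property=["']og:title["'][^>]+content=["']([^"']+)["']/i, 1]
end
```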
- We upgraded Rails for our core application.
- We switched our highlighting of terms in search results to a much lighter method.
- We now have all apps enforcing HTTPS and are compliant with BOD 18-01.
- We removed abandoned i14y drawers.
- We improved how our indexing job updates records for existing pages.
- We fixed an RSS feed fetching issue.
- We have fixed the bug causing search sites to have access to the Best Bets of other affiliates.
- We now support four-byte UTF-8 characters in Twitter results. That’s right - emojis in your search results if they are used in your tweets.
- Our new indexing system has been in production since December! In January, we released several features that are building on and improving our new system:
- We updated our index with new stemming settings. Stemming refers to how a system processes related words, based on the root of the words. For example, stemming is what allows a search for “renew passport” to show results for “passport renewal”.
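Stemming behavior like this is typically configured through the index's analysis settings. A hedged, Elasticsearch-style sketch expressed as a Ruby hash; the analyzer and filter names are illustrative, not our actual production configuration:

```ruby
# Sketch of Elasticsearch analysis settings enabling English stemming, so
# "renew" and "renewal" index to the same root term.
STEMMING_SETTINGS = {
  analysis: {
    filter: {
      english_stemmer: { type: "stemmer", language: "english" }
    },
    analyzer: {
      en_analyzer: {
        type:      "custom",
        tokenizer: "standard",
        filter:    %w[lowercase english_stemmer]
      }
    }
  }
}.freeze
```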
- URLs in our index can be permanently deleted from our system.
- Documents in our index are now limited to 10 MB in size.
- We can extract body text from a document even if the <main> element is empty.
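That fallback logic amounts to: use the text of <main> if it has any, otherwise strip tags from <body>. A real parser (Loofah/Nokogiri) handles nesting properly; regexes suffice for this sketch, and the method name is ours:

```ruby
# Extract readable body text, falling back to <body> when <main> is empty.
def body_text(html)
  main   = html[%r{<main[^>]*>(.*?)</main>}mi, 1].to_s
  source = main.strip.empty? ? html[%r{<body[^>]*>(.*?)</body>}mi, 1].to_s : main
  source.gsub(/<[^>]+>/, " ").squeeze(" ").strip
end
```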
- The Instagram section is no longer displayed in the Admin Center dashboard, unless you had an Instagram account added to your search site prior to June 2016. At that time, Instagram began requiring accounts to grant permission to index their images via an integration between systems, which Search.gov cannot support. Therefore, our Instagram index was last updated in June 2016. Any images in our index prior to that date will continue to be shown on your search results page, as long as you do not remove your Instagram account from the Admin Center. If you remove your account, any photos in our index will be permanently deleted from our system.
- Drilldown tables and graphs are no longer available in the Monthly Reports section. According to our analytics, the tables and graphs were not being viewed. We anticipate rolling out new analytics viewing options later in 2018.
- Our new indexing system is now in production! Our team has been hard at work on this effort since June, and we are thrilled to reach this exciting milestone. In December, we released several features that helped us cross the first phase finish line:
- Our new system is live with an updated version of Elasticsearch.
- Our new system takes into consideration a “Promote” value when determining relevancy. “Promote” is a true/false value and is optional.
- Our technical lead improved the way the Loofah scraper gets HTML documents into our system. We work in the open as much as possible, and this minor change helped fix a large bug in the Loofah core code.
- The endpoint for our Jobs API was updated on December 7th. This change puts the hostname under Search.gov’s DNS zone; previously, it was hosted in another part of our division. This code change only affected agencies that are directly calling our open source API. If you are only using our Jobs Module on your hosted search results page, you did not need to take any action.
- We updated Rails on our Jobs API.
- We updated Ruby on our main application.
- We transitioned away from using UserVoice to collect feedback from our customers. Instead, you can submit feedback via Google form or by emailing us. Take a moment to review the feedback we’ve already received.