Posts tagged release-notes
- We began work on using click data in our relevancy ranking, starting with:
- Recording the domain of the clicked-on URL separately, so we can manage all the clicks for a particular domain.
- Calculating the top N clicked-on URLs for a given domain.
- We indexed a lot of content for agencies.
- We got our new developers set up and ready to work on great stuff.
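
The click-data items above (recording the domain of each clicked URL, then finding the top N clicked URLs for a domain) can be illustrated with a minimal sketch. This is not the Search.gov implementation: the function names `domain_of` and `top_clicked_urls`, the in-memory list of clicks, and the example URLs are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lowercased host of a URL, e.g. 'www.example.gov'."""
    return (urlsplit(url).hostname or "").lower()


def top_clicked_urls(clicked_urls, domain, n):
    """Return the n most-clicked URLs for one domain as (url, count) pairs."""
    counts = Counter(url for url in clicked_urls if domain_of(url) == domain)
    return counts.most_common(n)


# Hypothetical click log: one entry per click.
clicks = [
    "https://www.example.gov/passports",
    "https://www.example.gov/passports",
    "https://www.example.gov/visas",
    "https://other.example.gov/forms",
    "https://www.example.gov/passports",
]
top = top_clicked_urls(clicks, "www.example.gov", 2)
```

Storing the domain alongside each click, as the notes describe, would let a real system group clicks per domain without re-parsing every URL at query time.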
- We began work on leveraging click data in our relevancy scoring. This will allow us to use the relative popularity of pages as a ranking signal.
- We now record the domain of a URL that has been clicked in addition to recording the full click. This way we can compare the click volume of URLs within a given domain.
- We resolved security vulnerabilities in grape & sprockets.
- We configured RSpec to run specs in random order.
- We added support for XML sitemaps that are located in non-standard locations within a domain.
- We added sort_by support to our Results API.
- We finished migrating to CircleCI for our continuous integration monitoring.
- We improved our internal tracking of queries to the Bing API.
- We improved how we handle indexing domains that time out.
- We began indexing the last-modified date of a page, if provided.
- Our SitemapIndexer now processes one sitemap at a time, and we created an automated queue for indexing jobs and URL fetching.
- We improved the management of Searchgov domain states. Now each Searchgov domain has an “indexing activity”. States might include: indexing sitemaps, fetching new URLs (such as after bulk import), and crawling.
- We now follow client-side redirects.
- We improved our ability to avoid certain crawler traps.
- We now index documents up to 15 MB in size. The previous limit was 10 MB.
- We finalized our compliance with BOD 18-01.
- We cleaned up how we handle temp files during indexing.
- We tidied up our internal errors on indexing jobs, as well as our test suite.
- We fixed a bug that prevented diacritics from displaying properly in non-English searches.
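
As an illustration of indexing a page's last-modified date "if provided", here is a minimal sketch. It assumes the date arrives as an HTTP `Last-Modified` response header in a plain dictionary; the notes do not say where Search.gov reads the date from, and the function name `last_modified` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or invalid."""
    raw = headers.get("Last-Modified")  # assumption: exact header-name casing
    if not raw:
        return None
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Unparseable dates are treated the same as a missing header.
        return None


parsed = last_modified({"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"})
```

Returning `None` for a missing or malformed value keeps the field optional, which matches the "if provided" wording above.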
- We continue to make good progress on our indexing system, and our work continues to be highly focused on the back end of our system. See below for more details.
- We created back-end interfaces allowing the Search.gov team to manage indexed domains & urls.
- We added a delay method to SearchGov Domain, to honor a given site’s crawl delay settings.
- We created a SearchGov Domain Indexer job that will enqueue urls in need of fetching, to allow bulk indexing tasks to be automated without overloading anyone’s servers, and we added support for resque-scheduler to our configuration baseline.
- We set the sitemap indexer to reject urls from other domains to avoid erroneous attempts to index content from beta sites, old domains, etc.
- We now check the protocol of a domain, and whether the site is responding to us. We also set our url fetcher to throw an error if the domain is unavailable or blocking our indexer.
- We re-indexed the searchgov indices.
- We upgraded MySQL in demo environments, and streamlined the scenario data for our test suite.
- We fixed a bug that sent searchers back to page 1 results when changing the time scope in a Collection search.
- We mitigated SSL certificate problems with some sites.
- We made our redirection check more strict to avoid filling our database and indexes with domains and web pages that don’t need to be searchable.
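
Honoring a site's crawl delay, as mentioned in the list above, might look like the following sketch. It assumes the delay is published as a `Crawl-delay` directive in the site's robots.txt, and it uses Python's standard `urllib.robotparser` rather than anything from the Search.gov codebase. The function name, the `usasearch` user agent string, and the one-second default are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_seconds(robots_txt, user_agent, default=1):
    """Return the crawl delay for user_agent, or default if none is declared."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return default if delay is None else delay


# Hypothetical robots.txt content for a site that asks for 5 seconds between requests.
robots_with_delay = "User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n"
robots_without_delay = "User-agent: *\nDisallow:\n"

slow = crawl_delay_seconds(robots_with_delay, "usasearch")
fast = crawl_delay_seconds(robots_without_delay, "usasearch")
```

A fetch queue could sleep for the returned number of seconds between requests to the same domain, which is one way to avoid overloading a site's servers during bulk indexing.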
- We’re making good progress towards our indexing system, but all our work in April was in the back end of our system. See below for more information.
- We have updated the jQuery version.
- We configured our analytics alerts to send emails via SES instead of Mandrill.
- We upgraded Ruby to version 2.3.7.
- We computed filename extensions for documents in our primary index.
- We improved how we handle bounces and complaints for our notification emails.
- We fixed an error with our S3 backups for Logstash.
- We have updated how we create and update the search.gov sitemap.
- We transitioned our i14y repo to CircleCI.
- We increased storage volumes used by Elasticsearch 5.6.4.
- We open-sourced the usasearch application, which is now known as search-gov.
- We are now sending our email notifications via AWS Simple Email Service (SES).
- We set up a metrics dashboard for SES emails.
- We removed all RSpec/VCR dependencies on actual private keys.
- We removed the link to Search.gov website from the “Powered by Search.gov” logo that appears at the bottom of our hosted SERPs.
- Our new indexing system has been in production since December! In February, we continued to release several features that are building on and improving our new system:
- We have made our indexing rake task more efficient.
- We began implementing indexing from XML sitemaps. Check out our new XML Sitemaps page to learn more about them.
- We made our title parsing more flexible, drawing from either <title> or <og:title> tags, depending on where a site’s quality title metadata may be.
- We upgraded Rails for our core application.
- We switched our highlighting of terms in search results to a much lighter method.
- We now have all apps enforcing HTTPS and are compliant with BOD 18-01.
- We removed abandoned i14y drawers.
- We improved how our indexing job updates records for existing pages.
- We fixed an RSS feeds fetch issue.
- We have fixed the bug causing search sites to have access to the Best Bets of other affiliates.
- We now support four-byte UTF-8 characters in Twitter results. That’s right: emojis in your search results if they are used in your tweets.
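
The flexible title parsing mentioned in the list above could be sketched as follows. This is an illustration only: it assumes the two title sources are the HTML `<title>` element and an `og:title` meta tag, and it arbitrarily prefers `og:title` when both are present, which the notes do not specify. The names `TitleFinder` and `page_title` are hypothetical.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the <title> text and the og:title meta content from a page."""

    def __init__(self):
        super().__init__()
        self.og_title = None
        self.title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("property") == "og:title":
            self.og_title = (attributes.get("content") or "").strip() or None

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def page_title(html):
    """Return og:title if present, else the <title> text, else None."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    html_title = "".join(finder.title_parts).strip()
    # Assumption: og:title wins when both exist; a real indexer may choose differently.
    return finder.og_title or html_title or None


both = page_title(
    '<html><head><title>Home</title>'
    '<meta property="og:title" content="Passport Renewal"></head></html>'
)
```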
- Our new indexing system has been in production since December! In January, we released several features that are building on and improving our new system:
- We updated our index with new stemming settings. Stemming refers to how a system processes related words, based on the root of the words. For example, stemming is what allows a search for “renew passport” to show results for “passport renewal”.
- URLs in our index can be permanently deleted from our system.
- Documents in our index are now limited to 10 MB in size.
- We can extract body text from a document if the <main> element is empty.
- The Instagram section is no longer displayed in the Admin Center dashboard, unless you had an Instagram account added to your search site prior to June 2016. At that time, Instagram began requiring accounts to grant permission to index their images via an integration between systems, which Search.gov cannot support. Therefore, our Instagram index was last updated in June 2016. Any images in our index prior to that date will continue to be shown on your search results page, as long as you do not remove your Instagram account from the Admin Center. If you remove your account, any photos in our index will be permanently deleted from our system.
- Drilldown tables and graphs are no longer available in the Monthly Reports section. According to our analytics, the tables and graphs were not being viewed. We anticipate rolling out new analytics viewing options later in 2018.
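
To make the stemming note above concrete, here is a toy sketch of how reducing words to a shared root lets “renew passport” match “passport renewal”. The suffix list and function names are hypothetical, and this crude suffix stripping is not the stemmer our index uses; it only shows the idea.

```python
# Toy suffix list for illustration only; real stemmers use far richer rules.
SUFFIXES = ("al", "ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    """Return the set of stems for the whitespace-separated words in text."""
    return {stem(word) for word in text.split()}


def matches(query, document):
    """True if every stemmed query word also appears among the document's stems."""
    return stems(query) <= stems(document)


found = matches("renew passport", "passport renewal")
```

Here “renewal” is reduced to “renew”, so both query words are found in the document even though the exact word “renew” never appears in it.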
- Our new indexing system is now in production! Our team has been hard at work on this effort since June, and we are thrilled to reach this exciting milestone. In December, we released several features that helped us cross the first phase finish line:
- Our new system is live with an updated version of Elasticsearch.
- Our new system takes into consideration a “Promote” value when determining relevancy. “Promote” is a true/false value and is optional.
- Our technical lead improved the way the Loofah scraper gets HTML documents into our system. We work in the open as much as possible, and this minor change helped fix a large bug in the Loofah core code.
- The endpoint for our Jobs API was updated on December 7th. This change puts the hostname under Search.gov’s DNS zone; previously, it was hosted in another part of our division. This code change only affected agencies that are directly calling our open source API. If you are only using our Jobs Module on your hosted search results page, you did not need to take any action.
- We updated Rails on our Jobs API.
- We updated Ruby on our main application.
- We transitioned away from using UserVoice to collect feedback from our customers. Instead, you can submit feedback via Google form or by emailing us. Take a moment to review the feedback we’ve already received.
- To accomplish our FY 2018 goals, we continued backend development that will allow your agency content to be served directly from our indexes. In November, our team began testing our new system. We also released several features related to this project:
- Collections results will now come from our indexes. Previously, a site that saw main page search results from our indexes would still see Bing or Google results when using Collections. Now, if a site is using our indexes for its main search page, its Collections results will also come from our indexes.
- The page-1 RSS module and search page alert will now appear for sites getting results from our indexes. Previously, these two features only worked on sites using Bing/Google.
- i14y documents will be rejected if the document_id contains slashes or is more than 512 bytes.
- On November 14th, we notified our users of a Bing service degradation. This caused inconsistent results across sites, including incomplete results or 503 errors, and prevented access to the Search Admin Center. The inconsistency began at 2:05pm ET and ended at 3:05pm ET.
- We continued transitioning our repos to use CircleCI.
- Phrase searches now work with content that is served from our indexes. A search for “cheese curds” will return results for the specific phrase “cheese curds” rather than the separate words “cheese” and “curds”.
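
The document_id rule listed above (no slashes, no more than 512 bytes) can be checked before submitting a document. The sketch below is a hypothetical client-side check, not the i14y API's own validation; it assumes “slashes” means forward slashes and that the byte count is taken over the UTF-8 encoding.

```python
MAX_DOCUMENT_ID_BYTES = 512


def document_id_error(document_id):
    """Return a reason the document_id would be rejected, or None if it looks valid."""
    if "/" in document_id:  # assumption: only forward slashes are checked
        return "document_id must not contain slashes"
    if len(document_id.encode("utf-8")) > MAX_DOCUMENT_ID_BYTES:
        return "document_id must not be more than 512 bytes"
    return None


ok = document_id_error("annual-report-2017")
bad = document_id_error("reports/2017/annual")
```

Counting bytes rather than characters matters for IDs with non-ASCII characters, since a single accented letter can take two or more bytes in UTF-8.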