Alerts (old posts, page 3)

[Resolved] Planned maintenance will cause minor outages

Today’s scheduled maintenance is complete: We have finished our maintenance work for today, and all services are restored to normal operation. Thank you for your patience. (10:45 UTC — Feb 14)

The migration is underway. We have started this migration work and expect to complete it no later than 12:00 UTC. (10:00 UTC — Feb 14)

Phase one of our Alpine Linux 3.15 roll-out will cause minor service disruptions: We will be rolling out Alpine Linux 3.15 to some of our hosts on Monday the 14th at 10:00 UTC. This will disrupt blob storage (causing artifacts on builds.sr.ht and git.sr.ht to become unavailable) and will temporarily reduce build throughput (possibly causing longer waits for builds to start executing). Additionally, metrics.sr.ht will be briefly unavailable.

The maintenance window is two hours. Following the completion of this maintenance, we will announce the window for phase two of the roll-out, which will be more disruptive. (10:00 UTC — Feb 10)

[Resolved] Our upstream network provider is experiencing issues

The issue has been resolved: The upstream network provider has corrected the problem, and normal service should be restored. Over the course of 19 minutes, we dropped 10 of our ~500 upstream chat.sr.ht connections. We are still experiencing internal issues, but they are not service-impacting and we expect them to clear up shortly. (20:22 UTC — Dec 20)

An ongoing issue with our upstream network provider is causing chat.sr.ht reliability problems: An issue with our network provider is causing some unreliability for our services. The full extent of the problem is still not known, but the most important consequence of it so far is that chat.sr.ht connections to upstream networks are unreliable. You may experience disconnections and intermittent service. (20:03 UTC — Nov 16)

[Resolved] Unplanned meta.sr.ht outage

A deployment issue caused a brief outage on meta.sr.ht: An update to one of our dependencies was not correctly verified before being deployed to production, resulting in a 6-minute outage. Only meta.sr.ht was affected; other services remained available for the duration of the outage. (08:13 UTC — Nov 16)

[Resolved] Unplanned partial outage of hg.sr.ht

HTTP(s) clone access has been restored: The issue was narrowed down to a stale nginx configuration which directed authorization requests to the wrong port. The configuration has been corrected and normal service is restored. (08:35 UTC — Oct 27)

HTTP(s) clone is currently unavailable for hg.sr.ht: A planned deployment went awry and has caused problems with cloning hg.sr.ht repositories via HTTPS. An investigation is ongoing. SSH clone is still working correctly. (08:20 UTC — Oct 27)

[Resolved] Planned outage for all services

Planned maintenance is complete: We have finished upgrading all of the affected hosts and all services are restored to normal. (14:20 UTC — Aug 16)

Planned maintenance is ongoing: Our maintenance window has opened and we have started our work. (13:00 UTC — Aug 16)

Planned maintenance on August 16th will cause intermittent outages: We are planning the second (and last) phase of the maintenance which began on August 3rd. This will affect all services, causing intermittent outages that are expected to last between 15 and 30 minutes at most. The total maintenance period should last less than 2 hours. (13:00 UTC — Aug 16)

[Resolved] Planned outage for all services

Maintenance complete: The issue with pages.sr.ht has been resolved and all services are now available. (10:30 UTC — Aug 3)

Maintenance mostly complete but pages.sr.ht still pending: We have completed maintenance on all services, which can now be expected to be stable, with the exception of pages.sr.ht. We have encountered an issue during the pages.sr.ht upgrades and are addressing it now. (10:00 UTC — Aug 3)

Planned maintenance on August 3rd will cause intermittent outages: Planned maintenance will affect all services, causing intermittent outages that are expected to last between 15 and 30 minutes at most. The total maintenance period should last less than 2 hours. (09:00 UTC — Aug 3)

[Resolved] Unplanned git.sr.ht outage

Snapshot growth caused an outage on git.sr.ht

git.sr.ht’s ZFS snapshots grew to consume all available disk space. This is normally an understood pathology of the server configuration, but due to a change in billing with Twilio, our paging script did not alert the operators to the imminent issue. It seems that Twilio offers no grace period; they reported the billing issue to us only yesterday.

The issue with git.sr.ht has been resolved and the bill has been paid. We are researching ways to avoid this occurring again in the future. (15:00 UTC — May 14)
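
The failure mode described above (a disk-space alert that was never delivered because the single paging channel was down) can be illustrated with a small sketch. This is not our paging script and not necessarily the fix we will adopt; the function names, the 90% threshold, and the idea of an ordered list of fallback channels are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# A notifier is any callable that delivers a message or raises on failure.
# (Hypothetical interface; a real one might wrap an SMS API, email, or IRC.)
Notifier = Callable[[str], None]


def needs_page(used_bytes: int, total_bytes: int, threshold: float = 0.9) -> bool:
    """Return True when usage is at or above the (assumed) alert threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


def page_with_fallback(
    message: str, notifiers: Iterable[Tuple[str, Notifier]]
) -> Optional[str]:
    """Try each channel in order; return the name of the first that succeeds.

    Returns None if every channel failed, which the caller should treat
    as an incident in its own right.
    """
    for name, notify in notifiers:
        try:
            notify(message)
        except Exception:
            # This channel is unavailable (for example, a billing problem
            # at the provider); move on to the next one.
            continue
        return name
    return None
```

The point of the design is that a failure of the primary channel is not silent: either a secondary channel delivers the page, or the caller learns that nothing was delivered.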

[Resolved] Planned outage for all services

Planned maintenance has been completed. (16:30 UTC — Feb 8)

Planned maintenance is now underway. (15:00 UTC — Feb 8)

Planned maintenance on February 8th will cause intermittent outages: Planned maintenance will affect all services, causing intermittent outages that are expected to last between 15 and 30 minutes at most. (15:00 UTC — Feb 8)

[Resolved] SpamCop outage

One of our third-party DNSBL services, SpamCop, allowed their domain to expire, presumably as a mistake, and began to return “listed” for all DNSBL checks. We use a DNSBL as an early rejection for spam emails, and this caused 21 incoming emails to be wrongfully rejected. We have removed SpamCop from our list of DNSBLs and filed a ticket to improve our monitoring so that we may catch this sooner. Incoming emails are working now, but be advised that if your postmaster uses SpamCop, emails from Sourcehut will likely be rejected until the issue is resolved. (13:25 UTC — Jan 31)
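
For readers unfamiliar with how a DNSBL lookup works, and how a list that returns “listed” for everything might be detected, here is a minimal sketch. It relies on the convention in RFC 5782 that a working IPv4 DNSBL lists the test address 127.0.0.2 and does not list 127.0.0.1. The zone name `bl.example.org` and the function names are placeholders, and this is not the monitoring we actually run.

```python
import ipaddress
import socket
from typing import Callable

# A resolver takes a hostname and returns an address, or raises OSError
# when the name does not resolve. socket.gethostbyname fits this shape.
Resolver = Callable[[str], str]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str, resolve: Resolver = socket.gethostbyname) -> bool:
    """Return True if the DNSBL returns any A record for the address.

    Any resolution failure (including timeouts) is treated as "not listed"
    here; a production check would likely distinguish those cases.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True


def dnsbl_looks_healthy(zone: str, resolve: Resolver = socket.gethostbyname) -> bool:
    """Sanity-check a DNSBL using the RFC 5782 test addresses."""
    return is_listed("127.0.0.2", zone, resolve) and not is_listed(
        "127.0.0.1", zone, resolve
    )
```

For example, `dnsbl_query_name("192.0.2.1", "bl.example.org")` yields `1.2.0.192.bl.example.org`. A list that answers for every name, as happened here, would fail the `127.0.0.1` half of the health check and could be skipped before it rejects legitimate mail.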