Alerts (old posts, page 1)

[Resolved] Intermittent outages during maintenance

Deployments complete: The deployment has been completed for all services and no issues have been found. Service should be operating normally. (15:45 UTC — Nov 30)

Ongoing deployment causing service disruption: We are currently deploying a major change in the Sourcehut architecture, which is likely to cause intermittent outages during deployment as services are rotated onto the new system and oversights are discovered and resolved. (15:00 UTC — Nov 30)

[Resolved] git.sr.ht outage

Service restored: Service has been restored. I/O exhaustion flipped git.sr.ht’s filesystem into read-only mode, and on reboot, filesystem corruption was found. User data was not affected; even if it had been, an emergency integrity test confirmed that our backups are intact (routine checks are also performed regularly and were found to be in order).

An investigation into the underlying causes of the failure is ongoing. (20:08 UTC — Nov 15)

git.sr.ht outage: git.sr.ht is currently experiencing a total outage. We have identified the issue and a solution is being worked on. (20:00 UTC — Nov 15)

[Resolved] Major DNS outage

Resolved: The issue has been fixed. It may take time for DNS updates to propagate to your local server. (21:45 UTC — Sep 29)

Working on the issue: We have reached support and they’ve rolled back the changes. We’re monitoring our services to make sure the transition occurs smoothly. (21:00 UTC — Sep 29)

No response: We are still waiting for an update from our registrar. (20:00 UTC — Sep 29)

Major DNS outage: While deploying IPv6 support across Sourcehut, we submitted a routine request to our domain registrar to add IPv6 glue records for our nameservers. In the course of doing this, the registrar also removed the IPv4 glue records. We are attempting to reach them for assistance. Services are severely disrupted in the meantime. (10:00 UTC — Sep 29)

[Resolved] Planned maintenance on sr.ht

Maintenance complete: Everything appears to be in good working order. (20:00 UTC — Feb 18)

Upcoming planned maintenance: We’re doing planned maintenance on Monday that will cause sporadic outages throughout the day as our hosts are moved into a larger rack. In exchange for your patience, we’ll be installing a much more powerful host for builds.sr.ht builds at the same time. (12:00 UTC — Feb 18)

[Resolved] builds.sr.ht outage

Full service restored: DNS updates should be fully propagated by now, and the host is confirmed to be in working order. Apologies for the disruption. (16:42 UTC — Feb 13)

Rolling back migration: There are other problems with the new host. DNS is being rolled back to the old host, and another migration attempt will be made at a later date. If you updated your /etc/hosts entry, please reset it to the old host: 45.56.100.217.

DNS error: The builds.sr.ht DNS records were updated to the wrong IP address, resulting in an outage which will last for the duration of the TTL (approximately 1 hour). A workaround is to add builds.sr.ht to your /etc/hosts file: the correct IP address is 173.195.146.148. Please set a reminder to remove this entry, as future DNS updates may cause local outages for you if you do not. (15:12 UTC — Feb 13)
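For anyone who applied the workaround above and wants to check whether an override is still in place, the following is a minimal sketch rather than an official tool. The function name `find_hosts_overrides` is hypothetical, and the sketch assumes the usual hosts file layout of an address followed by one or more names, with `#` starting a comment.

```python
from pathlib import Path


def find_hosts_overrides(hosts_text: str, hostname: str) -> list[str]:
    """Return every address that hosts_text maps to hostname (hypothetical helper)."""
    addresses = []
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        address, names = fields[0], fields[1:]
        # Host names are case-insensitive, so compare in lower case.
        if wanted in (name.lower() for name in names):
            addresses.append(address)
    return addresses


if __name__ == "__main__":
    hosts_file = Path("/etc/hosts")
    text = hosts_file.read_text() if hosts_file.exists() else ""
    for address in find_hosts_overrides(text, "builds.sr.ht"):
        print(f"builds.sr.ht is pinned to {address} in /etc/hosts")
```

If the script prints nothing, no override for builds.sr.ht is present and lookups fall through to DNS as usual.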

[Resolved] Planned login outage

Maintenance complete: We have completed this period of maintenance, and sr.ht has been restored to full service. (03:00 UTC — Dec 29)

Partial outage during planned maintenance: A planned service upgrade requires logins to be intermittently unavailable for some sr.ht services. Users who are already logged in should not experience a disruption, though minor issues may occur. (01:00 UTC — Dec 29)

[Resolved] Service disruption

Debian & FreeBSD builds now available: Full service has been restored. (03:54 UTC — Nov 27)

Arch Linux builds now available: Arch is available and Debian is coming. (18:01 UTC — Nov 25)

Alpine builds now available: The next step is to start pulling build images down to the new build runner. Currently Alpine is available and Arch is being worked on next. (17:01 UTC — Nov 25)

Restoring builds service: Work is ongoing to provision a new build host. Service is expected to be restored within a few hours. (15:23 UTC — Nov 25)

Temporary database server provisioned: Full service has been restored to all services except for builds.sr.ht, which requires the provisioning of a new build slave. Until then no builds will be run. (22:34 UTC — Nov 24)

Replacing the database server: The database server has been taken out of service for maintenance, and in the meantime I’m provisioning a replacement. I believe that there has been no data loss. (22:06 UTC — Nov 24)

Reopened: The failures have returned. Investigating. (21:03 UTC — Nov 24)

Resolved: The issue has been resolved, and we’re monitoring the system for more information. (20:42 UTC — Nov 24)

Investigating: The same database server that caused issues yesterday is causing more problems today. Investigating. (19:40 UTC — Nov 24)

[Resolved] Database RAID failure

Service restored: The database server has been returned to service. (23:27 UTC — Nov 23)

Restoring service: The RAID rebuild completed successfully and the server is on its way back to the datacenter to be entered back into service. (22:25 UTC — Nov 23)

Resolving: An issue with one of the hard drives on one of our database servers was discovered, resulting in a total outage for this server. The RAID array is being rebuilt and service is expected to be restored within a few hours. (20:00 UTC — Nov 23)

Investigating: One of our database servers has gone offline and is unresponsive. The issue will require manual intervention at the datacenter, but the NOC is slow to respond during the holiday weekend. Once we gain access to the server, an update will be posted. (18:00 UTC — Nov 23)