Broken XFS on Elastic Block Store (EBS) and SSH failing with “Write failed: Broken pipe”
After I attached an EBS volume and rsynced files to it, the server's load average climbed to 6. It then stopped accepting ssh sessions: every attempt ended with a broken pipe.
The server was still responding to HTTP requests and ping. An ssh connection, however, was established and then failed with a broken pipe.
The rsync to the EBS device had completed. However, when I created a snapshot of the EBS device and mounted it on a new EC2 instance, the new server also became unresponsive to ssh connections.
I had not restarted the original instance, as I first wanted to make sure that the backed-up data was safe and to identify the problem. I forced the EBS device to detach, but ssh still failed with a broken pipe.
    andrew@andrew-home:~$ ping ajohnstone.com
    PING ajohnstone.com (174.129.218.53) 56(84) bytes of data.
    64 bytes from ec2-174-129-218-53.compute-1.amazonaws.com (174.129.218.53): icmp_req=1 ttl=44 time=101 ms
    64 bytes from ec2-174-129-218-53.compute-1.amazonaws.com (174.129.218.53): icmp_req=2 ttl=44 time=120 ms
    64 bytes from ec2-174-129-218-53.compute-1.amazonaws.com (174.129.218.53): icmp_req=3 ttl=44 time=99.5 ms
    ^C
    --- ajohnstone.com ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 99.591/107.424/120.877/9.562 ms
    andrew@andrew-home:~$ ssh email@example.com -v
    OpenSSH_5.5p1 Debian-4, OpenSSL 0.9.8n 24 Mar 2010
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: Applying options for *
    debug1: Connecting to ajohnstone.com [174.129.218.53] port 22.
    debug1: Connection established.
    debug1: identity file /home/andrew/.ssh/id_rsa type -1
    debug1: identity file /home/andrew/.ssh/id_rsa-cert type -1
    debug1: identity file /home/andrew/.ssh/id_dsa type 2
    debug1: Checking blacklist file /usr/share/ssh/blacklist.DSA-1024
    debug1: Checking blacklist file /etc/ssh/blacklist.DSA-1024
    debug1: identity file /home/andrew/.ssh/id_dsa-cert type -1
    debug1: Remote protocol version 2.0, remote software version OpenSSH_5.1p1 Debian-5
    debug1: match: OpenSSH_5.1p1 Debian-5 pat OpenSSH*
    debug1: Enabling compatibility mode for protocol 2.0
    debug1: Local version string SSH-2.0-OpenSSH_5.5p1 Debian-4
    debug1: SSH2_MSG_KEXINIT sent
    debug1: SSH2_MSG_KEXINIT received
    debug1: kex: server->client aes128-ctr hmac-md5 none
    debug1: kex: client->server aes128-ctr hmac-md5 none
    debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
    debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
    debug1: Host 'ajohnstone.com' is known and matches the RSA host key.
    debug1: Found key in /home/andrew/.ssh/known_hosts:145
    debug1: ssh_rsa_verify: signature correct
    debug1: SSH2_MSG_NEWKEYS sent
    debug1: expecting SSH2_MSG_NEWKEYS
    debug1: SSH2_MSG_NEWKEYS received
    debug1: Roaming not allowed by server
    debug1: SSH2_MSG_SERVICE_REQUEST sent
    debug1: SSH2_MSG_SERVICE_ACCEPT received
    debug1: Authentications that can continue: publickey
    debug1: Next authentication method: publickey
    debug1: Offering public key: /home/andrew/.ssh/id_dsa
    debug1: Server accepts key: pkalg ssh-dss blen 433
    debug1: Authentication succeeded (publickey).
    debug1: channel 0: new [client-session]
    debug1: Requesting firstname.lastname@example.org
    debug1: Entering interactive session.
    Write failed: Broken pipe
The new instance:
    andrew@andrew-home:~$ ping ec2-174-129-95-8.compute-1.amazonaws.com
    PING ec2-174-129-95-8.compute-1.amazonaws.com (174.129.95.8) 56(84) bytes of data.
    64 bytes from ec2-174-129-95-8.compute-1.amazonaws.com (174.129.95.8): icmp_req=1 ttl=43 time=100 ms
    64 bytes from ec2-174-129-95-8.compute-1.amazonaws.com (174.129.95.8): icmp_req=2 ttl=43 time=100 ms
    ^C
    --- ec2-174-129-95-8.compute-1.amazonaws.com ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1001ms
    rtt min/avg/max/mdev = 100.039/100.269/100.499/0.230 ms
    andrew@andrew-home:~$ ssh email@example.com -i ~/.ssh/id_ajohnstone.com.key
    The authenticity of host 'ec2-174-129-95-8.compute-1.amazonaws.com (174.129.95.8)' can't be established.
    RSA key fingerprint is a6:c9:19:45:bc:62:e0:e5:5f:c6:2b:d6:36:94:24:21.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'ec2-174-129-95-8.compute-1.amazonaws.com,174.129.95.8' (RSA) to the list of known hosts.
    Linux ip-10-251-202-8 184.108.40.206-2.fc8xen-ec2-v1.0 #2 SMP Tue Sep 1 10:04:29 EDT 2009 i686
    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.
    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    Amazon EC2 Debian 5.0.4 lenny AMI built by Eric Hammond
    http://alestic.com http://ec2debian-group.notlong.com
    ip-10-251-202-8:~# mkdir /mnt/tmp
    ip-10-251-202-8:~# mount /dev/sda10 /mnt/tmp/
    ip-10-251-202-8:~# cd /mnt/tmp/
    ip-10-251-202-8:/mnt/tmp# ll
Listing the directory failed at that point, and a fresh ssh connection attempt got no further than the following:
    andrew@andrew-home:~$ ssh -v firstname.lastname@example.org -i ~/.ssh/id_ajohnstone.com.key
    OpenSSH_5.5p1 Debian-4, OpenSSL 0.9.8n 24 Mar 2010
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: Applying options for *
    debug1: Connecting to ec2-174-129-95-8.compute-1.amazonaws.com [174.129.95.8] port 22.
    debug1: Connection established.
    debug1: identity file /home/andrew/.ssh/id_ajohnstone.com.key type -1
    debug1: identity file /home/andrew/.ssh/id_ajohnstone.com.key-cert type -1
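A more cautious way to look at a suspect XFS volume is to check it before mounting it read-write. The sketch below only prints a plan and runs nothing: `xfs_repair -n` checks the filesystem without modifying it, and the `ro,norecovery` mount options mount it read-only without replaying the log. The helper name `inspect_plan` is hypothetical, and the device and mount point are simply the ones from the session above. Whether this would have avoided the hang here is an assumption that was not tested.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a cautious inspection plan
# for a suspect XFS volume. Arguments: device, mount point.
inspect_plan() {
    dev="$1"
    mnt="$2"
    # No-modify check: reports problems but writes nothing to the device.
    echo "xfs_repair -n $dev"
    # Read-only mount that skips XFS log replay.
    echo "mount -t xfs -o ro,norecovery $dev $mnt"
}

# Device and mount point taken from the session above.
inspect_plan /dev/sda10 /mnt/tmp
```

Printing the commands first makes it possible to review them before running anything against a volume that may already be damaged.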
I force-detached the snapshot EBS device from the new server, but the server was still unresponsive to ssh connections. After rebooting it, however, I was able to ssh into the box. Syslog showed a series of XFS faults after the reboot.
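To pull the XFS-related entries out of syslog after a reboot like this, a small filter is enough. This is a minimal sketch: the function name `xfs_faults` is hypothetical, and `/var/log/syslog` is assumed to be the log location (the Debian default), which may differ on other systems.

```shell
#!/bin/sh
# Hypothetical helper: print every line mentioning XFS in a log file.
# Defaults to /var/log/syslog (an assumption; pass another path if needed).
xfs_faults() {
    logfile="${1:-/var/log/syslog}"
    # -i: match "XFS" and "xfs" alike, since kernel messages use both.
    grep -i 'xfs' "$logfile"
}
```

Typical use would be `xfs_faults /var/log/syslog | tail -n 20` to see the most recent entries.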
The difference between the two servers was that the new one stalled right after the TCP connection was established, whereas the first one authenticated and then reported a broken pipe. I tried rebooting the original instance and was soon able to ssh back into the machine.
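The two ssh transcripts stop at different stages, and that can be checked mechanically from a saved `ssh -v` log. The sketch below looks for three debug messages that appear in the output above and reports the furthest one reached. The function name `ssh_stage` and the stage labels are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report how far an `ssh -v` transcript got.
# Prints one of: session, authenticated, connected, none.
ssh_stage() {
    log="$1"
    if grep -q 'Entering interactive session' "$log"; then
        echo session          # authenticated and a session was requested
    elif grep -q 'Authentication succeeded' "$log"; then
        echo authenticated    # key accepted, but no session started
    elif grep -q 'Connection established' "$log"; then
        echo connected        # TCP connect only, handshake never finished
    else
        echo none
    fi
}
```

Since `ssh -v` writes its debug output to stderr, a transcript can be captured with something like `ssh -v user@host 2> ssh.log` and then passed to `ssh_stage ssh.log`. Applied to the output above, the original instance would report `session` and the new instance `connected`.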
I normally use ext3 or reiserfs; however, the AMI (ami-ed16f984) I was using did not have reiserfs compiled into the kernel.
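One quick way to see which filesystems a running kernel currently knows about is `/proc/filesystems`. The sketch below is illustrative only: the function name `fs_registered` is hypothetical, and the check only covers filesystems that are built in or whose module is already loaded, so a filesystem available as an unloaded module would not show up.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a filesystem type is registered
# with the running kernel. Arguments: name, optional table path.
fs_registered() {
    name="$1"
    table="${2:-/proc/filesystems}"
    # Each line is an optional "nodev" flag followed by the type name,
    # so the name is always the last field.
    awk -v fs="$name" '$NF == fs { found = 1 } END { exit !found }' "$table"
}
```

For example, `fs_registered reiserfs || echo "reiserfs not registered"` would flag the missing support before a volume is formatted.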