Broken XFS on Elastic Block Store (EBS) and SSH failing with “Write failed: Broken pipe”

29 May 2010

After I attached an EBS volume and rsynced files to it, the server's load average climbed to 6. The server then stopped accepting ssh connections, failing each time with a broken pipe error.

The server was still responding to HTTP requests and ping. An ssh connection, however, would connect and then fail with a broken pipe.

The rsync to the EBS device had completed. However, when I created a snapshot of the EBS device and mounted it on a new EC2 instance, the new server also became unresponsive to ssh connections.

I had not restarted the original instance, because I first wanted to make sure the backed-up data was safe and to identify the problem. I forced the EBS device to detach, but ssh still failed with a broken pipe.

    andrew@andrew-home:~$ ping ajohnstone.com
    PING ajohnstone.com (174.129.218.53) 56(84) bytes of data.
    64 bytes from ec2-174-129-218-53.compute-1.amazonaws.com (174.129.218.53): icmp_req=1 ttl=44 time=101 ms
    64 bytes from ec2-174-129-218-53.compute-1.amazonaws.com (174.129.218.53): icmp_req=2 ttl=44 time=120 ms
    64 bytes from ec2-174-129-218-53.compute-1.amazonaws.com (174.129.218.53): icmp_req=3 ttl=44 time=99.5 ms
    ^C
    --- ajohnstone.com ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 99.591/107.424/120.877/9.562 ms

    andrew@andrew-home:~$ ssh root@ajohnstone.com -v
    OpenSSH_5.5p1 Debian-4, OpenSSL 0.9.8n 24 Mar 2010
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: Applying options for *
    debug1: Connecting to ajohnstone.com [174.129.218.53] port 22.
    debug1: Connection established.
    debug1: identity file /home/andrew/.ssh/id_rsa type -1
    debug1: identity file /home/andrew/.ssh/id_rsa-cert type -1
    debug1: identity file /home/andrew/.ssh/id_dsa type 2
    debug1: Checking blacklist file /usr/share/ssh/blacklist.DSA-1024
    debug1: Checking blacklist file /etc/ssh/blacklist.DSA-1024
    debug1: identity file /home/andrew/.ssh/id_dsa-cert type -1
    debug1: Remote protocol version 2.0, remote software version OpenSSH_5.1p1 Debian-5
    debug1: match: OpenSSH_5.1p1 Debian-5 pat OpenSSH*
    debug1: Enabling compatibility mode for protocol 2.0
    debug1: Local version string SSH-2.0-OpenSSH_5.5p1 Debian-4
    debug1: SSH2_MSG_KEXINIT sent
    debug1: SSH2_MSG_KEXINIT received
    debug1: kex: server->client aes128-ctr hmac-md5 none
    debug1: kex: client->server aes128-ctr hmac-md5 none
    debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
    debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
    debug1: Host 'ajohnstone.com' is known and matches the RSA host key.
    debug1: Found key in /home/andrew/.ssh/known_hosts:145
    debug1: ssh_rsa_verify: signature correct
    debug1: SSH2_MSG_NEWKEYS sent
    debug1: expecting SSH2_MSG_NEWKEYS
    debug1: SSH2_MSG_NEWKEYS received
    debug1: Roaming not allowed by server
    debug1: SSH2_MSG_SERVICE_REQUEST sent
    debug1: SSH2_MSG_SERVICE_ACCEPT received
    debug1: Authentications that can continue: publickey
    debug1: Next authentication method: publickey
    debug1: Offering public key: /home/andrew/.ssh/id_dsa
    debug1: Server accepts key: pkalg ssh-dss blen 433
    debug1: Authentication succeeded (publickey).
    debug1: channel 0: new [client-session]
    debug1: Requesting no-more-sessions@openssh.com
    debug1: Entering interactive session.
    Write failed: Broken pipe
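
Linux counts tasks in uninterruptible sleep (state `D`, usually waiting on disk I/O) toward the load average, so a high load combined with a login that hangs is consistent with processes blocked on the filesystem. As a rough sketch, and assuming a shell is still open on the affected host, something like the following would show them. The function names `count_blocked` and `list_blocked` are made up for illustration; the `state`, `pid` and `comm` format specifiers come from the procps `ps`.

```shell
#!/bin/sh
# Both helpers read "STATE PID COMMAND" lines on stdin, i.e. the
# output of: ps -eo state=,pid=,comm=

# Count processes in uninterruptible sleep (state starts with "D").
count_blocked() {
    awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Print the PID and (first word of the) command name of each one.
list_blocked() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Usage on a live system:
#   ps -eo state=,pid=,comm= | count_blocked
#   ps -eo state=,pid=,comm= | list_blocked
#   cat /proc/loadavg
```

If the count stays close to the load average while CPU usage is low, the load is probably coming from blocked I/O rather than from busy processes.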

The new instance:

    andrew@andrew-home:~$ ping ec2-174-129-95-8.compute-1.amazonaws.com
    PING ec2-174-129-95-8.compute-1.amazonaws.com (174.129.95.8) 56(84) bytes of data.
    64 bytes from ec2-174-129-95-8.compute-1.amazonaws.com (174.129.95.8): icmp_req=1 ttl=43 time=100 ms
    64 bytes from ec2-174-129-95-8.compute-1.amazonaws.com (174.129.95.8): icmp_req=2 ttl=43 time=100 ms
    ^C
    --- ec2-174-129-95-8.compute-1.amazonaws.com ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1001ms
    rtt min/avg/max/mdev = 100.039/100.269/100.499/0.230 ms

    andrew@andrew-home:~$ ssh root@ec2-174-129-95-8.compute-1.amazonaws.com -i ~/.ssh/id_ajohnstone.com.key
    The authenticity of host 'ec2-174-129-95-8.compute-1.amazonaws.com (174.129.95.8)' can't be established.
    RSA key fingerprint is a6:c9:19:45:bc:62:e0:e5:5f:c6:2b:d6:36:94:24:21.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'ec2-174-129-95-8.compute-1.amazonaws.com,174.129.95.8' (RSA) to the list of known hosts.
    Linux ip-10-251-202-8 2.6.21.7-2.fc8xen-ec2-v1.0 #2 SMP Tue Sep 1 10:04:29 EDT 2009 i686

    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.

    Amazon EC2 Debian 5.0.4 lenny AMI built by Eric Hammond
    http://alestic.com  http://ec2debian-group.notlong.com
    ip-10-251-202-8:~# mkdir /mnt/tmp
    ip-10-251-202-8:~# mount /dev/sda10  /mnt/tmp/
    ip-10-251-202-8:~# cd /mnt/tmp/
    ip-10-251-202-8:/mnt/tmp# ll

Listing the directory failed at that point, and a new ssh connection attempt got no further than the following:

    andrew@andrew-home:~$ ssh -v root@ec2-174-129-95-8.compute-1.amazonaws.com -i ~/.ssh/id_ajohnstone.com.key
    OpenSSH_5.5p1 Debian-4, OpenSSL 0.9.8n 24 Mar 2010
    debug1: Reading configuration data /etc/ssh/ssh_config
    debug1: Applying options for *
    debug1: Connecting to ec2-174-129-95-8.compute-1.amazonaws.com [174.129.95.8] port 22.
    debug1: Connection established.
    debug1: identity file /home/andrew/.ssh/id_ajohnstone.com.key type -1
    debug1: identity file /home/andrew/.ssh/id_ajohnstone.com.key-cert type -1

I force-detached the snapshot EBS device from the new server, but the server remained unresponsive to ssh connections. After rebooting the server I was able to ssh into the box, and syslog showed a series of XFS faults.
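
To pull the XFS messages out of the log after a reboot, a simple filter is enough. This is a minimal sketch: `xfs_faults` is a hypothetical helper name, and `/var/log/syslog` is assumed to be the log location, which is the Debian default.

```shell
#!/bin/sh
# Print every line that mentions XFS, ignoring case.
# Reads the files given as arguments, or stdin when there are none.
xfs_faults() {
    grep -i 'xfs' "$@"
}

# Usage:
#   xfs_faults /var/log/syslog
#   dmesg | xfs_faults
```

Matching on the bare string is deliberately loose, so it also picks up harmless lines such as the normal XFS mount messages. The output still needs reading by hand.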

The difference between the two servers was that the new server never completed the ssh connection, whereas the first reported a broken pipe after connecting. I tried rebooting the original instance and was soon able to ssh back into the machine.

I normally use ext3 or reiserfs; however, the AMI (ami-ed16f984) I was using did not have reiserfs compiled into the kernel.

Details of XFS faults
