I recently converted a number of Amazon OpsWorks instances to be time-based, so that they shut down at the end of the working day.
After doing so, I noticed that the OpsWorks lifecycle events were always marked as in progress.

(Screenshot: OpsWorks instance showing a failed reboot)

Every morning the instances would be marked as booting, yet I could SSH into them and the applications were running.
When I finally managed to investigate this morning, I worked out why this was happening, and was a little surprised by how Amazon runs the OpsWorks agent.

We have a number of custom monit configurations to ensure that our own applications are running. These do not interfere with the Amazon OpsWorks monit configuration.
The development/QA team occasionally stop monit when doing development work or switching branches on these instances.

As can be seen from the following monit log, monit was stopped a few minutes after it started:

[UTC Oct 9 09:50:14] info : monit: generated unique Monit id 437edc13afbe8e51fde4501f9af8db8f and stored to ‘/var/lib/monit/id’
[UTC Oct 9 09:50:14] info : Starting monit daemon
[UTC Oct 9 09:50:14] info : ‘ip-10-75-15-96.eu-west-1.compute.internal’ Monit started
[UTC Oct 9 09:54:35] info : monit daemon with pid [6927] killed
[UTC Oct 9 09:54:35] info : ‘ip-10-75-15-96.eu-west-1.compute.internal’ Monit stopped

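Amazon OpsWorks doesn’t use any form of upstart, init.d or systemd script to start its agent, so when a server boots up it relies on monit to start it. If monit was last left in a stopped state, the OpsWorks agent never runs and you see the state described above. You can check this by searching the init scripts for any reference to OpsWorks: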

$ egrep -rin opsworks /etc/rc* /etc/init.d/
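
To check whether an instance was left in that state, one option is to look at the most recent start or stop entry in the monit log, whose format is shown in the excerpt above. This is a minimal sketch under assumptions. The function name `last_monit_event` is hypothetical. The default path `/var/log/monit.log` is also an assumption: the real location is whatever `set logfile` points to in your monitrc.

```shell
# Hypothetical helper: print "started", "stopped" or "unknown" depending on
# the most recent "Monit started" / "Monit stopped" line in a monit log.
# ASSUMPTION: default log path; check "set logfile" in your monitrc.
last_monit_event() {
  local log="${1:-/var/log/monit.log}"
  local last
  # Keep only start/stop lines and take the newest one; a missing file
  # yields no lines and falls through to "unknown".
  last=$(grep -E "Monit (started|stopped)" "$log" 2>/dev/null | tail -n 1)
  case "$last" in
    *"Monit stopped"*) echo "stopped" ;;
    *"Monit started"*) echo "started" ;;
    *) echo "unknown" ;;
  esac
}
```

Run against a log ending like the excerpt above, `last_monit_event` would print `stopped`.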

I’ve asked our development team to disable only the relevant processes within monit, rather than stopping monit itself. For example:

$ monit unmonitor node_checkout_process

It would be sensible not to rely on monit, and instead have a dedicated startup script that spawns the agent process when the server restarts.
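
As a rough illustration of that idea, a boot-time guard could start a process only when nothing with that name is already running. This is a sketch under assumptions. `ensure_running` is a hypothetical helper, and it relies on `pgrep -x` for an exact process-name match. The command that would start the OpsWorks agent directly is not shown here, because it depends on how the agent is installed.

```shell
# Hypothetical helper: start COMMAND only if no process named NAME is running.
# Usage: ensure_running NAME COMMAND [ARGS...]
ensure_running() {
  local name="$1"
  shift
  # pgrep -x matches the process name exactly; any non-zero status
  # (no match, or pgrep unavailable) is treated as "not running".
  if pgrep -x "$name" >/dev/null 2>&1; then
    echo "$name already running"
  else
    echo "starting $name"
    "$@"
  fi
}

# Hypothetical usage from a boot script such as /etc/rc.local, assuming
# monit is installed as a SysV service:
# ensure_running monit service monit start
```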

Running the agent’s state report on an affected instance confirms that no agent is running:

root@nodejs-dev-andy:/opt/aws/opsworks/current/bin# ./opsworks-agent-cli agent_report
Pidfile /var/lib/aws/opsworks/pid/opsworks-agent.pid present but no matching process running - cleaning up

AWS OpsWorks Instance Agent State Report:

Last activity was a "configure" on 2014-09-26 18:03:22 UTC
Agent Status: No AWS OpsWorks agent running
Agent Version: 325, up to date
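
To spot this state automatically, for instance from a cron job, one small sketch is to grep the report for the status line shown above. The helper name `agent_report_says_down` is hypothetical. It only recognises the exact "No AWS OpsWorks agent running" wording from this report, which may differ between agent versions.

```shell
# Hypothetical helper: succeed (exit 0) when agent_report text on stdin
# contains the "no agent running" status line seen in the report above.
agent_report_says_down() {
  grep -q "Agent Status: No AWS OpsWorks agent running"
}

# Hypothetical usage, with the CLI path from the session above:
# if /opt/aws/opsworks/current/bin/opsworks-agent-cli agent_report | agent_report_says_down; then
#   echo "OpsWorks agent is not running on $(hostname)" >&2
# fi
```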

Hopefully Amazon will improve the way the OpsWorks agent is spawned.