detect-hung.sh


If you're running apps written with #radiator, you might run into this error on occasion:

W, [2018-02-09T03:06:18.671911 #1622] WARN -- : database_api.get_account_history :: SSL Error (SSL_connect SYSCALL returned=5 errno=0 state=error: certificate verify failed), retrying ...

Normally, this happens only once in a while and recovers right away. But sometimes you'll see it repeat over and over in a short period of time and never recover.

It can also happen during a "man-in-the-middle" attack or some other security breach, but that's rare.

If it only happens once in a while, it's probably a reverse proxy timing out in the middle of the response instead of properly returning an HTTP 502 (Bad Gateway).

But if it happens over and over in a short period of time, your local machine is probably running out of resources: the process has too many file handles open, so it can't open the certificate it needs to verify the connection.
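If you suspect file handle exhaustion, you can compare how many descriptors the process has open against its limit. This is a minimal sketch, assuming a Linux host with `/proc`; `fd_usage` is a hypothetical helper name, and you would pass it your app's PID (for example from `pgrep -f your-app`, where `your-app` is a placeholder). Run it as the same user as the app, or `/proc/PID/fd` won't be readable.

```shell
#!/bin/bash
# Sketch (Linux only): compare a process's open file descriptors with its soft limit.
# fd_usage is a hypothetical helper; pass it the PID of your app.

fd_usage() {
  local pid=$1 open limit
  # Each entry in /proc/PID/fd is one open file descriptor.
  # $(( )) strips any padding wc adds around the number.
  open=$(( $(ls "/proc/$pid/fd" 2>/dev/null | wc -l) ))
  # The soft limit is the 4th whitespace-separated field of the "Max open files" row.
  limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
  echo "pid $pid: $open open file descriptors (soft limit: ${limit:-unknown})"
}

# Defaults to this shell's own PID when no argument is given.
fd_usage "${1:-$$}"
```

If the open count sits close to the soft limit while the certificate errors pile up, that supports the file-handle explanation. Restarting the app, or raising the limit with `ulimit -n` before starting it, is one common next step.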

Here's a script that will detect the problem (detect-hung.sh):

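#!/bin/bash

# Change this to your app's directory; fail loudly if it doesn't exist.
cd "$HOME/path/to/your/app" || exit 1

# Count "certificate verify failed" errors in the last 2000 lines of the log.
count=$(tail -n 2000 debug.log | grep -c "certificate verify failed")

# A single occurrence is normal and recovers on its own, so 0 or 1 is healthy.
if [[ $count -le 1 ]] ; then
  exit 0
fi

# Exit statuses wrap around at 256, so cap the count to keep the status non-zero.
(( count > 255 )) && count=255
exit $count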

You'll need to change /path/to/your/app and debug.log to the correct values for your app.
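To try the logic without waiting for a real failure, you can run the same count-and-threshold check against a throwaway log. This is a sketch: `hung_status` is a hypothetical function that mirrors the script (0 or 1 matches is healthy, otherwise the status is the match count), and it caps the status at 255 because exit statuses only range from 0 to 255. The log lines are synthetic.

```shell
#!/bin/bash
# Sketch: exercise the detect-hung.sh logic against a synthetic log file.
# hung_status is a hypothetical helper that mirrors the script's checks.

hung_status() {
  local log=$1 count
  count=$(tail -n 2000 "$log" | grep -c "certificate verify failed")
  if (( count <= 1 )); then
    return 0
  fi
  (( count > 255 )) && count=255   # keep the status in the valid 0-255 range
  return "$count"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

for n in 0 1 3 300; do
  : > "$tmp/debug.log"   # start each run with an empty log
  for ((i = 0; i < n; i++)); do
    # Synthetic line modeled on the WARN message above.
    echo "WARN -- : SSL Error (certificate verify failed), retrying ..." >> "$tmp/debug.log"
  done
  hung_status "$tmp/debug.log"
  echo "$n matching lines -> exit status $?"
done
```

This prints exit statuses 0, 0, 3 and 255 for the four runs. Without the cap, 300 matches would wrap around to 44.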

This script will also work with monit. Just add the following definition to /etc/monit/monitrc:

 check program "detect-hung" with path "/home/your-user/scripts/detect-hung.sh"
   uid your-user
   gid your-user
   if status != 0 for 2 cycles then alert
   every 2 cycles
   group your-app

You'll also have to use the correct values for your-user, your-app, and the script path, and make sure the script is executable (chmod +x detect-hung.sh).


@inertia,
Thank you, friend, I am very interested in this area! I will contact you to learn more about it!

Cheers~

Yes......

that's pretty cool to know

A helpful piece of information

Is this an RPC timeout error? I.e., is that the reason for getting the error/exception in the first place?

Thank you for the info @inertia

Very vital information. Now I know where to pinpoint my errors.

@inertia, my Busy app is not working. What should I do?
And how do I use Zappl?