Summary
By default, the value of 'MaxStartups' is 10:30:100. This means that once 10 ssh connections are established and still waiting for authentication, each subsequent connection request has a 30% probability of being refused (rising linearly to 100% at 100 such connections). So if you transfer files with a script via sftp and do NOT limit the number of concurrent requests, somebody else's ssh or rsh may fail unexpectedly.
Troubleshooting
Problem Summary
- GPFS node shows unknown state when executing the mmgetstate -a command
- Lots of refused ssh connections exist in file /var/log/secure
Common Causes
- SSH port is wrong or blocked
- Credentials are incorrect
- SSH Client is not installed correctly
- SSH Daemon is not installed or not started correctly
SSH Connection Refused: What It Is, Causes, and 6 Effective Methods to Fix It
Possible Reasons
- Poor network quality
  - High latency
  - Packet loss
  - Packet corruption
  - Packet reordering
- SSH daemon overload
Scripts & commands for testing
- Network traffic control tool
  Name: tc
  Usage:
  - Delay traffic through device eth0 by 1000ms
      tc qdisc add dev eth0 root netem delay 1000ms
  - Drop 10% of the packets through device eth0
      tc qdisc add dev eth0 root netem loss 10%
  - Corrupt 0.2% of the packets at random
      tc qdisc add dev eth0 root netem corrupt 0.2%
  - Send 25% of the packets (with a correlation of 50%) immediately and delay the rest by 10ms
      tc qdisc change dev eth0 root netem delay 10ms reorder 25% 50%
- Script for fetching GPFS nodes’ state: mmstat.sh

    #!/bin/bash
    #
    log_file="/tmp/mmstate.log"   # path of log file
    node_to_check="test01"        # node name in GPFS cluster to test
    established_ssh=
    time_wait_ssh=
    mmresult=
    rsh_realtime=

    # 1. Function for counting SSH connections and reading sshd settings
    function count() {
        count_ssh=$(netstat -pt | grep sshd)
        established_ssh=$(echo "$count_ssh" | grep --count 'ESTABLISHED')
        time_wait_ssh=$(echo "$count_ssh" | grep --count 'TIME_WAIT')
        login_grace_time=$(sshd -T | grep -i "LoginGraceTime" | grep -o "[0-9]*")
        max_startups=$(sshd -T | grep -i "MaxStartups" | grep -o "[0-9]*:[0-9]*:[0-9]*")
        max_sessions=$(sshd -T | grep -i "MaxSessions" | grep -o "[0-9]*")
        echo "SSH Established: $established_ssh; SSH TIME_WAIT: $time_wait_ssh" | tee -a "$log_file"
        # Prints $login_grace_time; the sample outputs in this post were produced
        # with $time_wait_ssh here by mistake, hence their "LoginGraceTime: 0".
        echo "SSH MaxStartups: $max_startups; LoginGraceTime: $login_grace_time; MaxSessions: $max_sessions" | tee -a "$log_file"
    }

    # 2. Function for fetching GPFS nodes' state
    function mmstate() {
        mmresult=$(mmgetstate -a | grep -E "^[ ]{1,}[0-9]{1,}[ ]{1,}.*$" | sed 's/[ ]\{1,\}/ /g')
        echo "$mmresult" | tee -a "$log_file"
    }

    # 3. Function for timing a remote command over ssh
    function rshstate() {
        rsh_realtime=$({ time ssh root@"$node_to_check" date &>/dev/null; } 2>&1 | grep -i real | sed 's/real\t//g')
        echo "RSH Real Time: $rsh_realtime" | tee -a "$log_file"
    }

    # Run the three checks in parallel, forever
    while true; do
        echo '------------------' | tee -a "$log_file"
        date "+%Y/%m/%d %H:%M:%S" | tee -a "$log_file"
        count &
        mmstate &
        rshstate &
        wait
        echo
    done
    exit 0
  Usage: sh mmstat.sh
  Output:

    ------------------
    2023/09/02 14:50:41
    RSH Real Time: 0m0.239s
    SSH Established: 2; SSH TIME_WAIT: 0
    SSH MaxStartups: 2:90:4; LoginGraceTime: 0; MaxSessions: 10
    1 test01 active
    2 test02 active
- Sending SFTP requests from the local host: main_sftp.sh

    #!/bin/bash
    #
    /usr/bin/expect /path/to/sftp.exp &>/dev/null &

  sftp.exp

    #!/usr/bin/expect
    set timeout 5
    spawn sftp -oHostKeyAlgorithms=+ssh-dss [email protected]
    expect "assword: "
    send "xx\n"
    expect "sftp> "
    send "put /Users/beyan/Desktop/config.csv\n"
    expect "sftp> "
    # terminate the command with a newline and wait for sftp to close
    send "exit\n"
    expect eof
    exit 0

  Usage: for i in {1..100}; do sh /path/to/main_sftp.sh; done
  Note: change 100 to the number of concurrent processes you want.
- Sending ssh requests that never log in: main_ssh.sh

    #!/bin/bash
    /usr/bin/expect /path/to/hold_login.exp &>/dev/null &
    exit 0

  hold_login.exp

    #!/usr/bin/expect
    set timeout 600
    spawn ssh -oHostKeyAlgorithms=+ssh-dss [email protected]
    # wait at the password prompt without answering, until the timeout
    expect "assword: "

  Usage: for i in {1..100}; do sh /path/to/main_ssh.sh; done
  Note: change 100 to the number of concurrent processes you want.
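A netem rule added with tc qdisc add stays on the interface until it is deleted, so it can help to wrap each network test in a helper that always removes the rule afterwards. The following is only a sketch, assuming root privileges and the eth0 device used in the tc commands listed above; the function names run_tc and with_netem and the DRY_RUN variable are hypothetical and introduced here purely for illustration.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the original scripts).

# Run tc, or only print the command when DRY_RUN=1 is set.
run_tc() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "tc $*"
    else
        tc "$@"
    fi
}

# with_netem DEV "RULE" CMD [ARGS...]
# Adds a netem rule to DEV, runs CMD, then removes the root qdisc again.
with_netem() {
    local dev="$1" rule="$2"
    shift 2
    # $rule is left unquoted on purpose so "delay 1000ms" splits into two words
    run_tc qdisc add dev "$dev" root netem $rule || return 1
    "$@"
    local rc=$?
    run_tc qdisc del dev "$dev" root
    return $rc
}
```

For example, with_netem eth0 "delay 1000ms" sleep 60 would keep the delay active for one minute while mmstat.sh is logging in another terminal, and DRY_RUN=1 with_netem eth0 "loss 10%" true only prints the tc commands it would run.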
Test Environment
- GPFS Cluster
- node1: 21.21.78.101
- node2: 21.21.78.102
- GPFS Version
4.2.2.0
- Host OS Version
RHEL 6.5
- SSH Version
OpenSSH 5.3p1
[root@test01 ~]# mmdiag --version
=== mmdiag: version ===
Current GPFS build: "4.2.2.0 ".
Built on Nov 11 2016 at 11:51:09
Running 1 hour 30 minutes 35 secs, pid 2845
[root@test01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
[root@test01 ~]#
[root@test01 ~]# ssh -V
OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010
Verifying the Possibilities
- High latency of network
- Network packet loss
- Network packet corruption
- Network packet reordering
Procedure
- Log the GPFS nodes’ state continuously
- Manually manipulate the network traffic with the commands provided above:
    netem delay 1000ms
    netem loss 10%
    netem corrupt 0.2%
    netem delay 10ms reorder 25% 50%
- Recheck the state of the nodes
Result
Poor network quality does not impact the GPFS nodes’ state.
- SSH server overload
Procedure
- Log SSH & GPFS state with script mmstat.sh
- Send many SFTP requests at once: for i in {1..1000}; do sh /path/to/main_sftp.sh; done
- Check script output

    ------------------
    2023/09/02 16:26:53
    SSH Established: 1; SSH TIME_WAIT: 0
    SSH MaxStartups: 10:30:100; LoginGraceTime: 0; MaxSessions: 10
    RSH Real Time: 0m0.233s
    test01: ssh_exchange_identification: Connection closed by remote host
    mmdsh: test01 remote shell process had return code 255.
    1 test01 unknown
    2 test02 active

- Check sshd DEBUG messages in /var/log/secure (needs sshd DEBUG enabled)

    ...
    Sep 2 16:26:53 test01 sshd[31642]: debug3: oom_adjust_restore
    Sep 2 16:26:53 test01 sshd[31642]: Set /proc/self/oom_score_adj to 0
    ...
Result
1000 SFTP requests appear to cause an OS out-of-memory condition (not confirmed), which makes mmgetstate via ssh fail.
Temporary summary
Typically, neither network issues nor sshd load should cause an ssh request to fail (load alone cannot cause an OOM).
Supposition
The default sshd configuration refuses additional ssh login requests when some kind of queue is full.
SSH Daemon Configuration Tuning & Verifying
Options
MaxStartups
Specifies the maximum number of concurrent unauthenticated connections
to the SSH daemon. Additional connections will be dropped until
authentication succeeds or the LoginGraceTime expires for a connection.
The default is 10:30:100.
Alternatively, random early drop can be enabled by specifying the
three colon separated values “start:rate:full” (e.g. "10:30:60").
sshd(8) will refuse connection attempts with a probability of
“rate/100” (30%) if there are currently “start” (10) unauthenticated
connections. The probability increases linearly and all connection
attempts are refused if the number of unauthenticated connections
reaches “full” (60).
LoginGraceTime
The server disconnects after this time if the user has not successfully
logged in. If the value is 0, there is no time limit.
The default is 120 seconds.
MaxSessions
Specifies the maximum number of open sessions permitted per network
connection. The default is 10.
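To make the "start:rate:full" rule concrete, the refusal probability can be worked out for a given number of unauthenticated connections. This is a minimal sketch of the linear increase described above, not sshd's actual code: the function name drop_probability is hypothetical, and the integer rounding is an assumption.

```shell
#!/bin/bash
# drop_probability N START:RATE:FULL
# Prints the probability (in percent) that a new connection is refused
# while N unauthenticated connections already exist.
drop_probability() {
    local n="$1" start rate full
    IFS=: read -r start rate full <<< "$2"
    if [ "$n" -lt "$start" ]; then
        echo 0          # below "start": never refused
    elif [ "$n" -ge "$full" ]; then
        echo 100        # at or above "full": always refused
    else
        # linear interpolation from RATE at START up to 100 at FULL
        echo $(( rate + (100 - rate) * (n - start) / (full - start) ))
    fi
}

# Default setting: 0% at 9, 30% at 10, 65% at 55, 100% at 100
for n in 9 10 55 100; do
    echo "$n pending -> $(drop_probability "$n" 10:30:100)%"
done
```

With a setting of 4:90:6, the same calculation gives 90% at 4 pending connections and 95% at 5, so a refusal is very likely as soon as the fourth unauthenticated connection exists.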
Procedure
- Increase the value of LoginGraceTime so the password prompt can be held as long as possible
  FILE: /etc/ssh/sshd_config
    LoginGraceTime 10m
- Decrease the start value to 4 (GPFS nodes already take 2)
- Increase the rate value to 90 (drop new connections with 90% probability)
- Decrease the full value to 6 (refuse all additional connections once 6 are waiting for authentication)
    MaxStartups 4:90:6
- Restart the sshd service & verify the options
    service sshd restart
    sshd -T | grep -E "grace|startup"
- Log the GPFS nodes’ state continuously
    sh mmstat.sh

    ------------------
    2023/09/02 17:23:19
    SSH Established: 2; SSH TIME_WAIT: 0
    SSH MaxStartups: 4:90:6; LoginGraceTime: 0; MaxSessions: 10
    RSH Real Time: 0m0.167s
    1 test01 active
    2 test02 active

  Note: 2 connections established.
- Create another 2 connections
    ssh -oHostKeyAlgorithms=+ssh-dss [email protected]
- Check OS & GPFS state

    ------------------
    2023/09/02 17:25:29
    SSH Established: 4; SSH TIME_WAIT: 0
    SSH MaxStartups: 4:90:6; LoginGraceTime: 0; MaxSessions: 10
    RSH Real Time: 0m0.166s
    1 test01 active
    2 test02 active
- Create one more connection
- Check script output

    ------------------
    2023/09/02 17:29:22
    RSH Real Time: 0m0.174s
    SSH Established: 5; SSH TIME_WAIT: 0
    SSH MaxStartups: 4:90:6; LoginGraceTime: 0; MaxSessions: 10
    test01: ssh_exchange_identification: Connection closed by remote host
    mmdsh: test01 remote shell process had return code 255.
    1 test01 unknown
    2 test02 active

  Note: the refusal probability is 90%, so create more connections if the ssh request is not refused.
One More Possibility
By default, the value of ‘MaxStartups’ is 10:30:100, which means that if 10 ssh connections are established and waiting for authentication, the next request has a 30% probability of being refused. So if you transfer files with a script via sftp and do NOT limit the concurrent requests, somebody else's ssh or rsh may fail unexpectedly.
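One way to limit the concurrent requests is to cap how many transfers run at the same time, so that the number of pending logins stays below the start value. The following is a minimal sketch using xargs -P (GNU findutils); the function name run_limited and the limit of 4 in the usage example are assumptions made for illustration.

```shell
#!/bin/bash
# run_limited MAX COUNT CMD [ARGS...]
# Runs CMD COUNT times, with at most MAX copies running at the same time.
# CMD must stay in the foreground (no trailing "&"), otherwise xargs
# cannot tell when a transfer has finished.
run_limited() {
    local max="$1" count="$2"
    shift 2
    # -I{} makes xargs start one CMD per input line; since CMD contains
    # no "{}", the line itself is not passed as an argument.
    seq 1 "$count" | xargs -P "$max" -I{} "$@"
}
```

For example, run_limited 4 100 /usr/bin/expect /path/to/sftp.exp starts 100 transfers but never more than 4 at once. Note that main_sftp.sh puts expect in the background and returns immediately, so the expect script has to be called directly for the limit to take effect.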