北肙

When you can no longer have it, the only thing you can do is make sure you never forget.

What Causes SSH Connections to Be Refused


Summary

By default, MaxStartups is 10:30:100. Once 10 SSH connections are open but not yet authenticated, each new connection is refused with a probability of 30%, and that probability rises linearly to 100% at 100 unauthenticated connections. So if a script transfers files over SFTP without limiting the number of concurrent connections, other users' ssh or rsh logins may fail unexpectedly.

Troubleshooting

Problem Summary

  1. A GPFS node shows the unknown state when running the mmgetstate -a command
  2. Many refused SSH connections are logged in /var/log/secure

Common Causes

  1. SSH port is wrong or blocked

  2. Credentials are incorrect

  3. SSH Client is not installed correctly

  4. SSH Daemon is not installed or not started correctly

    SSH Connection Refused: What It Is, Causes, and 6 Effective Methods to Fix It

Possible Reasons

  1. Poor network quality
    1. High latency
    2. Packet loss
    3. Packet corruption
    4. Packet reordering
  2. SSH daemon overload

Scripts & commands for testing

  1. Network traffic control tool

    • Name: tc (with the netem queueing discipline)
    • Usage:
      1. Add a 1000 ms delay to traffic through device eth0
        tc qdisc add dev eth0 root netem delay 1000ms
        
      2. Drop 10% of packets through device eth0
        tc qdisc add dev eth0 root netem loss 10%
        
      3. Randomly corrupt 0.2% of packets
        tc qdisc add dev eth0 root netem corrupt 0.2%
        
      4. Send 25% of packets immediately (50% correlation) and delay the rest by 10 ms ("change" requires a netem qdisc already attached to eth0)
        tc qdisc change dev eth0 root netem delay 10ms reorder 25% 50%
        
      5. Remove the netem rule and restore the default qdisc (needed before the next "add")
        tc qdisc del dev eth0 root
        
  2. Script for fetching GPFS nodes’ state: mmstat.sh

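    #!/bin/bash
    #
    # mmstat.sh: periodically log SSH connection counts, sshd limits,
    # GPFS node states and the ssh round-trip time to node_to_check.
    log_file="/tmp/mmstate.log" # path of log file
    node_to_check="test01" # node name in GPFS cluster to test

    # 1. Function for counting SSH connections and reading sshd limits
    function count() {
    	local count_ssh established_ssh time_wait_ssh sshd_cfg
    	local login_grace_time max_startups max_sessions
    	# -n: numeric addresses (no DNS lookups); -p: owning process (needs root)
    	count_ssh=$(netstat -tnp | grep sshd)
    	established_ssh=$(echo "$count_ssh" | grep --count 'ESTABLISHED')
    	# TIME_WAIT sockets have no owning process, so match them on local port 22
    	time_wait_ssh=$(netstat -tn | awk '$4 ~ /:22$/ && $6 == "TIME_WAIT"' | wc -l)

    	# Read the effective sshd configuration once
    	sshd_cfg=$(sshd -T)
    	login_grace_time=$(echo "$sshd_cfg" | awk 'tolower($1) == "logingracetime" {print $2}')
    	max_startups=$(echo "$sshd_cfg" | awk 'tolower($1) == "maxstartups" {print $2}')
    	max_sessions=$(echo "$sshd_cfg" | awk 'tolower($1) == "maxsessions" {print $2}')

    	echo "SSH Established: $established_ssh; SSH TIME_WAIT: $time_wait_ssh" | tee -a "$log_file"
    	echo "SSH MaxStartups: $max_startups; LoginGraceTime: $login_grace_time; MaxSessions: $max_sessions" | tee -a "$log_file"
    }

    # 2. Function for fetching GPFS nodes' state
    function mmstate() {
    	local mmresult
    	# Keep only node lines (number, name, state) and squeeze repeated spaces
    	mmresult=$(mmgetstate -a | grep -E "^[ ]{1,}[0-9]{1,}[ ]{1,}.*$" | sed 's/[ ]\{1,\}/ /g')

    	echo "$mmresult" | tee -a "$log_file"
    }

    # 3. Function for timing an ssh round trip to node_to_check
    function rshstate() {
    	local rsh_realtime
    	# BatchMode=yes: fail instead of hanging at a password prompt
    	rsh_realtime=$({ time ssh -o BatchMode=yes root@"$node_to_check" date &>/dev/null; } 2>&1 | grep -i real | sed 's/real\t//g')

    	echo "RSH Real Time: $rsh_realtime" | tee -a "$log_file"
    }

    while true; do
    	echo '------------------' | tee -a "$log_file"
    	date "+%Y/%m/%d %H:%M:%S" | tee -a "$log_file"

    	# Run the three probes in parallel and wait for all of them
    	count &
    	mmstate &
    	rshstate &
    	wait

    	echo
    	sleep 1 # short pause between samples
    done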
    

    Usage: bash mmstat.sh (run as root on a GPFS node)

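    Output (the LoginGraceTime field in this and the later captured outputs actually shows the SSH TIME_WAIT count, because the script printed the wrong variable there):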

    ------------------
    2023/09/02 14:50:41
    RSH Real Time: 0m0.239s
    SSH Established: 2; SSH TIME_WAIT: 0
    SSH MaxStartups: 2:90:4; LoginGraceTime: 0; MaxSessions: 10
     1 test01 active
     2 test02 active
    
  3. Sending SFTP requests from a local host

    main_sftp.sh

    #!/bin/bash  
    #
    /usr/bin/expect /path/to/sftp.exp &>/dev/null &
    

    sftp.exp

    #!/usr/bin/expect
    set timeout 5

    # replace user@host with the target account and node
    spawn sftp -oHostKeyAlgorithms=+ssh-dss user@host

    expect "assword: "
    send "xx\r"

    expect "sftp> "
    send "put /Users/beyan/Desktop/config.csv\r"

    # Quit cleanly and wait for sftp to exit
    expect "sftp> "
    send "exit\r"
    expect eof

    exit 0
    

    Usage:

    for i in {1..100}; do sh /path/to/main_sftp.sh; done
    

    Note: change 100 to the number of concurrent processes you want

  4. Sending SSH requests that never log in: main_ssh.sh

    #!/bin/bash  
    /usr/bin/expect /path/to/hold_login.exp &>/dev/null &  
    exit 0
    

    hold_login.exp

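    #!/usr/bin/expect
    set timeout 600
    # replace user@host with the target account and node
    spawn ssh -oHostKeyAlgorithms=+ssh-dss user@host
    expect "assword: "
    # Never answer the prompt: keep the unauthenticated connection open until
    # sshd closes it (LoginGraceTime) or the 600 s timeout expires
    expect eof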
    

    Usage:

    for i in {1..100}; do sh /path/to/main_ssh.sh; done
    

    Note: change 100 to the number of concurrent processes you want
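When the flood scripts above run next to mmstat.sh, the log grows quickly. The following is a minimal sketch for summarizing it. summarize_states and count_unknown are hypothetical helper names, and the sketch assumes node lines have the three-field form " 1 test01 active" shown in the sample output, with each sample starting at the dashed separator line.

```shell
#!/bin/bash
# Hypothetical helpers for summarizing /tmp/mmstate.log written by mmstat.sh.
# Assumption: node lines have exactly three fields ("1 test01 active") and each
# sample starts with a line of dashes.

# summarize_states LOGFILE: print "node state count" for each node/state pair.
summarize_states() {
    awk 'NF == 3 && $1 ~ /^[0-9]+$/ { seen[$2 " " $3]++ }
         END { for (k in seen) print k, seen[k] }' "$1" | sort
}

# count_unknown LOGFILE: number of samples in which at least one node was "unknown".
count_unknown() {
    awk '/^-+$/ { if (bad) n++; bad = 0; next }
         NF == 3 && $1 ~ /^[0-9]+$/ && $3 == "unknown" { bad = 1 }
         END { if (bad) n++; print n + 0 }' "$1"
}
```

Usage: summarize_states /tmp/mmstate.log lists how often each node was seen in each state, and count_unknown /tmp/mmstate.log reports how many samples had at least one node in the unknown state.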

Test Environment

  • GPFS Cluster
    • node1: 21.21.78.101
    • node2: 21.21.78.102
  • GPFS Version
    • 4.2.2.0
  • Host OS Version
    • RHEL 6.5
  • SSH Version
    • OpenSSH 5.3p1
[root@test01 ~]# mmdiag --version

=== mmdiag: version ===
Current GPFS build: "4.2.2.0 ".
Built on Nov 11 2016 at 11:51:09
Running 1 hour 30 minutes 35 secs, pid 2845
[root@test01 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.5 (Santiago)
[root@test01 ~]# 
[root@test01 ~]# ssh -V
OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010

Verifying the Possibilities


  1. High network latency

  2. Network packet loss

  3. Network packet corruption

  4. Network packet reordering

    Procedure

    1. Log the GPFS nodes’ state continuously

    2. Manually degrade the network traffic with the tc commands provided above:

      • netem delay 1000ms
      • netem loss 10%
      • netem corrupt 0.2%
      • netem delay 10ms reorder 25% 50%
    3. Recheck the state of the nodes

    Result

    Poor network quality did not affect the GPFS nodes’ state.
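The procedure above applies the four netem profiles by hand. Run back to back, each tc qdisc add after the first typically fails with "RTNETLINK answers: File exists", because a netem root qdisc is already attached. A minimal sketch that cycles through the same profiles with tc qdisc replace and removes the rule at the end; cycle_netem, run, DEV, HOLD_SECONDS and DRY_RUN are hypothetical names, and a real run needs root.

```shell
#!/bin/bash
# Sketch: apply each netem profile in turn, hold it, then restore the default qdisc.
# DEV, HOLD_SECONDS and DRY_RUN are hypothetical knobs for this sketch.
DEV=${DEV:-eth0}
HOLD_SECONDS=${HOLD_SECONDS:-300}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

cycle_netem() {
    local profile
    # Remove the netem rule if the run is interrupted.
    trap 'run tc qdisc del dev "$DEV" root; exit 130' INT TERM
    for profile in "delay 1000ms" "loss 10%" "corrupt 0.2%" "delay 10ms reorder 25% 50%"; do
        # "replace" works whether or not a root qdisc is already attached.
        # $profile is deliberately unquoted so it splits into netem arguments.
        run tc qdisc replace dev "$DEV" root netem $profile
        run sleep "$HOLD_SECONDS"
    done
    # Restore the default qdisc.
    run tc qdisc del dev "$DEV" root
    trap - INT TERM
}
```

Usage: DRY_RUN=1 cycle_netem prints the commands without touching the interface; cycle_netem (as root) applies each profile for HOLD_SECONDS while mmstat.sh logs the node states.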


  5. SSH server overload

    Procedure

    1. Log the SSH & GPFS state with the script mmstat.sh
    2. Send many SFTP requests concurrently
      for i in {1..1000}; do sh /path/to/main_sftp.sh; done
      
    3. Check the script output
      ------------------
      2023/09/02 16:26:53
      SSH Established: 1; SSH TIME_WAIT: 0
      SSH MaxStartups: 10:30:100; LoginGraceTime: 0; MaxSessions: 10
      RSH Real Time: 0m0.233s
      test01:  ssh_exchange_identification: Connection closed by remote host
      mmdsh: test01 remote shell process had return code 255.
       1 test01 unknown
       2 test02 active
      
    4. Check the sshd debug messages in /var/log/secure (requires sshd debug logging, e.g. LogLevel DEBUG3 in sshd_config)
      ...
      Sep  2 16:26:53 test01 sshd[31642]: debug3: oom_adjust_restore
      Sep  2 16:26:53 test01 sshd[31642]: Set /proc/self/oom_score_adj to 0
      ...
      

    Result

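    1000 concurrent SFTP requests made mmgetstate fail over SSH. They seem to cause the OS to run out of memory (not confirmed). Note that sshd logs these two debug lines whenever it restores the OOM score adjustment of a newly forked child, so they alone do not prove an OOM condition.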

    Related Links

    Fix handling of /proc/self/oom_score_adj on Linux
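Whether the kernel actually ran out of memory can be checked directly instead of inferred from the sshd debug lines. A minimal sketch; oom_events is a hypothetical helper, and the search patterns are assumptions based on the usual kernel OOM-killer messages ("Out of memory: Kill process ..." on older kernels, "Killed process ..." in most versions).

```shell
#!/bin/bash
# Hypothetical helper: print kernel OOM-killer messages from the given files,
# or from standard input when no file is given.
# The patterns are assumptions based on common kernel messages such as
# "Out of memory: Kill process ..." and "Killed process ...".
oom_events() {
    grep -iE 'out of memory|killed process' "$@"
}
```

Usage: dmesg | oom_events, or oom_events /var/log/messages. If nothing is printed for the time of the SFTP test, an OOM condition is an unlikely explanation.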


Interim summary

Neither network problems nor sshd load as such (load alone does not cause OOM) should normally make an ssh request fail.

Supposition

The default sshd configuration refuses additional SSH login requests when some kind of queue is full.

SSH Daemon Configuration Tuning & Verifying

Options

MaxStartups
    Specifies the maximum number of concurrent unauthenticated connections 
    to the SSH daemon.  Additional connections will be dropped until 
    authentication succeeds or the LoginGraceTime expires for a connection.  
    The default is 10:30:100.
    Alternatively, random early drop can be enabled by specifying the 
    three colon separated values “start:rate:full” (e.g. "10:30:60").  
    sshd(8) will refuse connection attempts with a probability of 
    “rate/100” (30%) if there are currently “start” (10) unauthenticated 
    connections.  The probability increases linearly and all connection 
    attempts are refused if the number of unauthenticated connections 
    reaches “full” (60).

LoginGraceTime
    The server disconnects after this time if the user has not successfully 
    logged in.  If the value is 0, there is no time limit.  
    The default is 120 seconds.

MaxSessions
    Specifies the maximum number of open sessions permitted per network 
    connection. The default is 10.
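The random early drop rule described under MaxStartups can be turned into a small calculator, which makes the numbers in the tuning procedure easier to check. A minimal sketch; drop_probability is a hypothetical helper, N is the number of connections that are already unauthenticated, and the integer rounding is an assumption (sshd's own arithmetic may round differently).

```shell
#!/bin/bash
# Sketch: refusal probability (percent) implied by the MaxStartups description.
# drop_probability SPEC N
#   SPEC: MaxStartups value, "start:rate:full" or a single number
#   N:    current number of unauthenticated connections
drop_probability() {
    local start rate full n=$2
    IFS=: read -r start rate full <<< "$1"
    if [ -z "$rate" ]; then
        # Single value: every new connection is refused once N reaches it.
        rate=100
        full=$start
    fi
    if [ "$n" -lt "$start" ]; then
        echo 0
    elif [ "$n" -ge "$full" ]; then
        echo 100
    else
        # Linear increase from "rate" at "start" to 100 at "full" (integer math).
        echo $(( rate + (100 - rate) * (n - start) / (full - start) ))
    fi
}
```

Usage: drop_probability 10:30:100 10 prints 30 (the default setting), and drop_probability 4:90:6 5 prints 95. The live value can be fed in with drop_probability "$(sshd -T | awk '$1 == "maxstartups" {print $2}')" 5.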

Procedure

  1. Increase LoginGraceTime so that a connection can stay at the password prompt as long as possible
    FILE: /etc/ssh/sshd_config

    LoginGraceTime 10m
    
  2. Decrease the start value to 4 (the GPFS nodes already take 2)

  3. Increase the rate value to 90 (drop new connections with 90% probability)

  4. Decrease the full value to 6 (refuse all new connections once 6 are unauthenticated)

    MaxStartups 4:90:6
    
  5. Restart the sshd service and verify the options

    service sshd restart
    
    sshd -T | grep -E "grace|startup"
    
  6. Log the GPFS nodes’ state continuously

    bash mmstat.sh
    
    ------------------
    2023/09/02 17:23:19
    SSH Established: 2; SSH TIME_WAIT: 0
    SSH MaxStartups: 4:90:6; LoginGraceTime: 0; MaxSessions: 10
    RSH Real Time: 0m0.167s
     1 test01 active
     2 test02 active
    

    Note: 2 connections are established.

  7. Create 2 more connections

    ssh -oHostKeyAlgorithms=+ssh-dss user@host
    
  8. Check the OS & GPFS state

    ------------------
    2023/09/02 17:25:29
    SSH Established: 4; SSH TIME_WAIT: 0
    SSH MaxStartups: 4:90:6; LoginGraceTime: 0; MaxSessions: 10
    RSH Real Time: 0m0.166s
     1 test01 active
     2 test02 active
    
  9. Create one more connection

  10. Check the script output

    ------------------
    2023/09/02 17:29:22
    RSH Real Time: 0m0.174s
    SSH Established: 5; SSH TIME_WAIT: 0
    SSH MaxStartups: 4:90:6; LoginGraceTime: 0; MaxSessions: 10
    test01:  ssh_exchange_identification: Connection closed by remote host
    mmdsh: test01 remote shell process had return code 255.
     1 test01 unknown
     2 test02 active
    

    Note: the refusal probability is 90%, so if the ssh request is not refused, try creating more connections.
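Steps 1 to 4 edit /etc/ssh/sshd_config by hand. A minimal sketch of applying them from a script; set_sshd_option is a hypothetical helper (not part of OpenSSH), it relies on GNU sed for the case-insensitive I flag, and a line it appends goes to the end of the file, which is wrong if the file ends with a Match block.

```shell
#!/bin/bash
# Hypothetical helper: set "Key value" in an sshd_config-style file.
# An existing (possibly commented-out) line for Key is replaced in place;
# otherwise the setting is appended at the end of the file.
set_sshd_option() {
    local file=$1 key=$2 value=$3
    if grep -qi "^[#[:space:]]*${key}[[:space:]]" "$file"; then
        # Keywords are case-insensitive in sshd_config, hence the I flag (GNU sed).
        sed -i "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|I" "$file"
    else
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}
```

Usage: set_sshd_option /etc/ssh/sshd_config LoginGraceTime 10m and set_sshd_option /etc/ssh/sshd_config MaxStartups 4:90:6, then sshd -t to check the syntax before service sshd restart.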

One More Possibility

This explains the failures. With the default MaxStartups of 10:30:100, once 10 connections are waiting for authentication, each new connection is refused with a probability of 30%. A script that transfers files over SFTP without limiting its concurrent requests can therefore make other ssh or rsh sessions, including the remote shell calls GPFS itself makes, fail unexpectedly.
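One way to act on this is to cap the number of parallel transfers below the start value of MaxStartups. A minimal sketch using xargs -P; run_bounded and upload_one are hypothetical helpers, REMOTE is a placeholder, and upload_one assumes key-based authentication because sftp -b is meant for non-interactive logins (the expect-based transfer used earlier could be wrapped the same way).

```shell
#!/bin/bash
# run_bounded MAX FUNC ITEM...: call the exported function FUNC once per ITEM,
# with at most MAX calls running at the same time (xargs -P).
run_bounded() {
    local max=$1 func=$2
    shift 2
    # printf with no arguments would still emit one empty item, so bail out early.
    [ "$#" -gt 0 ] || return 0
    printf '%s\0' "$@" | xargs -0 -n 1 -P "$max" bash -c '"$0" "$1"' "$func"
}

# Example job: upload one file in sftp batch mode (no password prompt).
# REMOTE is a placeholder such as user@host and must be exported.
upload_one() {
    printf 'put "%s"\n' "$1" | sftp -b - "$REMOTE"
}
```

Usage: export REMOTE=user@host; export -f upload_one; run_bounded 4 upload_one /data/*.csv keeps at most 4 SFTP connections open at once, well below the default start value of 10.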
