Online Disk Replacement on NetApp FAS3210 NAS Storage

Replacing a hard disk in a storage array can be roughly divided into three steps. First, check the system status and identify the location of the failed disk. Second, replace the disk and let the system accept it, either automatically or with manual confirmation. Third, verify the storage state. For the first two steps, a failed disk is usually indicated by its status light, and virtually all storage-class products support hot-swapping. So for someone who just wants to get the job over with and placate the client, the easiest method is to look at the panel lights and swap whichever disk shows an abnormal light. If the storage happens to auto-detect the new disk and do the right thing, all is well.

But storage is no small matter. However many or few disks a device has, it is the most basic and important building block of production. So, out of responsibility to both yourself and others, handle storage with extreme care; no amount of caution is excessive.

1. Checking the System State

Fas3210A> sysconfig -r
Aggregate aggr0 (online, raid_dp) (block checksums)
 Plex /aggr0/plex0 (online, normal, active)
  RAID group /aggr0/plex0/rg0 (normal)
 RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
 ---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
 dparity         0a.01.0         0a    1   0   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 parity          0a.01.1         0a    1   1   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 data            0a.01.2         0a    1   2   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816

Aggregate aggr1 (online, raid_dp, reconstruct) (block checksums)
 Plex /aggr1/plex0 (online, normal, active)
  RAID group /aggr1/plex0/rg0 (reconstruction 5% completed)
 RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
 ---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
 dparity         0a.02.0         0a    2   0   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 parity          0a.03.0         0a    3   0   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 data            0a.01.3         0a    1   3   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 FAILED          0a.02.6         0a    2   6   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 data            0a.03.1         0a    3   1   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 data            0a.02.7         0a    2   7   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816

Spare disks (empty)
Broken disks
 RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
 ---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
 failed          0a.03.9         0a    3   9   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816

Partner disks
 RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
 ---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
 partner         0a.01.7         0a    1   7   SA:A   -  BSAS  7200 0/0               1695759/3472914816
 partner         0a.01.6         0a    1   6   SA:A   -  BSAS  7200 0/0               1695759/3472914816
 partner         0a.01.20        0a    1   20  SA:A   -  BSAS  7200 0/0               1695759/3472914816
 partner         0a.01.13        0a    1   13  SA:A   -  BSAS  7200 0/0               1695759/3472914816

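As shown above, there are two failed disks, each marked "failed" in the command output together with its location: 0a.02.6 (shelf 2, bay 6), shown as FAILED inside the RAID group of aggr1, and 0a.03.9 (shelf 3, bay 9), listed under "Broken disks".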
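
Reading the failed rows by eye works for a few shelves; for a saved copy of the output, a short script can do the same scan. The following is a minimal sketch, assuming the `sysconfig -r` text has already been captured as a string; the function name `failed_disks` is hypothetical and not part of any NetApp tooling.

```python
import re

# Device IDs look like adapter.shelf.bay, e.g. "0a.02.6".
DEVICE_RE = re.compile(r"^\d+[a-z]\.\d+\.\d+$")


def failed_disks(sysconfig_r_output: str) -> list[str]:
    """Return device IDs whose RAID Disk column reads "failed" (any case)."""
    found = []
    for line in sysconfig_r_output.splitlines():
        fields = line.split()
        # A disk row carries the role in column 1 and the device ID in column 2.
        if (
            len(fields) >= 2
            and fields[0].lower() == "failed"
            and DEVICE_RE.match(fields[1])
        ):
            found.append(fields[1])
    return found


# Three rows copied from the output above.
sample = """\
 data            0a.01.3         0a    1   3   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 FAILED          0a.02.6         0a    2   6   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
 failed          0a.03.9         0a    3   9   SA:A   -  BSAS  7200 1695466/3472315904 1695759/3472914816
"""

print(failed_disks(sample))  # ['0a.02.6', '0a.03.9']
```

The match is case-insensitive because the same output writes the role as "FAILED" inside a RAID group and as "failed" under "Broken disks".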

You can also obtain failed disk information by checking the status of all disks:

Fas3210A> disk show -v
DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              -----  -------------         -------------
0a.02.2      Fas3210B  (1574419336)    Pool0  WD-WMAY02394580       Fas3210B  (1574419336)
0a.01.4      Fas3210B  (1574419336)    Pool0  B9KKWBZF              Fas3210B  (1574419336)
0a.03.7      Fas3210B  (1574419336)    Pool0  WD-WMAY04134224       Fas3210B  (1574419336)
0a.01.1      Fas3210A  (1574419769)    Pool0  BFGUKM5F              Fas3210A  (1574419769)
0a.01.3      Fas3210A  (1574419769)    Pool0  BFGUKKBF              Fas3210A  (1574419769)
0a.02.6      Fas3210A  (1574419769)   FAILED  WD-WMAY02391728       Fas3210A  (1574419769)
0a.03.9      Fas3210A  (1574419769)    FAILED WD-WMAY02504662       Fas3210A  (1574419769)
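
The same check can be made against `disk show -v` by looking at the POOL column. This is a rough sketch that assumes rows split on whitespace exactly as in the listing above (disk, owner, owner ID in parentheses, pool, serial, home); `disks_in_pool` is a hypothetical helper name.

```python
def disks_in_pool(disk_show_output: str, pool: str) -> list[str]:
    """Return disk IDs whose POOL column equals `pool`, e.g. "FAILED"."""
    matches = []
    for line in disk_show_output.splitlines():
        fields = line.split()
        # Owned rows split into: disk, owner, (owner id), pool, serial, ...
        # Requiring "(" in the third field skips the header and ruler lines.
        if len(fields) >= 5 and fields[2].startswith("(") and fields[3] == pool:
            matches.append(fields[0])
    return matches


# Header plus four rows copied from the output above.
sample = """\
DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              -----  -------------         -------------
0a.01.1      Fas3210A  (1574419769)    Pool0  BFGUKM5F              Fas3210A  (1574419769)
0a.01.3      Fas3210A  (1574419769)    Pool0  BFGUKKBF              Fas3210A  (1574419769)
0a.02.6      Fas3210A  (1574419769)   FAILED  WD-WMAY02391728       Fas3210A  (1574419769)
0a.03.9      Fas3210A  (1574419769)    FAILED WD-WMAY02504662       Fas3210A  (1574419769)
"""

print(disks_in_pool(sample, "FAILED"))  # ['0a.02.6', '0a.03.9']
```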

Once the slot of the failed disk is identified, you can replace the disk. Before doing so, check the disk's part number (PN) so that a matching replacement can be procured:

Fas3210A> storage show disk
DISK                  SHELF BAY SERIAL           VENDOR   MODEL      REV
--------------------- --------- ---------------- -------- ---------- ----
0a.02.6                 2    6  WD-WMAY02391728  NETAPP   X306_WMANT NA04
0a.02.7                 2    7  WD-WMAY02377652  NETAPP   X306_WMANT NA04
0a.02.8                 2    8  WD-WMAY02501788  NETAPP   X306_WMANT NA04
0a.03.8                 3    8  WD-WMAY02481326  NETAPP   X306_WMANT NA04
0a.03.9                 3    9  WD-WMAY02504662  NETAPP   X306_WMANT NA04
0a.03.10                3   10  WD-WMAY02459742  NETAPP   X306_WMANT NA04

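As shown above, both failed disks report model "X306_WMANT" with firmware revision "NA04", which is the part number to order against. The following command shows the same information and lists the full model string (X306_WMANT02TSSM) for each drive: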

Fas3210A> sysconfig -v
NetApp Release 8.0.2 7-Mode: Mon Jun 13 14:13:45 PDT 2011
System ID: 1574419769 (Fas3210A); partner ID: 1574419336 (Fas3210B)
System Serial Number: 850000103829 (Fas3210A)
System Rev: F4
System Storage Configuration: Single-Path HA
System ACP Connectivity: NA
 slot 0: System Board 2.3 GHz (System Board XVI F4)
Model Name:         FAS3210
Part Number:        111-00585
Revision:           F4
Serial Number:      5006779457
BIOS version:       5.1.1
Loader version:     3.2
Processors:         2
Processor type:     Intel(R) Xeon(R) CPU           E5220  @ 2.33GHz
Memory Size:        5120 MB
Memory Attributes:  Bank Interleaving Hoisting Rank Interleaving Normal ECC
NVMEM Size:         640 MB of Main Memory Used
CMOS RAM Status:    OK
Service Processor           Status: Online
 Firmware Version:   1.2.2
 Mgmt MAC Address:   00:A0:98:18:77:C8
 Ethernet Link:      down
 Using DHCP:         no
 IPv4 configuration:
  IP Address:         unknown
  Netmask:            unknown
  Gateway:            unknown
 slot 0: Internal BGE 10/100 Ethernet Controller
  e0M MAC Address:    00:a0:98:18:77:c6 (auto-100tx-fd-up)
  e0P MAC Address:    00:a0:98:18:77:c7 (auto-100tx-fd-up)
  Device Type:        BCM5721
 slot 0: Dual 10G Ethernet Controller T320E-SFP/KR
  Device Type:        CT-FE-3
  Version Number:     T3-SRAM1.1.0-BR1016-02-01-FW7.7.192-DR04
  Serial Number:      jb04050693
  c0a MAC Address:    00:a0:98:18:77:c4 (auto-unknown-enabling)
  c0b MAC Address:    00:a0:98:18:77:c5 (auto-10g_kr-fd-up)
 slot 0: Dual 10/100/1000 Ethernet Controller G20
  e0a MAC Address:    00:a0:98:18:77:c2 (auto-100tx-fd-up)
  e0b MAC Address:    00:a0:98:18:77:c3 (auto-100tx-fd-up)
  Device Type:        Rev 6
 slot 0: SAS Host Adapter 0a (PMC-Sierra PM8001 rev. C, SAS, <UP>)
  Firmware rev:       01.10.14.00
  Base WWN:           5:00a098:000808d:e0
  Phy State:          [0] Enabled, 3.0 Gb/s [1] Enabled, 3.0 Gb/s [2] Enabled, 3.0 Gb/s [3] Enabled, 3.0 Gb/s
  QSFP Vendor:        Molex Inc.
  QSFP Part Number:   112-00177+A0
  QSFP Type:          Passive Copper 2m ID:01
  QSFP Serial Number: 116620073
  ID     Vendor   Model            FW    Size
  01.23: NETAPP   X306_HJUPI02TSSA NA01 1695.4GB (3907029168 512B/sect)
  02.0 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
  02.1 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
  02.2 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
  02.3 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
  02.4 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
  02.5 : NETAPP   X306_HMARK02TSSM NA00 1695.4GB (3907029168 512B/sect)
  02.6 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
  02.7 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
  03.6 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
  03.7 : NETAPP   X306_WMANT02TSSM 4321 1695.4GB (3907029168 512B/sect)
  03.8 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
  03.9 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
  03.10: NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
 Shelf   1: IOM3  Firmware rev. IOM3 A: 0131 IOM3 B: 0131
 Shelf   2: IOM3  Firmware rev. IOM3 A: 0131 IOM3 B: 0131
 Shelf   3: IOM3  Firmware rev. IOM3 A: 0131 IOM3 B: 0131
 slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)
  Firmware rev:       01.10.14.00
  Base WWN:           5:00a098:000808d:e4
  Phy State:          [4] Disabled [5] Disabled [6] Disabled [7] Disabled
  QSFP Vendor:        not available
  QSFP Part Number:   not available
  QSFP Type:          not available
  QSFP Serial Number: not available
 slot 0: Intel ICH USB EHCI Adapter u0a (0xdf901400)
  boot0   Micron Technology Real SSD eUSB 2GB, rev 2.00/11.10, addr 2 1936MB 512B/sect (0FF0022700058229)
 slot 0: FC Host Adapter 0c (QLogic 2432 rev. 2, L-port, <OFFLINE (hard)>)
  Firmware rev:      5.4.0
  Host Loop Id:      0
  FC Node Name:      5:00a:098000:808dd8
  FC Port Name:      5:00a:098000:808dd8
  SFP Vendor:        FINISAR CORP.
  SFP Part Number:   FTLF8524P2BNV
  SFP Serial Number: PL82Q4S
  SFP Capabilities:  1, 2 or 4 Gbit
  Link Data Rate:    N/A
 slot 0: FC Host Adapter 0d (QLogic 2432 rev. 2, L-port, <OFFLINE (hard)>)
  Firmware rev:      5.4.0
  Host Loop Id:      0
  FC Node Name:      5:00a:098100:808dd8
  FC Port Name:      5:00a:098100:808dd8
  SFP Vendor:        FINISAR CORP.
  SFP Part Number:   FTLF8524P2BNV
  SFP Serial Number: PL82RFY
  SFP Capabilities:  1, 2 or 4 Gbit
  Link Data Rate:    N/A
 slot 1: Quad Gigabit Ethernet Controller 82580
  e1a MAC Address:    00:1b:21:c4:36:88 (auto-unknown-down)
  e1b MAC Address:    00:1b:21:c4:36:89 (auto-unknown-down)
  e1c MAC Address:    00:1b:21:c4:36:8a (auto-unknown-down)
  e1d MAC Address:    00:1b:21:c4:36:8b (auto-unknown-down)
  Device Type:    150E, PBA E68891-015
 slot 2: Quad Gigabit Ethernet Controller 82580
  e2a MAC Address:    00:1b:21:c4:35:c4 (auto-unknown-down)
  e2b MAC Address:    00:1b:21:c4:35:c5 (auto-unknown-down)
  e2c MAC Address:    00:1b:21:c4:35:c6 (auto-unknown-down)
  e2d MAC Address:    00:1b:21:c4:35:c7 (auto-unknown-down)
  Device Type:    150E, PBA E68891-015
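
The `sysconfig -v` listing is long, and the only rows that matter for ordering a replacement are the drive rows tagged "(Failed)". As a minimal sketch, assuming the drive rows keep the `shelf.bay : vendor model firmware size` layout shown above, they can be pulled out with a regular expression; `failed_drive_models` is a hypothetical name.

```python
import re

# One drive row under the SAS adapter, for example:
#   02.6 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
ROW_RE = re.compile(
    r"^\s*(\d+\.\d+)\s*:\s*(\S+)\s+(\S+)\s+(\S+)\s+\S+GB\b.*\(Failed\)\s*$"
)


def failed_drive_models(sysconfig_v_output: str) -> dict[str, tuple[str, str, str]]:
    """Map "shelf.bay" to (vendor, model, firmware) for rows marked (Failed)."""
    result = {}
    for line in sysconfig_v_output.splitlines():
        match = ROW_RE.match(line)
        if match:
            shelf_bay, vendor, model, firmware = match.groups()
            result[shelf_bay] = (vendor, model, firmware)
    return result


# Three drive rows copied from the output above.
sample = """\
  02.5 : NETAPP   X306_HMARK02TSSM NA00 1695.4GB (3907029168 512B/sect)
  02.6 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
  03.9 : NETAPP   X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
"""

for slot, (vendor, model, firmware) in failed_drive_models(sample).items():
    print(slot, vendor, model, firmware)
```

Shelf 2 also holds an X306_HMARK02TSSM drive, so reading the model from the failed row itself is safer than assuming every disk in a shelf is the same part.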

2. Replacing the Disk

Identify the disk:

Fas3210A> priv set advanced  (enter advanced mode)
Fas3210A*> led_on 0a.2.6      (make the disk indicator blink)
Fas3210A*> led_off 0a.2.6     (stop the disk indicator blinking)
Fas3210A*> priv set admin      (exit advanced mode)
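
The commands above write the disk as 0a.2.6, while the status listings print it zero-padded as 0a.02.6. Going by the HA, SHELF and BAY columns of `sysconfig -r`, the ID reads as adapter.shelf.bay. The sketch below assumes the padded and unpadded spellings name the same slot, as the article's usage implies, and compares them numerically before you pull a disk; the helper names are hypothetical.

```python
def split_disk_id(disk_id: str) -> tuple[str, int, int]:
    """Split "0a.02.6" into (adapter, shelf, bay) = ("0a", 2, 6)."""
    adapter, shelf, bay = disk_id.split(".")
    return adapter, int(shelf), int(bay)


def same_disk(first: str, second: str) -> bool:
    """True when two IDs differ only by zero-padding of shelf or bay."""
    return split_disk_id(first) == split_disk_id(second)


print(split_disk_id("0a.02.6"))        # ('0a', 2, 6)
print(same_disk("0a.2.6", "0a.02.6"))  # True
print(same_disk("0a.2.6", "0a.03.9"))  # False
```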

Open the disk latch, remove the failed disk, insert the new disk, then check the new disk's status in the system:

Fas3210A*> disk show -v
DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              -----  -------------         -------------
0a.02.3      Fas3210B  (1574419336)    Pool0  WD-WMAY02393819       Fas3210B  (1574419336)
0a.02.4      Fas3210B  (1574419336)    Pool0  WD-WMAY02501482       Fas3210B  (1574419336)
0a.01.14     Fas3210B  (1574419336)    Pool0  BFGULBNF              Fas3210B  (1574419336)
0a.01.23     Fas3210B  (1574419336)    Pool0  BFGUKM4F              Fas3210B  (1574419336)
0a.01.5      Fas3210B  (1574419336)    Pool0  BFGS1EDF              Fas3210B  (1574419336)
0a.01.1      Fas3210A  (1574419769)    Pool0  BFGUKM5F              Fas3210A  (1574419769)
0a.01.3      Fas3210A  (1574419769)    Pool0  BFGUKKBF              Fas3210A  (1574419769)
0a.02.6      Not Owned      NONE            YFG95P3A
0a.02.8      Fas3210A  (1574419769)    Pool0  WD-WMAY02501788       Fas3210A  (1574419769)
0a.02.0      Fas3210A  (1574419769)    Pool0  WD-WMAY02481293       Fas3210A  (1574419769)
0a.03.10     Fas3210A  (1574419769)    Pool0  WD-WMAY02459742       Fas3210A  (1574419769)
0a.03.3      Fas3210A  (1574419769)    Pool0  WD-WMAY02482034       Fas3210A  (1574419769)
0a.02.12     Fas3210A  (1574419769)    Pool0  BFGB2R9F              Fas3210A  (1574419769)
0a.03.2      Fas3210A  (1574419769)    Pool0  WD-WMAY02481343       Fas3210A  (1574419769)
0a.02.10     Fas3210A  (1574419769)    Pool0  WD-WMAY02389974       Fas3210A  (1574419769)
0a.02.5      Fas3210A  (1574419769)    Pool0  YFH7R0BA              Fas3210A  (1574419769)

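If, as shown above, the new disk's status is "Not Owned", you need to assign it to a controller. Run one of the following commands, substituting the ID of the disk you replaced (the example uses 0a.03.9):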

Fas3210A*> disk assign 0a.03.9  (assign the disk to the current controller)
Fas3210B*> disk assign 0a.03.9 -o Fas3210A  (assign the disk to a specific controller)
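
To avoid assigning the wrong slot, the disk ID can be taken from the `disk show -v` output instead of being typed by hand. This is a minimal sketch that only builds the command strings for you to review and paste; it assumes unowned rows begin with the disk ID followed by "Not Owned", as in the listing above, and both function names are hypothetical.

```python
from typing import Optional


def unowned_disks(disk_show_output: str) -> list[str]:
    """Return IDs of disks whose owner column reads "Not Owned"."""
    found = []
    for line in disk_show_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1:3] == ["Not", "Owned"]:
            found.append(fields[0])
    return found


def assign_commands(disk_ids: list[str], owner: Optional[str] = None) -> list[str]:
    """Build `disk assign` lines; `-o` names the target controller when given."""
    suffix = f" -o {owner}" if owner else ""
    return [f"disk assign {disk_id}{suffix}" for disk_id in disk_ids]


# Three rows copied from the output above.
sample = """\
0a.01.3      Fas3210A  (1574419769)    Pool0  BFGUKKBF              Fas3210A  (1574419769)
0a.02.6      Not Owned      NONE            YFG95P3A
0a.02.8      Fas3210A  (1574419769)    Pool0  WD-WMAY02501788       Fas3210A  (1574419769)
"""

print(assign_commands(unowned_disks(sample)))
# ['disk assign 0a.02.6']
print(assign_commands(unowned_disks(sample), owner="Fas3210A"))
# ['disk assign 0a.02.6 -o Fas3210A']
```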

3. Checking the Storage State

Then use the command disk show -v to check the disk status, and sysconfig -r to check the system status.

If the disk you swapped in is brand new, there usually won't be any further issues after running disk assign. If it's a used spare part, watch out for the disk showing "not zeroed" in the sysconfig -r output, or for a new aggregate being created automatically instead of the disk being added to the spare pool.

If the newly replaced disk was added to the spare pool and is "not zeroed", running the command disk zero spares will resolve it. If a new aggregate was created, run the following commands to delete it.

Check the aggregate status to determine which are pre-existing and which is newly created (assume the new aggregate is aggr1(1)):

Fas3210A*> aggr status

Change the aggregate status to offline:

Fas3210A*> aggr offline aggr1(1)

Delete the aggregate:

Fas3210A*> aggr destroy aggr1(1)

Check the system status again to make sure the newly replaced disk has been added to the spare pool and its status is normal:

Fas3210A*> sysconfig -r
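
The cleanup logic in this step can be summarized as a small decision function. The sketch below is illustrative only: it assumes you noted the aggregate names before and after the swap (the lists are typed in by hand, not parsed from `aggr status`), and that an unzeroed spare shows the text "not zeroed" in the `sysconfig -r` output as described above. The name `cleanup_plan` is hypothetical, and the function only prints suggestions; nothing is run on the filer.

```python
def cleanup_plan(
    aggrs_before: list[str],
    aggrs_after: list[str],
    sysconfig_r_output: str,
) -> list[str]:
    """Suggest the follow-up commands described in this step."""
    commands = []
    # A used spare that still carries old data is reported as "not zeroed".
    if "not zeroed" in sysconfig_r_output:
        commands.append("disk zero spares")
    # Any aggregate that appeared only after the swap came from the used disk.
    for name in sorted(set(aggrs_after) - set(aggrs_before)):
        commands.append(f"aggr offline {name}")
        commands.append(f"aggr destroy {name}")
    return commands


# Hypothetical inputs: aggr1(1) appeared after inserting a used disk, and the
# status text (abbreviated here, not real filer output) flags an unzeroed spare.
plan = cleanup_plan(
    ["aggr0", "aggr1"],
    ["aggr0", "aggr1", "aggr1(1)"],
    "spare 0a.02.6 (not zeroed)",
)
for command in plan:
    print(command)
```

Double-check the aggregate name before running `aggr destroy`: in this example the pre-existing aggr1 and the stray aggr1(1) differ only by the suffix.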

Appendix

View the system log:

Fas3210A> rdfile /etc/messages

Disable or view automatic disk assignment:

Fas3210A*> options disk.auto_assign off  (disable)
Fas3210A*> options disk.auto_assign      (view)

If a used spare part is fine on its own but shows "failed" after replacement, run the following command to force the disk state to change. If it still shows "failed" afterward, the spare part is faulty:

Fas3210A*> disk unfail 0a.2.6

Switch back to normal mode:

Fas3210A*> priv set

norvyn

An independent iOS developer and writer, slowly making small, certain things in a city by the sea.
