Overview
In this section we will configure and manage the Exadata storage cells. In Exadata the storage cells are separate servers with their own operating system, their own cell server software and, of course, their own storage. They facilitate the Smart Scan capability of Exadata by being able to scan through Oracle blocks and return ONLY the necessary rows to a server process. Since only rows (and not whole blocks) are returned, that data cannot "live" in the SGA, but only in the PGA.
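If you want to see this offloading in action from a database session, you can compare the cell I/O statistics before and after a full scan. The sketch below is only illustrative: BIG_TABLE is a placeholder table name, and Smart Scan will only kick in when the database performs a direct path full scan.

SQL> select n.name, s.value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and n.name in ('cell physical IO bytes eligible for predicate offload',
  5                 'cell physical IO interconnect bytes returned by smart scan');

SQL> select /*+ full(t) */ count(*) from big_table t;

-- Re-run the statistics query: if Smart Scan was used, both values grow, and the
-- "returned by smart scan" bytes are typically far smaller than the eligible bytes,
-- because the cells send back only the matching rows and columns (into the PGA).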
You can see the basic architecture of the physical disks, LUNs, cell disks and grid disks below:
A more detailed view of a single disk is also shown below:
Configuration
In this section we will configure and re-configure certain features of the cell storage server.
Enable Mail Notifications
To enable mail notifications from a certain cell, we can use the following commands:
Enable Mail Notification
--List cell Details:
CellCLI> list cell detail
         name:                   qr01celadm01
         cellVersion:            OSS_12.1.2.1.0_LINUX.X64_141206.1
         cpuCount:               2
         diagHistoryDays:        7
         fanCount:               0/0
         fanStatus:              normal
         ********************************************
         ***NO NOTIFICATION SETTINGS*****************
         ********************************************

CellCLI>

--Modify The cell:
CellCLI> alter cell smtpServer='my_mail.example.com', -
       > smtpFromAddr='[email protected]', -
       > smtpFrom='John Doe', -
       > smtpToAddr='[email protected]', -
       > notificationPolicy='critical,warning,clear', -
       > notificationMethod='mail'

Cell qr01celadm01 successfully altered

--List Details again
CellCLI> list cell detail
         name:                   qr01celadm01
         cellVersion:            OSS_12.1.2.1.0_LINUX.X64_141206.1
         cpuCount:               2
         diagHistoryDays:        7
         fanCount:               0/0
         fanStatus:              normal
         ********************************************
         notificationMethod:     mail
         notificationPolicy:     critical,warning,clear
         ********************************************
         offloadGroupEvents:
         offloadEfficiency:
However, since the mail server doesn't exist, we will receive the following error if we try to validate it:
Validate Mail
CellCLI> alter cell validate mail

CELL-02578: An error was detected in the SMTP configuration: CELL-05503: An error was detected during notification.
The text of the associated internal error is: Unknown SMTP host: my_mail.example.com.
The notification recipient is [email protected].
Please verify your SMTP configuration.

CellCLI>
You can also validate the whole configuration using the following command:
Validate Exadata Configuration
CellCLI> alter cell validate configuration

Cell qr01celadm01 successfully altered

CellCLI>

--Note
Note that the ALTER CELL VALIDATE CONFIGURATION command does not perform I/O tests against the cell's hard disks and flash modules. You must use the CALIBRATE command to perform such tests. The CALIBRATE command can only be executed in a CellCLI session initiated by the root user.
Management
List Cell Processes
The storage cell server runs a couple of key processes (a quick status check is shown after this list):
- RS (Restart Server) - Used to start up and shut down the Cell Server (CELLSRV) and the Management Server (MS)
- MS (Management Server) - Provides Exadata cell management and configuration and cooperates with the CellCLI interface. It also sends alerts and gathers statistics in addition to those collected by CELLSRV
- CELLSRV (Cell Server) - The primary Exadata component, providing most of the Exadata storage services. CELLSRV communicates with the Oracle database to serve simple block requests as well as Smart Scan requests. It also implements I/O Resource Management (IORM) and collects numerous statistics.
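A quick way to confirm that all three services are up is to query the corresponding cell attributes (a minimal sketch; the attribute names are the same ones shown later in the LIST CELL DETAIL output, and the exact output layout may differ):

[celladmin@qr01celadm01 ~]$ cellcli -e list cell attributes name, rsStatus, msStatus, cellsrvStatus
         qr01celadm01     running     running     running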
RS Server
[celladmin@qr01celadm01 ~]$ ps -ef | grep cellrs root 1927 1 0 Nov29 ? 00:00:15 /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/bin/cellrssrm -ms 1 -cellsrv 1 root 1934 1927 0 Nov29 ? 00:00:04 /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/bin/cellrsbmt -rs_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellinit.ora -ms_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsms.state -cellsrv_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsos.state -debug 0 root 1935 1927 0 Nov29 ? 00:00:05 /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/bin/cellrsmmt -rs_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellinit.ora -ms_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsms.state -cellsrv_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsos.state -debug 0 root 1936 1927 0 Nov29 ? 00:01:38 /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/bin/cellrsomt -rs_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellinit.ora -ms_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsms.state -cellsrv_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsos.state -debug 0 root 1937 1934 0 Nov29 ? 00:00:00 /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/bin/cellrsbkm -rs_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellinit.ora -ms_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsms.state -cellsrv_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsos.state -debug 0 root 1944 1937 0 Nov29 ? 00:00:04 /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/bin/cellrssmt -rs_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellinit.ora -ms_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsms.state -cellsrv_conf /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/config/cellrsos.state -debug 0
MS Server
[celladmin@qr01celadm01 ~]$ ps -ef | grep msServer root 2003 1938 0 Nov29 ? 00:04:39 /usr/java/jdk1.7.0_72/bin/java -client -Xms256m -Xmx512m -XX:CompileThreshold=8000 -XX:PermSize=128m -XX:MaxPermSize=256m -Dweblogic.Name=msServer -Djava.security.policy=/opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/wls/wlserver_10.3/server/lib/weblogic.policy -XX:-UseLargePages -XX:ParallelGCThreads=8 -Dweblogic.ListenPort=8888 -Djava.security.egd=file:/dev/./urandom -Xverify:none -da -Dplatform.home=/opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/wls/wlserver_10.3 -Dwls.home=/opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/wls/wlserver_10.3/server -Dweblogic.home=/opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/wls/wlserver_10.3/server -Dweblogic.management.discover=true -Dwlw.iterativeDev= -Dwlw.testConsole= -Dwlw.logErrorsToConsole= -Dweblogic.ext.dirs=/opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/deploy/wls/patch_wls1036/profiles/default/sysext_manifest_classpath weblogic.Server 1000 7610 7441 0 03:21 pts/0 00:00:00 grep msServer [celladmin@qr01celadm01 ~]$
CellSRV
[celladmin@qr01celadm01 ~]$ ps -ef | grep "/cellsrv "
root      1940  1936 22 Nov29 ?        05:34:49 /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/cellsrv/bin/cellsrv 40 3000 9 5042
[celladmin@qr01celadm01 ~]$
It is important to note that both MS and CELLSRV are children of the RS server (they have RS as their parent process).
Examine Storage Cell
With CellCLI we can also list the storage cell LUNs. P.S. Since I don't have the money for a real Exadata, there will be some discrepancies between my output and the output you will get on a real Exadata. Each storage cell (in this environment) consists of the following LUNs, and a quick way to verify the breakdown is shown right after the list:
- 12 hard disk based LUNs
- 4 flash based LUNs
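A minimal sketch of that check (on my virtualized cells the LUN names are file paths rather than PCI addresses, so only the counts and disk types are comparable):

[celladmin@qr01celadm01 ~]$ cellcli -e list lun attributes name, diskType, status

You should see 12 LUNs with diskType HardDisk and, in this environment, 4 with diskType FlashDisk.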
Let's start with the examination of the hard disk based storage:
Hard Disk Based
To list the status of a storage cell server, we can use the following command:
[celladmin@qr01celadm01 ~]$ cellcli -e list cell
         qr01celadm01     online
[celladmin@qr01celadm01 ~]$
We can obtain more detailed information using the CellCLI interface:
[celladmin@qr01celadm01 ~]$ cellcli
CellCLI: Release 12.1.2.1.0 - Production on Mon Nov 30 03:30:03 UTC 2020

Copyright (c) 2007, 2013, Oracle.  All rights reserved.

Cell Efficiency Ratio: 391

CellCLI> list cell detail
         name:                   qr01celadm01
         cellVersion:            OSS_12.1.2.1.0_LINUX.X64_141206.1
         cpuCount:               2
         diagHistoryDays:        7
         fanCount:               0/0
         fanStatus:              normal
         flashCacheMode:         WriteThrough
         id:                     ef92136a-837c-4e1d-88d2-e01f5ab89b7b
         interconnectCount:      0
         interconnect1:          ib0
         interconnect2:          ib1
         iormBoost:              0.0
         ipaddress1:             192.168.1.105/24
         ipaddress2:             192.168.1.106/24
         kernelVersion:          2.6.39-400.243.1.el6uek.x86_64
         makeModel:              Fake hardware
         memoryGB:               4
         metricHistoryDays:      7
         offloadGroupEvents:
         offloadEfficiency:      390.8
         powerCount:             0/0
         powerStatus:            normal
         releaseVersion:         12.1.2.1.0
         releaseTrackingBug:     17885582
         status:                 online
         temperatureReading:     0.0
         temperatureStatus:      normal
         upTime:                 1 days, 0:47
         cellsrvStatus:          running
         msStatus:               running
         rsStatus:               running

CellCLI>
List Lun
CellCLI> list lun
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK00    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK00    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK01    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK01    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK02    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK02    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK03    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK03    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK04    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK04    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK05    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK05    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK06    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK06    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK07    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK07    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK08    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK08    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK10    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK10    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK11    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK11    normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/FLASH00   /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/FLASH00   normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/FLASH01   /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/FLASH01   normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/FLASH02   /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/FLASH02   normal
         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/FLASH03   /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/FLASH03   normal
The result you will get on a real Exadata will be similar to:
CellCLI> list lun
         0_0     0_0     normal
         0_1     0_1     normal
         0_2     0_2     normal
         0_3     0_3     normal
         0_4     0_4     normal
         0_5     0_5     normal
         0_6     0_6     normal
         0_7     0_7     normal
         0_8     0_8     normal
         0_9     0_9     normal
         0_10    0_10    normal
         0_11    0_11    normal
         1_1     1_1     normal
         2_1     2_1     normal
         4_1     4_1     normal
         5_1     5_1     normal

CellCLI>
The reason is that in my virtualized environment the cells are mapped to virtualized disks and virtualized flash devices, whereas on a real Exadata they are mapped to a PCI slot and device number.
We can list a specific LUN in more detail, as follows:
List Lun
CellCLI> list lun where name like '.*DISK09' detail
         name:                   /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         cellDisk:               CD_09_qr01celadm01
         deviceName:             /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         diskType:               HardDisk
         id:                     /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         isSystemLun:            FALSE
         lunSize:                11
         physicalDrives:         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         raidLevel:              "RAID 0"
         status:                 normal

CellCLI>
List Physical Disk
CellCLI> list physicaldisk where luns like '.*DISK09' detail
         name:                   /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         deviceName:             /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         diskType:               HardDisk
         luns:                   /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         physicalInsertTime:     2015-02-17T03:31:42+00:00
         physicalSerial:         /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         physicalSize:           11
         status:                 normal

CellCLI>
List celldisk
--List all cell disks
CellCLI> list celldisk where disktype=flashdisk
         FD_00_qr01celadm01     normal
         FD_01_qr01celadm01     normal
         FD_02_qr01celadm01     normal
         FD_03_qr01celadm01     normal

--List Cell disk details
CellCLI> list celldisk CD_09_qr01celadm01 detail
         name:                   CD_09_qr01celadm01
         comment:
         creationTime:           2015-02-20T00:57:10+00:00
         deviceName:             /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         devicePartition:        /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         diskType:               HardDisk
         errorCount:             0
         freeSpace:              0
         id:                     f369c761-a9a1-4d1b-aaa0-51fc32edbc42
         interleaving:           none
         lun:                    /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         physicalDisk:           /opt/oracle/cell12.1.2.1.0_LINUX.X64_141206.1/disks/raw/DISK09
         raidLevel:              "RAID 0"
         size:                   2G
         status:                 normal

CellCLI>
List grid disk
CellCLI> list griddisk where celldisk=CD_09_qr01celadm01 detail
         name:                   DATA_QR01_CD_09_qr01celadm01
         asmDiskGroupName:       DATA_QR01
         asmDiskName:            DATA_QR01_CD_09_QR01CELADM01
         asmFailGroupName:       QR01CELADM01
         availableTo:
         cachingPolicy:          default
         cellDisk:               CD_09_qr01celadm01
         comment:
         creationTime:           2015-03-11T03:23:37+00:00
         diskType:               HardDisk
         errorCount:             0
         id:                     9a11b79a-1fa7-4527-85bf-28ad5cb98cac
         offset:                 400M
         size:                   720M
         status:                 active

         name:                   DBFS_DG_CD_09_qr01celadm01
         asmDiskGroupName:       DBFS_DG
         asmDiskName:            DBFS_DG_CD_09_QR01CELADM01
         asmFailGroupName:       QR01CELADM01
         availableTo:
         cachingPolicy:          default
         cellDisk:               CD_09_qr01celadm01
         comment:
         creationTime:           2015-03-11T03:23:35+00:00
         diskType:               HardDisk
         errorCount:             0
         id:                     e304a244-75de-4ea6-b92d-b480e2226615
         offset:                 48M
         size:                   352M
         status:                 active

         name:                   RECO_QR01_CD_09_qr01celadm01
         asmDiskGroupName:       RECO_QR01
         asmDiskName:            RECO_QR01_CD_09_QR01CELADM01
         asmFailGroupName:       QR01CELADM01
         availableTo:
         cachingPolicy:          default
         cellDisk:               CD_09_qr01celadm01
         comment:
         creationTime:           2015-03-11T03:23:40+00:00
         diskType:               HardDisk
         errorCount:             0
         id:                     9923a077-dc5e-4f53-adc5-4cc3eaaa9916
         offset:                 1.09375G
         size:                   928M
         status:                 active

CellCLI>
Flash Based
The flash based modules can be examined in the same way as the hard disk based ones:
List the cell disks
--Basic
CellCLI> list celldisk where disktype=flashdisk
         FD_00_qr01celadm01     normal
         FD_01_qr01celadm01     normal
         FD_02_qr01celadm01     normal
         FD_03_qr01celadm01     normal

--Detail
CellCLI> list flashcache detail
         name:                   qr01celadm01_FLASHCACHE
         cellDisk:               FD_02_qr01celadm01,FD_03_qr01celadm01,FD_01_qr01celadm01,FD_00_qr01celadm01
         creationTime:           2020-11-29T02:43:46+00:00
         degradedCelldisks:
         effectiveCacheSize:     1.0625G
         id:                     7fcc1eac-2214-40a1-9f27-eb988ec75340
         size:                   1.0625G
         status:                 normal

CellCLI>
Apart from the flash cache, Exadata also has a flash log, used to improve redo log write latency.
Flash log
--Detail
CellCLI> list flashlog detail
         name:                   qr01celadm01_FLASHLOG
         cellDisk:               FD_01_qr01celadm01,FD_03_qr01celadm01,FD_00_qr01celadm01,FD_02_qr01celadm01
         creationTime:           2020-11-29T02:43:44+00:00
         degradedCelldisks:
         effectiveSize:          256M
         efficiency:             100.0
         id:                     1bb5db75-8136-4952-9d20-bec44a303a17
         size:                   256M
         status:                 normal

CellCLI>

--Content
CellCLI> list flashcachecontent detail
...
         cachedKeepSize:         0
         cachedSize:             262144
         cachedWriteSize:        0
         columnarCacheSize:      0
         columnarKeepSize:       0
         dbID:                   2080757153
         dbUniqueName:           DBM
         hitCount:               11345
         missCount:              9
         objectNumber:           4294967294
         tableSpaceNumber:       0
Drop/Create Flash Cache
We can use the distributed CLI utility provided with Exadata, "dcli", to run a command on multiple storage cells:
--Drop Flash Cache:
[celladmin@qr01celadm01 ~]$ dcli -c qr01celadm01,qr01celadm02,qr01celadm03 cellcli -e drop flashcache
qr01celadm01: Flash cache qr01celadm01_FLASHCACHE successfully dropped
qr01celadm02: Flash cache qr01celadm02_FLASHCACHE successfully dropped
qr01celadm03: Flash cache qr01celadm03_FLASHCACHE successfully dropped
[celladmin@qr01celadm01 ~]$

--Create Flash Cache
[celladmin@qr01celadm01 ~]$ dcli -c qr01celadm01,qr01celadm02,qr01celadm03 cellcli -e create flashcache all
qr01celadm01: Flash cache qr01celadm01_FLASHCACHE successfully created
qr01celadm02: Flash cache qr01celadm02_FLASHCACHE successfully created
qr01celadm03: Flash cache qr01celadm03_FLASHCACHE successfully created
[celladmin@qr01celadm01 ~]$
Stop / Start Cell Services
We can restart all cell services using the CellCLI interface as follows:
[root@qr01celadm01 ~]# cellcli
CellCLI: Release 12.1.2.1.0 - Production on Mon Nov 30 03:36:16 UTC 2020

Copyright (c) 2007, 2013, Oracle.  All rights reserved.

Cell Efficiency Ratio: 392

CellCLI> alter cell restart services all

Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
Starting the RS, CELLSRV, and MS services...
Getting the state of RS services...  running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was successful.

CellCLI>
This action will NOT cause downtime, as long as ASM redundancy covers the grid disks on this cell.
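Before restarting the services it is good practice to confirm that ASM can tolerate this cell's grid disks going offline. A minimal sketch using the standard grid disk attributes (the output shown is illustrative, truncated to a few disks):

CellCLI> list griddisk attributes name, asmmodestatus, asmdeactivationoutcome
         DATA_QR01_CD_00_qr01celadm01     ONLINE     Yes
         DBFS_DG_CD_00_qr01celadm01       ONLINE     Yes
         RECO_QR01_CD_00_qr01celadm01     ONLINE     Yes
         ...

Only proceed when asmDeactivationOutcome is Yes for every grid disk on the cell.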
Reconfigure Grid Disks
Let's set ourselves a challenge: let's change the size of the ASM disks of one diskgroup, on our grid disks, from 928 MB to 608 MB. That is a hard process if we want to avoid downtime. In a nutshell, what we will do is:
- Drop Disks on Cell Server 1
- Reconfigure the Disks on Cell Server 1
- Add the new disks to the diskgroup, while dropping the ones on the next Cell server
We will repeat steps 1), 2) and 3) until all disks have the same size. P.S. Having disks of different sizes in one diskgroup is very bad practice; it will cause constant rebalancing. Without further ado, let's get started:
Check Our Diskgroups
SQL> select dg.name, count(*), d.total_mb, d.os_mb,
  2         min(d.free_mb) MIN_FREE_MB, max(d.free_mb) MAX_FREE_MB
  3  from v$asm_disk d, v$asm_diskgroup dg
  4  where dg.group_number=d.group_number and d.mount_status='CACHED'
  5  group by dg.name, d.total_mb, d.os_mb;

NAME              COUNT(*)   TOTAL_MB      OS_MB MIN_FREE_MB MAX_FREE_MB
--------------- ---------- ---------- ---------- ----------- -----------
DBFS_DG                 36        352        352           4          84
DATA_QR01               36        720        720         288         328
RECO_QR01               36        928        928         888         916   <- This one
Now, we could simply resize the disks to 608 MB and rebalance, but that alone won't help:
Rebalance
SQL> alter diskgroup reco_qr01 resize all size 608m
  2  rebalance power 1024;

Diskgroup altered.

SQL>
SQL> select dg.name, count(*), d.total_mb, d.os_mb,
  2         min(d.free_mb) MIN_FREE_MB, max(d.free_mb) MAX_FREE_MB
  3  from v$asm_disk d, v$asm_diskgroup dg
  4  where dg.group_number=d.group_number and d.mount_status='CACHED'
  5  group by dg.name, d.total_mb, d.os_mb;

NAME              COUNT(*)   TOTAL_MB      OS_MB MIN_FREE_MB MAX_FREE_MB
--------------- ---------- ---------- ---------- ----------- -----------
DBFS_DG                 36        352        352           4          84
DATA_QR01               36        720        720         288         328
RECO_QR01               36        608        928         568         596   <- This one
The problem here is that the grid disks on the storage cells are still 928 MB, as indicated by the "OS_MB" column. To change that, we need to do it in rotation, one cell at a time, just like with redo log files :)
Drop disks in Storage Cell Server 1
--Verify there isn't rebalancing: SQL> select * from gv$asm_operation; no rows selected SQL> --Drop disks on Storage Cell Server 1: SQL> alter diskgroup reco_qr01 2 drop disks in failgroup qr01celadm01 (that is our storage cell server 1) 3 rebalance power 1024; Diskgroup altered. --Of course there will be rebalance, but wait for that, to pass: select * from gv$asm_operation; INST_ID GROUP_NUMBER OPERA PASS STAT POWER ACTUAL SOFAR ---------- ------------ ----- --------- ---- ---------- ---------- ---------- EST_WORK EST_RATE EST_MINUTES ERROR_CODE ---------- ---------- ----------- -------------------------------------------- CON_ID ---------- 2 3 REBAL RESYNC DONE 1024 0 2 3 REBAL RESILVER DONE 1024 --Some time later SQL> select * from gv$asm_operation; no rows selected SQL> --Check if they are mounted: SQL> select path, free_mb, header_status, mount_status 2 from v$asm_disk 3 where path like '%RECO_QR01%celadm01'; PATH FREE_MB HEADER_STATU MOUNT_S --------------------------------------------------------------- ---------- ------------ ------- o/192.168.1.105;192.168.1.106/RECO_QR01_CD_10_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_06_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_00_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_05_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_08_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_03_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_09_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_04_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_01_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_07_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_02_qr01celadm01 0 FORMER CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_11_qr01celadm01 0 FORMER CLOSED 12 rows selected.
Now that the disks on storage cell server 1 have been dropped from ASM, we can drop the actual grid disks on the storage cell. Go to your storage cell server 1 and drop them:
Drop Grid disks
--List grid disks [celladmin@qr01celadm01 ~]$ cellcli CellCLI: Release 12.1.2.1.0 - Production... CellCLI> CellCLI> list griddisk attributes name, size, ASMModeStatus *********************************************************** RECO_QR01_CD_00_qr01celadm01 928M UNUSED RECO_QR01_CD_01_qr01celadm01 928M UNUSED RECO_QR01_CD_02_qr01celadm01 928M UNUSED RECO_QR01_CD_03_qr01celadm01 928M UNUSED RECO_QR01_CD_04_qr01celadm01 928M UNUSED RECO_QR01_CD_05_qr01celadm01 928M UNUSED RECO_QR01_CD_06_qr01celadm01 928M UNUSED RECO_QR01_CD_07_qr01celadm01 928M UNUSED RECO_QR01_CD_08_qr01celadm01 928M UNUSED RECO_QR01_CD_09_qr01celadm01 928M UNUSED RECO_QR01_CD_10_qr01celadm01 928M UNUSED RECO_QR01_CD_11_qr01celadm01 928M UNUSED --Drop the disks: CellCLI> drop griddisk all prefix=reco_qr01 GridDisk RECO_QR01_CD_00_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_01_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_02_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_03_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_04_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_05_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_06_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_07_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_08_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_09_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_10_qr01celadm01 successfully dropped GridDisk RECO_QR01_CD_11_qr01celadm01 successfully dropped CellCLI> --Create new disks with the correct size: CellCLI> create griddisk all harddisk prefix=RECO_QR01, size=608M GridDisk RECO_QR01_CD_00_qr01celadm01 successfully created GridDisk RECO_QR01_CD_01_qr01celadm01 successfully created GridDisk RECO_QR01_CD_02_qr01celadm01 successfully created GridDisk RECO_QR01_CD_03_qr01celadm01 successfully created GridDisk RECO_QR01_CD_04_qr01celadm01 successfully created GridDisk RECO_QR01_CD_05_qr01celadm01 successfully created GridDisk RECO_QR01_CD_06_qr01celadm01 successfully created GridDisk RECO_QR01_CD_07_qr01celadm01 successfully created GridDisk RECO_QR01_CD_08_qr01celadm01 successfully created GridDisk RECO_QR01_CD_09_qr01celadm01 successfully created GridDisk RECO_QR01_CD_10_qr01celadm01 successfully created GridDisk RECO_QR01_CD_11_qr01celadm01 successfully created --Aaaaand check again: CellCLI> list griddisk attributes name, size, ASMModeStatus *********************************************************** RECO_QR01_CD_00_qr01celadm01 608M UNUSED RECO_QR01_CD_01_qr01celadm01 608M UNUSED RECO_QR01_CD_02_qr01celadm01 608M UNUSED RECO_QR01_CD_03_qr01celadm01 608M UNUSED RECO_QR01_CD_04_qr01celadm01 608M UNUSED RECO_QR01_CD_05_qr01celadm01 608M UNUSED RECO_QR01_CD_06_qr01celadm01 608M UNUSED RECO_QR01_CD_07_qr01celadm01 608M UNUSED RECO_QR01_CD_08_qr01celadm01 608M UNUSED RECO_QR01_CD_09_qr01celadm01 608M UNUSED RECO_QR01_CD_10_qr01celadm01 608M UNUSED RECO_QR01_CD_11_qr01celadm01 608M UNUSED CellCLI>
Now that we have brand new disks, we have to add them back :) Remember: never drop more disks than your redundancy can carry:
- External Redundancy - Dropping disks will cause data loss
- Normal Redundancy - Dropping more than 1/2 of the copies will cause data loss
- High Redundancy - Dropping more than 2/3 of the copies will cause data loss.
In our case we have normal redundancy, so we go one cell (one failure group) at a time, just in case. A quick way to check the redundancy type and usable space before dropping anything is shown below.
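A minimal sketch of that sanity check from the ASM instance, using standard V$ASM_DISKGROUP columns (the output is omitted here; values depend on your environment):

SQL> select name, type, total_mb, free_mb, required_mirror_free_mb, usable_file_mb
  2  from v$asm_diskgroup;

-- TYPE shows the redundancy (EXTERN / NORMAL / HIGH); if USABLE_FILE_MB is negative,
-- the diskgroup can no longer tolerate the loss of a full failure group, so dropping
-- a cell's disks would be risky.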
Verify the Disks
--In ASM we can check the new disks: --Check if they are mounted: SQL> select path, free_mb, header_status, mount_status 2 from v$asm_disk 3 where path like '%RECO_QR01%celadm01'; PATH FREE_MB HEADER_STATU MOUNT_S --------------------------------------------------------------- ---------- ------------ ------- o/192.168.1.105;192.168.1.106/RECO_QR01_CD_10_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_06_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_00_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_05_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_08_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_03_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_09_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_04_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_01_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_07_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_02_qr01celadm01 0 CANDIDATE CLOSED o/192.168.1.105;192.168.1.106/RECO_QR01_CD_11_qr01celadm01 0 CANDIDATE CLOSED 12 rows selected.
So we have 12 shiny new disks to add to our diskgroup; let's add them:
Add the new Disks
SQL> alter diskgroup reco_qr01 add disk 2 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_00_qr01celadm01', 3 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_01_qr01celadm01', 4 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_02_qr01celadm01', 5 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_03_qr01celadm01', 6 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_04_qr01celadm01', 7 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_05_qr01celadm01', 8 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_06_qr01celadm01', 9 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_07_qr01celadm01', 10 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_08_qr01celadm01', 11 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_09_qr01celadm01', 12 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_10_qr01celadm01', 13 'o/192.168.1.105;192.168.1.106/RECO_QR01_CD_11_qr01celadm01' 14 drop disks in failgroup qr01celadm02 <- Drop the disks on the 2nd Cell Storage Server 15 rebalance power 1024; Diskgroup altered. SQL> --Of course there will be some rebalance, but that will pass SQL> select * from gv$asm_operation; *********************************** ---------- ------------ ----- --------- ---- ---------- ---------- ---------- EST_WORK EST_RATE EST_MINUTES ERROR_CODE ---------- ---------- ----------- -------------------------------------------- CON_ID ---------- 2 3 REBAL COMPACT WAIT 1024 0 --Some time later: SQL> select * from gv$asm_operation; no rows selected SQL>
Now we can see that we have one diskgroup with disks of two different sizes:
Check Diskgroups
SQL> select dg.name, count(*), d.total_mb, d.os_mb,
  2         min(d.free_mb) MIN_FREE_MB, max(d.free_mb) MAX_FREE_MB
  3  from v$asm_disk d, v$asm_diskgroup dg
  4  where dg.group_number=d.group_number and d.mount_status='CACHED'
  5  group by dg.name, d.total_mb, d.os_mb;

NAME              COUNT(*)   TOTAL_MB      OS_MB MIN_FREE_MB MAX_FREE_MB
--------------- ---------- ---------- ---------- ----------- -----------
DBFS_DG                 36        352        352           4          84
DATA_QR01               36        720        720         288         328
RECO_QR01               12        608        608         572         592   <- New one (12 which we moved)
RECO_QR01               12        608        928         576         592   <- Old one (24 - 12 (which we dropped in the previous step))

36 disks in total
Again, that is very bad practice, so we should continue. As the steps repeat, I will be brief with the explanations :) In a nutshell, we have to move the remaining disks from the 928 MB grid disks to 608 MB grid disks :)
Move the disks
--On the 2nd Cell Storage server CellCLI> list griddisk attributes name, size, ASMModeStatus RECO_QR01_CD_00_qr01celadm02 928M UNUSED RECO_QR01_CD_01_qr01celadm02 928M UNUSED RECO_QR01_CD_02_qr01celadm02 928M UNUSED RECO_QR01_CD_03_qr01celadm02 928M UNUSED RECO_QR01_CD_04_qr01celadm02 928M UNUSED RECO_QR01_CD_05_qr01celadm02 928M UNUSED RECO_QR01_CD_06_qr01celadm02 928M UNUSED RECO_QR01_CD_07_qr01celadm02 928M UNUSED RECO_QR01_CD_08_qr01celadm02 928M UNUSED RECO_QR01_CD_09_qr01celadm02 928M UNUSED RECO_QR01_CD_10_qr01celadm02 928M UNUSED RECO_QR01_CD_11_qr01celadm02 928M UNUSED <- We dropped them earlier --Drop the grid disks: CellCLI> drop griddisk all prefix=reco_qr01 GridDisk RECO_QR01_CD_00_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_01_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_02_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_03_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_04_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_05_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_06_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_07_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_08_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_09_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_10_qr01celadm02 successfully dropped GridDisk RECO_QR01_CD_11_qr01celadm02 successfully dropped CellCLI> --Create new Grid Disks with new Size: CellCLI> create griddisk all harddisk prefix=RECO_QR01, size=608M GridDisk RECO_QR01_CD_00_qr01celadm02 successfully created GridDisk RECO_QR01_CD_01_qr01celadm02 successfully created GridDisk RECO_QR01_CD_02_qr01celadm02 successfully created GridDisk RECO_QR01_CD_03_qr01celadm02 successfully created GridDisk RECO_QR01_CD_04_qr01celadm02 successfully created GridDisk RECO_QR01_CD_05_qr01celadm02 successfully created GridDisk RECO_QR01_CD_06_qr01celadm02 successfully created GridDisk RECO_QR01_CD_07_qr01celadm02 successfully created GridDisk RECO_QR01_CD_08_qr01celadm02 successfully created GridDisk RECO_QR01_CD_09_qr01celadm02 successfully created GridDisk RECO_QR01_CD_10_qr01celadm02 successfully created GridDisk RECO_QR01_CD_11_qr01celadm02 successfully created CellCLI> --On ASM, add the new disks to the "608" disk group: SQL> alter diskgroup reco_qr01 add disk 2 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_00_qr01celadm02', 3 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_01_qr01celadm02', 4 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_02_qr01celadm02', 5 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_03_qr01celadm02', 6 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_04_qr01celadm02', 7 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_05_qr01celadm02', 8 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_06_qr01celadm02', 9 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_07_qr01celadm02', 10 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_08_qr01celadm02', 11 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_09_qr01celadm02', 12 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_10_qr01celadm02', 13 'o/192.168.1.107;192.168.1.108/RECO_QR01_CD_11_qr01celadm02' 14 drop disks in failgroup qr01celadm03 <- Drop the disks on the 3rd Storage Cell Server 15 rebalance power 1024; Diskgroup altered. 
SQL> --On the 3rd Storage Cell Server CellCLI> list griddisk attributes name, size, ASMModeStatus RECO_QR01_CD_00_qr01celadm03 928M UNUSED RECO_QR01_CD_01_qr01celadm03 928M UNUSED RECO_QR01_CD_02_qr01celadm03 928M UNUSED RECO_QR01_CD_03_qr01celadm03 928M UNUSED RECO_QR01_CD_04_qr01celadm03 928M UNUSED RECO_QR01_CD_05_qr01celadm03 928M UNUSED RECO_QR01_CD_06_qr01celadm03 928M UNUSED RECO_QR01_CD_07_qr01celadm03 928M UNUSED RECO_QR01_CD_08_qr01celadm03 928M UNUSED RECO_QR01_CD_09_qr01celadm03 928M UNUSED RECO_QR01_CD_10_qr01celadm03 928M UNUSED RECO_QR01_CD_11_qr01celadm03 928M UNUSED --Drop the grid disks: CellCLI> drop griddisk all prefix=reco_qr01 GridDisk RECO_QR01_CD_00_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_01_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_02_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_03_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_04_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_05_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_06_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_07_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_08_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_09_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_10_qr01celadm03 successfully dropped GridDisk RECO_QR01_CD_11_qr01celadm03 successfully dropped CellCLI> --Create new Disks with the correct size (608) CellCLI> create griddisk all harddisk prefix=RECO_QR01, size=608M GridDisk RECO_QR01_CD_00_qr01celadm03 successfully created GridDisk RECO_QR01_CD_01_qr01celadm03 successfully created GridDisk RECO_QR01_CD_02_qr01celadm03 successfully created GridDisk RECO_QR01_CD_03_qr01celadm03 successfully created GridDisk RECO_QR01_CD_04_qr01celadm03 successfully created GridDisk RECO_QR01_CD_05_qr01celadm03 successfully created GridDisk RECO_QR01_CD_06_qr01celadm03 successfully created GridDisk RECO_QR01_CD_07_qr01celadm03 successfully created GridDisk RECO_QR01_CD_08_qr01celadm03 successfully created GridDisk RECO_QR01_CD_09_qr01celadm03 successfully created GridDisk RECO_QR01_CD_10_qr01celadm03 successfully created GridDisk RECO_QR01_CD_11_qr01celadm03 successfully created CellCLI> --On ASM, add the new disks (finally) to the disk group: SQL> alter diskgroup reco_qr01 add disk 2 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_00_qr01celadm03', 3 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_01_qr01celadm03', 4 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_02_qr01celadm03', 5 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_03_qr01celadm03', 6 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_04_qr01celadm03', 7 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_05_qr01celadm03', 8 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_06_qr01celadm03', 9 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_07_qr01celadm03', 10 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_08_qr01celadm03', 11 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_09_qr01celadm03', 12 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_10_qr01celadm03', 13 'o/192.168.1.109;192.168.1.110/RECO_QR01_CD_11_qr01celadm03' 14 rebalance power 1024; Diskgroup altered. SQL>
Finally we can check the size of the disk groups again:
Check the results
SQL> select dg.name, count(*), d.total_mb, d.os_mb,
  2         min(d.free_mb) MIN_FREE_MB, max(d.free_mb) MAX_FREE_MB
  3  from v$asm_disk d, v$asm_diskgroup dg
  4  where dg.group_number=d.group_number and d.mount_status='CACHED'
  5  group by dg.name, d.total_mb, d.os_mb;

NAME              COUNT(*)   TOTAL_MB      OS_MB MIN_FREE_MB MAX_FREE_MB
--------------- ---------- ---------- ---------- ----------- -----------
DBFS_DG                 36        352        352           4          84
DATA_QR01               36        720        720         288         328
RECO_QR01               36        608        608         572         592   <- New one