Show Tech:
BMC – Bios Info
BIOS Complete – smbios is sent
- blade memory, boot order, cpu, memory
Collect show tech:
UCS1-FI-A(local-mgmt)# show tech-support ucsm detail
UCS1-FI-A(local-mgmt)# show tech chassis 1 ALL DETAIL
copy workspace:///techsupport/20091105202812_UCS1-FI_BC001_all.tar scp:
IBMC3:
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IBMC3/tmp}1071)ls
IBMC3_TechSupport_bios_post_results.txt
IBMC3_TechSupport_smbios.bin
IBMC3_TechSupport.txt
BIOS Complete Status:
1
A value of 1 means the BIOS has completed.
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IBMC3/tmp}1078)more IBMC3_TechSupport.txt
BMC info:
ver: 1.0(1e)
processes that are running
smbios.bin: check it with a decode utility such as dmidecode (it is on the switch, on Linux)
tar ball decode:
OBFL:
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IBMC3/obfl}1079)ls
obfl-log obfl-log.1 obfl-log.2 obfl-log.3 obfl-log.4 obfl-log.5
5:2009 Nov 4 22:48:15:BMC:mctool@127.6.254.1:1616: mcserver_ipmi_extensions.c:299:[mcserver_set_bios_boot_order]Setting BIOS Boot Order
5:2009 Nov 4 22:48:15:BMC:mctool@127.6.254.1:1616: mcserver_ipmi_extensions.c:545:[mcserver_set_uuid]Setting Soft UUID: e7 a6 ce d6 7d 2e 11 de ae 4b 00 0b ab 01 c0 fb
5:2009 Nov 4 22:48:15:BMC:mctool@127.6.254.1:1616: mcserver_ipmi_extensions.c:212:[mcserver_set_vdd_power]"Power Off"
:2009 Nov 5 16:20:58:BMC:IPMI:525: mcddI2CDrv.c:836:PI2CWriteRead: ioctl to driver failed to read Bus[f8].Dev[30]! ErrorStatus[fd]
0:2009 Nov 5 16:20:58:BMC:IPMI:525: mcddI2CDrv.c:836:PI2CWriteRead: ioctl to driver failed to read Bus[f8].Dev[32]! ErrorStatus[fd]
Who rebooted?
IPMI command
Auto booted – Power Off / Power On
/var/log/messages
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/var/log}1097)ls – messages are gone if blade reboots:
avct_server critical messages.1
avct_server.1 critical.1 virtual_media
avct_server.first messages x-remserial.log
tail x-remserial.log
Wed Nov 4 23:11:47 2009: 53de: Serial Read: 0x8
Wed Nov 4 23:11:47 2009: 53de: Serial Read: 0x10
Wed Nov 4 23:11:47 2009: 53de: Serial Read: 0x8
Wed Nov 4 23:11:47 2009: 53de: Serial Read: 0x8
Wed Nov 4 23:11:47 2009: 53de: Serial Read: 0x8
Wed Nov 4 23:11:47 2009: 53de: Serial Read: 0x5
Wed Nov 4 23:11:47 2009: 53de: Net Send: 0xcd bytes
Wed Nov 4 23:12:03 2009: 53de: x-remserial is closing.
Wed Nov 4 23:41:57 2009: 5a83: x-remserial started.
Thu Nov 5 00:24:47 2009: 5a83: x-remserial is closing.
tail messages
6:2009 Nov 5 20:22:37:BMC:mctool@127.5.254.1:27309: mcserver_ipmi_extensions.c:596:[mcserver_get_smbios_table]Getting SMBIOS
6:2009 Nov 5 20:22:37:BMC:mctool@127.5.254.1:27309: mcserver_ipmi_extensions.c:647:SMBIOS Table version match. No need to send SMBIOS table back.
6:2009 Nov 5 20:24:37:BMC:mctool@127.5.254.1:27309: mcserver_ipmi_extensions.c:596:[mcserver_get_smbios_table]Getting SMBIOS
6:2009 Nov 5 20:24:37:BMC:mctool@127.5.254.1:27309: mcserver_ipmi_extensions.c:647:SMBIOS Table version match. No need to send SMBIOS table back.
6:2009 Nov 5 20:26:37:BMC:mctool@127.5.254.1:27309: mcserver_ipmi_extensions.c:596:[mcserver_get_smbios_table]Getting SMBIOS
6:2009 Nov 5 20:26:37:BMC:mctool@127.5.254.1:27309: mcserver_ipmi_extensions.c:647:SMBIOS Table version match. No need to send SMBIOS table back.
6:2009 Nov 5 20:28:23:BMC:mctool@127.5.254.1:19798: mcserver_net.c:110:New Connection from [127.5.254.1]
6:2009 Nov 5 20:28:24:BMC:mctool@127.5.254.1:19798: mcserver_show_techsupport.c:48:Received a show techsupport detailed cmd
6:2009 Nov 5 20:28:37:BMC:mctool@127.5.254.1:27309: mcserver_ipmi_extensions.c:596:[mcserver_get_smbios_table]Getting SMBIOS
6:2009 Nov 5 20:28:37:BMC:mctool@127.5.254.1:27309: mcserver_ipmi_extensions.c:647:SMBIOS Table version match. No need to send SMBIOS table back.
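The BMC messages above share one line layout, so they can be split into fields before filtering out the repetitive SMBIOS polling. The following is a minimal sketch, assuming the layout seen in these samples (a leading digit, timestamp, `BMC`, process, pid, optional C source file and line, then the message). The field names and the function `parse_bmc_line` are made up for illustration; the meaning of the leading digit is not stated in these notes.

```python
import re

# Layout inferred from the sample lines in these notes only.
BMC_LINE = re.compile(
    r'^(?P<sev>\d):(?P<ts>\d{4} \w{3} +\d+ \d\d:\d\d:\d\d):BMC:'
    r'(?P<proc>[^:]+):(?P<pid>\d+): ?'
    r'(?:(?P<src>[\w.]+\.c):(?P<line>\d+):)?(?P<msg>.*)$'
)

def parse_bmc_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = BMC_LINE.match(line.strip())
    return m.groupdict() if m else None

SAMPLE = ("6:2009 Nov 5 20:28:24:BMC:mctool@127.5.254.1:19798: "
          "mcserver_show_techsupport.c:48:Received a show techsupport detailed cmd")

if __name__ == "__main__":
    print(parse_bmc_line(SAMPLE))
```

Lines without a source file, such as the avct_server entries, still parse; `src` and `line` are then `None`.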
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/var/log}1105)cd virtual_media/
Monitor
more avct_server
6:2009 Nov 5 17:51:33:BMC:avct_server:593: Video Signal detected: 720 x 400, color depth is: 4 bits
6:2009 Nov 5 17:51:33:BMC:avct_server:593: VGA Text mode
6:2009 Nov 5 17:51:33:BMC:avct_server:593: AvspResolutionChange: 720 x 400
6:2009 Nov 5 17:51:33:BMC:avct_server:593: avct_setVGAPaletteMsg reports new palette with 16 colors
6:2009 Nov 5 17:51:35:BMC:avct_server:593: avct_setVGAPaletteMsg reports new palette with 16 colors
6:2009 Nov 5 18:01:47:BMC:avct_server:593: avct_setVGAPaletteMsg reports new palette with 16 colors
CMC Tech Support
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1}1125)ls
cmc ipc_cm_redwood_events proc_slabinfo.out
cores ipc_fru ps_det.out
df.out ipc_ohms ps.out
dmclient_test.frus ipc_sdr psreadings.out
dmclient_test.out ipc_seeprom redwood
fancontrol.out ipc_sel release_info
free.out ipc_sensor root_cmd_history
fru ipc_thermal sdr
fsl-i2c.1-counters.out ipc_updated sel
fsl-i2c.2-counters.out lsof.out sensor
ifconfig_a.out netstat.out top.out
ipc_chassis_info proc_cpuinfo.out uname.out
ipc_cm_local_cluster proc_interrupts.out uptime.out
ipc_cm_redwood proc_meminfo.out
S1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1/cmc}1129)ls
cms.info.atu cms.info.info cms.info.vtu
cms.info.global cms.info.port log
- added new blade/power changes
- why the fan changes – Thermal logs
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1/cmc/log}1131)ls
chassis_info dmserver.5.gz pmon
chassis-status.xml iom_serial_num pmon_pstate
cluster_manager ipmiserver pwrmgr
cluster_manager.log krphy.log sam_connected
cmc_manager mctools_params thermal
cmc_manager.log messages thermal.1.gz
cmc_ohms_errors messages.1.gz thermal.2.gz
cmc_ohms_status obfl-cmc.log thermal.3.gz
cmc_post_status obfl-swupdate.log thermal.4.gz
critical ohms.log thermal.5.gz
dmserver peer_cmc_ohms_status thermal.log
dmserver.1.gz peer_cmc_post_status uboot_console.log
dmserver.2.gz peer_uboot_post_status uboot_post_status
dmserver.3.gz platform_ohms
dmserver.4.gz platform_post
Transport to BMC and adapter:
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1/redwood}1135)ls
logs show_elog show_oper
show_ctx show_ilog show_post
show_debug_satctrl show_ints show_sts
rw> Redwood POST Results:
legend:
'.' PASSED
'X' FAILED
' ' Not Run
+-------------------+-+----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |A| | | | | | | | | | | | | | | |
| |S|ASIC| | |H|H|H|H|H|H|H|H|N|N|N|N|
| |I|LVL |C|B|I|I|I|I|I|I|I|I|I|I|I|I|
| POST Test |C|RSLT|I|I|0|1|2|3|4|5|6|7|0|1|2|3|
+-------------------+-+----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0. Register Test |0| . | | | | | | | | | | | | | | |
| 1. MBIST |0| . | | | | | | | | | | | | | | |
| 2. CI Loopback |0| . | | | | | | | | | | | | | | |
| 3. Serdes |0| | | |.|.|.|.|.|.|.|.|.|.|.|.|
| 4. PHY BIST |0| | | |.|.|.|.|.|.|.|.|.|.|.|.|
| 5. PRBS |0| | | |.|.|.|.|.|.|.|.|.|.|.|.|
| 6. PCS Loopback |0| | | |.|.|.|.|.|.|.|.|.|.|.|.|
| 7. IIF PRBS |0| | | | | | | | | | | | | | | |
| 8. Runtime Failure|0| | | | | | | | | | | | | | | |
+-------------------+-+----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
rw>
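Since the legend marks a failed test with 'X', a saved copy of the POST table can be scanned for failures instead of read by eye. This is a small sketch, assuming rows keep the `| N. Test name |...|` shape shown above; the function name `failed_post_tests` is hypothetical.

```python
import re

# A data row looks like: | 3. Serdes |0| | | |.|.|.|
ROW = re.compile(r'^\|\s*(\d+)\.\s*([^|]+?)\s*\|(.*)\|\s*$')

def failed_post_tests(table_text):
    """Return (test number, test name) for every row with an 'X' cell."""
    failures = []
    for line in table_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # borders, legend and header rows
        cells = m.group(3).split('|')
        if any(cell.strip() == 'X' for cell in cells):
            failures.append((int(m.group(1)), m.group(2)))
    return failures
```

For the table captured here the result would be an empty list, since every executed test shows '.'.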
rw> CI /BI – CMC facing ports
ASIC 0:
+--+----+-+----+-----+-------------------------+-+
| | | | |MAC | PHY | |
|P | N |A| |-+-+-+----+-+-+-+--------+-----+ +
|o | a |d| | | |A| |X| | | | | |
|r | m |m| |L|R|L| |G|P|P| | |S|
|t | e |i|Oper|C|M|G|MDIO|X|C|M| | |F|
| | |n| St |L|T|N|adr |S|S|D| u-code | Ver |P|
+--+----+-+----+-+-+-+----+-+-+-+--------+-----+-+
| 0| CI |E| Up | | | | 0 |0|0|0| n/a | 0.00| |
| 1| BI |E| Up | | | | 0 |0|0|0| n/a | 0.00| |
| 2| HI0|-| Dn | | |1| 18 |0|0|0| Ok | 1.09| |
| 3| HI1|-| Dn | | |1| 19 |0|0|0| Ok | 1.09| |
| 4| HI2|E| Up | | | | 16 |1|1|1| Ok | 1.09| | -- blades 5, 6 are up
| 5| HI3|E| Up | | | | 17 |1|1|1| Ok | 1.09| |
| 6| HI4|E| Up | | | | 14 |1|1|1| Ok | 1.09| |
| 7| HI5|E| Up | | | | 15 |1|1|1| Ok | 1.09| | -- blades 3, 4
| 8| HI6|-| Dn | | |1| 12 |0|0|0| Ok | 1.09| |
| 9| HI7|-| Dn | | |1| 13 |0|0|0| Ok | 1.09| | -- blade 0
|10| NI0|E| Dn |1| | | 23 |1|0|0| Ok | 1.39|*|
|11| NI1|E| Dn |1| | | 22 |1|0|0| Ok | 1.39|*|
|12| NI2|E| Dn |1| | | 21 |1|0|0| Ok | 1.39|*|
|13| NI3|E| Up | | | | 20 |1|1|1| Ok | 1.39|*|
+--+----+-+----+-+-+-+----+-+-+-+--------+-----+-+
+---+----+----+----+
SFP: |[$]| [$]| [$]| [$]|
+---+----+----+----+
: : : |
+-+----+----+----+-+
| 0 1 2 3 |
| I I I I |
| N N N N |
| |
| ASIC 0 |
| |
| H H H H H H H H |
| I I I I I I I I |
| 0 1 2 3 4 5 6 7 |
+-+-+-+-+-+-+-+-+--+
- - | | | | - -
+-+-+-+-+-+-+-+-+
|-|-|v|v|v|v|-|-|
+-+-+-+-+-+-+-+-+
Blade: 8 7 6 5 4 3 2 1
HI0 – blade 8
HI1 – blade 7
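The diagram and the two mappings above imply that the HI ports are numbered in the opposite direction to the blade slots. A minimal sketch of that conversion, assuming an 8-slot chassis as drawn (the helper name `hi_to_blade` is made up):

```python
def hi_to_blade(hi_port, slots=8):
    """Map a Redwood host interface index (HI0..HI7) to a blade slot (8..1)."""
    if not 0 <= hi_port < slots:
        raise ValueError("HI port out of range: %r" % hi_port)
    return slots - hi_port

if __name__ == "__main__":
    for hi in range(8):
        print("HI%d -> blade %d" % (hi, hi_to_blade(hi)))
```

With HI2 through HI5 up, this gives blades 6, 5, 4 and 3, which matches the blades marked present in the diagram.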
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1/redwood}1142)more show_elog
rw> 380422.431 - 0-HI4: Admin state changed to Dsbl
380422.458 - 0-HI4: Oper state changed to Dn
380425.282 - 0-HI4: Admin state changed to Enbl
380425.339 - 0-HI4: Admin state changed to Dsbl
380428.257 - 0-HI4: Admin state changed to Enbl
380429.379 - 0-HI4: Oper state changed to Up
380438.419 - 0-HI4: Oper state changed to Dn
380438.458 - 0-HI4: Admin state changed to Dsbl
380441.193 - 0-HI4: Admin state changed to Enbl
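A port that keeps flapping shows up in show_elog as repeated Oper transitions to Dn. The sketch below counts those per port. It assumes only the line shape seen above (`<time> - <port>: Oper state changed to Dn`); the unit of the leading timestamp is not stated in these notes, so it is ignored. `count_oper_down` is an illustrative name.

```python
import re
from collections import Counter

EVENT = re.compile(r'(\d+\.\d+) - (\S+): (Admin|Oper) state changed to (\w+)')

def count_oper_down(elog_text):
    """Count 'Oper state changed to Dn' events per port."""
    downs = Counter()
    for _ts, port, kind, state in EVENT.findall(elog_text):
        if kind == "Oper" and state == "Dn":
            downs[port] += 1
    return dict(downs)

SAMPLE_ELOG = """\
rw> 380422.431 - 0-HI4: Admin state changed to Dsbl
380422.458 - 0-HI4: Oper state changed to Dn
380425.282 - 0-HI4: Admin state changed to Enbl
380429.379 - 0-HI4: Oper state changed to Up
380438.419 - 0-HI4: Oper state changed to Dn
"""

if __name__ == "__main__":
    print(count_oper_down(SAMPLE_ELOG))
```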
Detailed States
rw> ASIC: 0:
+-------+--------------------------+--------------+-----------+-----------+-----------+
| ASIC  | Interrupt Bit Field      | Count1       | Thresh1   | Count2    | Thresh2   |
| Port  |                          |              |           |           |           |
+-------+--------------------------+--------------+-----------+-----------+-----------+
| 0-HI2 | not_synced_lane_3        | 9            | 0         | 0         | 1         |
| 0-HI2 | not_synced_lane_2        | 26           | 0         | 0         | 1         |
| 0-HI2 | not_synced_lane_1        | 15           | 0         | 0         | 1         |
| 0-HI2 | not_synced_lane_0        | 11           | 0         | 0         | 1         |
| 0-HI2 | synced_lane_3            | 28           | 0         | 0         | 1         |
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1/cmc/log}1153)ls
chassis_info dmserver.5.gz pmon
chassis-status.xml iom_serial_num pmon_pstate
cluster_manager ipmiserver pwrmgr
cluster_manager.log krphy.log sam_connected
cmc_manager mctools_params thermal
cmc_manager.log messages thermal.1.gz
cmc_ohms_errors messages.1.gz thermal.2.gz
cmc_ohms_status obfl-cmc.log thermal.3.gz
cmc_post_status obfl-swupdate.log thermal.4.gz
critical ohms.log thermal.5.gz
dmserver peer_cmc_ohms_status thermal.log
dmserver.1.gz peer_cmc_post_status uboot_console.log
dmserver.2.gz peer_uboot_post_status uboot_post_status
dmserver.3.gz platform_ohms
dmserver.4.gz platform_post
Each process has its own status file; you need to look at the specific one.
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1/cmc/log}1154)more ohms.log
Logfile: /var/cmc/log/ohms.log
EEPROM Log:
cmc slot number 0
signature : 0xa5a50000
version : 1
uboot post status : 0x20000000
cmc post status : 0x01000000
cmc ohms status : 0x00000040
chassis ok led status : ON
chassis fault led status : OFF
cpu single ecc errors : 0
sysinfo ok : 1
uptime : 6 days, 22:36
total memory : 256632 kB
free memory : 148904 kB
process count : 76
load average : 1.89, 1.52, 1.42
peer signature : 0xb5b50001
peer version : 1
peer uboot post status seeprom : 0x00000000
peer uboot post status : 0x00000000
peer cmc post status seeprom : 0x00000000
peer cmc post status : 0x00000000
peer cmc ohms status : 0x00000000
peer chassis ok led status : ON
peer chassis fault led status : OFF
peer cpu single ecc errors : 0
peer sysinfo ok : 1
peer uptime : 9 days, 2:04
peer total memory : 256632 kB
peer free memory : 155048 kB
peer process count : 76
peer load average : 2.03, 1.37, 1.27
Chassis OHMS status cmc0 cmc1
master : - 1
cmc ohms status : 0x00000040 0x00000000
cpu error : 0 0
memory error : 0 0
memory controller error : 0 0
selected image error : 0 0
alternate image error : 0 0
i2c bus 0 error : 0 0
i2c bus 1 error : 1 0
i2c bus 1 master error : 0 0
cpu mdio bus error : 0 0
cpu interrupt error : 0 0
cpu kernel crash : 0 0
user process restart : 0 0
cpu low memory : 0 0
obfl error : 0 0
serial link error : 0 0
cpu tsec1 eth error : 0 0
cpu tsec2 eth error : 0 0
inlet 1 temp sensor error : 0 0
inlet 2 temp sensor error : 0 0
redwood temp sensor error : 0 0
pca9539 hub error : 0 0
iom fru error : 0 0
chassis fru error : 0 0
chassis seeprom error : 0 0
fan error : 0 0
minor thermal error : 0 0
minor power error : 0 0
cms error : 0 0
bmc error : 0 0
sam error : 0 0
major thermal error : 0 0
major power error : 0 0
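In this capture the status word 0x00000040 has only bit 6 set, and the only non-zero row in the table is the seventh one, "i2c bus 1 error". The table also has exactly 32 rows. That suggests, but does not prove, that the rows are listed in bit order. The sketch below decodes a status word under that assumption; treat the bit layout as an inference from this single sample, not a documented format.

```python
# Row names copied from the OHMS table above, in the order shown.
# ASSUMPTION: row index == bit position in "cmc ohms status".
OHMS_ROWS = [
    "cpu error", "memory error", "memory controller error",
    "selected image error", "alternate image error",
    "i2c bus 0 error", "i2c bus 1 error", "i2c bus 1 master error",
    "cpu mdio bus error", "cpu interrupt error", "cpu kernel crash",
    "user process restart", "cpu low memory", "obfl error",
    "serial link error", "cpu tsec1 eth error", "cpu tsec2 eth error",
    "inlet 1 temp sensor error", "inlet 2 temp sensor error",
    "redwood temp sensor error", "pca9539 hub error", "iom fru error",
    "chassis fru error", "chassis seeprom error", "fan error",
    "minor thermal error", "minor power error", "cms error",
    "bmc error", "sam error", "major thermal error", "major power error",
]

def decode_ohms(status):
    """List the row names whose (assumed) bit is set in the status word."""
    return [name for bit, name in enumerate(OHMS_ROWS) if status & (1 << bit)]

if __name__ == "__main__":
    print(decode_ohms(0x00000040))
```

If a future capture shows a status word that disagrees with the per-row values, the assumed ordering is wrong and the per-row table should be trusted instead.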
Error Counts:
Error Count Description
274 2 error reading fan device id
275 2 error reading fan speed
Peer Error Counts:
Error Count Description
I2C device driver statistics:
Name       Addr    Success  IO error  NotFound  Timeout  Busy  Interrupt  Refused  TooLarge
iom-fru    0-0050  c        0         0         0        0     0          0        0
c.fru      1-0012  2        0         0         0        0     0          0        0
c.seeprom  1-0014  2607     0         0         0        0     0          0        0
c.ms       1-0013  0        0
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1/cmc/log}1158)more thermal
2009 Nov 5 20:20:01 UCS1-FI BC01_IOM01_thermal-3-QCI1326002U 0:dm_get_fan_dutycycle:Error send/recv req, error: Operation not permitted
2009 Nov 5 20:20:01 UCS1-FI BC01_IOM01_thermal-3-QCI1326002U 0:dm_get_fan_dutycycle:Error send/recv req, error: Operation not permitted
2009 Nov 5 20:20:01 UCS1-FI BC01_IOM01_thermal-5-QCI1326002U 0:saratoga_fan_policy:F5 no tach reading from ipmi, trying dmserver
2009 Nov 5 20:20:01 UCS1-FI BC01_IOM01_thermal-3-QCI1
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1}1162)ls
cmc ipc_cm_redwood_events proc_slabinfo.out
cores ipc_fru ps_det.out
df.out ipc_ohms ps.out
dmclient_test.frus ipc_sdr psreadings.out
dmclient_test.out ipc_seeprom redwood
fancontrol.out ipc_sel release_info
free.out ipc_sensor root_cmd_history
fru ipc_thermal sdr
fsl-i2c.1-counters.out ipc_updated sel
fsl-i2c.2-counters.out lsof.out sensor
ifconfig_a.out netstat.out top.out
ipc_chassis_info proc_cpuinfo.out uname.out
ipc_cm_local_cluster proc_interrupts.out uptime.out
ipc_cm_redwood proc_meminfo.out
Internal Chassis Ethernet status
more ifconfig_a.out
Summary system:
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1}1167)more dmclient_test.out
Last scan time : 599803
Chassis-id : 1
Fabric-id : 2
Cluster-id : 335c22fe-b901-11de-9f0d-000decd53a04
Slot id : 0
Amber LED status : ON
Green LED status : OFF
Chassis ok LED status : ON
Chassis fault LED status : ON
Locate LED status : OFF
Locate buttion status : 0
Backplane status : 1
Blades present : 2 3 4 5
Blades powered on : 0 1 2 3 4 5 6 7
Blades alerted : 2 3 4 5
Fans present : 0 1 2 3 4 5 6 7
Fans alerted : 0 1 2 3 4 5 6 7
PSs present : 0 2
PSs RMT on : 0 1 2 3
PS DC ok : 0 2
PS AC ok : 0 2
SFPs present :
Redwood pll status : 0 1 2
Redwood status : 0
IBMC Link status : 2 3 4 5
uBoot post status : 0x20000000
Peer uBoot post status : 0x00000000
CMC post status : 0x01000000
Peer CMC post status : 0x00000000
Health status : 0x00000040
Peer Health status : 0x00000000
Number of ethernet addresses : 10
Start IOM ethernet address : 00:26:51:08:37:be
Eth0 MAC: : 00:26:51:08:37:be
Eth1 MAC : 00:26:51:08:37:bf
HIF state : 2 3 4 5 ------ blade 3,4,5,6 are up..
dmserver init status : 1
gilroy status (is hung) : 0
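A quick sanity check on dmclient_test.out is that every blade listed as present also has an IBMC link and a HIF state. A sketch of that comparison, assuming the `name : values` layout shown above and the 0-based slot indices the file appears to use (the function names are illustrative):

```python
def parse_index_sets(text, keys):
    """Collect the space-separated integer lists for the given field names."""
    found = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep and name.strip() in keys:
            found[name.strip()] = {int(t) for t in value.split() if t.isdigit()}
    return found

def link_mismatches(text):
    """Report present blades that lack an IBMC link or a HIF state."""
    keys = ("Blades present", "IBMC Link status", "HIF state")
    sets = parse_index_sets(text, keys)
    present = sets.get("Blades present", set())
    return {
        "no IBMC link": sorted(present - sets.get("IBMC Link status", set())),
        "no HIF": sorted(present - sets.get("HIF state", set())),
    }

SAMPLE_DM = """\
Blades present : 2 3 4 5
IBMC Link status : 2 3 4 5
HIF state : 2 3 4 5
"""

if __name__ == "__main__":
    print(link_mismatches(SAMPLE_DM))
```

For the capture above both lists come back empty, meaning all four present blades are linked.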
Current Fan status:
S1-FI_BC001_all/IOCard1/techsupport_detailed_iocard1}1168) more fancontrol.out
Fan 0 dutycycle: 30
Fan 1 dutycycle: 30
Fan 2 dutycycle: 30
Fan 3 dutycycle: 30
Fan 6 dutycycle: 30
Fan 7 dutycycle: 30
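The listing above has no entries for fans 4 and 5, although dmclient_test.out reports fans 0 through 7 as present. A small sketch that finds such gaps, assuming 8 fans and the `Fan N dutycycle: V` line shape (the function name is hypothetical). A gap may point at a fan problem or may simply mean the notes were trimmed, so treat the result as a hint.

```python
import re

def fans_without_dutycycle(fancontrol_text, fan_count=8):
    """Return fan indices that have no 'Fan N dutycycle:' line."""
    reported = {
        int(n)
        for n in re.findall(r'^Fan (\d+) dutycycle:', fancontrol_text, re.M)
    }
    return sorted(set(range(fan_count)) - reported)

SAMPLE_FANS = """\
Fan 0 dutycycle: 30
Fan 1 dutycycle: 30
Fan 2 dutycycle: 30
Fan 3 dutycycle: 30
Fan 6 dutycycle: 30
Fan 7 dutycycle: 30
"""

if __name__ == "__main__":
    print(fans_without_dutycycle(SAMPLE_FANS))
```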
MEZZ Card status:
- Menlo: limited space.
- Palo card: sends a lot more data.
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105202812_UCS1-FI_BC001_all}1176)ls
IBMC3 MEZZ31_TechSupport.txt
IBMC3_TechSupport.tar MEZZ31_TechSupport.txt.done
IBMC3_TechSupport.tar.gz.done MEZZ41_TechSupport.txt
IBMC4_TechSupport.tar.gz MEZZ41_TechSupport.txt.done
IBMC4_TechSupport.tar.gz.done MEZZ51_TechSupport.txt
IBMC5_TechSupport.tar.gz MEZZ51_TechSupport.txt.done
IBMC5_TechSupport.tar.gz.done MEZZ61_TechSupport.txt
IBMC6_TechSupport.tar.gz MEZZ61_TechSupport.txt.done
IBMC6_TechSupport.tar.gz.done mnt
IOCard1 obfl
IOCard1_TechSupport.tar tmp
IOCard1_TechSupport.tar.gz.done usr
IOCard2_TechSupport.tar.gz var
IOCard2_TechSupport.tar.gz.done
S1-FI_BC001_all}1177)more MEZZ31_TechSupport.txt
Current Firmware Image Verion 1.0(1e)
Backup Firmware Image (GOOD) Version 1.0(1e)
Menlo Configuration:
Uplink Port 0 Mac Address: 00:26:51:08:83:d6
Uplink Port 1 Mac Address: 00:26:51:08:83:d7
Link States:
Port Physical Logical
uif0 1 1
uif1 1 1
eth0 1 1
eth1 1 1
fc0 2 2
fc1 2 2
VIF Information:
Log VIC
[0000] 00000:01:17:09:810 vif[2]: veth785: s:STANDBY_UP_ACTIVATE(e:ACTIVATE_SUCCESS)->s:A
[0001] 00000:01:17:09:810 active_vif_up: port 0 veth785 primary 1
[0002] 00000:01:17:09:810 vic_set_ratelimit: samindex 785 pif 0 rate 0xffffffff burst 0xf
[0003] 00000:01:17:09:810 failover-rec-complete: port 0 uif 0 p veth785 s veth786
[0004] 00000:01:17:04:000 vif[7]: vfc761: s:INIT(e:CREATE)->s:CREATE
[0005] 00000:01:17:04:000 create: port 0 vfc761 primary 1
[0006] 00000:01:17:04:000 vif[2]: veth785: s:CREATE(e:ENABLE_STANDBY)->s:ENABLE_STANDBY
[0105] 00000:01:17:03:500 Egress FLOGI: port 1, wwpn 2000000000010130, oxid 51e
[0106] 00000:01:17:03:550 FLOGI LS_ACC: port 1, ox_id 0x51e, d_id 0xef0013
The Palo card has more data, as it has Linux running on it.
Config, debugdump and obfl/var
Config – Fruid/vnic.cfg
SAM tech support: show tech-support ucsm detail
It comes with the show tech-support of the N5K.
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM}1203)
df_a.out sam_process_state
dhcpd.leases sam_techsupportinfo
dmesg.out sw_techsupportinfo
ifconfig_a.out UCSM_A_TechSupport.tar
iptables.out UCSM_A_TechSupport.tar.gz.done
isan UCSM_B_TechSupport.tar.gz
ls_l.out UCSM_B_TechSupport.tar.gz.done
opt var
sam_cluster_state
- Switch tech support – see if we can talk to the NIC. Is the vNIC up?
(sw-tech-support)
SAM tech support – config of UCS
var/sysmgr/sam_logs
Need to look at the primary's log for connection info; the managing instance is A or B.
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM/var/sysmgr/sam_logs}1212)ls
auto_update.log svc_sam_controller.log
error_log.1257380923 svc_sam_dcosAG.log
httpd.log svc_sam_dme.log
load_warnings.log svc_sam_extvmmAG.log
svc_sam_bladeAG.log svc_sam_hostagentAG.log
svc_sam_bladeAG.log.1.gz svc_sam_nicAG.log
svc_sam_bladeAG.log.2.gz svc_sam_nicAG.log.1.gz
svc_sam_bladeAG.log.3.gz svc_sam_nicAG.log.first.gz
svc_sam_bladeAG.log.4.gz svc_sam_pamProxy.log
svc_sam_bladeAG.log.first.gz svc_sam_portAG.log
svc_sam_cliD.log
dhcpd.leases – each blade's IP address for pnuos
iptables.out – KVM
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
57 5244 ACCEPT icmp -- * * 0.0.0.0/0 127.5.1.3
9541 551K ACCEPT tcp -- * * 0.0.0.0/0 127.5.1.3 tcp dpt:2068 << kvm
0 0 ACCEPT udp -- * * 0.0.0.0/0 127.5.1.3 udp dpt:623 - ipmi
54 5295 ACCEPT tcp -- * * 0.0.0.0/0 127.5.1.3 tcp dpt:22 -- ssh
57 5244 ACCEPT icmp -- * * 0.0.0.0/0 12
These three ports (2068 KVM, 623 IPMI, 22 SSH) should be up northbound.
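The check on the KVM, IPMI and SSH forwarding rules can be scripted against iptables.out. This is a sketch, assuming the rule layout shown above (ACCEPT, protocol, destination address, then `dpt:<port>`); `missing_rules` and `REQUIRED` are names invented here.

```python
import re

# Matches e.g.: 9541 551K ACCEPT tcp -- * * 0.0.0.0/0 127.5.1.3 tcp dpt:2068
RULE = re.compile(
    r'ACCEPT\s+(tcp|udp)\s.*?(\d+\.\d+\.\d+\.\d+)\s+(?:tcp|udp) dpt:(\d+)'
)
# Ports taken from the annotations in the notes: KVM, IPMI, SSH.
REQUIRED = {("tcp", 2068), ("udp", 623), ("tcp", 22)}

def missing_rules(iptables_text, blade_ip):
    """Return the required (proto, port) pairs with no ACCEPT rule for blade_ip."""
    found = set()
    for line in iptables_text.splitlines():
        m = RULE.search(line)
        if m and m.group(2) == blade_ip:
            found.add((m.group(1), int(m.group(3))))
    return REQUIRED - found

SAMPLE_RULES = """\
9541 551K ACCEPT tcp -- * * 0.0.0.0/0 127.5.1.3 tcp dpt:2068
0 0 ACCEPT udp -- * * 0.0.0.0/0 127.5.1.3 udp dpt:623
54 5295 ACCEPT tcp -- * * 0.0.0.0/0 127.5.1.3 tcp dpt:22
"""

if __name__ == "__main__":
    print(missing_rules(SAMPLE_RULES, "127.5.1.3"))
```

An empty set means all three rules exist for that address. The packet counters in the first column are not checked here.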
SAM Cluster state
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM}1224)more sam_cluster_state
Cluster Id: 0x335c22feb90111de-0x9f0d000decd53a04
Start time: Thu Nov 5 00:28:36 2009
Last election time: Thu Nov 5 00:28:43 2009
A: UP, PRIMARY
B: UP, SUBORDINATE
A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
heartbeat state PRIMARY_OK
INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP
HA READY
Chassis detailed state:
Chassis, serial: FOX1325G5F5, state: active
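To find out which fabric interconnect is the managing instance, sam_cluster_state can be reduced to the member roles and the HA READY flag. A sketch under the assumption that the file keeps the short `A: UP, PRIMARY` lines and a bare `HA READY` line as captured above (`summarize_cluster` is an illustrative name):

```python
import re

def summarize_cluster(text):
    """Return (members, primaries, ha_ready) from sam_cluster_state text."""
    members = {}
    # Only the short summary lines match; the long "memb state" lines do not.
    for fi, state, role in re.findall(r'^([AB]): (\w+), (\w+)\s*$', text, re.M):
        members[fi] = (state, role)
    primaries = [fi for fi, (_state, role) in members.items() if role == "PRIMARY"]
    ha_ready = any(line.strip() == "HA READY" for line in text.splitlines())
    return members, primaries, ha_ready

SAMPLE_CLUSTER = """\
A: UP, PRIMARY
B: UP, SUBORDINATE
A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
heartbeat state PRIMARY_OK
HA READY
"""

if __name__ == "__main__":
    print(summarize_cluster(SAMPLE_CLUSTER))
```

The primary reported here is the side whose sam_logs directory should be read first.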
Process state
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM}1225)more sam_process_state
SERVICE NAME STATE RETRY(MAX) EXITCODE SIGNAL CORE
------------ ----- ---------- -------- ------ ----
svc_sam_controller running 0(4) 0 0 no
svc_sam_dme running 0(4) 0 0 no
svc_sam_dcosAG running 0(4) 0 0 no
svc_sam_bladeAG running 0(4) 0 0 no
svc_sam_portAG running 0(4) 0 0 no
svc_sam_hostagentAG running 0(4) 0 0 no
svc_sam_nicAG running 0(4) 0 0 no
svc_sam_extvmmAG running 0(4) 0 0 no
httpd running 0(4) 0 0 no
svc_sam_cliD running 0(4) 0 0 no
svc_sam_pamProxy running 0(4) 0 0 no
sfcbd running 0(4) 0 0 no
dhcpd running 0(4) 0 0 no
sam_core_mon running 0(4) 0 0 no
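With more than a dozen services in sam_process_state, a filter that prints only the suspicious rows saves time. A minimal sketch, assuming the column order shown above; it flags a service that is not `running`, has a non-zero retry count, or has a core. The state `crashed` in the usage example is invented for the demonstration and does not come from these notes.

```python
import re

# name, state, retry(max), exitcode, signal, core
ROW = re.compile(r'^(\S+)\s+(\S+)\s+(\d+)\((\d+)\)\s+(\d+)\s+(\d+)\s+(\w+)\s*$')

def unhealthy_services(text):
    """Return names of services that look unhealthy in sam_process_state."""
    bad = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, separator or blank line
        name, state, retry, _max, _exitcode, _signal, core = m.groups()
        if state != "running" or int(retry) > 0 or core != "no":
            bad.append(name)
    return bad

if __name__ == "__main__":
    demo = (
        "SERVICE NAME STATE RETRY(MAX) EXITCODE SIGNAL CORE\n"
        "svc_sam_dme running 0(4) 0 0 no\n"
        "httpd crashed 2(4) 11 11 yes\n"  # hypothetical row
    )
    print(unhealthy_services(demo))
```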
httpd can't provide service if the cluster is not in a good state.
last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM/isan/apache/logs}1235)more httpd.pid
3108
SW_tech_support
SAM_techsupport
Server 1/3:
Equipped Product Name: Cisco B200-M1
Equipped PID: N20-B6620-1
Equipped Serial (SN): QCI133000NM
Slot Status: Equipped
Acknowledged Product Name: Cisco B200-M1
Acknowledged PID: N20-B6620-1
Acknowledged Serial (SN): QCI133000NM
Acknowledged Memory (MB): 49152
Acknowledged Effective Memory (MB): 49152
Acknowledged Cores: 8
Acknowledged Adapters: 1
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM}1243)more sw_techsupportinfo
`show tech-support details`
`show switchname`
UCS1-FI-A
Nexus 5k tech support details of 6120
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM/UCSM_A/var/sysmgr/sam_logs}1264)ls
auto_update.log svc_sam_bladeAG.log.2.gz svc_sam_dcosAG.log svc_sam_nicAG.log.first.gz
error_log.1257380923 svc_sam_bladeAG.log.3.gz svc_sam_dme.log svc_sam_pamProxy.log
httpd.log svc_sam_bladeAG.log.4.gz svc_sam_extvmmAG.log svc_sam_portAG.log
load_warnings.log svc_sam_bladeAG.log.first.gz svc_sam_hostagentAG.log
svc_sam_bladeAG.log svc_sam_cliD.log svc_sam_nicAG.log
svc_sam_bladeAG.log.1.gz svc_sam_controller.log svc_sam_nicAG.log.1.gz
- host agent logs - deployment activity of all the blades will be there.
[INFO][0xb6537bb0][Nov 5 19:41:17.063][mcAG:fetchSensorValue] [mc: 7F050103 19] 'POWER_ON_FAIL'
[INFO][0xb6537bb0][Nov 5 19:41:17.064][mcAG:fetchSensorValue] [mc: 7F050103 19] Got POWER_ON_FAIL sensorNumber : 95
[INFO][0xb6537bb0][Nov 5 19:41:17.065][mcAG:debugSensorBits] [mc: 7F050103 19] Sensor state:value 15:1 16:0
[INFO][0xb6537bb0][Nov 5 19:41:17.067][mcAG:fetchSensorValue] [mc: 7F050103 19] 'IOH_THERMTRIP_N'
[INFO][0xb6537bb0][Nov 5 19:41:17.067][mcAG:fetchSensorValue] [mc: 7F050103 19] Got IOH_THERMTRIP_N sensorNumber : 24
[INFO][0xb6537bb0][Nov 5 19:41:17.069][mcAG:debugSensorBits] [mc: 7F050103 19] Sensor state:value 66:1 67:0
[INFO][0xb6537bb0][Nov 5 19:41:17.071][mcAG:fetchSensorValue] [mc: 7F050103 19] 'CPUS_PRCHT_N'
[INFO][0xb6537bb0][Nov 5 19:41:17.071][mcAG:fetchSensorValue] [mc: 7F050103 19] Got CPUS_PRCHT_N sensorNumber : 31
[INFO][0xb5a37bb0][Nov 5 19:41:17.075][mcAG:readingCb] [mc: 7F050105 1B] readingCb: mcHandle = 0x1b, type = 3, len = 95
[INFO][0xb5a37bb0][Nov 5 19:41:17.075][mcAG:readingCb] [mc: 7F050105 1B] mc_reading_cb_t request from MC client, sending McEvent instance 0x8586618/231
[INFO][0xb5a37bb0][Nov 5 19:41:17.075][mcAG:getDeviceAddress] getDeviceAddress for McEvent instance 0x8586618 returning a device address instance (nil) for IP 7F050105
[INFO][0xb5a37bb0][Nov 5 19:41:17.075][mcAG:getDeviceAddress] getDeviceAddress for McEvent instance 0x8586618 returning a device address instance 0x8209b70 for IP 7F050105
What boot order did blade AG and DME send?
Keyword in the DME log: Critical (grep for CRIT).
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM/UCSM_A/var/sysmgr/sam_logs}1274)grep CRIT svc_sam_dme.log
jejoseph@last-call-5{/home/jejoseph/personal_html/SAN/UCS/tech/20091105201437_UCS1-FI_UCSM/UCSM_A/var/sysmgr/sam_logs}1275)
7F050103 --- IP address in hex (127.5.1.3)
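The `mc:` value in the host agent log lines is a 32-bit IPv4 address written as eight hex digits. A short sketch of the conversion using the standard `ipaddress` module (the helper name `mc_hex_to_ip` is made up):

```python
import ipaddress

def mc_hex_to_ip(hex_addr):
    """Convert an 8-digit hex string such as '7F050103' to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(hex_addr, 16)))

if __name__ == "__main__":
    for value in ("7F050103", "7F050105"):
        print(value, "->", mc_hex_to_ip(value))
```

7F050103 decodes to 127.5.1.3, the same address that appears as the destination in the iptables rules.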
Port ag manages only the switch:
grep Executing svc_sam_portAG.log|more – which commands are executed in the switch:
[INFO][0xb68b0080][Nov 5 00:28:41.971][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68b0080][Nov 5 00:28:41.973][app_sam_portAG:SC_Exec] Executing command: interface ethernet 1/1
[INFO][0xb68b0080][Nov 5 00:28:52.043][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68b0080][Nov 5 00:28:52.100][app_sam_portAG:SC_Exec] Executing command: interface ethernet 1/1
[INFO][0xb68b0080][Nov 5 00:29:02.144][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68b0080][Nov 5 00:29:02.150][app_sam_portAG:SC_Exec] Executing command: interface ethernet 1/1
[INFO][0xb68b0080][Nov 5 00:29:12.614][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68b0080][Nov 5 00:29:12.647][app_sam_portAG:SC_Exec] Executing command: interface ethernet 1/1
[INFO][0xb68b0080][Nov 5 00:29:22.793][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68b0080][Nov 5 00:29:22.807][app_sam_portAG:SC_Exec] Executing command: interface ethernet 1/1
[INFO][0xb68b0080][Nov 5 00:29:32.855][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68b0080][Nov 5 00:29:32.864][app_sam_portAG:SC_Exec] Executing command: interface ethernet 1/1
[INFO][0xb68b0080][Nov 5 00:29:32.889][app_sam_portAG:SC_Exec] Executing command: exit
[INFO][0xb68b0080][Nov 5 00:29:37.936][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68afbb0][Nov 5 00:29:58.002][app_sam_portAG:SC_Exec] Executing command: feature port-security
[INFO][0xb68afbb0][Nov 5 00:29:59.778][app_sam_portAG:SC_Exec] Executing command: feature fcoe
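The same `grep Executing` idea can be taken one step further by tallying which commands port AG pushed to the switch; a command repeated every ten seconds, like `interface ethernet 1/1` above, stands out immediately. A sketch assuming the `[app_sam_portAG:SC_Exec] Executing command:` line shape seen here (`executed_commands` is an illustrative name):

```python
import re
from collections import Counter

EXEC = re.compile(
    r'\[([A-Z][a-z]{2} +\d+ [\d:.]+)\]\[[^\]]*SC_Exec\] Executing command: (.*)$'
)

def executed_commands(log_text):
    """Count each command that port AG executed on the switch."""
    counts = Counter()
    for line in log_text.splitlines():
        m = EXEC.search(line)
        if m:
            counts[m.group(2).strip()] += 1
    return counts

SAMPLE_PORTAG = """\
[INFO][0xb68b0080][Nov 5 00:28:41.971][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68b0080][Nov 5 00:28:41.973][app_sam_portAG:SC_Exec] Executing command: interface ethernet 1/1
[INFO][0xb68b0080][Nov 5 00:28:52.043][app_sam_portAG:SC_Exec] Executing command: configure terminal
[INFO][0xb68afbb0][Nov 5 00:29:59.778][app_sam_portAG:SC_Exec] Executing command: feature fcoe
"""

if __name__ == "__main__":
    for cmd, n in executed_commands(SAMPLE_PORTAG).most_common():
        print(n, cmd)
```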
Steps in debugging:
- operational setup
- datapath issues
Debugging SAM/setup issues
- FSM stuck at
- SAM talks to a single end-point
- Look at DME logs
- Look at AG logs: no end-point connection? Always look at the switch and Redwood
(are the ports up?)
- Always look at the end-point
- deployment failure (invalid config)
- configured vNIC failed: creation of the vNIC failed; look a few lines before the failure in the port AG log
For a given object, only one FSM can be operational at a time.
The rest get scheduled.
--- show tech debugging
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
BMC Cut-Through Interfaces:
BMC main feature – server monitoring and interaction with the BIOS
Server management – power, boot order, Serial over LAN, KVM and virtual media services
Intelligent Platform Management Interface – IPMI
BMC runs an IPMI server – IPMI device interface – host OS can talk to the BMC
KCS (Keyboard Controller Style) interface – used by the BIOS
The BMC has its own IP address.
IPMI v2.0
LIVE TROUBLESHOOTING:
Mezz connection:
UCS1-FI-A# connect adapter 1/3/1
Available commands:
exit - Exit from subshell
help - List available commands
history - Show command history
show-asic-stats - Show adapter's asic stats
show-cfg - Show adapter's configuration
show-debug-log - Show adapter's debug log
show-fwlist - Show firmware versions on the adapter
show-identity - Show adapter identity
show-memory - Show adapter's memory
show-panic-log - Show adapter's panic log
show-phyinfo - Show adapter phy info
show-port-stats - Show adapter's port stats
show-systemstatus - Show adapter status
show-vif-stats - Show adapter's vif stats
show-vifs - Show adapter's vifs
adapter 1/3/1 #show-vifs
-------------+
| veth785 | Up | 0 | 0 | 1 | 0 | 3 | 0 | 00:25:b5:01:01:3b |
| veth783 | Up | 1 | 1 | 1 | 1 | 3 | 0 | 00:25:b5:11:01:3a |
| veth784 | Sb | 1 | 0 | 1 | 1 | 3 | 0 | 00:25:b5:11:01:3a |
| veth786 | Sb | 0 | 1 | 1 | 0 | 3 | 0 | 00:25:b5:01:01:3b |
| vfc762 | Up | 3 | 1 | 1 | 3 | 3 | 3 | 2000000001010130:2000000000010130 |
| vfc761 | Up | 2 | 0 | 1 | 2 | 3 | 3 | 2000000001010130:2000000000010131 |
ASIC stats:
+------------------------------+---------+------+------------------------------+---------+------+
| NET EG STAT                  | CURRENT | DIFF | NET IG STAT                  | CURRENT | DIFF |
+------------------------------+---------+------+------------------------------+---------+------+
| EG0_LEARN_REQ_DROP           |       0 |    0 | IG0_FWD_LOOKUP_NO_HIT        |       0 |    0 |
| EG0_PKT_DROP_CMD             |       0 |    0 | IG0_PKT_DROP_FC_MCAST        |       0 |    0 |
| EG0_PKT_DROP_LIFCFG_INVALID  |       0 |    0 | IG0_PKT_DROP_INVALID_FC_LIF  |       0 |    0 |
| EG0_PKT_DROP_LIFMAP_NO_HIT   |       0 |    0 | IG0_PKT_NULL_PIF             |    2158 |    0 |
| EG0_PKT_DROP_SRC_BIND        |       0 |    0 |                              |         |      |
+------------------------------+---------+------+------------------------------+---------+------+
| EG1_LEARN_REQ_DROP           |       0 |    0 | IG1_FWD_LOOKUP_NO_HIT        |       0 |    0 |
| EG1_PKT_DROP_CMD             |       0 |    0 | IG1_PKT_DROP_FC_MCAST        |       0 |    0 |
| EG1_PKT_DROP_LIFCFG_INVALID  |       0 |    0 | IG1_PKT_DROP_INVALID_FC_LIF  |       0 |    0 |
| EG1_PKT_DROP_LIFMAP_NO_HIT   |       0 |    0 | IG1_PKT_NULL_PIF             |     221 |    0 |
| EG1_PKT_DROP_SRC_BIND        |       0 |    0 |                              |         |      |
+------------------------------+---------+------+------------------------------+---------+------+
PAUSE STATS
+------------------------------+---------+------+------------------------------+---------+------+
| PAUSE TX STAT                | CURRENT | DIFF | PAUSE RX STAT                | CURRENT | DIFF |
+------------------------------+---------+------+------------------------------+---------+------+
| HOST10GBE_PORT0_TX_PAUSE_CFC |       0 |    0 | HOST10GBE_PORT0_RX_PAUSE_CFC |       0 |    0 |
| HOST10GBE_PORT0_TX_PAUSE_PFC |       0 |    0 | HOST10GBE_PORT0_RX_PAUSE_PFC |       0 |    0 |
+------------------------------+---------+------+------------------------------+---------+------+
| HOST10GBE_PORT1_TX_PAUSE_CFC |       0 |    0 | HOST10GBE_PORT1_RX_PAUSE_CFC |       0 |    0 |
| HOST10GBE_PORT1_TX_PAUSE_PFC |       0 |    0 | HOST10GBE_PORT1_RX_PAUSE_PFC |       0 |    0 |
+------------------------------+---------+------+------------------------------+---------+------+
| DCE_PORT0_TX_PAUSE_CFC       |       0 |    0 | DCE_PORT0_RX_PAUSE_CFC       |       0 |    0 |
| DCE_PORT0_TX_PAUSE_PFC       |       0 |    0 | DCE_PORT0_TX_PAUSE_PFC       |       0 |    0 |
+------------------------------+---------+------+------------------------------+---------+------+
| DCE_PORT1_TX_PAUSE_CFC       |       0 |    0 | DCE_PORT1_RX_PAUSE_CFC       |       0 |    0 |
| DCE_PORT1_TX_PAUSE_PFC       |       0 |    0 | DCE_PORT1_TX_PAUSE_PFC       |       0 |    0 |
+------------------------------+---------+------+------------------------------+---------+------+
adapter 1/3/1 #
Connect iom 1
fex-1# show platform software redwood rate
+-------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+
| Port || Tx Packets | Tx Rate | Tx Bit || Rx Packets | Rx Rate | Rx Bit |Avg Pkt|Avg Pkt| |
| || | (pkts/s) | Rate || | (pkts/s) | Rate | (Tx) | (Rx) |Err|
+-------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+
| 0-NI3 || 211 | 42 | 414.94Kbps || 148 | 29 | 25.38Kbps | 1229 | 107 | |
| 0-HI5 || 16 | 3 | 2.40Kbps || 0 | 0 | 0.00 bps | 93 | 0 | |
| 0-HI4 || 90 | 18 | 13.89Kbps || 151 | 30 | 393.39Kbps | 96 | 1628 | |
| 0-HI3 || 18 | 3 | 2.63Kbps || 0 | 0 | 0.00 bps | 91 | 0 | |
| 0-HI2 || 22 | 4 | 3.54Kbps || 1 | 0 | 240.00 bps | 100 | 150 | |
| 0-BI || 36 | 7 | 5.37Kbps || 41 | 8 | 8.05Kbps | 93 | 122 | |
| 0-CI || 31 | 6 | 6.68Kbps || 28 | 5 | 17.97Kbps | 134 | 401 | |
+-------++------------+-----------+------------++------------+-----------+------------+-------+-------+---+
fex-1# show platform software redwood sts
Board Status Overview:
legend:
' '= no-connect
X = Failed
- = Disabled
: = Dn
| = Up
$ = SFP+ present
v = Blade Present
------------------------------
+---+----+----+----+
SFP: |[$]| [$]| [$]| [$]|
+---+----+----+----+
: : : |
+-+----+----+----+-+
| 0 1 2 3 |
| I I I I |
| N N N N |
| |
| ASIC 0 |
| |
| H H H H H H H H |
| I I I I I I I I |
| 0 1 2 3 4 5 6 7 |
+-+-+-+-+-+-+-+-+--+
- - | | | | - -
+-+-+-+-+-+-+-+-+
|-|-|v|v|v|v|-|-|
+-+-+-+-+-+-+-+-+
Blade: 8 7 6 5 4 3 2 1
fex-1# show platform software redwood phy list 5221
BCM5221_CONTROL : [0x00000000]
BCM5221_STATUS : [0x00000001]
BCM5221_PHY_ID_HI : [0x00000002]
BCM5221_PHY_ID_LO : [0x00000003]
BCM5221_AUTO_NEG_ADV : [0x00000004]
BCM5221_LINK_PARTNER_ABILITY : [0x00000005]
BCM5221_AUTO_NEG_EXPANSION : [0x00000006]
BCM5221_NEXT_PAGE : [0x00000007]
BCM5221_NEXT_PAGE : [0x00000008]
BCM5221_100BX_AUX_CONTROL : [0x00000010]
BCM5221_100BX_AUX_STATUS : [0x00000011]
BCM5221_100BX_RCV_ERROR_CNT : [0x00000012]
BCM5221_100BX_RCV_ERROR_CNT : [0x00000013]
BCM5221_100BX_DISCONNECT_CNT : [0x00000014]
BCM5221_PTEST : [0x00000017]
BCM5221_AUX_CONTROL_STATUS : [0x00000018]
BCM5221_AUX_STATUS_SUMMARY : [0x00000019]
BCM5221_INTERRUPT : [0x0000001a]
BCM5221_AUX_MODE2 : [0x0000001b]
BCM5221_10BT_AUX_GEN_STATUS : [0x0000001c]
BCM5221_AUX_MODE : [0x0000001d]
BCM5221_AUX_MULTI_PHY : [0x0000001e]
BCM5221_BROADCOM_TEST : [0x0000001f]
BCM5221_AUX_MODE4 : [0x0000001a]
BCM5221_AUX_STATUS2 : [0x0000001b]
BCM5221_AUX_STATUS3 : [0x0000001c]
BCM5221_AUX_MODE3 : [0x0000001d]
fex-1# show platform software redwood phy list 54980
BCM54980_PHY_EXT_CONTROL : [0x00000010] <cfg>
BCM54980_PHY_EXT_STATUS : [0x00000011] <cfg>
BCM54980_RX_ERR_CNT : [0x00000012] <cfg>
BCM54980_TX_ERROR_CODE_CNT : [0x00000013] <cfg>
BCM54980_RX_ERROR_CODE_CNT : [0x00000014] <cfg>
BCM54980_EXP_ACCESS_REG : [0x00000017] <cfg>
BCM54980_MISC_CONTROL : [0x00000018] <cfg>
BMC sel view.