Windows Advanced Server Cluster Tools

Many tools are available to use with Windows 2000 Advanced Server to perform exhaustive problem determination.

Cluster tools can be grouped into:

Monitoring Tools

• Event Log Replication

• Event Log Analyst (ELA)

Gather Input Tools

• CFGCMP (Configuration Compare)

Backup and Recovery Tools

• DumpConfig

Monitoring Tools: Event Log Replication

When Clustering is installed on Windows 2000 servers, events logged in the event log of one node in the cluster are also replicated to the event log of the other nodes.

Using the Cluster.exe command-line tool you can configure the behavior of this feature.

•To disable the replication of event logs for the entire cluster:

cluster /prop EnableEventLogReplication=0

• To enable the replication of the event log:

cluster /prop EnableEventLogReplication=1

•To disable the replication of event logs from a single node of the cluster:

cluster node <nodename> /prop EnableEventLogReplication=0

•To enable replication from a particular node:

cluster node <nodename> /prop EnableEventLogReplication=1

•Note that disabling replication at a specific node stops the replication of events from that node to the other nodes. Other nodes that have the EnableEventLogReplication property turned on still replicate to that node.
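As a rough illustration, the cluster-wide and per-node settings can be assembled by a small helper. This is only a sketch: the function name replication_command is hypothetical, and the per-node form assumes the standard "cluster node <nodename> /prop" syntax of Cluster.exe. It builds the argument list and does not run anything.

```python
from typing import List, Optional


def replication_command(enable: bool, node: Optional[str] = None) -> List[str]:
    """Build a Cluster.exe argument list that sets EnableEventLogReplication.

    Hypothetical helper for illustration; it only constructs the arguments.
    """
    args = ["cluster"]
    if node is not None:
        # Assumption: node-level properties use "cluster node <nodename> /prop ..."
        args += ["node", node]
    value = 1 if enable else 0
    args += ["/prop", f"EnableEventLogReplication={value}"]
    return args


if __name__ == "__main__":
    # Cluster-wide: turn replication off.
    print(" ".join(replication_command(False)))
    # Single node: turn replication back on for node1.
    print(" ".join(replication_command(True, node="node1")))
```

The resulting list could be passed to subprocess.run on a cluster node, but that step is left out here because it changes the cluster configuration.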

Event Log Analyst

•Event Log Analyst (ELA) is a tool that collects reliability information from Windows event logs.

•ELA runs on a single server, called the collection server, and sequentially retrieves event log information from other Windows servers.

•Once ELA data is collected, it can be analyzed using ancillary tools.

[Figure: ELA architecture. The collection server reads the server list and retrieves event log data over the local area network from SERVER-01, SERVER-02, through SERVER-N, producing server, reboot, bugcheck, Dr. Watson, SQL Server, IIS Server, Exchange Server, and SFC information.]

•ELA collects several types of basic information from the System and Application event logs. The data collected includes: Reboots, Blue screens, Dr Watson Info, Service pack installations, IIS events, Exchange events and SQL start/stop events.

•ELA is designed to be unobtrusive. Because ELA only uses publicly documented interfaces for remotely accessing the event log, it does not require any software to be installed on the systems where the event logs reside. ELA is trivial to install. It is a single executable image that runs on any Windows NT 4.0 or later system. ELA has a low impact on production environments. Typically, ELA accesses a remote server for less than 30 seconds during the collection process. In tests using the Microsoft Corporate Data Center, ELA scanned the event logs of over 800 servers in about 1 hour using a collection system with a 100Mbps LAN connection.

•The main advantage of ELA is the fact that it analyses multiple machines remotely.

•ELA can be used to provide customers with detailed, easy-to-read data on the availability, mean-time-between-failure, and unplanned downtime of their systems and therefore analyze the reliability characteristics of the system.

•However, compared with the original Windows NT event logs, the data that ELA collects is of very limited use for troubleshooting a system.
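The text does not say how ELA derives availability or mean time between failures, so the following is only a sketch using the usual textbook definitions: availability is uptime divided by total time, and MTBF is uptime divided by the number of failures. The function name and the outage data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def reliability_summary(
    window_start: datetime,
    window_end: datetime,
    outages: List[Tuple[datetime, datetime]],
) -> Tuple[float, Optional[float], float]:
    """Return (availability, MTBF in hours, downtime in hours).

    outages holds (down_at, up_at) pairs that lie inside the window.
    MTBF is None when no failure was observed.
    """
    total = (window_end - window_start).total_seconds()
    downtime = sum((up - down).total_seconds() for down, up in outages)
    uptime = total - downtime
    availability = uptime / total
    mtbf_hours = uptime / len(outages) / 3600 if outages else None
    return availability, mtbf_hours, downtime / 3600


if __name__ == "__main__":
    start = datetime(2001, 1, 1)
    end = start + timedelta(hours=100)
    # Two hypothetical one-hour outages within a 100-hour window.
    outages = [
        (start + timedelta(hours=10), start + timedelta(hours=11)),
        (start + timedelta(hours=50), start + timedelta(hours=51)),
    ]
    print(reliability_summary(start, end, outages))
```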

Configuring ELA for Data Collection

A file containing a list of servers, and optionally date/time stamps, tells ELA where to look for event logs.

The server list is a simple text file.

Example server list to collect data from two Windows 2000 servers (use a new line for every server):

# My server list
node1
node2 04/05/2001 12:15:00

•Any server list line beginning with the "#" character is a comment line and is not processed by ELA.

•All other lines are assumed to be Windows 2000 server names.

•Typically, server lists use the .txt file extension.

•In the example, the second server name is followed by a date and time. In this case ELA collects events from node2 only from 12:15:00 on 04/05/2001 onward.

# collect information from:
node1
node2 04/05/2001 12:15:01
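The server list format is simple enough to parse in a few lines. The sketch below is not part of ELA; it only mirrors the rules stated above (comment lines start with "#", every other line is a server name with an optional date and time). The function name is hypothetical, and the month/day/year reading of the date is an assumption, since the text does not say which order ELA expects.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Assumption: dates are month/day/year; the source does not state the order.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_server_list(text: str) -> List[Tuple[str, Optional[datetime]]]:
    """Return (server name, optional start time) pairs from a server list."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        parts = line.split(None, 1)  # server name, then the rest of the line
        name = parts[0]
        start = None
        if len(parts) > 1:
            start = datetime.strptime(parts[1].strip(), TIMESTAMP_FORMAT)
        entries.append((name, start))
    return entries


if __name__ == "__main__":
    sample = "# My server list\nnode1\nnode2 04/05/2001 12:15:00\n"
    for name, start in parse_server_list(sample):
        print(name, start)
```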

Collecting Data

Event Log Analyst, Version 05.00.00.1019
Collects specified event log entries from a list
(C) Copyright 1998-2000, Microsoft Corporation
Used by permission only - do not distribute. Beta contact: [email protected]

The usage screen describes the following arguments:

•Text file containing the list of servers to process; each line of the file should contain a single server name.

•Use n threads to process the list; the default is 40 and the maximum is 200.

•Use Name to build the output filenames, which allows you to distinguish one collection from another (given as /n:Name on the command line).

•Number of times to retry collecting from a particular server.

•Gather custom events specified in <init file>.

ELA generates the following CSV files:

•BugChecks: List of all STOP errors found

•DrWatsons: User-mode access violations recorded by Dr. Watson

•OutOfVM: All occurrences of the Out of Virtual Memory pop-up

•Reboots: All detected reboots

•SPVer: Installs/removals of Windows NT or Windows 2000 Service Packs

•Servers: List of servers successfully processed, with additional detail

•ServerErr: List of servers not processed, with error code

ELA generates the following TXT file:

•Summary: Lists ELA start/stop/elapsed times and the number of servers processed during each run

Each filename is in the form ELA_Name_Month_Day_Year_OutputFilename.csv. For example: ELA_WebServers_1_22_1999_Reboots.csv

•Once the server list file is created, you can collect event log data using ELA:

C:\ela> ela MyServers.txt /n:MyServers
MyServers.txt: Starting the collection at Fri Jan 22 16:42:40 1999
Queueing \\Server1 for processing (1/55)...
Queueing \\Server1 for processing (2/55)...
Queueing \\Server1 for processing (3/55)...
Queueing \\Server1 for processing (4/55)...
All servers queued for processing, waiting for completion.
This may take 15 minutes.
Total time elapsed: 0:12:24
All output files written to "ELA_MyServers_1_22_1999_*.csv"
C:\ela>
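The output naming seen in this run (ELA_MyServers_1_22_1999_*.csv) can be reproduced with a short helper, which is handy when a script needs to locate the files afterwards. This is a sketch that follows the 1999 example only; the function name is hypothetical, and other ELA versions may lay out the date differently.

```python
from datetime import date


def ela_output_filename(collection_name: str, collected_on: date, output: str) -> str:
    """Build a filename of the form ELA_Name_Month_Day_Year_OutputFilename.csv.

    Month and day are written without leading zeros, as in the example
    ELA_MyServers_1_22_1999_*.csv.
    """
    return (
        f"ELA_{collection_name}_{collected_on.month}_{collected_on.day}_"
        f"{collected_on.year}_{output}.csv"
    )


if __name__ == "__main__":
    print(ela_output_filename("MyServers", date(1999, 1, 22), "Reboots"))
```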

Analyzing ELA Data

C:\ELA>ela server.lst
server.lst: Starting the collection at 13-Apr-2001 15:57:10
Processing logs on node1 from the beginning (1/2).
Processing logs on node2 from the beginning (2/2).
Total time elapsed: 00:03:15
All output files written to ELA_13_Apr_2001_*.csv
C:\ELA>

ELA produces 10 comma-separated value (CSV) files when run:

•ELA_<your collection name>_<collection date>_Servers.csv

•ELA_<your collection name>_<collection date>_ServerErr.csv

•ELA_<your collection name>_<collection date>_Bugchecks.csv

•ELA_<your collection name>_<collection date>_Reboots.csv

•ELA_<your collection name>_<collection date>_DrWatsons.csv

•ELA_<your collection name>_<collection date>_OutofVM.csv

•ELA_<your collection name>_<collection date>_SPVer.csv

•ELA_<your collection name>_<collection date>_Sql.csv

•ELA_<your collection name>_<collection date>_Exchange.csv

•ELA_<your collection name>_<collection date>_IIS.csv

The run also writes ELA_Summary.csv and ServerList_<collection date>.txt.

•These files can be imported into Excel and analyzed for various trends. Microsoft has some internal tools under development to perform further analysis, but those are not ready for distribution.
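A script can stand in for Excel when only a quick tally is needed. The sketch below counts rows per server in one of the CSV files, for example to see which servers reboot most often. The column layout of the ELA files is not described here, so the header name "Server" and the sample rows are purely hypothetical; adjust server_column to whatever the real file uses.

```python
import csv
import io
from collections import Counter


def count_by_server(csv_text: str, server_column: str = "Server") -> Counter:
    """Count the rows per server in CSV text that has a header row.

    server_column is an assumed header name, not a documented ELA field.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[server_column] for row in reader)


if __name__ == "__main__":
    # Hypothetical data shaped like a reboots file.
    sample = "Server,Time\nnode1,04/05/2001 12:15:00\nnode2,04/06/2001 08:00:00\nnode1,04/07/2001 23:10:00\n"
    for server, count in count_by_server(sample).most_common():
        print(server, count)
```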

•It is strongly recommended that ELA be run from an account with administrative privileges on the target systems.

•The time zone of the target systems, for example, can greatly affect the calculations performed on data retrieved by ELA, and it can be obtained only if the user has administrative privileges.

•Access to the event logs is determined by the account under which the application is running.

•The LocalSystem account is a special account that Windows 2000 Advanced Server services can use.

•The Administrator account consists of the administrators for the system.

•The Server Operator account (ServerOp) consists of the administrators of the domain server.

•The World account includes all users on all systems.

•ELA accesses the Application and System event logs, but not the Security event log.

Incremental Collections

•ELA allows you to incrementally collect system events. After each execution of ELA a new ServerList_<collection_date>.txt file is generated.

•This file contains a list of all servers specified in the original ServerList.txt file and the date/time stamp of the last event collected.

•If this file is used in subsequent collections with ELA, only events that have occurred since the date specified will be collected.
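The effect of an incremental collection can be pictured as a filter over events. The sketch below is not how ELA is implemented; it only illustrates the rule described above. It assumes that an event at exactly the recorded date/time is still included, and the event tuples and function name are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, datetime, str]  # (server, timestamp, description)


def select_new_events(
    events: Iterable[Event], start_times: Dict[str, datetime]
) -> List[Event]:
    """Keep events at or after the start time listed for their server.

    Servers without a listed start time are collected from the beginning.
    """
    selected = []
    for server, when, text in events:
        cutoff = start_times.get(server)
        if cutoff is None or when >= cutoff:
            selected.append((server, when, text))
    return selected


if __name__ == "__main__":
    start_times = {"node2": datetime(2001, 4, 5, 12, 15, 1)}
    events = [
        ("node1", datetime(2001, 4, 1, 9, 0, 0), "reboot"),
        ("node2", datetime(2001, 4, 5, 12, 15, 0), "already collected"),
        ("node2", datetime(2001, 4, 6, 7, 30, 0), "reboot"),
    ]
    for event in select_new_events(events, start_times):
        print(event)
```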

ELA Performance Impact

ELA was designed to cause minimal impact in production data centers.

ELA uses the same application programming interfaces as the Windows 2000 Advanced Server Event Viewer application.

ELA's performance impact is roughly the same as an operator using Event Viewer on a remote system and rapidly displaying pages of event information.

The file ELA_perf_impact.xls contains detailed PERFMON measurements of the performance impact of ELA on a system undergoing collection.

Note that these measurements were taken using a 10Mbps Ethernet connection between the collecting system and the system under collection.

Performance impact with a 100Mbps connection may be more noticeable. In actual collection at 6 production data centers, there have not been any complaints about adverse performance (or any other!) impacts of using this tool.

ELA collection times vary with the size of the event log and the bandwidth of the network connection. When ELA scans event logs over a wide area network, collection times increase.

Gather Input Tools: Configuration Compare - CFGCMP

The main purpose of the CFGCMP.EXE utility is to compare the current configuration with the computer's original or previous hardware and software configuration.


In order to use the CFGCMP.EXE utility, you need to install the "Windows 2000 Advanced Server Support Tools", located in the SUPPORT/Tools directory of the Windows 2000 Advanced Server CD.

CFGCMP.EXE is a command line utility that is used to create a system information file (.NFO) that can be loaded in the System Information MMC and examined, or output to a text file (this file can be viewed with MSINFO32.EXE, located by default in the \program files\common files\Microsoft shared\msinfo directory).

Once you have created a system information file for a Windows 2000 system, CFGCMP.EXE can then be used to compare the system's configuration to other systems or to the system's original configuration.


Note: The initial snapshot needs to be done locally.

C:\cfgcmp>cfgcmp -i initial.txt
Creating initial configuration
Initial Driver Scanning 69% complete.
Driver i2cnt.sys is not signed.
Driver ldpur.sys is not signed.
Driver lro78nt.sys is not signed.
Driver pnemnt.sys is not signed.
Initial Driver Scanning 98% complete.
Driver rrtacfItr.sys is not signed.
Initial Driver Scanning 99% complete.
Driver tugsysin.sys is not signed.
Initial Driver Scanning complete.
Completed writing initial configuration to file initial.txt.
C:\cfgcmp>

The following is sample output from CFGCMP.EXE /?:

C:\>cfgcmp -?
Configuration Comparison Version 02.06
cfgcmp [-opsdv][-m machinename][-i filename][-r filename] file1 [file2].
Current configuration is used if second file is not supplied.
-d: compare loaded modules and drivers.
-p: print summary information from the configuration file.
-s: summary of differences(default).
-r: output to file instead of the console.
-i: create an initial configuration file.
-m: remote machine to run comparison on.
-v: include all changes included deleted components.
-o: allow files to be overwritten.
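When comparisons are scripted, the options from the usage text can be assembled programmatically. The following is only a sketch: the function names are hypothetical, the flags are taken from the usage text above, and passing each switch as a separate argument (rather than combined, as in -dv) is an assumption about how CFGCMP parses its command line. Nothing is executed.

```python
from typing import List, Optional


def cfgcmp_initial_command(filename: str) -> List[str]:
    """Arguments for creating an initial configuration file (-i)."""
    return ["cfgcmp", "-i", filename]


def cfgcmp_compare_command(
    file1: str,
    file2: Optional[str] = None,
    *,
    machine: Optional[str] = None,
    report: Optional[str] = None,
    drivers: bool = False,
    verbose: bool = False,
    overwrite: bool = False,
) -> List[str]:
    """Arguments for a comparison; the current configuration is used
    when file2 is omitted, as the usage text states."""
    args = ["cfgcmp"]
    if drivers:
        args.append("-d")  # compare loaded modules and drivers
    if verbose:
        args.append("-v")  # include all changes
    if overwrite:
        args.append("-o")  # allow files to be overwritten
    if machine:
        args += ["-m", machine]  # remote machine to run comparison on
    if report:
        args += ["-r", report]  # write output to a file
    args.append(file1)
    if file2:
        args.append(file2)
    return args


if __name__ == "__main__":
    print(" ".join(cfgcmp_initial_command("initial.txt")))
    print(" ".join(cfgcmp_compare_command("initial.txt", report="compare.txt", drivers=True)))
```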

To create a system information file, run CFGCMP with the -i option locally on the system (the initial configuration file has to be created locally on each system):

cfgcmp -i initial.txt

[Screenshot: CFGCMP driver comparison of the initial configuration in initial.txt against the current configuration.]

In this case the output was saved into compare.txt.

You can now compare this with the current configuration of another machine or with the same machine after a change.


Running CFGCMP with the -i option creates an initial configuration file. The details can be viewed with MSINFO32.EXE (located by default in the \program files\common files\Microsoft shared\msinfo directory).

Dump Config

DUMPCFG is a resource kit utility, which simplifies the manual system recovery process associated with storage configuration.

When to use DUMPCFG:

•To display system, disk and volume information

•To change the signature of a disk


Recovering from an Event ID 1034 on a Server Cluster [MS Q280425]

After you replace a failed hard disk, or change drives (different SCSI ID, or physical location in the rack) for the shared disk resource, the Microsoft Cluster service may not start. Also, the following error message may be generated in the Event log:

Event ID: 1034
Source: ClusDisk
Description: The disk associated with cluster disk resource <DriveLetter> could not be found. The expected signature of the disk was <DiskSignature>.

This issue can occur because the server that is running Microsoft Cluster Server (MSCS) relies on disk signatures to identify, and mount volumes. If a hard disk is replaced, or the bus is re-enumerated, MSCS may not find the disk signatures that it is expecting, and consequently may fail to mount the disk. If it is the Quorum disk that has failed, the Cluster service will not start. To get your cluster back online quickly, you may want to change the Quorum disk designation. For additional information about how to do this, view the article in the Microsoft Knowledge Base:

Q280353 How to Change Quorum Disk Designation

1. Back up the server's configuration: Open the "Backup and Recovery Tool Wizard", and back up the System State of each node.

2. Set the Cluster service, and Cluster Disk device to Manual on all nodes, and then turn off all but one node in the Cluster.

3. Partition the new disk if necessary:

a. Open Computer Management, double-click Storage, and then click Disk Management.

b. Verify that you can see the disk, and its partitions.

c. If this disk is being completely replaced, create a primary partition from the free space, and format it with the NTFS file system.

4. Assign a drive letter:

a. Assign the same drive letters as before, verifying that it is the same one that is displayed in the "Description" section of the Event ID 1034. For example: "The disk associated with cluster disk resource 'Disk Q:\'".

NOTE: You should use drive labels to quickly identify what the appropriate drive letters are. You should also use drive letters that are not the next enumerated drive letters for Cluster shared disks, such as "Q", "S", or "T".

5. Document the disk number:

a. Open Computer Management, double-click Storage, and then click Disk Management.

b. In Logical Disk Manager, note the disk number that is associated with the failing disk, which you find to the left of the partition information. For example: Disk 0.

6. Write the expected signature to the disk:

a. Obtain the expected signature from the "Description" section of the Event ID 1034 error message. For example: "The expected signature of the disk was 12345678".

b. Write the signature that the cluster expects to the disk by using DumpCFG.exe from the Resource Kit, with the following syntax:

dumpcfg.exe /s 12345678 0

where 12345678 is the disk signature, and 0 is the disk number that you replaced (which was obtained from the previous step). For more information about the proper usage of DumpCFG.exe, type "DumpCFG /?" (without the quotation marks).

7. Set the Cluster service back to Automatic, and the Cluster Disk device back to System on the node. Start the Cluster Disk device, and then the Cluster service.

8. Bring the disk online:

a. Open Cluster Administrator, and then bring the disk online.

b. If it is the Quorum disk that has failed, the Cluster service will fail to start. You will need to start the Cluster service.

9. Verify that the disk came online, and then restore the data from a tape if necessary.

10. Turn on all other nodes, one at a time, and then test failover. Remember to set the Cluster service, and Cluster Disk back to their original values.
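Steps 5 and 6 lend themselves to a small helper that pulls the expected signature out of the Event ID 1034 description and assembles the DumpCFG command line. This is a sketch under stated assumptions: the function names are hypothetical, the signature is assumed to be up to eight hexadecimal digits, and the command is only built, never run, because writing a signature modifies the disk.

```python
import re
from typing import List, Optional

# Assumption: the signature is printed as up to eight hexadecimal digits.
SIGNATURE_RE = re.compile(
    r"expected signature of the disk was\s+([0-9A-Fa-f]{1,8})"
)


def expected_signature(description: str) -> Optional[str]:
    """Extract the expected disk signature from an Event ID 1034 description."""
    match = SIGNATURE_RE.search(description)
    return match.group(1) if match else None


def dumpcfg_command(signature: str, disk_number: int) -> List[str]:
    """Build the arguments for: dumpcfg.exe /s <signature> <disk number>."""
    return ["dumpcfg.exe", "/s", signature, str(disk_number)]


if __name__ == "__main__":
    description = (
        "The disk associated with cluster disk resource 'Disk Q:' could not "
        "be found. The expected signature of the disk was 12345678."
    )
    signature = expected_signature(description)
    if signature is not None:
        print(" ".join(dumpcfg_command(signature, 0)))
```

Review the printed command against Disk Management before running it by hand.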


View your disk configuration by typing DUMPCFG. The disk signature is shown at the beginning of the output, like this:

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-1999 Microsoft Corp.

C:\>dumpcfg

[System Information]

Cluster name (DNS): nodeb.cluster03

Cluster name (NetBIOS): NODEB

System Root (install directory): C:\WINNT

OS: Windows 2000 Server

Service Pack:

Product: Windows 2000 Advanced Server is installed.

[The remainder of the screenshot lists the disk and volume information.]


If you create a new disk you will see that the signature is not the same as shown in the cluster registry.

This is due to the fact that Windows 2000 writes a unique signature every time.

The cluster database is of course not aware of this change and would reject this drive.

[Screenshot: Registry Editor showing the disk signature recorded in the cluster registry.]

•Disks still under the control of the cluster disk driver might show invalid information and errors. This will not affect the quorum drive since it is a newly created disk and therefore not under the control of the cluster disk driver.

•Change the disk configuration to the expected signature. For example, if your quorum was physical disk 1 and the expected signature (as shown in the registry key) is 19c81251, run:

DUMPCFG -S19c81251 1

•Verify that the disk signature was set correctly.

Cluster Tool

•CLUSTOOL.EXE is a new utility available with the Windows 2000 Resource Kit (CD_Drive_letter\apps\clustool).

•You can install it on any node of your Windows 2000 Advanced Server cluster.

•It allows you to back up and selectively restore your cluster configuration (mainly all your groups and resources) and to migrate file shares and printers into your cluster.

Cluster Tool - Backup Cluster Configuration

•CLUSTOOL consists of the following wizards:

•Configuration Backup Wizard: Creates a backup of the configuration for a selected cluster.

•Configuration Restore Wizard: Restores the configuration of a cluster from a selected configuration backup file.

•Resource Migration Wizard: Migrates resources (file shares and shared printers) from a stand-alone Microsoft Windows 2000 or Microsoft Windows NT server to a cluster.
