If you followed the earlier article, you ended up with two OpenVMS 7.3 systems running on SIMH VAX emulator instances, with networking configured over both DECnet and TCP/IP. Next up: clustering those two systems.
Creating the basic cluster
A two-node cluster is vulnerable. If each node has a vote, losing either one leaves the survivor without quorum. The only way to let either node keep running when the other one fails is to add a quorum disk as a tie-breaking vote. Since we don't have shared storage for that (yes, we *could* try it on a single host by sharing a disk file, but what if we want to cluster a node that's not local?), we'll opt for a less sexy approach: one node will be 'master', the other 'slave'. If the 'master' goes down, the 'slave' will have to wait until the 'master' is back up again. This guarantees cluster data integrity.
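The arithmetic behind this can be checked with a few lines of DCL. OpenVMS computes quorum as (EXPECTED_VOTES + 2) / 2, rounded down. The sketch below only evaluates that formula for the two setups discussed here. The symbol names are arbitrary, and since there are no labels it can be typed straight at the DCL prompt.

```dcl
$! Sketch: quorum = (EXPECTED_VOTES + 2) / 2, truncated.
$! DCL integer division truncates, so the same expression works here.
$ EXPECTED = 2                  ! two nodes with one vote each
$ QUORUM = (EXPECTED + 2) / 2   ! both votes are needed
$ WRITE SYS$OUTPUT "Two voting nodes: quorum = ", QUORUM
$ EXPECTED = 1                  ! master/slave: only WHISKY votes
$ QUORUM = (EXPECTED + 2) / 2   ! WHISKY's single vote is enough
$ WRITE SYS$OUTPUT "Master/slave:     quorum = ", QUORUM
```

With two voters, quorum is 2. A lone survivor holds only 1 vote, so it blocks. In the master/slave setup, quorum is 1. WHISKY can run on its own, while COGNAC, with 0 votes, never can.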
The cluster configuration utility should be run on both nodes: first on the master, then on the slave.
```
$ @SYS$MANAGER:CLUSTER_CONFIG_LAN.COM
                   Cluster Configuration Procedure
                   Executing on a VAX System

    DECnet Phase IV is installed on this node.

    The LAN, not DECnet, will be used for MOP downline loading.

    To ensure that this procedure is executing with the required privileges,
    invoke it from the system manager's account.

    Enter a "?" for help at any prompt. If you are familiar with the execution
    of this procedure, you may want to mute extra notes and explanations by
    invoking it with "@CLUSTER_CONFIG_LAN BRIEF".

    This VAX node is not currently a cluster member.
```
A few questions will be asked. Yes, you want to add your node to a cluster, or form a new one if none exists yet. Yes, the LAN will be used for cluster communications.
The cluster group number and password are things you should write down, because they will be needed to authorize new nodes. To keep the organization clean and simple, we'll take "cluster group number" == "DECnet area". In our case that means 1. Our nodes won't act as boot servers (unless you really want to; go ahead and try it). They will serve disks, though not RFxx disks.
And thus we arrive at the dreaded ALLOCLASS parameter. Without going into too much detail: this parameter sets the node's allocation class, which is used to give devices cluster-wide unique names. If WHISKY and COGNAC both have a DUA1 (and they do), we need this identifier to specify *which* DUA1 we are talking about. Again, to keep things clean and simple, I'm keeping the ALLOCLASS value the same as the DECnet node number, so WHISKY gets ALLOCLASS=131 and COGNAC gets 132. The devices are thus named $131$DUA1 and $132$DUA1.
```
MAIN MENU

   1. ADD WHISKY to existing cluster, or form a new cluster.
   2. MAKE a directory structure for a new root on a system disk.
   3. DELETE a root from a system disk.
   4. EXIT from this procedure.

Enter choice [1]:
Will the LAN be used for cluster communications (Y/N)? Y
Enter this cluster's group number: 1
Enter this cluster's password:
Re-enter this cluster's password for verification:
Will WHISKY be a boot server [Y]? N
Will WHISKY be a disk server (Y/N)? Y
Will WHISKY serve RFxx disks [N]?
Enter a value for WHISKY's ALLOCLASS parameter [0]: 131
Does this cluster contain a quorum disk [N]?

    WARNING: WHISKY will be a voting cluster member. EXPECTED_VOTES for
             this and every other cluster member should be adjusted at
             a convenient time before a reboot. For complete instructions,
             check the section on configuring a cluster in the "OpenVMS
             Cluster Systems" manual.
```
The cluster configuration procedure sets a bunch of parameters, and we want to change some of them before they take effect. So once the procedure is done, we tell it NOT to run AUTOGEN yet.
```
    Execute AUTOGEN to compute the SYSGEN parameters for your configuration
    and reboot WHISKY with the new parameters. This is necessary before
    WHISKY can become a cluster member.

    Do you want to run AUTOGEN now [Y]? N
```
Add or change the following parameters on the master node:
```
$ EDIT SYS$SYSTEM:MODPARAMS.DAT
VOTES=1
EXPECTED_VOTES=1
SHADOWING=2
SHADOW_MAX_COPY=128
```
On node COGNAC, set the same parameters but with VOTES=0. VOTES and EXPECTED_VOTES do what they say: in this cluster, WHISKY holds the only vote, and every node expects a total of one. SHADOWING=2 enables host-based volume shadowing. SHADOW_MAX_COPY=128 raises the number of shadow copy and merge operations that may run at the same time on a node (generous, but we can spare the I/O on our fast host server, right?). After both nodes have their proper parameters, run AUTOGEN and reboot.
```
$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT NOFEEDBACK
```
Once both machines are back up, lo and behold!
```
$ SHOW CLUSTER

View of Cluster from system ID 1155  node: WHISKY
┌───────────────────┬─────────┐
│      SYSTEMS      │ MEMBERS │
├────────┬──────────┼─────────┤
│  NODE  │ SOFTWARE │ STATUS  │
├────────┼──────────┼─────────┤
│ WHISKY │ VMS V7.3 │ MEMBER  │
│ COGNAC │ VMS V7.3 │ MEMBER  │
└────────┴──────────┴─────────┘
```
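Beyond SHOW CLUSTER, a few lexical functions and SYSGEN can confirm that the vote and allocation-class settings took effect. The sketch below uses documented F$GETSYI cluster items. With the parameters above, you would expect two nodes, one vote in total, a quorum of 1, one vote for WHISKY and none for COGNAC. ALLOCLASS should be 131 or 132 depending on the node you run it on.

```dcl
$! Sketch: run on either node once both are back up.
$ WRITE SYS$OUTPUT "Cluster nodes:  ", F$GETSYI("CLUSTER_NODES")
$ WRITE SYS$OUTPUT "Cluster votes:  ", F$GETSYI("CLUSTER_VOTES")
$ WRITE SYS$OUTPUT "Cluster quorum: ", F$GETSYI("CLUSTER_QUORUM")
$ WRITE SYS$OUTPUT "WHISKY votes:   ", F$GETSYI("NODE_VOTES","WHISKY")
$ WRITE SYS$OUTPUT "COGNAC votes:   ", F$GETSYI("NODE_VOTES","COGNAC")
$! Active parameter values on the node you are logged in to:
$ MCR SYSGEN SHOW ALLOCLASS
$ MCR SYSGEN SHOW VOTES
$ MCR SYSGEN SHOW EXPECTED_VOTES
$! Local and served DU disks, with their allocation class prefixes:
$ SHOW DEVICE DU
```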
Disk shadowing
One of the fun things about VMS clusters is their ability to mount any disk on any node, regardless of where that disk is located in the cluster, and to replicate a disk between nodes. This is called disk shadowing, and it's a handy way to make sure your data doesn't die with your machine if something breaks.
Since we have a DUA1 on each of the nodes, we'll set up the shadowset between those.
```
$ INIT/ERASE $131$DUA1: SHADOWDATA
$ INIT/ERASE $132$DUA1: SHADOWDATA
$ MOUNT/SYSTEM/CLUSTER DSA0: /SHADOW=($131$DUA1,$132$DUA1) SHADOWDATA SHADOWDATA
%WBM-I-WBMINFO Deleting all bitmaps represented by this WBMB.
%MOUNT-I-MOUNTED, SHADOWDATA mounted on _DSA0:
%MOUNT-I-SHDWMEMSUCC, _$131$DUA1: (WHISKY) is now a valid member of the shadow set
%MOUNT-I-SHDWMEMCOPY, _$132$DUA1: (COGNAC) added to the shadow set with a copy operation
```
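The commands speak for themselves. First, we initialise both disks with the volume label SHADOWDATA, erasing their contents (/ERASE). Then we mount the shadowset as virtual unit DSA0 on all systems in the cluster (/CLUSTER), available to all users (/SYSTEM). DSA0 is a shadowset of $131$DUA1 and $132$DUA1. The first SHADOWDATA is the volume label. The second defines the logical name SHADOWDATA, which is what lets us refer to the disk as SHADOWDATA: from here on.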
A few seconds later, OPCOM announces that the disk sync has started. If you check the device now, you can see the shadow copy progress.
```
%%%%%%%%%%%  OPCOM  27-JAN-2012 20:03:07.93  %%%%%%%%%%%
(from node COGNAC at 27-JAN-2012 20:02:46.41)
Message from user SYSTEM on COGNAC
%SHADOW_SERVER-I-SSRVINICPY, initiating copy operation on _DSA0: at LBN: 0, I/O size: 127 blocks, ID number: 4500004B.

$ SHOW DEVICE SHADOWDATA
Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
DSA0:                   Mounted              0  SHADOWDATA     2375847     1   2
$131$DUA1:    (WHISKY)  ShadowSetMember      0  (member of DSA0:)
$132$DUA1:    (COGNAC)  ShadowCopying        0  (copy trgt DSA0:   4% copied)
```
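Rather than re-issuing SHOW DEVICE by hand, you could let a small command procedure poll the copy target. This sketch assumes the F$GETDVI item SHDW_CATCHUP_COPYING, which returns TRUE while a member is the target of a full copy. The 30-second interval is arbitrary. Labels and GOTO only work inside a command procedure, so save this to a .COM file and run it with @.

```dcl
$! Sketch: wait until $132$DUA1: is no longer being copied into DSA0:.
$WAIT_FOR_COPY:
$ IF .NOT. F$GETDVI("$132$DUA1:","SHDW_CATCHUP_COPYING") THEN GOTO COPY_DONE
$ WAIT 00:00:30            ! check again in 30 seconds
$ GOTO WAIT_FOR_COPY
$COPY_DONE:
$ SHOW DEVICE SHADOWDATA   ! both members should now be ShadowSetMember
```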
And that's that. You can create, delete or modify anything on SHADOWDATA, and the change will be immediately visible on the other node.
The last step: shared management
Now that we have a cluster and a cluster-wide shared disk, there's only one last thing to do. All nodes are still using their own, separate databases for virtually everything: their own user authorization file (UAF), rights list, DECnet proxy and node databases, and so on. This means that every change we want to make on the cluster has to be made on every node! This includes adding users, changing passwords, deleting users, ...
The solution is to put the files that hold the records for all these facilities on a common disk, like our shadowset. But since these files are needed very early in the boot process, doing a MOUNT in SYSTARTUP_VMS.COM won't do. We'll have to hook into a file that is run much earlier. Create a directory on the SHADOWDATA volume to hold cluster-specific files, and edit SYLOGICALS.COM.
```
$ CREATE/DIRECTORY SHADOWDATA:[CLUSTER$CONFIG]
$ EDIT SYS$MANAGER:SYLOGICALS.COM
```
Yes, I always wanted to create a directory with a funky $ in its name :-) In SYLOGICALS.COM, search for the following line:
```
$! DEFINE/SYSTEM/EXECUTIVE SYSUAF SYS$SYSTEM:SYSUAF.DAT
```
As you can see, it's commented out; this is the default location for SYSUAF.DAT. Below it, the other cluster-common files are listed with their default locations. Our cluster disk has to be mounted before we redefine these locations, so put this line before the DEFINE list:
```
$ MOUNT/SYSTEM DSA0: /SHADOW=($131$DUA1,$132$DUA1) SHADOWDATA SHADOWDATA
```
Notice that we're NOT using the /CLUSTER switch here! Every node will mount this disk independently, to avoid complex issues if the node issuing the /CLUSTER mount is delayed during boot. Save the file (remember to do it on both nodes!) and reboot the entire cluster.
```
$ REBOOT
```
If all goes well, the shadowset will be mounted on both nodes at boot time.
```
$ SHOW DEV SHADOWDATA
Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
DSA0:                   Mounted              0  SHADOWDATA     2375847     1   2
$131$DUA1:    (WHISKY)  ShadowSetMember      0  (member of DSA0:)
$132$DUA1:    (COGNAC)  ShadowSetMember      0  (member of DSA0:)
```
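If the boot-time mount ever fails, one possible cause is that the member disk served by the other node is not yet visible when SYLOGICALS.COM runs. Below is a sketch of a bounded wait you could place in front of the MOUNT line. It uses the F$GETDVI item EXISTS, which returns FALSE for a device that isn't there yet. The retry count and delay are arbitrary assumptions, not values from the original setup.

```dcl
$! Sketch: give both shadow members up to about a minute to appear.
$ TRIES = 0
$WAIT_FOR_MEMBERS:
$ IF F$GETDVI("$131$DUA1:","EXISTS") .AND. F$GETDVI("$132$DUA1:","EXISTS") THEN GOTO MOUNT_SHADOW
$ TRIES = TRIES + 1
$ IF TRIES .GT. 12 THEN GOTO MOUNT_SHADOW   ! stop waiting and try the mount anyway
$ WAIT 00:00:05
$ GOTO WAIT_FOR_MEMBERS
$MOUNT_SHADOW:
$ MOUNT/SYSTEM DSA0: /SHADOW=($131$DUA1,$132$DUA1) SHADOWDATA SHADOWDATA
```

The trade-off is up to about a minute of extra boot time when a member is genuinely missing. After that, MOUNT simply runs as before.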
Next, we're going to copy the needed files to the CLUSTER$CONFIG directory:
```
$ COPY SYS$SYSTEM:SYSUAF.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:SYSUAFALT.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:SYSALF.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:RIGHTSLIST.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:NETPROXY.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:NET$PROXY.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:NETOBJECT.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:NETNODE_REMOTE.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:LMF$LICENSE.LDB SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:VMSMAIL_PROFILE.DATA SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:VMS$OBJECTS.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$MANAGER:VMS$AUDIT_SERVER.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:VMS$PASSWORD_HISTORY.DATA SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$LIBRARY:VMS$PASSWORD_DICTIONARY.DATA SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$MANAGER:NETNODE_UPDATE.COM SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$LIBRARY:VMS$PASSWORD_POLICY.EXE SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:LAN$NODE_DATABASE.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$STARTUP:SYLOGIN.TEMPLATE SHADOWDATA:[CLUSTER$CONFIG]SYLOGIN.COM
```
Now, don't be alarmed if not all commands succeed; this is normal. Not all files will already exist on your system, in which case you'll get an error like the following:
```
$ COPY SYS$SYSTEM:NETPROXY.DAT SHADOWDATA:[CLUSTER$CONFIG]
%COPY-E-OPENIN, error opening SYS$COMMON:[SYSEXE]NETPROXY.DAT; as input
-RMS-E-FNF, file not found
```
You can safely ignore this; if the file doesn't exist, there's nothing to copy. You'll also notice some files being locked and thus not copyable:
```
$ COPY SYS$SYSTEM:VMS$OBJECTS.DAT SHADOWDATA:[CLUSTER$CONFIG]
%COPY-E-OPENIN, error opening SYS$COMMON:[SYSEXE]VMS$OBJECTS.DAT;1 as input
-RMS-E-FLK, file currently locked by another user
```
In this case, you'll have to use a sneaky workaround (thanks, Hoff!) to get the contents of this file anyway:
```
$ CONVERT/SHARE SYS$SYSTEM:VMS$OBJECTS.DAT SHADOWDATA:[CLUSTER$CONFIG]VMS$OBJECTS.DAT
```
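Instead of typing each COPY by hand and sorting through the errors, you could script the copy step. This sketch uses F$SEARCH to skip files that don't exist, and SET NOON so that a locked file doesn't abort the procedure. The file list is deliberately short; extend it as needed. Locked files such as VMS$OBJECTS.DAT will still fail the COPY and need the CONVERT/SHARE treatment afterwards.

```dcl
$! Sketch: copy each listed file only if it exists.
$ SET NOON                               ! keep going if one COPY fails (e.g. a locked file)
$ DEST = "SHADOWDATA:[CLUSTER$CONFIG]"
$ FILES = "SYS$SYSTEM:SYSUAF.DAT,SYS$SYSTEM:RIGHTSLIST.DAT,SYS$SYSTEM:NETPROXY.DAT"
$ I = 0
$NEXT_FILE:
$ F = F$ELEMENT(I, ",", FILES)
$ IF F .EQS. "," THEN GOTO ALL_DONE      ! F$ELEMENT returns the delimiter past the end
$ I = I + 1
$ FOUND = F$SEARCH(F)
$ IF FOUND .EQS. "" THEN WRITE SYS$OUTPUT "Skipping ", F, " (not found)"
$ IF FOUND .NES. "" THEN COPY 'F' 'DEST'
$ GOTO NEXT_FILE
$ALL_DONE:
```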
Once all files have either been copied or found not to exist, all that remains is to modify SYLOGICALS.COM so that it actually points to the files on the shadowed disk. Open the file, locate the MOUNT command you added earlier, and add the following lines underneath it:
```
$ DEFINE/SYSTEM/EXECUTIVE SYSUAF SHADOWDATA:[CLUSTER$CONFIG]SYSUAF.DAT
$ DEFINE/SYSTEM/EXECUTIVE SYSUAFALT SHADOWDATA:[CLUSTER$CONFIG]SYSUAFALT.DAT
$ DEFINE/SYSTEM/EXECUTIVE SYSALF SHADOWDATA:[CLUSTER$CONFIG]SYSALF.DAT
$ DEFINE/SYSTEM/EXECUTIVE RIGHTSLIST SHADOWDATA:[CLUSTER$CONFIG]RIGHTSLIST.DAT
$ DEFINE/SYSTEM/EXECUTIVE NETPROXY SHADOWDATA:[CLUSTER$CONFIG]NETPROXY.DAT
$ DEFINE/SYSTEM/EXECUTIVE NET$PROXY SHADOWDATA:[CLUSTER$CONFIG]NET$PROXY.DAT
$ DEFINE/SYSTEM/EXECUTIVE NETOBJECT SHADOWDATA:[CLUSTER$CONFIG]NETOBJECT.DAT
$ DEFINE/SYSTEM/EXECUTIVE NETNODE_REMOTE SHADOWDATA:[CLUSTER$CONFIG]NETNODE_REMOTE.DAT
$ DEFINE/SYSTEM/EXECUTIVE LMF$LICENSE SHADOWDATA:[CLUSTER$CONFIG]LMF$LICENSE.LDB
$ DEFINE/SYSTEM/EXECUTIVE VMSMAIL_PROFILE SHADOWDATA:[CLUSTER$CONFIG]VMSMAIL_PROFILE.DATA
$ DEFINE/SYSTEM/EXECUTIVE VMS$OBJECTS SHADOWDATA:[CLUSTER$CONFIG]VMS$OBJECTS.DAT
$ DEFINE/SYSTEM/EXECUTIVE VMS$AUDIT_SERVER SHADOWDATA:[CLUSTER$CONFIG]VMS$AUDIT_SERVER.DAT
$ DEFINE/SYSTEM/EXECUTIVE VMS$PASSWORD_HISTORY SHADOWDATA:[CLUSTER$CONFIG]VMS$PASSWORD_HISTORY.DATA
$ DEFINE/SYSTEM/EXECUTIVE VMS$PASSWORD_DICTIONARY SHADOWDATA:[CLUSTER$CONFIG]VMS$PASSWORD_DICTIONARY.DATA
$ DEFINE/SYSTEM/EXECUTIVE NETNODE_UPDATE SHADOWDATA:[CLUSTER$CONFIG]NETNODE_UPDATE.COM
$ DEFINE/SYSTEM/EXECUTIVE VMS$PASSWORD_POLICY SHADOWDATA:[CLUSTER$CONFIG]VMS$PASSWORD_POLICY.EXE
$ DEFINE/SYSTEM/EXECUTIVE LAN$NODE_DATABASE SHADOWDATA:[CLUSTER$CONFIG]LAN$NODE_DATABASE.DAT
```
Save SYLOGICALS.COM. As you may have noticed, I've copied one extra file to the CLUSTER$CONFIG directory: SYLOGIN.COM, created from SYS$STARTUP:SYLOGIN.TEMPLATE. This is basically the "/etc/profile" of VMS. The commands in it are executed for every DCL login, whether on the local console or over telnet, as SYSTEM or as a regular user. Point the SYS$SYLOGIN logical name at it in SYSTARTUP_VMS.COM (again, on both nodes):
```
$ DEFINE/SYSTEM/EXECUTIVE SYS$SYLOGIN SHADOWDATA:[CLUSTER$CONFIG]SYLOGIN.COM
```
This does mean that the file itself must be readable by all users, and also that the directory must be accessible:
```
$ SET DEF SHADOWDATA:[000000]
$ SET PROTECTION=(S:RWED,O:RWED,G:RE,W:RE) SHADOWDATA:[CLUSTER$CONFIG]SYLOGIN.COM
$ SET PROTECTION=(S:RWED,O:RWED,G:RE,W:RE) CLUSTER$CONFIG.DIR
```
Do note the funky syntax for the "root" directory of a volume: it is called [000000]. We're granting Group (G:) and World (W:) Read and Execute access on both the directory and the file. Edit SYLOGIN.COM and add a VT100 terminal definition and a nicer custom prompt:
```
$ SET TERM/VT100
$ SET PROMPT="''F$GETSYI("DECNET_FULLNAME")'''F$GETJPI("","USERNAME")'$ "
```
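One caveat worth guarding against is that SYLOGIN.COM also runs for batch and network logins, where setting a terminal type makes no sense. A common pattern, shown here as a sketch, is to leave SYLOGIN early for anything that isn't an interactive session, using the F$MODE lexical function. The two lines after the guard are the author's own.

```dcl
$! Sketch: skip terminal setup for batch, network and other non-interactive jobs.
$ IF F$MODE() .NES. "INTERACTIVE" THEN EXIT
$ SET TERM/VT100
$ SET PROMPT="''F$GETSYI("DECNET_FULLNAME")'''F$GETJPI("","USERNAME")'$ "
```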
Reboot both nodes. Crossing fingers, you'll have a common UAF in a minute.
```
    Welcome to OpenVMS (TM) VAX Operating System, Version V7.3

Username: system
Password:
%LICENSE-I-NOLICENSE, no license is active for this software product
%LOGIN-S-LOGOPRCON, login allowed from OPA0:
    Welcome to OpenVMS (TM) VAX Operating System, Version V7.3 on node COGNAC
    Last interactive login on Friday, 27-JAN-2012 21:12
COGNAC::ALVER $
```
Wait, what? Yes, of course. The license database is one of the files that was moved to the cluster disk, and since we copied it from the master node, we'll have to re-apply all of the slave node's licenses. The easiest way is to FTP the license files over to the machine and execute them with @.
```
COGNAC::ALVER $ @ COGNAC-VMS.TXT
COGNAC::ALVER $ @ COGNAC-PAK.TXT
COGNAC::ALVER $ LICENSE MODIFY/INCLUDE=COGNAC VAX-VMS
%LICENSE-W-AMBIG, information provided was ambiguous; multiple licenses were found for VAX-VMS
```
This last error shows that, since the cluster's license database now holds two licenses for each product, you'll need to be more specific about which license you want to MODIFY:
```
COGNAC::ALVER $ LICENSE MODIFY/INCLUDE=COGNAC /AUTHORIZATION=DECUS-BEL-006150000-170XXXX VAX-VMS
COGNAC::ALVER $ LICENSE LOAD
```
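You can read the authorization number needed for /AUTHORIZATION from the license database itself. A minimal sketch follows. LICENSE LIST/FULL shows every VAX-VMS PAK with its authorization number and any node restrictions. SHOW LICENSE shows what is actually loaded on the node you're logged in to.

```dcl
$! Full details of every VAX-VMS PAK, including authorization number and include list.
$ LICENSE LIST/FULL VAX-VMS
$! Licenses currently loaded on this node.
$ SHOW LICENSE
```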
Now, let's check the terminal type:
```
COGNAC::ALVER $ SHOW TERMINAL
Terminal: _OPA0:      Device_Type: VT100         Owner: SYSTEM

   Input:     300      LFfill:  0      Width: 132      Parity: None
   Output:    300      CRfill:  0      Page:   24

Terminal Characteristics:
   Interactive        Echo               Type_ahead         No Escape
   No Hostsync        TTsync             Lowercase          Tab
   Wrap               Scope              No Remote          No Eightbit
   Broadcast          No Readsync        No Form            Fulldup
   No Modem           No Local_echo      No Autobaud        No Hangup
   No Brdcstmbx       No DMA             No Altypeahd       Set_speed
   No Commsync        Line Editing       Overstrike editing No Fallback
   No Dialup          No Secure server   No Disconnect      No Pasthru
   No Syspassword     No SIXEL Graphics  No Soft Characters No Printer Port
   Numeric Keypad     ANSI_CRT           No Regis           No Block_mode
   Advanced_video     No Edit_mode       DEC_CRT            No DEC_CRT2
   No DEC_CRT3        No DEC_CRT4        No DEC_CRT5        No Ansi_Color
   VMS Style Input
```
Yup: VT100. And that pretty much concludes it. Two-node VMS cluster, running in emulated SIMH VAXes, on a single Linux host, with shadowed disk and common UAF.
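Before calling it done, a short command procedure can confirm, on each node, where the cluster-common logical names point and whether the target files exist. The name list is only a sample. A "file not present" result is expected for files that never existed on your system in the first place, such as NETPROXY.DAT in the COPY example earlier.

```dcl
$! Sketch: report the system-table translation of each logical and check the file.
$ NAMES = "SYSUAF,RIGHTSLIST,NETPROXY,VMSMAIL_PROFILE,LMF$LICENSE,SYS$SYLOGIN"
$ I = 0
$NEXT_NAME:
$ N = F$ELEMENT(I, ",", NAMES)
$ IF N .EQS. "," THEN EXIT
$ I = I + 1
$ T = F$TRNLNM(N, "LNM$SYSTEM_TABLE")
$ STATUS = "not defined"
$ IF T .EQS. "" THEN GOTO REPORT
$ STATUS = "file not present"
$ IF F$SEARCH(T) .NES. "" THEN STATUS = "ok"
$REPORT:
$ WRITE SYS$OUTPUT N, " -> ", T, " (", STATUS, ")"
$ GOTO NEXT_NAME
```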
Some notes
If you want to expand your cluster and shadowset to three nodes, there are a few things to keep in mind:
- EXPECTED_VOTES has to be adjusted to 3 on all nodes
- VOTES has to be adjusted to 1 on the original 'slave' node
- DSA0 needs to be expanded with an extra shadow member
- SYLOGICALS.COM needs to be adjusted on the original two nodes to reflect the extra shadow member
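The steps above might look roughly like this. The third node's allocation class 133 and its disk $133$DUA1 are hypothetical. With every node holding one vote, quorum becomes (3 + 2) / 2 = 2, so the cluster survives the loss of any single node. The new member is prepared the same way as the first two and then added to the already mounted shadowset, which triggers a full copy onto it.

```dcl
$! Sketch for a HYPOTHETICAL third node with ALLOCLASS 133 and a local DUA1.
$ QUORUM = (3 + 2) / 2    ! EXPECTED_VOTES=3, integer division
$ WRITE SYS$OUTPUT "Three voting nodes: quorum = ", QUORUM
$! Prepare the new member like the first two (this erases $133$DUA1:!),
$! then add it to DSA0: from a node that has the shadowset mounted.
$ INIT/ERASE $133$DUA1: SHADOWDATA
$ MOUNT/SYSTEM DSA0: /SHADOW=$133$DUA1: SHADOWDATA
$! The boot-time MOUNT in SYLOGICALS.COM then lists all three members:
$!   MOUNT/SYSTEM DSA0: /SHADOW=($131$DUA1,$132$DUA1,$133$DUA1) SHADOWDATA SHADOWDATA
```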
Other than that... just try :-) and remember, if you think that what you're going to do might just screw up your entire setup... do a shutdown first, and take a plain file backup of the VMS-RQ* disks for each node. Easy, no?
Enjoy!
Credits & thanks
Even more than the earlier article, this second part of my project would, with 100% certainty, have ended in a giant bloody trainwreck if it weren't for the guidance and VMS voodoo of Steve "Hoff" Hoffman, the Deathrow VMS cluster people and everyone in #vms on irc.2600.net. Thanks!