Clustering OpenVMS on SIMH

If you followed the earlier article, you ended up with two OpenVMS 7.3 systems running in SIMH VAX emulator instances, networked over both DECnet and TCP/IP. Next up: clustering those two systems.

Creating the basic cluster

A two-node cluster is vulnerable: the only way to get a proper quorum in such a configuration is to use a quorum disk. Since we don't have shared storage for that (yes, we *could* fake it on the same host by sharing a disk file, but what if we want to cluster a node that's not local?), we'll opt for a less sexy approach: one node will be 'master', the other 'slave'. Only the master gets a vote, so if the master goes down, the slave has to wait until the master is back up again. This prevents the two nodes from ever running as two separate clusters, which guarantees cluster data integrity.
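To see why two voting nodes are a problem, it helps to work out the quorum arithmetic. The sketch below assumes the usual OpenVMS rule, quorum = (EXPECTED_VOTES + 2) / 2 rounded down, and ignores refinements such as quorum disks and dynamic quorum adjustment; the helper names are made up for illustration.

```python
# Minimal sketch of the OpenVMS quorum rule (simplified: no quorum disk,
# no dynamic recalculation). Helper names are hypothetical.


def quorum(expected_votes: int) -> int:
    """Votes needed to keep running: (EXPECTED_VOTES + 2) // 2."""
    return (expected_votes + 2) // 2


def has_quorum(votes: dict[str, int], expected_votes: int, alive: set[str]) -> bool:
    """True if the surviving nodes in `alive` hold at least quorum votes."""
    return sum(votes[node] for node in alive) >= quorum(expected_votes)


# Both nodes voting: EXPECTED_VOTES=2, so quorum is 2.
BOTH_VOTE = {"WHISKY": 1, "COGNAC": 1}
# Master/slave: only WHISKY votes, EXPECTED_VOTES=1, so quorum is 1.
MASTER_SLAVE = {"WHISKY": 1, "COGNAC": 0}

if __name__ == "__main__":
    for label, votes, expected in (("both vote", BOTH_VOTE, 2),
                                   ("master/slave", MASTER_SLAVE, 1)):
        for alive in ({"WHISKY"}, {"COGNAC"}):
            state = "runs" if has_quorum(votes, expected, alive) else "hangs"
            print(f"{label}: only {next(iter(alive))} up -> {state}")
```

With both nodes voting, either node on its own hangs; with the master/slave split, WHISKY alone keeps running while COGNAC alone waits for it.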

The cluster configuration procedure has to be run on both nodes: first on the master, then on the slave.

$ @SYS$MANAGER:CLUSTER_CONFIG_LAN.COM

Cluster Configuration Procedure
Executing on a VAX System

DECnet Phase IV is installed on this node.

The LAN, not DECnet, will be used for MOP downline loading.

To ensure that this procedure is executing with the required
privileges, invoke it from the system manager's account.

Enter a "?" for help at any prompt.  If you are familiar with
the execution of this procedure, you may want to mute extra notes
and explanations by invoking it with "@CLUSTER_CONFIG_LAN BRIEF".

This VAX node is not currently a cluster member.

The procedure asks a few questions. Yes, you want to add your node to a cluster (or form a new one if none exists yet). Yes, the LAN will be used for cluster communications.

Write down the cluster group number and password: you will need them to authorize new nodes. To keep the organization clean and simple, we'll take "cluster group number" == "DECnet area", so in our case: 1. Our nodes won't be boot servers (unless you really want to; go ahead and try it), but they will be disk servers, without serving RFxx disks.

And thus we arrive at the dreaded ALLOCLASS parameter. Without going into too much detail: this parameter sets the node's allocation class, which is used to give devices unique names across the cluster. If WHISKY and COGNAC both have a DUA1 (and they do), we need this identifier to specify *which* DUA1 we are talking about. Again, to keep things clean and simple, I'm making the ALLOCLASS value equal to the DECnet node number, so WHISKY gets ALLOCLASS=131 and COGNAC gets 132. The devices thus become $131$DUA1 and $132$DUA1.
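The naming rule fits in a small function. This is a sketch of the convention as I understand it: with a nonzero ALLOCLASS (valid range 0 to 255), a served disk is known cluster-wide as $alloclass$device; with ALLOCLASS=0, the node name is used instead, as node$device. The function itself is hypothetical.

```python
def cluster_device_name(node: str, alloclass: int, device: str) -> str:
    """Cluster-wide name of a disk served by `node` (hypothetical helper).

    Nonzero ALLOCLASS -> "$<alloclass>$<device>", e.g. "$131$DUA1".
    ALLOCLASS 0       -> "<node>$<device>", e.g. "WHISKY$DUA1".
    """
    if not 0 <= alloclass <= 255:
        raise ValueError("ALLOCLASS must be between 0 and 255")
    if alloclass == 0:
        return f"{node}${device}"
    return f"${alloclass}${device}"


# ALLOCLASS chosen equal to the DECnet node number, as in the article.
ALLOCLASS = {"WHISKY": 131, "COGNAC": 132}

if __name__ == "__main__":
    for node, alloclass in ALLOCLASS.items():
        print(node, cluster_device_name(node, alloclass, "DUA1"))
```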

MAIN MENU

1. ADD WHISKY to existing cluster, or form a new cluster.
2. MAKE a directory structure for a new root on a system disk.
3. DELETE a root from a system disk.
4. EXIT from this procedure.

Enter choice [1]:
Will the LAN be used for cluster communications (Y/N)? Y
Enter this cluster's group number: 1
Enter this cluster's password:
Re-enter this cluster's password for verification:
Will WHISKY be a boot server [Y]? N
Will WHISKY be a disk server (Y/N)? Y
Will WHISKY serve RFxx disks [N]?
Enter a value for WHISKY's ALLOCLASS parameter [0]: 131
Does this cluster contain a quorum disk [N]?

WARNING: WHISKY will be a voting cluster member. EXPECTED_VOTES for
this and every other cluster member should be adjusted at
a convenient time before a reboot. For complete instructions,
check the section on configuring a cluster in the "OpenVMS
Cluster Systems" manual.

The procedure sets a bunch of parameters, and we don't want to keep all of them, so when it is done we tell it NOT to run AUTOGEN yet.

Execute AUTOGEN to compute the SYSGEN parameters for your configuration
and reboot WHISKY with the new parameters. This is necessary before
WHISKY can become a cluster member.

Do you want to run AUTOGEN now [Y]? N

Add or change the following parameters on the master node:

$ EDIT SYS$SYSTEM:MODPARAMS.DAT

VOTES=1
EXPECTED_VOTES=1
SHADOWING=2
SHADOW_MAX_COPY=128

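On node COGNAC, add the same lines, but with VOTES=0. VOTES is the number of votes a node contributes, and EXPECTED_VOTES is the total number of votes the cluster expects, from which the quorum is calculated: in this cluster, WHISKY holds the only vote. SHADOWING=2 enables host-based volume shadowing, and SHADOW_MAX_COPY=128 allows up to 128 shadow copy or merge operations to run in parallel on a node (we can spare the I/O on our fast host server, right?). Once both nodes have their proper parameters, run AUTOGEN on each node and let it reboot: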

$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT NOFEEDBACK
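Since the two nodes differ only in VOTES, their MODPARAMS.DAT additions can be generated from one table, which also makes it easy to check that EXPECTED_VOTES matches the total number of votes. This is a hypothetical helper script, not something VMS provides; it just prints the lines to paste into each node's SYS$SYSTEM:MODPARAMS.DAT (the "!" header lines are MODPARAMS comments).

```python
# Settings shared by both nodes (values from the article).
COMMON = {"EXPECTED_VOTES": 1, "SHADOWING": 2, "SHADOW_MAX_COPY": 128}
# Per-node votes: WHISKY is the master, COGNAC the slave.
NODE_VOTES = {"WHISKY": 1, "COGNAC": 0}


def modparams_fragment(node: str) -> str:
    """Return the MODPARAMS.DAT lines for `node`, VOTES first."""
    params = {"VOTES": NODE_VOTES[node], **COMMON}
    return "\n".join(f"{name}={value}" for name, value in params.items())


# Sanity check: EXPECTED_VOTES should equal the sum of all nodes' votes.
assert sum(NODE_VOTES.values()) == COMMON["EXPECTED_VOTES"]

if __name__ == "__main__":
    for node in NODE_VOTES:
        print(f"! {node}")
        print(modparams_fragment(node))
```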

Once both machines are back up, lo and behold!

$ SHOW CLUSTER

View of Cluster from system ID 1155  node: WHISKY
┌───────────────────┬─────────┐
│      SYSTEMS      │ MEMBERS │
├────────┬──────────┼─────────┤
│  NODE  │ SOFTWARE │  STATUS │
├────────┼──────────┼─────────┤
│ WHISKY │ VMS V7.3 │ MEMBER  │
│ COGNAC │ VMS V7.3 │ MEMBER  │
└────────┴──────────┴─────────┘
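The system ID in that header is not random. On a node running DECnet Phase IV, the SCS system ID is derived from the DECnet address as area * 1024 + node number. WHISKY is 1.131 here (area 1, as chosen for the cluster group number, and node 131, matching its ALLOCLASS), which gives exactly the 1155 shown above. A quick check, with a hypothetical helper name:

```python
def decnet_system_id(area: int, node: int) -> int:
    """SCS system ID for DECnet Phase IV address area.node."""
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("Phase IV areas are 1-63, node numbers 1-1023")
    return area * 1024 + node


if __name__ == "__main__":
    print("WHISKY", decnet_system_id(1, 131))  # 1 * 1024 + 131
    print("COGNAC", decnet_system_id(1, 132))  # 1 * 1024 + 132
```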

Disk shadowing

One of the fun things about VMS clusters is that any node can mount any disk, regardless of where in the cluster that disk is located, and that a disk can be replicated between nodes. This is called disk shadowing, and it's a handy way to make sure your data doesn't die with your machine if something breaks.

Since we have a DUA1 on each of the nodes, we'll set up the shadow set between those.

$ INIT/ERASE $131$DUA1: SHADOWDATA
$ INIT/ERASE $132$DUA1: SHADOWDATA
$ MOUNT/SYSTEM/CLUSTER DSA0: /SHADOW=($131$DUA1,$132$DUA1) SHADOWDATA SHADOWDATA
%WBM-I-WBMINFO Deleting all bitmaps represented by this WBMB.
%MOUNT-I-MOUNTED, SHADOWDATA mounted on _DSA0:
%MOUNT-I-SHDWMEMSUCC, _$131$DUA1: (WHISKY) is now a valid member of the shadow set
%MOUNT-I-SHDWMEMCOPY, _$132$DUA1: (COGNAC) added to the shadow set with a copy operation

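The commands speak for themselves: first, we initialise both disks with volume label SHADOWDATA, using the extra /ERASE option to wipe them; then, we mount them on all systems in the cluster as shadow set DSA0:, with $131$DUA1 and $132$DUA1 as members. On the MOUNT line, the first SHADOWDATA is the volume label and the second one is the logical name we'll use to refer to the disk.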

A few seconds later, OPCOM announces that the shadow copy has started; if you check the device now, you can see its progress.

%%%%%%%%%%%  OPCOM  27-JAN-2012 20:03:07.93  %%%%%%%%%%%    (from node COGNAC at 27-JAN-2012 20:02:46.41)
Message from user SYSTEM on COGNAC
%SHADOW_SERVER-I-SSRVINICPY, initiating copy operation on _DSA0: at LBN: 0, I/O size: 127 blocks, ID number: 4500004B.

$ SHOW DEVICE SHADOWDATA

Device                  Device           Error    Volume         Free  Trans Mnt
Name                    Status           Count     Label        Blocks Count Cnt
DSA0:                   Mounted              0  SHADOWDATA     2375847     1   2
$131$DUA1:    (WHISKY)  ShadowSetMember      0  (member of DSA0:)
$132$DUA1:    (COGNAC)  ShadowCopying        0  (copy trgt DSA0:   4% copied)
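If you'd rather watch the copy from a script than by retyping SHOW DEVICE, the progress can be pulled out of the command's output. The sketch below is hypothetical: it assumes you have captured the SHOW DEVICE text somehow (for example from a terminal session log) and that the lines look like the listing above.

```python
import re

# Matches e.g. "$132$DUA1:    (COGNAC)  ShadowCopying   0  (copy trgt DSA0:   4% copied)"
COPYING = re.compile(r"^(\S+):\s+\(\w+\)\s+ShadowCopying\b.*?(\d+)% copied")


def copy_progress(show_device_output: str) -> dict[str, int]:
    """Map each shadow set member that is still copying to its percentage."""
    progress = {}
    for line in show_device_output.splitlines():
        match = COPYING.search(line)
        if match:
            progress[match.group(1)] = int(match.group(2))
    return progress


# Captured output, copied from the listing above.
SAMPLE = """\
DSA0:                   Mounted              0  SHADOWDATA     2375847     1   2
$131$DUA1:    (WHISKY)  ShadowSetMember      0  (member of DSA0:)
$132$DUA1:    (COGNAC)  ShadowCopying        0  (copy trgt DSA0:   4% copied)
"""

if __name__ == "__main__":
    print(copy_progress(SAMPLE))
```

Once the copy completes, the target shows up as a ShadowSetMember and the function returns an empty dict.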

And that's that. Anything you create, delete or modify on SHADOWDATA is immediately visible on the other node.

The last step: shared management

Now that we have a cluster and a cluster-wide shared disk, there's only one last thing to do. Each node still uses its own, separate databases for virtually everything: its own user authorization file (UAF), rights list, DECnet databases, ... This means that every management action on the cluster has to be repeated on every node! That includes adding users, changing passwords, deleting users, ...

The solution is to put the files that hold the records for all these facilities on a common disk, like our shadow set. But since these files are needed very early in the boot process, doing a MOUNT in SYSTARTUP_VMS.COM won't do: we have to hook into a file that is run much earlier, SYLOGICALS.COM. Create a directory on the SHADOWDATA volume to hold the cluster-wide files, and edit SYLOGICALS.COM:

$ CREATE/DIRECTORY SHADOWDATA:[CLUSTER$CONFIG]
$ EDIT SYS$MANAGER:SYLOGICALS.COM

Yes, I always wanted to create a directory with a funky $ in its name :-) In SYLOGICALS.COM, search for the following line:

$! DEFINE/SYSTEM/EXECUTIVE SYSUAF                      SYS$SYSTEM:SYSUAF.DAT

As you can see, it's commented out; this line shows the default location of SYSUAF.DAT. Below it, the other cluster-wide files are listed with their default locations. Our cluster disk has to be mounted before we redefine these locations, so put the following line before that list of DEFINEs:

$ MOUNT/SYSTEM DSA0: /SHADOW=($131$DUA1,$132$DUA1) SHADOWDATA SHADOWDATA

Notice that we're NOT using the /CLUSTER qualifier here! Every node mounts this disk independently, to avoid complex issues if the node issuing the /CLUSTER mount is delayed during boot. Save the file (remember to do this on both nodes!) and reboot the entire cluster.

$ REBOOT

If all goes well, the shadow set will be mounted on both nodes at boot time.

$ SHOW DEV SHADOWDATA

Device                  Device           Error    Volume         Free  Trans Mnt
Name                    Status           Count     Label        Blocks Count Cnt
DSA0:                   Mounted              0  SHADOWDATA     2375847     1   2
$131$DUA1:    (WHISKY)  ShadowSetMember      0  (member of DSA0:)
$132$DUA1:    (COGNAC)  ShadowSetMember      0  (member of DSA0:)

Next, we're going to copy the needed files to the CLUSTER$CONFIG directory:

$ COPY SYS$SYSTEM:SYSUAF.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:SYSUAFALT.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:SYSALF.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:RIGHTSLIST.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:NETPROXY.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:NET$PROXY.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:NETOBJECT.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:NETNODE_REMOTE.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:LMF$LICENSE.LDB SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:VMSMAIL_PROFILE.DATA SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:VMS$OBJECTS.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$MANAGER:VMS$AUDIT_SERVER.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:VMS$PASSWORD_HISTORY.DATA SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$LIBRARY:VMS$PASSWORD_DICTIONARY.DATA SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$MANAGER:NETNODE_UPDATE.COM SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$LIBRARY:VMS$PASSWORD_POLICY.EXE SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$SYSTEM:LAN$NODE_DATABASE.DAT SHADOWDATA:[CLUSTER$CONFIG]
$ COPY SYS$STARTUP:SYLOGIN.TEMPLATE SHADOWDATA:[CLUSTER$CONFIG]SYLOGIN.COM

Now, don't be alarmed if not all commands succeed; this is normal. Not all files will already exist on your system, in which case you'll get an error like the following:

$ COPY SYS$SYSTEM:NETPROXY.DAT SHADOWDATA:[CLUSTER$CONFIG]
%COPY-E-OPENIN, error opening SYS$COMMON:[SYSEXE]NETPROXY.DAT; as input
-RMS-E-FNF, file not found

You can safely ignore this: if the file doesn't exist, there's nothing to copy. You'll also notice that some files are locked and thus can't be copied:

$ COPY SYS$SYSTEM:VMS$OBJECTS.DAT SHADOWDATA:[CLUSTER$CONFIG]
%COPY-E-OPENIN, error opening SYS$COMMON:[SYSEXE]VMS$OBJECTS.DAT;1 as input
-RMS-E-FLK, file currently locked by another user

In this case, you'll have to use a sneaky workaround (thanks, Hoff!) to get the contents of this file anyway:

$ CONVERT/SHARE SYS$SYSTEM:VMS$OBJECTS.DAT SHADOWDATA:[CLUSTER$CONFIG]VMS$OBJECTS.DAT
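The rule of thumb from the last two transcripts fits in a few lines. The sketch assumes you feed it the messages COPY printed for one file; the function name and the returned action labels are made up for illustration.

```python
def next_step(copy_messages: str) -> str:
    """Decide what to do with one file after trying to COPY it."""
    if "-RMS-E-FNF" in copy_messages:
        return "skip"          # file not found: nothing to move
    if "-RMS-E-FLK" in copy_messages:
        return "convert"       # file locked: retry with CONVERT/SHARE
    if "%COPY-E-" in copy_messages:
        return "investigate"   # any other COPY error needs a closer look
    return "done"              # no error reported: the copy succeeded


# Messages copied from the transcripts above.
FNF = """%COPY-E-OPENIN, error opening SYS$COMMON:[SYSEXE]NETPROXY.DAT; as input
-RMS-E-FNF, file not found"""
FLK = """%COPY-E-OPENIN, error opening SYS$COMMON:[SYSEXE]VMS$OBJECTS.DAT;1 as input
-RMS-E-FLK, file currently locked by another user"""

if __name__ == "__main__":
    print(next_step(FNF), next_step(FLK), next_step(""))
```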

Once all files are either copied or found not to exist, all that remains to do is to modify SYLOGICALS.COM to actually point to the files on the shadowed disk. Open the file, locate the MOUNT command that you added earlier on, and add the following lines underneath it:

$ DEFINE/SYSTEM/EXECUTIVE SYSUAF          SHADOWDATA:[CLUSTER$CONFIG]SYSUAF.DAT
$ DEFINE/SYSTEM/EXECUTIVE SYSUAFALT       SHADOWDATA:[CLUSTER$CONFIG]SYSUAFALT.DAT
$ DEFINE/SYSTEM/EXECUTIVE SYSALF          SHADOWDATA:[CLUSTER$CONFIG]SYSALF.DAT
$ DEFINE/SYSTEM/EXECUTIVE RIGHTSLIST      SHADOWDATA:[CLUSTER$CONFIG]RIGHTSLIST.DAT
$ DEFINE/SYSTEM/EXECUTIVE NETPROXY        SHADOWDATA:[CLUSTER$CONFIG]NETPROXY.DAT
$ DEFINE/SYSTEM/EXECUTIVE NET$PROXY       SHADOWDATA:[CLUSTER$CONFIG]NET$PROXY.DAT
$ DEFINE/SYSTEM/EXECUTIVE NETOBJECT       SHADOWDATA:[CLUSTER$CONFIG]NETOBJECT.DAT
$ DEFINE/SYSTEM/EXECUTIVE NETNODE_REMOTE  SHADOWDATA:[CLUSTER$CONFIG]NETNODE_REMOTE.DAT
$ DEFINE/SYSTEM/EXECUTIVE LMF$LICENSE     SHADOWDATA:[CLUSTER$CONFIG]LMF$LICENSE.LDB
$ DEFINE/SYSTEM/EXECUTIVE VMSMAIL_PROFILE SHADOWDATA:[CLUSTER$CONFIG]VMSMAIL_PROFILE.DATA
$ DEFINE/SYSTEM/EXECUTIVE VMS$OBJECTS     SHADOWDATA:[CLUSTER$CONFIG]VMS$OBJECTS.DAT
$ DEFINE/SYSTEM/EXECUTIVE VMS$AUDIT_SERVER SHADOWDATA:[CLUSTER$CONFIG]VMS$AUDIT_SERVER.DAT
$ DEFINE/SYSTEM/EXECUTIVE VMS$PASSWORD_HISTORY SHADOWDATA:[CLUSTER$CONFIG]VMS$PASSWORD_HISTORY.DATA
$ DEFINE/SYSTEM/EXECUTIVE VMS$PASSWORD_DICTIONARY SHADOWDATA:[CLUSTER$CONFIG]VMS$PASSWORD_DICTIONARY.DATA
$ DEFINE/SYSTEM/EXECUTIVE NETNODE_UPDATE  SHADOWDATA:[CLUSTER$CONFIG]NETNODE_UPDATE.COM
$ DEFINE/SYSTEM/EXECUTIVE VMS$PASSWORD_POLICY SHADOWDATA:[CLUSTER$CONFIG]VMS$PASSWORD_POLICY.EXE
$ DEFINE/SYSTEM/EXECUTIVE LAN$NODE_DATABASE SHADOWDATA:[CLUSTER$CONFIG]LAN$NODE_DATABASE.DAT
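The COPY list and this DEFINE list have to stay in sync, and with seventeen files a typo in either one is easy to miss. One way to avoid that is to generate both from a single table. This is a hypothetical helper script, run on any machine with Python, and not part of the original procedure; the file names and source directories are the ones used in this article.

```python
# Target directory on the shadow set (from the article).
TARGET = "SHADOWDATA:[CLUSTER$CONFIG]"

# (logical name, source directory, file name), as used in this article.
FILES = [
    ("SYSUAF", "SYS$SYSTEM", "SYSUAF.DAT"),
    ("SYSUAFALT", "SYS$SYSTEM", "SYSUAFALT.DAT"),
    ("SYSALF", "SYS$SYSTEM", "SYSALF.DAT"),
    ("RIGHTSLIST", "SYS$SYSTEM", "RIGHTSLIST.DAT"),
    ("NETPROXY", "SYS$SYSTEM", "NETPROXY.DAT"),
    ("NET$PROXY", "SYS$SYSTEM", "NET$PROXY.DAT"),
    ("NETOBJECT", "SYS$SYSTEM", "NETOBJECT.DAT"),
    ("NETNODE_REMOTE", "SYS$SYSTEM", "NETNODE_REMOTE.DAT"),
    ("LMF$LICENSE", "SYS$SYSTEM", "LMF$LICENSE.LDB"),
    ("VMSMAIL_PROFILE", "SYS$SYSTEM", "VMSMAIL_PROFILE.DATA"),
    ("VMS$OBJECTS", "SYS$SYSTEM", "VMS$OBJECTS.DAT"),
    ("VMS$AUDIT_SERVER", "SYS$MANAGER", "VMS$AUDIT_SERVER.DAT"),
    ("VMS$PASSWORD_HISTORY", "SYS$SYSTEM", "VMS$PASSWORD_HISTORY.DATA"),
    ("VMS$PASSWORD_DICTIONARY", "SYS$LIBRARY", "VMS$PASSWORD_DICTIONARY.DATA"),
    ("NETNODE_UPDATE", "SYS$MANAGER", "NETNODE_UPDATE.COM"),
    ("VMS$PASSWORD_POLICY", "SYS$LIBRARY", "VMS$PASSWORD_POLICY.EXE"),
    ("LAN$NODE_DATABASE", "SYS$SYSTEM", "LAN$NODE_DATABASE.DAT"),
]


def copy_commands() -> list[str]:
    """COPY lines that move each file to the shadow set."""
    return [f"$ COPY {src}:{name} {TARGET}" for _, src, name in FILES]


def define_commands() -> list[str]:
    """DEFINE lines for SYLOGICALS.COM, pointing each logical at its copy."""
    return [f"$ DEFINE/SYSTEM/EXECUTIVE {logical:<16} {TARGET}{name}"
            for logical, _, name in FILES]


if __name__ == "__main__":
    print("\n".join(copy_commands()))
    print()
    print("\n".join(define_commands()))
```

SYLOGIN.COM is deliberately left out of the table: it is copied from a template and located through SYS$SYLOGIN rather than through SYLOGICALS.COM.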

Save SYLOGICALS.COM. As you may have noticed, I've copied one extra file to the CLUSTER$CONFIG directory: SYLOGIN.COM. This is basically the "/etc/profile" of VMS: the commands in it are executed for every DCL login, be it on the local console or over telnet, as SYSTEM or as a regular user. The system finds it through the SYS$SYLOGIN logical name, which we define in SYSTARTUP_VMS.COM:

$ DEFINE/SYSTEM/EXECUTIVE SYS$SYLOGIN SHADOWDATA:[CLUSTER$CONFIG]SYLOGIN.COM

This does mean that the file itself must be readable by all users, and also that the directory must be accessible:

$ SET DEF SHADOWDATA:[000000]
$ SET PROTECTION=(S:RWED,O:RWED,G:RE,W:RE) SHADOWDATA:[CLUSTER$CONFIG]SYLOGIN.COM
$ SET PROTECTION=(S:RWED,O:RWED,G:RE,W:RE) CLUSTER$CONFIG.DIR
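A VMS protection code lists access per category (System, Owner, Group and World), each with any of Read, Write, Execute and Delete. For a directory, Execute lets a user look up a file by name, and Read lets them list its contents. The hypothetical parser below turns such a code into sets, which makes it easy to check that World can read SYLOGIN.COM and get into CLUSTER$CONFIG.DIR; it assumes the parenthesised, comma-separated form used above.

```python
def parse_protection(code: str) -> dict[str, set[str]]:
    """Parse e.g. "(S:RWED,O:RWED,G:RE,W:RE)" into {"S": {...}, ...}.

    A category listed without a colon (e.g. "W") gets no access.
    """
    access_by_category = {}
    for field in code.strip().strip("()").split(","):
        category, _, access = field.partition(":")
        # Only the first letter matters: "WORLD" and "W" are equivalent.
        access_by_category[category.strip().upper()[0]] = set(access.strip().upper())
    return access_by_category


if __name__ == "__main__":
    prot = parse_protection("(S:RWED,O:RWED,G:RE,W:RE)")
    print("World can read and look up:", {"R", "E"} <= prot["W"])
```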

Note the funky syntax to go to the top-level directory of a volume: this "root" directory is called [000000]. We're granting World (W:) Read and Execute access on both the directory and the file. Next, edit SYLOGIN.COM and add a VT100 terminal definition and a nicer custom prompt:

$ SET TERM/VT100 
$ SET PROMPT="''F$GETSYI("DECNET_FULLNAME")'''F$GETJPI("","USERNAME")'$ " 
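The prompt this produces will have a curious gap in it. F$GETJPI returns the username blank-padded to 12 characters, so the prompt string ends up built roughly like the sketch below. The node part is taken as "COGNAC::", which is what the transcripts in this article show; the function is a hypothetical model of the DCL string, not VMS code.

```python
def vms_prompt(node_part: str, username: str) -> str:
    """Model of the SET PROMPT string: node part, padded username, "$ "."""
    # F$GETJPI("","USERNAME") yields the name blank-padded to 12 characters.
    return f"{node_part}{username.upper().ljust(12)}$ "


if __name__ == "__main__":
    print(repr(vms_prompt("COGNAC::", "alver")))
```

If you'd rather not have the gap, wrapping the username lookup in F$EDIT(...,"TRIM"), which strips leading and trailing blanks, should remove it.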

Reboot both nodes. Fingers crossed, you'll have a common UAF in a minute.

Welcome to OpenVMS (TM) VAX Operating System, Version V7.3   

Username: system
Password:
%LICENSE-I-NOLICENSE, no license is active for this software product
%LOGIN-S-LOGOPRCON, login allowed from OPA0:
Welcome to OpenVMS (TM) VAX Operating System, Version V7.3 on node COGNAC
    Last interactive login on Friday, 27-JAN-2012 21:12
COGNAC::ALVER       $

Wait, what? Yes, of course: the license database is one of the files we moved to the cluster disk, and since we copied it from the master node, the slave node's licenses are missing and have to be re-applied. The easiest way is to FTP the license PAK command procedures over to the machine and execute them with @.

COGNAC::ALVER       $ @ COGNAC-VMS.TXT
COGNAC::ALVER       $ @ COGNAC-PAK.TXT
COGNAC::ALVER       $ LICENSE MODIFY/INCLUDE=COGNAC VAX-VMS
%LICENSE-W-AMBIG, information provided was ambiguous; multiple licenses were found for VAX-VMS

That last warning shows that, since there are now two licenses for every product in the shared license database, you need to be more specific about which license you want to MODIFY:

COGNAC::ALVER       $ LICENSE MODIFY/INCLUDE=COGNAC /AUTHORIZATION=DECUS-BEL-006150000-170XXXX VAX-VMS
COGNAC::ALVER       $ LICENSE LOAD

Now, let's check the terminal type:

COGNAC::ALVER       $ SHOW TERMINAL
Terminal: _OPA0:      Device_Type: VT100         Owner: SYSTEM

   Input:     300     LFfill:  0      Width: 132      Parity: None
   Output:    300     CRfill:  0      Page:   24     

Terminal Characteristics:
   Interactive        Echo               Type_ahead         No Escape
   No Hostsync        TTsync             Lowercase          Tab
   Wrap               Scope              No Remote          No Eightbit
   Broadcast          No Readsync        No Form            Fulldup
   No Modem           No Local_echo      No Autobaud        No Hangup
   No Brdcstmbx       No DMA             No Altypeahd       Set_speed
   No Commsync        Line Editing       Overstrike editing No Fallback
   No Dialup          No Secure server   No Disconnect      No Pasthru
   No Syspassword     No SIXEL Graphics  No Soft Characters No Printer Port
   Numeric Keypad     ANSI_CRT           No Regis           No Block_mode
   Advanced_video     No Edit_mode       DEC_CRT            No DEC_CRT2
   No DEC_CRT3        No DEC_CRT4        No DEC_CRT5        No Ansi_Color
   VMS Style Input

Yup: VT100. And that pretty much concludes it: a two-node VMS cluster, running on emulated SIMH VAXes on a single Linux host, with a shadowed disk and a common UAF.

Some notes

If you want to expand your cluster and shadow set to three nodes, there are a few things to keep in mind:
- EXPECTED_VOTES has to be adjusted to 3 on all nodes
- VOTES has to be adjusted to 1 on the original 'slave' node
- DSA0 needs to be expanded with an extra shadow member
- SYLOGICALS.COM needs to be adjusted on the original two nodes to reflect the extra shadow member
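With three voting nodes the quorum arithmetic finally works in your favour. Assuming the new node also gets one vote, and using the OpenVMS rule quorum = (EXPECTED_VOTES + 2) / 2 rounded down, EXPECTED_VOTES=3 gives a quorum of 2, so any single node can go down without the other two hanging. A quick enumeration, where THIRD is a hypothetical name for the new node:

```python
from itertools import combinations


def quorum(expected_votes: int) -> int:
    """Votes needed to keep running: (EXPECTED_VOTES + 2) // 2."""
    return (expected_votes + 2) // 2


# Every node votes once; THIRD is a hypothetical name for the new node.
VOTES = {"WHISKY": 1, "COGNAC": 1, "THIRD": 1}
EXPECTED_VOTES = 3


def surviving_sets() -> list[tuple[str, ...]]:
    """All groups of nodes that still hold quorum on their own."""
    needed = quorum(EXPECTED_VOTES)
    return [group
            for size in range(1, len(VOTES) + 1)
            for group in combinations(VOTES, size)
            if sum(VOTES[node] for node in group) >= needed]


if __name__ == "__main__":
    print("quorum:", quorum(EXPECTED_VOTES))
    for group in surviving_sets():
        print(" + ".join(group))
```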

Other than that... just try it :-) And remember: if you think that what you're about to do might screw up your entire setup, shut down first and take a plain file backup of the VMS-RQ* disk files for each node. Easy, no?

Enjoy!

Credits & thanks

Even more than the earlier article, this second part of my project would, with 100% certainty, have ended in a giant bloody trainwreck if it weren't for the guidance and VMS voodoo of Steve "Hoff" Hoffman, the Deathrow VMS cluster people and everyone in #vms on irc.2600.net. Thanks!