BXadmin:Network/Galaxy

From CCGB

Latest revision as of 15:36, 12 March 2012

Notes

Useful stuff:

  • /afs/bx.psu.edu/service/rancid/prod/var/bx/config/* contains the router configs that rancid pulls from the hardware; super useful.
  • Per the HP ProCurve 2510G config/admin guide, a port is set to allow jumbos when it is added to a vlan that supports jumbos, and set to disallow jumbos when it is removed from such a vlan. Three ports on procurve-8 were not allowing jumbos, and I had to move them to a different vlan and back to 270 to make them work.

Unexpected problems I encountered:

  • show run on the Cisco does not show you the commands used to create the vlans. Since int vlan id can be executed without ever having executed vlan id, the parts of show run relevant to your new vlan will look just like the parts for existing vlans, but your new vlan will not pass traffic. Only when you run show vlan will you realize that your new vlan does not actually exist (the rancid config dump above does run show vlan!). Moral: don't forget to run vlan id.
  • pbs_server and pbs_mom refused to communicate in the following scenario:
    • The server has interfaces on both the old and the new networks
    • The DNS name for the server points to the server's address on the old network
    • A mom is on the new network
    • Packets go: mom -> router -> pbs_server -> mom
    • TORQUE does not like this, so you have to put the server's new address in the mom's /etc/hosts until DNS changes.
  • Likewise with NFS: once DNS for the server changes, be aware of which source address packets from multihomed hosts will carry.
  • The LDAP server has ACLs that limit access to the Group OU but not to (the necessary attributes in) the People OU. So if you don't modify the ACLs, users work but groups don't.
  • RCC has a firewall that has to be updated if the PBS submit host (main) changes IPs or the NFS servers change.
  • You have to think really hard about how packets are going to route once you make certain changes, e.g. to DNS. And never forget that just because you send packets one way does not mean they'll come back the same way.
  • The only interface in the main-web1 zone with a defrouter specified was still not being used as *the* default route because:
    • bigsky had an address on the old public subnet
    • main-web1 had an address on both the old and new public subnets, and defrouter specified on the new network
    • Because main-web1 had an interface on the old subnet, it was still using that subnet's default route (since it was bigsky's default route).
    • Okay, even more problems: the routing table is completely global (shared-IP zones share the global zone's routing table), so because I removed bigsky's and main-db1's public IPs, they had a default route of 172.18.2.1. Since main-web1 also has a 172.18.2.0/25 interface, it was just picking one default route or the other, and so cyberstar connections were frequently failing. I re-added bigsky's and main-db1's public IPs so the 172.18.2.1 default route could go away (this is now possible since main-web1's 128.118.200.0/23 interface is down). Probably I should just get rid of the zones.
    • More on this:
      • https://blogs.oracle.com/stw/entry/solaris_zones_and_networking_common
      • https://blogs.oracle.com/stw/entry/guidelines_on_zones_with_shared
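The TORQUE and NFS problems above both come down to source-address selection on a multihomed host. The sketch below is a simplified model and does not reproduce the actual Solaris or TORQUE code. It assumes that when the server opens a connection to a mom, the route is chosen by longest-prefix match, the source address is the one on the chosen interface, and the mom compares that source with whatever the server's name resolves to. The interface names (old0, new0) and the addresses 128.118.200.10, 172.18.2.10 and 172.18.2.50 are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    iface: str
    src: ipaddress.IPv4Address  # source address used for packets sent on this route


def route(prefix, iface, src):
    return Route(ipaddress.ip_network(prefix), iface, ipaddress.ip_address(src))


def pick_route(dest, routes):
    """Longest-prefix match: the most specific route containing dest wins."""
    dest = ipaddress.ip_address(dest)
    matches = [r for r in routes if dest in r.prefix]
    if not matches:
        raise LookupError(f"no route to {dest}")
    return max(matches, key=lambda r: r.prefix.prefixlen)


# Hypothetical pbs_server with one leg on the old subnet and one on the new.
server_routes = [
    route("128.118.200.0/23", "old0", "128.118.200.10"),
    route("172.18.2.0/25", "new0", "172.18.2.10"),
    route("0.0.0.0/0", "old0", "128.118.200.10"),  # default route via the old network
]

mom = "172.18.2.50"  # hypothetical mom on the new network


def mom_accepts(server_name_resolves_to):
    """Model: the mom trusts the connection only if its source is the server's address."""
    src = pick_route(mom, server_routes).src
    return src == ipaddress.ip_address(server_name_resolves_to)


print(mom_accepts("128.118.200.10"))  # DNS still returns the old address
print(mom_accepts("172.18.2.10"))     # new address pinned in the mom's /etc/hosts
```

The server reaches the mom through its directly connected 172.18.2.0/25 interface, so in this model the connection carries 172.18.2.10 as its source and only the /etc/hosts override makes the two sides agree. Two equal 0.0.0.0/0 entries in one global table, as in the zones problem, tie on prefix length. Here max() simply takes the first of them, but the real kernel's choice is not something to rely on.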

Initial Configuration

Galaxy has its own subnets (public 128.118.250.0/27 and private 172.18.2.0/25); this is the configuration that was done to create them:

asa

ciscoasa(config)# access-list Outside_access_in extended permit ip any 128.118.250.0 255.255.255.224 
ciscoasa(config)# route Bioinformatics 128.118.250.0 255.255.255.224 172.28.90.18 1
ciscoasa(config)# nat (Bioinformatics) 1 172.18.0.0 255.255.240.0
ciscoasa(config)# route Bioinformatics 172.18.0.0 255.255.240.0 172.28.90.18 1
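The ASA and IOS commands give networks as address/netmask pairs, and the 3750 inbound access-list uses the wildcard form 172.18.0.0 0.0.15.255. A quick arithmetic check with Python's standard ipaddress module, which accepts either a netmask or a host (wildcard) mask after the slash, confirms three things. The NAT range and the ACL describe the same /20. The public route is a /27. The 172.18.2.0/25 private subnet falls inside the NATed range.

```python
import ipaddress


def to_network(address, mask):
    """Turn an IOS/ASA 'address mask' pair into a CIDR network.

    ipaddress accepts either a netmask (255.255.240.0) or a
    host/wildcard mask (0.0.15.255) after the slash.
    """
    return ipaddress.ip_network(f"{address}/{mask}")


galaxy_public = to_network("128.118.250.0", "255.255.255.224")  # ASA route/ACL
nat_range = to_network("172.18.0.0", "255.255.240.0")           # ASA nat/route
acl_range = to_network("172.18.0.0", "0.0.15.255")              # 3750 inbound ACL
galaxy_private = ipaddress.ip_network("172.18.2.0/25")

print(galaxy_public, galaxy_public.num_addresses)   # 128.118.250.0/27 32
print(nat_range, nat_range.broadcast_address)       # 172.18.0.0/20 172.18.15.255
print(acl_range == nat_range)                       # True
print(galaxy_private.subnet_of(nat_range))          # True
```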

switch-cisco-3750-1

switch-cisco-3750-1(config)#vlan 140
switch-cisco-3750-1(config-vlan)#name GALAXY_PUBLIC
switch-cisco-3750-1(config-vlan)#exit
switch-cisco-3750-1(config)#vlan 270
switch-cisco-3750-1(config-vlan)#name GALAXY_PRIVATE
switch-cisco-3750-1(config-vlan)#exit
switch-cisco-3750-1(config)#int vlan 140
switch-cisco-3750-1(config-if)#description GALAXY_PUBLIC
switch-cisco-3750-1(config-if)#no ip address
switch-cisco-3750-1(config-if)#exit
switch-cisco-3750-1(config)#int vlan 270
switch-cisco-3750-1(config-if)#description GALAXY_PRIVATE
switch-cisco-3750-1(config-if)#no ip address
switch-cisco-3750-1(config-if)#exit
switch-cisco-3750-1(config)#ip route 128.118.250.0 255.255.255.224 10.1.7.2
switch-cisco-3750-1(config)#ip route 172.18.2.0 255.255.255.128 10.1.7.2
Also, established connections to 172.18.0.0 0.0.15.255 had to be added to the inbound access-list.
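As the Notes describe, an int vlan SVI shows up in show run even if the vlan itself was never created. The following cross-check is a hedged sketch. It assumes you have saved the text of show run and show vlan brief, for example from the rancid dump, and that the IOS output looks like the abbreviated samples below. The samples are illustrative and were not captured from this switch.

```python
import re


def svi_ids(show_run):
    """VLAN IDs that have an 'interface VlanNNN' stanza in show run output."""
    return {int(v) for v in re.findall(r"^interface Vlan(\d+)\s*$", show_run, re.MULTILINE)}


def vlan_ids(show_vlan_brief):
    """VLAN IDs listed in show vlan brief; data rows start with the numeric ID."""
    return {int(v) for v in re.findall(r"^(\d+)\s+\S", show_vlan_brief, re.MULTILINE)}


def svis_missing_vlan(show_run, show_vlan_brief):
    """SVIs whose vlan does not exist: they look configured but will not pass traffic."""
    return sorted(svi_ids(show_run) - vlan_ids(show_vlan_brief))


# Illustrative samples: vlan 270 got an SVI but 'vlan 270' was never run.
SHOW_RUN = """\
interface Vlan140
 description GALAXY_PUBLIC
 no ip address
!
interface Vlan270
 description GALAXY_PRIVATE
 no ip address
!
"""

SHOW_VLAN_BRIEF = """\
VLAN Name                             Status    Ports
---- -------------------------------- --------- ------------------------------
1    default                          active    Gi1/0/1
140  GALAXY_PUBLIC                    active
"""

print(svis_missing_vlan(SHOW_RUN, SHOW_VLAN_BRIEF))  # [270]
```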

switch-dell-powerconnect-6248-1

switch-dell-powerconnect-1(config)#vlan database 
switch-dell-powerconnect-1(config-vlan)#vlan 140
Warning: The use of large numbers of VLANs or interfaces may cause significant
delays in applying the configuration.
switch-dell-powerconnect-1(config-vlan)#vlan 270
Warning: The use of large numbers of VLANs or interfaces may cause significant
delays in applying the configuration.
switch-dell-powerconnect-1(config-vlan)#exit
switch-dell-powerconnect-1(config)#interface vlan 140 
switch-dell-powerconnect-1(config-if-vlan140)#name "GALAXY_PUBLIC"
switch-dell-powerconnect-1(config-if-vlan140)#ip address 128.118.250.1 255.255.255.224
switch-dell-powerconnect-1(config-if-vlan140)#routing
switch-dell-powerconnect-1(config-if-vlan140)#no ip redirects
switch-dell-powerconnect-1(config-if-vlan140)#exit
switch-dell-powerconnect-1(config)#interface vlan 270
switch-dell-powerconnect-1(config-if-vlan270)#name "GALAXY_PRIVATE"
switch-dell-powerconnect-1(config-if-vlan270)#ip address 172.18.2.1 255.255.255.128
switch-dell-powerconnect-1(config-if-vlan270)#routing
switch-dell-powerconnect-1(config-if-vlan270)#no ip redirects
switch-dell-powerconnect-1(config-if-vlan270)#exit
switch-dell-powerconnect-1(config)#interface port-channel 2
switch-dell-powerconnect-1(config-if-ch2)#switchport general allowed vlan add 140,270 tagged
Warning: The use of large numbers of VLANs or interfaces may cause significant
delays in applying the configuration.
switch-dell-powerconnect-1(config-if-ch2)#exit
switch-dell-powerconnect-1(config)#interface port-channel 4
switch-dell-powerconnect-1(config-if-ch4)#switchport general allowed vlan add 140,270 tagged
Warning: The use of large numbers of VLANs or interfaces may cause significant
delays in applying the configuration.
switch-dell-powerconnect-1(config-if-ch4)#exit
switch-dell-powerconnect-1(config)#interface port-channel 1
switch-dell-powerconnect-1(config-if-ch1)#switchport general allowed vlan add 270 tagged
Warning: The use of large numbers of VLANs or interfaces may cause significant
delays in applying the configuration.

Also add vlans 140 and 270 (tagged) to the ports for bigsky, thumper, rochefort, westmalle, and orval.
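The two routed vlan interfaces above make the Dell the .1 gateway for each Galaxy subnet. The worked check below uses Python's ipaddress module to list the usable host range of each subnet. It also maps a host address to the vlan it belongs on, using the bigsky/main-web1 addresses from further down as examples.

```python
import ipaddress

# Gateway addresses configured on switch-dell-powerconnect-1 above.
GATEWAYS = {
    140: ipaddress.ip_interface("128.118.250.1/27"),  # GALAXY_PUBLIC
    270: ipaddress.ip_interface("172.18.2.1/25"),     # GALAXY_PRIVATE
}


def host_range(vid):
    """First usable host, last usable host and broadcast address for a vlan."""
    net = GATEWAYS[vid].network
    return net.network_address + 1, net.broadcast_address - 1, net.broadcast_address


def vlan_for(address):
    """Return the vlan whose subnet contains address, or None."""
    addr = ipaddress.ip_address(address)
    for vid, gw in GATEWAYS.items():
        if addr in gw.network:
            return vid
    return None


for vid in GATEWAYS:
    first, last, bcast = host_range(vid)
    print(vid, GATEWAYS[vid].network, first, last, bcast)
# 140 128.118.250.0/27 128.118.250.1 128.118.250.30 128.118.250.31
# 270 172.18.2.0/25 172.18.2.1 172.18.2.126 172.18.2.127

print(vlan_for("128.118.250.4"), vlan_for("172.18.2.20"), vlan_for("172.18.2.200"))
# 140 270 None
```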

switch-hp-procurve-8.net.bx.psu.edu

switch-hp-procurve-8(config)# vlan 270
switch-hp-procurve-8(vlan-270)# name GALAXY_PRIVATE
String GALAXY_PR... too long. Allowed length is 12.
switch-hp-procurve-8(vlan-270)# name GALAXY_PRIV
switch-hp-procurve-8(vlan-270)# tagged trk1
switch-hp-procurve-8(vlan-270)# exit
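The ProCurve rejected GALAXY_PRIVATE because, per the error above, its vlan names are limited to 12 characters, while the Cisco and Dell accepted the longer names. A trivial pre-check, assuming that 12-character limit, can flag names before they are typed on a ProCurve:

```python
PROCURVE_VLAN_NAME_MAX = 12  # from the "Allowed length is 12" error above


def too_long_for_procurve(names, limit=PROCURVE_VLAN_NAME_MAX):
    """Return {vid: (name, length)} for vlan names over the ProCurve limit."""
    return {vid: (name, len(name)) for vid, name in names.items() if len(name) > limit}


print(too_long_for_procurve({140: "GALAXY_PUBLIC", 270: "GALAXY_PRIVATE"}))
# {140: ('GALAXY_PUBLIC', 13), 270: ('GALAXY_PRIVATE', 14)}
print(too_long_for_procurve({270: "GALAXY_PRIV"}))
# {}
```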

bigsky

# touch /etc/hostname.aggr140001
# echo 'bigsky.g2.bx.psu.edu mtu 9000' > /etc/hostname.aggr270001
# cat /dev/null > /etc/hostname.aggr1
# ifconfig aggr140001 plumb
# ifconfig aggr270001 plumb
# zonecfg -z main-web1
zonecfg:main-web1> add net
zonecfg:main-web1:net> set physical=aggr140001
zonecfg:main-web1:net> set address=128.118.250.4/27
zonecfg:main-web1:net> end
zonecfg:main-web1> add net
zonecfg:main-web1:net> set physical=aggr270001
zonecfg:main-web1:net> set address=172.18.2.20/25
zonecfg:main-web1:net> end
zonecfg:main-web1> verify
zonecfg:main-web1> commit
zonecfg:main-web1> exit
# echo '172.18.2.0        255.255.255.128' >> /etc/netmasks
# ifconfig aggr270001 plumb 172.18.2.20 netmask + broadcast + up
# ifconfig aggr270001 addif 172.18.2.100/27 zone main-web1 up
# ifconfig aggr270001 addif 172.18.2.101/27 zone main-db1 up
# ifconfig aggr140001 addif 128.118.250.4/27 zone main-web1
# ifconfig aggr140001:1 up

See the Notes above for a discussion of the routing problems this setup caused.
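The aggr140001 and aggr270001 names follow the Solaris 10 naming convention for VLAN interfaces. The name is the driver name followed by a PPA of VLAN ID × 1000 + device instance, so vlan 140 on aggr1 is aggr140001. Below is a small sketch of that rule and its inverse. The parse helper assumes the driver name ends in a letter, as aggr and e1000g do.

```python
import re


def vlan_link_name(driver, instance, vid):
    """Solaris 10-style VLAN interface name: driver + (vid * 1000 + instance)."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= instance <= 999:
        raise ValueError("device instance must be below 1000 for this scheme")
    return f"{driver}{vid * 1000 + instance}"


def parse_vlan_link_name(name):
    """Inverse of vlan_link_name; assumes the driver name ends in a letter."""
    m = re.fullmatch(r"(.*[a-z])(\d+)", name)
    if m is None:
        raise ValueError(f"not a driver+PPA name: {name}")
    ppa = int(m.group(2))
    return m.group(1), ppa % 1000, ppa // 1000  # driver, instance, vid


print(vlan_link_name("aggr", 1, 140))      # aggr140001
print(vlan_link_name("aggr", 1, 270))      # aggr270001
print(parse_vlan_link_name("aggr140001"))  # ('aggr', 1, 140)
```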

frisell

Changed interfaces in ESXi, changed IPs in /etc/hosts, deleted the public IPs from test-db1, and set defrouter for test-db1.

rochefort/westmalle/orval

# dladm create-vlan -l aggr0 -v 140 vlan140
# dladm create-vlan -l aggr0 -v 270 vlan270
# ipadm create-if vlan140
# ipadm create-if vlan270
# ipadm create-addr -T static -a 128.118.250.XXX/27 vlan140/v4
# ipadm create-addr -T static -a 172.18.2.XXX/25 vlan270/v4
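On the Solaris 11 hosts, the same six commands repeat per host, with only the last octet changing. The sketch below generates them in the order used above. It assumes the data link is aggr0 on every host and that .1 on each subnet is reserved for the Dell gateway. The octet values in the usage lines are hypothetical.

```python
import ipaddress

# Galaxy subnets; .1 on each is the switch-dell-powerconnect-1 gateway.
SUBNETS = {
    140: ipaddress.ip_network("128.118.250.0/27"),
    270: ipaddress.ip_network("172.18.2.0/25"),
}


def host_address(vid, octet):
    """Address for a host octet on a vlan, rejecting network, gateway and broadcast."""
    net = SUBNETS[vid]
    addr = net.network_address + octet
    reserved = {net.network_address, net.network_address + 1, net.broadcast_address}
    if addr not in net or addr in reserved:
        raise ValueError(f"{addr} is not a usable host address in {net}")
    return addr


def vlan_commands(link, octets):
    """dladm/ipadm commands for each vlan in octets (vid -> last octet)."""
    vids = sorted(octets)
    cmds = [f"dladm create-vlan -l {link} -v {v} vlan{v}" for v in vids]
    cmds += [f"ipadm create-if vlan{v}" for v in vids]
    cmds += [
        f"ipadm create-addr -T static -a {host_address(v, octets[v])}/{SUBNETS[v].prefixlen} vlan{v}/v4"
        for v in vids
    ]
    return cmds


# Hypothetical octets, for illustration only.
for cmd in vlan_commands("aggr0", {140: 10, 270: 10}):
    print("#", cmd)
```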