Technical Details

All you need to know about irix: The Regional Internet Exchange for Kuching, Sarawak and Borneo.


irix Topology

irix is an internet exchange located at a single colocation facility in Kuching. The site is equipped with access devices that enable connections to the irix infrastructure. Colocation services for connected networks are available in the data centre facility and are offered separately from the internet services.


The current implementation of the irix peering platform uses an MPLS/VPLS infrastructure. This setup provides the resilience and scalability inherent to MPLS, while the interface towards members and customers remains the common shared Layer 2 Ethernet platform.

Networks connect with either Gigabit Ethernet (GE), 10 Gigabit Ethernet (10GE), 100 Gigabit Ethernet (100GE) or multiples of these on the access device. All connections are terminated on Extreme SLX routers.

Introductions

The current irix setup for layer 1 is visualised below.

irix Topology


Interface & Cabling Specification

Interface Type   Cable  Connector (2)  Wavelength  Max. Distance (1)  Tx (dBm) (4)   Rx (dBm)
1000BaseLX (3)   SM     SC/PC          1310 nm     5 km               -9.5 ... -3.0  -20.0 ... -3.0
10GBase-LR       SM     SC/PC          1310 nm     10 km              -8.0 ... 0.5   -14.4 ... -0.5
100GBase-LR4     SM     SC/PC          1310 nm     10 km              -4.5 ... 4.5   -8.6 ... 4.5

  1. This concerns the ‘customer cable’, i.e. the maximum distance between the customer equipment and the irix patch panel.
  2. The connector type depends on the housing site; certain MMRs contain panels with LC/PC connectors.
  3. Brocade/Foundry LX = Cisco LH.
  4. As measured at the irix patch panel.

Port Security
Network Loops

Loops are the greatest danger to any Ethernet network. Unless countermeasures are taken, a loop will instantly bring down any L2 network. For example, broadcast frames are looped back to the network, creating duplicates and loading the CPUs of all connected equipment. This, in turn, can lead to a self-sustaining broadcast storm, as each broadcast frame is received on all other ports and sent out once again.

Mitigation via Port Security

irix combats network loops with Layer 2 access control lists (L2 ACLs). This feature limits the number of MAC addresses that can be learned behind a port, and drops frames whose source MAC address is not one of the configured addresses.

Implementation

The irix Connection Agreement allows one router to be connected to each port sold to a member/customer. Only the customer’s MAC address is allowed on the port; frames with any other source MAC address are not allowed to enter the platform. In this way, L2 ACLs prevent several potentially crippling network loops from affecting the switching fabric.
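The port behaviour described above can be modelled as a per-port allow-list of source MAC addresses. The Python below is a minimal sketch under that assumption, not irix's actual switch configuration; `PortAcl`, `normalise_mac` and the example MAC address are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalise_mac(mac: str) -> str:
    """Accept aa:bb:.., AA-BB-.. or aabb.ccdd.eeff notation; return lowercase colon form."""
    digits = mac.lower().replace(":", "").replace("-", "").replace(".", "")
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


@dataclass
class PortAcl:
    """Hypothetical model of a per-port L2 ACL: only configured source MACs may enter."""

    allowed: set[str] = field(default_factory=set)
    dropped: int = 0

    def permit(self, mac: str) -> None:
        self.allowed.add(normalise_mac(mac))

    def revoke(self, mac: str) -> None:
        self.allowed.discard(normalise_mac(mac))

    def admit(self, src_mac: str) -> bool:
        """Return True if a frame with this source MAC may enter the platform."""
        if normalise_mac(src_mac) in self.allowed:
            return True
        self.dropped += 1  # the frame is dropped, as described above
        return False


if __name__ == "__main__":
    acl = PortAcl()
    acl.permit("00:1b:21:3a:4f:01")        # member router's MAC (example value)
    print(acl.admit("00-1B-21-3A-4F-01"))  # True: same address, different notation
    print(acl.admit("00:1b:21:3a:4f:02"))  # False: unknown source MAC is dropped
```

In this model, replacing a MAC address amounts to calling `permit()` for the new address and, once the new router is in service, `revoke()` for the old one; permitting both for a while mirrors the option of temporarily adding a second MAC address.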

MAC Address Changes

If a MAC address change is needed, you can replace the existing address, or temporarily add a second MAC address, via our web portal. We recommend doing this a few hours in advance, so the L2 ACLs can be updated in time. Should you need any assistance or have an emergency, you can always contact the irix NOC by email or telephone for immediate resolution.

Port Flap Dampening

In addition to port L2 ACLs, irix also implements port flap dampening on all customer facing interfaces. If a port transitions from an Up to a Down state and back more than three times in five seconds, the port is disabled. After ten seconds it is automatically re-enabled.
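The dampening rule can be pictured as a sliding-window counter. The sketch below assumes that one flap is a completed Up-Down-Up cycle recorded when the port comes back up, and that the five-second window is measured backwards from the latest flap; the switches may count transitions differently. `FlapDampener` and its parameter names are hypothetical; only the thresholds (three flaps, five seconds, ten seconds) come from the text.

```python
from __future__ import annotations

from collections import deque


class FlapDampener:
    """Hypothetical model of per-port flap dampening with the thresholds from the text."""

    def __init__(self, max_flaps: int = 3, window_s: float = 5.0,
                 hold_down_s: float = 10.0) -> None:
        self.max_flaps = max_flaps          # "more than three times" ...
        self.window_s = window_s            # ... "in five seconds"
        self.hold_down_s = hold_down_s      # re-enabled "after ten seconds"
        self._flaps: deque[float] = deque()
        self._disabled_until: float | None = None

    def is_enabled(self, now: float) -> bool:
        """Report port state at time `now`, re-enabling it once the hold-down has expired."""
        if self._disabled_until is not None and now >= self._disabled_until:
            self._disabled_until = None
            self._flaps.clear()             # start counting afresh after re-enable
        return self._disabled_until is None

    def record_flap(self, now: float) -> bool:
        """Record one Up-Down-Up cycle at `now`; return True if the port stays enabled."""
        if not self.is_enabled(now):
            return False
        self._flaps.append(now)
        # Drop flaps that have slid out of the observation window.
        while now - self._flaps[0] > self.window_s:
            self._flaps.popleft()
        if len(self._flaps) > self.max_flaps:
            self._disabled_until = now + self.hold_down_s
            return False
        return True


if __name__ == "__main__":
    d = FlapDampener()
    print([d.record_flap(t) for t in (0.0, 1.0, 2.0, 3.0)])  # [True, True, True, False]
    print(d.is_enabled(12.9), d.is_enabled(13.0))            # False True
```

With these defaults, four flaps at 0, 1, 2 and 3 seconds disable the port until 13 seconds, while flaps spaced two seconds apart never exceed three inside any five-second window.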


Quality Statement

irix offers high quality interconnection services on a technologically advanced and highly resilient platform supported by a professional organisation. In practice this means we are offering carrier grade service levels. This Quality Statement does not have penalty schemes associated with it.

Service Demarcation

irix is responsible for the correct functioning of its distributed switching infrastructure. The irix service consists of delivering, operating and interconnecting member ports on our switches, including the service from these member ports up to and including the local irix patch panel. The customer is responsible for the cabling between the member’s router and the irix patch panel, arranged through the irix colocation facility or, in the case of a remote layer-2 connection, through a partner/carrier. The customer is always responsible for arranging their own BGP peering with other irix members and for the correct functioning of their own infrastructure, i.e. router equipment.



Service Delivery

The initial provisioning of an IXP customer connection (i.e. port on the IXP switching platform) will take a maximum of five (5) working days (Monday-Friday 9.00-18.00 CET).
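To turn the five-working-day maximum into a calendar date, count forward and skip weekends. This Python sketch assumes the request is received on a working day and ignores public holidays, which the text does not address; `add_working_days` is a hypothetical helper.

```python
from datetime import date, timedelta


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` working days (Monday-Friday) after `start`.

    Assumption: public holidays are ignored, since the text does not mention them.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


if __name__ == "__main__":
    print(add_working_days(date(2024, 3, 1), 5))  # 2024-03-08
```

For example, a port ordered on Friday 1 March 2024 would be provisioned by Friday 8 March 2024 at the latest under these assumptions.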

Upon completion of initial provisioning, the port will be placed in the quarantine VLAN. This allows the IXP customer to physically install/configure their router and other equipment at the housing location(s), finalise the cabling arrangements with the colocation or layer-2 service provider and subsequently verify basic (L1/L2 and ping) connectivity to the IXP platform.

Also, this stage of the process allows the irix NOC to verify that the IXP customer’s equipment is configured according to the IXP ‘Allowed Traffic’ rules. Once this is done and the irix NOC has concluded that the interface is ready for production, the interface is placed into the appropriate VLAN.


Connection changes:

  • For changes in the configuration without contractual implication, we schedule a provisioning time of 3 working days.
  • The customer can always indicate their own envisioned date of delivery of the IXP customer port, which irix will honour as much as possible.

Network & Service Availability

The aim of the irix Network Operations Centre is a network availability of at least 99.99%. irix considers both service interruption and service deterioration to be service failures.

Excluded from this definition are service failures due to:

  • Scheduled maintenance
  • Violations of irix regulations by members causing malfunctioning of the exchange
  • Force majeure

Service deterioration is defined as the platform not performing according to its set performance parameters.
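The 99.99% availability target can be translated into a downtime budget with simple arithmetic. This sketch assumes a 365-day year (and a 30-day month for the second figure) and counts only failures not covered by the exclusions above; it is illustrative, not a contractual calculation.

```python
def allowed_downtime_minutes(availability: float, period_days: float = 365.0) -> float:
    """Minutes of service failure permitted in `period_days` at the given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_days * 24 * 60


if __name__ == "__main__":
    print(f"per year:  {allowed_downtime_minutes(0.9999):.1f} min")      # per year:  52.6 min
    print(f"per month: {allowed_downtime_minutes(0.9999, 30):.1f} min")  # per month: 4.3 min
```

At 99.99%, this works out to roughly 52.6 minutes of unexcluded service failure per year, or about 4.3 minutes per 30-day month.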

Trouble Ticket Support

Our Network Operations Centre actively monitors the irix infrastructure 24 hours/day, 7 days/week. Problems can be reported to the irix NOC via email or telephone.

When a problem is reported, the irix NOC opens a trouble ticket and assigns an engineer to resolve the problem. The customer is kept informed of progress by email. In exceptional cases, e.g. when a customer cannot be reached via email because of the reported network failure, the NOC can agree to keep the customer’s staff up to date by phone instead.

In case of service failure (disruption or deterioration), we aim to resolve the problem within 4 hours of it being reported. Other issues or requests will be resolved as soon as possible. A ticket will not be closed without the customer’s consent.

In case a customer feels there is a need to escalate a problem, the request is relayed to our Chief Technical Officer.

All trouble tickets can be reviewed through the member portal on the irix website. In many cases, problems are discussed on our tech-l interactive mailing list to which the irix NOC and all customers’ technical contacts are (or can be) subscribed.

Maintenance

All equipment and software is maintained and supported by irix. 1st line support is available 24x7 by phone and email, with escalation to 2nd line if necessary.

Communication with technical support is conducted in English.

To ensure the required Quality of Service and facilitate continuous growth, the irix platform is maintained on a day-to-day basis and upgraded regularly. Such upgrades are always carried out during scheduled maintenance windows, between 00:00 and 06:00 local time.

Any scheduled maintenance is announced to the tech-l mailing list with at least 72 hours’ notice.
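A member who wants to check an announcement against these two rules could use something like the sketch below. It is not an irix tool: it assumes naive datetimes in the same local time zone and reads the window as starting no earlier than 00:00 and ending no later than 06:00 on the same day. `is_compliant_maintenance` is a hypothetical name.

```python
from datetime import datetime, time, timedelta

WINDOW_START = time(0, 0)         # maintenance window opens at 00:00 local time
WINDOW_END = time(6, 0)           # ... and closes at 06:00 local time
MIN_NOTICE = timedelta(hours=72)  # announced to tech-l at least 72 hours ahead


def is_compliant_maintenance(announced: datetime, start: datetime, end: datetime) -> bool:
    """True if [start, end] fits one 00:00-06:00 window and was announced >= 72 h ahead."""
    if end <= start:
        return False
    same_day = start.date() == end.date()
    in_window = WINDOW_START <= start.time() and end.time() <= WINDOW_END
    return same_day and in_window and (start - announced) >= MIN_NOTICE


if __name__ == "__main__":
    announced = datetime(2024, 5, 1, 10, 0)
    print(is_compliant_maintenance(announced, datetime(2024, 5, 5, 1, 0),
                                   datetime(2024, 5, 5, 4, 0)))  # True (87 h notice)
```

Emergency Scheduled Maintenance, described below, is by its nature not bound by the notice check.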

Scheduled maintenance is the time window during which work is being done to fix or improve the IXP platform, and as a result the platform may not be performing at the usual quality level.

In addition, equipment may occasionally need to be replaced immediately because of a hardware or software malfunction detected by the irix NOC. In such cases the replacement work may involve so-called Emergency Scheduled Maintenance, which will also be announced to the tech-l mailing list, although not necessarily well in advance. This is due to the immediate nature of the required repair activity and is always at the discretion of the irix technical team.


Allowed Traffic

Allowed Traffic Types on Unicast Peering LANs.

Important: The irix NOC reserves the right to disable ports that violate the rules below.

To ensure smooth operation of the irix infrastructure we impose a set of restrictions on what kind of traffic is allowed on the peering fabric. This page gives a summary of those restrictions. For more info, including hints on how to configure equipment, please see the irix Configuration Guide.

MAC Layer

Ethernet framing

The irix infrastructure is based on the Ethernet II (or “DIX Ethernet”) standard. This means that LLC/SNAP encapsulation (802.2) is not permitted. For more information on the differences, see the Ethernet FAQ, question 4.1.2.2.

Ethernet types

Frames forwarded to irix ports must have one of the following ethertypes:

  • 0x0800 - IPv4
  • 0x0806 - ARP
  • 0x86dd - IPv6
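The framing and ethertype rules can be expressed as a classifier on the 16-bit type/length field that follows the destination and source MAC addresses in an untagged frame. In Ethernet II framing this field carries an ethertype (0x0600 or above); values of 1500 (0x05DC) or below are an 802.3 length field, meaning an 802.2 LLC header follows. This Python sketch assumes untagged frames given as raw bytes without the FCS, and also applies the 1500-byte payload limit stated in the Configuration Guide; `check_frame` is hypothetical.

```python
import struct

ALLOWED_ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}
MAX_PAYLOAD = 1500  # irix carries payloads of up to 1500 bytes
HEADER_LEN = 14     # destination MAC (6) + source MAC (6) + type/length (2)


def check_frame(frame: bytes) -> str:
    """Classify an untagged Ethernet frame (FCS stripped) against the irix framing rules."""
    if len(frame) < HEADER_LEN:
        return "reject: truncated header"
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length <= 0x05DC:
        # 802.3 length field: an 802.2 LLC (or LLC/SNAP) header follows.
        return "reject: LLC encapsulation not permitted"
    if type_or_length < 0x0600:
        return "reject: undefined type/length value"
    if type_or_length not in ALLOWED_ETHERTYPES:
        return f"reject: ethertype 0x{type_or_length:04x} not allowed"
    if len(frame) - HEADER_LEN > MAX_PAYLOAD:
        return "reject: payload exceeds 1500 bytes"
    return f"accept: {ALLOWED_ETHERTYPES[type_or_length]}"


if __name__ == "__main__":
    macs = bytes(12)  # placeholder destination + source MAC addresses
    print(check_frame(macs + bytes.fromhex("0800") + bytes(46)))  # accept: IPv4
    print(check_frame(macs + bytes.fromhex("9000") + bytes(46)))  # reject: ethertype 0x9000 not allowed
```

The Cisco loopback keepalive frames discussed in the Configuration Guide (ethertype 0x9000) are an example of traffic this check rejects.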

One MAC address per connection

Frames forwarded to an individual irix port shall all have the same source MAC address.

No proxy ARP

Use of proxy ARP on the router’s interface to the Exchange is not allowed.

Unicast only

Frames forwarded to irix ports shall not be addressed to a multicast or broadcast MAC destination address except as follows:

  • broadcast ARP packets
  • multicast ICMPv6 Neighbour Discovery, Neighbour Solicitation, and MLD packets. Please note that this does not include Router Solicitation or Advertisement packets.
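The unicast-only rule mainly concerns the destination MAC address: group (multicast and broadcast) addresses have the least significant bit of the first octet set. The sketch below is an illustrative model, not irix's filter. It assumes the caller has already parsed the ethertype and, for IPv6, the ICMPv6 type, and it reads "Neighbour Discovery" as Neighbour Solicitation/Advertisement; the ICMPv6 type numbers come from RFC 4861 (ND) and RFC 2710/3810 (MLD). `non_unicast_allowed` is a hypothetical name.

```python
from typing import Optional

BROADCAST = bytes.fromhex("ffffffffffff")
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV6 = 0x86DD

# Assumed reading of the rule: Neighbour Solicitation (135) and Advertisement (136).
# Redirect (137) is left out because ICMP redirects are banned separately.
ICMPV6_ND = {135, 136}
# MLD query (130), v1 report (131), done (132), v2 report (143).
ICMPV6_MLD = {130, 131, 132, 143}
# Router Solicitation (133) and Router Advertisement (134) are deliberately absent.


def is_group_address(dst_mac: bytes) -> bool:
    """Multicast and broadcast MACs have the I/G bit (LSB of the first octet) set."""
    return bool(dst_mac[0] & 0x01)


def non_unicast_allowed(dst_mac: bytes, ethertype: int,
                        icmpv6_type: Optional[int] = None) -> bool:
    """Return True if the frame's destination is acceptable under the unicast-only rule."""
    if not is_group_address(dst_mac):
        return True                          # ordinary unicast
    if dst_mac == BROADCAST:
        return ethertype == ETHERTYPE_ARP    # only ARP may be broadcast
    if ethertype == ETHERTYPE_IPV6 and icmpv6_type is not None:
        return icmpv6_type in ICMPV6_ND | ICMPV6_MLD
    return False                             # any other multicast


if __name__ == "__main__":
    print(non_unicast_allowed(BROADCAST, ETHERTYPE_ARP))                             # True
    print(non_unicast_allowed(bytes.fromhex("333300000001"), ETHERTYPE_IPV6, 134))  # False
```

Frames sent to the well-known multicast addresses used by spanning tree (01:80:c2:00:00:00) or CDP (01:00:0c:cc:cc:cc) fall through to the final `return False`, consistent with the ban on link-local protocols that follows.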

No link-local traffic

Traffic related to link-local protocols shall not be forwarded to irix ports. Link-local protocols include, but are not limited to, the following:

  • IRDP
  • ICMP redirects
  • IEEE 802 Spanning Tree
  • Vendor proprietary protocols
  • Discovery protocols: CDP, EDP, LLDP, etc.
  • VLAN/trunking protocols: VTP, DTP
  • Interior routing protocol broadcasts (e.g. OSPF, ISIS, IGRP, EIGRP)
  • BOOTP/DHCP
  • PIM-SM
  • PIM-DM
  • DVMRP
  • ICMPv6 ND-RA
  • UDLD
  • L2 Keepalives

The following link-local protocols are exceptions and are allowed:

  • ARP
  • IPv6 ND

IP Layer

No directed broadcast

IP packets addressed to the irix peering LAN’s directed broadcast address shall not be automatically forwarded to irix ports.

No-export of irix peering LAN

IP address space assigned to irix Peering LANs must not be advertised to other networks without explicit permission of irix.

Application Layer (TCP/IP Model)

Using application-layer protocols to carry out malicious actions against other irix customers over the irix infrastructure is forbidden. irix reserves the right to disable a customer’s port in case of complaints of attacks or abuse originating from that customer. The following list includes, but is not limited to, some well-known attacks that we forbid:

  • BGP hijacking
  • DNS amplification/flood
  • HTTP flood
  • NTP amplification
  • UDP flood
  • ICMP flood
  • Simple Service Discovery Protocol (SSDP) amplification

Did you experience or notice a customer abusing their irix connection for malicious actions?

Please get in touch to file a complaint providing information about:

  • The timestamp of the event
  • The type of the event
  • The related prefixes/ASNs
  • The parties involved
  • Any other relevant information providing appropriate context.

Typically, this information can be found in router logs, syslog servers, packet captures and BGP monitoring services, among other sources.

irix will investigate to confirm the complaint and take appropriate action.


Configuration Guide

How to set up your device when connecting to irix? Here are some pointers to start with.

This article will tell you how to prevent flaps from influencing your session and how to configure your interface towards irix to only send allowed traffic towards the exchange.

Introduction

irix operates as a shared Layer 2 (L2) Ethernet infrastructure. Large Ethernet LANs require that more or less everyone plays by the same set of rules. In other words, they can be quite sensitive to misbehaviour.

In order to improve the stability of the Exchange, irix has defined a set of rules to which every member’s connection must adhere.

It is not always easy to immediately grasp the subtleties of configuring equipment to adhere to the rules. Let us help you fill in some blanks and provide examples and hints for the most common equipment.


Definition of Terms

In this document we refer to terms like ‘L2 device’, ‘L2/L3 hybrid’, etc. Here are our definitions:

  • L2 Device
    A device that functions as a Layer 2 (Ethernet) Bridge (a.k.a. ‘switch’, ‘bridge’, ‘hub’, etc).
  • L3 Device
    A device that functions as a L3 (IP) router only. This means it does not bridge any Ethernet frames between its interfaces. Such a device is typically called a ‘router’.
  • L2/L3 Hybrid
    A device that functions both as a L2 bridge and a L3 router. This means it can both bridge Ethernet frames between its interfaces as well as route IP traffic and participate in IP routing protocols. Foundry/Brocade, Force10 and Extreme are common examples of this type of device.

irix Topology

The irix network is built as a redundant (virtual) hub & spoke topology using Extreme Networks (formerly Brocade) switches.

Customers connecting at up to 1GE are directly connected to Extreme SLX switches, available at each location. One can connect with one or multiple GE ports via single-mode fiber. Fiber connections are supported using LX optics.

10GE customers connect to the irix platform via Extreme SLX switches. The 10G Ethernet access switches are locally available at each location and one can connect with either ER or LR optics.

General 10GE Specifics

Most vendors implement specific commands to ensure BGP ignores brief link flaps (see ‘10GE specifics’ in the respective vendor sections for Cisco, Force10, Foundry/Brocade and Juniper configurations). If your router platform does not support such a feature, we advise you to configure the equivalent of:

  • no bgp fast-external-fallover

This makes BGP ignore link flaps and wait for the BGP hold timer to expire before resetting sessions.

General Configuration Recommendations

This is what we recommend:

IPv4 ARP / IPv6 Neighbour Timeout

Each equipment vendor implements its own maximum ages for the IPv4 ARP and IPv6 neighbour caches. The values vary widely, and in at least one case (Linux) the timeout is not a constant.

Low ARP timeouts can lead to excessive ARP traffic, especially if the values are lower than the BGP KEEPALIVE interval. On the other hand, long timeouts can theoretically lead to longer downtime if you change equipment, since your peers still have the old MAC address in their ARP cache. With BGP this is unlikely to happen, because your router will start re-establishing BGP sessions as soon as it is back up, causing its peers to update their ARP caches as well.

We recommend setting the ARP cache timeout to at least two hours, preferably four (240 minutes). See the sections on specific equipment vendors for examples.

Peering LAN Prefix

The IPv4 prefix for the irix peering LAN (202.88.42.0/24) is part of AS131329, and is not supposed to be globally routable. This means the following:

  • Do not configure ‘network 202.88.42.0/24’ in your router’s BGP configuration (seriously, we have seen this happen!).
  • Do not redistribute the route, a supernet, or a more specific outside of your AS. We (AS131329) announce it with a no-export attribute, please honour it.

In short, you can take the view that the Peering LAN is a link-local address range and you may decide to not even redistribute it internally (but in that case you may want to set a static route for management access so you can troubleshoot peering, etc.).
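One way to honour this is an outbound filter that refuses anything overlapping the peering LAN: the /24 itself, a supernet or a more specific. The Python below uses the standard `ipaddress` module purely to show the logic; on a router you would express the same policy as a prefix-list or route policy. `may_announce` is a hypothetical name.

```python
import ipaddress

# The irix IPv4 peering LAN, as given in the text.
PEERING_LAN = ipaddress.ip_network("202.88.42.0/24")


def may_announce(prefix: str) -> bool:
    """False if `prefix` is the peering LAN, a supernet of it, or a more-specific inside it."""
    net = ipaddress.ip_network(prefix)  # strict: raises ValueError if host bits are set
    if net.version != PEERING_LAN.version:
        return True
    # Two CIDR blocks are either disjoint or nested, so overlaps() covers all three cases.
    return not net.overlaps(PEERING_LAN)


if __name__ == "__main__":
    for p in ("202.88.42.0/24", "202.88.0.0/16", "202.88.42.128/25", "203.0.113.0/24"):
        print(p, may_announce(p))
    # Only 203.0.113.0/24 prints True.
```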

BGP Routing

Please exchange only unicast routes over your BGP sessions in the ISP peering LAN. Exchanging multicast routes is useless since multicast traffic is not allowed on the (unicast) ISP peering LAN.

Allowed Traffic Types and Configurations

The Technical Specifications state the following:

There are only three ethertypes allowed:

  • 0x0800 - IPv4
  • 0x0806 - ARP
  • 0x86dd - IPv6

This implies Ethernet II (DIX) framing rather than 802.2, so no LLC/SNAP encapsulation!

Only one MAC address is allowed on a port, i.e. all frames sent towards irix must carry the same source MAC address.

The only non-unicast traffic allowed is:

  • Broadcast ARP
  • Multicast ICMPv6 Neighbour Discovery (ND) packets. (NOTE: this does not include Router Advertisement (ND-RA) packets!)

irix member equipment should only reply to ARP queries for IP addresses of their directly connected irix interface. In other words, proxy ARP is not allowed.

Traffic for link-local protocols is not allowed, except for ARP and IPv6 ND (see above).

IP packets addressed to the irix peering LAN’s directed broadcast address shall not be automatically forwarded to irix ports.

The speed and duplex settings of 10baseT and 100baseTX ports must be statically configured, i.e. auto-negotiation should be disabled.

The irix platform is designed to carry Ethernet frames with a payload of up to 1500 bytes. MTU settings must be configured accordingly.

Physical L2 Topology

The irix rules dictate that only one MAC address is allowed behind a port. This means that you have to be extremely careful when connecting a device that can act as a L2 device.

We allow only one MAC address because we allow no additional devices behind the irix ports. Extended L2 networks are not under the control of irix, but instabilities in a L2 network behind the irix switches can and typically do have a negative impact on the whole exchange. Forwarding loops and spanning tree topology changes are good examples of this. By enforcing the one-MAC-address-per-port rule, we effectively prevent forwarding loops and STP traffic from intermediate L2 devices.

In short, an intermediate L2 device may only bridge frames from the member’s router to the irix port (so we see only one MAC address) and should otherwise be completely invisible. No connected device should bridge frames from other devices onto irix, or talk STP on its irix interface.

- Connecting a L3 Device

The preferred way of connecting to irix is directly through a L3 device (router). This is your best chance of not leaking MAC addresses or STP traffic, and it greatly increases the stability of the network.


- Connecting Through a L2 Device

We neither recommend nor encourage connecting your router through a L2 device, but if you do so, keep the following in mind:

  • You must make absolutely sure that only traffic to/from your L3 router’s interface goes to/from the irix port.
  • You must make absolutely sure that all legitimate traffic to/from your L3 router’s interface goes to/from the irix port.
  • MLD snooping may block legitimate ICMPv6 neighbour solicitations.
  • You must disable spanning tree on your link to irix.


On all intermediate L2 devices, consider using explicitly defined port-based VLANs for production ports. It forces you to understand your topology and reduces the chances of a nasty surprise further down the road. In particular, we strongly recommend using a dedicated VLAN for the path from your router to irix.


On a L2/L3 hybrid device, it is a good idea to put the irix connected interface (untagged) in a separate (non-default) port-based VLAN without spanning tree and with no other ports in it. This is the best way to ensure that no traffic from other ports will be bridged onto the irix port.

Commonly Seen Illegal Traffic and Setup

Any traffic other than the types mentioned in the previous section is deemed to be illegal traffic. In this section we will list some of the more common types of violations we see at irix and give some arguments as to why it is considered unwanted.

- Multiple MAC addresses

Since irix operates on the principle of one router per port, there should be one MAC address visible behind each port. Some members connect through intermediate switches, or use a L2/L3 hybrid device. If these devices are not configured properly, they can cause forwarding loops, STP instabilities, and lots of unwanted traffic on the exchange. There is no excuse for these devices to leak traffic, and there is no need to talk STP on the link to irix. Hence, by enforcing the one-MAC-address rule, we also prevent these issues. Beware that this rule is enforced automatically, so if you leak traffic from another MAC address, your legitimate traffic may be blocked (depending on which MAC address the switch sees first)!

- Spanning Tree (STP)

This point is closely related to the previous point. The device(s) connected to the irix port are not allowed to be visible as L2 bridges. This means that they should not speak STP (spanning tree) or any other (proprietary) L2 specific protocol.

- Routing protocols: EIGRP, OSPF, RIP, IS-IS

The only routing protocol allowed on irix is BGP. There is no valid reason for interior routing protocols to appear on the shared medium. These protocols only cause unnecessary multicast and broadcast traffic.

- (Cisco) Keepalive

By default Cisco routers and switches periodically test their (Fast) Ethernet links by sending out Loopback frames (ethertype 0x9000) addressed to themselves. Call it a ‘L2 self-ping’ if you will. In a switched environment it can be used to test the functionality of the switch and/or keep the router’s MAC address in the switch’s address table.

In the irix environment, this is not useful, since we use MAC timeouts that are larger than the typical BGP and/or ARP timeouts. In fact, the keepalives may actually cause port security violations if they are being sent by an intermediate switch.

- Discovery protocols: CDP, EDP, LLDP

Various vendors (e.g. Extreme, Cisco) tend to ship their boxes as gregarious devices: by default they announce their existence out of all their interfaces and try to find family members. CDP (Cisco) and EDP (Extreme) are examples of this, but there are others.

The only reason for running discovery protocols is to support certain types of autoconfiguration. Autoconfiguration on an Internet Exchange is a very bad idea. Hence, there is absolutely no reason to run discovery protocols on your irix interface. Discovery protocols typically cause unwanted broadcast or multicast traffic.

- Non-unicast IPv4: IGMP, DHCP, TFTP

On the ISP peering LAN, the only non-unicast IPv4 traffic that is allowed is the ARP query.

Sometimes we see equipment trying to get a configuration through broadcast TFTP, or to configure itself through DHCP. These options are unsafe and we strongly advise against them.

Other equipment has IGMP turned on by default (or by accident). The Peering LAN is for unicast IP traffic only, so there is no point in configuring multicast on the irix interface.

- Proxy ARP

Since traffic over irix is exchanged based on BGP routes, there is no reason to answer ARP queries for any IP address other than those configured on your irix interface. Unfortunately, some vendors (e.g. Cisco) ship their products with proxy ARP enabled by default.

Proxy ARP is not only sloppy, it can also lead to unwanted traffic on your network. If you have it enabled at irix, it is likely to be enabled at other peering points too, allowing parties on both sides to use you as a transit. Proxy ARP is not allowed.

- Non-unicast IPv6: IPv6 ND-RA

IPv6 router advertisements are not allowed: they generate a lot of unnecessary traffic, since IPv6 hosts on irix are not autoconfigured and besides, you don’t want to be the default router for irix as a whole.

- Miscellaneous non-IP: DEC MOP, etc.

Some vendors enable protocols other than IP by default. Cisco, for example, ships certain versions of IOS with DEC MOP enabled by default. This is non-IP traffic and has no place on irix.

Cisco Configuration Hints

Cisco’s philosophy seems to be similar to that of some PC OS vendors: enable as many protocols and features as possible by default, so the device works out of the box in most situations. Unfortunately, this means that a lot of unnecessary features are turned on that, while harmless in LAN or corporate environments, can cause undesired traffic on an Internet Exchange.

Typical things that need to be disabled are: autoconfiguration protocols (DHCP, BOOTP, TFTP config download over the irix interface), CDP, DEC MOP, IP redirects, IP directed broadcasts, proxy ARP, IPv6 Router Advertisements and keepalives.

Intermediate switches or hybrid devices will also need to disable VTP, STP, etc.
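Before connecting, it can help to check an interface stanza against the list above. The following Python is a hypothetical pre-flight check, not a Cisco tool: it assumes the stanza is plain text copied from `show running-config interface ...`, and it only looks for interface-level "no ..." statements whose features are on by default, so global settings and platform defaults are out of scope. IP directed broadcasts are not checked because they have been disabled by default since IOS 12.0; `missing_statements` and `REQUIRED` are hypothetical names.

```python
from __future__ import annotations

# Interface-level statements this guide asks for. The IOS keyword is "mop enabled";
# the guide's "no mop enable" is an abbreviation, so both spellings are accepted.
REQUIRED: dict[str, tuple[str, ...]] = {
    "no ip redirects": ("no ip redirects",),
    "no ip proxy-arp": ("no ip proxy-arp",),
    "no cdp enable": ("no cdp enable",),
    "no keepalive": ("no keepalive",),
    "no mop enabled": ("no mop enabled", "no mop enable"),
}


def missing_statements(stanza: str) -> list[str]:
    """Return the required 'no ...' statements that do not appear in an interface stanza."""
    present = {
        line.strip()
        for line in stanza.splitlines()
        if line.strip() and not line.strip().startswith("!")  # skip blanks and comments
    }
    return [name for name, variants in REQUIRED.items()
            if not any(v in present for v in variants)]


if __name__ == "__main__":
    stanza = """interface GigabitEthernet0/1
 description irix peering LAN
 ip address 202.88.42.x 255.255.255.0
 no ip redirects
 no ip proxy-arp
 no cdp enable
"""
    print(missing_statements(stanza))  # ['no keepalive', 'no mop enabled']
```

Anything reported as missing should then be confirmed against the defaults of your platform and IOS version, as described in the vendor notes that follow.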



Global Config


! Do not run a DHCP server/relay agent 
no service dhcp
! Older IOS versions require this instead of the above. 
no ip bootp server
! Do not download configs through TFTP 
no service config
! Do not run CDP 
no cdp run				


Interface Config


! Don’t do redirects -- if they don’t know
! how to route properly, tough luck!
no ip redirects
! Don’t run proxy ARP on your irix interface
no ip proxy-arp
! Don’t run CDP on your irix interface
no cdp enable
! Directed broadcasts are evil.
no ip directed-broadcast
! Disable the DEC drek if you haven’t done so globally yet.
no mop enable
! For (Fast)Ethernet: no auto-negotiation on your connection.
! Set speed and duplex statically, e.g. on FastEthernet:
speed 100
duplex full
! On GigabitEthernet ports, use instead:
! no negotiation auto
! L2 keepalives are useless on the irix
no keepalive


Layer 2 Config

It is difficult to give a complete guide for Cisco products, because of the many different types of devices and (IOS) software versions. When in doubt, consult your documentation.

- 29xx and 35xx Series

If you use a Cisco Layer 2 device (such as the 2900 and 3500 series), you have to turn off VTP (VLAN Trunking Protocol), DTP (Dynamic Trunking Protocol), LLDP, and UDLD.

In global config mode:

vtp mode transparent
!
no spanning-tree vlan 1200
! If you don’t need LLDP, disable globally
no lldp run
! If you don’t need CDP, disable globally
no cdp run
!
vlan 1200
name irix
!
interface /IfIdent/
description Interface to irix
switchport access vlan 1200
switchport mode access
switchport nonegotiate
no keepalive
speed nonegotiate
no udld enable
! If CDP has not been disabled globally:
no cdp enable
! If LLDP has not been disabled globally:
no lldp receive
no lldp transmit
! If you do not want to shut off STP:
spanning-tree bpdufilter enable
end


- 7600 Series

Members are advised not to run 12.2(33)SRC on their Cisco 7600’s with a sup720. This software release does not always send or forward replies to solicit requests, even if it’s acting as a pure Layer 2 switch between a member router and the irix fabric.

To make a Cisco 7600 switch ‘silent’ the following configuration seems to work:

no service dhcp
no ip bootp server
vtp mode transparent
spanning-tree mode pvst
spanning-tree extend system-id
no spanning-tree vlan XX
!
vlan XX
name irix
exit
!
interface GigabitEthernet6/0/0
description to-irix
switchport
switchport access vlan XX
switchport mode access
switchport nonegotiate
no mls qos trust
no cdp enable
spanning-tree bpdufilter enable
exit
!

VLAN XX was also removed from the allowed VLAN list on all dot1q trunk ports not related to this setup (in this case, every dot1q trunk port in the chassis).


- Catalyst 6500 Series

CatOS and IOS are different beasts, so for Catalyst switches running CatOS, the following applies:

set vtp mode off
set port name /IfIdent/ My irix Port
set cdp disable /IfIdent/
set udld disable /IfIdent/
set trunk /IfIdent/ off dot1q
set spantree bpdu-filter /IfIdent/ enable
set vlan 1200 name My_irix_Vlan
set vlan 1200 /IfIdent/

If, for some reason, you cannot afford to turn off VTP globally, the only way to turn it off on individual ports seems to be by using l2pt:

set port l2protocol-tunnel /IfIdent/ vtp enable

Depending on your CatOS platform, you may or may not be able to do this.


- CRS (IOS-XR)

CDP, Proxy ARP, Directed Broadcast, Link Auto Negotiation, and ICMP redirects* are disabled by default in IOS-XR.

* ICMP redirect messages are disabled by default on the interface unless the Hot Standby Router Protocol (HSRP) is configured.

- Other Devices

For other devices, some or all of the above may apply. Check your documentation for details.

Cisco Aggregated Links

- Catalyst 6500 Series

Configure the port-channel mode as on or, should you want LACP, as active. Please do not configure the PAgP modes (auto or desirable), as the irix switches do not speak PAgP.

Load-balancing over four ports may result in an unequal distribution due to bug CSCsg80948. Here is an example configuration:

interface GigabitEthernet1/1
description irix Link 1
no ip address
no ip redirects
no ip proxy-arp
no keepalive
no cdp enable
channel-group 1 mode on
!
interface GigabitEthernet1/2
description irix Link 2
no ip address
no ip redirects
no ip proxy-arp
no keepalive
no cdp enable
channel-group 1 mode on
!
interface Port-channel1
description irix aggregated link
ip address 202.88.42.x 255.255.255.0
no ip redirects
no ip proxy-arp
no keepalive
!


Here are examples of LACP configurations:

Cisco IOS 65xx/76xx:
interface GigabitEthernet1/1
description irix Link 1
channel-group 10 mode active
! Requires 12.2(18)SXF2 or 12.2(33)SRC and later:
lacp rate fast
!
interface GigabitEthernet1/2
description irix Link 2
channel-group 10 mode active
!
interface Port-channel10
description irix aggregated link
no switchport
ip address 202.88.42.x 255.255.255.0
!


Cisco IOS-XR:


interface Bundle-Ether 10
description irix aggregated link
ipv4 address 202.88.42.x 255.255.255.0
!
interface GigabitEthernet 1/0/0/0
description irix Link 1
bundle id 10 mode active
! Requires IOS-XR 3.2 and later:
lacp period short
!
interface GigabitEthernet 1/0/1/0
description irix Link 2
bundle id 10 mode active
!
(don’t forget to commit)


Cisco NX-OS:


feature lacp
!
interface ethernet 2/1
description irix Link 1
channel-group 10 mode active
lacp rate fast
!
interface ethernet 2/2
description irix Link 2
channel-group 10 mode active
!
interface port-channel 10
description irix aggregated link
ip address 202.88.42.x 255.255.255.0
!


- GSR Series

Do not set a static MAC address on the Port-channel interface. This causes CEF inconsistencies and other assorted failures.

Link aggregation and IPv6 do not seem to play well together. Cisco advises against trying this.

Some changes will result in a different MAC address being chosen for the aggregated link (for example, reloading a linecard that contains the first port in the bundle). Because of port security on the irix switches, your ports will then stop passing traffic, and you will have to contact the irix NOC to fix this.

Some restrictions apply to what features are supported on link bundles (e.g. sampled NetFlow only on ISE/Engine4+; no uRPF). Also not all line cards support link bundling, and if traffic towards irix comes in on such an interface you will experience suboptimal load-balancing. Please see the Cisco documentation for more details.

Support for link bundling on Engine 5 linecards will come in 12.0(33)S.

Cisco Engineering have a special train called ‘Phase 3’ (lb-eft-ph3) that is purported to also provide functionality such as MAC address accounting for Port-Channel interfaces. This seems to have been integrated into 12.0(32)S, but IPv6 does not seem to be supported yet.

Below is a list of Cisco bug IDs (DDTS) related to link aggregation that you need to consider when choosing an appropriate IOS image:

CSCee27396: present in 12.0(26)S1; fixed in 12.0(26)S3, 12.0(27)S2, 12.0(28)S1, 12.0(30)S.
Symptoms: Over 90% CPU usage by CEF Scanner on all linecards and %TFIB-7-SCANSABORTED errors occur when configuring a link bundle. Also, the router sends traffic to MAC addresses taken from its ARP table seemingly at random, instead of to the appropriate next-hop’s MAC address.

CSCef12828: present in images that include the CSCee27396 fix; fixed in 12.0(26)S4, 12.0(27)S3, 12.0(28)S1, 12.0(30)S.
Symptoms: When forwarding transit traffic, the router blocks traffic for certain prefixes behind a port-channel link.

CSCdz33664: present in 12.0(25)S3, 12.0(26)S1, 12.0(27)S2, 12.0(28)S; fixed in 12.0(25)S4.
Symptoms: An HSRP state change on any Engine2 interface causes a microcode bundle flap on all other Engine2 linecards, which stops load balancing from working because the vanilla microcode gets loaded.

CSCee81071: present in 12.0(26)S3, 12.0(27)S2, 12.0(29)S.
Symptoms: Router sends Ethernet frames with a source MAC address of beef.f00d.beef and destination MAC address f00d.beef.f00d (which is the pattern scribbled in unallocated memory in linecards), with what looks to be a legitimate payload of transit traffic. This is one of the symptoms of CSCee27396.

CSCeb38014: present in 12.0(26)S5; fixed in 12.0(26)S5, 12.0(27)S.
Symptoms: The BGP Router process flushes the BGP tables for each peer when you change one neighbor’s description. This pegs the GRP CPU at 99% for quite a while.

CSCeg31951: present in 12.0(31)S; fixed in 12.0(31)S2 (CSCei53226).
Symptoms: IOS (at least in the PRP code) places each individual public peer in its own update-group if remove-private-as is configured on a peer. Needless to say, this scales badly for a router connected to an Internet exchange. (Try ‘show ip bgp replication’.)

A collection of hearsay follows for recent IOS images for the GSR PRP regarding link aggregation. irix does not run any GSRs, so please take this information with appropriately-sized grains of salt.

  • 12.0(24)S2 is not advisable (not many specifics are known, but they include CSCef89562 and CSCee33045)
  • 12.0(24)S6 boots, but load-balancing is completely off
  • 12.0(25)S until S3 have CSCdz33664
  • 12.0(26)S until S4 have CSCef89562, where Engine4+ linecards can have continuously flapping interfaces, but is also somewhat required for Quadra linecards
  • 12.0(26)S3 has CSCee27396 integrated but not CSCef12828, which leads to traffic blackholing
  • 12.0(27)S until S3 have CSCef89562 as well
  • 12.0(27)S1 has a problem where it sends traffic to random destinations
  • 12.0(27)S2 has CSCee27396 integrated but not CSCef12828
  • 12.0(27)S4 reportedly works reasonably well on PRP2s
  • 12.0(28)S1 has problems with Engine2 linecards (CSCef78098) and Engine4+ (CSCef89562)
  • 12.0(28)S2 reportedly works better, but still sometimes emits beef.f00d.beef frames on normal ports with only an IPv6 address configured
  • 12.0(30)S has only been observed to exhibit CSCef12828-like symptoms in conjunction with broken hardware, and also (sometimes) to still emit frames from MAC beef.f00d.beef.

Routers occasionally still send out frames with beef.f00d.beef as MAC source address on interfaces with an IPv6 but no IPv4 address configured, even on regular links.

Due to the large number of feature requests, there will be both a 12.0(32)S and a new 12.0(32)SY train.

You can check for incorrect next-hops by attaching to the linecard, executing show controllers rewrite and show adjacency internal, and comparing the rewrite strings for a given peer’s IPv4 address (suffix the commands with | begin 202.88.42.x). The first six bytes of the returned long hex string should be the peer’s MAC address, and they should be identical in every occurrence.

An example configuration follows:

!
interface Port-channel1
description irix Aggregated Link
ip address 202.88.42.x 255.255.255.0
no ip redirects
no ip directed-broadcast
no ip proxy-arp
channel-group minimum active 1
no channel-group bandwidth control-propagation
hold-queue 150 in
!
interface GigabitEthernet1/2/1
no keepalive
no negotiation auto
channel-group 1
no cdp enable
!
interface GigabitEthernet1/2/2
no keepalive
no negotiation auto
channel-group 1
no cdp enable
!

Specifying a hold-queue value is optional, but setting it to the number of ports in the aggregated link multiplied by 75 is advised.

show interfaces Port-channel 1 will display keepalives as enabled even though they are not; also, the BIA (burnt-in address, shown as 0000.0000.0000) can be ignored.
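The hold-queue rule of thumb above (75 per member port) is a one-line calculation. The sketch below only illustrates that arithmetic; the helper name `suggested_hold_queue` is hypothetical and not an IOS feature.

```python
def suggested_hold_queue(ports_in_bundle: int) -> int:
    """Advised 'hold-queue <n> in' value: 75 per member port of the bundle."""
    if ports_in_bundle < 1:
        raise ValueError("an aggregated link needs at least one port")
    return ports_in_bundle * 75


# The two-port Port-channel1 example above uses 'hold-queue 150 in'.
print(suggested_hold_queue(2))  # 150
```

For the two-port example this yields the 150 already used in the configuration; a four-port bundle would get 300.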

If you disable autonegotiation on Gigabit Ethernet ports, please contact us.


- CRS (IOS-XR)

interface Bundle-Ether1
description Aggregated interface to irix Peering LAN
ipv4 address 202.88.42.x 255.255.255.0
ipv6 nd suppress-ra
ipv6 address 2400:1560:6::A50a:bcde:1/64
ipv6 enable
bundle minimum-active links 1
!
interface TenGigE0/0/0/0
description interface to irix Peering LAN #1
bundle id 1 mode on
!
interface TenGigE0/0/0/1
description interface to irix Peering LAN #2
bundle id 1 mode on
!


Cisco 10GE Specifics

IOS supports no bgp fast-external-fallover and event dampening. The no bgp fast-external-fallover command tells the device not to act immediately on link flaps, but to wait for the BGP hold timer to expire before resetting sessions.

Newer versions of Cisco IOS even support ip bgp fast-external-fallover deny in a per-interface context. Note that in practice we have found that the previously advised carrier-delay does not work as expected on Cisco equipment. We suggest you disable fast-external-fallover instead.

In IOS-XR, to disable BGP fast external fallover globally, add bgp fast-external-fallover disable to your global BGP configuration.


IPv6 Config

! disable ICMPv6 multicast listener reports
no ipv6 mld router
! disable IPv6 multicast forwarding
no ipv6 mfib forwarding
! v6 ND-RA is unnecessary and undesired
ipv6 nd suppress-ra
! on IOS version 12.2(33)SRC it is the following syntax:
ipv6 nd ra suppress
! on later IOS/IOS-XE versions the “all” option is needed to also
! suppress responses to Router Solicitation messages besides periodic RAs:
ipv6 nd ra suppress all
! disable PIM on a specified interface
no ipv6 pim
! disable MLD snooping on hybrid devices and intermediate layer-2 devices
no ipv6 mld snooping


MTU Config

On newer Cisco IOS/IOS-XR versions, the interface IP MTU is automatically set, based on the presence or absence of 802.1q tags. For more details, please consult this document.

Extreme Networks Configuration Hints

CAUTION: Updating Firmware in an EAPS Environment

When updating firmware in an Extreme Networks EAPS environment, be sure to temporarily disable your irix port(s). TFTP file transfers may cause EAPS instabilities resulting in bogus traffic. This is likely to trip the port security on the irix switches, which may result in 10 minutes of downtime.

Most people who use Extreme equipment do not have problems with their irix connections, but some do. We would appreciate feedback from people running Extreme equipment on how they configure their irix-facing side.

If you are running Extreme equipment and would like to share your feedback, please contact us.


L2 Configuration

The configuration fragment below shows how to configure an intermediate L2 switch, which is also part of an EAPS ring. Port 1 is connected to the irix switch. Ports 2 and 3 are in the ring. The router is somewhere in that ring, in the ‘irix’ VLAN.

create vlan "ring"
configure vlan "ring" tag 1200 # VLAN-ID=0x4b0 Global Tag 3
configure vlan "ring" qosprofile "QP8"
configure vlan "ring" add port 2 tagged
configure vlan "ring" add port 3 tagged
create vlan "irix"
configure vlan "irix" tag 1700 # VLAN-ID=0x6a4 Global Tag 9
configure vlan "irix" add port 1 untagged
configure vlan "irix" add port 2 tagged
configure vlan "irix" add port 3 tagged
configure port 1 auto off speed 1000 duplex full
configure port 2 auto off speed 1000 duplex full
configure port 3 auto off speed 1000 duplex full
disable edp port 1
disable igmp snooping
disable igmp snooping with-proxy
create eaps "ring-eaps"
configure eaps "ring-eaps" mode transit
configure eaps "ring-eaps" primary port 2
configure eaps "ring-eaps" secondary port 3
configure eaps "ring-eaps" add control vlan "ring"
configure eaps "ring-eaps" add protect vlan "irix"
enable eaps "ring-eaps"


L3 Configuration

The configuration fragment below shows the relevant configuration information for an L3-only device. As in the previous example, port 1 is connected to irix and is configured in the ‘irix’ VLAN (untagged).

#
# Config information for VLAN irix.
#
create vlan "irix"
configure vlan "irix" tag 1200
configure vlan "irix" protocol "IP"
configure vlan "irix" ipaddress 202.88.42.x 255.255.255.0
configure vlan "irix" add port 1 untagged
#
configure port 1 display-string "irix"
disable edp port 1
#
enable ipforwarding vlan "irix"
disable ipforwarding broadcast vlan "irix"
disable ipforwarding fast-direct-broadcast vlan "irix"
disable ipforwarding ignore-broadcast vlan "irix"
disable ipforwarding lpm-routing vlan "irix"
disable isq vlan "irix"
disable irdp vlan "irix"
disable icmp unreachable vlan "irix"
disable icmp redirects vlan "irix"
disable icmp port-unreachables vlan "irix"
disable icmp time-exceeded vlan "irix"
disable icmp parameter-problem vlan "irix"
disable icmp timestamp vlan "irix"
disable icmp address-mask vlan "irix"
disable subvlan-proxy-arp "irix"
configure ip-mtu 1500 vlan "irix"
#
# IP Route Configuration
#
configure iproute add blackhole default
disable icmpforwarding vlan "irix"
disable igmp vlan "irix"


Force10 Configuration Hints

There isn’t much to configure on Force10 routers. The Network Operations Guide and various pages in the Team Cymru Document Collection provide useful information on Force10 router configuration and management.


Force10 10GE Specifics

Force10 E-Series switch/routers support no bgp fast-external-fallover, BGP Graceful Restart, and a link debounce timer to maintain BGP stability during topology switchovers.

The recommended option is to use the link debounce command to delay link change notifications on the interface. The default for fiber interfaces is 100 ms, which is a good value to use.

Foundry / Brocade Configuration Hints

The following fragment of configuration gives an idea of how to configure a Foundry (BigIron) device. Depending on the actual role of the device (router or switch between router and irix) and the type of code loaded into the device you may need to mix and match a little here.

! Define a single-port VLAN for the irix port
vlan number name "irix" by port
no spanning-tree
untagged ethernet if
! Configure the irix interface
interface ethernet if
port-name "irix"
! Behave as a router.
route-only
no spanning-tree
! Don’t do IPv6 ND-RA (Router Advertisements)
ipv6 nd suppress-ra
! No weird discovery proto, please.
no vlan-dynamic-discovery
! IP address
ip address 202.88.42.x 255.255.255.0
! No redirects
no ip redirect
no ipv6 redirect
! irix recommends 2 hour ARP timeouts
ip arp-age 120
! For fast-ethernet: no autoconfig.
speed-duplex 100-full

On a Foundry BigIron RX running software older than version 2.4, we and a customer noticed that the device had a very aggressive default setting for ICMPv6 ND queries for known MAC addresses: it retransmitted them every second. The retransmit interval can be altered in interface context as follows:

! Set the retransmit timer to 1 hour
ipv6 nd ns-retransmit 3600

Note: This command should not be confused with ‘ipv6 nd ns-interval’, which applies to ND queries for unresolved MAC addresses.

Foundry/Brocade Aggregated Links

BigIron JetCore-based switches support link aggregation only on adjacent ports. The first port must be odd-numbered, and the other port must directly follow the first one. The same goes for any additional pairs of ports in an aggregated link.

CAUTION: On BigIron 15000 switches you cannot build trunks with ports on blade 8, or trunks spanning ports on both sides of slot 8!

! Create an aggregate on a Jet-Core based switch
trunk server ethernet slot/port to slot/port+1
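The JetCore pairing rule described above can be checked before configuring a trunk. The following is a minimal sketch, assuming ports are given as (slot, port) tuples and that both ports of a pair must sit on the same slot; `valid_jetcore_trunk` is a hypothetical helper, and it does not model the BigIron 15000 slot 8 restriction.

```python
from typing import List, Tuple

Port = Tuple[int, int]  # (slot, port), e.g. (3, 1) for ethernet 3/1


def valid_jetcore_trunk(ports: List[Port]) -> bool:
    """Check the JetCore rule: pairs of adjacent ports, each pair starting on an odd port."""
    if not ports or len(ports) % 2 != 0:
        return False
    ordered = sorted(ports)
    # Walk the sorted list two ports at a time.
    for (slot_a, port_a), (slot_b, port_b) in zip(ordered[0::2], ordered[1::2]):
        if slot_a != slot_b or port_a % 2 != 1 or port_b != port_a + 1:
            return False
    return True


print(valid_jetcore_trunk([(3, 1), (3, 2)]))                  # True
print(valid_jetcore_trunk([(3, 2), (3, 3)]))                  # False: pair starts on an even port
print(valid_jetcore_trunk([(3, 1), (3, 2), (3, 3), (3, 4)]))  # True
```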

BigIron RX and NetIron MLX/XMR switches don’t restrict port placement for aggregated links: ports can be non-adjacent or even distributed over multiple blades. BigIron RX has a limit of 8 ports per aggregated link; NetIron MLX/XMR raise this to 16 in software 3.5.0 and to 32 in 3.8.0.

! Create an aggregate on a RX/MLX/XMR switch
trunk ethe slot/port to slot/port ethe otherslot/otherport to otherslot/otherport

As of RX software release 2.5.0 and MLX/XMR software release 3.9.0 the link aggregation syntax changed. The configuration now looks like:

! Create a LAG on a RX/MLX/XMR switch
lag "<NAME HERE>" static
ports ethernet #/# ethernet #/#
primary-port #/#
deploy
!

The primary-port is used as a single point of configuration. All configuration changes to the primary port are propagated to the other ports in the lag group.

The keyword ‘static’ designates a standard aggregated link. For an LACP-enabled link, use:

! Create a dynamic LAG on a RX/MLX/XMR switch
lag "<NAME HERE>" dynamic
ports ethernet #/# ethernet #/# <and so on>
primary-port #/#
lacp-timeout short
deploy
!

Foundry/Brocade 10GE Specifics

Foundry/Brocade supports a feature called BGP Graceful Restart that, if all peers support it, will reduce the impact of prefix flaps; however, the CPU will still have to re-establish any flapped BGP session before the configured interval passes.

The command delay-link-event can make the router ignore short link flaps (for example, in the case of a photonic switch swap). We recommend setting this to 20, which equals 1000 ms. The flap will then be logged in syslog, but higher-level protocols (BGP in this case) will be unaffected. We suggest leaving fast-external-fallover in its default state.
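The delay-link-event value is a count of timer units rather than milliseconds. Assuming, as the text above implies, that one unit is 50 ms (20 units = 1000 ms), the conversion can be sketched as follows; the helper names are hypothetical, and the unit size should be checked against the documentation for your software release.

```python
# Assumption derived from the text above: 20 units = 1000 ms, so one unit = 50 ms.
UNIT_MS = 50


def delay_link_event_units(target_ms: int) -> int:
    """Smallest delay-link-event value that covers at least target_ms."""
    if target_ms < 0:
        raise ValueError("delay must be non-negative")
    return -(-target_ms // UNIT_MS)  # integer ceiling division


def units_to_ms(units: int) -> int:
    """Delay in milliseconds for a given delay-link-event value."""
    return units * UNIT_MS


print(delay_link_event_units(1000))  # 20
print(units_to_ms(20))               # 1000
```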

HP Configuration Hints

Recommendations we received for HP ProCurve devices:

spanning-tree ifname bpdu-filter
spanning-tree ifname tcn-guard
lldp admin-status ifname disable


Juniper Configuration Hints

For Juniper routers, there isn’t much to disable. The Juniper Documents contain useful hints on how to set up your Juniper router.

CAUTION: IGMP Bug (PR/20343) in Junos OS versions up to 5.3R4!

There’s a bug in Junos OS versions up to 5.3R4 that causes a Juniper router to emit IGMP packets on all its interfaces, even when IGMP is disabled. The only way to stop your router from transmitting IGMP is to configure outgoing packet filters on your irix interface(s).

Unicast BGP Configuration

Make sure to exchange only unicast routes on the unicast ISP peering LAN by explicitly adding the following statement to all neighbors, groups and prefix-limits:

set family inet unicast

Be thorough with family inet unicast
If even one of the neighbours, groups or prefix-limits is defined with a family inet “any”, you’ll enable multicast and turn on MBGP.

Increasing interface hold-time (1200ms) to preserve BGP sessions during interface flaps

---
user@router# show interfaces xe-0/1/0
description "interface to irix Peering LAN";
hold-time up 1200 down 1200;
---


IPv4 ARP Cache Timeout

Juniper’s default ARP cache timeout is 20 minutes (by comparison: Cisco’s default ARP cache timeout is 4 hours, which fits irix’s relatively static environment much better).

To reduce the amount of unnecessary broadcast traffic, we recommend setting the ARP cache timeout on Juniper routers to 4 hours. A recipe for this follows:

> configure
Entering configuration mode
[edit]
you@juniper# edit system arp
[edit system arp]
you@juniper# set aging-timer 240
[edit system arp]
you@juniper# show | compare
[edit system arp]
+ aging-timer 240;
[edit system arp]
you@juniper# commit and-quit
commit complete
Exiting configuration mode

Since Junos 9.4 the ARP cache timeout is also configurable on an interface level:

[edit system arp aging-timer interface interface-name] aging-timer-minutes;

and on more recent versions of Junos that syntax has changed to:

[edit system arp interface interface-name] aging-timer aging-timer-minutes;


Juniper Aggregated Link

- M-Series

We have encountered no issues with aggregated links and Junos (M40, M160, T320). Junos releases prior to 6.0 required VLAN tagging on aggregated interfaces; this limitation has since been removed. An example configuration follows:

[edit]
niels@junix# show chassis
aggregated-devices {
ethernet {
device-count 1;
}

}
---
[edit]
niels@junix# show interfaces ge-2/1/0
gigether-options {
802.3ad ae0;
}
[edit]
niels@junix# show interfaces ge-3/1/0
gigether-options {
802.3ad ae0;
}
---
[edit]
niels@junix# show interfaces ae0
description "irix";
unit 0 {
family inet {
filter {
input irix-in;
output irix-out;
}
address 202.88.42.x/24;
}
family inet6 {
address 2400:1560:6::A50a:bcde:1/64;
}
}

Optionally, you can also configure more granular load balancing:

#
---
routing-options {
autonomous-system abcde;
forwarding-table {
export [ load-balance ];
}
}
policy-options {
policy-statement load-balance {
then {
load-balance per-packet;
}
}
}
forwarding-options {
hash-key {
family inet {
layer-3;

layer-4;
}
}
}
---

In case that is not granular enough, you can modify the hash-key algorithm with some undocumented options in Junos OS 7.x and up:

---
hash-key {
family inet {
layer-3 {
destination-address;
protocol;
source-address;
}
layer-4 {
destination-port;
source-port;
type-of-service;
}
}
}
---

You can also set the aggregated link’s minimum-links to a value that will cause the bundle to go down if the remaining links can no longer carry the traffic you plan to send over it. For example, for a 2-port aggregated link carrying 1.2 Gbps sustained, drop the bundle if only one link (n == 1) remains:

---
aggregated-ether-options {
minimum-links 2;
link-speed 1g;
}
---
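The minimum-links value follows from the sustained traffic and the member link speed: it is the smallest number of links that can still carry the load. A minimal sketch, using integer Mbps to avoid floating-point rounding surprises; `minimum_links` is a hypothetical helper.

```python
def minimum_links(sustained_mbps: int, link_mbps: int) -> int:
    """Smallest number of member links that can carry sustained_mbps."""
    if sustained_mbps <= 0 or link_mbps <= 0:
        raise ValueError("rates must be positive")
    return -(-sustained_mbps // link_mbps)  # integer ceiling division


# 1.2 Gbps sustained over 1 Gbps members, as in the example above:
print(minimum_links(1200, 1000))  # 2
```

This ignores uneven hashing across members, so in practice you may want some headroom beyond the computed value.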

When load-balancing over multiple IP interfaces (not on irix), the layer-4 hash-key statements can make traceroute output confusing to novices: because TCP/UDP port numbers and ICMP checksums are included in the algorithm, packets may seem to ‘bounce’ between interfaces.

On an IP1, load-balance per-packet really means per packet; on an IP2 it actually works per flow, which is preferable.
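To illustrate per-flow versus per-packet balancing, the sketch below hashes a flow’s 5-tuple to pick a member link, so every packet of that flow uses the same link, whereas round-robin spraying alternates links packet by packet. SHA-256 is a stand-in chosen for the example, not the hash Junos uses, and the function names and addresses are hypothetical.

```python
import hashlib
from typing import List, Tuple

# (source IP, destination IP, protocol, source port, destination port)
Flow = Tuple[str, str, int, int, int]


def per_flow_member(flow: Flow, members: int) -> int:
    """Pick a member link from a hash of the 5-tuple (illustrative hash only)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % members


def per_packet_members(packets: int, members: int) -> List[int]:
    """Round-robin spraying: consecutive packets alternate between members."""
    return [i % members for i in range(packets)]


flow = ("192.0.2.1", "198.51.100.7", 6, 51515, 179)
print({per_flow_member(flow, 2) for _ in range(5)})  # a set with a single link index
print(per_packet_members(4, 2))                      # [0, 1, 0, 1]
```

Because ports are part of the hashed tuple, traceroute probes that use different ports can land on different links, which is the apparent bouncing described above.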


Juniper 10GE Specifics

Link flaps introduced by the PXCs (photonic cross-connects) mean that you have to dampen interface transitions. Junos supports a configurable hold-time; a good value is 1200 ms.

[edit]
arien@router# show interfaces xe-0/1/0
description "interface to irix Peering LAN";
hold-time up 1200 down 1200;
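A hold-time keeps short outages from reaching the routing protocols. The sketch below is a deliberately simplified model of the down timer only, not of Junos internals; `reported_outages` is a hypothetical helper and the sample durations are made up for illustration.

```python
from typing import List

HOLD_DOWN_MS = 1200  # matches "hold-time up 1200 down 1200" above


def reported_outages(outage_durations_ms: List[int],
                     hold_down_ms: int = HOLD_DOWN_MS) -> List[int]:
    """Outages long enough to be reported; shorter ones never reach BGP."""
    return [d for d in outage_durations_ms if d >= hold_down_ms]


# Hypothetical outages: three brief flaps and one real failure.
print(reported_outages([40, 300, 1199, 5000]))  # [5000]
```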

Aggregated interfaces require hold timers on all physical interfaces and on the logical aggregated interface (xe-0/1/0 and ae0, respectively, in the example below):

[edit]
arien@router# show interfaces xe-0/1/0
description "10GE LinkAgg #1";
hold-time up 1200 down 1200;
gigether-options {
802.3ad ae0;
}

[edit]
arien@router# show interfaces ae0
description "Aggregated interface to irix Peering LAN";
hold-time up 1200 down 1200;
aggregated-ether-options {
minimum-links 1;
link-speed 10g;
}
unit 0 {
description "Aggregated interface to irix Peering LAN";
bandwidth 20g;
family inet {
address 202.88.42.x/24;
}
}


MTU Config

The configured MTU should be 1514 (this includes Ethernet headers but not the FCS), or 1518 when tagged.
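The two MTU values follow from adding the 14-byte Ethernet header, and for tagged ports the 4-byte 802.1Q tag, to the IP MTU. A small sketch of that arithmetic, assuming a 1500-byte IP MTU on the peering LAN; `junos_media_mtu` is a hypothetical name.

```python
ETHERNET_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
DOT1Q_TAG = 4         # 802.1Q tag inserted after the source MAC
IP_MTU = 1500         # assumed IP MTU on the peering LAN


def junos_media_mtu(tagged: bool) -> int:
    """Interface MTU counting Ethernet headers but not the 4-byte FCS."""
    return IP_MTU + ETHERNET_HEADER + (DOT1Q_TAG if tagged else 0)


print(junos_media_mtu(tagged=False))  # 1514
print(junos_media_mtu(tagged=True))   # 1518
```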


Arista Configuration Hints

Recommendations we received for Arista routers.

Interface configuration

Configure the interface facing the Peering LAN as a routed port, disable IPv6 router advertisements and disable LLDP:

interface Ethernet1
description irix
no switchport
ip address ...
ipv6 address ...
ipv6 nd ra disabled
no lldp transmit

If you do decide to configure the port as a switched port with a VLAN-interface, make sure STP is disabled:

router(config-if-Vl1)#no spanning-tree


Configuration for 10GE/100GE ports

To ignore short link-flaps, configure the link-debounce setting:

router(config-if-Et1)#link-debounce time 1200


Link Aggregation

To create an LACP-bundle, configure the ports in a channel-group. This will create a virtual port-channel interface on which you configure the Peering LAN IP address and other settings:

interface Ethernet1
description irix port 1
channel-group 1 mode active
interface Ethernet2
description irix port 2
channel-group 1 mode active

interface Port-Channel1
description irix
ip address ...
ipv6 address ...


- ARP aging timeout

The default ARP timeout on Arista is 4 hours, which is acceptable for the Peering LAN. Should you wish to change it, you can do so as follows:

router(config-if-Et1)#arp aging timeout

Linux Configuration Hints

We are not aware of any major issues with Linux boxes used as routers, and they seem to be pretty rare on the Exchange. Having said that, there are a few parameters that can (and usually should) be tuned:

  • ARP filtering & source routing
  • ARP cache timeout
  • Reverse Path (RP) filter

For more information on tuning your Linux system for routing, see the Linux Advanced Routing & Traffic Control HOWTO.

NOTE: When configuring sysctl parameters, be aware that a global (“all”) setting does not necessarily override an interface-specific one; for many parameters the kernel combines the two values. For instance, proxy ARP will be enabled (which is undesirable) if both of these are set:

net.ipv4.conf.eth0.proxy_arp = 1

net.ipv4.conf.all.proxy_arp = 0
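The proxy_arp example above follows from how the kernel combines the “all” and per-interface values. The sketch below models that combination for a few parameters, based on the kernel’s ip-sysctl documentation: logical OR for proxy_arp and arp_filter, the maximum for arp_ignore, arp_announce and rp_filter. The names `COMBINE` and `effective` are hypothetical; this is a simplified illustration, not kernel code.

```python
from typing import Callable, Dict

# How the kernel combines conf/all/<param> with conf/<interface>/<param>.
COMBINE: Dict[str, Callable[[int, int], int]] = {
    "proxy_arp": lambda all_v, if_v: int(bool(all_v) or bool(if_v)),   # logical OR
    "arp_filter": lambda all_v, if_v: int(bool(all_v) or bool(if_v)),  # logical OR
    "arp_ignore": max,
    "arp_announce": max,
    "rp_filter": max,
}


def effective(param: str, all_value: int, iface_value: int) -> int:
    """Value effectively applied on the interface (simplified model)."""
    return COMBINE[param](all_value, iface_value)


print(effective("proxy_arp", all_value=0, iface_value=1))  # 1: proxy ARP still enabled
print(effective("rp_filter", all_value=1, iface_value=0))  # 1: still filtering
```

The second line shows why turning off rp_filter on a single interface has no effect while the “all” value is still set.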

ARP Filtering and Source Routing

The Linux approach to IP addresses is that they belong to the system, not any single interface. As a result, Linux hosts have a default behaviour that is different from most other systems: interfaces semipromiscuously answer for all IP addresses of all other interfaces. Example:


In this example, host tuxco is a Linux box with a peering connection on eth0 (192.168.1.1/24) and a backbone link on eth1 (10.0.0.1/24). When host kannix (192.168.1.2) sends an ARP query for 10.0.0.1, it will get a reply from tuxco’s eth0 interface!

In other words, a Linux host will answer ARP queries coming in on any interface if the queried address is configured on any of its interfaces. The idea behind this is that an IP address belongs to the system, not just to a single interface. Although this may work well for server or desktop systems, it is not desirable behaviour in a router system. One reason is that it is a limited version of proxy-arp, which is forbidden on the irix peering LAN. Another reason is that two separate routers could potentially answer ARP queries for the same RFC1918 address.


- Fixing ARP

The ARP behaviour can be fixed by using arp_ignore and arp_announce on the WAN interface:

tuxco# sysctl -w net.ipv4.conf.ifname.arp_ignore=1

tuxco# sysctl -w net.ipv4.conf.ifname.arp_announce=1
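The difference between the default behaviour and arp_ignore=1 can be modelled using the tuxco addresses from the example above. This is a simplified sketch of arp_ignore modes 0 and 1 only; the function name `answers_arp` is hypothetical.

```python
from typing import Dict, List

# Addresses from the tuxco example above.
TUXCO: Dict[str, List[str]] = {
    "eth0": ["192.168.1.1"],  # peering LAN
    "eth1": ["10.0.0.1"],     # backbone link
}


def answers_arp(interfaces: Dict[str, List[str]], incoming: str,
                target_ip: str, arp_ignore: int) -> bool:
    """Would the host reply to an ARP query for target_ip arriving on incoming?"""
    if arp_ignore == 0:
        # Default: reply for any local address, whichever interface holds it.
        return any(target_ip in addrs for addrs in interfaces.values())
    if arp_ignore == 1:
        # Reply only if the address is configured on the incoming interface.
        return target_ip in interfaces.get(incoming, [])
    raise ValueError("this sketch only models arp_ignore 0 and 1")


# kannix asks, via tuxco's eth0, who has 10.0.0.1:
print(answers_arp(TUXCO, "eth0", "10.0.0.1", arp_ignore=0))  # True
print(answers_arp(TUXCO, "eth0", "10.0.0.1", arp_ignore=1))  # False
```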

- Multiple Interfaces on One Subnet

If you have multiple interfaces on the same subnet, you may also want to enable arp_filter:

tuxco# sysctl -w net.ipv4.conf.ifname.arp_filter=1

This prevents the ARP entry for an interface from fluctuating between two or more MAC addresses. However, you need to use source-based routing to make this work correctly. From the Documentation/networking/ip-sysctl.txt file in the 2.6 kernel source:

[...]

arp_filter - BOOLEAN

1 - Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP’d IP out that interface (therefore you must use source based routing for this to work). In other words it allows control of which cards (usually 1) will respond to an arp request.

[...]


IPv4 ARP Cache Timeout

The ARP cache timeout on Linux-based routers should be changed from the default, especially if you have a large number of peers. This parameter can be tuned by setting the appropriate procfs variable through the sysctl interface. The Linux arp(7) manual says:

[...]

SYSCTLS

ARP supports a sysctl interface to configure parameters on a global or per-interface basis. The sysctls can be accessed by reading or writing the /proc/sys/net/ipv4/neigh/*/* files or with the sysctl(2) interface. Each interface in the system has its own directory in /proc/sys/net/ipv4/neigh/. The setting in the default directory is used for all newly created devices. Unless otherwise specified time related sysctls are specified in seconds.

[...]

base_reachable_time

Once a neighbour has been found, the entry is considered to be valid for at least a random value between base_reachable_time/2 and 3*base_reachable_time/2. An entry’s validity will be extended if it receives positive feedback from higher level protocols. Defaults to 30 seconds.

This means that Linux systems keep ARP entries in their cache for some time between 15 and 45 seconds (and yes, the average works out to 30 seconds). This is not very long. In fact, it is shorter than the typical BGP keepalive interval and may thus result in excessive ARP traffic. We suggest a timeout of at least two hours for ARP entries on your irix interface. Since the shortest validity is base_reachable_time/2, you’d have to set base_reachable_time to 2 x 2 hours = 4 hours (14400 seconds).

tuxco1# sysctl net.ipv4.neigh.ifname.base_reachable_time
net.ipv4.neigh.ifname.base_reachable_time = 30

The above command tells you that the ARP cache timeout is 30 seconds on average. To change it so it’s between 2 and 6 hours, use the following command:

tuxco1# sysctl -w net.ipv4.neigh.ifname.base_reachable_time=14400
net.ipv4.neigh.ifname.base_reachable_time = 14400

Here ifname is the name of the interface that connects to irix. You can also use “default” here, but that may have undesired side-effects for your other interfaces.
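The arithmetic behind the recommended value can be checked against the validity window quoted from arp(7): an entry stays valid for a random time between base_reachable_time/2 and 3*base_reachable_time/2. The helper names below are hypothetical.

```python
from typing import Tuple


def arp_validity_window(base_reachable_time_s: int) -> Tuple[float, float]:
    """(shortest, longest) validity in seconds for a given base_reachable_time."""
    return base_reachable_time_s / 2, 3 * base_reachable_time_s / 2


def base_for_minimum(min_valid_s: int) -> int:
    """base_reachable_time needed so that even the shortest validity is min_valid_s."""
    return 2 * min_valid_s


print(arp_validity_window(30))     # (15.0, 45.0): the default
print(base_for_minimum(2 * 3600))  # 14400
print(arp_validity_window(14400))  # (7200.0, 21600.0): 2 to 6 hours
```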


IPv6 Neighbor Cache Timeout

As with the IPv4 ARP cache, Linux systems set a fairly short lifetime for the IPv6 neighbor cache. It is controlled in the same way, through net.ipv6.neigh.ifname.base_reachable_time.

Proxy ARP

Disable proxy-arp using sysctl:

router# sysctl -w net.ipv4.conf.ifname.proxy_arp=0


IPv6 Autoconfiguration

IPv6 stateless autoconfiguration must be disabled:

router# sysctl -w net.ipv6.conf.ifname.autoconf=0
net.ipv6.conf.ifname.autoconf = 0


RP Filter Setting

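You may need to turn off the Reverse Path Filter (rp_filter) functionality on a Linux-based router to allow asymmetric routing, particularly on your WAN interface. Note that the kernel applies the larger of the net.ipv4.conf.all.rp_filter and per-interface values, so the “all” value must be 0 as well for this to take effect.

To disable the RP filter: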

tuxco1# sysctl -w net.ipv4.conf.ifname.rp_filter=0
net.ipv4.conf.ifname.rp_filter = 0

Running the ‘sysctl’ Commands at Boot

The various system parameters discussed above can be set at boot time by adding them to a file such as /etc/sysctl.conf. The exact name, location and even existence of this file depend on the Linux distribution in use, but both Debian and Red Hat/Fedora use /etc/sysctl.conf:

# file: /etc/sysctl.conf
# These settings should be duplicated for all interfaces that are
# on a peering LAN.

### Typical stuff you really want on a router

# Fix the “promiscuous ARP” thing...
net.ipv4.conf.ifname.arp_ignore=1
net.ipv4.conf.ifname.arp_announce=1

# Turn off RP filtering to allow asymmetric routing:
net.ipv4.conf.ifname.rp_filter=0

# Multiple (non-aggregated) interfaces on the same peering LAN.
# READ THE MANUAL FIRST!
#net.ipv4.conf.ifname.arp_filter=1

### Keep the irix ARP Police happy. :-)

net.ipv4.neigh.ifname.base_reachable_time=14400
net.ipv6.neigh.ifname.base_reachable_time=14400

CAUTION: Modules must be loaded before sysctl is executed

On Debian systems, kernel modules for some network interfaces (e.g. 10GE cards) are not loaded before the init process executes the script that runs the sysctl commands. In those cases, it is necessary to force the module to be loaded earlier. The same goes for the IPv6 settings: the ipv6 module is usually not loaded until the network interfaces are brought up, which is typically after the sysctl variables are set by the procps.sh script. (On Red Hat/Fedora systems no action needs to be taken; the /etc/init.d/network script automatically (re-)sets the sysctl variables before and after bringing up the interfaces.)

There are a few ways around this:

On Debian-based systems, one option is to create a symbolic link in /etc/rc2.d to re-run procps.sh after the network is brought up:

root@tuxco# ln -s ../init.d/procps.sh /etc/rc2.d/S20procps.sh


Linux Aggregated Links

Enable bonding driver support in the kernel (CONFIG_BONDING=m).

Edit /etc/modules to load the bonding driver on boot:

bonding miimon=100

The miimon parameter specifies the MII link-monitoring interval in milliseconds.

Install the ifenslave package (apt-get install ifenslave). This package provides the /sbin/ifenslave tool, which is used to attach physical interfaces to the bonding interface.

Add the bonding interface to /etc/network/interfaces:

# irix side
auto bond0
iface bond0 inet static
address 202.88.42.x
netmask 255.255.255.0
post-up /sbin/ifenslave bond0 eth0 eth1

The above example creates a bonding interface with two physical interfaces.

For more information, see the file Documentation/networking/bonding.txt in the kernel source tree.


MLDv2

Modern kernels have MLDv2 enabled by default, and there is no sysctl parameter to switch it off. The only way we know of so far is to drop it with an outgoing filter:

ip6tables -A OUTPUT -p icmpv6 --icmpv6-type 143 -j DROP
ip6tables-save

Mikrotik Configuration Hints

By default Mikrotik routers have their own proprietary Mikrotik Discovery Protocol and CDP enabled. To turn these discovery protocols off, in the Web UI go to IP > Neighbors > Discovery Interfaces and disable the protocols on the irix-facing interface.

Redback Configuration Hints

To configure link aggregation on Redback SMS routers you need to do the following.

!Create the link group interface and assign an IP address to it

[local]Redback(config)#context local
[local]Redback(config-ctx)#interface irix
[local]Redback(config-if)#ip address 202.88.42.x/24
[local]Redback(config-if)#exit

!Create the link group and bind it to its interface

[local]Redback(config)#link-group irix ether
[local]Redback(config-link-group)#bind interface irix local

!Configure an ethernet port and add it to the link group

[local]Redback(config)#port ethernet 1/1
[local]Redback(config-port)#no shutdown
[local]Redback(config-port)#link-group irix
[local]Redback(config-port)#exit

!Configure another ethernet port and add it to the link group

[local]Redback(config)#port ethernet 1/2
[local]Redback(config-port)#no shutdown
[local]Redback(config-port)#link-group irix
[local]Redback(config-port)#exit

!To match the irix ARP timeout (4 hours), you need to configure this under the interface

[local]Redback(config)#context local
[local]Redback(config-ctx)#int irix
[local]Redback(config-if)#ip arp timeout 14400
[local]Redback(config-if)#exit

!Also, you can set minimum-links to a value that will cause the bundle to drop when your links can no longer support the amount of traffic you move through the link-group. Thus, for a 2-port aggregated link pushing 1.2 Gbps sustained, drop the bundle if n == 1:

[local]Redback(config)#link-group irix ether
[local]Redback(config-link-group)#minimum-links 2
[local]Redback(config-link-group)#exit
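The minimum-links value in the last step is plain capacity arithmetic: keep the bundle up only while the remaining members can carry the sustained load. A minimal Python sketch of that calculation, using a hypothetical helper name and assuming GE (1 Gbps) members as in the example:

```python
import math


def minimum_links(sustained_gbps: float, member_speed_gbps: float) -> int:
    """Smallest number of bundle members that can still carry the sustained load.

    Hypothetical helper: pure capacity arithmetic, with no headroom for bursts
    or for uneven hashing of flows across the members.
    """
    if sustained_gbps <= 0 or member_speed_gbps <= 0:
        raise ValueError("traffic and member link speed must be positive")
    return math.ceil(sustained_gbps / member_speed_gbps)


# 1.2 Gbps sustained over GE members: a single link cannot carry it,
# so minimum-links should be 2 for the 2-port bundle in the example.
print(minimum_links(1.2, 1.0))  # 2
```

In practice you may want some headroom above the sustained rate, since hashing rarely spreads traffic perfectly evenly over the members.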

Riverstone Configuration Hints

On Riverstone equipment, proxy ARP seems to be enabled by default, so you will need to disable it:

ip disable proxy-arp interface ifname

Here, ifname is the name of your interface towards irix; alternatively, use the string ‘all’ to disable proxy ARP on all interfaces.

Acknowledgements

Various people contributed to this document. We received configuration info from:

  • Aaron Weintraub (Cogent Communications)
  • Adam Davenport (Choopa)
  • Andree Toonk (SARA)
  • Andrew V. Zachinyaev (RIPN)
  • Bart Peirens (Belgacom)
  • Bas Haakman (Multikabel)
  • Ben Galliart (Steadfast Networks)
  • Blake Willis (Neo Telecoms)
  • Brad Dreisbach (NTT)
  • Daniel Roesen (ClueNet Project)
  • Edward Henigin (Giganews)
  • Elisa Jasinska (Limelight)
  • Erik Bos (XS4ALL)
  • Geraint Jones (Koding.com)
  • Greg Hankins (Force10)
  • Jesper Skriver (TDC)
  • Job Snijders (Snijders IT)
  • Jon Nistor (Rogers/TorIX)
  • Kevin Day (Your.org)
  • Lucas van Schouwen (Eweka)
  • Marcel ten Berg (Scarlet)
  • Mark Bergsma (Wikimedia Foundation)
  • Martijn Bakker (Support Net)
  • Martin Pels (Support Net)
  • Michiel Bool (Vodafone Netherlands)
  • Miquel van Smoorenburg (Cistron)
  • Najam Saquib (Mediaways)
  • Niels Raijer (Demon)
  • Paolo Moroni (SWISSCOM)
  • Pierfrancesco Caci (Telecom Italia Sparkle)
  • Rene Huizinga (UPC)
  • Richard A Steenbergen (nLayer)
  • Robert McKay (MCKAYCOM LTD)
  • Ronald Esveld (Equant)
  • Ruediger Volk (Deutsche Telekom)
  • Santi Mercado (SARENET)
  • Scott Madley (Level 3 Communications)
  • Simon Leinen (SWITCH)
  • Thijs Eilander (Cobweb)
  • Tom Scholl (SBC)
  • Vincent Bourgonjen (Open Peering)
  • Wolfgang Tremmel (DE-CIX)

Thanks to all those who contributed.


irix Route Server

irix offers networks connected to the Peering LAN the opportunity to peer via its route servers. On our route servers, peers can filter based on IRRDB objects, as well as on predefined BGP communities. Therefore, members/customers can peer with the route servers while maintaining their own peering policy.

Introduction

Normally, you would need to maintain separate BGP sessions to each of your peers' routers. With a route server you can replace all or a subset of these sessions with one session towards each route server.

The goal of irix's Route Server Project is to facilitate the implementation of peering arrangements. We aim to lower the barrier of entry for new participants on the peering platform.

The route servers are not in the forwarding path, so they do not forward any traffic. Also, peering with a route server does not mean that you must accept routes from all other route server participants.

Why would you use the route server
  • Let's make it easy
    Simplify the configuration needed to reach as many networks as possible on the irix platform by configuring just two BGP sessions. With the large number of connected parties, managing separate BGP sessions can be a full-time task. In addition, whenever a new party connects to the route servers, you will automatically exchange prefixes with it (depending on your and their filters).
  • Manage only your most important peers, let the route server do the rest
    You probably want to exchange as much traffic as possible through the exchange, but setting up a peering takes time and effort. So only set up peering sessions with your most important peers - let the route server do the rest!
  • Send and receive routes from day one
    Once you are connected to the route servers you will start exchanging routes immediately. The route servers are a good way to get started on the exchange.
  • Use it as a backup
    When your BGP session to a party becomes inactive, you may still be able to reach them via the route servers, so using the route servers can lead to more stable connectivity.
  • Maintain your peering policy
    The route servers have built-in filters that allow you to maintain your peering policies. For more information, see the Route Server Filtering section.
Route server details
              Route Server 1                   Route Server 2
Hostname      rs1.irix.my                      rs2.irix.my
ASN           131329                           131329
IPv4          202.88.42.251                    202.88.42.252
IPv6          2400:1560:6::A500:13:1329:251    2400:1560:6::A500:13:1329:252
Platform      BIRD                             BIRD
  • For resilience, peering with the route servers requires your routers to connect to both route servers and to advertise the same prefixes (same number and same prefix lengths) to each of them.
  • Please note that the route servers are set to passive mode and will never initiate a BGP session. Make sure that your router does so, i.e. connects to TCP port 179 on the route servers, and that your inbound filters/ACLs permit established sessions with the route servers.
Prefix propagation and Max-Prefix Advisory

The route servers hold around 15K IPv4 prefixes and 9K IPv6 prefixes in the master table. These prefixes are the best routes that BIRD's BGP best-path selection has chosen among all routes received over all established BGP sessions. However, the number of prefixes that each member receives from the route servers varies and depends on the following factors:

  • Your peering policy that is expressed in RPSL format in the IRR database.
  • The peering policies of other irix members, who can decide to announce prefixes via the irix route servers to specific peers only.

With the current peering policies and BGP convergence, we observe that members receive on average around 15K IPv4 and 9K IPv6 prefixes. However, we advise our members to configure a max-prefix of 18K for IPv4 and 11K for IPv6, for the following reasons:

  • We calculate the limit based on the maximum number of valid prefixes that exist in the master table and can potentially be provided to a single BGP feed.
  • irix expects future prefix growth as more and more networks connect to the platform. Thus, we raise the limit by 25% in order to accommodate this growth.
Want to Participate

Many unique ASNs participate in the route server project, representing tens of thousands of prefixes. For more information about who is participating, see the Connected Parties page.

If you would like to peer with the irix route servers, please log in to our customer portal My.irix and enable it on the configuration page of the respective connection (Connections -> Show -> Disable/Enable Peering with route-server).

If you need support to enable peering with the route servers, please contact us.


MANRS

irix Route Servers are MANRS (Mutually Agreed Norms for Routing Security) compliant.

Deployment Guidelines

Below is a sample configuration for Cisco IOS routers announcing prefixes to the route servers:

!
router bgp your-asn
 bgp always-compare-med
 no bgp enforce-first-as
 bgp log-neighbor-changes
 neighbor irix-RS peer-group
 neighbor irix-RS remote-as 131329
 neighbor irix-RS version 4
 neighbor irix-RS transport connection-mode active
 neighbor irix-RS-6 peer-group
 neighbor irix-RS-6 remote-as 131329
 neighbor irix-RS-6 version 4
 neighbor irix-RS-6 transport connection-mode active
 neighbor 202.88.42.251 peer-group irix-RS
 neighbor 202.88.42.251 description rs1.irix.my
 neighbor 202.88.42.252 peer-group irix-RS
 neighbor 202.88.42.252 description rs2.irix.my
 neighbor 2400:1560:6::A500:13:1329:251 peer-group irix-RS-6
 neighbor 2400:1560:6::A500:13:1329:251 description rs1.irix.my
 neighbor 2400:1560:6::A500:13:1329:252 peer-group irix-RS-6
 neighbor 2400:1560:6::A500:13:1329:252 description rs2.irix.my
 !
 address-family ipv4
  no neighbor irix-RS-6 activate
  neighbor irix-RS activate
  neighbor irix-RS next-hop-self
  neighbor irix-RS soft-reconfiguration inbound
  neighbor irix-RS route-map TO-RS out
  no auto-summary
  no synchronization
  neighbor 202.88.42.251 peer-group irix-RS
  neighbor 202.88.42.252 peer-group irix-RS
  network 192.168.110.0 mask 255.255.255.0
  network 192.168.111.0 mask 255.255.255.0
  network 192.168.112.0 mask 255.255.255.0
 exit-address-family
 !
 address-family ipv6
  neighbor irix-RS-6 activate
  neighbor irix-RS-6 next-hop-self
  neighbor irix-RS-6 soft-reconfiguration inbound
  neighbor irix-RS-6 route-map TO-RS-6 out
  neighbor 2400:1560:6::A500:13:1329:251 peer-group irix-RS-6
  neighbor 2400:1560:6::A500:13:1329:252 peer-group irix-RS-6
  network 2001:DB8:10::/64
  network 2001:DB8:11::/64
  network 2001:DB8:12::/64
 exit-address-family
!
ip as-path access-list 12 permit ^$
!
ip prefix-list TO-RS seq 10 permit 192.168.110.0/24
ip prefix-list TO-RS seq 20 permit 192.168.111.0/24
ip prefix-list TO-RS seq 30 permit 192.168.112.0/24
!
ipv6 prefix-list TO-RS-6 seq 10 permit 2001:DB8:10::/64
ipv6 prefix-list TO-RS-6 seq 20 permit 2001:DB8:11::/64
ipv6 prefix-list TO-RS-6 seq 30 permit 2001:DB8:12::/64
!
route-map TO-RS permit 10
 match ip address prefix-list TO-RS
!
route-map TO-RS-6 permit 10
 match ipv6 address prefix-list TO-RS-6
!

Note that on recent IOS versions (e.g. 12.0(26)S, 12.2(25)S and later, where first-AS enforcement has become the hidden default), you have to specify “no bgp enforce-first-as” (IOS, IOS-XE) or “bgp enforce-first-as disable” (IOS-XR), because the route servers do not insert their own ASN into the AS_PATH of relayed prefix announcements. Zebra and Quagga have had the same issue since around version 0.91.
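To see why the check has to be disabled, here is a simplified Python model of first-AS enforcement. It is a hypothetical illustration, not any vendor's implementation, and AS64500 is a documentation ASN (RFC 5398) standing in for a member. Because the route server does not add 131329 to the path, the left-most AS of a relayed route is the member's ASN rather than the ASN of the BGP neighbor (the route server), so the check fails.

```python
from __future__ import annotations


def passes_enforce_first_as(neighbor_asn: int, as_path: list[int]) -> bool:
    """Simplified first-AS check: the left-most AS_PATH entry must equal the neighbor's ASN."""
    return bool(as_path) and as_path[0] == neighbor_asn


RS_ASN = 131329
# A route originated by member AS64500 and relayed by the route server:
# the route server does not insert 131329, so the path is just [64500].
print(passes_enforce_first_as(RS_ASN, [64500]))  # False: the route would be rejected
# The same route received over a bilateral session with AS64500:
print(passes_enforce_first_as(64500, [64500]))  # True
```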

Below is a similar example for Juniper routers:

[edit]
user@junix# show protocols bgp
group IPV4-RS {
	type external;
	description "Route Servers";
	family inet {
		unicast;
	}
	export TO-RS;
	peer-as 131329;
	neighbor 202.88.42.251 {
		description rs1.irix.my;
	}
	neighbor 202.88.42.252 {
		description rs2.irix.my;
	}
}

[edit]
user@junix# show policy-options policy-statement TO-RS
term unicast-export {
	from {
		rib inet.0;
		prefix-list to-announce;
	}
	then accept;
}
term end {
	then reject;
}

[edit]
user@junix# show policy-options prefix-list to-announce
10.25.1.0/24;

Route Server Filtering

Incoming prefixes sanitisation

All irix route servers perform basic incoming prefix filtering on all member/customer BGP sessions established with them (peering with the route servers is optional). This basic filtering blocks RFC 1918 ranges, bogon and Martian prefixes, and the default route. We base our list on Team Cymru’s Bogon List.

We do not monitor or control which prefixes participants announce to each other, just as we do not filter your bilateral sessions. If you wish to filter out more specifics or perform IRR-based prefix filtering, you are free to do so on your own router.

Please note that, by default, apart from bogon prefixes, the route servers also filter prefixes with “ROA status: INVALID”, as well as prefixes not covered by the AS/AS-SETs listed in aut-num export statements, as discussed below.

Outgoing prefixes filtering among route-server members

The irix route servers implement outgoing filtering based on policies defined by the route server participants. This filtering is applied to outgoing advertisements. By defining your policy in an IRRdb that supports RPSL, you tell the route servers to which participants your prefixes should be sent (export policy) and from which participants you wish to receive prefixes (import policy). Participating in the route server project therefore does not oblige you to send prefixes to, or receive prefixes from, all connected participants; filtering schemes are available.

The filters are solely derived from your IRRdb objects, which use RPSL as a description language. There are three different options you can use: ANY, ANY except and RESTRICTIVE, to define your filtering needs.

To pick up changes in members’ peering policies, the irix route servers check for policy changes every hour, on the hour. If you wish to have your filters updated right away or encounter any problems, please contact the irix NOC; we can then apply a new route server configuration that reflects your new policy.

Please check the list of supported IRRdbs.



The 3+1 peering modes of route servers

As stated above, our route servers in Kuching implement 3+1 peering modes of prefix filtering in the outbound direction.

  • Peering mode ‘Filtering based on both IRRDB and RPKI data’:
    This is the default option when a new BGP session is established with the irix route servers. With this peering mode, the route servers automatically apply both IRRDB-based filtering and RPKI-based filtering (both explained below). If you already have a session with the irix route servers and this option is not selected, we recommend switching to this default peering mode.
  • Peering mode ‘Filtering based on IRRDB data’:
    With this option, the route servers’ extended filtering of outgoing prefixes is based on IRRDB data only (explained below). In summary, the prefixes blocked are those not present in the AS/AS-SET announced by the AS. When this option (or the default one) is enabled, we strongly recommend making sure that your IRRDB objects are correct and up to date in the RIPE database.
  • Peering mode ‘Filtering based on RPKI data’:
    With this option, the route servers’ extended filtering of outgoing prefixes is based on RPKI data. In summary, the prefixes blocked are those with ROA status ‘INVALID’. When this option (or the default one) is enabled, we strongly recommend making sure that your ROAs are correct and up to date.

Optionally, we can offer the following peering mode in case you really need an unfiltered BGP feed (e.g. for research purposes):

  • Peering mode ‘Just tagging’:
    This option is not recommended: no filtering is applied to announced prefixes. It is useful for research institutes that want to receive all information, or for organisations that want to apply their own BGP policies. However, all prefixes are tagged with standard BGP communities according to the following criteria (communities are given in parentheses).

    » Prefix with ROA status: VALID (131329:65012)
    » Prefix with ROA status: INVALID (131329:65022)
    » Prefix with ROA status: UNKNOWN (131329:65023)
    » Prefix present in AS’s announced AS/AS-SET (131329:65011)
    » Prefix not present in AS’s announced AS/AS-SET (131329:65021)
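If you choose ‘Just tagging’, you have to act on these tags yourself. A minimal Python sketch, assuming your tooling sees the communities of a received route as strings in the same notation as the list above; the function names and the drop policy, which mirrors the default ‘IRRDB and RPKI’ mode, are illustrative assumptions.

```python
from __future__ import annotations

# Community values copied verbatim from the 'Just tagging' list above.
RPKI_TAGS = {
    "131329:65012": "VALID",
    "131329:65022": "INVALID",
    "131329:65023": "UNKNOWN",
}
IRR_TAGS = {
    "131329:65011": True,   # prefix present in the AS's announced AS/AS-SET
    "131329:65021": False,  # prefix not present in the AS's announced AS/AS-SET
}


def classify(communities: list[str]) -> dict[str, object]:
    """Summarise the route server tags found on one received route."""
    tags: dict[str, object] = {"rpki": None, "in_irr": None}
    for community in communities:
        if community in RPKI_TAGS:
            tags["rpki"] = RPKI_TAGS[community]
        elif community in IRR_TAGS:
            tags["in_irr"] = IRR_TAGS[community]
    return tags


def drop_like_default_mode(communities: list[str]) -> bool:
    """Hypothetical local policy: drop RPKI INVALID routes and routes not covered by IRR data."""
    tags = classify(communities)
    return tags["rpki"] == "INVALID" or tags["in_irr"] is False


print(classify(["131329:65012", "131329:65021"]))
```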

IRRDB based Filtering

Our route servers generate their configuration using an IRRdb parser script. The script supports most of the RPSL extensions in the IETF snijders-rpsl-via draft, including the “import-via” and “export-via” attributes defined there. Using these attributes, 32-bit ASN (ASN32) aut-num objects can be used in expressions, and policies regarding route servers can be defined more elegantly.

The legacy filtering method is fully supported, but we encourage new and existing customers to use the new attributes when defining their policy. See the examples below for how to do this.

We’re using AS131329 as the example aut-num object containing the example policies.

  1. ANY
    (Send and receive prefixes to/from any RS participant):
    [...]
    import-via: AS131329 from AS-ANY accept ANY
    export-via: AS131329 to AS-ANY announce AS131329
    [...]
    

  2. ANY except
    (Send and receive prefixes to/from any RS participant EXCEPT AS666):
    [...]
    import-via: AS131329 from AS-ANY EXCEPT AS666 accept ANY
    export-via: AS131329 to AS-ANY EXCEPT AS666 announce AS131329
    [...]
    

  3. RESTRICTIVE
    (Send and receive prefixes ONLY to/from AS15703):
    [...]
    import-via: AS131329 from AS15703 accept ANY
    export-via: AS131329 to AS15703 announce AS131329
    [...]
    

    AS-SETs also work in all cases:

  4. ANY EXCEPT using AS-SETs
    (Send and receive prefixes to/from any RS participant EXCEPT ASes/AS-SETs included in AS131329:AS-CUSTOMERS):
    [...]
    import-via: AS131329 from AS-ANY EXCEPT AS131329:AS-CUSTOMERS accept ANY
    export-via: AS131329 to AS-ANY EXCEPT AS131329:AS-CUSTOMERS announce AS131329:AS-CUSTOMERS
    [...]
    

  5. RESTRICTIVE using AS-SETs
    (Send and receive prefixes ONLY to/from ASes/AS-SETs contained in AS-SET AS131329:AS-PEERS):
    [...]
    import-via: AS131329 from AS131329:AS-PEERS accept ANY
    export-via: AS131329 to AS131329:AS-PEERS announce AS131329:AS-CUSTOMERS
    [...]
    

  6. RESTRICTIVE with NOT ANY
    # Import from no-one
    import-via: AS131329 from AS-ANY accept NOT ANY
    

    # Export to no-one
    export-via: AS131329 to AS-ANY announce NOT ANY
    

  7. afi lists are also supported
    (initially described in RFC4012), e.g.:
    import-via: afi ipv4.unicast AS131329 from AS-ANY EXCEPT AS131329:AS-CUSTOMERS accept ANY
    export-via: afi ipv4.unicast AS131329 to AS-ANY EXCEPT AS131329:AS-CUSTOMERS announce ANY
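To make the semantics of the peer expressions in these examples concrete, here is a minimal Python sketch of how a single via line could be evaluated for one peer. It is not the irix parser script: it handles only the simplified line shapes shown above, evaluates only the peer part (not the accept/announce filter), and uses a plain dictionary with hypothetical AS-SET contents in place of real IRR queries.

```python
from __future__ import annotations

import re

# Simplified shape: "<import|export>-via: [afi <list>] <via-AS> <from|to> <peers> <accept|announce> <filter>"
VIA_LINE = re.compile(
    r"^(import|export)-via:\s+(?:afi\s+\S+\s+)?(AS\d+)\s+(?:from|to)\s+(.+?)\s+(?:accept|announce)\s+(.+)$",
    re.IGNORECASE,
)


def parse_via(line: str) -> tuple[str, str, str, str]:
    """Split a via line into (direction, via AS, peer expression, prefix filter)."""
    match = VIA_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported line: {line!r}")
    direction, via, peers, prefix_filter = match.groups()
    return direction.lower(), via.upper(), peers, prefix_filter


def expand(name: str, as_sets: dict[str, list[str]], seen: set[str] | None = None) -> set[int]:
    """Expand an ASN or a (possibly nested) AS-SET into ASNs; as_sets stands in for IRR queries."""
    name = name.upper()
    if re.fullmatch(r"AS\d+", name):
        return {int(name[2:])}
    seen = set() if seen is None else seen
    if name in seen:  # guard against AS-SETs that include each other
        return set()
    seen.add(name)
    asns: set[int] = set()
    for member in as_sets.get(name, []):
        asns |= expand(member, as_sets, seen)
    return asns


def peer_allowed(peers: str, peer_asn: int, as_sets: dict[str, list[str]]) -> bool:
    """Evaluate '<set>' or '<set> EXCEPT <set>' for one peer; AS-ANY matches every peer."""
    parts = re.split(r"\s+EXCEPT\s+", peers.strip(), maxsplit=1, flags=re.IGNORECASE)
    included = parts[0].upper() == "AS-ANY" or peer_asn in expand(parts[0], as_sets)
    excluded = len(parts) == 2 and peer_asn in expand(parts[1], as_sets)
    return included and not excluded


# Hypothetical IRR contents (keys in upper case); AS64500/AS64501 are documentation ASNs.
AS_SETS = {"AS131329:AS-CUSTOMERS": ["AS64500", "AS-EXAMPLE"], "AS-EXAMPLE": ["AS64501"]}

_, via, peers, _ = parse_via("export-via: AS131329 to AS-ANY EXCEPT AS131329:AS-CUSTOMERS announce ANY")
print(via, peer_allowed(peers, 64501, AS_SETS), peer_allowed(peers, 15703, AS_SETS))  # AS131329 False True
```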
    

irix route server objects

Relevant objects for participating peers in the Route Server project are grouped into these AS-SETs:

  • AS-IRIX-RS (list of connected peers)
  • AS-IRIX-RS-SETS (list of advertised AS-SETs)
  • AS-IRIX-RS-V6 (list of connected IPv6 peers)
  • AS-IRIX-RS-SETS-V6 (list of advertised AS-SETs for IPv6 peers)
  • AS-IRIX-SET (List of all route server ASNs)

BGP Traffic Engineering

In this section, you will find information about BGP Community filtering and AS-PATH prepending.

BGP Community filtering

Provide a BGP community filtering mechanism to peers

Route server peers are able to manipulate outbound routing policies via an in-band mechanism using BGP communities, instead of relying on “import/import-via”, “export/export-via” RPSL attributes. The downside to this method is that peers won’t be able to control inbound policies. Currently, the mechanism is implemented to support the traditional BGP communities, the Extended BGP communities and the Large BGP communities.

Please note that irix is planning to drop support for Extended communities, as their functionality is fully covered by Large communities.

When you want to signal the route servers to filter prefixes for destination networks with a 16-bit ASN, you can use either traditional or Extended communities. If the destination network has a 32-bit ASN, you can use either Extended or Large communities.

To make the above easily understandable, we provide the table below that summarises the available options:

Source and destination ASN   Legacy communities   Extended communities   Large communities
16-bit ASN to 16-bit ASN     YES                  YES                    YES
16-bit ASN to 32-bit ASN     NO                   YES                    YES
32-bit ASN to 16-bit ASN     YES                  YES                    YES
32-bit ASN to 32-bit ASN     NO                   NO                     YES

Note that you have to use the appropriate route server AS number, based on the irix location you’re peering in, with 131329 representing irix. All locations support this feature.

For traditional BGP communities, the offered options are:

  • Do not announce a prefix to a certain peer: 0:<peer-as>
  • Announce a prefix to a certain peer: 131329:<peer-as>
  • Do not announce a prefix to any peer: 0:131329
  • Announce a prefix to all peers: 131329:131329

For BGP Extended communities, you can use the offered options below:

  • Do not announce a prefix to a certain peer: RT:0:<peer-as>
  • Announce a prefix to a certain peer: RT:131329:<peer-as>
  • Do not announce a prefix to any peer: RT:0:131329

For Large communities, the offered options are as below:

  • Do not announce a prefix to a certain peer: 131329:0:<peer-as>
  • Announce a prefix to a certain peer: 131329:1:<peer-as>
  • Do not announce a prefix to any peer: 131329:0:0

Note that if you want to advertise a specific prefix to a specific peer only, you need to combine the “131329:<peer-as>” and “0:131329” BGP communities.
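The Large community forms map directly onto the three 32-bit fields of a Large community (RFC 8092). The Python sketch below builds them. The helper names are hypothetical, AS64500 is a documentation ASN standing in for a peer, and the "announce only to one peer" pair is our assumption by analogy with the note above, so confirm it with the irix NOC before relying on it. The fits_standard_community helper reflects RFC 1997, under which a standard community consists of two 16-bit halves: a half above 65535, such as 131329, cannot be encoded in that format, whereas a Large community can carry 131329 and 32-bit peer ASNs.

```python
RS_ASN = 131329
MAX_16BIT = 0xFFFF
MAX_32BIT = 0xFFFFFFFF


def large_community(global_admin: int, data1: int, data2: int) -> str:
    """Format a Large community (RFC 8092): three unsigned 32-bit fields."""
    for field in (global_admin, data1, data2):
        if not 0 <= field <= MAX_32BIT:
            raise ValueError(f"{field} does not fit in 32 bits")
    return f"{global_admin}:{data1}:{data2}"


def do_not_announce_to(peer_asn: int) -> str:
    return large_community(RS_ASN, 0, peer_asn)


def announce_to(peer_asn: int) -> str:
    return large_community(RS_ASN, 1, peer_asn)


def do_not_announce_to_any() -> str:
    return large_community(RS_ASN, 0, 0)


def fits_standard_community(high: int, low: int) -> bool:
    """A standard community (RFC 1997) is two 16-bit halves written as high:low."""
    return 0 <= high <= MAX_16BIT and 0 <= low <= MAX_16BIT


# Announce only to AS64500 (assumed analogue): block all peers, then allow one.
only_to_one_peer = [do_not_announce_to_any(), announce_to(64500)]
print(only_to_one_peer)
```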

AS_PATH prepending

As with community-based filtering, irix peers can influence the prefix selection of other members through AS_PATH prepending. The mechanism can be triggered either with traditional 16-bit BGP communities or with Large communities.

For the traditional 16bit communities, the prefix tagging must be as below:

  • using 131329:65501, to prepend customer’s ASN once towards all other peers
  • using 131329:65502, to prepend customer’s ASN twice towards all other peers
  • using 131329:65503, to prepend customer’s ASN thrice towards all other peers

The same result can be achieved by tagging prefixes with Large Communities as below:

  • using 131329:101:<peer-as>, to prepend customer’s ASN once towards all other peers
  • using 131329:102:<peer-as>, to prepend customer’s ASN twice towards all other peers
  • using 131329:103:<peer-as>, to prepend customer’s ASN thrice towards all other peers

Please note that if you put your own ASN in the “<peer-as>” position, the route servers prepend your ASN towards all other irix members. If you put the ASN of another irix member in that position, the route servers prepend once, twice or thrice towards that particular member only.
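The Large community values for prepending follow the pattern 131329:100+n:<peer-as>. A minimal Python sketch with hypothetical helper names, also showing how the AS_PATH seen by the receiving peer grows; AS64500 and AS64501 are documentation ASNs.

```python
from __future__ import annotations

RS_ASN = 131329


def prepend_community(times: int, target_asn: int) -> str:
    """Large community 131329:10n:<peer-as> asking the route servers to prepend n times.

    Use your own ASN as target_asn to prepend towards all other members, or another
    member's ASN to prepend towards that member only.
    """
    if times not in (1, 2, 3):
        raise ValueError("the route servers prepend once, twice or thrice")
    return f"{RS_ASN}:{100 + times}:{target_asn}"


def path_seen_by_peer(as_path: list[int], times: int) -> list[int]:
    """AS_PATH after prepending the member's (left-most) ASN 'times' extra times."""
    return [as_path[0]] * times + as_path


print(prepend_community(2, 64500), path_seen_by_peer([64500], 2))
```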

Additional Notes

  • IRRDB policies work only on the AS level, whereas BGP communities work on the prefix level.
  • IRRDB policies are parsed and applied hourly, whereas BGP communities are effective immediately, being in-band.
  • BGP communities can only influence outbound (customer edge router to route server) announcements, whereas IRRDB policies can be used to influence inbound (route server to customer edge router) announcements, before reaching the customer edge router, thus potentially affecting the BGP decision process.
  • Path hiding should not be a problem, as we are employing the BIRD ‘secondary’ configuration option.
  • Note that validity of the IRRDB/RPKI based information provided is not guaranteed in any way.
  • Please consider carefully whether your irix facing router should solely rely on information exchanged to and from the route servers.

Dynamic per-AS Prefix Limits

Problem: route leaks

Route leaks are a problem. Whether caused by fat fingers, software bugs or even malicious intent, route leaks are a fact of BGP life. A simple way to deal with them is to use prefix limits.

Setting a static (fixed) limit to prevent customers from advertising more prefixes than intended does not really work for a route server service, because the limit has to be derived from the customer that advertises the most prefixes.

That leaves all other customers a wide margin in which they can freely leak routes; e.g. if the limit were set to 15,000, a customer advertising only one prefix could leak 14,999 more before hitting the limit.

Adding insult to injury, this also has a cascading effect: other route server peers that have set a prefix limit on their sessions with the route servers would potentially shut those sessions down, as they suddenly see thousands of additional prefixes.

Enter dynamic per-AS prefix limits:

irix applies prefix limits specific to each AS connecting to the route server service. For instance, peers advertising only a couple of prefixes have a maximum prefix limit of 100. For peers advertising thousands of prefixes, the limit is calculated using a proportional coefficient.

For examples and a breakdown of the formula used, see the FAQ below.

Fluctuations in advertisements are normal and expected. As long as these are within reason, our limits will adapt accordingly (hence ‘dynamic’).

FAQ

Q: I’m concerned that the limits for my AS are not big enough!
We hate to tear down sessions for no good reason, so rest assured that the limits are sufficiently relaxed. A two-month lead period, during which we observed peer behaviour and fine-tuned the algorithm, ensured this as far as possible. That being said, we value the stability of the service above everything else, so peers suddenly advertising thousands of prefixes when historically they have advertised only a handful *will* hit the limit. In such cases, please contact us and we will be happy to reactivate the sessions.

Q: I’m still concerned about the sanity of the limits, though.
We can also set a static limit for you; please contact us and state the limit you wish to have.

Q: Why not use IRRDB objects/RPKI to contain announcements?
irix specifically wants to ensure that the route server service is as stable as possible. Peers announcing unexpectedly large numbers of prefixes wreak havoc, as this tears down sessions for a considerable number of peers and causes CPU churn for all parties involved. This is a different matter from the *type* of prefixes advertised.

IRRDB data is prone to inconsistencies and, even more importantly, IRR usage is mostly limited to the western world; ASes from other regions generally disregard IRRs.

Q: Can you give me examples of how this works?
Please consult the tables below:

Announced prefixes (y)   Coefficient (x)   Prefix limit (z, where y·x < z)
y < 50                   2                 100
50 ≤ y < 250             2                 500
250 ≤ y < 500            2                 1000
500 ≤ y < 1000           2                 2000
1000 ≤ y < 2000          2 to 1.5          next step of 1000
2000 ≤ y < 10000         1.5 to 1.2        next step of 1000
y ≥ 10000                1.2               next step of 1000

- Examples

25 Announced Prefixes x 2 = 50. Limit set to 100
51 Announced Prefixes x 2 = 102. Limit set to 500
300 Announced Prefixes x 2 = 600. Limit set to 1000
900 Announced Prefixes x 2 = 1800. Limit set to 2000
1500 Announced Prefixes x 1.75 = 2625. Limit set to 3000
9000 Announced Prefixes x 1.22 = 10980. Limit set to 11000
15000 Announced Prefixes x 1.2 = 18000. Limit set to 19000
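The table and the worked examples can be checked with a short Python sketch. Because the document gives only the range of the coefficient above 1000 announced prefixes (2 down to 1.5, then 1.5 down to 1.2), not how it is picked within that range, the sketch takes the coefficient as an input instead of computing it. The function name is hypothetical, and the second example is read as 51 prefixes, since 51 x 2 = 102.

```python
from decimal import Decimal


def prefix_limit(announced: int, coefficient) -> int:
    """Per-AS max-prefix limit z for y announced prefixes and coefficient x, with y*x < z.

    The coefficient is passed in because only its range is given for y >= 1000.
    """
    if announced < 50:
        return 100
    if announced < 250:
        return 500
    if announced < 500:
        return 1000
    if announced < 1000:
        return 2000
    # From 1000 prefixes up: the next step of 1000 strictly above y*x.
    product = Decimal(announced) * Decimal(str(coefficient))  # Decimal avoids float rounding
    return (int(product // 1000) + 1) * 1000


# The worked examples listed above.
for y, x, expected in [(25, 2, 100), (51, 2, 500), (300, 2, 1000), (900, 2, 2000),
                       (1500, "1.75", 3000), (9000, "1.22", 11000), (15000, "1.2", 19000)]:
    assert prefix_limit(y, x) == expected
```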

Other Information
Link Aggregation

Link aggregation allows two or more links to be bundled into one virtual channel. Depending on the vendor, link aggregation is also known as EtherChannel, Port Channel, port aggregation or trunking. The applicable specification is IEEE 802.3ad (since moved to IEEE 802.1AX), which includes LACP.

Pricing and Availability

irix currently offers link aggregation on any 1G/10G/100G physical connection. Aggregating links acquired from different partners or resellers, however, is not supported. The port prices for aggregated links are identical to the normal port prices.

Due to technical limitations of the switches used by irix it may be necessary to relocate your existing port. If this turns out to be the case, irix will inform you and advise you of any additional steps necessary for this process.

irix can deliver aggregated links at all co-locations. Although a strict reading of the spec forbids it, we can offer aggregated links over different media types of the same speed.

Load-Balancing Algorithm for the SLX platform

The load-balancing algorithm used in our Extreme SLX switches selects a member link with a modulo operation over the entropy available in the source and destination IPv4 or IPv6 addresses and, where applicable, the TCP or UDP source and destination port numbers. This gives the best distribution over the links that the available entropy allows.
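The general hash-then-modulo technique can be sketched in Python as follows. This is an illustration only: the hash function (zlib.crc32), the field order and the function name are assumptions, not the algorithm the SLX hardware actually uses, and the addresses are placeholders on the peering LAN.

```python
import ipaddress
import zlib


def select_member(src_ip: str, dst_ip: str, src_port: int, dst_port: int, members: int) -> int:
    """Pick a bundle member for a flow: hash the flow fields, then reduce modulo the member count.

    zlib.crc32 is an illustrative stand-in for the hardware hash; use port 0 for non-TCP/UDP.
    """
    if members < 1:
        raise ValueError("a bundle needs at least one member link")
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
    )
    return zlib.crc32(key) % members


# Every packet of this (hypothetical) BGP flow maps to the same member of a 2-port bundle.
print(select_member("202.88.42.10", "202.88.42.251", 51000, 179, 2))
```

Because the member is chosen from the flow fields alone, all packets of a flow use the same link and are not reordered, but a single large flow can never use more than one member's capacity. With a plain modulo, changing the number of members also moves many existing flows to a different link.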

LACP & irix Topology

LACP is supported at irix for all connection types. After a link flap, ports with LACP enabled stay in blocking mode until the first LACP frame is received. Because this may take several tens of seconds (depending on the vendor implementation), it can cause BGP sessions to flap.

When enabling LACP, we advise setting the LACP timeout to short, which limits the maximum failover time to 3 seconds.
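The 3-second figure follows from the LACP timers: a member link is taken out of the bundle after three consecutive LACPDUs have been missed, and the short timeout makes the partner send one every second instead of every 30 seconds. A minimal Python sketch of that arithmetic; the constant and function names are our own.

```python
# LACP timers (IEEE 802.1AX): a partner is considered gone after three missed LACPDUs.
FAST_PERIODIC_S = 1   # 'short' timeout: the partner sends an LACPDU every second
SLOW_PERIODIC_S = 30  # 'long' timeout: the partner sends an LACPDU every 30 seconds
MISSED_LACPDUS = 3


def lacp_timeout_s(short_timeout: bool) -> int:
    """Seconds without LACPDUs before a member link is taken out of the bundle."""
    return MISSED_LACPDUS * (FAST_PERIODIC_S if short_timeout else SLOW_PERIODIC_S)


print(lacp_timeout_s(True), lacp_timeout_s(False))  # 3 90
```

This timeout matters mostly when a link stays physically up but stops passing LACPDUs, for example because of a fault in an intermediate device; a loss of link is normally detected much sooner.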

Configuration Hints

We have collected information about link aggregation for several router platforms in our configuration guide.

Quarantine VLAN

irix has implemented a feature called “Quarantine VLAN”, whereby all new ports are placed in a separate VPLS instance that is used for testing purposes before the customer connection is moved to the production environment.

Q: What is a Quarantine VLAN?
A quarantine VLAN is a VPLS instance on the irix switch containing the following:

  • (New) customer ports
  • irix monitoring system
The monitoring system sniffs all broadcast, multicast and unknown unicast in the quarantine VLAN.

Q: Why have Quarantine VLANs?
irix defines a fairly strict set of allowed traffic types on the peering LANs. Not all routers (and intermediate L2 devices) adhere to these guidelines: they typically have various protocols turned on by default, such as CDP, EDP, STP or DEC MOP, or they present more than one MAC address to the platform. These misbehaving or misconfigured devices potentially endanger the stability of the peers and/or the switching platform. Hence, we cannot allow them on the peering LANs.

Rather than acting reactively once a customer port is in production, we prefer to detect and fix these issues beforehand. Therefore, we introduced the concept of a quarantine VLAN. Once a customer's router is connected and the port is up, we can quickly see whether it adheres to the rules. If not, the violating

Q: When do you use Quarantine VLANs?
New ports are always put into a quarantine VLAN first. The same applies to upgrades, downgrades and relocations.

In addition to the above, existing customer ports may be moved into quarantine VLAN if they violate the allowed traffic types. Please note that this is only done in extreme cases.

Q: How do I get out of a Quarantine VLAN?
If your port is moved into quarantine, the irix NOC will notify you. If this is because you are sending disallowed traffic, your configuration should be updated accordingly. Once you are confident that the port adheres to the rules, please contact the irix NOC and request that the port be put back into production. The NOC will check the port’s behaviour again. If all is fine, the port will be moved (back) into production; if not, we will notify you with details of the problem.
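The two problems named in this section, more than one MAC address on a port and disallowed frame types, can be illustrated with a short Python sketch of the kind of check a monitoring system might run. It is hypothetical: it assumes the sniffed frames are already decoded into (source MAC, EtherType) pairs, and the allow-list of IPv4, ARP and IPv6 is an assumption for illustration, not the official irix list. Frames such as STP BPDUs or CDP are 802.3/LLC frames, which carry a length (1500 or less) where Ethernet II frames carry an EtherType.

```python
from __future__ import annotations

# Assumed allow-list for illustration only: IPv4, ARP and IPv6.
ALLOWED_ETHERTYPES = {0x0800, 0x0806, 0x86DD}
MAX_8023_LENGTH = 1500  # values up to 1500 are 802.3 length fields, not EtherTypes


def quarantine_findings(frames: list[tuple[str, int]]) -> list[str]:
    """frames: (source MAC, EtherType/length field) pairs sniffed on one quarantined port."""
    source_macs = set()
    violations = set()
    for src_mac, type_or_length in frames:
        source_macs.add(src_mac.lower())
        if type_or_length <= MAX_8023_LENGTH:
            violations.add("802.3/LLC frame (e.g. STP or CDP)")
        elif type_or_length not in ALLOWED_ETHERTYPES:
            violations.add(f"EtherType 0x{type_or_length:04x} not allowed")
    findings = sorted(violations)
    if len(source_macs) > 1:
        findings.insert(0, f"{len(source_macs)} source MAC addresses seen, expected 1")
    return findings


print(quarantine_findings([("00:11:22:33:44:55", 0x0800), ("00:11:22:33:44:66", 0x88CC)]))
```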

sFlow at irix

To analyse and optimise high speed networks, an efficient monitoring system is required. irix uses sFlow for its traffic analysis.

Introduction

sFlow is a standard for capturing traffic data in switched or routed networks. It uses sampling to collect statistics from the device, which makes it applicable at gigabit speeds and higher. Because it is an open standard, described in RFC 3176, it is implemented on a wide range of devices, such as the irix Brocade/Extreme switches.

The sFlow agent (Brocade/Extreme switch) supports two forms of operation:

  • time-based sampling of counters
  • packet-based sampling of ethernet frames

The counter samples provide exactly the same information that irix already uses for its traffic statistics. Therefore, the sFlow implementation at irix uses the packet-based samples (called flow samples) to provide additional analysis of the exchanged traffic.

Packet Based Sampling

Based on a defined sampling rate, one out of every N frames of the incoming traffic on each interface is sampled and sent to a central server, which analyses the traffic statistically. Each sampled packet stands for N packets: we assume that the N-1 packets we have not seen are of the same type and size.

Note: this type of sampling does not provide 100% accurate results.

Without sampling, packet analysis on a network with the throughput of irix would not be possible. For more detailed information about the accuracy of packet-based sampling, see the documents on the official sFlow website.
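The scaling rule above (each sample stands for N frames) takes only a few lines of Python. The function names are hypothetical, and the error-bound formula, 196 x sqrt(1/c) percent for c samples at roughly 95% confidence, is taken from the packet sampling guidance on the sFlow website as we understand it; check it against those documents.

```python
from __future__ import annotations

import math


def scale_samples(sampled_frame_sizes: list[int], sampling_rate: int) -> tuple[int, int]:
    """Estimate (frames, octets) from 1-in-N flow samples: each sample stands for N frames."""
    frames = len(sampled_frame_sizes) * sampling_rate
    octets = sum(sampled_frame_sizes) * sampling_rate
    return frames, octets


def percent_error(num_samples: int) -> float:
    """Approximate 95% confidence error bound for a class of traffic with c samples.

    Assumed formula from the sFlow packet sampling guidance: 196 * sqrt(1 / c).
    """
    return 196 * math.sqrt(1 / num_samples)


# Three sampled frames at 1-in-1000 stand for about 3000 frames and 1,664,000 octets.
print(scale_samples([64, 1500, 100], 1000))
# The error bound shrinks with the number of samples collected.
print(round(percent_error(10_000), 2))
```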

Software

The sFlow samples are analysed on the server by software developed at irix. The software package is written in Perl and based on the sFlow decoding module Net::sFlow.

Note: While the sFlow packet format supports sampling of IP and TCP/UDP flows, our software only looks at Layer-2 (Ethernet) fields. We neither process nor store flow information from higher layer protocols.

IPv6 Numbering Scheme

Here is some detailed information about the IPv6 numbering scheme at irix.

ISP LAN

The IPv6 set-up on the irix ISP peering LAN is as follows:

  • The prefix in use is: 2400:1560:6::/64
  • The prefix is sourced from AS131329.
  • The prefix is tagged with the no-export community and should not be announced outside of your AS.

The suffix (‘allocation’) scheme for 16 bits ASNs is as follows:
A50x:xxxx:n

The suffix (‘allocation’) scheme for 32 bits ASNs is as follows:
A500:xx:xxxx:n

Here, “x:xxxx” or “xx:xxxx” is your (zero-padded) AS number in decimals and “n” is a serial number depending on the number of interfaces you are using for IPv6 peerings (starting from 1 for the first interface, 2 for the second interface and so on).


Examples:

irix uses AS131329, so its IPv6 peering address is: 2400:1560:6::A500:13:1329:1

A member with a 16-bit ASN of 64523 would use:
2400:1560:6::A506:4523:1/64
2400:1560:6::A506:4523:2/64
...

A member with a 32-bit ASN of 195000 would use:
2400:1560:6::A500:19:5000:1/64
2400:1560:6::A500:19:5000:2/64
...
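
The scheme above can be turned into a small helper. This is a sketch, not an irix tool: the function name is hypothetical, and it covers only what is written above (16-bit ASNs, and 32-bit ASNs of up to six decimal digits, which fit the xx:xxxx form). Note that the ASN's decimal digits are copied into the address as written, not converted to hexadecimal.

```python
import ipaddress

PEERING_LAN = ipaddress.IPv6Network("2400:1560:6::/64")


def peering_address(asn: int, n: int = 1) -> ipaddress.IPv6Address:
    """Derive the IPv6 peering address for `asn` and interface serial `n`.

    Decimal ASN digits are placed in the address unchanged, so AS64523
    becomes ...:A506:4523:n rather than a hexadecimal conversion.
    """
    if not 1 <= n <= 9999:
        raise ValueError("serial n must fit in one group as decimal digits")
    if 1 <= asn <= 65535:                # 16-bit ASN: A50x:xxxx:n
        d = f"{asn:05d}"
        suffix = f"A50{d[0]}:{d[1:]}:{n}"
    elif 65536 <= asn <= 999_999:        # 32-bit ASN, up to six digits: A500:xx:xxxx:n
        d = f"{asn:06d}"
        suffix = f"A500:{d[:2]}:{d[2:]}:{n}"
    else:
        raise ValueError("ASN outside the range covered by the scheme above")
    addr = ipaddress.IPv6Address(f"2400:1560:6::{suffix}")
    if addr not in PEERING_LAN:          # sanity check against the /64
        raise ValueError("derived address is outside the peering LAN")
    return addr
```

Comparing IPv6Address objects rather than strings avoids false mismatches between equivalent notations: Python prints 2400:1560:6::A500:13:1329:1 as 2400:1560:6:0:a500:13:1329:1.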

IPv6 Peerings

To set up IPv6 peerings at irix:

Please log in to portal.ixaas.net/IR-IX, navigate to your connections, click the “Show” button next to the connection you want to enable IPv6 for, and in the IPv6 Router section press the “Request new IPv6 Router” button. An IPv6 address will be assigned to you according to the IPv6 numbering scheme (see above). Once the address is assigned, configure your router interface(s) accordingly.

Invite peers. The irix Members List also shows which members use IPv6.

Controlling ARP Traffic on irix Platform

ARP (Address Resolution Protocol)

ARP (Address Resolution Protocol) is the Layer-2 protocol that irix members' routers use to map a peer's IPv4 address to the MAC address of that peer's interface.

Problems caused by too much ARP traffic

On Ethernet networks, the Address Resolution Protocol (ARP) is used to find the MAC address for a given IPv4 address. ARP uses Ethertype 0x0806 together with Ethernet broadcasting. A node broadcasts an ARP Request packet to ask for the MAC address belonging to an IPv4 address it does not yet know. The node using the requested IP address replies (using regular unicast) with an ARP Reply packet containing its MAC address. For this to work, all nodes using IPv4 must listen for ARP packets and reply to them when necessary.
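
To make the request/reply exchange concrete, the sketch below builds the broadcast ARP Request described above and performs the check every IPv4 node has to run on each ARP broadcast it receives. The field layout follows the standard Ethernet/IPv4 ARP packet format (RFC 826); the function names, MAC and IP addresses are hypothetical.

```python
import ipaddress
import struct

ETH_P_ARP = 0x0806              # Ethertype for ARP
ARP_FMT = "!HHBBH6s4s6s4s"      # htype, ptype, hlen, plen, oper, sha, spa, tha, tpa
ARP_REQUEST = 1


def arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Build a broadcast ARP Request asking which MAC owns target_ip.

    Returns the 42-byte frame, before any padding to the 60-byte
    Ethernet minimum and before the FCS.
    """
    sha = bytes.fromhex(src_mac.replace(":", ""))
    eth = b"\xff" * 6 + sha + struct.pack("!H", ETH_P_ARP)  # broadcast destination
    arp = struct.pack(
        ARP_FMT,
        1, 0x0800, 6, 4, ARP_REQUEST,   # Ethernet, IPv4, MAC length, IP length, request
        sha, ipaddress.IPv4Address(src_ip).packed,
        b"\x00" * 6,                    # target MAC unknown: that is the question
        ipaddress.IPv4Address(target_ip).packed,
    )
    return eth + arp


def should_reply(frame: bytes, my_ip: str) -> bool:
    """The check every IPv4 node must run on every ARP broadcast it receives."""
    if len(frame) < 42 or struct.unpack("!H", frame[12:14])[0] != ETH_P_ARP:
        return False
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(ARP_FMT, frame[14:42])
    return oper == ARP_REQUEST and tpa == ipaddress.IPv4Address(my_ip).packed
```

Only the owner of the target address replies, but every node has to unpack and compare, which is where the processing cost discussed next comes from.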

The nodes therefore need to process every Ethernet broadcast with Ethertype 0x0806 and, for each ARP packet, decide whether or not to reply. This can take a lot of processing power. Because every ARP packet must be examined for ARP to work, ARP processing may take precedence over other activities, depending on the operating system. As a result, under heavy ARP traffic, routers may be unable to perform other tasks such as maintaining BGP sessions.

This problem was observed on irix when the ISP peering LAN was renumbered to new IPv4 addresses. Members in the new IPv4 range were trying to reach members in the old IPv4 range and vice versa. Many more ARP packets than usual crossed the network, consuming all available processing power on some customer routers and leaving too few resources to process BGP in a timely manner, which resulted in lost BGP sessions. In addition, routers trying to re-establish the old BGP sessions started sending ARP packets themselves, causing an ARP storm that created even more problems on customer equipment.

ARP Sponge – the irix solution

To help routers survive heavy ARP traffic, irix keeps the amount of ARP traffic to a minimum. For this purpose, irix runs a daemon called ARP Sponge, written in Perl. The ARP Sponge listens on the ISP peering LAN for ARP traffic. When the number of ARP Requests for a given IP address exceeds a threshold, the ARP Sponge sends out an ARP Reply for that IP address using its own MAC address. From that moment on, the IP address is sponged: all traffic to it is sent to the ARP Sponge. Because the sponged address now receives an answer, other nodes stop re-requesting it, which keeps ARP traffic limited and prevents ARP storms.
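
The sponging logic described above can be modelled in a few lines. This is a toy sketch, not the ARP Sponge code: the threshold and the sliding time window are assumptions (the text above does not give the daemon's actual settings), the class name is hypothetical, and only the MAC address is taken from this page.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set

SPONGE_MAC = "00:25:90:0a:0a:bd"  # current irix ARP Sponge MAC address


class ArpSpongeSketch:
    """Toy model of threshold-based sponging (not the real daemon)."""

    def __init__(self, threshold: int = 50, window: float = 60.0) -> None:
        # Hypothetical defaults: the real daemon's settings are not given here.
        self.threshold = threshold
        self.window = window  # seconds over which requests are counted
        self.pending: Dict[str, Deque[float]] = defaultdict(deque)
        self.sponged: Set[str] = set()

    def on_arp_request(self, target_ip: str, now: float) -> Optional[str]:
        """Return the MAC to put in an ARP Reply for target_ip, or None to stay silent."""
        if target_ip in self.sponged:
            return SPONGE_MAC
        times = self.pending[target_ip]
        times.append(now)
        while times and now - times[0] > self.window:
            times.popleft()  # forget requests older than the window
        if len(times) > self.threshold:
            self.sponged.add(target_ip)
            del self.pending[target_ip]
            return SPONGE_MAC
        return None

    def on_traffic_from(self, src_ip: str) -> None:
        """Any traffic from an address (ARP, BGP, ...) shows it is alive again."""
        self.sponged.discard(src_ip)    # stop sponging it
        self.pending.pop(src_ip, None)  # assumption: also reset its request counter
```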

When the interface of a sponged IP address comes up again, it usually sends out a gratuitous ARP request packet: an ARP packet with both the source and destination IP address set to the IP address of the sending node. Gratuitous ARP is used mainly when a MAC address has changed, so that other nodes can update their ARP caches. When the ARP Sponge receives any traffic from a sponged IP address (including, but not limited to, gratuitous ARP requests, ARP requests for other nodes and BGP session initiations), it stops sponging that IP address and no longer sends out ARP replies for it.
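
A gratuitous ARP request as defined above can be built and recognised as follows. This is a minimal sketch with hypothetical function names and example addresses; it sets the target MAC field to zero, which is one common choice.

```python
import ipaddress
import struct
from typing import Optional

ARP_FMT = "!HHBBH6s4s6s4s"  # htype, ptype, hlen, plen, oper, sha, spa, tha, tpa


def gratuitous_arp(src_mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP request: sender and target IP are both `ip`."""
    sha = bytes.fromhex(src_mac.replace(":", ""))
    spa = ipaddress.IPv4Address(ip).packed
    eth = b"\xff" * 6 + sha + struct.pack("!H", 0x0806)  # broadcast, Ethertype ARP
    arp = struct.pack(ARP_FMT, 1, 0x0800, 6, 4, 1, sha, spa, b"\x00" * 6, spa)
    return eth + arp


def gratuitous_sender(frame: bytes) -> Optional[str]:
    """Return the announced IP if `frame` is a gratuitous ARP, else None."""
    if len(frame) < 42 or frame[12:14] != b"\x08\x06":
        return None
    fields = struct.unpack(ARP_FMT, frame[14:42])
    spa, tpa = fields[6], fields[8]
    if spa != tpa:
        return None
    return str(ipaddress.IPv4Address(spa))
```

In a sponge-like daemon, a non-None result from gratuitous_sender() would be one of the signals to stop sponging that address.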

The current irix ARP Sponge MAC address is: 00:25:90:0a:0a:bd

ARP Traffic on irix Platform

Common Issue with IPv4 addresses after being sponged by ARP Sponge

  • Unable to exchange traffic with irix peers when an IPv4 address comes back up after being inactive for some time

If an IPv4 address is sponged, the entry for that address in the other members' ARP tables points to the ARP Sponge MAC address. Once the address is reachable again and has been un-sponged, the peers' ARP tables may not be updated with the member's own MAC address quickly enough, so traffic from these peers towards the recovered IP address is still forwarded to the sponge MAC address.

For instance, suppose IP address 80.249.208.1 of member A (MAC address AAAA.AAAA.AAAA) is sponged with the sponge MAC address EEEE.EEEE.EEEE. Member B's ARP entry for that address is then 80.249.208.1 - EEEE.EEEE.EEEE. When member A recovers, it sends traffic towards member B, but as long as member B's ARP entry has not been updated with A's original MAC address AAAA.AAAA.AAAA, B's traffic towards A still ends up at EEEE.EEEE.EEEE, until member B updates the entry.

This issue should resolve itself after some time: once the daemon stops sending ARP replies for this IP address, the un-sponged member and its peers update their ARP entries themselves.
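
The self-healing described above relies on ARP cache ageing on the peers' side. The toy model below of member B's cache assumes one fixed entry lifetime; real timeouts and refresh behaviour vary by operating system and vendor and are not specified in this document. The class name is hypothetical, and the MAC values mirror the AAAA/EEEE example above in colon notation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ArpCache:
    """Toy ARP cache with a single fixed entry lifetime (an assumption)."""
    ttl: float  # entry lifetime in seconds; hypothetical value
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def learn(self, ip: str, mac: str, now: float) -> None:
        """Store or overwrite an entry, e.g. from an ARP Reply or a gratuitous ARP."""
        self.entries[ip] = (mac, now)

    def next_hop_mac(self, ip: str, now: float) -> Optional[str]:
        """MAC used for traffic to `ip`, or None if a new ARP Request is needed."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at > self.ttl:
            del self.entries[ip]  # expired: the next packet triggers a fresh lookup
            return None
        return mac
```

Until the entry expires, or is overwritten by something like a gratuitous ARP from member A, member B keeps sending A's traffic to the sponge MAC.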

The issue is more noticeable for members that peer only with the irix route servers. A member that does not use the route servers must bring up its BGP sessions with peers one by one, and the ARP entries are reliably updated as part of each BGP session setup, so traffic is then forwarded and received correctly for each peer. However, if the newly un-sponged member peers only with the route servers, it re-establishes its BGP sessions with them after recovery and learns the other members' prefixes from there, without any direct exchange with those peers. Traffic may then be exchanged with peers that still hold the sponged ARP entry for the member, so their traffic towards it still goes to the ARP Sponge.

Therefore, the irix NOC recommends that members whose IPv4 address has been unreachable for a prolonged period (and is therefore almost certainly sponged) temporarily shut down their peering with the route servers and send a gratuitous ARP request to update their peers' ARP tables.

irix Route Servers →

Acknowledgement

The ARP Sponge explanation in this section is an excerpt from the 2009 report by Marco Wessel and Niels Sijm of the Universiteit van Amsterdam, written during their Master's programme in System and Network Engineering, on their research into the effects of IPv4 and IPv6 address resolution on the AMS-IX platform and the ARP Sponge.