Thursday, March 31, 2011

Hot Hot Hot Nexus FEXes

Cisco's Nexus 2xxx products are marketed for top-of-rack deployment at the back (hot side) of a server cabinet.

They blow air in the same direction as servers (toward the back).  They're only 1/2 to 2/3 as deep as a typical server.  This combination of features (flow direction and depth) means that the FEX intakes are near the middle of the cabinet.

Airflow management in a typical server cabinet includes blanking panels covering unused server positions and baffles between the vertical mounting rails and the sides of the cabinet.  As a result, the entire cabinet runs hot, except for the couple of inches between the front door and the server air intake.


Fabric extenders inevitably run hot.  I wish Cisco had made the FEXes full-depth so that they'd always breathe in cool air.


The statistics below are from an environment that's not quite as severe as my drawing indicates.  In fact, none of the baffles or blanking panels are in place.  These FEXes aren't getting cool air because hot air is actually exhausting on the cold side of the cabinet near the FEX intake.  Hot server exhaust is circling from the rear of the cabinet, up the sides and out through the area of the FEX intake.  The FEXes don't move enough air to overcome the high-pressure exhaust from the servers.


-----------------------------------------------------------------
Module   Sensor     MajorThresh   MinorThres   CurTemp     Status
                    (Celsius)     (Celsius)    (Celsius)
-----------------------------------------------------------------
1        Outlet-1   60            50           48          ok
1        Inlet-1    50            40           42          minor alarm
1        Outlet-1   60            50           46          ok
1        Inlet-1    50            40           41          minor alarm
1        Outlet-1   60            50           49          ok
1        Inlet-1    50            40           43          minor alarm
1        Outlet-1   60            50           48          ok
1        Inlet-1    50            40           42          minor alarm
1        Outlet-1   60            50           41          ok
1        Inlet-1    50            40           33          ok
1        Outlet-1   60            50           39          ok
1        Inlet-1    50            40           33          ok
1        Outlet-1   60            50           47          ok
1        Inlet-1    50            40           42          minor alarm
1        Outlet-1   60            50           44          ok
1        Inlet-1    50            40           38          ok
1        Outlet-1   60            50           50          ok
1        Inlet-1    50            40           44          minor alarm
1        Outlet-1   60            50           48          ok
1        Inlet-1    50            40           42          minor alarm
1        Outlet-1   60            50           50          ok
1        Inlet-1    50            40           43          minor alarm
1        Outlet-1   60            50           49          ok
1        Inlet-1    50            40           42          minor alarm
1        Outlet-1   60            50           49          ok
1        Inlet-1    50            40           42          minor alarm
1        Outlet-1   60            50           47          ok
1        Inlet-1    50            40           40          minor alarm
1        Outlet-1   60            50           41          ok
1        Inlet-1    50            40           34          ok
1        Outlet-1   60            50           40          ok
1        Inlet-1    50            40           34          ok
1        Outlet-1   60            50           48          ok
1        Inlet-1    50            40           42          minor alarm
1        Outlet-1   60            50           47          ok
1        Inlet-1    50            40           41          minor alarm
1        Outlet-1   60            50           47          ok
1        Inlet-1    50            40           41          minor alarm
1        Outlet-1   60            50           45          ok
1        Inlet-1    50            40           39          ok
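
For anyone who wants to tally output like this across a lot of FEXes, here's a minimal Python sketch. It assumes the rows keep the six-column layout shown above. The names `SAMPLE`, `parse_rows` and `inlet_summary` are made up for this example, and the sample rows are copied from the table. It reads the Status column as reported rather than recomputing it from the thresholds.

```python
import re

# A few rows copied from the table above; the full output parses the same way.
SAMPLE = """\
1        Outlet-1   60            50           48          ok
1        Inlet-1    50            40           42          minor alarm
1        Outlet-1   60            50           41          ok
1        Inlet-1    50            40           33          ok
1        Outlet-1   60            50           47          ok
1        Inlet-1    50            40           40          minor alarm
"""

# module, sensor, major threshold, minor threshold, current temp, status
ROW = re.compile(
    r"^\s*(?P<module>\d+)\s+(?P<sensor>\S+)\s+(?P<major>\d+)\s+"
    r"(?P<minor>\d+)\s+(?P<cur>\d+)\s+(?P<status>.+?)\s*$"
)


def parse_rows(text):
    """Return one dict per sensor row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # not a sensor row (header, dashes, blank line)
        row = m.groupdict()
        for key in ("module", "major", "minor", "cur"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def inlet_summary(rows):
    """Summarize the inlet sensors: how many are alarmed and how hot they run."""
    inlets = [r for r in rows if r["sensor"].startswith("Inlet")]
    alarmed = [r for r in inlets if r["status"] != "ok"]
    return {
        "inlets": len(inlets),
        "alarmed": len(alarmed),
        "hottest_inlet_c": max(r["cur"] for r in inlets),
        "min_headroom_to_major_c": min(r["major"] - r["cur"] for r in inlets),
    }


summary = inlet_summary(parse_rows(SAMPLE))
print(summary)
```

The headroom figure is the gap between the hottest inlet reading and the major threshold, which is the number I'd watch if the cold aisle ever warms up by a few degrees.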

Omitting the blanking panels obviously doesn't help, nor does mounting full-depth equipment directly under the FEXes.  The only solution that makes sense to me is to truly isolate the FEX intake from the server exhaust.  But how can this be accomplished?

A rack-mountable air duct at the front of the rack (in place of the top blanking panel) would be great, but I don't know if such a product exists.  [It exists - see update 2 below]  Until then, it might come down to duct-taping paper towel tubes to the FEX intakes  :-)

Moving the blanking panels and side baffles to the rear of the cabinet might do the trick, but...
  • It only works when dealing with equipment of a uniform depth.  Alternating short/long servers in a heterogeneous server environment make the rear blanking panel strategy fall apart.
  • I've never seen a rear-blanked server cabinet, nor have any of the facilities guys I've talked to, and they're a conservative bunch, not interested in going out on a limb.

Have you seen this problem in your data center?

What are you doing about it?

Update: TAC has pointed me to some documentation that addresses my airflow concerns.  The answer seems completely outrageous:
Apparently all of my FEXes are mounted wrong.  The documentation shows the fixed rack ear (1) mounted to the intake end of the FEX.  This would mount the FEX flush with the cool side of the cabinet, making all of its switchports completely inaccessible in a full cabinet.

The relevant section of the Cisco on Cisco tour of the Richardson data center doesn't play anymore, but I've seen those videos, and I'm pretty sure that Cisco's own FEXes are mounted like the ones in my drawing at the top of this post, contrary to the documentation and the drawing above.

Wondering how I missed this detail at install time, I cracked open the carton of a factory-fresh 2+ year old 2148T.  The box contained the FEX, mounting hardware, and a safety sheet.  No install guide.  No rackmount drawings.  No documentation DVD.

If you've ever seen a FEX mounted flush with the cold side of the cabinet, please leave a comment about it.  How do you reach the uplink and downlink ports?


Update 2:  The best solution so far is the Panduit CDE2 air duct.  It's made to be compatible with the 4948E and Nexus 2xxx, and is even referenced by Cisco's 4948E documentation.  This product appears to be a sheet metal duct which extends the air intake to the cold side of the cabinet.  It doubles the FEX space requirements to 2U.  It's frustrating that this accessory is required to operate a FEX in the typical deployment scenario, but I'm glad it's available!



10 comments:

  1. Interesting post. I'm surprised that having a full-depth device directly below the FEXes doesn't help (assuming the blanking directly in front of the FEXes is also removed)...are their fans just not powerful enough to pull cool air from the front?

  2. Hi Jason,

    FEXes (2148s in this case) just don't move much air.

    The servers move SO much air that they suck in outside air from the very nearby front door, and then blow it out into the large cavity at the rear of the rack. And the cavity on the sides of the rack. And up the sides of the rack. And out the front (cold side!) door.

    Here's a video clip indicating that no cool air gets to the FEX intake because server exhaust is dominating everything. Shot from the cold side of the cabinet:
    http://wrt.marget.com/airflow.3gp

    Note the amber LEDs on the FEXes.

  3. > Until then, it might come down to duct-taping paper towel tubes to the FEX intakes

    Please, please, please post photos if it comes to this. *Please.*

  4. Actually, it depends on what model you get. You can get front-to-back or back-to-front airflow. The FEX cabling is supposed to be on the other side, where it's a lot closer to the server NICs.

    http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps10110/data_sheet_c78-507093.html

  5. Hi Josh,

    What you're describing is exactly what I've got.

    FEX cabling and exhaust is on the hot side of the cabinet, flush with the back (hot side, NIC side) of the servers.

    The FEXes in my example happen to be 2148s which only support front-to-back (exhaust on switchport side) airflow as far as I know.

    I'm not sure how I gave the impression that they're installed backwards...

  6. You should take a look at something like this:

    http://www.opengatedata.com/systems/sitex-ec/network-switch-cooling/

    Jeremy

  7. Hi Jeremy,

    That's a really cool product, thanks for the pointer.

    It's exactly what's required for a traditional Cisco fixed configuration switch, but doesn't seem ideal for the FEXes.

    As far as I can tell, the back side (opposite the air intake grille) of that unit is closed, so that it can direct all fresh air into the *sides* of a switch.

    FEXes don't need an additional fan, side ducts, "wipers", etc... They just need a rack-front to FEX-front "chassis extension" duct because the air is already flowing in the correct direction.

    Thanks!

  8. Chris,
    In the year plus since you posted this, have you seen any performance issues related to FEX overheating? Have you deployed any of the available solutions? If so, was there a big impact? And can you install these products on an active FEX without taking it down?

  9. DE, nice to hear from you!

    I've seen some FEXes get RMA'ed due to overheating. The FEXes in this case were running hot to begin with, and were pushed over the edge when a CRAC failure raised the temperature of the cold aisle by a few degrees.

    TAC previously told me that a critical temperature alarm would lead to shutdown... The way the customer tells this story, the FEXes go down and never come back! Maybe there's a thermal fuse in there?

    I've still never seen the Cisco product, but have seen many Panduit ducts deployed.

    Either one should work fine, reducing the inlet temperature from 110F or more, to the ambient temperature of the room (70F-ish).

    Overall, I like the Panduit better because it affords easy servicing of cold-side FEX components (power cable, power supply, fan), but it's costly in terms of purchase price and rack units.

    Also, the Panduit is just about impossible to retrofit, because it requires you to remove the OEM mounting rails from the FEX. Panduit ducts will probably never work in the environment with which you and I are both familiar.

    The Cisco sleeve looks like it will retrofit just fine, but has two tricky bits:
    1) The cold side of the FEX now requires three cage nuts, because the middle one will secure the duct. Snapping that middle cage nut into place might be tricky.
    2) You won't be able to reach the power cords, so they'll need to snake through the duct before it's installed. Temporary use of some long (2m or more) extension cords will probably help. The dance goes something like this:

    - unplug PS1 from PDU1
    - add extension cord, snake it through duct, reconnect to PDU1
    - unplug PS2 from PDU2
    - add extension cord, snake it through duct, reconnect to PDU2
    - install duct to FEX
    - remove extension cord 1, power PS1 directly from PDU1
    - remove extension cord 2, power PS2 directly from PDU2

    Replies
    1. Chris, thanks for the quick and detailed response!
