I really have a question regarding STP timers, but I need to give a little background on a particular issue we're experiencing right now. Please refer to the diagram I've attached.
We currently have clustered servers in two geographically distinct locations, tied together (L2-wise) via 15454s across our MAN. At our main location, the servers are attached to a SAN that is connected to the other location over Fibre Channel, and backups to the remote SAN are made in real time. The servers are set up to "heartbeat" every 10-15 secs...basically a ping to the other server. If a server misses a ping, it initiates a script that makes the remote server and SAN active.
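Just to illustrate the behavior (this is a hypothetical sketch, not our actual script; `peer_alive`, `run_failover_script`, and the single-miss trigger are my assumptions about how such scripts typically work):

```python
import subprocess
import time

HEARTBEAT_INTERVAL = 15  # seconds between pings, per the setup above


def peer_alive(host: str) -> bool:
    """Send a single ping to the remote cluster node; True if it answers."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_failover_script() -> None:
    """Stand-in for the script that activates the remote server and SAN."""
    print("failover: promoting remote server and SAN to active")


def heartbeat_loop(host: str, alive_check=peer_alive) -> None:
    """Ping the peer on an interval; one missed ping triggers failover."""
    while True:
        if not alive_check(host):
            # A single missed ping is enough to fail over -- which is exactly
            # why a long STP reconvergence causes a false failover.
            run_failover_script()
            return
        time.sleep(HEARTBEAT_INTERVAL)
```

The key point is that there is no retry or grace window: one missed heartbeat and the remote side goes active.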
That said, our issue is that STP takes around 35s to recalculate and begin forwarding traffic again...which means the 15s pings the servers send out are missed, which triggers a failover. The problem with that is the failover script is not very graceful: it doesn't handle this situation well, nor does it know how to respond when the primary comes back online...but that's another story for the server guys to fight over.
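For reference, that ~35s is consistent with 802.1D default timers: after a direct link failure a port transitions listening -> learning -> forwarding (forward delay, 15s each), and after an indirect failure the stored BPDU info must also age out first (max age, 20s). A quick sanity check of that arithmetic:

```python
# IEEE 802.1D default timers (seconds)
MAX_AGE = 20        # how long a switch retains BPDU info before aging it out
FORWARD_DELAY = 15  # time spent in each of the listening and learning states

# Direct link failure: the port skips max age and goes
# listening -> learning -> forwarding.
direct = 2 * FORWARD_DELAY              # 30s

# Indirect failure: stored BPDU info must age out before the transition.
indirect = MAX_AGE + 2 * FORWARD_DELAY  # 50s

print(f"direct failure:   {direct}s")
print(f"indirect failure: {indirect}s")
```

The observed ~35s sits between these two cases, so the defaults are almost certainly in play, and either way convergence comfortably exceeds the 15s heartbeat.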
Setting aside having the server/application guys fix their end (by upping the ping timer to a value consistent with STP convergence), what kind of STP changes could bring our convergence times down without pushing the network to the point of instability? I believe all of our STP values are currently at default.
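To make the question concrete, here are the two directions I'm aware of, assuming Cisco gear with IOS-style syntax (the VLAN number is a placeholder): moving to Rapid PVST+, which converges in well under the heartbeat interval for direct failures without any timer tuning, or shrinking the legacy 802.1D timers on the root bridge:

```
! Option 1: Rapid Spanning Tree -- generally preferred over timer tuning
spanning-tree mode rapid-pvst

! Option 2 (legacy 802.1D): tighten timers, configured on the root bridge only.
! VLAN 10 is a placeholder; check your network diameter before shrinking max-age.
spanning-tree vlan 10 forward-time 7
spanning-tree vlan 10 max-age 10
spanning-tree vlan 10 hello-time 1
```

With Option 2 the worst case drops from roughly 50s to about 24s (10 + 2x7), which still isn't inside a 15s heartbeat, so is Option 1 the only realistic way to get there, and is it safe across the 15454 links?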