
re: probability of failure
28 dec 1999
inder wrote:
>i am searching for resources/information dealing with the probability of
>failure of protection relays and system performance measures.
one basic measure of a component's reliability is its estimated
"failure rate," eg once per 50,000 hours (about 6 years of full-time use)
or 0.00002/h for an inverter. the inverse is the "mean time to fail" (mttf),
50,000 hours in this example. the "mean time to repair" (mttr) is the
average time it takes to fix the component, eg 4 hours, which may
include the time to take a spare off a shelf and install it, a return
trip to the factory, and so on, depending on circumstances. mean time
between failures (mtbf) is mttf + mttr. it's usually not much more
than the mttf, since the mttr is comparatively small.
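as a rough illustration of these definitions, here is a minimal python sketch using the example numbers above (0.00002 failures/h, 4 hours to repair). the function name is made up for illustration, and the availability line uses the usual steady-state formula mttf/(mttf + mttr), which assumes a constant failure rate.

```python
# sketch: convert a failure rate into mttf, mtbf and steady-state availability.
# the inputs are the example figures from the text, not measured data.

def reliability_figures(failure_rate_per_hour, mttr_hours):
    mttf = 1.0 / failure_rate_per_hour   # mean time to fail, in hours
    mtbf = mttf + mttr_hours             # mean time between failures
    availability = mttf / mtbf           # fraction of time in service
    return mttf, mtbf, availability

mttf, mtbf, avail = reliability_figures(0.00002, 4.0)
print(f"mttf = {mttf:.0f} h, mtbf = {mtbf:.0f} h, availability = {avail:.5f}")
```

with a 4 hour repair time the mtbf comes out only 4 hours longer than the mttf, which is why the two are often used interchangeably.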
manufacturers can often supply estimates of failure rates or mtbfs for
components. failure rates usually increase with temperature and how hard
a part is stressed in use. for instance, capacitors used with much less
than their rated voltage last longer than those used at their maximum limit.
individual component failure rates can be added to estimate the failure
rate of a system that uses multiple components, if the failure of any
single component means the system fails.
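a small python sketch of that addition rule follows. the component names and rates are invented for illustration (they are not vendor figures), and the sum is only a fair estimate if the failure rates are roughly constant and the components fail independently.

```python
# sketch: series system, where any single component failure fails the system.
# component names and rates are hypothetical, chosen only to show the sum.

component_rates_per_hour = {
    "inverter": 0.00002,
    "relay": 0.00001,
    "fuse holder": 0.000002,
}

def series_failure_rate(rates):
    """add constant failure rates of components that must all work."""
    return sum(rates)

system_rate = series_failure_rate(component_rates_per_hour.values())
system_mttf = 1.0 / system_rate
print(f"system failure rate = {system_rate:.6f}/h, mttf = {system_mttf:.0f} h")
```

the system mttf is always shorter than that of its least reliable component, which is the motivation for the redundancy discussed next.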
otoh, if a few components (eg relays) stand out as very unreliable
(and cheap), you may be able to design a system with a few more of
those components arranged in a (redundant) way that the system can
keep going even if one of those components fails. redundancy can
dramatically improve reliability, or "availability" (the probability
that a system will be working at any particular time), especially if
component failures can be detected before the entire system fails
(a "failure alarm") so they can be fixed before other working redundant
components fail.
for instance, we might have two 1500-hour light bulbs in a hallway, and say
"the lighting system is functional" if either bulb works. fortunately,
we can tell if one is burned out and quickly replace it, most likely before
the other burns out. (the one-bulb condition is "graceful degradation,"
ie the system is still working, but at less than full capacity.)
as another example, suppose we had several inverter choices:

             watts  cost  cost/watt  efficiency   warranty
porta power   140   $75    $0.54     "over 90%"   90 days
powerstar     200   119     0.60     "over 90%"   1 year
ac genius     150    95     0.63     0.24a stby
prowatt       250   195     0.78     "over 90%"
trace 812     800   550     0.69     0.02a stby
powerstar     400   389     0.97     0.06a stby   2 years

if an inverter fails once a year and takes a week to get fixed, this sort
of (markov) model (in courier font) might predict its availability:

      l          where l is the failure rate, 1/52 per week,
   ------>       r is the repair rate, 1 per week,
p1        p0     p1 is the probability that the inverter works,
   <------       and p0 is the probability that it doesn't.
      r

since it works or it doesn't, but not both, p1 + p0 = 1. in steady state
the flow out of p1 balances the flow back in, l p1 = r p0, so p0 = (l/r) p1
and (r/l) p0 + p0 = 1, or p0 = 1/(1+r/l) = 1/(1+52) = 1/53, so we might
expect the inverter to be out of service (unavailable) an average of 1/53
of the time, 165 hours per year, a bit less than one week per year.
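the same arithmetic as a minimal python sketch, assuming the two-state model above with l = 1/52 per week and r = 1 per week. the function name is made up for illustration.

```python
# sketch: two-state (works / failed) markov model for a single inverter.
HOURS_PER_YEAR = 8760

def single_unit_unavailability(failure_rate, repair_rate):
    """steady-state p0 from l*p1 = r*p0 with p1 + p0 = 1, ie l/(l + r)."""
    return failure_rate / (failure_rate + repair_rate)

l = 1 / 52   # failures per week (once a year)
r = 1.0      # repairs per week (one week to fix)
p0 = single_unit_unavailability(l, r)
print(f"p0 = {p0:.5f}, about {p0 * HOURS_PER_YEAR:.0f} hours per year")
```

l/(l + r) is just 1/(1 + r/l) rearranged, so only the ratio of repair rate to failure rate matters.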
add another inverter and this becomes

    2l        l            p2 <-> both work
  ----->    ----->         p1 <-> one works
p2       p1       p0       p0 <-> none work
  <-----    <-----
    2r        r

again, p2 + p1 + p0 = 1, p1 = 2l/2r p2 and p0 = l/r p1, so
2r/2l p1 + p1 + p0 = 1, or r^2/l^2 p0 + r/l p0 + p0 = 1, or
p0 = 1/(1+r/l + r^2/l^2) = 1/(1+ 52 + 52x52) = 0.00036, so we
might expect both inverters to be out of service an average of
0.00036x8760 = 3 hours per year. (in this model, reducing the repair time
to 4 hours, ie r = 42 per week and r/l = 2184, would decrease the
unavailability to about 1/4,772,041, or roughly 7 seconds per year.)
add another inverter and this becomes

    3l        2l        l            p3 <-> 3 work
  ----->    ----->    ----->         p2 <-> 2 work
p3       p2        p1       p0       p1 <-> 1 works
  <-----    <-----    <-----         p0 <-> 0 work
    3r        2r        r

again, p3 + p2 + p1 + p0 = 1, p2 = 3l/3r p3, p1 = 2l/2r p2 and p0 = l/r p1,
so p0 = 1/(1+r/l+r^2/l^2+r^3/l^3) = 1/(1+52+2704+140,608) = 1/143,365,
so we might expect all three
inverters to be out of service an average of 8760/143,365 = 0.061 hours or
3.7 minutes per year, which is comparable to grid power, but not as good
as a typical telephone system with an unavailability goal and budget of
less than 30 seconds per year.
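the pattern in these chains generalizes to any number of inverters. the python sketch below assumes the transition rates drawn above (k l downward and k r upward between states k and k-1), so each balance equation reduces to a factor of l/r. the function name is made up for illustration.

```python
# sketch: steady-state probability that all n redundant inverters are down,
# for the chain above where each balance step gives p(k-1) = (l/r) p(k).
HOURS_PER_YEAR = 8760

def all_failed_probability(n, failure_rate, repair_rate):
    """p0 = 1 / (1 + r/l + (r/l)^2 + ... + (r/l)^n)."""
    ratio = repair_rate / failure_rate
    return 1.0 / sum(ratio ** k for k in range(n + 1))

l = 1 / 52   # failures per week (once a year)
r = 1.0      # repairs per week (one week to fix)
for n in (1, 2, 3):
    p0 = all_failed_probability(n, l, r)
    print(f"{n} inverter(s): p0 = 1/{1 / p0:.0f}, "
          f"{p0 * HOURS_PER_YEAR:.3f} hours per year out of service")
```

each added inverter divides the unavailability by roughly r/l (52 here), so raising the repair rate helps about as much as lowering the failure rate.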
nick

