# Availability

In reliability theory and reliability engineering, the term availability has the following meanings:

• The degree to which a system, subsystem or equipment is in a specified operable and committable state at the start of a mission, when the mission is called for at an unknown, i.e. a random, time. Simply put, availability is the proportion of time a system is in a functioning condition. This is often described as a mission capable rate. Mathematically, this is expressed as 100% minus unavailability.
• The ratio of (a) the total time a functional unit is capable of being used during a given interval to (b) the length of the interval.

For example, a unit that is capable of being used 100 hours per week (168 hours) would have an availability of 100/168. However, typical availability values are specified in decimal (such as 0.9998). In high availability applications, a metric known as nines, corresponding to the number of nines following the decimal point, is used. With this convention, "five nines" equals 0.99999 (or 99.999%) availability.
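As a rough illustration of the figures above, the following Python sketch computes the 100/168 ratio, counts the nines in a decimal availability figure, and converts an availability into expected downtime per year. The function names are illustrative and do not come from any standard, and the downtime helper assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def availability_ratio(usable_hours: float, interval_hours: float) -> float:
    """Ratio of usable time to the length of the interval."""
    return usable_hours / interval_hours


def count_nines(availability: str) -> int:
    """Count the consecutive nines directly after the decimal point.

    The value is passed as a string (e.g. "0.99999") to avoid
    floating-point rounding when counting digits.
    """
    _, _, fraction = availability.partition(".")
    return len(fraction) - len(fraction.lstrip("9"))


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


print(round(availability_ratio(100, 168), 4))        # 0.5952
print(count_nines("0.99999"))                        # 5
print(round(downtime_minutes_per_year(0.99999), 3))  # 5.256
```

Under this convention, "five nines" corresponds to a little over five minutes of downtime per year.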

## Introduction

Availability of a system is typically measured as a factor of its reliability – as reliability increases, so does availability.

Availability of a system may also be increased by focusing on testability, diagnostics and maintainability rather than on reliability. Improving maintainability during the early design phase is generally easier than improving reliability (or testability and diagnostics). Maintainability estimates (item repair-by-replacement rates) are also generally more accurate. However, because the uncertainties in the reliability estimates (and also in diagnostic times) are in most cases very large, they are likely to dominate the availability problem (and its prediction uncertainty), even when maintainability levels are very high. Furthermore, when reliability is not under control, many different kinds of issues may arise, for example:

• The need for complex testability (built in test sensors, hardware and software) requirements,
• The need for detailed diagnostic procedures,
• Manpower (maintainers / customer service capability) availability,
• Spare part availability,
• Dead on arrival issues (non-quality impact on system availability),
• Logistic delays of spares or manpower due to any reason,
• Lack of repair facilities and tools, and also a lack of software development (e.g. software caused many delays in the DoD F-22 Raptor program),
• Lack of repair knowledge and expert personnel,
• Extensive retrofit and complex configuration management costs, and others.

The problem of unreliability may also get out of control due to the "domino effect" of maintenance-induced failures after repairs, which demands ever-increasing problem-solving, re-engineering and service efforts. Focusing only on maintainability is therefore not enough.

• If failures are prevented, none of the other issues matter, and reliability is therefore generally regarded as the most important part of availability.

Reliability needs to be evaluated and improved in relation to both availability and the cost of ownership (cost of spare parts, maintenance man-hours, transport costs, storage costs, part obsolescence risks, etc.). Often a trade-off is needed between the two. There might be a maximum ratio between availability and cost of ownership. Testability of a system should also be addressed in the availability plan, as it is the link between reliability and maintainability. The maintenance strategy can influence the reliability of a system (e.g. by preventive and/or predictive maintenance), although it can never bring it above the inherent reliability. Maintainability and maintenance strategies therefore influence the availability of a system. In theory this influence is almost unlimited, if one were always able to repair any fault in an infinitely short time. In practice this is impossible: repairability is always limited by testability, manpower and logistic considerations. Reliability is not limited in this way (reliable items can be made that outlast the life of a machine with almost 100% certainty). For high levels of system availability (e.g. the availability of engine thrust in an aircraft), the use of redundancy may be the only option. Refer to reliability engineering.
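To make the effect of redundancy concrete, the following Python sketch combines unit availabilities in series and in parallel. It is a simplified illustration that assumes the units fail and are repaired independently (no common cause failures) and that, in the parallel case, a single working unit is sufficient. The function names and the 0.99 figure are hypothetical.

```python
import math


def series_availability(availabilities):
    """All units must be up: multiply the individual availabilities."""
    return math.prod(availabilities)


def parallel_availability(availabilities):
    """At least one unit must be up: one minus the product of unavailabilities."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


single = 0.99  # hypothetical availability of one unit
print(round(series_availability([single, single]), 6))    # 0.9801
print(round(parallel_availability([single, single]), 6))  # 0.9999
```

Under these assumptions, duplicating a unit with two nines of availability yields four nines, whereas placing two such units in series lowers availability below that of either unit.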

An availability plan should clearly provide a strategy for availability control. Whether availability alone or also cost of ownership matters most depends on the use of the system. For example, a system that is a critical link in a production system (e.g. a large oil platform) is normally allowed to have a very high cost of ownership if this translates to even a minor increase in availability, because the unavailability of the platform results in a massive loss of revenue that can easily exceed the high cost of ownership. A proper reliability plan should always address RAMT analysis in its total context. RAMT here stands for Reliability, Availability, Maintainability/Maintenance and Testability in the context of the customer's needs.

## Representation

The simplest representation of availability is the ratio of the expected value of the uptime of a system to the sum of the expected values of its uptime and downtime, or

${\displaystyle A={\frac {E[\mathrm {uptime} ]}{E[\mathrm {uptime} ]+E[\mathrm {downtime} ]}}}$

If we define the status function ${\displaystyle X(t)}$ as

${\displaystyle X(t)={\begin{cases}1,&{\text{system functions at time }}t\\0,&{\text{otherwise}}\end{cases}}}$

then the availability A(t) at time t > 0 is represented by

${\displaystyle A(t)=\Pr[X(t)=1]=E[X(t)].\,}$

Average availability must be defined on an interval of the real line. If we consider an arbitrary constant ${\displaystyle c>0}$, then average availability is represented as

${\displaystyle A_{c}={\frac {1}{c}}\int _{0}^{c}A(t)\,dt.}$

Limiting (or steady-state) availability is represented by[1]

${\displaystyle A=\lim _{t\rightarrow \infty }A(t).}$

Limiting average availability is also defined on an interval ${\displaystyle [0,c]}$ as,

${\displaystyle A_{\infty }=\lim _{c\rightarrow \infty }A_{c}=\lim _{c\rightarrow \infty }{\frac {1}{c}}\int _{0}^{c}A(t)\,dt,\quad c>0.}$
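The definitions above can be illustrated numerically. The Python sketch below assumes a single repairable item with exponentially distributed times to failure and to repair (a two-state Markov model, which is an assumption and not part of the definitions), starting in the up state at t = 0. For that model, A(t) = μ/(λ+μ) + λ/(λ+μ)·exp(−(λ+μ)t), where λ is the failure rate and μ the repair rate. The rate values and function names are hypothetical.

```python
import math


def point_availability(t, failure_rate, repair_rate):
    """A(t) for a two-state Markov model that starts in the up state."""
    total = failure_rate + repair_rate
    return repair_rate / total + (failure_rate / total) * math.exp(-total * t)


def average_availability(c, failure_rate, repair_rate, steps=10_000):
    """A_c = (1/c) * integral of A(t) over [0, c], by the trapezoidal rule."""
    h = c / steps
    values = [point_availability(i * h, failure_rate, repair_rate)
              for i in range(steps + 1)]
    integral = h * (sum(values) - 0.5 * (values[0] + values[-1]))
    return integral / c


failure_rate = 0.01  # failures per hour (hypothetical)
repair_rate = 0.5    # repairs per hour (hypothetical)
steady_state = repair_rate / (failure_rate + repair_rate)

for c in (1, 10, 100, 1000):
    print(c, average_availability(c, failure_rate, repair_rate))
print("steady state:", steady_state)
```

In this model the average availability over [0, c] starts near 1 and decreases towards the steady-state value μ/(λ+μ) as c grows, which is the behaviour the limiting definitions describe.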

Availability is the probability that an item will be in an operable and committable state at the start of a mission when the mission is called for at a random time, and is generally defined as uptime divided by total time (uptime plus downtime).

### Methods and techniques to model availability

Fault tree analysis and related software tools have been developed to calculate (analytically or by simulation) the availability of a system, or of a functional failure condition within a system, taking into account many factors such as:

• Reliability models
• Maintainability models
• Maintenance concepts
• Redundancy
• Common cause failure
• Diagnostics
• Level of repair
• Repair status (as good as new, as good as old)
• Dormancy
• Test coverage
• Active operational times / missions / sub system states
• Logistical aspects such as spare part (stocking) levels at different depots, transport times, repair times at different repair lines, manpower availability and more.
• Uncertainty in parameters

Furthermore, these methods are capable of identifying the most critical items and failure modes or events that impact availability.
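As a very small illustration of the simulation approach, the Python sketch below estimates the availability of a single repairable item by Monte Carlo simulation of alternating up and down periods. It is not a fault tree tool and covers none of the factors listed above beyond a basic reliability and maintainability model. Exponentially distributed failure and repair times, the parameter values and the function name are all assumptions made for the example.

```python
import random


def simulate_availability(mttf, mttr, horizon, seed=0):
    """Estimate availability as simulated uptime divided by total time."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    clock = 0.0
    uptime = 0.0
    while clock < horizon:
        # Up period: time to next failure, truncated at the horizon.
        up = min(rng.expovariate(1.0 / mttf), horizon - clock)
        uptime += up
        clock += up
        if clock >= horizon:
            break
        # Down period: repair time, truncated at the horizon.
        clock += min(rng.expovariate(1.0 / mttr), horizon - clock)
    return uptime / horizon


mttf, mttr = 100.0, 5.0  # hours (hypothetical)
estimate = simulate_availability(mttf, mttr, horizon=1_000_000.0)
analytic = mttf / (mttf + mttr)
print("simulated:", estimate)
print("analytic: ", analytic)
```

For this simple case the simulated value should lie close to the analytic ratio MTTF / (MTTF + MTTR). Simulation becomes useful when the model includes features, such as spares logistics or non-exponential repair times, for which no closed form is available.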

### Definitions within systems engineering

Availability, inherent (Ai) [2] The probability that an item will operate satisfactorily at a given point in time when used under stated conditions in an ideal support environment. It excludes logistics time, waiting or administrative downtime, and preventive maintenance downtime. It includes corrective maintenance downtime. Inherent availability is generally derived from analysis of an engineering design and is calculated as the mean time to failure (MTTF) divided by the mean time to failure plus the mean time to repair (MTTR). It is based on quantities under control of the designer.

Availability, achieved (Aa) [3] The probability that an item will operate satisfactorily at a given point in time when used under stated conditions in an ideal support environment (i.e., that personnel, tools, spares, etc. are instantaneously available). It excludes logistics time and waiting or administrative downtime. It includes active preventive and corrective maintenance downtime.

Availability, operational (Ao) [4] The probability that an item will operate satisfactorily at a given point in time when used in an actual or realistic operating and support environment. It includes logistics time, ready time, and waiting or administrative downtime, and both preventive and corrective maintenance downtime. This value is equal to the mean time between failures (MTBF) divided by the sum of the mean time between failures and the mean downtime (MDT). This measure extends the definition of availability to elements controlled by the logisticians and mission planners, such as the quantity and proximity of spares, tools and manpower to the hardware item.
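The two ratios stated above, for inherent and operational availability, can be sketched in Python as follows. The function names and the numerical values are hypothetical. The example assumes that MTTF and MTBF can be treated as the same figure for the item, and that the mean downtime is the active repair time plus a logistics and administrative delay.

```python
def inherent_availability(mttf_hours, mttr_hours):
    """Ai = MTTF / (MTTF + MTTR): corrective repair time only."""
    return mttf_hours / (mttf_hours + mttr_hours)


def operational_availability(mtbf_hours, mdt_hours):
    """Ao = MTBF / (MTBF + MDT): all downtime, including delays."""
    return mtbf_hours / (mtbf_hours + mdt_hours)


mean_time_between_failures = 1000.0  # hours (hypothetical)
active_repair_time = 2.0             # hours (hypothetical MTTR)
logistics_delay = 24.0               # hours waiting for spares or staff (hypothetical)

ai = inherent_availability(mean_time_between_failures, active_repair_time)
ao = operational_availability(mean_time_between_failures,
                              active_repair_time + logistics_delay)
print(round(ai, 6))  # 0.998004
print(round(ao, 6))  # 0.974659
```

With these figures the logistics delay, not the repair itself, accounts for most of the gap between the inherent and the operational value.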

Refer to Systems engineering for more details.

### Basic example

If we are using equipment which has a mean time to failure (MTTF) of 81.5 years and mean time to repair (MTTR) of 1 hour:

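• MTTF in hours = 81.5 × 365 × 24 = 713940 (this is a reliability parameter and often has a high level of uncertainty)
• Inherent availability (Ai) = 713940 / (713940 + 1) = 713940 / 713941 = 99.999860%
• Inherent unavailability = 1 / 713941 = 0.000140%

Outage due to equipment in hours per year = unavailability × hours per year = (1 / 713941) × 8760 ≈ 0.01227 hours per year. Equivalently, with one failure expected every 81.5 years and each repair taking 1 hour, the outage is approximately MTTR/MTTF = 1/81.5 hours per year.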
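The arithmetic of this example can be checked with a short Python sketch. The variable names are illustrative, and the yearly outage is computed here as the unavailability multiplied by the number of hours in a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, matching the conversion used for MTTF

mttf_hours = 81.5 * HOURS_PER_YEAR  # mean time to failure in hours
mttr_hours = 1.0                    # mean time to repair in hours

inherent_availability = mttf_hours / (mttf_hours + mttr_hours)
inherent_unavailability = 1.0 - inherent_availability
outage_hours_per_year = inherent_unavailability * HOURS_PER_YEAR

print(f"MTTF in hours:           {mttf_hours:.0f}")
print(f"Inherent availability:   {inherent_availability:.6%}")
print(f"Inherent unavailability: {inherent_unavailability:.6%}")
print(f"Outage per year (hours): {outage_hours_per_year:.5f}")
```

Because the MTTF is a reliability estimate with a high level of uncertainty, the printed digits should be read as the result of the arithmetic and not as the precision of a real prediction.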

## Literature

Availability is well established in the literature of stochastic modeling and optimal maintenance. Barlow and Proschan [1975] define availability of a repairable system as "the probability that the system is operating at a specified time t." Blanchard [1998] gives a qualitative definition of availability as "a measure of the degree of a system which is in the operable and committable state at the start of mission when the mission is called for at an unknown random point in time." This definition comes from the MIL-STD-721. Lie, Hwang, and Tillman [1977] developed a complete survey along with a systematic classification of availability.

Availability measures are classified by either the time interval of interest or the mechanisms for the system downtime. If the time interval of interest is the primary concern, we consider instantaneous, limiting, average, and limiting average availability. The aforementioned definitions are developed in Barlow and Proschan [1975], Lie, Hwang, and Tillman [1977], and Nachlas [1998]. The second primary classification for availability is contingent on the various mechanisms for downtime such as the inherent availability, achieved availability, and operational availability. (Blanchard [1998], Lie, Hwang, and Tillman [1977]). Mi [1998] gives some comparison results of availability considering inherent availability.

Availability considered in maintenance modeling can be found in Barlow and Proschan [1975] for replacement models, Fawzi and Hawkes [1991] for an R-out-of-N system with spares and repairs, Fawzi and Hawkes [1990] for a series system with replacement and repair, Iyer [1992] for imperfect repair models, Murdock [1995] for age replacement preventive maintenance models, Nachlas [1998, 1989] for preventive maintenance models, and Wang and Pham [1996] for imperfect maintenance models. A very comprehensive recent book is by Trivedi and Bobbio [2017].

## Applications

Availability is used extensively in power plant engineering. For example, the North American Electric Reliability Corporation implemented the Generating Availability Data System in 1982.[5]