Safety engineering

Safety engineering is an applied science strongly related to systems engineering. Safety engineering assures that a life-critical system behaves as needed even when pieces fail.

In practice, the term "safety engineering" refers to any act of accident prevention by a person qualified in the field. Safety engineering is often reactive to adverse events, also described as "incidents", as reflected in accident statistics. This arises largely because of the complexity and difficulty of collecting and analysing data on "near misses".

Increasingly, the safety review is being recognised as an important risk management tool. Failure to identify risks to safety, and the consequent inability to address or "control" these risks, can result in massive costs, both human and economic. The multidisciplinary nature of safety engineering means that a very broad array of professionals is actively involved in accident prevention or safety engineering.

The majority of those practicing safety engineering are employed in industry to keep workers safe on a day-to-day basis. See the American Society of Safety Engineers publication Scope and Function of the Safety Profession.

Safety engineers distinguish different extents of defective operation: A "fault" is said to occur when some piece of equipment does not operate as designed. A "failure" only occurs if a human being (other than a repair person) has to cope with the situation. A "critical" failure endangers one or a few people. A "catastrophic" failure endangers, harms or kills a significant number of people.

Safety engineers also identify different modes of safe operation: A "probabilistically safe" system has no single point of failure, and enough redundant sensors, computers and effectors so that it is very unlikely to cause harm (usually "very unlikely" means, on average, less than one human life lost in a billion hours of operation). An inherently safe system is a clever mechanical arrangement that cannot be made to cause harm – obviously the best arrangement, but this is not always possible. A fail-safe system is one that cannot cause harm when it fails. A "fault-tolerant" system can continue to operate with faults, though its operation may be degraded in some fashion.

These terms combine to describe the safety needed by systems: For example, most biomedical equipment is only "critical", and often another identical piece of equipment is nearby, so it can be merely "probabilistically fail-safe". Train signals can cause "catastrophic" accidents (imagine chemical releases from tank-cars) and are usually "inherently safe". Aircraft "failures" are "catastrophic" (at least for their passengers and crew) so aircraft are usually "probabilistically fault-tolerant". Without any safety features, nuclear reactors might have "catastrophic failures", so real nuclear reactors are required to be at least "probabilistically fail-safe", and some such as pebble bed reactors are "inherently fault-tolerant".

The process

Ideally, safety engineers take an early design of a system, analyze it to find what faults can occur, and then propose changes to make the system safer. In an early design stage, a fail-safe system can often be made acceptably safe with a few sensors and some software to read them. Probabilistically fault-tolerant systems can often be made by using more, but smaller and less expensive, pieces of equipment.

Historically, many organizations viewed "safety engineering" as a process to produce documentation to gain regulatory approval, rather than a real asset to the engineering process. These same organizations have often made their views into a self-fulfilling prophecy by assigning less-able personnel to safety engineering.

Far too often, rather than actually helping with the design, safety engineers are assigned to prove that an existing, completed design is safe. If a competent safety engineer then discovers significant safety problems late in the design process, correcting them can be very expensive. This project management error has wasted large sums of money in the development of commercial nuclear reactors.

Additionally, failure mitigation can go beyond design recommendations, particularly in the area of maintenance. There is an entire realm of safety and reliability engineering known as "Reliability Centered Maintenance" (RCM), which is a discipline that is a direct result of analyzing potential failures within a system, and determining maintenance actions that can mitigate the risk of failure. This methodology is used extensively on aircraft, and involves understanding the failure modes of the serviceable replaceable assemblies, in addition to the means to detect or predict an impending failure. Every automobile owner is familiar with this concept when they take in their car to have the oil changed or brakes checked. Even filling up one's car with gas is a simple example of a failure mode (failure due to fuel starvation), a means of detection (fuel gauge), and a maintenance action (fill 'er up!).

For large-scale complex systems, hundreds if not thousands of maintenance actions can result from the failure analysis. These maintenance actions are based on conditions (e.g., a gauge reading or a leaky valve), hard conditions (e.g., a component is known to fail after 100 hours of operation with 95% certainty), or require inspection to determine the maintenance action (e.g., metal fatigue). The Reliability Centered Maintenance concept then analyzes each individual maintenance item for its risk contribution to safety, mission, operational readiness, or cost to repair if a failure does occur. Then the sum total of all the maintenance actions is bundled into maintenance intervals, so that maintenance does not occur around the clock but at regular intervals. This bundling process introduces further complexity, as it might stretch some maintenance cycles, thereby increasing risk, but shorten others, thereby potentially reducing risk. The end result is a comprehensive maintenance schedule, purpose-built to reduce operational risk and ensure acceptable levels of operational readiness and availability.
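The bundling step can be illustrated with a minimal sketch. It assumes a deliberately conservative rule that is not taken from any RCM standard: each task is moved to the longest standard interval that does not exceed its own required interval, so cycles are only ever shortened, never stretched. The task names, hour values, and the function name bundle_tasks are hypothetical.

```python
from collections import defaultdict


def bundle_tasks(tasks, standard_intervals):
    """Group maintenance tasks into standard intervals (in operating hours).

    Assumption: a task may only be done more often than required, never
    less often, so it goes to the longest standard interval that is not
    longer than its own required interval.
    """
    schedule = defaultdict(list)
    for name, required_hours in tasks.items():
        candidates = [i for i in standard_intervals if i <= required_hours]
        if not candidates:
            raise ValueError(f"no standard interval short enough for {name!r}")
        schedule[max(candidates)].append(name)
    return dict(schedule)


# Hypothetical tasks with the interval each one needs on its own.
tasks = {
    "oil change": 120,
    "brake check": 260,
    "fatigue inspection": 1000,
}
schedule = bundle_tasks(tasks, standard_intervals=[100, 250, 500])
for interval in sorted(schedule):
    print(interval, schedule[interval])
```

A real schedule would also weigh the risk of stretching a cycle against the cost of shortening it, which this sketch leaves out.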

Analysis techniques

The two most common fault modeling techniques are called "failure modes and effects analysis" and "fault tree analysis". These techniques are just ways of finding problems and of making plans to cope with failures, as in Probabilistic Risk Assessment (PRA or PSA). One of the earliest complete studies using PRA techniques on a commercial nuclear plant was the Reactor Safety Study (RSS), edited by Prof. Norman Rasmussen (see WASH-1400).

Failure modes and effects analysis

In the technique known as "failure mode and effects analysis" (FMEA), an engineer starts with a block diagram of a system. The safety engineer then considers what happens if each block of the diagram fails. The engineer then draws up a table in which failures are paired with their effects and an evaluation of the effects. The design of the system is then corrected, and the table adjusted until the system is not known to have unacceptable problems. Of course, the engineers may make mistakes. It is very helpful to have several engineers review the failure modes and effects analysis.
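The table described above can be sketched as a small data structure. This is only an illustration: the four-level severity scale, the "mitigated" flag, the acceptance rule, and all block names are assumptions made for the example, not part of any particular FMEA standard.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    block: str          # block of the system diagram
    failure_mode: str   # how the block fails
    effect: str         # what the failure does to the system
    severity: int       # hypothetical scale: 1 (minor) to 4 (catastrophic)
    mitigated: bool     # has the design been changed to cope with it?


def unacceptable_rows(rows, max_unmitigated_severity=2):
    """Return rows that are still too severe and have no mitigation."""
    return [
        r for r in rows
        if r.severity > max_unmitigated_severity and not r.mitigated
    ]


# Hypothetical table for a small pumping system.
table = [
    FmeaRow("pump", "stops", "loss of coolant flow", 4, mitigated=True),
    FmeaRow("level sensor", "reads low", "tank overfills", 3, mitigated=False),
    FmeaRow("indicator lamp", "burns out", "no status display", 1, mitigated=False),
]

for row in unacceptable_rows(table):
    print(f"{row.block}: {row.failure_mode} -> {row.effect}")
```

The design is revised and the table updated until the function returns an empty list, which mirrors the "not known to have unacceptable problems" stopping point in the text.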

Fault tree analysis

In the technique known as "fault tree analysis", an undesired effect is taken as the root ('top event') of a tree of logic. Then, each situation that could cause that effect is added to the tree as a series of logic expressions. When the events in a fault tree are labelled with actual failure probabilities, computer programs can calculate the probability of the top event from the tree. In practice such numbers are often unavailable because of the expense of testing.
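As a rough sketch of the calculation such programs perform, the following evaluates a tree of AND and OR gates. It assumes that all basic events are statistically independent and that no basic event appears more than once in the tree; real tools handle repeated and dependent events, which this example does not. The tree layout, event names, and probabilities are invented for illustration.

```python
from math import prod


def top_event_probability(node, basic_probabilities):
    """Evaluate a fault tree given as nested tuples.

    A node is either a basic event name (str) or a tuple
    ("AND" | "OR", child, child, ...).
    Assumes independent basic events, each used only once.
    """
    if isinstance(node, str):
        return basic_probabilities[node]
    gate, *children = node
    probs = [top_event_probability(c, basic_probabilities) for c in children]
    if gate == "AND":
        # All inputs must fail.
        return prod(probs)
    if gate == "OR":
        # At least one input fails.
        return 1.0 - prod(1.0 - q for q in probs)
    raise ValueError(f"unknown gate {gate!r}")


# Hypothetical top event: cooling is lost if both pumps fail or power is lost.
tree = ("OR", ("AND", "pump_a_fails", "pump_b_fails"), "power_lost")
probabilities = {"pump_a_fails": 0.01, "pump_b_fails": 0.01, "power_lost": 0.001}

p_top = top_event_probability(tree, probabilities)
print(f"{p_top:.7f}")
```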

A fault tree diagram

The tree is usually written out using conventional logic-gate symbols. The route through a tree between an event and an initiator in the tree is called a cut set. The shortest credible way through the tree from fault to initiating event is called a minimal cut set.
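One common way to make this concrete, assumed here rather than taken from the text, is to treat a cut set as a set of basic events that together cause the top event, and a minimal cut set as one with no smaller cut set inside it. The sketch below expands a tree of AND and OR gates into cut sets and then discards the non-minimal ones. The tree and event names are hypothetical.

```python
from itertools import product


def cut_sets(node):
    """Expand a nested-tuple fault tree into a list of cut sets."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, *children = node
    child_sets = [cut_sets(c) for c in children]
    if gate == "OR":
        # Any child's cut set alone causes the gate output.
        return [s for sets in child_sets for s in sets]
    if gate == "AND":
        # One cut set from every child is needed at the same time.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate {gate!r}")


def minimal_cut_sets(node):
    """Keep only cut sets that contain no smaller cut set."""
    sets = set(cut_sets(node))
    return {s for s in sets if not any(other < s for other in sets)}


# Hypothetical tree in which "power_lost" appears twice.
tree = (
    "OR",
    ("AND", "pump_a_fails", "pump_b_fails"),
    "power_lost",
    ("AND", "power_lost", "pump_a_fails"),
)

for s in sorted(minimal_cut_sets(tree), key=sorted):
    print(sorted(s))
```

In this example the set containing both power_lost and pump_a_fails is dropped, because power_lost on its own already causes the top event.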

Some industries use both Fault Trees and Event Trees (see Probabilistic Risk Assessment). An Event Tree starts from an undesired initiator (loss of critical supply, component failure etc) and follows possible further system events through to a series of final consequences. As each new event is considered, a new node on the tree is added with a split of probabilities of taking either branch. The probabilities of a range of 'top events' arising from the initial event can then be seen.
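The branching described above can be sketched by enumerating every path through a small event tree. The sketch assumes each safety system either works or fails independently of the others, and it does not prune branches the way a real event tree usually would. The initiator, system names, and probabilities are invented.

```python
from itertools import product


def event_tree(initiator_probability, systems):
    """Return the probability of every path through an event tree.

    systems is a list of (name, probability_that_it_works) pairs, in the
    order the tree considers them. Assumes independent systems.
    """
    paths = {}
    for states in product((True, False), repeat=len(systems)):
        probability = initiator_probability
        labels = []
        for (name, p_works), works in zip(systems, states):
            probability *= p_works if works else 1.0 - p_works
            labels.append(f"{name}:{'ok' if works else 'fail'}")
        paths[tuple(labels)] = probability
    return paths


# Hypothetical initiator: a fire starts (probability 0.01 per year).
paths = event_tree(0.01, [("alarm", 0.99), ("sprinkler", 0.9)])
for path, probability in paths.items():
    print(path, f"{probability:.2e}")
```

The path probabilities always add up to the probability of the initiating event, which is a useful sanity check on any hand-built tree.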

Classic programs include the Electric Power Research Institute (EPRI)'s CAFTA software, which is used by almost all the nuclear power plants in the US and by a majority of US and international aerospace manufacturers. Another is the Idaho National Engineering and Environmental Laboratory's SAPHIRE, which is used by the U.S. government to evaluate the safety and reliability of nuclear reactors, the space shuttle, and the International Space Station.

Unified Modeling Language (UML) activity diagrams have been used as graphical components in a fault tree analysis.

Safety certification

Usually a failure in safety-certified systems is acceptable if, on average, less than one life per 30 years of operation (10^9 seconds) is lost to failure. Most Western nuclear reactors, medical equipment, and commercial aircraft are certified to this level. The cost versus loss of lives has been considered appropriate at this level (by FAA for aircraft under Federal Aviation Regulations).
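The conversion from 30 years of operation to seconds can be checked with a few lines of arithmetic. The only assumption is the use of a 365.25-day year.

```python
# Seconds in one year, assuming a 365.25-day (Julian) year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_seconds(years):
    """Convert a duration in years to seconds."""
    return years * SECONDS_PER_YEAR


thirty_years = years_to_seconds(30)
print(f"{thirty_years:.3e} seconds")
```

The result is a little under one billion seconds, so the round figure is an order-of-magnitude approximation.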

Preventing failure

Probabilistic fault tolerance: adding redundancy to equipment and systems

A NASA graph shows the relationship between the survival of a crew of astronauts and the amount of redundant equipment in their spacecraft (the "MM", Mission Module).

Once a failure mode is identified, it can usually be prevented entirely by adding extra equipment to the system. For example, nuclear reactors emit dangerous radiation and contain nasty poisons, and nuclear reactions can cause so much heat that no substance might contain them. Therefore reactors have emergency core cooling systems to keep the temperature down, shielding to contain the radiation, and engineered barriers (usually several, nested, surmounted by a containment building) to prevent accidental leakage.

Most biological organisms have a certain amount of redundancy: multiple organs, multiple limbs, etc.

For any given failure, a fail-over or redundant element can almost always be designed and incorporated into a system.
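The effect of redundancy on failure probability can be sketched under one strong assumption: the redundant units fail independently, so the system fails only when all of them fail at once. Common-cause failures, which break this assumption, are ignored. The function names and numbers are hypothetical.

```python
def system_failure_probability(p_unit, n_units):
    """Probability that all n independent redundant units fail together."""
    return p_unit ** n_units


def units_needed(p_unit, target):
    """Smallest number of redundant units that meets a target probability."""
    if not 0.0 < p_unit < 1.0 or target <= 0.0:
        raise ValueError("need 0 < p_unit < 1 and target > 0")
    n = 1
    while system_failure_probability(p_unit, n) > target:
        n += 1
    return n


# Hypothetical unit that fails with probability 0.01 per mission.
print(system_failure_probability(0.01, 3))
print(units_needed(0.01, 1e-9))
```

With these numbers five units are needed, since four units give roughly 1e-8 and five give roughly 1e-10.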

Inherent fail-safe design

When adding equipment is impractical (usually because of expense), then the least expensive form of design is often "inherently fail-safe". The typical approach is to arrange the system so that ordinary single failures cause the mechanism to shut down in a safe way. (For nuclear power plants, this is termed a passively safe design, although more than ordinary failures are covered.)

One of the most common fail-safe systems is the overflow tube in baths and kitchen sinks. If the valve sticks open, rather than causing an overflow and damage, the water spills into the overflow tube.

Another common example is that in an elevator the cable supporting the car keeps spring-loaded brakes open. If the cable breaks, the brakes grab rails, and the car does not fall.

Inherent fail-safes are common in medical equipment, traffic and railway signals, communications equipment, and safety equipment.

See also

Articles

* Software Engineering for Safety: A Roadmap at portal.acm.org
* Specification and Evaluation of Safety Properties in a Component-based Software Engineering Process at springer.com

Related concepts

* Safety engineer
* Nuclear safety
* Life-critical (also safety-critical)
* Reliability engineering
* Reliability theory
* Reliability theory of aging and longevity
* Air brake (rail)
* Biomedical engineering
* SAPHIRE (risk analysis software)
* Some of the techniques of safety engineering have been applied to the field of security engineering.
* Redundancy (engineering)
* Double switching
* Workplace safety
* DO-178B
* DO-254
* ARP4761
* Hazard analysis

External links

* Hardware Fault Tolerance – A discussion about redundancy schemes.
* Safety-critical systems
* NUREG-0492 Fault Tree Handbook
* American Society of Safety Engineers (official website)
* Board of Certified Safety Professionals (official website)



This is the "GNU Free Documentation License" reference article from the English Wikipedia. All text is available under the terms of the GNU Free Documentation License. See also our Disclaimer.