System Design According to ISO/EN 13849 and SISTEMA

A full and detailed study of EN ISO 13849-1 is required before it can be correctly applied. The following is a brief overview:

This standard provides requirements for the design and integration of safety-related parts of control systems, including some software aspects. The standard applies to a safety-related system but can also be applied to the component parts of the system.


SISTEMA Software PL Calculation Tool

SISTEMA is a software tool for the implementation of EN ISO 13849-1. Its use will greatly simplify the implementation of the standard.

SISTEMA stands for "Safety Integrity Software Tool for the Evaluation of Machine Applications." It was developed by the BGIA in Germany and is free to use. It requires the input of various types of functional safety data as described later in this section.

The data can be entered manually or imported automatically from a manufacturer's SISTEMA Data Library.

The Rockwell Automation SISTEMA Data Library is available for download, together with a link to the SISTEMA download site, at: http://discover.rockwellautomation.com/EN_Safety_Solutions.aspx.


Overview of EN ISO 13849-1

This standard has wide applicability, as it applies to all technologies, including electrical, hydraulic, pneumatic and mechanical. Although ISO 13849-1 is applicable to complex systems, it also refers the reader to IEC 62061 and IEC 61508 for complex software embedded systems.

Let's have a look at the basic differences between the old EN 954-1 and the new EN ISO 13849-1. The outputs of the old standard were Categories [B, 1, 2, 3 or 4]. The outputs of the new standard are Performance Levels [PL a, b, c, d or e]. The Category concept is retained but there are additional requirements to be satisfied before a PL can be claimed for a system.


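The requirements can be listed in basic form as follows: the architecture of the system [Category], reliability data for its constituent parts, the diagnostic coverage [DC], protection against common cause failure, protection against systematic faults and, where relevant, specific requirements for software.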


Later we will take a closer look at these factors but before we do it will be useful to consider the basic intent and principle of the whole standard. It is clear at this stage that there are new things to learn but the detail will make more sense once we have understood what it is trying to achieve and why.

First of all why do we need the new standard? It is obvious that the technology used in machine safety systems has progressed and changed considerably over the last ten years. Until relatively recently safety systems have depended on "simple" equipment with very foreseeable and predictable failure modes. More recently we have seen an increasing use of more complex electronic and programmable devices in safety systems. This has given us advantages in terms of cost, flexibility and compatibility but it has also meant that the pre-existing standards are no longer adequate. In order to know whether a safety system is good enough we need to know more about it. This is why the new standard asks for more information. As safety systems start to use a more "black box" approach we start to rely more heavily on their conformity to standards. Therefore those standards need to be capable of properly interrogating the technology. In order to fulfill this they must speak to the basic factors of reliability, fault detection, architectural and systematic integrity. This is the intent of EN ISO 13849-1.

In order to plot a logical course through the standard, two fundamentally different user types must be considered: the designer of safety-related subsystems and the designer of safety-related systems. In general the subsystem designer [typically a safety component manufacturer] will be subjected to a higher level of complexity. They will need to provide the required data in order that the system designer can ensure that the subsystem is of adequate integrity for the system. This will usually require some testing, analysis and calculation. The results will be expressed in the form of the data required by the standard.

The system designer [typically a machine designer or integrator] will use the subsystem data to perform some relatively straightforward calculations to determine the overall Performance Level [PL] of the system.

PLr is used to denote what performance level is required by the safety function. In order to determine the PLr the standard provides a risk graph into which the application factors of severity of injury, frequency of exposure and possibility of avoidance are input.


Figure 119: Risk Graph from Annex A of EN ISO 13849-1

The output is the PLr. Users of the old EN 954-1 will be familiar with this approach but take note that the S1 line now subdivides whereas the old risk graph did not. Note that this means a possible reconsideration of the integrity of safety measures required at lower risk levels.
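As a rough illustration of how the risk graph works, it can be written as a lookup table. This is only a sketch: the branch labels (S1/S2 for severity, F1/F2 for frequency of exposure, P1/P2 for possibility of avoidance) and the PLr assigned to each path are transcribed from the commonly published form of the Annex A graph and should be checked against the standard itself. The function name is hypothetical.

```python
# Sketch of the Annex A risk graph as a lookup table (verify against the standard).
# Keys are (severity, frequency of exposure, possibility of avoidance).
RISK_GRAPH = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}


def required_pl(severity: str, frequency: str, avoidance: str) -> str:
    """Return the required performance level (PLr) for one path through the graph."""
    try:
        return RISK_GRAPH[(severity, frequency, avoidance)]
    except KeyError:
        raise ValueError("expected S1/S2, F1/F2 and P1/P2") from None


# Serious injury, infrequent exposure, avoidance scarcely possible.
print(required_pl("S2", "F1", "P2"))  # d
```

Note how the S1 branch yields three different outcomes [a, b and c], which reflects the subdivision of the S1 line mentioned above.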

Figure 120: Risk Graph from Annex B of EN 954-1

There is one very important part yet to be covered however. We now know from the standard how good the system needs to be and also how to determine how good it is but we don't know what it needs to do. We need to decide what the safety function is. Clearly the safety function must be appropriate to the task so how do we ensure this? How does the standard help us?

It is important to realize that the functionality required can only be determined by considering the characteristics prevailing at the actual application. This can be regarded as the safety concept design stage. It cannot be completely covered by the standard because the standard does not know about all the characteristics of a specific application. This also often applies to the machine builder who produces the machine but does not necessarily know the exact conditions under which it will be used.

The standard does provide some help by listing out many of the commonly used safety functions (e.g. safety-related stop function initiated by safeguard, muting function, start/restart function) and giving some normally associated requirements. Other standards such as EN ISO 12100: Basic design principles and EN ISO 14121: Risk assessment, are highly recommended for use at this stage. Also there is a large range of machine specific standards that will provide solutions for specific machines. Within the European EN standards they are termed C type standards. Some of them have exact equivalents in ISO standards.

So we can now see that the safety concept design stage is dependent on the type of machine and also on the characteristics of the application and environment in which it is used. The machine builder must anticipate these factors in order to be able to design the safety concept. The intended [i.e. anticipated] conditions of use should be given in the user manual. The user of the machine needs to check that they match the actual usage conditions.


So now we have a description of the safety functionality. From annex A of the standard we also have the required performance level [PLr] for the safety-related parts of the control system [SRP/CS] that will be used to implement this functionality. We now need to design the system and make sure that it complies with the PLr.

One of the significant factors in the decision on which standard to use [EN ISO 13849-1 or EN/IEC 62061] is the complexity of the safety function. In most cases, for machinery, the safety function will be relatively simple and EN ISO 13849-1 will be the most suitable route. Reliability data, diagnostic coverage [DC], the system architecture [Category], common cause failure and, where relevant, requirements for software are used to assess the PL.


This is a simplified description meant only to give an overview. It is important to understand that all the provisions given in the body of the standard must be applied. However, help is at hand. The SISTEMA software tool is available to help with the documentation and calculation aspects. It also produces a technical file.

At the time of publication SISTEMA is available in German and English. Other languages will be released in the near future. BGIA, the developer of SISTEMA, is a well-respected research and testing institution based in Germany. It is particularly involved in solving scientific and technical problems relating to safety in the context of statutory accident insurance and prevention in Germany. It works in cooperation with occupational health and safety agencies from over 20 countries. Experts from the BGIA, along with their BG colleagues, had significant participation in the drafting of both EN ISO 13849-1 and IEC/EN 62061.

The “library” of Rockwell Automation safety component data for SISTEMA is available at: http://discover.rockwellautomation.com/EN_Safety_Solutions.aspx.

Whichever way the calculation of the PL is done it is important to start off from the right foundation. We need to view our system in the same way as the standard so let's start with that.


System Structure

Any system can be split into basic system components or "subsystems." Each subsystem has its own discrete function. Most systems can be split into three basic functions: input, logic solving and actuation [some simple systems may not have logic solving]. The component groups that implement these functions are the subsystems.

Figure 121

A simple single channel electrical system example is given in Figure 122. It comprises only input and output subsystems.

Figure 122: Interlock Switch and Contactor

In Figure 123 the system is a little more complex because some logic is also required. The safety controller itself will be fault tolerant (e.g. dual channel) internally but the overall system is still limited to single channel status because of the single limit switch and single contactor.

Figure 123: Interlock Switch, Safety Controller and Safety Contactor

If we take the basic architecture of Figure 123, there are also some other things to consider. First how many "channels" does the system have? A single channel system will fail if one of its subsystems fails. A two channel [also called redundant] system would need to have two failures, one in each channel before the system fails. Because it has two channels it can tolerate a single fault and still keep working. Figure 124 shows a two channel system.
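The difference between single and two channel behaviour can be sketched in a few lines. This is a deliberately simplified model, with hypothetical function names: a channel works only if every subsystem in it works, and the safety function is kept as long as at least one channel works. It ignores diagnostics and common cause failure, which are discussed later.

```python
def channel_ok(subsystems_ok):
    """A channel works only if every subsystem in it works."""
    return all(subsystems_ok)


def safety_function_ok(channels):
    """The safety function is kept while at least one channel still works."""
    return any(channel_ok(channel) for channel in channels)


# Single channel: input works, output (e.g. the contactor) has failed.
single_channel = [[True, False]]

# Two channels: the output of channel 1 has failed, channel 2 is healthy.
two_channel = [[True, True, False], [True, True, True]]

print(safety_function_ok(single_channel))  # False
print(safety_function_ok(two_channel))     # True
```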

Figure 124: Dual Channel with Interlock Switch, Safety Controller and Safety Contactors

Clearly the system shown in Figure 124 is less likely to fail than the one shown in Figure 123 but we can make it even more reliable [in terms of its safety function] if we include diagnostic measures for fault detection. Of course, having detected the fault we also need to react to it and put the system into a safe state. Figure 125 shows the inclusion of diagnostic measures achieved through monitoring techniques.

Figure 125: Dual Channel System with Interlock Switch, Safety Controller and Safety Contactors—Diagnostics Shown by Dashed Arrows

It is usually [but not always] the case that the system comprises two channels in all its subsystems as shown in Figure 125. Therefore we can see that, in this case each subsystem has two "sub channels." The standard describes these as "blocks." A two channel subsystem will have two blocks and a single channel subsystem will have one block. It is possible that some systems will comprise a combination of dual channel and single channel blocks.

If we want to investigate the system in more depth we need to look at the component parts of the blocks. The SISTEMA tool uses the term "elements" for these component parts. Figure 126 shows our system using the SISTEMA terminology.


Figure 126: Dual Channel System Shown Subdivided into Subsystems, Blocks and Elements

The limit switches subsystem is shown subdivided down to its element level. The output contactor subsystem is subdivided down to its block level and the logic subsystem is not subdivided at all. The monitoring function for both the limit switches and the contactors is performed at the logic controller. Therefore the boxes representing the limit switch and contactor subsystems have a small overlap with the logic subsystem box.
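The subsystem, block and element hierarchy can be pictured as a small data structure. The sketch below is purely illustrative: it is not the SISTEMA file format or API, and the class names and element names ("Linkage", "Contacts") are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str


@dataclass
class Block:
    name: str
    elements: list = field(default_factory=list)  # list of Element


@dataclass
class Subsystem:
    name: str
    blocks: list = field(default_factory=list)  # list of Block

    def block_count(self) -> int:
        """Number of blocks this subsystem has been subdivided into."""
        return len(self.blocks)


# Limit switches: subdivided down to element level (element names are made up).
limit_switches = Subsystem("Limit switches", [
    Block("Limit switch 1", [Element("Linkage"), Element("Contacts")]),
    Block("Limit switch 2", [Element("Linkage"), Element("Contacts")]),
])
# Logic: not subdivided at all.
logic = Subsystem("Safety controller")
# Contactors: subdivided down to block level only.
contactors = Subsystem("Contactors", [Block("Contactor 1"), Block("Contactor 2")])

for subsystem in (limit_switches, logic, contactors):
    print(subsystem.name, subsystem.block_count())
```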

This principle of system subdivision can be recognized in the methodology given in EN ISO 13849-1 and in the basic system structure principle for the SISTEMA tool. However, it is important to note that there are some subtle differences. The standard is not restrictive in its methodology but for the simplified method for estimating the PL the usual first step is to break the system structure into channels and the blocks within each channel. With SISTEMA the system is first divided into subsystems. The standard does not explicitly describe a subsystem concept but its use as given in SISTEMA provides a more understandable and intuitive approach. Of course there is no effect on the final calculation. SISTEMA and the standard both use the same principles and formulae. It is interesting to note that the subsystem approach is also used in EN/IEC 62061.

The system we have been using as an example is just one of the five basic types of system architectures that the standard designates. Anyone familiar with the Categories system will recognize our example as representative of either Category 3 or 4.

The standard uses the original EN 954-1 Categories as its five basic types of designated system architectures. It calls them Designated Architecture Categories. The requirements for the Categories are almost [but not quite] identical to those given in EN 954-1. The Designated Architecture Categories are represented by the following figures. It is important to note that they can be applied either to a complete system or a subsystem. The diagrams should not be taken purely as a physical structure. They are intended more as a graphical representation of conceptual requirements.

A more detailed look at the practical implementation of categories is dealt with in a later chapter.


Figure 127: Designated Architecture Category B

Designated Architecture Category B must use basic safety principles [see annex of EN ISO 13849-2]. The system or subsystem can fail in the event of a single fault. See EN ISO 13849-1 for full requirements.

Figure 128: Designated Architecture Category 1

Designated Architecture Category 1 has the same structure as Category B and can still fail in the event of a single fault. But because it must also use well tried safety principles [see annex of EN ISO 13849-2] this is less likely than for Category B. See EN ISO 13849-1 for full requirements.

Figure 129: Designated Architecture Category 2

Designated Architecture Category 2 must use basic safety principles [see annex of EN ISO 13849-2]. There must also be diagnostic monitoring via a functional test of the system or subsystem. The test must occur at start up and then periodically with a frequency that equates to at least one hundred tests to every demand on the safety function. Note that this test rate is an additional requirement to that given in the old EN 954-1. The system or subsystem can still fail if a single fault occurs between the functional tests but this is usually less likely than for Category 1. See EN ISO 13849-1 for full requirements.

Figure 130: Designated Architecture Category 3

Designated Architecture Category 3 must use basic safety principles [see annex of EN ISO 13849-2]. There is also a requirement that the system/subsystem must not fail in the event of a single fault. This means that the system needs to have single fault tolerance with regard to its safety function. The most common way of achieving this requirement is to employ a dual channel architecture as shown in Figure 130. In addition a single fault shall be detected, wherever practicable. This requirement is the same as the original requirement for Category 3 from EN 954-1. In that context the meaning of the phrase "wherever practicable" proved somewhat problematic. It meant that Category 3 could cover everything from a system with redundancy but no fault detection [often descriptively and appropriately termed "stupid redundancy"] to a redundant system where all single faults are detected. This issue is addressed in EN ISO 13849-1 by the requirement to estimate the quality of the Diagnostic Coverage [DC]. By reference to Annex K or Table 10, we can see that the greater the reliability [MTTFd] of the system, the less DC we need. However, DC needs to be at least 60% for Category 3 Architecture.

Figure 131: Designated Architecture Category 4

Designated Architecture Category 4 must use basic safety principles [see annex of EN ISO 13849-2]. It has a similar requirements diagram to Category 3 but it demands greater monitoring i.e. higher Diagnostic Coverage. This is shown by the heavier dotted lines representing the monitoring functions. In essence the difference between Categories 3 and 4 is that for Category 3 most faults must be detected but for Category 4 all faults must be detected. The DC needs to be at least 99%. Even an accumulation of faults must not cause a dangerous failure.

Reliability Data

EN ISO 13849-1 uses quantitative reliability data as part of the calculation of the PL achieved by the safety-related parts of a control system. This is a significant departure from EN 954-1. The first question this raises is "where do we get this data from?" It is possible to use data from recognized reliability handbooks but the standard makes it clear that the preferred source is the manufacturer. To this end, Rockwell Automation is making the relevant information available in the form of a data library for SISTEMA. In due course it will also publish the data in other forms. Before we go any further we should consider what types of data are required and also gain an understanding of how it is produced.

The ultimate type of data required as part of the PL determination in the standard [and SISTEMA] is the PFH [the probability of dangerous failure per hour]. This is the same data as represented by the PFHd abbreviation used in IEC/EN 62061.


PL   Average Probability of Dangerous Failure per Hour (1/h)   SIL
a    ≥ 10^-5 to < 10^-4                                        No correspondence
b    ≥ 3 x 10^-6 to < 10^-5                                    1
c    ≥ 10^-6 to < 3 x 10^-6                                    1
d    ≥ 10^-7 to < 10^-6                                        2
e    ≥ 10^-8 to < 10^-7                                        3
  
Table 9

Table 9 shows the relationship between PFH and PL and SIL. For some subsystems the PFH may be available from the manufacturer. This makes life easier for the calculation. The manufacturer will usually have to perform some relatively complex calculation and/or testing on their subsystem in order to provide it. In the event that it is not available, EN ISO 13849-1 gives us an alternative simplified approach based on the average MTTFd [mean time to a dangerous failure] of a single channel. The PL [and therefore the PFH] of a system or subsystem can then be calculated using the methodology and formulae in the standard. It can be done even more conveniently using SISTEMA.
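The banding in Table 9 is easy to express as a lookup. The following is a minimal sketch using only the limits given in the table; the function name is hypothetical, and values outside the tabulated range simply return None.

```python
# PL bands from Table 9: (PL, lower limit inclusive, upper limit exclusive), in 1/h.
PL_BANDS = [
    ("e", 1e-8, 1e-7),
    ("d", 1e-7, 1e-6),
    ("c", 1e-6, 3e-6),
    ("b", 3e-6, 1e-5),
    ("a", 1e-5, 1e-4),
]


def pl_from_pfh(pfh_per_hour):
    """Return the PL whose band contains the given PFH, or None if outside Table 9."""
    for pl, lower, upper in PL_BANDS:
        if lower <= pfh_per_hour < upper:
            return pl
    return None


print(pl_from_pfh(5e-7))  # d
print(pl_from_pfh(2e-6))  # c
```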

NOTE: It is important to understand that, for a dual channel system (with or without diagnostics), it is not correct to use 1/PFHD to determine the MTTFd that is required by EN ISO 13849-1. The standard calls for the MTTFd of a single channel. This is a very different value to the MTTFd of the combination of both channels of a two channel subsystem. If the PFHD of a two channel subsystem is known, it can simply be entered directly into SISTEMA.


MTTFd of a Single Channel

This represents the average mean time before the occurrence of a failure that could lead to the failure of the safety function. It is expressed in years. It is an average value of the MTTFd values of the "blocks" of a single channel and can be applied to either a system or a subsystem. The standard gives the following formula, which is used to calculate the average of the MTTFd values of all the elements used in a single channel or subsystem.

At this stage the value of SISTEMA becomes apparent. Users are spared time consuming consultation of tables and calculation of formulae since these tasks are performed by the software. The final results can be printed out in the form of a multiple page report.


Formula D1 from EN ISO 13849-1

In most dual channel systems both channels are identical; therefore the result of the formula represents either channel.
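As a worked sketch of the single channel calculation, the code below assumes the usual parts count form of the formula, in which the reciprocal of the channel MTTFd is the sum of the reciprocals of the element MTTFd values. That form is stated here from general knowledge of Annex D and should be checked against the standard; the function name and the element values are hypothetical.

```python
def channel_mttfd(element_mttfd_years):
    """Combine element MTTFd values (years) into one channel MTTFd (years).

    Assumed parts count form: 1 / MTTFd = sum(1 / MTTFd_i).
    """
    if not element_mttfd_years:
        raise ValueError("at least one element is required")
    if any(value <= 0 for value in element_mttfd_years):
        raise ValueError("MTTFd values must be positive")
    return 1.0 / sum(1.0 / value for value in element_mttfd_years)


# Three elements in one channel with MTTFd of 100, 50 and 50 years.
print(round(channel_mttfd([100.0, 50.0, 50.0]), 1))  # 20.0
```

The result is always lower than the lowest individual value, because every additional element adds another way for the channel to fail.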

If the system/subsystem channels are different the standard provides a formula to cater for this.


Formula 1 from EN ISO 13849-1

This, in effect, averages the two averages. In the cause of simplification it is also allowable to just use the worst case channel value.
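A sketch of this averaging for two different channels is given below. It assumes the symmetrization formula in its usual published form, MTTFd = 2/3 x [C1 + C2 - 1/(1/C1 + 1/C2)], where C1 and C2 are the MTTFd values of the two channels; please confirm the exact form against the standard. The function names are hypothetical.

```python
def symmetrized_mttfd(channel_1_years, channel_2_years):
    """Average the MTTFd of two different channels (assumed symmetrization formula)."""
    c1, c2 = channel_1_years, channel_2_years
    if c1 <= 0 or c2 <= 0:
        raise ValueError("MTTFd values must be positive")
    return (2.0 / 3.0) * (c1 + c2 - 1.0 / (1.0 / c1 + 1.0 / c2))


def worst_case_mttfd(channel_1_years, channel_2_years):
    """Simplification allowed by the text above: just take the worse channel."""
    return min(channel_1_years, channel_2_years)


print(round(symmetrized_mttfd(30.0, 60.0), 1))  # 46.7
print(worst_case_mttfd(30.0, 60.0))             # 30.0
```

For two identical channels the formula returns the value of either channel, and the worst case shortcut is always the more conservative of the two options.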

The standard groups the MTTFd into three ranges as follows:


Denotation of MTTFd of each channel   Range of MTTFd of each channel
Low                                   3 years <= MTTFd < 10 years
Medium                                10 years <= MTTFd < 30 years
High                                  30 years <= MTTFd < 100 years
  
Table 10: Levels of MTTFd

Note that EN ISO 13849-1 limits the usable MTTFd of a single channel of a subsystem to a maximum of 100 years even though the actual values derived may be much higher.
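The grouping in Table 10, together with the 100 year limit, can be sketched as follows. The function name is hypothetical, and treating a capped value of exactly 100 years as "high" is an assumption made here for convenience.

```python
MTTFD_CAP_YEARS = 100.0  # maximum usable single channel value, as noted above


def mttfd_range(mttfd_years):
    """Return 'low', 'medium' or 'high' per Table 10, or None if below 3 years."""
    capped = min(mttfd_years, MTTFD_CAP_YEARS)
    if capped < 3.0:
        return None  # below the lowest range in Table 10
    if capped < 10.0:
        return "low"
    if capped < 30.0:
        return "medium"
    return "high"  # assumption: a capped value of 100 years counts as high


print(mttfd_range(6.3))     # low
print(mttfd_range(2500.0))  # high
```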

As we will see later, the achieved range of MTTFd average is then combined with the designated architecture Category and the diagnostic coverage [DC] to provide a preliminary PL rating. The term preliminary is used here because other requirements including systematic integrity and measures against common cause failure still have to be met where relevant.


Methods of Data Determination

We now need to delve one stage deeper into how a manufacturer determines the data either in the form of PFHD or MTTFd. An understanding of this is essential when dealing with manufacturers' data.

Data can be grouped into two basic types: 1) mechanistic (electro-mechanical, mechanical, pneumatic and hydraulic) and 2) electronic (solid state).

There is a fundamental difference between the common failure mechanisms of these technology types, and software needs to be considered separately again. In basic form it can be summarized as follows:

Mechanistic Technology: Failure is proportional to both the inherent reliability and the usage rate. The greater the usage rate, the more likely that one of the component parts may be degraded and fail. Note that this is not the only failure cause, but unless we limit the operation time/cycles it will be the predominant one. It is self-evident that a contactor that has a switching cycle of once per ten seconds will operate reliably for a far shorter time than an identical contactor that operates once per day. Mechanistic technology devices generally comprise components that are individually designed for their specific use. The components are shaped, molded, cast, machined etc. They are combined with linkages, springs, magnets, electrical windings etc. to form a mechanism. Because the component parts do not, in general, have any history of use in other applications, we cannot find any pre-existing reliability data for them. The estimation of the PFHD or MTTFd for the mechanism is normally based on testing. Both EN/IEC 62061 and EN ISO 13849-1 advocate a test process known as B10d Testing.

In the B10d test a number of device samples [usually at least ten] are tested under suitably representative conditions. The mean number of operating cycles achieved before 10% of the samples fail to the dangerous condition is known as the B10d value.

In practice it is often the case that all of the samples will fail to a safe state but in that case the standard states that the B10d[dangerous] value can be taken as twice the B10[safe] value.
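To show how a B10d value and the usage rate combine, the sketch below uses the conversion commonly given for EN ISO 13849-1: MTTFd = B10d / (0.1 x nop), where nop is the mean number of operations per year. This formula is not stated in the text above, so treat it as an assumption to be checked against the standard. The B10d figure of 2,000,000 cycles and the function names are hypothetical.

```python
def operations_per_year(days_per_year, hours_per_day, cycle_time_s):
    """Mean number of operations per year (nop) for a given duty."""
    return days_per_year * hours_per_day * 3600.0 / cycle_time_s


def b10d_from_b10(b10_cycles):
    """If all test samples fail safe, B10d can be taken as twice B10 (see above)."""
    return 2.0 * b10_cycles


def mttfd_from_b10d(b10d_cycles, n_op_per_year):
    """Assumed conversion: MTTFd (years) = B10d / (0.1 * nop)."""
    if n_op_per_year <= 0:
        raise ValueError("operations per year must be positive")
    return b10d_cycles / (0.1 * n_op_per_year)


B10D = 2_000_000  # hypothetical contactor value, in cycles

fast = operations_per_year(365, 24, 10)      # once per ten seconds
slow = operations_per_year(365, 24, 86_400)  # once per day

print(round(mttfd_from_b10d(B10D, fast), 1))  # roughly 6.3 years
print(round(mttfd_from_b10d(B10D, slow)))     # tens of thousands of years
```

This mirrors the contactor example given earlier: the same device yields a very different MTTFd depending on how often it is operated.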


Electronic Technology: There are no physical wear related moving parts. Given an operating environment commensurate with the specified electrical and temperature [etc.] characteristics, the predominant failure of an electronic circuit is proportional to the inherent reliability of its constituent components [or lack of it]. There are many reasons for individual component failure: imperfection introduced during manufacture, excessive power surges, mechanical connection problems etc. In general, faults in electronic components are difficult to predict by analysis and they appear to be random in nature. Therefore testing of an electronic device in test laboratory conditions will not necessarily reveal typical long term failure patterns.

In order to determine the reliability of electronic devices it is usual to use analysis and calculation. We can find good data for the individual components in reliability data handbooks. We can use analysis to determine which component failure modes are dangerous. It is acceptable and usual to average out the component failure modes as 50% safe and 50% dangerous. This normally results in relatively conservative data.

IEC 61508 provides formulae that can be used to calculate the overall probability of dangerous failure [PFH or PFD] of the device i.e. the subsystem. The formulae are quite complex and take into account [where applicable] component reliability, potential for common cause failure [beta factor], diagnostic coverage [DC], functional test interval and proof test interval. The good news is that this complex calculation will normally be done by the device manufacturer. Both EN/IEC 62061 and EN ISO 13849-1 accept a subsystem calculated in this way to IEC 61508. The resulting PFHD can be entered directly into either Annex K of EN ISO 13849-1 or the SISTEMA calculation tool.


Software: Failures of software are inherently systematic in nature. Any failures are caused by the way it is conceived, written or compiled. Therefore all failures are caused by the system under which it is produced, not by its use. Therefore in order to control the failures we must control that system. Both IEC 61508 and EN ISO 13849-1 provide requirements and methodologies for this. We do not need to go into detail here other than to say they use the classic V model.

Figure 132: V Model for Software Development

Embedded software is an issue for the designer of the device. The usual approach is to develop embedded software in accordance with the formal methods explained in IEC 61508 part 3. When it comes to application code, the software that a user interfaces with, most programmable safety devices are provided with "certified" function blocks or routines. This simplifies the validation task for application code but it must be remembered that the completed application program still needs to be validated. The way the blocks are linked and parameterized must be proved correct and valid for the intended task. EN ISO 13849-1 and IEC/EN 62061 both provide guidelines for this process.

Diagnostic Coverage

We have already touched on this subject when we considered the Designated Architecture Categories 2, 3 and 4. Those Categories require some form of diagnostic testing to check whether the safety function is still working. The term "diagnostic coverage" [usually abbreviated to DC] is used to characterise the effectiveness of this testing. It is important to realize that DC is not based just on the number of components that can fail dangerously. It takes account of the total dangerous failure rate. The symbol λ (lambda) is used for "failure rate." DC expresses the relationship of the rates of occurrence of the two following types of dangerous failure:

Dangerous detected failure [λdd], i.e. those dangerous failures that are detected by the diagnostics.

Dangerous failure [λd], i.e. all dangerous failures, whether they are detected or not.

DC is expressed by the formula:

DC = λdd/λd expressed as a percentage.

This meaning of the term DC is common to EN ISO 13849-1 and EN/IEC 62061. However the way that it is derived differs. The latter standard proposes the use of calculation based on failure mode analysis but EN ISO 13849-1 provides a simplified method in the form of look-up tables. Various typical diagnostic techniques are listed together with the DC percentage that their use is deemed to achieve. In some cases rational judgment is still required, for example in some techniques the achieved DC is proportional to how often the test is performed. It is sometimes argued that this approach is too vague. However the estimation of DC can depend on many different variables and whichever technique is used the result can usually only truly be described as approximate. It is also important to understand that the tables in EN ISO 13849-1 are based on extensive research conducted by the BGIA into the results achieved by known actual diagnostic techniques used in real applications. In the interest of simplification the standard divides DC into four basic ranges:

<60% = none

60% to <90% = low

90% to <99% = medium

99%+ = high
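The DC formula and the four ranges above can be combined in a short sketch. The function names and the failure rates used in the example are hypothetical.

```python
def diagnostic_coverage(lambda_dd, lambda_d):
    """DC as a percentage: detected dangerous failure rate over total dangerous rate."""
    if lambda_d <= 0:
        raise ValueError("total dangerous failure rate must be positive")
    if not 0 <= lambda_dd <= lambda_d:
        raise ValueError("detected rate must lie between 0 and the total rate")
    return 100.0 * lambda_dd / lambda_d


def dc_range(dc_percent):
    """Map a DC percentage onto the four ranges used by the standard."""
    if dc_percent < 60.0:
        return "none"
    if dc_percent < 90.0:
        return "low"
    if dc_percent < 99.0:
        return "medium"
    return "high"


# Hypothetical rates in failures per hour: 95% of dangerous failures are detected.
dc = diagnostic_coverage(lambda_dd=9.5e-7, lambda_d=1.0e-6)
print(round(dc, 1), dc_range(dc))  # 95.0 medium
```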

This approach of dealing with ranges instead of individual percentage values can also be considered to be more realistic in terms of achievable accuracy. The SISTEMA tool uses the same look-up tables as the standard. As the use of complex electronics increases in safety-related devices DC becomes a more important factor. It is likely that future work on the standards will look further into clarification of this issue. In the meantime the use of engineering judgment and common sense should be sufficient to lead to the correct choice of DC range.


Common-Cause Failure

In most dual channel [i.e. single fault tolerant] systems or subsystems the diagnostic principle is based on the premise that there will not be dangerous failures of both channels at the same time. The term “at the same time” is more accurately expressed as “within the diagnostic test interval.” If the diagnostic test interval is reasonably short [e.g. less than eight hours] it is a reasonable assumption that two separate and unrelated faults are highly unlikely to occur within that time. However the standard makes it clear that we need to think carefully about whether the fault possibilities really are separate and unrelated. For example, if a fault in one component can foreseeably lead to failures of other components then the resulting totality of faults are deemed to be a single failure.

It is also possible that an event that causes one component to fail may also cause the failure of other components. This is termed “common cause failure” (CCF). The degree of propensity for CCF is normally described as the beta [β] factor. It is very important that subsystem and system designers are aware of the possibilities of CCF. There are many different types of CCF and, correspondingly, many different ways of avoiding it. EN ISO 13849-1 plots a rational course between the extremes of complexity and over simplification. In common with EN/IEC 62061 it adopts an approach that is essentially qualitative. It provides a list of measures known to be effective in avoiding CCF.

Table 11 shows a summary of the scoring process.


No.   Measure Against CCF             Score
1     Separation/Segregation          15
2     Diversity                       20
3     Design/Application/Experience   20
4     Assessment/Analysis             5
5     Competence/Training             5
6     Environmental                   35
  
Table 11: Scoring for Common-Cause Failure

A sufficient number of these measures must be implemented in the design of a system or subsystem. It could be claimed, with some justification, that the use of this list alone may not be adequate to prevent all possibility of CCF. However, if the intent of the list is properly considered it becomes clear that the spirit of its requirement is to make the designer analyse the possibilities for CCF and to implement appropriate avoidance measures based on the type of technology and the characteristics of the intended application. Use of the list enforces consideration of some of the most fundamental and effective techniques such as diversity of failure modes and design competencies. The BGIA SISTEMA tool also requires the implementation of the standard's CCF look up tables and makes them available in a convenient form.
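The scoring process can be sketched as a simple sum over the measures in Table 11. Two assumptions are made here and should be checked against the standard: the pass mark of 65 points is taken from the commonly cited Annex F requirement and is not stated in the text above, and each group in the table is treated as all-or-nothing, whereas the standard breaks some groups into smaller items. The names used are hypothetical.

```python
# Scores from Table 11 (each group treated as all-or-nothing in this sketch).
CCF_SCORES = {
    "separation_segregation": 15,
    "diversity": 20,
    "design_application_experience": 20,
    "assessment_analysis": 5,
    "competence_training": 5,
    "environmental": 35,
}

REQUIRED_CCF_SCORE = 65  # assumption: commonly cited minimum, verify in the standard


def ccf_score(measures_applied):
    """Total score for the set of measures that have been implemented."""
    unknown = set(measures_applied) - set(CCF_SCORES)
    if unknown:
        raise ValueError(f"unknown measures: {sorted(unknown)}")
    return sum(CCF_SCORES[name] for name in set(measures_applied))


applied = ["separation_segregation", "design_application_experience", "environmental"]
score = ccf_score(applied)
print(score, score >= REQUIRED_CCF_SCORE)  # 70 True
```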

Mission Time

Mission time represents the maximum period of time for which a subsystem (or system) can be used. After this time, it must be replaced. Mission time must be declared by the manufacturer of the components. Mission time will usually be the same as the “proof test interval” or “lifetime” (whichever is the smaller) as used in IEC/EN 62061. The safety system designer must then consider the mission time of the components to determine the mission time of each safety function. For mechanistic components the T10d value gives this usable lifetime value in terms of the number of operations. The T10d value is derived as part of the B10d calculation.
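The relationship between B10d, T10d and MTTFd can be illustrated with a short calculation. This sketch assumes the formulas given in Annex C of EN ISO 13849-1, which are not quoted in the text above: nop = dop x hop x 3600 / tcycle, MTTFd = B10d / (0.1 x nop) and T10d = B10d / nop. The function names and the example figures (B10d of 2,000,000 operations, 220 days per year, 16 hours per day, one operation every 120 seconds) are hypothetical.

```python
def operations_per_year(d_op, h_op, t_cycle_s):
    """Mean number of operations per year (nop).

    d_op: operating days per year
    h_op: operating hours per day
    t_cycle_s: mean time between two operations, in seconds
    """
    return d_op * h_op * 3600.0 / t_cycle_s


def mttfd_years(b10d, n_op):
    """MTTFd in years from B10d (operations) and nop (operations/year)."""
    return b10d / (0.1 * n_op)


def t10d_years(b10d, n_op):
    """T10d in years: time until 10% of components fail dangerously."""
    return b10d / n_op


if __name__ == "__main__":
    # Hypothetical example values, not taken from any data sheet.
    n_op = operations_per_year(d_op=220, h_op=16, t_cycle_s=120)
    print(f"nop   = {n_op:.0f} operations/year")
    print(f"MTTFd = {mttfd_years(2_000_000, n_op):.1f} years")
    print(f"T10d  = {t10d_years(2_000_000, n_op):.1f} years")
```

With these example figures T10d comes out at roughly 18.9 years, which would then limit the usable life of that component in the safety function.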

Systematic Faults

We have already discussed quantified safety reliability data in the form of MTTFd and the probability of dangerous failure. However, this is not the whole story. When we referred to those terms we were really thinking about failures that appear to be random in nature. Indeed, IEC/EN 62061 specifically describes PFHD as the probability of random hardware failure. But there are some types of failure, collectively known as “systematic failure”, that can be attributed to errors committed in the design or manufacturing process. The classic example of this is an error in software code. The standard provides measures in Annex G to avoid these errors [and therefore the failures]. These measures include provisions such as the use of suitable materials and manufacturing techniques, reviews, analysis and computer simulation.

There are also foreseeable events and characteristics that can occur in the operating environment that could cause failure unless their effect is controlled. Annex G also provides measures for this. For example, it is easily foreseeable that there may be occasional losses of power. Therefore the de-energization of components must result in a safe state for the system.

These measures may seem to be just common sense, and indeed they are, but they are nevertheless essential. All the rest of the requirements of the standard will be meaningless unless due consideration is given to the control and avoidance of systematic failure. This will also sometimes require the same types of measures used for the control of random hardware failure [in order to achieve the required PFHD], such as automatic diagnostic tests and redundant hardware.

Fault Exclusion

One of the primary analysis tools for safety systems is failure analysis. The designer and user must understand how the safety system performs in the presence of faults. Many techniques are available to perform the analysis. Examples include Fault Tree Analysis; Failure Modes, Effects and Criticality Analysis; Event Tree Analysis; and Load-Strength reviews.

During the analysis, certain faults may be uncovered that cannot be detected with automatic diagnostic testing without undue economic costs. Further, the probability that these faults might occur may be made extremely small, by using mitigating design, construction and test methods. Under these conditions, the faults may be excluded from further consideration. Fault exclusion is the ruling out of the occurrence of a failure because the probability of that specific failure of the SRCS is negligible.

ISO 13849-1:2006 allows fault exclusion based on the technical improbability of occurrence, generally accepted technical experience and the technical requirements related to the application. ISO 13849-2:2003 provides examples and justifications for excluding certain faults for electrical, pneumatic, hydraulic and mechanical systems. Fault exclusions must be declared with detailed justifications provided in the technical documentation.

It is not always possible to evaluate a safety-related control system without assuming that certain faults can be excluded. For detailed information on fault exclusions, see ISO 13849-2.

As the level of risk gets higher, the justification for fault exclusion gets more stringent. In general, where PLe is required for a safety function to be implemented by a safety-related control system, it is not normal to rely upon fault exclusions alone to achieve this level of performance. This is dependent upon the technology used and the intended operating environment. Therefore it is essential that the designer takes additional care in the use of fault exclusions as the PL requirement increases.

For example, a door interlocking system that has to achieve PLe will need to incorporate a minimum fault tolerance of 1 (e.g. two conventional mechanical position switches) in order to achieve this level of performance, since it is not normally justifiable to exclude faults such as broken switch actuators. However, it may be acceptable to exclude faults such as short circuits in wiring within a control panel designed in accordance with relevant standards.


Performance Level (PL)

The performance level is a discrete level that specifies the ability of the safety-related parts of the control system to perform a safety function.

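In order to assess the PL achieved by an implementation of any of the five designated architectures, the following data is required for the system (or subsystem):

MTTFd (mean time to dangerous failure of each channel)
DC (diagnostic coverage)
Architecture (the Category)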

Figure 133 shows the PL achieved by various combinations of Category, DCavg and MTTFd. Refer to Annex K of the standard for more precise determination.

Figure 133: Graphical Determination of PL

For example, an application uses the Category 3 designated architecture. If the DC is between 60% and 90% [low], and if the MTTFd of each channel is between 10 and 30 years [medium], then the result read from Figure 133 depends on where the MTTFd lies within that range: Table 12 gives PLb at 10 years, PLc from 11 to 22 years and PLd from 24 to 30 years. Without a more precise MTTFd value, only PLb can be claimed.

Other requirements must also be met before the required PL can be claimed. These include the provisions already discussed for common cause failure, systematic failure and mission time.

If the PFHD of the system or subsystem is known, Table 12 (Annex K of the standard) can be used to derive the PL.


Average probability of a dangerous failure per hour (1/h) and corresponding performance level (PL), by MTTFd for each channel (years). A dash marks a combination with no entry.

Years   Cat. B          Cat. 1          Cat. 2          Cat. 2          Cat. 3          Cat. 3          Cat. 4
        DCavg = none    DCavg = none    DCavg = low     DCavg = medium  DCavg = low     DCavg = medium  DCavg = high
3       3,80 x 10^-5 a  -               2,58 x 10^-5 a  1,99 x 10^-5 a  1,26 x 10^-5 a  6,09 x 10^-6 b  -
3,3     3,46 x 10^-5 a  -               2,33 x 10^-5 a  1,79 x 10^-5 a  1,13 x 10^-5 a  5,41 x 10^-6 b  -
3,6     3,17 x 10^-5 a  -               2,13 x 10^-5 a  1,62 x 10^-5 a  1,03 x 10^-5 a  4,86 x 10^-6 b  -
3,9     2,93 x 10^-5 a  -               1,95 x 10^-5 a  1,48 x 10^-5 a  9,37 x 10^-6 b  4,40 x 10^-6 b  -
4,3     2,65 x 10^-5 a  -               1,76 x 10^-5 a  1,33 x 10^-5 a  8,39 x 10^-6 b  3,89 x 10^-6 b  -
4,7     2,43 x 10^-5 a  -               1,60 x 10^-5 a  1,20 x 10^-5 a  7,58 x 10^-6 b  3,48 x 10^-6 b  -
5,1     2,24 x 10^-5 a  -               1,47 x 10^-5 a  1,10 x 10^-5 a  6,91 x 10^-6 b  3,15 x 10^-6 b  -
5,6     2,04 x 10^-5 a  -               1,33 x 10^-5 a  9,87 x 10^-6 b  6,21 x 10^-6 b  2,80 x 10^-6 c  -
6,2     1,84 x 10^-5 a  -               1,19 x 10^-5 a  8,80 x 10^-6 b  5,53 x 10^-6 b  2,47 x 10^-6 c  -
6,8     1,68 x 10^-5 a  -               1,08 x 10^-5 a  7,93 x 10^-6 b  4,98 x 10^-6 b  2,20 x 10^-6 c  -
7,5     1,52 x 10^-5 a  -               9,75 x 10^-6 b  7,10 x 10^-6 b  4,45 x 10^-6 b  1,95 x 10^-6 c  -
8,2     1,39 x 10^-5 a  -               8,87 x 10^-6 b  6,43 x 10^-6 b  4,02 x 10^-6 b  1,74 x 10^-6 c  -
9,1     1,25 x 10^-5 a  -               7,94 x 10^-6 b  5,71 x 10^-6 b  3,57 x 10^-6 b  1,53 x 10^-6 c  -
10      1,14 x 10^-5 a  -               7,18 x 10^-6 b  5,14 x 10^-6 b  3,21 x 10^-6 b  1,36 x 10^-6 c  -
11      1,04 x 10^-5 a  -               6,44 x 10^-6 b  4,53 x 10^-6 b  2,81 x 10^-6 c  1,18 x 10^-6 c  -
12      9,51 x 10^-6 b  -               5,84 x 10^-6 b  4,04 x 10^-6 b  2,49 x 10^-6 c  1,04 x 10^-6 c  -
13      8,78 x 10^-6 b  -               5,33 x 10^-6 b  3,64 x 10^-6 b  2,23 x 10^-6 c  9,21 x 10^-7 d  -
15      7,61 x 10^-6 b  -               4,53 x 10^-6 b  3,01 x 10^-6 b  1,82 x 10^-6 c  7,44 x 10^-7 d  -
16      7,13 x 10^-6 b  -               4,21 x 10^-6 b  2,77 x 10^-6 c  1,67 x 10^-6 c  6,76 x 10^-7 d  -
18      6,34 x 10^-6 b  -               3,68 x 10^-6 b  2,37 x 10^-6 c  1,41 x 10^-6 c  5,67 x 10^-7 d  -
20      5,71 x 10^-6 b  -               3,26 x 10^-6 b  2,06 x 10^-6 c  1,22 x 10^-6 c  4,85 x 10^-7 d  -
22      5,19 x 10^-6 b  -               2,93 x 10^-6 c  1,82 x 10^-6 c  1,07 x 10^-6 c  4,21 x 10^-7 d  -
24      4,76 x 10^-6 b  -               2,65 x 10^-6 c  1,62 x 10^-6 c  9,47 x 10^-7 d  3,70 x 10^-7 d  -
27      4,23 x 10^-6 b  -               2,32 x 10^-6 c  1,39 x 10^-6 c  8,04 x 10^-7 d  3,10 x 10^-7 d  -
30      -               3,80 x 10^-6 b  2,06 x 10^-6 c  1,21 x 10^-6 c  6,94 x 10^-7 d  2,65 x 10^-7 d  9,54 x 10^-8 e
33      -               3,46 x 10^-6 b  1,85 x 10^-6 c  1,06 x 10^-6 c  5,94 x 10^-7 d  2,30 x 10^-7 d  8,57 x 10^-8 e
36      -               3,17 x 10^-6 b  1,67 x 10^-6 c  9,39 x 10^-7 d  5,16 x 10^-7 d  2,01 x 10^-7 d  7,77 x 10^-8 e
39      -               2,93 x 10^-6 c  1,53 x 10^-6 c  8,40 x 10^-7 d  4,53 x 10^-7 d  1,78 x 10^-7 d  7,11 x 10^-8 e
43      -               2,65 x 10^-6 c  1,37 x 10^-6 c  7,34 x 10^-7 d  3,87 x 10^-7 d  1,54 x 10^-7 d  6,37 x 10^-8 e
47      -               2,43 x 10^-6 c  1,24 x 10^-6 c  6,49 x 10^-7 d  3,35 x 10^-7 d  1,34 x 10^-7 d  5,76 x 10^-8 e
51      -               2,24 x 10^-6 c  1,13 x 10^-6 c  5,80 x 10^-7 d  2,93 x 10^-7 d  1,19 x 10^-7 d  5,26 x 10^-8 e
56      -               2,04 x 10^-6 c  1,02 x 10^-6 c  5,10 x 10^-7 d  2,52 x 10^-7 d  1,03 x 10^-7 d  4,73 x 10^-8 e
62      -               1,84 x 10^-6 c  9,06 x 10^-7 d  4,43 x 10^-7 d  2,13 x 10^-7 d  8,84 x 10^-8 e  4,22 x 10^-8 e
68      -               1,68 x 10^-6 c  8,17 x 10^-7 d  3,90 x 10^-7 d  1,84 x 10^-7 d  7,68 x 10^-8 e  3,80 x 10^-8 e
75      -               1,52 x 10^-6 c  7,31 x 10^-7 d  3,40 x 10^-7 d  1,57 x 10^-7 d  6,62 x 10^-8 e  3,41 x 10^-8 e
82      -               1,39 x 10^-6 c  6,61 x 10^-7 d  3,01 x 10^-7 d  1,35 x 10^-7 d  5,79 x 10^-8 e  3,08 x 10^-8 e
91      -               1,25 x 10^-6 c  5,88 x 10^-7 d  2,61 x 10^-7 d  1,14 x 10^-7 d  4,94 x 10^-8 e  2,74 x 10^-8 e
100     -               1,14 x 10^-6 c  5,28 x 10^-7 d  2,29 x 10^-7 d  1,01 x 10^-7 d  4,29 x 10^-8 e  2,47 x 10^-8 e
 
Table 12: Precise MTTFd to Determine PL

Source of Table 12 is Table K.1 of ISO/EN 13849-1:2006
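Where a PFHD value is already known, the PL letter follows from the band the value falls into. The sketch below assumes the band limits defined in Table 3 of EN ISO 13849-1 (they are not listed in the text above, but they agree with the PL letters shown in Table 12). The function name is hypothetical, and returning None for values outside the listed bands is a simplification made for this sketch.

```python
# (PL, lower limit inclusive, upper limit exclusive), in failures per hour.
# Assumption: band limits as defined in Table 3 of EN ISO 13849-1.
PL_BANDS = [
    ("a", 1e-5, 1e-4),
    ("b", 3e-6, 1e-5),
    ("c", 1e-6, 3e-6),
    ("d", 1e-7, 1e-6),
    ("e", 1e-8, 1e-7),
]


def pl_from_pfhd(pfhd):
    """Return the PL letter for a PFHD value, or None if outside all bands."""
    for pl, lower, upper in PL_BANDS:
        if lower <= pfhd < upper:
            return pl
    return None


if __name__ == "__main__":
    # Values taken from Table 12 (Category 3, DCavg = low).
    for years, pfhd in [(10, 3.21e-6), (11, 2.81e-6), (30, 6.94e-7)]:
        print(years, "years ->", "PL", pl_from_pfhd(pfhd))
```

The letter obtained this way is only valid if the other requirements (category behaviour, CCF, systematic failure and mission time) are also satisfied.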

Subsystem Design and Combinations

If the PLs of all the subsystems are known, they can be combined simply into a system using Table 13. The rationale behind this table is clear. First, the system can only be as good as its weakest link (subsystem). Second, the more subsystems there are, the greater the possibility of failure.

PLlow  Nlow  PL
a      >3    Not allowed
a      ≤3    a
b      >2    a
b      ≤2    b
c      >2    b
c      ≤2    c
d      >3    c
d      ≤3    d
e      >3    d
e      ≤3    e
 
Table 13: PL calculation for series combined subsystems

In the system shown in Figure 134, the lowest Performance Levels are at Subsystems 1 and 2. Both are PLb. Therefore, using Table 13, we can read across b (in the PLlow column), through ≤2 (in the Nlow column) and find the achieved system PL as b (in the PL column). If all three subsystems were PLb the achieved PL would be PLa.

Note: The application of this table is not mandatory. The use of Annex K of the standard (or SISTEMA) is the preferred method. This table is only intended to provide a very simple approach for small systems.
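The look-up in Table 13 can be expressed as a small function. This is a minimal sketch of the simplified table method only, not of the Annex K or SISTEMA calculation. The function and variable names are hypothetical, and None is used here to represent the "Not allowed" result.

```python
PL_ORDER = "abcde"  # lowest to highest performance level

# Largest Nlow for which the system keeps PLlow (from Table 13).
MAX_N_WITHOUT_DOWNGRADE = {"a": 3, "b": 2, "c": 2, "d": 3, "e": 3}


def combine_series(subsystem_pls):
    """Return the system PL for series-connected subsystems per Table 13.

    subsystem_pls: iterable of PL letters, e.g. ["b", "b", "d"].
    Returns a PL letter, or None where Table 13 says "Not allowed".
    """
    pls = list(subsystem_pls)
    if not pls:
        raise ValueError("At least one subsystem is required")
    pl_low = min(pls, key=PL_ORDER.index)  # weakest subsystem PL
    n_low = pls.count(pl_low)              # how many subsystems have it
    if n_low <= MAX_N_WITHOUT_DOWNGRADE[pl_low]:
        return pl_low
    if pl_low == "a":
        return None                        # more than three PLa subsystems
    return PL_ORDER[PL_ORDER.index(pl_low) - 1]  # one level lower


if __name__ == "__main__":
    print(combine_series(["b", "b", "d"]))  # two PLb subsystems
    print(combine_series(["b", "b", "b"]))  # three PLb subsystems
```

The two calls correspond to the example in the text: two PLb subsystems give a PLb system, while three PLb subsystems give PLa.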


Figure 134: Combination of series subsystems as a PLb system

Validation

Validation plays an important role throughout the safety system development and commissioning process. ISO/EN 13849-2:2003 sets the requirements for validation. It calls for a validation plan and discusses validation by testing and analysis techniques such as Fault Tree Analysis and Failure Modes, Effects and Criticality Analysis. Most of these requirements will apply to the manufacturer of the subsystem rather than the subsystem user.

Machine Commissioning

At the system or machine commissioning stage, validation of the safety functions must be carried out in all operating modes and should cover all normal and foreseeable abnormal conditions. Combinations of inputs and sequences of operation must also be taken into consideration. This procedure is important because it is always necessary to check that the system is suitable for actual operational and environmental characteristics. Some of those characteristics may be different from the ones anticipated at the design stage.