Software Reliability for SAS

SOFTWARE RELIABILITY FOR DIGITAL SUBSTATION AUTOMATION SYSTEMS

Daniel Espinosa <daniel.espinosa@cigre.org.mx>

Software Life Cycle

A software life cycle starts with the conception, planning and implementation of a solution to a problem. It continues with the maintenance process, which delivers functional fixes for issues discovered by end users, and it finishes when the problem definition has changed and the software can no longer solve it without adding new functionalities.

This view is far from industry practice: software providers release new versions, with new features, on a more or less predictable schedule, in most cases to maintain cash flow. Community-developed software without profit objectives, such as Free (libre) Software, also releases new versions on a time basis, but with the aim of providing new features to its users.

Figure 1: LibreOffice versions life cycle.

Figure 1 shows how the open source project LibreOffice [1] declares its version life cycle. It maintains three concurrent versions: “conservative” versions are recommended for corporate use because no problematic issues are present, but they lack new features; “recommended” versions have more features and are more stable than “fresh” versions, which provide new and highly unstable features. In LibreOffice release planning, changing a version's status from “recommended” to “conservative” requires at least 12 months of fixing issues, with no new features added in the process.

Different Approach to Stabilizing Software

There are large free software projects that combine small to large pieces of software to provide a group of functions to their users. This is the case of the Debian Linux distribution. Its policy is to select a group of those software pieces, each at a specific version, providing a set of functionalities with different sets of known or unknown issues, including security issues; the resulting sets are called Unstable, Testing and Stable. The selection process is based on the features and functions provided, and on their effect on the other selected software depending on them.

Any early new version is added to the Unstable distribution. Later, after some time and the release of new versions fixing the issues found, that software version is added to Testing. After a further period of use, when no issues or security problems are found in the whole set of selected software, the distribution is declared Stable. This cycle is repeated continuously and in parallel. This is Debian's way of stabilizing software.

The main difference between stable versions is the functionality they provide. Releasing a Stable distribution starts a process that fixes only the misoperation of functions and security problems. No new features or functions are added; those are developed in Unstable. Testing is a feature-freeze stage of the software: it accepts fixes to existing functionalities but no new ones.

The release process for proprietary software in most cases includes Alpha, Beta and Final releases, which cover the testing process, because these versions have a fixed set of functionalities before release. The testing process is not public but private, open to a limited set of clients. New functionalities are developed behind closed doors and, in the best case, in response to end-user feedback. Pre-release testing of proprietary software is in most cases not carried out under “real conditions”, such as real data flow and end-user interaction. After the final release, the software vendor declares it stable.

Select Stable Software

Utilities select software by its “reputation”, requiring evidence from other users about its performance. When using it in Substation Automation Systems, utilities require “No Fail” and “All Time On”: the first focuses mainly on fault tolerance and the second on automatic recovery.
Large utilities maintain professional staff to select and test software. Testing is critical when new requirements are fulfilled only by a new software release, in order to verify that the upgrade keeps the “No Fail” and “All Time On” characteristics reached by the previous version. This process is similar to Debian's approach: select (Unstable), test (Testing) and deploy (release Stable). Better results are obtained if a set of “volunteers” is selected to test the new version widely before it is deployed to all company members or installations.

Production and Maintenance

Once the software has been tested and deployed, it is considered to be in Production state. Software vendors then inform their users about fixes and how they must be installed; this process is Maintenance. Fixes respond to user feedback or security advisories. When a new issue or security risk is found, vendors must develop new use cases and test cases, develop fixes and ensure, before release, that they work properly. Because these are only test cases developed by the vendor, utilities should consider developing their own and take care to “Test Before Deploy” in order to ensure software reliability.

Version Control

Version control requires a general agreement on how to identify software versions. The basic rule is to provide a different ID for each release, whenever the product has received a set of fixes or new functionalities. Using a source code version control system is good practice; one of the worst practices is to release fixes or functionalities under the same version ID. With a source code version control system it is possible to track changes and find new bugs introduced by them. Some vendors provide long-term support for their products by keeping branches for different versions.

Open source software also uses different ways to identify versions. One scheme I find particularly convenient uses three or four numbers separated by dots. The first identifies the “Major Version”, increased when the software architecture or dependencies have changed. The second identifies the “Minor Version”, increased when a new set of functionalities is added. The third, called the “Micro Version”, is increased when only fixes for found issues are included. The fourth is fairly new; it is used by the LibreOffice project to denote fixes on Release Candidate versions before a final release.

Micro versions do not include new functionalities; they only fix issues or security risks. Following our definition of the software life cycle, any Major or Minor release represents a new start, while Micro releases are part of the maintenance process and give the software more and more stability and robustness.

How New Versions Affect In-Service Devices

New versions released by vendors for installation on in-service devices may include new functionalities, improve performance or provide simple fixes. Because there is no assurance about how vendors use version numbering, it is very important to check the release notes in order to know what will be installed. Vendors and users must know or certify the procedures followed to release new versions, how version changes are identified, and the documentation about them.

One of these procedures is to subject devices to specific test cases: existing ones for unchanged functionalities and new ones for new functionalities. Applying them and recording drawbacks and fixes along the way takes time, and it must be part of the vendor's procedures. Versions that add support for the latest protocol, new curve characteristics, new gates for logic, and so on, bring new functionalities that must be tested by the end user in order to know whether the previous stability of the in-service product is unaffected. These new features cannot be considered part of the maintenance process; installing such a version must be considered the start of a new software life cycle.

How to Tag Stable Versions by End Users

End users must develop their own process to tag Stable versions. Simulation, documented test cases and observation can be part of the process, depending on available resources. An early new version should certainly not be tagged as suitable to deploy; at the very least, some time must be allowed to record the product's performance under real conditions. Any new version must be tested and observed, with feedback provided to the vendor, before it is finally tagged as Stable or “Allowed to Deploy”.

Restrictions While “Testing”

Any early new version adding new functionalities must be tagged as Testing, or equivalent. Run the process above to find any issue or security risk before tagging a version as Stable. End users should consider declining to carry the Testing process over to a new version that adds functionalities; they should test only versions that fix issues, in order to stabilize the functions provided by a Major or Minor version.

Tagging Unstable

Tagging a software version as Unstable should be reserved for very early new versions. An “Untested” tag can be used to avoid confusion about what unstable means. Any version adding new functionalities must be tagged “Unstable” or “Untested”. After the utility's staff have applied well-known test cases in a controlled environment, such as a laboratory, with satisfactory results, the version can be tagged as “Testing” or “On Testing”.

Development Version

Vendors may provide pre-release versions to their users. These versions can change without notice: they can add or remove functions, and introduce failures or fix them. They can be used to test new algorithms that improve performance or to support new protocols, but it is uncertain whether they will perform correctly under real conditions. In some cases, new functionalities have no test cases developed by vendors or users, or the test cases are at an early stage. Both vendors and users can use these versions to develop their test procedures for future Stable versions.

Issues Management

End users are generally the first to find issues in performance, misoperation, bad response and others. Issues can be tagged with different levels of severity or simply as “improvements”. Some vendors have issue management systems to track improvement requests or fixes for issues affecting a device's functionalities. Some open source projects have mature and open issue management systems, such as Bugzilla for GNOME, a space to report problems, get suggestions to fix them and close them. Reports must include the information needed to reproduce the issue, so that the problem can be found and fixed.

After a set of issues has been fixed, or if a problem considerably affects the product's performance, a new version is released. Users are encouraged to require post-sale support services from their vendors, including issue fixing and publications about found issues, their severity, how to limit or fix the consequences of a problem and, where applicable, the new software version to install in order to fix the problem permanently.

Software for DSAS

When software used in a DSAS fails totally or partially, the consequences range from loss of local or remote operation and supervision in the best case, to loss of energy supply to end users in the worst case, when the software runs in protection or bay control units. Protection relays, bay controllers, multifunction meters, supervision devices such as gateways, and others are microprocessor based, so they must execute some kind of software, stored in non-volatile memory and known as “firmware”, in order to accomplish their functions. Firmware provides the operating system and the additional pieces of software that perform the device's functions.

The HMI (human-machine interface) software is designed to interface with the substation's operators, when present, allowing them to control and supervise a DSAS and its components. It is executed on a hardened computer that withstands the environmental conditions present in most substation control rooms. It runs on a general-purpose operating system, which gives the software access to computer resources such as the hard disk or the Ethernet card used to connect it to the DSAS. The HMI software and the operating system are independent of each other, but the former must be compatible with the latter in order to run correctly.

Most software designers consider many conditions and test them in order to verify that the software performs as expected; unfortunately, not all conditions have a test case, and this is especially true when software has been released recently. Over time, designers gain experience and use cases based on end-user feedback, allowing them to create test cases that provide a platform to detect regressions in fixes and future versions. Experience, feedback and test cases together make it possible to get good software performance for early versions. Software providers must use software version control systems for each product and version.

Software Stable Versions' Disadvantages

Most disadvantages of stable software versions are obvious, but one of them is apparent obsolescence. The time required to stabilize a software version can be months or years, leaving it behind new software developments, such as shiny new operating systems, and new end-user requirements. A product expected to run largely without problems for more than five or ten years must be a Stable version.

Firmware Problem

With firmware, the device manufacturer has total control over the hardware and the functionalities provided, so it is possible to design well-defined use cases and test cases. Firmware is fixed only on bad behavior or poor performance, and an update is then distributed to customers; only the manufacturer is able to perform these tasks.

Some products allow firmware updates, making it possible to release new versions, distribute them to customers and ship them with each new product. Firmware updates can provide fixes, new functions or both. It is not good practice to deploy new versions without running test cases to verify their behavior and performance; it is better to stabilize the firmware version first, that is, to “Deploy only Stable Versions”.

For most products on the market, the manufacturers' methods for assigning software version numbers make it hard to recognize which versions provide fixes for detected problems and which add new functionalities; in some cases they could be considered Unstable or Testing. Version numbers with no clear “meaning”, a lack of good documentation, or fixes mixed with new functionalities turn users' installations into a “test field”. Even when versions have been exhaustively tested, finding new problems is possible; some of them may result in customer outages with the consequent economic loss or, simply, in hours spent restarting devices located far away from maintenance staff in order to force them to continue serving.
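As an illustration of the three-number version scheme and the end-user tagging stages described in this paper, the following is a minimal Python sketch, not any vendor's or project's actual tooling. It assumes version strings of the form Major.Minor.Micro with an optional fourth number, which is ignored here. The names `Version`, `Change`, `Tag`, `classify_change`, `starts_new_life_cycle` and `promote` are hypothetical and chosen only for this example, and the two promotion conditions are a simplified reading of the laboratory and real-conditions steps.

```python
from dataclasses import dataclass
from enum import Enum


class Change(Enum):
    NONE = "none"    # identical Major.Minor.Micro
    MICRO = "micro"  # only fixes for issues or security risks
    MINOR = "minor"  # new set of functionalities
    MAJOR = "major"  # architecture or dependencies changed


class Tag(Enum):
    UNTESTED = "Untested"  # any recent release starts here
    TESTING = "Testing"    # laboratory test cases passed
    STABLE = "Stable"      # observed under real conditions, allowed to deploy


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    micro: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        """Parse 'Major.Minor.Micro' or 'Major.Minor.Micro.N'."""
        parts = text.strip().split(".")
        if len(parts) not in (3, 4):
            raise ValueError(f"expected 3 or 4 dot-separated numbers: {text!r}")
        # The optional fourth number (release-candidate fixes) is ignored.
        major, minor, micro = (int(p) for p in parts[:3])
        return cls(major, minor, micro)


def classify_change(old: Version, new: Version) -> Change:
    """Report the most significant number that differs between two versions."""
    if new.major != old.major:
        return Change.MAJOR
    if new.minor != old.minor:
        return Change.MINOR
    if new.micro != old.micro:
        return Change.MICRO
    return Change.NONE


def starts_new_life_cycle(change: Change) -> bool:
    """Major and Minor releases are a new start; Micro releases are maintenance."""
    return change in (Change.MAJOR, Change.MINOR)


def promote(tag: Tag, lab_tests_passed: bool, field_observation_ok: bool) -> Tag:
    """Advance a version by at most one tagging stage."""
    if tag is Tag.UNTESTED and lab_tests_passed:
        return Tag.TESTING
    if tag is Tag.TESTING and field_observation_ok:
        return Tag.STABLE
    return tag


if __name__ == "__main__":
    installed = Version.parse("4.2.1")
    offered = Version.parse("4.3.0")
    change = classify_change(installed, offered)
    print(change.value, starts_new_life_cycle(change))
    print(promote(Tag.UNTESTED, lab_tests_passed=True, field_observation_ok=False).value)
```

A check like this only helps when the vendor actually follows such a numbering scheme; otherwise the release notes remain the only reliable source for what a new version contains.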

From this analysis, a conclusion can be drawn: manufacturers must give high priority to the product's testing and stabilization processes.

Operating System Problem

As pointed out before, the operating system, when used to run the HMI or any other critical software, plays an important role in the general stability of the tasks the software performs. One of the main issues is the life cycle of the software: it must be in line with the life cycle of the protection, control and supervision devices existing in the installation, for supervision, configuration and maintenance alike.

While there are basically two tendencies in operating systems, both have two stages in their life cycle: the “old but considered stable” version and the “new, shiny, feature-rich” version. Old versions have been used for years and have been patched along the way for problems and security issues. New versions provide a new user interface, support new hardware and try to fix some security issues from the ground up by changing structural designs.

Old versions, considered Stable, may reach their end of life soon. For some software providers this means no more fixes for reported problems and no more security updates, making this kind of product a point of vulnerability and a back door for malicious hackers. Because of its age, this kind of operating system runs software designed for old, perhaps user-unfriendly interfaces, with poor support for newer technologies such as multi-threading or memory management.

New versions, because of their architectural changes or newly added technologies, provide new user interfaces, good hardware support and sometimes better performance. Their life cycle has started recently, so they offer long-term support ahead, with many upcoming patches and security fixes. Unfortunately, many of those patches and fixes can break critical running software, forcing software providers to delay deploying them while they try to ensure that their software can still run without poor performance or bad behavior.
It is better to choose a software provider and an operating system that offer long-term support and that, once this period has finished, allow other companies to be contracted to provide security fixes for the running software, always considering the life cycle of the installation's devices.

Conclusions

Developing software takes time, but fixing undesirable behavior takes even more. Features are an important part of software, but they are not perfect: they are not completely tested at the development stage because not all conditions are known to test against. Take care to test under real conditions to find what to fix or improve; declare a version stable only after a period in which no errors or regressions are reported; differentiate between Testing and Production, and use software for Production only when it has been tested for some time. Consider any recent release as untested and unstable.

Daniel Espinosa. Professional in protection and automation systems, with more than 10 years of experience in the field. He has been working at Mexico's Comisión Federal de Electricidad to date, developing and establishing IEC 61850 systems for documentation and component testing. He holds an Electrical Engineering degree from Instituto Politécnico Nacional. He is currently an observer member of CIGRÉ SC B5 for Mexico.
