Learn the tell-tale signs of a system compromise and know what to do about it.
Everything is normal, and then it isn’t. You have procedures, everything stays on track, and your transit agency keeps moving passengers safely and on time. But what happens when you do experience a failure? How do you know if it is a “fault” or a compromise?
What is a compromise? Simply put, someone has unauthorized access to your information and may be able to affect your operation. That can include your business side: HR, accounting, management, and fare collection information. It can also include your operational systems: signaling and communications (safety critical), and operationally critical public-facing systems such as HVAC, public announcement, electronic signage, and people movers.
Who is compromising you? Does it matter? This article won’t focus on the who or the why; it will focus on how to detect a compromise (a hack) and differentiate it from a fault.
Faults occur. Your systems are already fault tolerant. You are well-versed in finding faults, moving to secondary systems, or using different procedures to mitigate the issues until the primary system is restored. Everyone works hard. Everyone was and is safe. It’s all good.
So, a fault is a failure of one or more components due to electronics failure, breakage, weather, or unintentional cuts in communications, power, or connectivity. A compromise is an unauthorized person (or persons) changing the performance of your system or gaining unauthorized access to information, which results in the item or subsystem failing to perform as needed and designed. How do you tell the difference?
There is no easy answer. The symptoms look the same. There are some tell-tale signs of a compromise: you lose control from your operator console; you lose view (what you see is not what is happening); you know that “the numbers” do not make sense, e.g. your public feed is saying your trains are running 20 minutes late, but your reports all say on-time.
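The “numbers do not make sense” sign can be checked automatically by comparing two sources that should agree. The sketch below is only an illustration: the train IDs, the delay values, and the five-minute tolerance are made-up assumptions, and a real check would read from your own public feed and internal reporting systems.

```python
from dataclasses import dataclass

# Hypothetical tolerance: how far the two sources may disagree before we flag it.
TOLERANCE_MINUTES = 5


@dataclass(frozen=True)
class Mismatch:
    train_id: str
    public_delay: int    # minutes late according to the public feed
    internal_delay: int  # minutes late according to internal reports


def find_mismatches(public_feed, internal_report, tolerance=TOLERANCE_MINUTES):
    """Return trains whose reported delay differs between the two sources.

    Both arguments map a train ID to a delay in minutes. Trains that appear
    in only one source are skipped here; a fuller check might flag those too.
    """
    mismatches = []
    for train_id in sorted(public_feed.keys() & internal_report.keys()):
        public_delay = public_feed[train_id]
        internal_delay = internal_report[train_id]
        if abs(public_delay - internal_delay) > tolerance:
            mismatches.append(Mismatch(train_id, public_delay, internal_delay))
    return mismatches


# Made-up sample data: the public feed shows T101 running 20 minutes late,
# while the internal report shows it on time.
public_feed = {"T101": 20, "T102": 0, "T103": 2}
internal_report = {"T101": 0, "T102": 0, "T103": 1}

flagged = find_mismatches(public_feed, internal_report)
for item in flagged:
    print(f"{item.train_id}: public feed says {item.public_delay} min late, "
          f"internal report says {item.internal_delay} min late")
```

A disagreement like this does not prove a compromise; it is a prompt to dig deeper, because a simple fault in one feed can produce the same symptom.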
What should you do about it?
If the symptom is no different than having a component fail, why do anything? Your agency already deals with unexpected faults.
The key here is that when a single component fails, the fault is usually contained to a specific system and place. Your safety designs lead to the system failing safely. If you are compromised, the differences are important to consider: Will you still “fail safely”? Can many devices fail at the same time, and in different ways, causing uncertainty?
First — What can a failure lead to?
It is very difficult to hack a safety-critical system; however, each situation must be independently evaluated and verified. Your operation has so many systems interoperating that it is difficult to know which outcomes are of concern. How much time should you spend preventing a “Zombies Ahead” message on a sign, compared with preventing a false emergency alarm at rush hour at your main station?
You need to look at your systems and determine how a persistent person could make them behave badly. There are always surprise cross-connections between “innocent,” supposedly unimportant systems and critical systems. Some retailers have learned this lesson the hard way. Why was an HVAC system able to access the point-of-sale systems?
Second — How separated are your systems?
This makes us ask: do you know how interconnected your agency systems are?
Do you know if you have proper separation between your business systems (HR, accounting, fare collection), your operationally important systems (HVAC, people movers, announcement systems), and your safety-critical systems? The key idea is to have layers of protection, with the most protection for the most critical systems. We want neither to put everything in vaults nor to underprotect anything.
Third — Do you know your normal?
You have a complex operation. Do you know the information that should be flowing within each system? How about between the systems? Do you have unexpected connections? Do your vendors and partners have connections into your systems? Do they connect to only the systems they need to interact with?
If you do not know your normal, how will you detect the abnormal? There is nothing to react to if you cannot identify when something is wrong.
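One way to make “normal” concrete is to write down the connections you expect and compare what you actually observe against that list. The following is a minimal sketch under stated assumptions: the system names, port numbers, and baseline entries are invented for illustration, and in practice the observed flows would come from your own network monitoring.

```python
# Hypothetical baseline: (source, destination, port) flows seen during
# normal operation. Every name and port here is an illustrative assumption.
BASELINE_FLOWS = {
    ("dispatch-console", "signal-server", 502),
    ("fare-gate", "fare-backend", 443),
    ("hvac-controller", "building-mgmt", 47808),
}


def unexpected_flows(observed, baseline=BASELINE_FLOWS):
    """Return observed flows that are not in the baseline, sorted for stable reporting."""
    return sorted(set(observed) - baseline)


# Made-up observations for one day.
observed_today = [
    ("dispatch-console", "signal-server", 502),
    ("hvac-controller", "fare-backend", 443),  # not in the baseline
    ("fare-gate", "fare-backend", 443),
]

alerts = unexpected_flows(observed_today)
for source, destination, port in alerts:
    print(f"Unexpected connection: {source} -> {destination} on port {port}")
```

The value lies less in the code than in the list itself: writing the baseline forces the question of which systems, vendors, and partners should be talking to each other at all.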
Fourth — What are you looking for?
An attacker wants command and control. They need to get into your systems (infect); collect information (surveil); and then send that information back (communicate). The attacker may want to then attack (control). You need to detect these unexpected communications pathways.
You also want to know when settings or timings are outside your normal. You want to be aware whenever your configurations or computer/PLC programs change. Additionally, you need to know when unanticipated changes happen to your list of authorized users or their privileges. Can you detect if, suddenly, your accounts payable staff get access to train dispatch, routing, and interlocking systems?
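Both kinds of change can be detected by comparing the current state against an approved snapshot. The sketch below assumes you can export configuration or program files as bytes and user privileges as simple sets; the file names, settings, and account names are hypothetical. It uses SHA-256 digests from Python's standard `hashlib` module as fingerprints.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of a configuration or program file's bytes."""
    return hashlib.sha256(content).hexdigest()


def changed_items(approved, current):
    """Return names whose fingerprint differs from the approved one, or that are new or missing."""
    names = approved.keys() | current.keys()
    return sorted(name for name in names if approved.get(name) != current.get(name))


def added_privileges(approved, current):
    """Map each user to privileges held now that were not in the approved snapshot."""
    extra = {}
    for user, privileges in current.items():
        new = set(privileges) - set(approved.get(user, ()))
        if new:
            extra[user] = sorted(new)
    return extra


# Hypothetical snapshots: one setting in interlocking.cfg has changed.
approved_configs = {
    "interlocking.cfg": fingerprint(b"route_timeout=30"),
    "signage.cfg": fingerprint(b"brightness=80"),
}
current_configs = {
    "interlocking.cfg": fingerprint(b"route_timeout=5"),
    "signage.cfg": fingerprint(b"brightness=80"),
}
config_alerts = changed_items(approved_configs, current_configs)

# Hypothetical access lists: an accounts payable clerk has gained dispatch access.
approved_access = {"ap_clerk": {"accounts_payable"}, "dispatcher": {"train_dispatch"}}
current_access = {
    "ap_clerk": {"accounts_payable", "train_dispatch"},
    "dispatcher": {"train_dispatch"},
}
access_alerts = added_privileges(approved_access, current_access)

print("Changed configurations:", config_alerts)
print("Newly added privileges:", access_alerts)
```

A flagged change is not necessarily hostile, since authorized maintenance changes files too. The point is that every change should be traceable to someone who was authorized to make it.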
Fifth — Know When to Dig Deeper
Your crew and staff know when something is wrong. Do they have a way to raise an alert? They know, better than anyone, when something just doesn’t feel right. If they have no way to raise their concern, the early signs of a compromise may grow into an actual compromise.
Containment:
You have been compromised, or you suspect that you have been. Do you have a containment strategy? If so, have you tested it? What is the impact on your operation?
It’s Not Anyone’s Fault:
Without a reaction plan that preserves evidence, you may never know how, or even if, you were compromised. In your post-incident analysis, make certain to consider a cybersecurity compromise as a root cause or a contributing factor. If you find cause for concern, consider prudent monitoring, corrections, or mitigations.
Let’s Compromise:
Don’t let this be your legacy: “Don’t worry, we compromised on our security plan, so it’s not our fault.”
Leigh Weber founded Cybersecurity Analysis Ltd. to help ensure that our critical infrastructure organizations are well prepared to assess and defend themselves against cyber threats. He is a member of the APTA Control and Communications Security Working Group and edited the APTA Recommended Practice for “Security Control and Communications Systems in Rail Transit Environments.” He holds the CISSP cybersecurity certification.