Hitachi High Speed Rail IIoT

While at Hitachi, we got to build telemetry extraction prototypes for the Great Western Railway (https://www.gwr.com/).

I’ve since given some presentations about that work. Those were fun times!

IIoT_Edge_HighSpeedRail.pptx (18,385.2 KB)

Project: IIoT Edge Solutions for High Speed Rail UK & Italy


Goals for Industrial IoT

  • Transit data from devices at the edge.
    • Filter and aggregate.
    • Store and forward.
    • Build a state model for device(s).
  • Send commands to devices.
    • From central / core / cloud location.
    • Must be secure.
  • Apply data science and ML to data stream.
    • Predictive maintenance.
    • Smart alerting and error detection.
    • Run as close to edge as possible.
  • Installation and Update

Exemplar Edge Architecture

There was another initiative for factories that we liked called FogHorn.

FogHorn had a very nice architecture.

  • Internal message bus.
  • Downloadable ML models
    • Custom ML language.
  • Complex Event Processing (CEP)
  • Store and Forward Historian
  • App store, UI and easily installed features.
  • Pluggable publication
  • Pluggable Ingestion / Protocol Adapters

Challenges

  1. How to get new devices installed at the edge, or how to install your fancy new edge software on existing devices to make them smart. (How to “ensmarten” devices?)
  2. Given the diverse environments, how can you build a platform that can scale down to tiny devices as well as scaling up to large devices?
  3. Given that a typical IIoT setting will have thousands of devices and sensors, how can these THINGS easily be added to the platform while taking security into account?
  4. Given the large number of devices, there are also a large number of potential protocols and data formats, including possibly obscure networking stacks. How can the diverse set of communications stacks be supported?
  5. Talking to your devices can be very hard. In typical IIoT environments, the edge device is often mobile. Considering high speed trains, mining trucks, or remote factories, how can intermittent and often poor connectivity be compensated for?
  6. To best react to conditions at the edge, how can rules, models, and computation run as close to the edge as possible?
  7. Most IIoT environments such as rail are highly regulated. With certification times as high as 24 months, how can you get your product to market in a timely manner?
  8. Finally, given all the other challenges, how can system security be maintained? Especially how can installation and updates be reliable, secure, and straightforward?

Each Train is a Rolling Datacenter

  • For commuter high speed rail, each train is a rolling datacenter, and they all follow a pattern when they are assembled.
  • Most modern trains use an onboard Ethernet network (with 192.168.* IP addresses). Common components have the same IP addresses across every instance of that class of train.
  • Train Real-time Data Protocol (TRDP)

Interesting Subsystems


Some of the interesting subsystems are brakes, power, and Passenger Information. A typical train can have 10,000 sensors.

Brakes - Rail operations always monitor brakes, wheels, and wheel bearings; failures here are the leading cause of accidents and derailments.

Power - Battery, power, and fuel use are particularly interesting to rail operators. Governments also mandate daily reporting.

PIS - The displays and announcements presented to passengers. Rail operators use these as the interface to their customers and can update them remotely. Security of these updates is also critical.

How to get to the edge?


Typical PC Board found in Rail applications.

The existing On-board Service (OBS) is just a little PC-style computer manufactured by one company, MEN Mikro Elektronik. This PC is rail certified, so it's the way to go.

The standard is Intel Celeron-class boards with dual-core CPUs, 2-4 GB of memory, a 32 GB CF card holding the OS, and a 160 GB SSD.

The current standard practice for updates is for the Rail operators to flash the CF card, which means that while the train is in the service depot, technicians will pull the board and swap the CF card.

Creating a New OBS


The old OBS was just a little C program that saved files to disk; when Wi-Fi was available, shore-side servers would FTP the files off the train.

For the new system, the rail operators wanted critical alerts and errors to be offloaded all the time via cellular links provided by Vodafone. The operators also plan to use Vodafone as their cloud hosting provider.

At both Mobile World Congress and CeBIT, Vodafone showed ‘Hitachi Rail Analytics’ combining IoT and Cloud - demonstrating how the next generation of connectivity plays a central role in enabling digital businesses and the digital economy as a whole.

It should also be noted that trains have passenger Wi-Fi service and operations traffic uses the same transport, so operations must have priority QoS over other data traffic.

In the event that connectivity is lost and a backlog of alerts and events builds up, the OBS should store and forward, making a best effort to offload the data (and, if the disk fills up, enforce a data retention policy). At the data collection rate of normal operations, the local SSD would fill in about 48 hours.
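
A minimal sketch of how such a retention policy could be enforced on the spool directory is shown below; the path handling and byte budget are illustrative assumptions, not the OBS's actual implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Sketch of a retention policy for the store-and-forward spool: when the spool
 * directory exceeds a byte budget, delete the oldest files first.
 */
public class SpoolRetention {

    public static void enforce(Path spoolDir, long maxBytes) throws IOException {
        // Collect spool files, oldest first.
        List<Path> files;
        try (Stream<Path> stream = Files.list(spoolDir)) {
            files = stream.filter(Files::isRegularFile)
                          .sorted(Comparator.comparingLong(SpoolRetention::lastModified))
                          .collect(Collectors.toList());
        }

        long total = 0;
        for (Path file : files) {
            total += size(file);
        }

        // Delete the oldest telemetry first until we are back under budget.
        for (Path file : files) {
            if (total <= maxBytes) {
                break;
            }
            total -= size(file);
            Files.deleteIfExists(file);
        }
    }

    private static long lastModified(Path p) {
        return p.toFile().lastModified();
    }

    private static long size(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            return 0L;
        }
    }
}
```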

AMQP and RabbitMQ


We found that AMQP and RabbitMQ could be constrained to use less than 100 MB RAM while still providing 10,000 msg/sec throughput.
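
To give a flavour of the onboard publishing path, here is a minimal sketch using the RabbitMQ Java client; the host, queue name, and payload are illustrative assumptions rather than the project's actual configuration.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

import java.nio.charset.StandardCharsets;

public class EdgeTelemetryPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // onboard broker address (assumed for the sketch)

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // Durable queue + persistent messages so telemetry survives a broker restart
            // (the store-and-forward requirement).
            String queue = "obs.telemetry";
            channel.queueDeclare(queue, true, false, false, null);

            String payload = "{\"sensor\":\"brake.axle.3\",\"tempC\":41.7}";
            channel.basicPublish("", queue, MessageProperties.PERSISTENT_TEXT_PLAIN,
                    payload.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```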

Digital Avatars


Finally, what about Digital Avatars?

A Digital Avatar, as best expressed by Amazon, is a state machine that acts as a stand-in for the real device: it can receive commands and report on state even while the real device is offline. The edge model was to produce a mini Avatar at the edge with a full-blown Avatar at the IoT Core. In our case, we could use a custom core, Pentaho, or AWS Greengrass and AWS IoT.
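
As a rough sketch of the idea (not the project's actual implementation, which targeted a custom core, Pentaho, or AWS IoT), a minimal edge-side avatar might look like the following; the class and field names are purely illustrative.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/**
 * Minimal sketch of an edge-side "mini Avatar": it holds the last reported state
 * and queues commands until the physical device reconnects.
 */
public class DeviceAvatar {
    private final String deviceId;
    private final Map<String, Object> reported = new HashMap<>(); // last known state
    private final Queue<String> pendingCommands = new ArrayDeque<>();
    private boolean online = false;

    public DeviceAvatar(String deviceId) {
        this.deviceId = deviceId;
    }

    /** Telemetry from the device updates the reported state. */
    public void onTelemetry(String key, Object value) {
        reported.put(key, value);
    }

    /** Commands are delivered immediately if online, otherwise queued. */
    public void sendCommand(String command) {
        if (online) {
            deliver(command);
        } else {
            pendingCommands.add(command); // store and forward while offline
        }
    }

    /** On reconnect, drain any commands issued while the device was offline. */
    public void onReconnect() {
        online = true;
        while (!pendingCommands.isEmpty()) {
            deliver(pendingCommands.poll());
        }
    }

    private void deliver(String command) {
        // In the real system this would publish to the onboard bus; here we just log.
        System.out.println(deviceId + " <- " + command);
    }
}
```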

Conclusion

In the final assessment, a viable solution was produced, tested, and accepted as an initial POC by the rail operators. At that point, the project was put on hold due to external forces, and eventually the team was disbanded.

Lessons Learned


Language Choice

Given time constraints, Java was used.

Golang has some very good IoT libraries that are only recently becoming interesting (Flogo, NATS messaging), and Golang is able to produce small, efficient binaries.


Message Broker / Bus

Given the above constraints, RabbitMQ worked well, although it's not as scalable as something like Kafka. But if Kafkaesque performance could be achieved with a much smaller footprint, it would be a better fit; seemingly a NATS server that could talk MQTT would be ideal, and NATS with connectors has since emerged to do this very thing. MQTT has become the standard for IoT messaging and allows for easier message exchange. It should be noted that AMQP also provides a nice standard means to exchange data.
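
As a sketch of why MQTT makes message exchange easy, here is a minimal publisher using the Eclipse Paho Java client; the broker URI, client ID, and topic are assumptions for illustration only.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class MqttTelemetrySample {
    public static void main(String[] args) throws MqttException {
        String broker = "tcp://localhost:1883"; // assumed local broker for the sketch
        MqttClient client = new MqttClient(broker, "obs-train-001", new MemoryPersistence());

        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(true);
        client.connect(opts);

        MqttMessage msg = new MqttMessage("{\"sensor\":\"hvac.car2\",\"tempC\":22.5}".getBytes());
        msg.setQos(1); // at-least-once delivery over an unreliable cellular link
        client.publish("train/001/telemetry", msg);

        client.disconnect();
    }
}
```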


Containerization Options

Much debate went into deployment options, and in the final solution Ansible was used to deploy RPM builds. This was chosen to mitigate the physical-access, CF-card-swapping method preferred by the rail operators.

In other environments, we had also experienced failed container deployments due to poor bandwidth at edge locations; testing in a factory IoT setting failed because images never finished downloading.

To solve these issues in the future, it would be recommended to the rail operators to allow an onboard installation with a base OS and a container overlay, with full support for installation or upgrade from a local Docker repository or high-speed access to remote repositories.

Reducing the size of images is crucial to successful deployments; using base images such as Alpine, or even custom-rolled images, can significantly reduce image sizes. Some vendor images totaled over 6 GB during installation. Secondly, there were issues with the supported version of CentOS, whose older kernel led to poor network and disk performance. Working with rail operators to certify more recent versions of Ubuntu is possible and even welcomed, which would support better container operations.


Visualization

For visualizing data during test runs, InfluxDB was used to store data and Grafana was used to display various dashboards. This setup was hosted on an external device, a Core i7 Intel NUC acting as a visualization co-processor. Generally, the device would not be rail-installable, but it represented a proposed trackside system that would be installed at each depot and/or run at the IoT Core.
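
As an illustration of how test-run data landed in InfluxDB, here is a minimal sketch using the influxdb-java client; the host, credentials, database, and measurement names are assumptions, not the project's actual setup.

```java
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

import java.util.concurrent.TimeUnit;

public class TestRunWriter {
    public static void main(String[] args) {
        // Connect to the NUC-hosted InfluxDB instance (assumed address and credentials).
        InfluxDB influxDB = InfluxDBFactory.connect("http://nuc.local:8086", "admin", "admin");
        influxDB.setDatabase("train_telemetry");

        // One sample point per sensor reading; Grafana dashboards query this measurement.
        Point point = Point.measurement("brake_temperature")
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .tag("train", "001")
                .tag("axle", "3")
                .addField("tempC", 41.7)
                .build();

        influxDB.write(point);
        influxDB.close();
    }
}
```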