Multi-agent in action: Michael Küpper from Deutsche Bahn on putting railway back on the fast track

When you think of the butterfly effect, life’s surprising twists and turns may come to mind first. But there’s a down-to-earth example that you have probably experienced firsthand: the railway, the main subject of Jarek Jarzębowski’s recent discussion with Michael Küpper. Think of how a disruption of train traffic in the north of Germany can heavily affect the schedule as far as the Swiss border. One small event can have a huge impact on the whole transportation system.

Michael and his team have found an AI remedy for that. In this conversation, he explains why classic AI methods do not solve railway-related issues and which solutions can actually make a difference. His findings could change the face of transportation as we know it today. Dive into Michael’s insights and track down the promising innovations in the railway sector and beyond it.

Key Takeaways from the Conversation

More than classical AI: the classical AI approach is crucial for predictive maintenance, helping to identify and address issues before they lead to significant disruptions. However, when it comes to increasing efficiency across tens of thousands of routes, traditional mathematical optimization and classical AI are not enough. More advanced AI solutions involving generative artificial intelligence and multi-agent systems are a must.

Multi-agent reinforcement learning enabling large-scale innovation: the transportation sector can benefit enormously from upgrading to multi-agent reinforcement learning (MARL). Unlike single-agent approaches, MARL treats each train as an independent decision-making unit. This allows for greater flexibility and coordination in constructing schedules, thereby optimizing overall network efficiency while ensuring that trains operate without interfering with each other.

Challenges in generalization and scalability: while AI models show promise in specific settings, there are ongoing challenges in generalizing these solutions to different networks and traffic scenarios and scaling them effectively. In the coming years, the critical aspects to develop will be the adaptability of AI systems to various operational contexts and consistent performance across larger networks.

Full automation as the future: the transportation sector is traditionally slow to integrate new technologies, but it has now started on the fast track to an automated future. Companies will aim for increased automation across various functions, including scheduling, operational control, maintenance, and customer service. Fully automated systems may become the norm within the next decade, driven by continuous advancements in AI.

Conversation with Michael Küpper

Jarek Jarzębowski: Hello, Michael. Can you tell me a little bit more about yourself, your background, and your experience in data and logistics in general?

Michael Küpper: Well, I am a former particle physicist turned management consultant turned digitalization manager.

I’ve been working with Deutsche Bahn, developing a digitalization project of the railway sector in Germany, for the past six and a half years. I’ve been in charge of building an automated AI-based capacity and traffic management system, which is the central brain of the future completely automated digitalized railway system we’re building here at Digitale Schiene Deutschland (DSD).

Before that, I was a management consultant for over ten years, working in various industries. For about three years, I worked for the Boston Consulting Group, and then for seven years, I ran my own company on the East Coast of the US.

Jarek Jarzębowski: Can you tell me a bit more about DSD? What is it, how is it shaped, what will it do, what will it look like, and what tech is behind it?

Michael Küpper: Yes, DSD is a sector initiative that encompasses the entire railway sector in Germany, which is one of the biggest in the world and includes several hundred train operators. Of course, there are the big ones, some of which belong to Deutsche Bahn Holding, but also a lot of smaller, privately run operators.

It also includes the infrastructure manager, which is DB InfraGO, formerly known as DB Netz. In the European Union, railway operations are separated from the infrastructure, just as there is a similar unbundling in electricity, gas, and telecommunication networks.

Inside Deutsche Bahn, Digitale Schiene Deutschland is organizationally located in the infrastructure management company because many of the technological innovations we are supposed to foster and produce either refer primarily to the infrastructure or need to be orchestrated by that neutral, regulated entity.

The goal of DSD is to significantly improve capacity, quality, and punctuality of the railway system by applying fundamentally new technologies, including technologies adopted from other sectors.

Jarek Jarzębowski: Can you tell me a little bit more about the current state of innovation and the use of AI in the industry? I mean, trains in general are pretty traditional or even old-fashioned in some ways. What is the actual use of technology, AI, and data from your perspective in the field?

Michael Küpper: There are multiple uses of AI in the industry at various stages of technological maturity. Let’s start with what one could almost call classical AI, the pattern recognition that has been around for a while. That is, of course, also being used in the railway sector, including at Deutsche Bahn. For example, detecting faults on trains directly or via patterns in sensor data from wheels, engines, the pantograph, and other components. This is used to detect flaws and then do reactive maintenance.

The next stage of AI involves making sense of all that data for predictive maintenance of vehicles and infrastructure. Various train operators and infrastructure managers worldwide are using, and further developing, such technologies.

Now, let’s move to more recent developments since about 2016-2017, when Google’s DeepMind had its breakthroughs with AlphaGo and AlphaGo Zero. We, and a few other teams around the world, are building on the AI concepts behind that—specifically deep reinforcement learning—to solve tricky automation problems in railway systems that have never been tackled on a large scale before. This is where the system my team has been developing for the past few years comes in, addressing the automation of planning and operational control.

Jarek Jarzębowski: Can you expand on the problem itself? Why is it such a big problem, and why has it never been tackled?

Michael Küpper: The problem is a gigantic optimization problem. In Germany, for instance, we have about 40,000 train runs per day on 33,000 kilometers of network. When you reach the scale of about 30-35 trains, traditional mathematical optimization methods become infeasible because they take too long to come to reasonable decisions. These are decisions like which train goes where, or what happens if a track becomes unusable or a vehicle gets stuck. You need to reroute trains, slow down some, accelerate others, all while respecting numerous constraints like electricity, profile gauge restrictions, passenger connections or similar dependencies among cargo trains. Today, this work is done by hundreds of dispatchers and signallers. This is where AI comes into play. This is not just pattern recognition or learning from past decisions; it’s creating something new, a combination of decision-making systems and generative AI.
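
To get a feel for why exhaustive approaches stop being practical at a few dozen trains, here is a minimal sketch. It makes a strong simplifying assumption that is not from the interview: it only counts the possible orders in which trains could pass a single shared track section, ignoring routing, timing, and all other constraints. The function name `passing_orders` is hypothetical.

```python
import math


def passing_orders(n_trains: int) -> int:
    """Count the distinct orders in which n trains can pass one shared section.

    Simplification for illustration only: the real scheduling problem also
    involves routes, speeds, and many constraints, so it is larger still.
    """
    return math.factorial(n_trains)


# Print how quickly the count grows with the number of trains.
for n in (5, 10, 20, 35):
    print(f"{n:>2} trains: {passing_orders(n):.3e} possible orders")
```

Even in this stripped-down view, the count grows factorially, which suggests why checking every alternative is not an option and why learned decision-making is attractive.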

Jarek Jarzębowski: Can you tell us a bit more about the solution that you are working on? What is the technology behind it, how are you approaching it, and what is the state of progress in tackling the problem?

Michael Küpper: Only a handful of teams in the world are working on this problem because it is very railway-specific. The market is limited, and the work requires top-notch experts and extensive cloud computing resources for reinforcement learning training. Most of these teams follow a single-agent reinforcement learning approach, where a few decision-making units decide on an abstract level about train order or specific meta settings of a schedule. These meta-decisions are then translated into an actual executable schedule for the railway system.

What’s unique about our team’s approach at DSD is the multi-agent reinforcement learning approach. In this model, every train becomes its own decision-making unit, allowing for maximum freedom in constructing schedules but requiring maximum coordination among the trains. This ensures they generate schedules without impeding each other’s paths.
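
The idea of every train being its own decision-making unit can be sketched in a few lines. This is an illustration under stated assumptions, not DSD’s system: the `TrainAgent` class, the block names, and the `simulate` loop are hypothetical, and the hand-written rule in `act` merely stands in for a policy that would be learned with reinforcement learning. Each route’s final block is assumed to be a station with unlimited capacity.

```python
from dataclasses import dataclass, field


@dataclass
class TrainAgent:
    name: str
    route: list[str]              # ordered track blocks the train wants to traverse
    position: int = 0             # index of the block currently occupied
    log: list[str] = field(default_factory=list)

    def done(self) -> bool:
        return self.position == len(self.route) - 1

    def act(self, occupied: set[str]) -> str:
        """Stand-in for a learned policy: advance if the next block is free."""
        if self.done():
            return "stay"
        next_block = self.route[self.position + 1]
        return "wait" if next_block in occupied else "advance"


def simulate(agents: list[TrainAgent], max_steps: int = 20) -> int:
    """Step all agents until every train has reached the end of its route."""
    for step in range(max_steps):
        if all(agent.done() for agent in agents):
            return step
        # Blocks held by trains still under way (final blocks are stations).
        occupied = {a.route[a.position] for a in agents if not a.done()}
        for agent in agents:      # list order acts as a fixed priority
            action = agent.act(occupied)
            if action == "advance":
                occupied.discard(agent.route[agent.position])
                agent.position += 1
                if not agent.done():
                    occupied.add(agent.route[agent.position])
            agent.log.append(action)
    return max_steps


# Two trains that both need the single-track block "s".
train_a = TrainAgent("A", ["a", "s", "x"])
train_b = TrainAgent("B", ["b", "s", "y"])
steps = simulate([train_a, train_b])
print(steps, train_a.log, train_b.log)
```

Here train B waits one step while train A uses the shared block, so neither impedes the other’s path. In a multi-agent reinforcement learning setting, that coordination would be learned rather than hard-coded.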

Jarek Jarzębowski: Can you share a bit about the outcomes that you have already achieved?

Michael Küpper: We have a prototype that can plan schedules for a couple of hundred trains on medium-sized networks of a few thousand route kilometers. The same prototype can also change live schedules of around 40 trains in a regional node, reacting spontaneously to disruptions in a previously created schedule.

Jarek Jarzębowski: How do you deal with the need for a lot of computing power for such a multi-agent approach?

Michael Küpper: Let me get back to that question after a short explanation of the background. The system must reschedule when disruptions occur, for example when a track becomes unusable due to an unauthorized person on it. Within seconds, the system comes up with a new schedule for dozens of trains, rerouting not only those trains directly affected by the blocked track, but optimizing the overall traffic holistically. Eventually, this will ensure that disruptions in one area, say North Germany, can be accounted for in real time with all secondary and higher-order effects down to the Swiss border in the south.

Jarek Jarzębowski: It reminds me of the butterfly effect. If one change in the route due to a blocked track can change the routes of trains on the other side of the country, can you actually track what’s going on inside the algorithm? Or is it a black box where you input data and get the output without knowing what happened inside?

Michael Küpper: The optimization problem that the algorithm solves is highly complex, so there’s no linear connection between input and output. However, once the system is fully trained, the neural networks inside the machine are frozen, meaning the same starting situation, theoretically, should always produce the same reaction. By analyzing the parameters inside the neural network, we get hints about which factors were most influential in making specific decisions. This helps us explain the outcomes as much as possible.
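
A small sketch can illustrate both points: a frozen model gives the same output for the same input, and its behavior can be probed for influential factors. Everything here is assumed for illustration: the weights, the feature names, and the tiny one-layer model are made up, and the finite-difference probe is just one simple technique, not necessarily the analysis the team uses.

```python
import math

# Hypothetical frozen parameters of a tiny one-layer model (values are made up).
WEIGHTS = [0.8, -0.3, 0.1]
BIAS = 0.05
FEATURES = ["delay_minutes", "free_tracks", "passengers_hundreds"]


def priority_score(x: list[float]) -> float:
    """Deterministic forward pass: weighted sum squashed with tanh."""
    z = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return math.tanh(z)


def sensitivities(x: list[float], eps: float = 1e-4) -> dict[str, float]:
    """Finite-difference estimate of how much each input moves the output."""
    base = priority_score(x)
    result = {}
    for i, name in enumerate(FEATURES):
        bumped = list(x)
        bumped[i] += eps
        result[name] = (priority_score(bumped) - base) / eps
    return result


situation = [0.5, 1.0, 0.2]

# Frozen parameters: the same starting situation yields the same output.
same_output = priority_score(situation) == priority_score(situation)

sens = sensitivities(situation)
most_influential = max(sens, key=lambda name: abs(sens[name]))
print(same_output, most_influential)
```

With these made-up weights, the delay feature dominates the score. For a real deep network, the relationship is far less transparent, which is why such analyses yield hints rather than full explanations.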

And now about the need for computing power: As described before, decisions in operations are made by a fully trained AI system with frozen neural networks. This does not require much computing power, relatively speaking. The major effort has been spent during training. Admittedly, training is resource-intense, and therefore we use powerful cloud services for it.

Jarek Jarzębowski: What are the biggest obstacles your team faces in implementing this solution, and what strategies are you using to overcome them?

Michael Küpper: There are a few challenges, and that’s an understatement. Technically, we face issues of generalization and scalability. We have good experience with both, but are still far from where we need to be.

Scaling, so far, shows that processing time grows linearly with the number of trains. However, there’s no guarantee this remains true for up to 40,000 train runs. We believe it does, but the path might be bumpy.

The other challenge is generalization. We have good experience training the system on a specific network with specific trains and then modifying these parameters. The system still produces good results, but we don’t know yet how big these changes can be without losing quality in the schedules.

Jarek Jarzębowski: You mentioned aiming for this to work across all of Germany and potentially beyond. Does this mean such a solution could be usable in any railway system worldwide, or would it need significant changes to work in other systems?

Michael Küpper: Many of the basic concepts, AI modeling approaches, and neural network configurations we’ve developed are transferable to related problems in other transportation systems. It doesn’t even have to be a railway system; it could be a subway or streetcar system with similar characteristics. However, customization will be necessary due to differences in operational rules, vehicle characteristics, safety systems, and optimization goals.

Jarek Jarzębowski: Apart from this optimization challenge, do you see other significant challenges that AI might tackle in the near future in your field?

Michael Küpper: Yes, I think based on the progress we’ve made, it’s feasible and worthwhile to use similar approaches for related problems. For example, handling maintenance capacities in facilities connected to the rail network, organizing the driving and maintenance of a vehicle fleet, managing a staff of thousands of locomotive drivers and train conductors, or deploying a limited number of vehicles for maximum productive operations. These operational problems could also be tackled with a multi-agent reinforcement learning approach.

Jarek Jarzębowski: One of the most talked-about subsets of AI currently is large language models (LLMs). Do you think LLMs can be used in the railway industry, and if so, in what capacity?

Michael Küpper: Yes, teams in multiple organizations are already experimenting with using LLMs for various tasks inside the industry. This includes passenger information distribution, speech generation, customer service, and live video chatbots to replace or add to traditional staff at information booths. Another use is in systems engineering, where LLMs assist in crafting comprehensive requirements documents and generating test cases. These tasks may not get fully automated ever – and probably shouldn’t – but LLMs can be very helpful in aiding systems engineers.

Jarek Jarzębowski: Since you have been working in the data science field for so long, do you have any advice for other teams in the railway industry or logistics on approaching the use of data science?

Michael Küpper: Based on my experience, I have two main recommendations: firstly, ensure a solid data foundation, and secondly, allocate enough time and budget. Many companies start using AI with high expectations, but neglect the importance of accurate infrastructure data. Data inaccuracies can hinder progress significantly. Moreover, these projects take time and require significant financial investment due to the need for highly paid experts and extensive computing resources. Management and stakeholders must understand that automation and digitalization in such complex industries cannot be rushed and require substantial and sustained investment.

Jarek Jarzębowski: How do you see the future of the industry in terms of technology use, especially AI, in the next five to ten years?

Michael Küpper: While predicting the future is always tricky, I see a continuous trend toward increased AI use across various aspects of the industry. In the next decade, I believe we will witness fully automated scheduling and operational control in transportation systems worldwide. AI will be sensibly employed to solve complex problems in maintenance, customer service, scheduling, and more.

Jarek Jarzębowski: Is there anything else you would like to share with our audience?

Michael Küpper: Rail companies are not pursuing digitalization and AI research just for its beauty. They do this to address some of the toughest challenges in their industry. They need to increase capacity on the existing rail network, improve quality, reliability, and efficiency. In Germany, trains are often full, and we expect significantly higher passenger numbers in the near future. Building new tracks is almost impossible, lengthy and prohibitively expensive. The aim of the German government is to double the passengers and to increase freight transport to a modal split of 25%. So we need to increase capacity by at least 30% on the existing network with innovations and new digital technologies. This requires automated and optimized driving, a safety system allowing shorter and flexible distances between trains, and intelligent traffic orchestration. Furthermore, predictive maintenance allows for more dynamic and accurate maintenance schedules, increasing vehicle usage and thereby capacity.

Jarek Jarzębowski: Thank you, Michael Küpper, for sharing these insights and the impressive work you and your team are doing at DSD. It has been a fascinating discussion, and I am sure our audience will appreciate the depth of information and your perspective on the future of the railway industry.

Michael Küpper: Thank you, Jarek. It was a pleasure to discuss these topics with you.

Michael Küpper’s Background

Michael Küpper joined Digitale Schiene Deutschland (DSD) at Deutsche Bahn in late 2017. As Product Manager, he built and, until 2023, led the scaled-agile team-of-teams that implements DSD’s Capacity & Traffic Management System (CTMS). He now serves as Stakeholder Manager to drive the strategic vision of CTMS and its enabling technological foundations within the railway sector at large. Michael holds a PhD in physics from The Weizmann Institute of Science in Israel and has over 10 years of experience as a strategy and management consultant. Throughout his career, he has introduced Artificial Intelligence (AI) in environments where AI had not been previously applied, from particle physics analysis to housing price prediction to rail traffic management.

Closing Thoughts

Classic AI may not be the answer to railway traffic management automation - but advanced AI tools can tackle such issues, potentially solving some of the biggest pain points of passengers, cargo customers, and operators alike. As new challenges in modern logistics continue to arise, AI combining decision-making systems and generative capabilities may emerge as the ultimate conductor.

The multi-agent reinforcement learning solution employed by Michael Küpper’s team for Deutsche Bahn could serve as a blueprint for other transportation companies aiming to improve capacity, reliability, and efficiency. Although there are some obstacles on the way, the prospects for successful innovation on a larger scale are promising. Let’s see who joins the multi-agent bandwagon!

About the author

Jarek Jarzębowski

People & Culture Lead

Jarek is an experienced People & Culture professional and tech enthusiast. He is a speaker at HR and tech conferences and a podcaster who shares a lot on LinkedIn. He loves working at the crossroads of humans, technology, and business, bringing the best of all worlds and combining them in a novel way.
At nexocode, he is responsible for leading People & Culture initiatives.
