What makes the worst bus stop in London?

In this blog we’ll go into more detail about some of the underlying processes and decisions that drove the results displayed on this “The worst bus stop in London” page.

How much data? And how was it collected?

From the TfL API, we collected three months of bus arrivals data across approximately seven hundred bus routes moving through the seven London travel zones. Pulling data every minute gave us approximately 5 billion rows of data to start with.

What was in the data? Did it need transforming?

The data included details about bus routes (stops on the route, latitude and longitude values etc.). Provided in JSON format, this was easy to extract with our connector detailed in this blog.

What wasn’t included was any data describing the timeliness of a route, i.e. what time did the bus actually arrive? However, we extrapolated this, and our SQL approach is detailed in this blog. The gist of it is that we retrieved the arrival predictions shown on the electronic bus boards every minute; once the predictions stopped for a particular bus at a particular stop, we knew that the bus had arrived at that stop. Extrapolating the arrival times left us with about 220 million arrival times to analyse.
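The "predictions stopped, so the bus has arrived" idea can be sketched in a few lines of Python. This is not the SQL approach referenced above, only a minimal illustration. The tuple layout, the function name `infer_arrivals` and the sample data are all assumptions, and it treats the last poll at which a prediction was still showing as the approximate arrival time.

```python
from datetime import datetime

def infer_arrivals(polls):
    """polls: iterable of (polled_at, vehicle_id, stop_id) tuples, one for
    each prediction seen on a board at a given poll time (hypothetical layout).

    Returns {(vehicle_id, stop_id): last_seen} for every pair whose
    predictions stopped before the final poll in the data.
    """
    last_seen = {}
    latest_poll = None
    for polled_at, vehicle_id, stop_id in polls:
        key = (vehicle_id, stop_id)
        if key not in last_seen or polled_at > last_seen[key]:
            last_seen[key] = polled_at
        if latest_poll is None or polled_at > latest_poll:
            latest_poll = polled_at
    # A pair that is still being predicted at the final poll has not arrived yet.
    return {k: t for k, t in last_seen.items() if t < latest_poll}

# Made-up polls, one minute apart.
polls = [
    (datetime(2024, 1, 1, 8, 0), "BUS_A", "STOP_1"),
    (datetime(2024, 1, 1, 8, 1), "BUS_A", "STOP_1"),
    (datetime(2024, 1, 1, 8, 1), "BUS_B", "STOP_1"),
    (datetime(2024, 1, 1, 8, 2), "BUS_B", "STOP_1"),
]
arrivals = infer_arrivals(polls)
```

Here BUS_A disappears from the board after the 08:01 poll, so it is treated as having arrived around then, while BUS_B is still being predicted at the last poll and is left out. A real pipeline would also need to separate repeat visits by the same vehicle to the same stop, which this sketch ignores.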

Also, the API timetable data doesn’t reflect what passengers understand to be the “timetable”. They’re used to seeing “bus every 8-10 minutes”. The API doesn’t detail this; instead it has, for designated days, the times when the buses are supposed to arrive. So, for example, the arrivals for a particular bus at a particular stop might be:

• 6:10, 6:20, 6:30
• Between 8AM and 8PM – every 10-12 minutes
• 20:30, 21:00, 21:30, 22:00

So we needed to identify the minimum and maximum waiting times between buses for each time period to reflect what passengers see. But it’s not fair to take minimums and maximums over the whole day.
In the earlier example, “Between 8AM and 8PM” is the most active period. But do all bus routes and bus stops have the same active period? They don’t, for example:

Berrylands Road on the K2 route
Museum Street on the 1 route

Instead we can identify the minimum and maximum for each hour, as the intervals are less likely to change within an hour. So:

• 6AM to 7AM – 20 to 20 minutes
• 7AM to 8AM – 11 to 12 minutes
• 8AM to 9AM – 10 to 11 minutes

This is actually better for looking at specific hours of the day and creating custom time periods for analysis.
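A minimal Python sketch of the per-hour minimum and maximum calculation is shown here. The function name `hourly_headway_bands`, the input layout and the sample times are assumptions, as is the choice to assign each gap to the hour of the bus that ends it.

```python
from collections import defaultdict
from datetime import datetime

def hourly_headway_bands(scheduled):
    """scheduled: sorted list of datetimes for one route at one stop on one
    day. Returns {hour: (min_gap_minutes, max_gap_minutes)}."""
    gaps_by_hour = defaultdict(list)
    for earlier, later in zip(scheduled, scheduled[1:]):
        gap = (later - earlier).total_seconds() / 60
        # Assumption: a gap belongs to the hour of the bus that ends it.
        gaps_by_hour[later.hour].append(gap)
    return {h: (min(g), max(g)) for h, g in sorted(gaps_by_hour.items())}

# Made-up scheduled times for a single morning.
day = datetime(2024, 1, 1)
times = [day.replace(hour=h, minute=m) for h, m in
         [(6, 10), (6, 20), (6, 30), (6, 40), (6, 50), (7, 0), (7, 11), (7, 23)]]
bands = hourly_headway_bands(times)
```

For these times, hour 6 gets a band of 10 to 10 minutes and hour 7 gets 10 to 12 minutes. Because the bands are keyed by hour, they can be combined afterwards into whatever custom time periods an analysis needs.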

Why look at waiting times between buses? Is an early bus just as bad as a late one?

The easier option is to look at whether the buses arrived at the times announced by the electronic boards. But these times change and update according to various factors. We could compare with the earliest prediction but was that prediction on time in the first place?

By looking at waiting times between buses we can determine whether the “every n minutes” promise is actually true. After all, every arriving passenger thinks, “How long do I really have to wait?”, and if the buses aren’t spaced as promised then passengers are going to have to wait much longer. The mathematics of waiting times is explained more in this blog but the crux of it is:

The blue dots are bus arrivals at the designated minutes and if some passenger were to arrive randomly at this stop they’re more likely to arrive at the larger gaps and hence probably need to wait longer for their bus.

What could cause such large gaps between buses? Just a bus being early or late is enough. We tend to think of late buses as the negative case but, going by the above example, an early bus has an equivalent influence on widening the gap between it and another bus. As we’ll see later on, a bus arriving early or late appears to have no impact on the following buses or others in the network, so there is no limit on how far the gaps can be altered.
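The point that a randomly arriving passenger is more likely to land in a larger gap can be checked with a little arithmetic. If a passenger arrives at a uniformly random moment, the chance of landing in a gap is proportional to its length, and the average wait inside a gap is half of it, so the expected wait is the sum of the squared gaps divided by twice the total. The function name and the example gaps below are illustrative assumptions.

```python
def expected_wait(gaps):
    """Mean wait in minutes for a passenger arriving at a uniformly random
    moment, given the gaps (in minutes) between consecutive buses."""
    total = sum(gaps)
    # Chance of landing in a gap is gap / total; mean wait inside it is gap / 2.
    return sum(g * g for g in gaps) / (2 * total)

# Three buses in 30 minutes, evenly spaced versus unevenly spaced.
even = expected_wait([10, 10, 10])
bunched = expected_wait([2, 18, 10])
```

Both cases have three buses in half an hour, yet the even spacing gives an expected wait of 5 minutes while the uneven spacing gives a little over 7. Whether the 2-minute gap came from an early bus or the 18-minute gap from a late one makes no difference to the result.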

Analysing late and early stops and determining the worst stop

How do we determine whether a bus was late or early?

After ingesting the raw timetable data and calculating the minimum and maximum waiting times for each day and hour, we combined these with the arrivals we had extrapolated and with the exact times from the raw timetable data. A sample:

140 490006638N 1 0 00:12:00 00:06:32 10 10

So we can compare actual arrivals with when they were supposed to arrive, and with how long a passenger at that stop should expect to wait at that hour on that day. But what should be classed as late or early? This is subjective, depending on the person and the situation they are in. Each passenger has different needs; one may say “a couple of minutes before or after” whereas another might insist that anything more than a minute before or after counts.

A justification for the second view is that if connections to another bus or train need to be made, then that minute can make quite a difference. In a busy city like London that does make sense, especially during the evenings when national rail trains are less frequent and a stall of a few minutes on the underground or a bus can make an already strict schedule impossible.

So for this we’ve decided to use the plus/minus one-minute window and rank the stops within each zone by their percentage of early and late arrivals. We use percentages because some bus routes are bound to be busier than others, so a percentage serves as a fairer comparison metric.

The worst stop would be the one whose arrivals conform least to the timetable, i.e. the stop that is the worst at being both late and early.
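As a rough sketch of the plus/minus one-minute classification and the percentages used for ranking, something like the following would do. The names `classify` and `arrival_percentages` and the sample pairs are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta

TOLERANCE = timedelta(minutes=1)  # the plus/minus one-minute window

def classify(scheduled, actual, tolerance=TOLERANCE):
    """Label one arrival as 'early', 'on_time' or 'late'."""
    diff = actual - scheduled
    if diff < -tolerance:
        return "early"
    if diff > tolerance:
        return "late"
    return "on_time"

def arrival_percentages(pairs):
    """pairs: list of (scheduled, actual) datetimes for one stop."""
    counts = Counter(classify(s, a) for s, a in pairs)
    total = sum(counts.values())
    if total == 0:
        return {"early": 0.0, "on_time": 0.0, "late": 0.0}
    return {label: 100 * counts[label] / total
            for label in ("early", "on_time", "late")}

# Made-up arrivals against a single scheduled time.
t = datetime(2024, 1, 1, 0, 12)
pairs = [
    (t, t - timedelta(minutes=5, seconds=28)),  # well before schedule
    (t, t + timedelta(seconds=30)),             # inside the window
    (t, t + timedelta(minutes=3)),              # well after schedule
    (t, t - timedelta(seconds=59)),             # just inside the window
]
percentages = arrival_percentages(pairs)
```

Stops can then be ranked within a zone on the early and late percentages, which keeps busy and quiet stops comparable.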

Determining route and postcode reliability

How do we determine reliability? We did this by measuring how many buses actually arrived within the minimum and maximum waiting times that we calculated and that are shown on the TfL website.
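In code, this reliability measure might look like the sketch below. The band values, the `(hour, wait_minutes)` layout and the function names are assumptions, not the schema actually used.

```python
def within_band(wait_minutes, hour, bands):
    """True if an observed wait falls inside the scheduled band for its hour."""
    lowest, highest = bands[hour]
    return lowest <= wait_minutes <= highest

def reliability(observations, bands):
    """observations: list of (hour, wait_minutes). Returns the percentage of
    observed waits that fall inside the band for their hour."""
    if not observations:
        return 0.0
    hits = sum(within_band(wait, hour, bands) for hour, wait in observations)
    return 100 * hits / len(observations)

# Hypothetical bands (minutes) and observed gaps between consecutive buses.
bands = {7: (11, 12), 8: (10, 11)}
observations = [(7, 11.5), (7, 3.0), (8, 10.0), (8, 19.0)]
score = reliability(observations, bands)
```

Two of the four observed gaps sit inside their hour's band, giving a reliability of 50%.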

So for each stop we wanted a distribution of waiting times for each bus route that passes through it. Something like this:

Is this good?

The waiting times for this bus stop have a peak near 0, meaning that most of its buses arrive relatively close to each other. Taking this scenario, if someone misses two buses that arrive at the same time, how long would they be waiting for the next one? Looking at the graph it could be 0 minutes again, but it could also be 7, 8, 9, 10 etc. Which is it? More importantly, how sure are we? According to the graph, not very.
Now if we take these waiting times and see how they compare with the minimums and maximums for the hour they occurred in:

Here the orange line indicates what should happen if the buses had arrived according to the promised intervals between buses. It has a few peaks, probably for a few different periods of the day, but it’s certainly less sporadic and makes the blue line look very flat in comparison.

Why does the expected (orange) line start from 8 and end at 23? Because the schedule doesn’t expect waiting times outside of that range. For example, the time between two route 14 buses arriving at this stop should never be 6 minutes, but there are quite a substantial number of those according to the actual (blue) line.
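A distribution like the actual (blue) line can be approximated by binning observed gaps into whole minutes, then counting how many fall below, inside or above the expected range. The sketch below uses the 8 to 23 minute range mentioned above; the observed values and function names are made up for illustration.

```python
from collections import Counter

def wait_histogram(waits):
    """Count observed waits per whole-minute bin (truncating downwards)."""
    return Counter(int(w) for w in waits)

def split_by_range(waits, lowest, highest):
    """Count waits below, inside and above the expected range."""
    below = sum(1 for w in waits if w < lowest)
    above = sum(1 for w in waits if w > highest)
    return {"below": below, "inside": len(waits) - below - above, "above": above}

# Made-up gaps (minutes) between consecutive buses at one stop.
waits = [0.5, 0.8, 6.2, 6.9, 9.0, 12.4, 15.0, 25.0]
histogram = wait_histogram(waits)
split = split_by_range(waits, 8, 23)
```

In this made-up sample, half of the gaps are shorter than the schedule allows (bunched buses) and one is longer, so only three of the eight sit inside the expected range.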

You can look more closely at the waiting times for this stop here. Certainly there should be no buses which are 1 – 6 minutes apart!

What about the entire bus route?

Which looks similar to the distribution for the single stop. This is just a bad route overall. A reliable route would be as close to the expected as possible.

On the results page, when assessing the reliability of a particular postcode, instead of grouping the stops by route, we grouped them by the postcodes to get that area’s reliability.
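Switching between route-level and postcode-level reliability is then only a change of grouping key. The record layout, the postcode values and the function name in this sketch are hypothetical.

```python
from collections import defaultdict

def reliability_by(records, key):
    """Percentage of in-band waits per group, where `key` names the field
    to group on (for example "route" or "postcode")."""
    totals = defaultdict(lambda: [0, 0])  # group -> [hits, count]
    for rec in records:
        bucket = totals[rec[key]]
        bucket[0] += rec["within_band"]  # True counts as 1, False as 0
        bucket[1] += 1
    return {group: 100 * hits / n for group, (hits, n) in totals.items()}

# Made-up observations: one record per observed gap between buses.
records = [
    {"route": "14", "postcode": "WC1B", "within_band": True},
    {"route": "14", "postcode": "WC1B", "within_band": False},
    {"route": "1", "postcode": "WC1B", "within_band": False},
    {"route": "1", "postcode": "SE1", "within_band": True},
]
by_route = reliability_by(records, "route")
by_postcode = reliability_by(records, "postcode")
```

With these records both routes score 50%, but grouping by postcode shows one area at roughly 33% and the other at 100%, which is the kind of difference the postcode view is meant to surface.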

How does performance differ when evaluated on the electronic board data versus the waiting times?

How could we measure performance? Well, we could use the predictions provided by the electronic boards. TfL certainly invests in these predictions (though we’re unsure how they go about generating them), and looking at the data, buses do arrive quite punctually (with some variance) relative to them.

However, if we add the waiting times and whether they hit their designated targets to the top 10 stops for zone 1 then we see that they actually perform quite poorly.

We have the on time percentages for all bus arrivals according to the electronic board (did the bus arrive on time when compared to what was shown on the board?) and waiting time spacing (did the bus arrive within the minimum and maximum waiting time for that hour?).

Stop                                Board on time (%)   Waiting time on time (%)
Dorset Square Marylebone Station    97                  41
George Street (W1)                  97                  39
Borough Station                     97                  45
Montague Street                     97                  34
Oxford Circus                       96                  37
British Museum                      96                  33
Bedford Place                       96                  44
Beech Street                        96                  20
York Street                         95                  37

If arrivals according to the electronic board are good but waiting times are bad, this suggests that when a bus gets delayed there is no knock-on effect on the predictions for the subsequent buses. Should they be affected? It’s situational: during rush hours in the inner zones the next bus probably isn’t too far behind in distance or time, but in zones further out, with longer waits between buses, it’s not as applicable. Also, it’s difficult to instruct the buses behind to slow down deliberately to keep the spacing, especially when there is no traffic. What if they needed to slow down by five or more minutes?

What about the reverse? Are the bad stops actually bad?

Stop                                     Board on time (%)   Waiting time on time (%)
Victoria Bus Station                     72                  31
Museum of London                         72                  31
Westminster Station Parliament Square    74                  41
Euston Bus Station                       74                  45
London Bridge Bus Station                75                  26
Claverton St. Churchill Gdns             75                  32
Clerkenwell Road Great Sutton St.        76                  32
Trafalgar Square                         76                  42
City Thameslink                          77                  35

They’re still quite inconsistent but it’s interesting to note that the best stops don’t perform much better in this regard. However, the good stops for waiting times generally have better punctuality for the electronic board predictions too:

Stop                              Board on time (%)   Waiting time on time (%)
Grosvenor Road                    81                  80
Grosvenor Road                    87                  68
Sloane Square                     88                  63
Hyde Park Corner                  90                  63
Waterloo Station Upper Taxi Rd    82                  59
Marble Arch                       90                  58
Strutton Ground                   83                  58
Marsham Street                    85                  58
Rochester Row Vane Street         77                  57

This implies that good waiting times mean that the original predictions by the board were good. To summarise these scenarios:

• High board and high waiting time accuracy: a good stop, if the times between buses are consistent then it implies that the original predictions according to the board haven’t been altered and are being kept to
• Low board and low waiting time accuracy: generally just a bad stop overall
• High board and low waiting time accuracy: changes to arrival times appear to be independent for buses along the same route. So it’s possible for buses to be on time according to the dynamic board predictions but not the static timetable
• Low board and high waiting time accuracy: doesn’t happen because prediction changes seem to be independent for each bus and for all the buses to be uniformly ahead/behind schedule is very unlikely
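The four scenarios above amount to a two-by-two classification, which could be sketched as follows. The cut-off values of 75 and 50 are arbitrary assumptions chosen only to make the example work; nothing in the analysis fixes where "high" begins.

```python
def scenario(board_pct, waiting_pct, board_cut=75, waiting_cut=50):
    """Place a stop in one of the four scenarios. The cut-offs are
    hypothetical thresholds, not values taken from the analysis."""
    high_board = board_pct >= board_cut
    high_waiting = waiting_pct >= waiting_cut
    if high_board and high_waiting:
        return "high board, high waiting"
    if high_board:
        return "high board, low waiting"
    if high_waiting:
        return "low board, high waiting"
    return "low board, low waiting"

# On-time percentages taken from the tables earlier in this post.
dorset_square = scenario(97, 41)
grosvenor_road = scenario(81, 80)
victoria_bus_station = scenario(72, 31)
```

With these cut-offs, Dorset Square Marylebone Station is high board and low waiting, the first Grosvenor Road entry is high on both, and Victoria Bus Station is low on both.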

So which is the best way to evaluate TfL’s bus performance? We can assume passengers would want to look at waiting times, which certainly answer, in probabilistic terms, the question “when will the bus arrive?”.

But TfL themselves likely want to evaluate their prediction generation (sometimes transport operators will outsource this to a third party), in which case the board times and their punctuality would be of interest. That said, the scenarios summarised earlier suggest relationships which may also be of interest to them, providing more insight than board punctuality alone.
