Predicting which stations will be empty or full

As I mentioned on my previous post about Bike sharing Rebalancing a key question that I am trying to answer in my research is which stations of the system will be under unstable conditions, such as empty or full, for example.

To be more accurate, I am not only interested on empty or full stations, rather the exact number of bikes that each station will have in any given time of the day. Under some circumstances it is desirable to have an empty or a full station to allocate the upcoming demand pattern.

Can we anticipate when a station is going to be unstable?

The answer is yes and to do so we can use Machine Learning techniques.

Machine learning is a good fit to tackle this sort of predictive problem. Machine learning techniques explore and learn from data to build a model that can then be used as a decision making tool or to make predictions given some inputs.

In this particular problem I tested the following techniques: 1) Gradient Boosting Machines, 2) Random Forests, 3) Neural Networks and 4) Linear Regression.

The best model turn out to be Gradient Boosting Machines (GBM). GBM repeatedly fit a simple classifier, a decision tree in most of the cases, to a subset of the data, both in terms of the number of observations and the number of attributes used explain the outcome.  Finally, it aggregates, or makes an ensemble of all the simpler model predictions to make a final decision.

In the current problem, the outcome is the expected number of bikes in a given station and time of day and the attributes are past bike observations in that station and the surrounding ones, and weather data such as the temperature. Selecting the right attributes or features is key to achieve good performance and some of the machine learning techniques are more or less sensitive to irrelevant attributes. Below you can see the attributes that I used.

List of attributes


The predictions are made at 20, 40 and 60 min from the current time using data from The Hubway bikes haring system in Boston. For every station we fit a single GBM model using 3 months of historical and weather data and we test it on the last 15 days.  Below we show the mean attribute importance plot aggregated over all the stations. Note, that as expected, the most important attribute is the hour of the day followed by the station activity, measured as the standard deviation of the past 6 observations.

Mean Attribute Importance over all stations (61)


Why is it important to make predictions?

Being able to make predictions is important because:

  • The operator can make better informed decisions
  • It reduces the overall repositioning costs
  • It increases the system performance and user satisfaction
  • We move from a reactive approach to a proactive approach
  • From a mathematical perspective, we can reduce the complexity of the problem allowing for real time decision making
  • It has the potential to modify riders behavior in advanced and allow for rider-based rebalancing policies (eg. suggesting a rider to get a bike from another station well in advance)

The predictions are the building block of the comprehensive framework to model the rebalancing operations of a bike sharing system that I propose.


About Bike Sharing Rebalancing

Bike sharing is a sustainable and environmentally friendly transportation mode that offers bikes ‘‘on-demand’’ to improve daily urban mobility. However, although bikesharing systems potentially offer a viable alternative for enhancing urban mobility, they suffer from the effects of fluctuating spatial and temporal demand. In other words,  the number of bikes and docks available in any given station heavily depends on its location and the time of day, this dynamics introduce inefficiencies in the system, such as having empty or full stations for long periods of time.

station_dynamicsFor example, in this plot we can see the mean number of bikes  for different days of the week and times of day over a period of 3 months.  Note how the demand patterns are different on weekdays and weekends and that during the day there is a sudden increase on arrivals at around 8AM and departures at 16 PM.

The issue of this demand pattern is that during a day, some stations will be empty and some will be full, degrading the level of service.  Furthermore, there is a cascade effect, where when a station becomes full, the nearby stations will also get full quickly, as riders that arrive to the full station are looking for docks to park their bikes. As a result, bike sharing operators are forced to “rebalance” the system by repositioning bikes in real time, and they usually do so by loading and unloading bikes to a fleet of vans that travel around the city. In New York City, due to congestion, it is more efficient to do it by using a rickshaw-like trailer.

Jesse Winter -
Jesse Winter
Darrow Montgomery
Darrow Montgomery

You may wonder if this also occurs in other vehicle sharing systems, such as car sharing, for example. The simple answer is that it only occurs on systems where one-way trips are allowed, meaning that you can pick a vehicle in one station and drop it to another.

Most of car sharing systems do not allow for one-way trips. Look at ZipCar for example. Well, after all, repositioning a bike is much easier than repositioning a vehicle, but, what will happen if the autonomous vehicle becomes mainstream?

Efficiently solving the bike sharing rebalancing problem has been my main research focus. To solve it, you need to answer the 3 following questions:

  • Which stations will be empty or full?
  • How many bikes do we need to add or remove?
  • Which is the optimal route of the repositioning vehicle?