{"id":79,"date":"2015-05-06T17:04:19","date_gmt":"2015-05-06T17:04:19","guid":{"rendered":"http:\/\/www.robertregue.com\/blog\/?p=79"},"modified":"2015-05-06T17:04:19","modified_gmt":"2015-05-06T17:04:19","slug":"predicting-which-stations-will-be-empty-or-full","status":"publish","type":"post","link":"https:\/\/www.robertregue.com\/blog\/2015\/05\/predicting-which-stations-will-be-empty-or-full\/","title":{"rendered":"Predicting which stations will be empty or full"},"content":{"rendered":"<p>As I mentioned in my previous post about <a title=\"About Bike Sharing Rebalancing\" href=\"http:\/\/www.robertregue.com\/blog\/2015\/03\/about-bike-sharing-rebalancing\/\">Bike sharing Rebalancing<\/a>, a key question that I am trying to answer in my research is which stations of the system <strong><span style=\"color: #ff6600;\">will<\/span><\/strong>\u00a0be in unstable conditions, such as empty or full.<\/p>\n<p>To be more accurate, I am not only interested in empty or full stations, but rather in the exact number of bikes that each station will have at any given time of the day. Under some circumstances it is desirable to have an empty or a full station to accommodate the upcoming demand pattern.<\/p>\n<blockquote><p>Can we anticipate when a station is going to be unstable?<\/p><\/blockquote>\n<p>The answer is yes, and to do so we can use <strong><span style=\"color: #ff6600;\">Machine Learning<\/span><\/strong> techniques.<\/p>\n<p>Machine learning is\u00a0a good fit for this sort of predictive problem. Machine learning techniques explore and learn from data to build a model that can then be used as a decision-making tool or to make predictions given some inputs.<\/p>\n<p>In this particular problem I tested the following techniques: 1) Gradient Boosting Machines, 2) Random Forests, 3) Neural Networks and 4) Linear Regression.<\/p>\n<p>The best model turned out to be Gradient Boosting Machines (GBM). 
GBM\u00a0repeatedly fits a simple model, a decision tree in most cases, to a subset of the data, both in terms of the number of observations and the number of attributes used to explain the outcome. Finally, it aggregates, or makes an ensemble of, all the simpler model predictions to make a final decision.<\/p>\n<p>In the current problem, the outcome is the expected number of bikes at a given station and time of day, and the attributes are past bike observations at that station and the surrounding ones, and\u00a0weather data such as the temperature. Selecting the right attributes, or features, is key to achieving good performance, and some machine learning techniques are more sensitive than others to irrelevant attributes. Below you can see the attributes that I used.<\/p>\n<figure id=\"attachment_81\" aria-describedby=\"caption-attachment-81\" style=\"width: 490px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-81\" src=\"http:\/\/www.robertregue.com\/blog\/wp-content\/uploads\/2015\/03\/Screen-Shot-2015-03-30-at-7.01.06-PM.png\" alt=\"Variables\" width=\"490\" height=\"250\" srcset=\"https:\/\/www.robertregue.com\/blog\/wp-content\/uploads\/2015\/03\/Screen-Shot-2015-03-30-at-7.01.06-PM.png 639w, https:\/\/www.robertregue.com\/blog\/wp-content\/uploads\/2015\/03\/Screen-Shot-2015-03-30-at-7.01.06-PM-300x153.png 300w\" sizes=\"auto, (max-width: 490px) 100vw, 490px\" \/><figcaption id=\"caption-attachment-81\" class=\"wp-caption-text\">List of attributes<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>The predictions are made at 20, 40 and 60 minutes from the current time using data from the Hubway bike sharing system in Boston. For every station we fit a single GBM model using 3 months of historical and weather data, and we test it on the last 15 days. Below we show the mean attribute importance plot aggregated over all the stations. 
Note that, as expected, the most important attribute is the hour of the day, followed by the station activity, measured as the standard deviation of the past 6 observations.<\/p>\n<figure id=\"attachment_82\" aria-describedby=\"caption-attachment-82\" style=\"width: 660px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.robertregue.com\/blog\/wp-content\/uploads\/2015\/03\/outcomes.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-82 size-large\" src=\"http:\/\/www.robertregue.com\/blog\/wp-content\/uploads\/2015\/03\/outcomes-1024x583.png\" alt=\"outcomes\" width=\"660\" height=\"376\" srcset=\"https:\/\/www.robertregue.com\/blog\/wp-content\/uploads\/2015\/03\/outcomes-1024x583.png 1024w, https:\/\/www.robertregue.com\/blog\/wp-content\/uploads\/2015\/03\/outcomes-300x171.png 300w, https:\/\/www.robertregue.com\/blog\/wp-content\/uploads\/2015\/03\/outcomes.png 1480w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><\/a><figcaption id=\"caption-attachment-82\" class=\"wp-caption-text\">Mean Attribute Importance over all stations (61)<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<blockquote><p>Why is it important to make predictions?<\/p><\/blockquote>\n<p>Being able to <strong><span style=\"color: #ff6600;\">make predictions<\/span><\/strong> is important because:<\/p>\n<ul>\n<li>The operator can make better-informed decisions<\/li>\n<li>It reduces the overall repositioning costs<\/li>\n<li>It increases the system performance and user satisfaction<\/li>\n<li>We move from a reactive approach to a proactive approach<\/li>\n<li>From a mathematical perspective, we can reduce the complexity of the problem, allowing for real-time decision making<\/li>\n<li>It has the potential to modify riders' behavior in advance and allow for rider-based rebalancing policies (e.g. 
suggesting\u00a0a rider to get a bike from another station well in advance)<\/li>\n<\/ul>\n<p>The predictions are the building block of the comprehensive framework that I propose to model the rebalancing operations of a bike sharing system.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As I mentioned in my previous post about Bike sharing Rebalancing, a key question that I am trying to answer in my research is which stations of the system will\u00a0be in unstable conditions, such as empty or full. To be more accurate, I am not only interested in empty or full stations, but rather in the &hellip; <a href=\"https:\/\/www.robertregue.com\/blog\/2015\/05\/predicting-which-stations-will-be-empty-or-full\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Predicting which stations will be empty or full<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[5],"tags":[],"class_list":["post-79","post","type-post","status-publish","format-standard","hentry","category-bike-sharing"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/posts\/79","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/comments?post=79"}],"version-history":[{"count":3,"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/posts\/79\/r
evisions"}],"predecessor-version":[{"id":90,"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/posts\/79\/revisions\/90"}],"wp:attachment":[{"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/media?parent=79"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/categories?post=79"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.robertregue.com\/blog\/wp-json\/wp\/v2\/tags?post=79"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}