dc.description.abstract | This study takes mode inference as an example to explore the usefulness of mobile phone data in transportation planning. Traffic data – consisting of activity locations, origin-destination pairs, mode choice, and traffic assignment – are essential in transportation planning. Collecting such data through questionnaire surveys, such as home or roadside interviews, has long been standard practice, but these surveys are usually (1) labor intensive, (2) subject to high refusal rates among respondents, and (3) relatively inaccurate because respondents' memories fade. Attempts have been made to use GPS data, but GPS data are not readily available, and their accuracy is apt to be degraded by the shielding effect of high-rise buildings and other obstacles; hence, they are not suitable for application to a large transportation network. Mobile phone data, emerging as a promising data source for transportation planning, can automatically and effectively record transportation planning data in the time-space dimension without requiring new devices. Thus, the extra cost of retrieving these phone data is small or even negligible. For this study, we adopt two supervised machine learning methods – support vector machine (SVM) and deep neural network (DNN) – to investigate how modal features (travel time, starting time of trace, traversal speed between traces, maximum speed, and average speed), time of day (peak hours, off-peak hours, whole day), route combinations (bus route, vehicle traversing a bus route, vehicle traversing a non-bus route), and training methods (SVM and DNN) affect accuracy in inferring transportation modes (either bus or vehicle).
The results show that the combination of four factors – (1) five modal features, (2) whole-day data, (3) all bus and vehicle routes combined, and (4) SVM – performs better than other combinations in terms of the accuracy index (96.58%) or the confusion matrix. Unfortunately, the modal travel time between an origin and a destination, which the five-feature scenario requires, can only be obtained by a field survey, which is costly. A second choice (consisting of four modal features – starting time of trace, traversal speed between traces, maximum speed, and average speed) can be used with an acceptable loss of accuracy (a decrease from 96.58% to 74.21% in our experiments). The effort involved in applying this four-feature scenario to large-scale networks can be reduced further by classifying the routes used between O-D pairs into groups such that between-group similarity is minimized and within-group similarity is maximized. For each group, only one route is used for training with field survey data and for validation with smart card data; the result obtained is then applied to the other members of the same group. With expected advances in mobile phone infrastructure and technology, higher accuracy in inferring transportation modes from mobile phone data can be anticipated in the near future.
Also worth mentioning is that this research proposes a novel method for eliminating the oscillation phenomenon, intended to correct possible mistakes made by the existing methods in the literature.
| en_US |