Web Data Mining & Public Mobility Analysis


The purpose of the project is threefold:

  1. To understand public movement from web data (microblogging data, mobile company data, etc.). Microblogging data is multi-dimensional, highly mobile, and information-rich. Using such data would support smart-city decision making in Shanghai.
  2. To use classification algorithms to investigate the function of each area. The results would facilitate city planning, such as bus routing and scheduling and the siting of schools and hospitals.
  3. To analyze microblog content to understand client needs. The results could help enterprises provide more targeted services and help the government with crisis management.

To achieve this, we propose three stages:

  1. Classification algorithms and their evaluation: use the silhouette method to evaluate the right number of classes with DBSCAN, compare different numbers of classes, and calculate the silhouette for each case. The objective is to write a joint conference or journal paper.
  2. Comparison of methods on different territories (ISC: DBSCAN, UTBM: k-means): exchange the map data describing each territory, convert the data format, apply the algorithm in both teams, and exchange the results.
  3. Combine the classified map data with the Weibo data to generate information, and understand and explain why people are in a given area (shop, restaurant…). Link mobility to its purpose through a correlation study between Weibo and map data in space and time (PCA).
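
As an illustration of the evaluation step in stage 1, the following is a minimal sketch of how the mean silhouette could be calculated for several candidate numbers of classes. It is not the project's implementation. The toy coordinates, the helper names (`silhouette_score`, `kmeans`), the farthest-point initialisation, and the convention of skipping points labelled -1 (the label commonly used for DBSCAN noise) are all assumptions made for this example. It uses only the Python standard library (Python 3.8+ for `math.dist`).

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient. Points labelled -1 are skipped
    (assumption: -1 marks noise, as in common DBSCAN implementations)."""
    clusters = {}
    for i, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other points of the same cluster
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the points of any other cluster
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start
    (an illustrative choice, not a recommendation)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Hypothetical toy data: three compact groups of 2-D locations.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]

# Calculate the silhouette for each candidate number of classes.
scores = {k: silhouette_score(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: mean silhouette = {s:.3f}")
print("best k:", best_k)
```

Because `silhouette_score` only needs a list of labels, the same function could be applied to labels produced by DBSCAN, which would make the two methods comparable on a common scale. Note that excluding noise points from the score is only one possible convention and should be stated when results are exchanged between teams.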