Paul M. Aoki

research scientist

Portfolio - data science

In much of my research, I’ve had to do a mix of data engineering, GIS, data science and applied machine learning. (Like most computer scientists, I mainly do this kind of work in Python, dropping into C/C++ for speed and brushing up on my R, Matlab/Octave, etc. for specialized tasks.) Some examples – i.e., those involving projects known to the public – include:

Remote sensing

I applied functional nonparametric statistics (in Python and R) to estimate rainfall statistics using satellite radar data where long-term ground-based measurements are unavailable.

P.M. Aoki. Functional Data Analysis for Rain Rate Statistics. Project report, CS 229 (Machine Learning), Stanford Univ., 2017. [PDF]
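The sketch below illustrates functional nonparametric regression with a kernel (Nadaraya–Watson-style) estimator. The predictor is a whole curve rather than a vector, and curves are compared by an L2 distance on a shared grid. The curve representation, the truncated quadratic kernel, the bandwidth and the toy sine-curve data are all assumptions. This is not necessarily the estimator used in the report.

```python
import numpy as np


def l2_distance(curves, curve, grid):
    """L2 distance between each row of `curves` and `curve`, all sampled on `grid`."""
    sq = (np.asarray(curves, dtype=float) - np.asarray(curve, dtype=float)) ** 2
    widths = np.diff(grid)
    # Trapezoidal rule along the grid axis.
    integral = np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * widths, axis=1)
    return np.sqrt(integral)


def functional_kernel_regression(train_curves, train_y, new_curve, grid, bandwidth):
    """Kernel-weighted average of train_y; training curves closer to new_curve get more weight."""
    u = l2_distance(train_curves, new_curve, grid) / bandwidth
    # Truncated quadratic (Epanechnikov-shaped) kernel on [0, 1]:
    # curves farther away than `bandwidth` get zero weight.
    weights = np.where(u <= 1.0, 1.0 - u ** 2, 0.0)
    if weights.sum() == 0.0:
        raise ValueError("no training curves within the bandwidth")
    return float(np.dot(weights, train_y) / weights.sum())


# Hypothetical example: three training curves a*sin(t), each labeled with its amplitude a.
grid = np.linspace(0.0, np.pi, 101)
train_curves = np.array([a * np.sin(grid) for a in (1.0, 2.0, 3.0)])
train_y = np.array([1.0, 2.0, 3.0])
estimate = functional_kernel_regression(train_curves, train_y, 2.0 * np.sin(grid), grid, bandwidth=1.0)
```

The bandwidth controls the bias–variance trade-off. With `bandwidth=1.0`, only the identical training curve falls inside the kernel, so `estimate` is 2.0.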
I built deep convolutional neural network pipelines (both ab initio and transfer learning, using Keras over TensorFlow) for multispectral satellite imagery to estimate rainfall rates where ground-based measurements are unavailable.

P.M. Aoki. Convolutional Neural Networks for Precipitation Estimation from Geostationary Satellite Imagery. Project report, CS 231n (Convolutional Neural Networks for Visual Recognition), Stanford Univ., 2017. [PDF]
At Google Access, I built a global model of rainfall rate statistics (using Python and C++) that used 20 years of satellite radar data to improve upon the then-current ITU-R prediction model. The goal was to understand where point-to-point millimeter-wave radios would be appropriate for backhaul networks (particularly in developing countries).

P.M. Aoki. New Rain Rate Statistics for Emerging Regions: Implications for Wireless Backhaul Planning. arXiv:1609.00426, 2016. [PDF]
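Rain rate statistics for link planning are usually stated as the rain rate exceeded for a given percentage of an average year. For example, ITU-R Recommendation P.837 maps R0.01, the rate exceeded 0.01% of the time. The sketch below computes the empirical version of that statistic. It assumes a hypothetical array of equally spaced rain rate samples in mm/h that includes the dry (zero) samples. It is not the model from the paper.

```python
import math

import numpy as np


def rain_rate_exceeded(samples_mm_per_h, percent_of_time):
    """Smallest sampled rain rate R (mm/h) such that at most `percent_of_time` % of samples exceed R.

    Samples are assumed to be equally spaced in time and to include dry (0 mm/h) samples.
    """
    if not 0.0 < percent_of_time < 100.0:
        raise ValueError("percent_of_time must lie strictly between 0 and 100")
    rates = np.sort(np.asarray(samples_mm_per_h, dtype=float))[::-1]  # descending order
    # How many samples may lie strictly above the returned rate; the small epsilon
    # guards against floating-point round-down (e.g. 0.9999999 instead of 1).
    n_above = math.floor(percent_of_time / 100.0 * rates.size + 1e-9)
    return float(rates[min(n_above, rates.size - 1)])


# Hypothetical record: 10,000 equally spaced samples, 9,990 dry and ten at 10, 20, ..., 100 mm/h.
samples = np.concatenate([np.zeros(9990), 10.0 * np.arange(1, 11)])
r_001 = rain_rate_exceeded(samples, 0.01)  # 90.0: only the 100 mm/h sample lies above it
```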
A more primitive version of the project above addressed a (similarly speculative) question about high-throughput communications satellite coverage in various spectrum bands.

System modeling

The Access Strategy & Operations team looked at many methods for providing broadband Internet access in developing countries. I worked with a team of ex-Big Three analysts to develop business models based on modeled RF capacity and estimated broadband demand within service areas. That’s standard telecom practice, but some of the methods under consideration involved coverage from low Earth orbit and the stratosphere, so I wrote software (mostly Python) to model this.
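For platforms in low Earth orbit or the stratosphere, a first-order coverage model starts from viewing geometry. Let h be the platform altitude and ε the minimum elevation angle at which ground terminals can still close a link. On a spherical Earth of radius R, the Earth central angle of the footprint is λ = arccos(R cos ε / (R + h)) − ε. The sketch below turns that into a footprint radius and a spherical-cap area. It assumes a spherical Earth of radius 6371 km, ignores refraction and terrain, and uses hypothetical example altitudes. It is not the software described above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean spherical-Earth radius (assumption; ignores oblateness)


def footprint(altitude_km, min_elevation_deg):
    """(ground radius in km, area in km^2) of the region that sees the platform above min_elevation_deg."""
    if altitude_km <= 0 or not 0.0 <= min_elevation_deg <= 90.0:
        raise ValueError("need altitude > 0 and elevation in [0, 90] degrees")
    eps = math.radians(min_elevation_deg)
    r = EARTH_RADIUS_KM
    # Earth central angle between the sub-platform point and the edge of coverage.
    central_angle = math.acos(r * math.cos(eps) / (r + altitude_km)) - eps
    central_angle = max(central_angle, 0.0)  # guard against tiny negative values at 90 degrees
    radius_km = r * central_angle  # great-circle distance to the coverage edge
    area_km2 = 2.0 * math.pi * r ** 2 * (1.0 - math.cos(central_angle))  # spherical cap
    return radius_km, area_km2


# Hypothetical comparison at a 20-degree minimum elevation angle:
# a 20 km stratospheric platform vs. a 550 km LEO satellite.
stratospheric = footprint(20.0, 20.0)
leo = footprint(550.0, 20.0)
```

At the same elevation mask, the LEO footprint radius is roughly twenty times the stratospheric one. This is why such comparisons come down to platforms per service area.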
A more down-to-earth option involved the use of television white space (TVWS) spectrum to provide wireless backhaul over large service areas. Given a list of candidate sites, wireless network design can be posed as an optimization problem; I used mixed integer linear programming (MILP) solvers to identify the smallest set of sites with fiber backhaul that could cover a target set of public broadband access sites in various developing countries. (Again, this is standard telecom practice.)
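Choosing the smallest set of fiber-connected sites is an instance of the classic set-cover problem. Give each candidate site j a binary variable x_j. Let A be a coverage matrix whose entry A[i][j] is 1 when site j can reach target site i. The problem is then to minimize Σ x_j subject to A x ≥ 1. The sketch below solves that formulation with SciPy's `scipy.optimize.milp` (SciPy 1.9 or later). The solver choice and the toy coverage matrix are assumptions, and a real TVWS design would also need link budgets, capacity and cost terms.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def min_fiber_sites(coverage):
    """Indices of a minimum-size set of candidate sites that covers every target.

    coverage[i][j] == 1 if candidate site j can serve target site i.
    """
    A = np.asarray(coverage, dtype=float)
    n_targets, n_sites = A.shape
    if np.any(A.sum(axis=1) == 0):
        raise ValueError("at least one target cannot be covered by any candidate site")
    result = milp(
        c=np.ones(n_sites),            # minimize the number of selected sites
        integrality=np.ones(n_sites),  # every x_j is an integer...
        bounds=Bounds(0, 1),           # ...in [0, 1], i.e. binary
        constraints=LinearConstraint(A, lb=np.ones(n_targets), ub=np.inf),  # each target covered at least once
    )
    if not result.success:
        raise RuntimeError(result.message)
    return [j for j in range(n_sites) if result.x[j] > 0.5]


# Hypothetical example: 4 target sites (rows), 4 candidate fiber sites (columns).
coverage = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
selected = min_fiber_sites(coverage)  # [1, 3]
```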

Experimental studies

I used bootstrap methods (in R) to control the false discovery rate arising from multiple comparisons in an experimental study of mobile data use. (The experiment was conducted using internal logging infrastructure that, for privacy reasons, prevented arbitrary processing of log records.)

N. Sambasivan, P. Lee, G. Hecht, P.M. Aoki, M.-I. Carrera, J. Chen, M. Youssefmir, D.P. Cohn, P. Kruskall, E. Wetchler and A.T. Larssen. SmartBrowse: Design and Evaluation of a Mobile Data Price Transparency Tool for Mobile Web Use. Information Technology & International Development 11 (1), 2015, 21-40. [PDF]
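One common way to combine resampling with false discovery rate control has two steps. First, compute a bootstrap p-value for each comparison. Then apply the Benjamini–Hochberg step-up procedure across all of them. The sketch below does this for differences in means between a treatment group and a control group. It is written in Python rather than R to match the other examples here, and its inputs are assumed to be hypothetical per-user metrics. It is not necessarily the resampling procedure used in the SmartBrowse study.

```python
import numpy as np


def bootstrap_p_value(treatment, control, n_boot=2000, seed=None):
    """Two-sided bootstrap p-value for a difference in means (shift-to-common-mean method)."""
    rng = np.random.default_rng(seed)
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    observed = t.mean() - c.mean()
    pooled_mean = np.concatenate([t, c]).mean()
    # Shift both groups to a common mean so that resampling happens under the null hypothesis.
    t0 = t - t.mean() + pooled_mean
    c0 = c - c.mean() + pooled_mean
    boot_t = rng.choice(t0, size=(n_boot, t0.size), replace=True).mean(axis=1)
    boot_c = rng.choice(c0, size=(n_boot, c0.size), replace=True).mean(axis=1)
    extreme = np.sum(np.abs(boot_t - boot_c) >= abs(observed))
    return (extreme + 1) / (n_boot + 1)  # add-one correction keeps p > 0


def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passing = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size > 0:
        k = passing.max()  # largest rank whose sorted p-value is under its threshold
        reject[order[: k + 1]] = True  # reject it and every smaller p-value (step-up)
    return reject
```

In use, you would compute one bootstrap p-value per metric and pass the whole list to `benjamini_hochberg`. Because the procedure is step-up, a p-value that misses its own threshold can still be rejected if a larger p-value passes.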

Environmental sensing

At Intel Research, I built logging and processing pipelines (from device firmware to database) for gas and particulate matter pollution measurements collected by three generations of mobile data collection units.

P.M. Aoki, A. Woodruff, B. Yellapragada and W. Willett. Environmental Protection and Agency: Motivations, Capacity, and Goals in Participatory Sensing. ACM CHI 2017. [PDF]

P. Dutta, P.M. Aoki, N. Kumar, A. Mainwaring, C. Myers, W. Willett and A. Woodruff. Common Sense: Participatory Urban Sensing Using a Network of Handheld Air Quality Monitors. ACM SenSys 2009. [PDF]

P.M. Aoki, R.J. Honicky, A. Mainwaring, C. Myers, E. Paulos, S. Subramanian and A. Woodruff. A Vehicle for Research: Using Street Sweepers to Explore the Landscape of Environmental Community Action. ACM CHI 2009. [PDF]

Speech processing

At Xerox PARC, I built a pipeline to segment and visualize utterances captured from multi-party speech (4-10 speakers) to enable human conversation analysts to label the audio segments.

P.M. Aoki, M.H. Szymanski, L. Plurkowski, J.D. Thornton, A. Woodruff and W. Yi. Where’s the ‘Party’ in ‘Multi-Party’? Analyzing the Structure of Small-Group Sociable Talk. ACM CSCW 2006. [PDF]
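A simple first pass at segmenting speech into candidate utterances is an energy threshold. Frame the audio, compute each frame's RMS energy, and merge runs of loud frames separated by short pauses. The sketch below assumes a single mono channel of float samples in [-1, 1]. The frame length, threshold and pause tolerance are illustrative assumptions. It does not describe the PARC pipeline, which worked with multi-party recordings of 4-10 speakers.

```python
import numpy as np


def segment_by_energy(signal, sample_rate, frame_ms=20, threshold=0.02, max_gap_frames=5):
    """Return (start_s, end_s) spans whose frame RMS energy exceeds `threshold`.

    Runs of active frames separated by at most `max_gap_frames` quiet frames are merged.
    """
    x = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = x.size // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    active = np.sqrt(np.mean(frames ** 2, axis=1)) > threshold

    spans = []  # (first_active_frame, last_active_frame + 1)
    start, last_active = None, None
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            last_active = i
        elif start is not None and i - last_active > max_gap_frames:
            spans.append((start, last_active + 1))  # pause too long: close the segment
            start = None
    if start is not None:
        spans.append((start, last_active + 1))
    return [(s * frame_len / sample_rate, e * frame_len / sample_rate) for s, e in spans]


# Hypothetical 1 kHz signal: 1 s of silence, 0.5 s of constant-amplitude "speech", 1 s of silence.
signal = np.concatenate([np.zeros(1000), np.full(500, 0.5), np.zeros(1000)])
segments = segment_by_energy(signal, sample_rate=1000)  # [(1.0, 1.5)]
```

Merging across short pauses keeps brief hesitations inside one utterance. In practice, the threshold would need tuning per recording, which is one reason such output is typically reviewed by human analysts.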