Patrick Lumban Tobing

Speech Machine Learning Researcher

San Mateo, California, United States

patrickltobing@gmail.com


Skills

Voice Conversion and Speech Synthesis

Machine Learning and Neural Networks

Python/PyTorch/TorchScript/JIT

C/C++/Intel AVX


Languages

Indonesian

English

Japanese



Links

Google Scholar

LinkedIn

GitHub

Presentation with a Live Demo

Demo Page

ORCiD (peer-reviewer record)



Work Experience

Senior Research Engineer @ Sony R&D, United States
Dec 2023 - Present

AI/ML for Speech Synthesis and Processing


Applied Scientist II @ Prime Video, Amazon Studios, Amazon UK
Nov 2021 - Oct 2023

Project: End-to-End Speech Synthesis and Voice Conversion with Human-Level Performance and Quality on Highly Expressive Multilingual Noisy Real-World Data


Postdoctoral Researcher @ Toda Laboratory, Nagoya University, Japan
Apr 2020 - Oct 2021

Project: Low-Latency Real-Time High-Quality Voice Conversion with Neural Vocoder Trained on Non-Parallel Multispeaker and Multilingual Data


Researcher @ Takeda Laboratory, Nagoya University, Japan
Oct 2019 - Mar 2020

Project: Automatic Detection of Risky Driving Scenes with Variational Autoencoder and Clustering Technique on Multimodal Driving Data


Part-Time Researcher @ Toda Laboratory, Nagoya University, Japan
Nov 2016 - Sep 2019

Project: Development of Voice Conversion and Neural Vocoder Techniques


Intern @ Communication Science Laboratory, NTT, Japan
Jun 2017 - Aug 2017
Jun 2016 - Aug 2016

Project: Latent Trajectory Modeling for Acoustic-Articulatory Mapping


Part-Time Web Developer @ Bandung Institute of Technology, Indonesia
Aug 2014 - Sep 2014

Project: High-School Curriculum Education Web System


Intern @ Augmented Human Communication Laboratory, NAIST, Japan
Jun 2013 - Aug 2013

Project: Speech Modification with Acoustic-Articulatory Mapping based on Gaussian Mixture Model

Selected Publications

Journals
  • P. L. Tobing, K. Kobayashi, and T. Toda, “Articulatory controllable speech modification based on statistical inversion and production mappings,” IEEE/ACM Trans. Audio, Speech, and Lang. Process., vol. 25, no. 12, pp. 2337–2350, 2017. [PDF]
  • P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “Voice conversion with CycleRNN-based spectral mapping and finely tuned WaveNet vocoder,” IEEE Access, vol. 7, pp. 171114–171125, Dec. 2019. [PDF]
  • P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder,” APSIPA Transactions on Signal and Information Processing, vol. 9, E26, Nov. 2020. [PDF]

International Conferences
  • P. L. Tobing, K. Kobayashi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, “Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential,” in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 3350–3354. [PDF]
  • P. L. Tobing, T. Toda, H. Kameoka, and S. Nakamura, “Acoustic-to-articulatory inversion mapping based on latent trajectory Gaussian mixture model,” in Proc. INTERSPEECH, San Francisco, USA, Sep. 2016, pp. 953–957. [PDF]
  • P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “NU voice conversion system for the Voice Conversion Challenge 2018,” in Proc. Speaker Odyssey, Les Sables d’Olonne, France, Jun. 2018, pp. 219–226. [PDF]
  • P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “An evaluation of deep spectral mappings and WaveNet vocoder for voice conversion,” in Proc. IEEE SLT, Athens, Greece, Dec. 2018, pp. 297–303. [PDF]
  • P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “Voice conversion with cyclic recurrent neural network and fine-tuned WaveNet vocoder,” in Proc. ICASSP, Brighton, UK, May 2019, pp. 6815–6819. [PDF]
  • P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “Non-parallel voice conversion with cyclic variational autoencoder,” in Proc. INTERSPEECH, Graz, Austria, Sep. 2019, pp. 674–678. [PDF]
  • P. L. Tobing, T. Hayashi, and T. Toda, “Investigation of shallow WaveNet vocoder with Laplacian distribution output,” in Proc. ASRU, Sentosa, Singapore, Dec. 2019, pp. 176–183. [PDF]
  • P. L. Tobing, Y.-C. Wu, T. Hayashi, K. Kobayashi, and T. Toda, “Efficient shallow WaveNet vocoder using multiple samples output based on Laplacian distribution and linear prediction,” in Proc. ICASSP, Barcelona, Spain, May 2020, pp. 7204–7208. [PDF]
  • P. L. Tobing, T. Hayashi, Y.-C. Wu, K. Kobayashi, and T. Toda, “Cyclic spectral modeling for unsupervised unit discovery into voice conversion with excitation and waveform modeling,” in Proc. INTERSPEECH, Shanghai, China, Oct. 2020, pp. 4861–4865. [PDF]
  • P. L. Tobing, Y.-C. Wu, and T. Toda, “Baseline system of Voice Conversion Challenge 2020 with cyclic variational autoencoder and Parallel WaveGAN,” in Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, Shanghai, China, Oct. 2020, pp. 155–159. [PDF]
  • P. L. Tobing and T. Toda, “Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear-prediction,” in Proc. Speech Synthesis Workshop 11, Budapest, Hungary, Aug. 2021, pp. 142–147. [PDF]
  • P. L. Tobing and T. Toda, “High fidelity and low-latency universal neural vocoder based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling,” in Proc. INTERSPEECH, Brno, Czech Republic, Sep. 2021, pp. 2217–2221. [PDF]

Education

Nagoya University, Japan
Oct 2016 - Mar 2020

Ph.D. Degree in Information Science

Thesis: High-Quality and Flexible Voice Conversion Techniques based on Statistical Spectral and Waveform Modeling [PDF]


Nara Institute of Science and Technology (NAIST), Japan
Oct 2014 - Sep 2016

Master's Degree in Information Science

Thesis: Articulatory Controllable Speech Modification using Statistical Feature Mapping Techniques [PDF]


Bandung Institute of Technology, Indonesia
Aug 2010 - Jul 2014

Bachelor's Degree in Informatics Engineering

Thesis: Indonesian Text-to-Speech System based on Hidden Markov Model [PDF (Indonesian)]

Awards

Reviewer for MDPI Journals (Applied Sciences, Entropy, Algorithms, Information, Acoustics)
2021 - 2023

[Certificate]


NEC C&C Grant for Non-Japanese Researcher
Apr 2018 - Mar 2019

Topic: Silent Speech Enhancement in Noisy Environments with Air- and Body-Conducted Speech Processing


Half Tuition Fee Exemption for Ph.D. Course at Nagoya University, Japan
Oct 2016 - Sep 2019

Best Student Presentation Award at the 2015 Spring Meeting of the Acoustical Society of Japan
Mar 2015

Presentation Title: Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential


MEXT (Monbukagakusho) Scholarship
Oct 2014 - Sep 2016

Scholarship and Full Tuition Fee Exemption during Master’s Course at NAIST, Japan