Image, video, and text de-identification at source:
We describe a method for extracting MFCCs and BFCCs from an encrypted signal without having to decrypt any intermediate values. To do so, we introduce a novel approach for approximating the value of logarithms given encrypted input data. This method works over any interval for which logarithms are defined and bounded. Extracting spectral features from encrypted signals is the first step towards achieving secure end-to-end automatic speech recognition over encrypted data. We experimentally determine the appropriate precision thresholds to support accurate WER for ASR over the TIMIT dataset.View
We assess the current state of the art in speech summarization, by comparing a typical summarizer on two different domains: lecture data and the SWITCHBOARD corpus. Our results cast significant doubt on the merits of this area's accepted evaluation standards in termms of: baselines chosen, the correspondence of results to our intuition of what "summaries" should be, and the value of adding speech-related features to summarizers that already use transcripts from automatic speech recognition (ASR) system.View
We show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations.View
Some of the most sensitive information we generate is either written or spoken using natural language. Privacy-preserving methods for natural language processing are therefore crucial, especially considering the ever-growing number of data breaches. However, there has been little work in this area up until now. In fact, no privacy-preserving methods have been proposed for many of the most basic NLP tasks. We propose a method for calculating character bigram and trigram probabilities over sensitive data using homomorphic encryption.View
We propose a set of baseline heuristics for identifying genuinely tabular information and news links in HTML documents. A prototype implementation of these heuristics is described for delivering content from news providers' home pages to a narrow-bandwidth device such as a portable digital assistant or cellular phone display. Its evaluation on 75 Web sites is provided, along with a discussion of topics for future research.View
Universities have long relied on written text to share knowledge. As more lectures are made available on-line, these must be accompanied by textual transcripts in order to provide the same access to information as textbooks. While Automatic Speech Recognition (ASR) is a cost-effective method to deliver transcriptions, its accuracy for lectures is not yet satisfactory. One approach for improving lecture ASR is to build smaller, topic-dependent Language Models (LMs) and combine them (through LM interpolation or hypothesis space combination) with general-purpose, large-vocabulary LMs. In this paper, we propose a simple solution for lecture ASR with similar or better Word Error Rate reductions (as well as topic-specific keyword identification accuracies) than combination-based approaches. Our method eliminates the need for two types of LMs by exploiting the lecture slides to collect a web corpus appropriate for modelling both the conversational and the topic-specific styles of lectures.View
Patricia Thaine is a Computer Science PhD Candidate at the University of Toronto doing research on privacy-preserving natural language processing, with a focus on applied cryptography. She also does research on computational methods for lost language decipherment. Patricia is a recipient of the NSERC Postgraduate Scholarship, the RBC Graduate Fellowship, the Beatrice “Trixie” Worsley Graduate Scholarship in Computer Science, and the Ontario Graduate Scholarship. She has eight years of research and software development experience, including at the McGill Language Development Lab, the University of Toronto's Computational Linguistics Lab, the University of Toronto's Department of Linguistics, and the Public Health Agency of Canada.
Pieter Luitjens has a Bachelor of Science in Physics and Mathematics and a Bachelor of Engineering from the University of Western Australia, as well as a Masters from the University of Toronto. He worked on software for Mercedes-Benz and developed the first deep learning algorithms for traffic sign recognition deployed in cars made by one of the most prestigious car manufacturers in the world. He has over 10 years of engineering experience, with code deployed in multi-billion dollar industrial projects. Pieter specializes in ML edge deployment & model optimization for resource-constrained environments.
Gerald Penn is a Professor of Computer Science at the University of Toronto, where he studies spoken language processing and computational linguistics. He has over 100 publications, with the top one accruing 1,581 citations. He is a senior member of IEEE and AAAI, and a past recipient of the Ontario Early Researcher Award. His lab revolutionized speech recognition with its work on neural networks, which received the IEEE Signal Processing Society's Best Paper Award. He has led numerous research projects, including ones funded by Avaya, Bell Canada, CAE, the Connaught Fund, Microsoft, NSERC, the German Ministry for Training and Research, SMART Technologies, the U.S. Army and the U.S. Office of the Director of National Intelligence. Gerald has also worked at by Bell Labs and NASA.
Former Information and Privacy Commissioner of Ontario (3-terms)
International Management Consultant
Vector Institute Affiliate
Assistant Professor, University of Waterloo
Senior Data Scientist, Globe and Mail