Private AI

Making Privacy Accessible

Anonymize data at source
The Galatea Anonymization Suite


GALATEA FOR IMAGES AND VIDEO

GALATEA FOR TEXT

Blog
  • Perfectly Privacy-Preserving AI

    What is it and how do we achieve it? We identified four pillars of privacy-preserving machine learning.

  • NVIDIA DALI: Speeding up PyTorch

    Techniques for improving DALI resource usage and creating a completely CPU-based pipeline, with up to 4x faster PyTorch training.

  • Homomorphic Encryption for Beginners: A Practical Guide (Part 2)

    The Fourier Transform

  • Which privacy-preserving method should I use?

    A tentative decision tree for the privacy-conscious programmer

  • Homomorphic Encryption for Beginners: A Practical Guide (Part 1)

    The basics of homomorphic encryption, a brief overview of the open-source homomorphic encryption libraries currently available, and a tutorial on how to use one of them (PALISADE).

  • Why is Privacy-Preserving Natural Language Processing Important?

    Why we should bother creating natural language processing (NLP) tools that preserve privacy. Apparently not everyone spends hours upon hours thinking about data breaches and data privacy infringements.

  • A Brief Overview of Privacy-Preserving Software Methods

    Symmetric encryption, asymmetric encryption, homomorphic encryption, differential privacy, and secure multi-party computation.

Research
  • Extracting MFCCs and BFCCs from Encrypted Signals

    We describe a method for extracting mel-frequency cepstral coefficients (MFCCs) and Bark-frequency cepstral coefficients (BFCCs) from an encrypted signal without decrypting any intermediate values. To do so, we introduce a novel approach for approximating logarithms of encrypted input data. This method works over any interval on which the logarithm is defined and bounded. Extracting spectral features from encrypted signals is the first step towards secure end-to-end automatic speech recognition (ASR) over encrypted data. We experimentally determine the precision thresholds needed to support an accurate word error rate (WER) for ASR on the TIMIT dataset.

  • A Critical Reassessment of Evaluation Baselines for Speech Summarization

    We assess the current state of the art in speech summarization by comparing a typical summarizer on two different domains: lecture data and the SWITCHBOARD corpus. Our results cast significant doubt on the merits of this area's accepted evaluation standards in terms of the baselines chosen, the correspondence of results to our intuition of what "summaries" should be, and the value of adding speech-related features to summarizers that already use transcripts from automatic speech recognition (ASR) systems.

  • Convolutional Neural Networks for Speech Recognition

    We show that convolutional neural networks (CNNs) can reduce speech recognition error rates beyond those of hybrid deep neural network/hidden Markov model (DNN-HMM) systems. We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. Structural properties of CNNs such as local connectivity, weight sharing, and pooling give them some degree of invariance to small shifts of speech features along the frequency axis, which is important for handling speaker and environment variations.

  • Privacy-Preserving Character Language Modelling

    Some of the most sensitive information we generate is either written or spoken using natural language. Privacy-preserving methods for natural language processing are therefore crucial, especially considering the ever-growing number of data breaches. However, there has been little work in this area up until now. In fact, no privacy-preserving methods have been proposed for many of the most basic NLP tasks. We propose a method for calculating character bigram and trigram probabilities over sensitive data using homomorphic encryption.

  • Flexible Web document analysis for delivery to narrow-bandwidth devices

    We propose a set of baseline heuristics for identifying genuinely tabular information and news links in HTML documents. A prototype implementation of these heuristics is described for delivering content from news providers' home pages to a narrow-bandwidth device such as a portable digital assistant or cellular phone display. Its evaluation on 75 Web sites is provided, along with a discussion of topics for future research.

  • Web-Based Language Modelling for Automatic Lecture Transcription

    Universities have long relied on written text to share knowledge. As more lectures are made available on-line, these must be accompanied by textual transcripts in order to provide the same access to information as textbooks. While Automatic Speech Recognition (ASR) is a cost-effective method to deliver transcriptions, its accuracy for lectures is not yet satisfactory. One approach for improving lecture ASR is to build smaller, topic-dependent Language Models (LMs) and combine them (through LM interpolation or hypothesis space combination) with general-purpose, large-vocabulary LMs. In this paper, we propose a simple solution for lecture ASR with similar or better Word Error Rate reductions (as well as topic-specific keyword identification accuracies) than combination-based approaches. Our method eliminates the need for two types of LMs by exploiting the lecture slides to collect a web corpus appropriate for modelling both the conversational and the topic-specific styles of lectures.

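The MFCC/BFCC entry above depends on approximating logarithms of encrypted values. Many homomorphic encryption schemes can only add and multiply, so a logarithm has to be replaced by a polynomial. The paper's own approximation method is not reproduced here. As a stand-in, this is a minimal sketch of a standard alternative, a truncated Taylor series around the midpoint of a known interval [a, b], run on ordinary floats with no encryption. The function name `log_poly` and the default of 20 terms are illustrative assumptions.

```python
import math

def log_poly(x: float, a: float, b: float, terms: int = 20) -> float:
    """Approximate ln(x) for x in [a, b] (0 < a < b) using only + and *.

    Hypothetical illustration, not the paper's method: a truncated Taylor
    series of ln(1 + t) around the interval midpoint m, with t = (x - m) / m.
    Because a > 0, |t| <= (b - a) / (b + a) < 1, so the series converges
    on the whole interval.
    """
    if not 0 < a < b:
        raise ValueError("need 0 < a < b")
    m = (a + b) / 2
    log_m = math.log(m)   # public constant: depends on the interval, not on x
    inv_m = 1.0 / m       # public constant
    t = (x - m) * inv_m   # only subtraction of, and multiplication by, constants
    # Horner's rule for sum_{k=1..terms} (-1)^(k+1) * t^k / k
    p = 0.0
    for k in range(terms, 0, -1):
        coeff = (1.0 if k % 2 == 1 else -1.0) / k
        p = (p + coeff) * t
    return log_m + p
```

For every x in [1, 3], `log_poly(x, 1.0, 3.0)` lands within 1e-6 of `math.log(x)`. Because |t| is bounded by (b - a)/(b + a), a wider interval pushes |t| toward 1 and needs more terms, and therefore more multiplications, for the same accuracy. This illustrates why a known, bounded input interval matters. Horner's rule is used here for readability; its multiplicative depth grows linearly with the number of terms, so a depth-conscious HE implementation would likely evaluate the polynomial differently.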
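The character language modelling entry above computes character bigram and trigram probabilities over encrypted data. For reference, this minimal plaintext sketch shows the quantity involved: the maximum-likelihood estimate P(c_n | c_1 ... c_{n-1}) = count(c_1 ... c_n) / count(c_1 ... c_{n-1}). It performs no encryption and is not the paper's homomorphic method. The function name `char_ngram_probs` is hypothetical.

```python
from collections import Counter

def char_ngram_probs(text: str, n: int) -> dict[str, float]:
    """Maximum-likelihood estimate of P(last char | preceding n-1 chars).

    Hypothetical plaintext illustration only: no homomorphic encryption.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    starts = range(len(text) - n + 1)
    # Count every n-gram, and every (n-1)-character history that begins one,
    # so the conditional probabilities for each history sum to 1.
    ngrams = Counter(text[i:i + n] for i in starts)
    histories = Counter(text[i:i + n - 1] for i in starts)
    return {g: c / histories[g[:-1]] for g, c in ngrams.items()}
```

On "abracadabra" with n = 2, `a` is followed by `b` twice, `c` once, and `d` once, so P(b | a) = 0.5 and P(c | a) = P(d | a) = 0.25.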
Team

Gerald Penn, PhD

Co-Founder and CSO

Gerald Penn is a Professor of Computer Science at the University of Toronto, where he studies spoken language processing and computational linguistics. He has over 100 publications, the most cited of which has accrued 1,581 citations. He is a senior member of IEEE and AAAI, and a past recipient of the Ontario Early Researcher Award. His lab revolutionized speech recognition with its work on neural networks; that work received the IEEE Signal Processing Society's Best Paper Award. He has led numerous research projects, including ones funded by Avaya, Bell Canada, CAE, the Connaught Fund, Microsoft, NSERC, the German Ministry for Training and Research, SMART Technologies, the U.S. Army and the U.S. Office of the Director of National Intelligence. Gerald has also worked at Bell Labs and NASA.

Nina Haikara, MEd

Communications and Marketing Lead

Nina Haikara is the Communications Strategist for the University of Toronto's Faculty of Law. She has over 12 years of experience in communications management and holds a master's degree in higher education theory and policy.

Peizhao Hu, PhD

Faculty Affiliate, Security Research

Peizhao Hu is an Assistant Professor in the Department of Computer Science at the Rochester Institute of Technology (RIT), New York. His research focuses on (1) privacy-preserving cloud data analytics, specifically homomorphic encryption and multiparty computation; and (2) distributed systems, including mobile and pervasive computing. Before joining RIT, he was a Senior Research Engineer at NICTA (Australia's centre of research excellence, now Data61@CSIRO).


Advisors
UofT, GAN, NVIDIA, ventureLab, NextAI

Contact Us

Interested in a demo? Email us at info@private-ai.ca