Speakers Data Set
The Speakers data set is an audio data set recorded by Numenta in the San Francisco Bay Area in 2008. The speakers vary in age, gender and ethnicity. All included speech was read from a provided prompt, and each recording lasts between 1 and 2 minutes. A table that relates recordings to speaker characteristics is bundled with the data.
The files ending in .wav are digital audio recordings in the WAVE file format. Each file holds single-channel, uncompressed, linear 16-bit samples at a sampling rate of 16 kHz. The recordings are named to allow quick identification of the speaker, gender and passage read. For example, in the file name 109_male_p1_r1.wav, the first number (109) is the speaker identification code, followed by the gender, the text passage read and finally the recording index. The contents of the text passages can be found in license.pdf.
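The naming scheme and audio format described above can be checked programmatically. The following is a minimal sketch using only the Python standard library. The regular expression is inferred from the single example 109_male_p1_r1.wav, so the "female" token and the helper names (parse_recording_name, has_expected_format) are assumptions, not part of the data set.

```python
import re
import wave
from typing import NamedTuple


class RecordingName(NamedTuple):
    speaker: int    # speaker identification code, e.g. 109
    gender: str     # gender token as it appears in the file name
    passage: int    # index of the text passage read
    recording: int  # recording index


# Pattern inferred from the one documented example; "female" is an assumption.
_NAME_RE = re.compile(r"^(\d+)_(male|female)_p(\d+)_r(\d+)\.wav$")


def parse_recording_name(filename: str) -> RecordingName:
    """Split a name such as '109_male_p1_r1.wav' into its four parts."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    speaker, gender, passage, recording = match.groups()
    return RecordingName(int(speaker), gender, int(passage), int(recording))


def has_expected_format(path: str) -> bool:
    """Return True if the file is mono, 16-bit, 16 kHz, uncompressed PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1          # single channel
            and wav.getsampwidth() == 2      # 2 bytes = 16-bit samples
            and wav.getframerate() == 16000  # 16 kHz sampling rate
            and wav.getcomptype() == "NONE"  # uncompressed
        )
```

Calling parse_recording_name("109_male_p1_r1.wav") yields speaker 109, gender "male", passage 1 and recording 1.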
The file data.csv contains a table of filenames, recording details and self-reported speaker characteristics. Each recording is listed on a separate line of the file, and all lines contain the same fields. The first line of the file provides a header with the field names. The available fields in data.csv are:
- Speaker: an integer index uniquely identifying the speaker
- Gender: 'M' or 'F', indicating a male or female speaker
- Age: age as an integer
- Birthplace: country of birth
- PrimaryLanguage: primary language
- EnglishProficiency: English proficiency as 'Native', 'Proficient' or 'Not Proficient'
- Microphone: the manufacturer of the microphone used to record the audio
- Room: the location where the recording was captured
- Recording: the index of the recording if the recording was repeated
- Passage: the index of the passage being read. The text can be found in license.pdf
- Filename: the name of the file where the audio is stored. The directory name is share/projects/speech/data/numenta_speech/small
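The fields listed above can be read with Python's standard csv module. This is a sketch under stated assumptions: data.csv is taken to be comma-delimited with header names spelled exactly as in the list, and only Speaker and Age are converted to integers because those are the two fields the description explicitly calls integers. The function names load_recordings and select_filenames are hypothetical helpers.

```python
import csv

# Fields documented as integers; other fields are left as strings.
INT_FIELDS = ("Speaker", "Age")


def load_recordings(csv_path):
    """Read data.csv into a list of dicts keyed by the header field names."""
    rows = []
    with open(csv_path, newline="") as handle:
        # skipinitialspace tolerates a space after each comma, if present.
        for row in csv.DictReader(handle, skipinitialspace=True):
            for field in INT_FIELDS:
                row[field] = int(row[field])
            rows.append(row)
    return rows


def select_filenames(rows, **criteria):
    """Return the Filename of every row whose fields equal all criteria."""
    return [
        row["Filename"]
        for row in rows
        if all(row[key] == value for key, value in criteria.items())
    ]
```

For example, select_filenames(rows, Gender="F", EnglishProficiency="Native") would list the recordings of native-English female speakers.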
The two additional files bundled with the data set, speaker_mapping.csv and gender_mapping.csv, provide fixed mappings between category names and the numeric category labels needed to train NuPIC HTM networks. These files are used to configure the sensor node that supplies the HTM network with input. Each file is in CSV format and contains one header row followed by one row per available category label. For example, the contents of gender_mapping.csv read:
| Id | Gender |
|---|---|
| 1 | F |
| 2 | M |
By consistently using this file to configure HTM experiments, we can be sure that all gender identification experiments use the same numeric category mapping, simplifying the analysis of HTM classification performance and errors.
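A mapping file of this kind can be loaded into a dictionary so that every experiment looks up labels the same way. The sketch below assumes the layout shown for gender_mapping.csv (numeric Id in the first column, category name in the second) and that speaker_mapping.csv follows the same layout, which the text does not state explicitly. The function name load_category_mapping is hypothetical, and this is not NuPIC's own loading code.

```python
import csv


def load_category_mapping(csv_path):
    """Return {category name: numeric label} from a two-column mapping file."""
    mapping = {}
    with open(csv_path, newline="") as handle:
        # skipinitialspace handles rows written as "1, F" with a space.
        reader = csv.reader(handle, skipinitialspace=True)
        next(reader)  # skip the header row, e.g. "Id, Gender"
        for row in reader:
            if not row:  # ignore blank lines
                continue
            label, name = row[0], row[1]
            mapping[name.strip()] = int(label)
    return mapping
```

With the example contents above, the result would map 'F' to 1 and 'M' to 2.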
Recruitment and Recording
Subjects were recruited between April and June 2008. All subjects were recruited and recorded in the San Francisco Bay Area, and all were age 18 or older. Subjects were asked to sign a release, complete a short demographic survey and read from several text passages. More subjects were recorded than are distributed with this example.
Recordings were taken indoors in a quiet, but not actively sound-controlled, environment. Recordings were made with an off-the-shelf microphone, and the initial recordings were sampled digitally at 44.1 kHz with 32-bit floating point samples using recording software and a laptop.
For distribution with the NuPIC 1.6 software release, recordings were down-sampled to 16 kHz and encoded as 16-bit linear pulse-code modulation (PCM) little-endian integer samples. The down-sampling and encoding reduce the size of the audio files roughly 5-fold for distribution with the NuPIC software release, but may change some characteristics of the recording.
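The size reduction follows from the change in sampling rate and sample width. The short calculation below compares the raw data rates of the two formats. It assumes the original recordings were single channel like the distributed files (the text does not say) and ignores file header overhead.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Raw data rate of uncompressed audio, ignoring file headers."""
    return sample_rate_hz * bits_per_sample // 8 * channels


# Original capture: 44.1 kHz, 32-bit floating point samples (mono assumed).
original_rate = pcm_bytes_per_second(44_100, 32)

# Distributed files: 16 kHz, 16-bit linear PCM, single channel.
distributed_rate = pcm_bytes_per_second(16_000, 16)

# Ratio of the two data rates, i.e. the approximate size reduction.
reduction = original_rate / distributed_rate

print(original_rate, distributed_rate, reduction)
```

Under these assumptions the data rate drops from 176,400 to 32,000 bytes per second, a factor of about 5.5, which is consistent with the stated size reduction.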
Downloading Speakers
Licensing
The Speakers data set is covered by the 'Creative Commons Attribution-ShareAlike 3.0 United States' license. More information can be found on the Creative Commons Website.
TERMS / COPYRIGHT
This data set is owned and copyrighted by Numenta and is provided under a Creative Commons License, specifically the license titled "Creative Commons Attribution-ShareAlike 3.0 United States". This license is included as a separate file. The license essentially allows you to use, copy, distribute, and present the data. You must attribute the data set as specified below. You are allowed to modify, transform or build upon this work but you must distribute the results under a similar or compatible license. For the specific restrictions, please refer to the license.
ATTRIBUTION
When discussing results based on this data set, we would appreciate appropriate acknowledgement. Any publications should refer to Numenta (www.numenta.com). The data set should be referred to as the "Numenta Speech Data Set, Version 1.0".