The FeatSynth Tutorial - Part I

version: 0.1.0 (greetSynth)

In this tutorial, we will build and save a simple FeatSynth object, then use it to resynthesize a short .wav file.

1. Download and/or build the FeatSynth framework and command-line utilities.

2. Make sure the FeatSynth bin directory is in your PATH.

3. Type (all on one line)

featsynthmaker -S SinSynth -S SinSynth -S WhiteSynth -E CentroidExtractor -E RolloffExtractor -M StdMetric -P StdMetric -O GeneticOptimizer -C LSHFCache -n 100 -w 1024 -W 0.1 components/ssw-cr.fs

This command should create a FeatSynth component named ssw-cr.fs and save it in the components directory.

The various options specify what components to use to build the FeatSynth:

-S SinSynth -S SinSynth -S WhiteSynth: Specifies that the ParamSynth (parametric synthesizer) associated with this FeatSynth should be a mixture of two sinusoids (SinSynths) and a white noise generator (WhiteSynth).
The parameters that will control this synthesizer are:
0: gain for first SinSynth
1: frequency for first SinSynth
2: gain for second SinSynth
3: frequency for second SinSynth
4: gain for WhiteSynth
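To make the five-slot parameter layout concrete, here is a minimal Python sketch of a synthesizer with the same layout. It is a hypothetical stand-in, not FeatSynth's SinSynth or WhiteSynth: the function name `mix_synth`, the 44.1 kHz sample rate, uniform white noise, and the use of raw Hz and linear gains (rather than whatever ranges FeatSynth uses internally) are all assumptions.

```python
import math
import random

SAMPLE_RATE = 44100  # assumed sample rate; FeatSynth's actual default may differ


def mix_synth(params, n_samples, rng=None):
    """Hypothetical stand-in for a SinSynth + SinSynth + WhiteSynth mixture.

    params uses the same five-slot layout as the list above:
      params[0]: gain of first sinusoid   params[1]: its frequency in Hz
      params[2]: gain of second sinusoid  params[3]: its frequency in Hz
      params[4]: gain of the white-noise generator
    """
    rng = rng or random.Random()
    g1, f1, g2, f2, g_noise = params
    out = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        sample = g1 * math.sin(2 * math.pi * f1 * t)
        sample += g2 * math.sin(2 * math.pi * f2 * t)
        sample += g_noise * rng.uniform(-1.0, 1.0)  # uniform white noise in [-1, 1]
        out.append(sample)
    return out
```

For example, `mix_synth([0.5, 440.0, 0.25, 1320.0, 0.05], 1024)` returns one 1024-sample window containing a fairly strong 440 Hz tone, a weaker 1320 Hz tone, and a little noise.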

-E CentroidExtractor -E RolloffExtractor: Specifies that the FeatureExtractors used to control this FeatSynth should be spectral centroid (CentroidExtractor) and spectral rolloff (RolloffExtractor), two measures of "brightness."
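Both features summarize how spectral energy is distributed over frequency. The Python sketch below uses common textbook definitions, which may differ in detail from CentroidExtractor and RolloffExtractor: the centroid is the magnitude-weighted mean frequency, and the rolloff is the lowest frequency below which a fixed fraction of the total spectral magnitude lies. The 0.85 rolloff fraction, the sample rate, and the naive O(N²) DFT (used only to keep the example dependency-free) are assumptions.

```python
import cmath
import math

SAMPLE_RATE = 44100      # assumed sample rate
ROLLOFF_FRACTION = 0.85  # assumed; a common choice, not taken from FeatSynth


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 via a naive DFT (fine for short frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def bin_freqs(n_fft):
    """Center frequency in Hz of each bin returned by magnitude_spectrum."""
    return [k * SAMPLE_RATE / n_fft for k in range(n_fft // 2 + 1)]


def spectral_centroid(frame):
    """Magnitude-weighted mean frequency; higher means 'brighter'."""
    mags = magnitude_spectrum(frame)
    freqs = bin_freqs(len(frame))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total


def spectral_rolloff(frame, fraction=ROLLOFF_FRACTION):
    """Lowest bin frequency at which the running magnitude sum reaches fraction * total."""
    mags = magnitude_spectrum(frame)
    freqs = bin_freqs(len(frame))
    threshold = fraction * sum(mags)
    running = 0.0
    for f, m in zip(freqs, mags):
        running += m
        if running >= threshold:
            return f
    return freqs[-1]
```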

-M StdMetric: Specifies that the Metric to be used to compare feature vectors should be a standard Ln metric with default settings (in this case an L2 norm).

-P StdMetric: Specifies that the Metric to be used to compare parameter vectors should be a standard Ln metric with default settings (in this case an L2 norm).
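For reference, the L_n distance between two vectors is the n-th root of the sum of the n-th powers of their absolute element-wise differences; with n = 2 it is ordinary Euclidean distance. The function below is a minimal sketch of that formula only. The name `ln_distance` and its signature are illustrative, not StdMetric's interface, and StdMetric may normalize or weight dimensions in ways not shown.

```python
def ln_distance(a, b, n=2.0):
    """L_n distance between two equal-length vectors (n=2 gives Euclidean distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** n for x, y in zip(a, b)) ** (1.0 / n)
```

For example, the distance between [0, 0] and [3, 4] is 5.0 with the default n = 2, and 7.0 with n = 1.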

-O GeneticOptimizer: Specifies that the Optimizer used to find good parameters for the ParamSynth to match the desired features should be the GeneticOptimizer, which uses a genetic algorithm to search for optimal parameters.
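The sketch below shows the general shape of a genetic-algorithm search over parameter vectors: keep a population, retain the fittest half, and refill it with mutated crossovers of the survivors. It is a generic textbook GA written for illustration. GeneticOptimizer's actual selection, crossover, and mutation scheme, its population size, and its parameter bounds are not described in this tutorial, so every such choice in the code (including the name `genetic_search`) is an assumption. In FeatSynth's setting, `cost` would presumably be the combined error described under -W below.

```python
import random


def genetic_search(cost, bounds, pop_size=30, generations=40, mutation=0.1, rng=None):
    """Minimize cost(params) over a box given by bounds = [(lo, hi), ...]."""
    rng = rng or random.Random()

    def random_vector():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(v):
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(v, bounds)]

    population = [random_vector() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)                # best (lowest cost) first
        survivors = population[: pop_size // 2]  # truncation selection, keeps the elite
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            # uniform crossover: each gene comes from one parent at random
            child = [m if rng.random() < 0.5 else d for m, d in zip(mom, dad)]
            # Gaussian mutation scaled to each parameter's range
            child = [x + rng.gauss(0.0, mutation * (hi - lo))
                     for x, (lo, hi) in zip(child, bounds)]
            children.append(clip(child))
        population = survivors + children
    return min(population, key=cost)
```

Because the best half always survives, the best cost found never gets worse from one generation to the next; the mutation width trades exploration against fine-tuning.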

-C LSHFCache: Specifies that the MappingCache used to store and retrieve previously computed mappings from parameter vectors to feature vectors should be LSHFCache, a database indexed by an LSH (locality-sensitive hashing) forest, which is an efficient approximate nearest-neighbor indexing scheme.

-n 100: Specifies that the MappingCache should be populated initially by generating 100 parameter vectors at random, feeding them to the ParamSynth's synthesize() method, and extracting the features from the resulting audio using the FeatureExtractor.
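Conceptually, a MappingCache is a table of (parameter vector, feature vector) pairs, presumably queried by feature similarity. The sketch below populates such a table the way the -n option describes (random parameters, synthesize, extract features) and answers nearest-neighbor queries by a brute-force linear scan. That linear scan is a deliberate simplification: LSHFCache uses an LSH forest to answer this kind of query approximately and much faster. The class and function names and the `synthesize`/`extract` callables are hypothetical.

```python
import math
import random


class BruteForceCache:
    """Hypothetical stand-in for a MappingCache: stores (params, features) pairs."""

    def __init__(self):
        self.entries = []  # list of (params, features) tuples

    def add(self, params, features):
        self.entries.append((list(params), list(features)))

    def nearest(self, target_features):
        """Return the stored (params, features) pair whose features are closest
        to target_features in Euclidean distance (exact linear scan)."""
        return min(self.entries, key=lambda e: math.dist(e[1], target_features))


def populate(cache, synthesize, extract, bounds, n=100, window=1024, rng=None):
    """Mimic '-n 100 -w 1024': n random parameter vectors -> audio -> features."""
    rng = rng or random.Random()
    for _ in range(n):
        params = [rng.uniform(lo, hi) for lo, hi in bounds]
        audio = synthesize(params, window)
        cache.add(params, extract(audio))
```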

-w 1024: Specifies that the windows generated and tested to initially populate the MappingCache should be 1024 samples long.

-W 0.1: Specifies how the overall error is weighted: the distance between the last and next parameter vectors (i.e. those of consecutive frames) accounts for 10% of the overall error, and the distance between the target and found feature vectors accounts for the remaining 90%. Higher values may lead to less choppy-sounding audio, potentially at the expense of matching feature values less accurately.
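Read literally, -W 0.1 describes a weighted sum of two distances. The minimal sketch below implements that reading; it assumes both distances are plain Euclidean (the L2 default of StdMetric) and that no further normalization is applied, which FeatSynth may well handle differently. The function name `combined_error` is hypothetical.

```python
import math


def combined_error(prev_params, cand_params, target_feats, found_feats, w=0.1):
    """Weighted error: w * parameter jump + (1 - w) * feature mismatch."""
    param_jump = math.dist(prev_params, cand_params)     # smoothness term
    feature_miss = math.dist(target_feats, found_feats)  # accuracy term
    return w * param_jump + (1.0 - w) * feature_miss
```

With w = 0.1, a candidate that jumps 2.0 in parameter space but matches the features exactly scores 0.2, while one that stays put but misses the features by 0.5 scores 0.45, so the jump wins. With w = 0.5 the scores become 1.0 and 0.25 and the smoother candidate wins, which is the smoothness-versus-accuracy trade-off described above.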

components/ssw-cr.fs: Finally, the last argument specifies what to call the resulting FeatSynth component and where to save it.

4. Choose an input audio file (we'll call it test.wav) containing some music. Make sure it is in an uncompressed PCM format (e.g. .wav or .aif), NOT mp3, m4a, aac, flac, etc. In all the examples below, replace test.wav with the path/name of your input file.

Type

fextractor -w 1024 -h 512 -e components/ssw-cr.fs test.wav > scores/test.fsc

This command applies the FeatureExtractor in ssw-cr.fs (which, if all went well in step 3, contains a CentroidExtractor and a RolloffExtractor) to the input audio file test.wav, using windows of 1024 samples taken every 512 samples, and stores the results in a FeatScore file, scores/test.fsc. If you examine test.fsc, it should look something like this:
512
2
[3 numbers]
[3 numbers]
...
[3 numbers]
The first number is the hop size (512 samples), i.e. how many samples of audio each line accounts for: successive lines describe windows that start 512 samples apart. The second number is the number of features, besides RMS power (a strong correlate of perceived loudness), that each line contains. Each following line consists of three numbers, the feature values for one window of audio: RMS power, spectral centroid (from CentroidExtractor), and spectral rolloff (from RolloffExtractor). We will use this FeatScore file in the next step to resynthesize some audio that roughly matches the brightness and loudness of test.wav but otherwise bears little resemblance to the original.
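Because the format is this simple, it is easy to read or generate from a script, which is handy for the experiments in step 7. The Python sketch below reads and writes files laid out as described above. It assumes values are whitespace-separated and that there are no comments or extra header fields, so treat it as a starting point rather than a reference parser; the function names are hypothetical.

```python
def read_fsc(path):
    """Return (hop_size, n_features, frames) from a FeatScore file.

    Each frame is [rms, feature_1, ..., feature_n]."""
    with open(path) as f:
        lines = [ln.split() for ln in f if ln.strip()]
    hop_size = int(lines[0][0])
    n_features = int(lines[1][0])
    frames = [[float(v) for v in row] for row in lines[2:]]
    for i, row in enumerate(frames):
        if len(row) != n_features + 1:  # RMS plus n_features values per line
            raise ValueError(f"frame {i} has {len(row)} values, expected {n_features + 1}")
    return hop_size, n_features, frames


def write_fsc(path, hop_size, frames):
    """Write frames (each [rms, feature_1, ..., feature_n]) as a FeatScore file."""
    n_features = len(frames[0]) - 1
    with open(path, "w") as f:
        f.write(f"{hop_size}\n{n_features}\n")
        for row in frames:
            f.write(" ".join(repr(float(v)) for v in row) + "\n")
```

At an assumed sample rate of 44.1 kHz, each line of a file with a 512-sample hop covers about 11.6 ms of audio.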

5. Type (again, all on one line)

featscore -t 0.015 -i 20 components/ssw-cr.fs scores/test.fsc audio/testresynth.wav

You should see a large amount of output scrolling by, telling you how the synthesis is progressing. Essentially, featscore uses the FeatSynth you created in step 3 to synthesize a series of audio frames that match the features you extracted in step 4 as closely as possible. This should be pretty fast, since there are only two features (centroid and rolloff) to match. (RMS power is matched automatically by simply modulating the overall gain appropriately.)
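The gain trick works because RMS scales linearly with gain: multiplying a frame by g multiplies its RMS by |g|. Rescaling also leaves centroid and rolloff unchanged, since both depend only on the shape of the spectrum, not its overall level, so loudness and brightness can be handled separately. A minimal sketch of the rescaling step (function names hypothetical):

```python
import math


def rms(frame):
    """Root-mean-square level of a frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def match_rms(frame, target_rms):
    """Scale frame so its RMS equals target_rms (silent frames stay silent)."""
    current = rms(frame)
    if current == 0.0:
        return list(frame)
    gain = target_rms / current
    return [gain * x for x in frame]
```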

-t 0.015 specifies that the FeatSynth should optimize until it gets the combined error (as determined by its Metrics) below 0.015, then move on to the next frame.

-i 20 specifies that the FeatSynth should optimize for at most 20 iterations before moving on.

The last three arguments specify the FeatSynth component to use, the FeatScore file to synthesize from, and the audio file to store the results in, respectively.
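Taken together, -t and -i define a per-frame stopping rule: keep refining until the error drops below the threshold or the iteration budget runs out, whichever comes first. A minimal sketch of that control flow follows; `improve` stands for one optimizer iteration and, like `optimize_frame` itself, is a hypothetical name, not part of FeatSynth's API.

```python
def optimize_frame(initial, error, improve, threshold=0.015, max_iters=20):
    """Refine a candidate until error(candidate) < threshold or max_iters is hit.

    Returns (best_candidate, best_error, iterations_used)."""
    best = initial
    best_err = error(best)
    iters = 0
    while best_err >= threshold and iters < max_iters:
        candidate = improve(best)
        iters += 1
        cand_err = error(candidate)
        if cand_err < best_err:  # keep only improvements
            best, best_err = candidate, cand_err
    return best, best_err, iters
```

A tighter threshold or a larger iteration budget buys accuracy at the cost of synthesis time per frame.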

6. Listen to testresynth.wav! It should match the centroid, rolloff, and RMS power of test.wav fairly closely. It probably sounds a bit... strange. This is because:

1. We're using a pretty dinky additive synthesizer.
2. We're using only a couple of features, which capture just a little of the richness of the original audio.

Compare testresynth.wav with test.wav and see if you can hear the resemblance.

7. If you want to, you can repeat steps 4-6, replacing test.wav with another PCM audio file of your choosing. Or you can repeat step 5 after fiddling around with the feature values in test.fsc (in MATLAB, for example). Or you can write your own .fsc file from scratch and use that instead of test.fsc. But be aware: not all combinations of features are possible. For example, spectral centroid and spectral rolloff are both computed from the same spectrum, so they cannot be chosen independently; a frame cannot have a very high rolloff together with a very low centroid, because the energy at and above the rolloff frequency pulls the centroid upward.

Additionally, you can try adding the spectral flatness feature to the mix by inserting "-E FlatnessExtractor" into the argument list in step 3 and then repeating steps 4-6. This should help featscore match the noisy/pitchy quality of the input (depending strongly, of course, on how you define "noisy" and "pitchy").
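One common definition of spectral flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum: close to 1 for noise-like frames and close to 0 for frames dominated by a few sharp peaks. The Python sketch below implements that definition; whether FlatnessExtractor uses exactly this formula (or, for example, magnitudes instead of power, or a restricted frequency band) is an assumption, as are the function names and the small `eps` guard.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of DFT bins 0..N/2 via a naive DFT (fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum.

    Near 1 for noise-like frames, near 0 for frames dominated by a few peaks."""
    power = [p + eps for p in power_spectrum(frame)]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic
```

A pure sinusoid scores near 0 and white noise scores well above it, which is why this feature helps distinguish "pitchy" from "noisy" frames.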

---

Moving forward:

For more detail on how to use the other command-line utilities that come with FeatSynth, please refer to this page.

For more information about writing new applications using FeatSynth, please refer to the API documentation.


