Extracting Samples from VOCALOID Libraries

(This tutorial only works with VOCALOID versions 2 through 4.)
I’m not good at writing tutorial introductions, so I’ll just jump right in. I’m also not very good at writing tutorials in general, so please point out any section that’s written badly enough that you can’t understand what I’m trying to say.


Things you’ll need:

  • Audacity
  • A VOCALOID library (It can be legally owned, FE, AE or whatever. The format doesn’t change)
  • Probably a Windows computer or Mac since those are the only OSes you can get VOCALOID on
  • A brain

here’s where the real tutorial starts
Open Audacity and go to File > Import > Raw Data…, then locate whichever library you want to extract samples from and open that library’s *.DDB file. (On Windows, I believe all installed libraries are in C:\Program Files (x86)\VoiceDB.) The import settings that work with ALL libraries are as follows:

Encoding: Signed 32-bit PCM
Channels: 1 (mono)
Sample rate: 22050 Hz
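If you’d rather not click through the import dialog every time, here’s a rough Python sketch that wraps the raw bytes in a WAV header using the same settings. This is not part of my original method: the function name raw_to_wav is made up, and it assumes little-endian byte order (which is what WAV files use). If the result sounds different from what Audacity gives you, the byte order is the first thing to check.

```python
import wave

SAMPLE_RATE = 22050   # Hz, from the import settings above
SAMPLE_WIDTH = 4      # bytes per sample (signed 32-bit PCM)
CHANNELS = 1          # mono


def raw_to_wav(raw_bytes, wav_path):
    """Wrap headerless signed 32-bit mono PCM bytes in a WAV container.

    Assumption: the bytes are little-endian, which is what WAV expects.
    Returns the number of samples written.
    """
    # Drop trailing bytes that do not fill a whole 4-byte sample.
    usable = len(raw_bytes) - (len(raw_bytes) % SAMPLE_WIDTH)
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(raw_bytes[:usable])
    return usable // SAMPLE_WIDTH
```

To use it, read the *.DDB file in binary mode and pass the bytes plus an output path to raw_to_wav. It loads the whole file into memory, and WAV files top out around 4 GB, so very large libraries may need to be split up first.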

Once you’ve imported it, you should get a chunk of audio that’s maybe a few hours long. Don’t worry though, there’s not actually that much audio. More than half of what’s there is some other kind of data. Anyways, now comes the part where you actually start ripping the samples.
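If you’re wondering where the “few hours” comes from, it’s just the file size divided by the bytes per second implied by the import settings. A quick sketch of the arithmetic (the 1 GiB size is a made-up example, not the size of any particular library):

```python
BYTES_PER_SAMPLE = 4   # signed 32-bit PCM
SAMPLE_RATE = 22050    # Hz, mono


def imported_duration_seconds(size_in_bytes):
    """Length of the imported audio for a raw file of the given size."""
    return size_in_bytes / (BYTES_PER_SAMPLE * SAMPLE_RATE)


# Hypothetical example: a 1 GiB *.DDB file
seconds = imported_duration_seconds(1024 ** 3)
print(f"{seconds / 3600:.2f} hours")
```

So every 88200 bytes of the file shows up as one second in Audacity, whether those bytes are actually audio or not.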

Zoom in on a portion of the audio (preferably the start) until you see something that looks like a normal waveform. These, as I’d imagine you already know, are your samples. To save one, I select not only the sample but also a bit of the area around it, save that, and trim off anything I don’t need later. Just repeat that until you’ve got all the samples.
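The “grab a bit extra and trim later” idea looks something like this in code. It’s only a sketch: cut_with_margin is a made-up helper, and you still have to find the start and end positions yourself by looking at the waveform.

```python
def cut_with_margin(samples, start, end, margin):
    """Return samples[start:end] plus up to `margin` extra samples on each side."""
    lo = max(0, start - margin)              # don't run off the front
    hi = min(len(samples), end + margin)     # don't run off the back
    return samples[lo:hi]


# Tiny fake "import": 10 values, with the part we want at indices 4..6
audio = list(range(10))
clip = cut_with_margin(audio, 4, 7, 2)
print(clip)  # [2, 3, 4, 5, 6, 7, 8]
```

The clamping matters for samples right at the start or end of the file, where there isn’t a full margin to grab.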

i guess i’m done writing the tutorial part so uh

Q&A thing to answer any questions you might have

  • Q: Is sample injecting possible too?
  • A: Sorta. I was able to inject one sample into a library and get the V4 engine to render it correctly, but I kinda forgot how I did it, and it’s a major pain in the ass to do.
  • Q: Why can’t this be done with the original VOCALOID engine?
  • A: Because V1 libraries were based off of analysis of the human voice and not actual samples like V2 and later versions.
  • Q: Are samples in any particular order?
  • A: Yup, they’re usually (sorta) in the order that phonemes are arranged in the *.VVD file that comes with the library. The best way I can explain it is: in Japanese libraries, the samples in the *.DDI are ordered a, i, M, e, o, etc. This means that in the *.DDB of a tri-pitch library, the samples should be ordered like this: a (pitch 1), a (pitch 2), a (pitch 3), a i (pitch 1), a i (pitch 2), a i (pitch 3), etc. I hope that makes at least SOME sense.
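If the ordering explanation above is hard to follow, here’s the same idea as a tiny sketch: every entry repeats once per pitch before the next entry starts. The function name and the two-entry list are just for illustration, and since the real order is only “usually (sorta)” like this, treat the output as a guess for labeling what you find, not a guarantee.

```python
def expected_order(entries, pitch_count):
    """List each entry once per pitch, in the order described above."""
    return [
        f"{entry} (pitch {pitch})"
        for entry in entries
        for pitch in range(1, pitch_count + 1)
    ]


# First two entries from the example above, in a tri-pitch library
for label in expected_order(["a", "a i"], 3):
    print(label)
```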
This tutorial was originally written by me on GBATemp
