I've looked around in this community and found a few folks asking about playing VoiceMessage.wav files on their iPhones. Some can, some can't. Those who can, it seems, may not be dealing with G729-encoded WAVs.
But, before everyone pounces:
I've written a G729 (and Annex A, B and AB) decoder for the iPhone/iPod Touch, using the ITU-T reference code as a starting point. The reference bitstreams the ITU-T provides work just fine. (Simply put: I can get an e-mail with a G729 WAV attachment and play it on my iPod Touch via my app. Porting this to some other platform wouldn't be much of a hassle; licensing the IP, on the other hand...)
Looking at a VoiceMessage.wav file I received from a friend who has Unity, I see that the wave format tag (RIFF fmt chunk, bytes 9 and 10) comes back as 0x0133 - that's defined as WAVE_FORMAT_SIPROLAB_G729 (sorry, I wasn't shouting - I was copying/pasting).
After stripping away the RIFF header and the chunk headers (RIFF, fmt, fact and data), I have what should be G729 data. It isn't - or, at least, it is in some sort of packed format.
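For anyone who wants to repeat the check, here is a minimal Python sketch of walking the RIFF chunks to pull out the format tag and the raw `data` payload. The function names `iter_riff_chunks` and `split_wav` are made up for illustration, and it assumes a plain little-endian RIFF/WAVE file with top-level `fmt `, `fact` and `data` chunks.

```python
import struct

WAVE_FORMAT_SIPROLAB_G729 = 0x0133  # the tag value seen in the file


def iter_riff_chunks(blob):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF/WAVE blob."""
    if blob[:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # skip "RIFF", the 4-byte size and "WAVE"
    while pos + 8 <= len(blob):
        chunk_id, size = struct.unpack_from("<4sI", blob, pos)
        yield chunk_id, blob[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length


def split_wav(blob):
    """Return (format_tag, data_payload); either is None if its chunk is missing."""
    format_tag, payload = None, None
    for chunk_id, body in iter_riff_chunks(blob):
        if chunk_id == b"fmt ":
            # wFormatTag is the first 16-bit field of the fmt chunk body
            (format_tag,) = struct.unpack_from("<H", body, 0)
        elif chunk_id == b"data":
            payload = body
    return format_tag, payload
```

Usage would be something like `tag, payload = split_wav(open("VoiceMessage.wav", "rb").read())`, then comparing `tag` against `WAVE_FORMAT_SIPROLAB_G729`.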
The G729 bitstreams used by the reference code start each frame with a SYNC_WORD of 0x6B21, followed by a size word of 0x50 (dec: 80 - the number of bits in the frame); each of those 80 bits is then written out as its own 16-bit word. So the pattern repeats every 164 bytes. That's not the case with the WAV I received. The person who sent it to me is in a position to know her system uses G729. The WAV indicates it is G729. Yet there are no SYNC_WORDs anywhere and the 0x50s are scattered randomly (they're probably data at this point).
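The framing check described above can be sketched in a few lines of Python. This is only an illustration: `count_reference_frames` is a hypothetical name, it assumes the 16-bit words are little-endian (the reference code writes native shorts, so this depends on the machine that produced the file), and it assumes fixed 80-bit frames, which would not hold for Annex B streams with shorter SID frames.

```python
import struct

SYNC_WORD = 0x6B21  # frame marker in the reference bitstream files
SIZE_WORD = 80      # bits per frame (0x50)
FRAME_BYTES = (2 + SIZE_WORD) * 2  # 82 sixteen-bit words = 164 bytes


def count_reference_frames(payload, byte_order="<"):
    """Count leading 164-byte frames that begin with SYNC_WORD, SIZE_WORD."""
    frames = 0
    for pos in range(0, len(payload) - FRAME_BYTES + 1, FRAME_BYTES):
        sync, size = struct.unpack_from(byte_order + "HH", payload, pos)
        if sync != SYNC_WORD or size != SIZE_WORD:
            break  # framing lost: stop at the first frame that does not match
        frames += 1
    return frames
```

A result of 0 on the `data` payload means the file is not in the reference bitstream layout, which matches what I'm seeing.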
If that's the case, where do I find information about that extra layer of packing or compression? Could it be that the frames are there in fixed-size packets, and that the SYNC_WORDs were simply left out, to be put back in at read time? (After all, if you know you're going to be processing fixed-size chunks, why bother writing out the SYNC_WORDs?) Hmm...
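One way to test that hypothesis is to treat the payload as bare frames and rebuild the layout the reference decoder expects. The Python sketch below is a guess at the format, not a description of what Unity actually writes. It assumes each frame is 80 bits packed into 10 bytes, most significant bit first, and that the reference files store a 0 bit as the word 0x007F and a 1 bit as 0x0081 (worth double-checking against the reference source). `expand_packed_frames` is a name invented for this example.

```python
import struct

SYNC_WORD, SIZE_WORD = 0x6B21, 80
BIT_0, BIT_1 = 0x007F, 0x0081        # per-bit words (assumed, see above)
PACKED_FRAME_BYTES = SIZE_WORD // 8  # 10 bytes per frame if bits are packed


def expand_packed_frames(payload, byte_order="<"):
    """Rebuild SYNC_WORD/SIZE_WORD framing around packed 10-byte frames."""
    out = bytearray()
    # Ignore any trailing partial frame
    usable = len(payload) - len(payload) % PACKED_FRAME_BYTES
    for pos in range(0, usable, PACKED_FRAME_BYTES):
        words = [SYNC_WORD, SIZE_WORD]
        for byte in payload[pos : pos + PACKED_FRAME_BYTES]:
            for shift in range(7, -1, -1):  # MSB first (assumption)
                words.append(BIT_1 if (byte >> shift) & 1 else BIT_0)
        out += struct.pack(f"{byte_order}{len(words)}H", *words)
    return bytes(out)
```

If the output decodes to intelligible speech, the guess about the packing was right; if it decodes to noise, the bit order or the frame size is the next thing to vary.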