VCX. Automatic byte order detection

Endianness

The term endianness usually refers to the order in which the bytes of a multi-byte data type are stored in memory. For example, the two-byte word 0x1873 (hexadecimal) may be stored in adjacent memory bytes in either of the following ways:

1) least significant byte (LSB) followed by most significant byte (MSB), also called little-endian layout:

memory address | value
---------------+------
base+0000      | 0x73
base+0001      | 0x18

2) MSB followed by LSB, also called big-endian layout:

memory address | value
---------------+------
base+0000      | 0x18
base+0001      | 0x73

The same applies to larger data types. For example, the four-byte integer value 0x8E160A3D may be stored as:

1) little-endian layout:

memory address | value
---------------+------
base+0000      | 0x3D
base+0001      | 0x0A
base+0002      | 0x16
base+0003      | 0x8E

2) big-endian layout:

memory address | value
---------------+------
base+0000      | 0x8E
base+0001      | 0x16
base+0002      | 0x0A
base+0003      | 0x3D
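The two layouts above can be reproduced with a few lines of Python. This is only an illustrative sketch using the standard `struct` module; the variable names are ours and not part of any product API.

```python
import struct

value = 0x8E160A3D

# "<I" = little-endian unsigned 32-bit, ">I" = big-endian unsigned 32-bit
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(little.hex(" "))  # 3d 0a 16 8e
print(big.hex(" "))     # 8e 16 0a 3d
```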

Other layouts, such as middle-endian, are also possible, but they are outside the scope of this article. See http://en.wikipedia.org/wiki/Endianness for more information about endianness.

PCM audio samples in audio streams

In this article we define a PCM sample as a signed 16-bit integer value, ranging from -32768 to +32767 decimal, or from 0x8000 to 0x7FFF hexadecimal. An audio stream consists of adjacent PCM samples:

[sample N] [sample N+1] [sample N+2] [sample N+3] ...

A stream may contain more than one channel, in which case the channels are interleaved in the stream. In this article we deal with mono streams only, because automatic byte order detection in non-mono streams using the algorithm described below may not always produce good results.

Let us say we want to store the following PCM samples in memory: 0x0234, 0x0123, 0xFEDC and 0xF987. Samples in mono streams are always stored one after another without interleaving. Depending on the organization of memory storage, this could be done as:

1) little-endian layout:

memory address | value
---------------+------
base+0000      | 0x34
base+0001      | 0x02
base+0002      | 0x23
base+0003      | 0x01
base+0004      | 0xDC
base+0005      | 0xFE
base+0006      | 0x87
base+0007      | 0xF9

2) big-endian layout:

memory address | value
---------------+------
base+0000      | 0x02
base+0001      | 0x34
base+0002      | 0x01
base+0003      | 0x23
base+0004      | 0xFE
base+0005      | 0xDC
base+0006      | 0xF9
base+0007      | 0x87

Note the sample boundaries: each sample occupies one pair of adjacent bytes (base+0000 and base+0001, base+0002 and base+0003, and so on). Depending on the layout, the bytes may be stored in a different order within a boundary, but they never "cross" it. This is important for the discussion below.
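As a rough illustration, the same four samples can be packed into both layouts with Python's standard `struct` module. The helper name `to_signed16` is ours and is only needed because the samples are written here as unsigned hexadecimal values.

```python
import struct

samples_hex = [0x0234, 0x0123, 0xFEDC, 0xF987]

def to_signed16(u):
    """Reinterpret an unsigned 16-bit value as a signed PCM sample."""
    return u - 0x10000 if u >= 0x8000 else u

signed = [to_signed16(u) for u in samples_hex]
print(signed)  # [564, 291, -292, -1657]

# "h" = signed 16-bit; "<" little-endian, ">" big-endian
little = struct.pack("<4h", *signed)
big = struct.pack(">4h", *signed)

print(little.hex(" "))  # 34 02 23 01 dc fe 87 f9
print(big.hex(" "))     # 02 34 01 23 fe dc f9 87
```

Each pair of bytes is swapped between the two outputs, but no byte ever moves to a different pair.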

Automatic byte order detection in uncompressed audio stream

When dealing with uncompressed audio streams, especially when they are transferred over a network, care must be taken to retain the proper byte order of audio samples in the stream.

Not only may the byte order differ, but the sample boundaries may also be unknown. This may happen when data is transferred over an unreliable protocol such as UDP.

Let us assume we have received the following sequence of bytes from the network:

stream offset | value
--------------+------
+0000         | 0x02
+0001         | 0x34
+0002         | 0x01
+0003         | 0x23
+0004         | 0xFE
+0005         | 0xDC
+0006         | 0xF9
+0007         | 0x87
+0008         | 0xF0

(Yes, these are the PCM samples in big-endian layout from the previous example plus one additional byte, but our algorithm must work correctly without this hint :)

We know there are two possible interpretations of this sequence (big- and little-endian), but if we do not know the sample boundaries, two more combinations are possible, making four different interpretations in total:

1) little-endian, boundaries as is (the last sample is ignored, since we have only one byte of it; that byte will be prepended to the next sequence of bytes received from the network):

sample # | byte values | sample value (decimal)
---------+-------------+-----------------------
00       | 0x02 0x34   | 0x3402 (13 314)
01       | 0x01 0x23   | 0x2301 (8 961)
02       | 0xFE 0xDC   | 0xDCFE (-8 962)
03       | 0xF9 0x87   | 0x87F9 (-30 727)
04       | 0xF0 ??     | 0x??F0 (??)

2) big-endian, boundaries as is (the last sample is ignored, since we have only one byte of it; that byte will be prepended to the next sequence of bytes received from the network):

sample # | byte values | sample value (decimal)
---------+-------------+-----------------------
00       | 0x02 0x34   | 0x0234 (564)
01       | 0x01 0x23   | 0x0123 (291)
02       | 0xFE 0xDC   | 0xFEDC (-292)
03       | 0xF9 0x87   | 0xF987 (-1 657)
04       | 0xF0 ??     | 0xF0?? (??)

3) little-endian, adjusted boundaries (we simply ignore the first byte):

sample # | byte values | sample value (decimal)
---------+-------------+-----------------------
--       | 0x02        | --
00       | 0x34 0x01   | 0x0134 (308)
01       | 0x23 0xFE   | 0xFE23 (-477)
02       | 0xDC 0xF9   | 0xF9DC (-1 572)
03       | 0x87 0xF0   | 0xF087 (-3 961)

4) big-endian, adjusted boundaries (we simply ignore the first byte):

sample # | byte values | sample value (decimal)
---------+-------------+-----------------------
--       | 0x02        | --
00       | 0x34 0x01   | 0x3401 (13 313)
01       | 0x23 0xFE   | 0x23FE (9 214)
02       | 0xDC 0xF9   | 0xDCF9 (-8 967)
03       | 0x87 0xF0   | 0x87F0 (-30 736)

Notice that the sequence of bytes is always the same; it is only a matter of interpretation how to convert it into PCM samples.
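The four interpretations can be reproduced with a short Python sketch. The function name `decode` and its parameters are hypothetical and chosen for this illustration only; a trailing byte that does not complete a sample is simply ignored here.

```python
import struct

def decode(data, byteorder, offset):
    """Decode signed 16-bit samples from data.

    byteorder: "<" for little-endian, ">" for big-endian.
    offset: 0 keeps boundaries as is, 1 skips the first byte.
    """
    count = (len(data) - offset) // 2  # whole samples only
    return list(struct.unpack_from(f"{byteorder}{count}h", data, offset))

received = bytes([0x02, 0x34, 0x01, 0x23, 0xFE, 0xDC, 0xF9, 0x87, 0xF0])

print(decode(received, "<", 0))  # [13314, 8961, -8962, -30727]
print(decode(received, ">", 0))  # [564, 291, -292, -1657]
print(decode(received, "<", 1))  # [308, -477, -1572, -3961]
print(decode(received, ">", 1))  # [13313, 9214, -8967, -30736]
```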

Now our task is to decide which interpretation (byte order and boundaries) should be chosen as the proper representation of the audio signal.

Let us take a look at a sine wave sampled at regular intervals. In the figure, each vertical line represents one PCM sample.

As you can see, adjacent samples do not differ much; they tend to change gradually in one direction. This assumes that the audio signal consists mostly of frequencies that are low relative to the selected sampling rate.
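To get a feeling for this property, the sketch below generates 400 samples of a sine wave and compares how much adjacent samples differ in the correct byte order and in the swapped one. The tone frequency, amplitude and the name `total_variation` are arbitrary choices made for this illustration.

```python
import math
import struct

RATE = 8000        # sampling rate, Hz
FREQ = 440.0       # tone frequency, Hz (arbitrary choice)
AMPLITUDE = 10000  # peak sample value (arbitrary choice)

samples = [round(AMPLITUDE * math.sin(2 * math.pi * FREQ * n / RATE))
           for n in range(400)]

def total_variation(values):
    """Sum of absolute differences between adjacent values."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

# Store as little-endian, then read the same bytes back as big-endian.
raw = struct.pack("<400h", *samples)
swapped = list(struct.unpack(">400h", raw))

print("correct order:", total_variation(samples))
print("swapped order:", total_variation(swapped))
```

With the wrong byte order the low byte of each sample, which changes quickly from sample to sample, ends up in the high position, so the second figure comes out much larger than the first.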

We can use that tendency as the basis of our algorithm. For each of the four interpretations we calculate the sum of absolute differences between adjacent samples, as follows:

1) little-endian, boundaries as is:

sample # | sample value | difference
---------+--------------+-------------------------
00       | 13 314       | 0
01       | 8 961        | 13 314 - 8 961 = 4 353
02       | -8 962       | 8 961 + 8 962 = 17 923
03       | -30 727      | -8 962 + 30 727 = 21 765
         | Sum:         | 44 041

2) big-endian, boundaries as is:

sample # | sample value | difference
---------+--------------+-----------
00       | 564          | 0
01       | 291          | 273
02       | -292         | 583
03       | -1 657       | 1 365
         | Sum:         | 2 221

3) little-endian, adjusted boundaries:

sample # | sample value | difference
---------+--------------+-----------
00       | 308          | 0
01       | -477         | 785
02       | -1 572       | 1 095
03       | -3 961       | 2 389
         | Sum:         | 4 269

4) big-endian, adjusted boundaries:

sample # | sample value | difference
---------+--------------+-----------
00       | 13 313       | 0
01       | 9 214        | 4 099
02       | -8 967       | 18 181
03       | -30 736      | 21 769
         | Sum:         | 44 049

The final step is to select the interpretation with the minimal sum of differences between samples. In our case it is interpretation number two: big-endian, boundaries as is (exactly as the hint suggested :).
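The whole procedure fits into a few lines of Python. This is a minimal sketch of the steps described above, not the code used in the products; the names `decode`, `roughness` and `detect` are hypothetical.

```python
import struct

def decode(data, byteorder, offset):
    """Decode whole signed 16-bit samples, skipping `offset` leading bytes."""
    count = (len(data) - offset) // 2
    return list(struct.unpack_from(f"{byteorder}{count}h", data, offset))

def roughness(samples):
    """Sum of absolute differences between adjacent samples."""
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))

def detect(data):
    """Return (byteorder, offset) of the smoothest interpretation."""
    candidates = [(bo, off) for off in (0, 1) for bo in ("<", ">")]
    return min(candidates, key=lambda c: roughness(decode(data, *c)))

received = bytes([0x02, 0x34, 0x01, 0x23, 0xFE, 0xDC, 0xF9, 0x87, 0xF0])

for bo, off in [("<", 0), (">", 0), ("<", 1), (">", 1)]:
    print(bo, off, roughness(decode(received, bo, off)))
# < 0 44041
# > 0 2221
# < 1 4269
# > 1 44049

print(detect(received))  # ('>', 0)
```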

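Notice that interpretations 2) and 3) give fairly close sums, as do interpretations 1) and 4). That is because shifting the boundaries by one byte has almost the same effect as switching between little- and big-endian when the data are audio samples, i.e. when adjacent samples do not differ much from each other. In interpretation 3), for instance, every value still takes its high byte from the high byte of a real sample; only the low byte comes from the neighbouring sample.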

As you can see, even for such a short sequence of bytes it is possible to detect the proper layout of PCM audio samples. The longer the sequence, the more reliable the guess. In real applications it is usually enough to analyze about 1/20 sec of audio signal (400 samples at an 8000 Hz sampling rate).

Please also note that if you are using a reliable protocol (like TCP), the boundaries and byte order need to be guessed only once and can then be applied to all subsequent data. When using an unreliable protocol (like UDP), it may be necessary to repeat the guess for each sequence (data packet).
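For the reliable-protocol case, a decoder might guess the layout on the first chunk and then carry any leftover odd byte over to the next chunk. The sketch below shows one possible way to do this; the class `OnceDetector` and its helpers are hypothetical names and do not reflect how the products implement it. For an unreliable protocol one would instead run the detection on every packet.

```python
import struct

def _decode(data, byteorder):
    count = len(data) // 2
    return list(struct.unpack_from(f"{byteorder}{count}h", data))

def _roughness(samples):
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))

def _detect(data):
    """Return (byteorder, offset) with the smallest sum of differences."""
    candidates = [(bo, off) for off in (0, 1) for bo in ("<", ">")]
    return min(candidates, key=lambda c: _roughness(_decode(data[c[1]:], c[0])))

class OnceDetector:
    """Guess the layout on the first chunk, then reuse it for later chunks."""

    def __init__(self):
        self.byteorder = None  # unknown until the first chunk is seen
        self.pending = b""     # odd byte left over from the previous chunk

    def feed(self, chunk):
        data = self.pending + chunk
        if self.byteorder is None:
            # Assumes the first chunk is long enough for a meaningful guess.
            self.byteorder, offset = _detect(data)
            data = data[offset:]  # drop the stray leading byte, if any
        usable = len(data) - len(data) % 2
        self.pending = data[usable:]
        return _decode(data[:usable], self.byteorder)

decoder = OnceDetector()
first = decoder.feed(bytes([0x02, 0x34, 0x01, 0x23, 0xFE, 0xDC, 0xF9, 0x87, 0xF0]))
print(first)                         # [564, 291, -292, -1657]
print(decoder.feed(bytes([0x12])))   # [-4078]  (0xF012, completed by the new byte)
```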

The way it works in VC components/VCX library

Our VC components and VCX library products include the byte order and sample boundary auto-detection algorithm described above. It is turned off by default, so you have to choose one of the following methods if you wish to enable it for incoming and/or outgoing streams:

method | description | property value
-------+-------------+---------------
Don't care (default) | Auto-detection is disabled. | unasbo_dontCare
Swap | Always swap the bytes. This converts the audio stream from big-endian to little-endian and vice versa. Useful for output streams, or for input streams with known byte order and boundaries transferred over reliable protocols. | unasbo_swap
Auto-detect once | Enable the byte order and boundaries auto-detection algorithm. It analyzes the first data packet only and applies the same order and boundaries to all subsequent packets. Useful for reliable protocols (like TCP). | unasbo_autoDetectOnce
Auto-detect continuously | Enable the byte order and boundaries auto-detection algorithm. It analyzes all data packets as they arrive. Useful for unreliable protocols (like UDP). | unasbo_autoDetectCont

Assign the selected property value to the streamByteOrderInput and/or streamByteOrderOutput property to specify which method is used for incoming or outgoing data.

Limitations

  • IPServer or IPClient must be working in RAW streaming mode
  • 16-bit uncompressed mono audio streams only