The NRZI method
USB data transfer is basically a two-conductor signaling method. When the data is '1', the signal does not change. When the data is '0', it does change. This is the so-called Non Return to Zero Inverted (NRZI) method. It follows that there is no explicit clock on the USB cable (a fact that compounds the problem, as we will see). Rather, the signal is recovered based upon the intervals between the edges of the data. In this type of digital communication, if the sender uses a perfect clock to create the signal and the receiver uses a perfect clock to interpret the data, the original data can be reconstructed. Since NRZI reconstruction is possible once there is a clock at four times the bit rate, it can be accomplished if both the sender and receiver use 48MHz clocks (four times the 12Mbps transmission rate). However, when viewing this from the standpoint of an audio device, the very fact that the sender and receiver both have local clocks becomes a stumbling block.
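As a rough illustration of the scheme just described (not any particular chip's implementation), NRZI encoding and decoding can be sketched in a few lines of Python; the function names are ours:

```python
def nrzi_encode(bits, level=1):
    """NRZI as used by USB: a '0' toggles the line level, a '1' holds it."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1  # transition on every 0
        out.append(level)
    return out

def nrzi_decode(levels, level=1):
    """Recover the bits: no change between samples means '1', a change means '0'."""
    bits = []
    for cur in levels:
        bits.append(1 if cur == level else 0)
        level = cur
    return bits

data = [1, 0, 0, 1, 1, 0, 1]
assert nrzi_decode(nrzi_encode(data)) == data  # lossless round trip
```

Note that the decoder only works because it samples the line at the right instants; with no clock wire, a real receiver must derive those instants from the edge spacing, which is exactly the stumbling block the article describes.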
The evil of clocklessness
The fact that there is no clock line within the USB cable leads to a thinner cable. That is an advantage. But, no matter how good the crystal oscillators are at the send and receive ends, there will always be some difference between the two. For example, if the sender is sending audio data at a rate of 48.001KHz and the receiver is receiving at 47.999KHz, the receiver is reconstructing data slightly slower than the transmission rate. When a large quantity of audio data is sent under these conditions, the buffer will soon overflow. This results in lost data.
On the other hand, if the receiver is running faster, an underflow will occur. This results in a discontinuity in the audio data. In a CD player, the spindle motor can be servo-controlled so that disc rotation synchronizes with the playback data rate. But a USB receiver cannot control the sender. The resulting missing data can be digitally compensated for with a smoothing filter. However, our company's development philosophy does not allow for such deception! (As an aside, there is no problem at all if the data is reconstructed with the receiver's clock after it has all been sent.)
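To put a number on the overflow scenario above, a back-of-the-envelope sketch (the function name is ours, and the one-packet cushion is only for illustration; real devices buffer more):

```python
def seconds_until_overflow(sender_hz, receiver_hz, headroom_samples):
    """With the sender running faster, the FIFO gains (sender - receiver)
    samples per second until the headroom is exhausted."""
    surplus = sender_hz - receiver_hz
    if surplus <= 0:
        raise ValueError("receiver is not slower than the sender")
    return headroom_samples / surplus

# The article's example: 48.001kHz in, 47.999kHz out = 2 surplus samples/s.
# A one-packet (48-sample) cushion is gone in a mere 24 seconds.
print(seconds_until_overflow(48_001, 47_999, 48))  # → 24.0
```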
Dealing with the PC (software) is a bigger problem than the specification
We began the development with the USB interface. The USB specification itself is not particularly complicated but we came to understand that the real problem is the software preloaded on the PC. Even if a USB-DAC meets the USB specification, it is useless if it will not operate under Windows or Mac OS. There actually exist USB speakers that will not work with certain PCs. Even though these USB speakers fulfill the USB specification, that alone is not enough...
A FIFO is used to deal with packets that are not in order
USB sends audio data packets in 1ms intervals. Since -- as mentioned previously -- drop-outs in audio cannot be tolerated, audio playback begins when the first data packet arrives. The next packet must arrive before all of the data in the previous packet has been played. Although we are discussing audio packets in particular, it is possible for the order of packets to be disrupted by other USB packets. In other words, a FIFO large enough to hold at least two packets is required to deal with the possible change of order. In the case of dealing with 48kHz/16-bit stereo data, the buffer capacity must be at least 48 x 16 x 2 x 2 = 3,072 bits, or 384 bytes...
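The sizing arithmetic generalizes easily; this small helper (our own, for illustration) reproduces the figure for the double-buffered 48kHz/16-bit stereo case:

```python
def fifo_bytes(rate_hz, bits_per_sample, channels, packets):
    """Minimum FIFO size, in bytes, to hold `packets` worth of 1ms USB audio packets."""
    samples_per_ms = rate_hz // 1000          # one packet carries 1ms of audio
    bytes_per_sample = bits_per_sample // 8
    return samples_per_ms * bytes_per_sample * channels * packets

# 48kHz, 16-bit, stereo, two packets: 48 * 2 * 2 * 2 = 384 bytes
print(fifo_bytes(48_000, 16, 2, 2))  # → 384
```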
USB clock error uses up the FIFO!
On the other hand, the USB specification allows for a clock frequency error of 500ppm. This is an easy-to-accomplish specification for a crystal oscillator. It makes the design of the USB circuitry rather easy. However, this is an error allowance between send and receive clocks and poses a problem for audio. In this case, the read and write clocks for the FIFO are different. As the 500ppm error accumulates, the 1 packet buffer margin will be completely used up in 2,000 packets. Since 1 packet is 1ms, 2,000 packets works out to 2 seconds. If one packet is lost and the device jumps to the next, a popping sound will be heard.
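The 2,000-packet figure falls straight out of the 500ppm tolerance; a quick sketch (function name ours):

```python
def packets_to_drain(margin_packets, ppm_error):
    """Number of 1ms packets until a clock offset of `ppm_error` consumes
    a buffering margin of `margin_packets` whole packets of audio."""
    return int(margin_packets / (ppm_error * 1e-6))

# A one-packet margin at the 500ppm USB tolerance: exhausted in 2,000 packets,
# i.e. 2 seconds at one packet per millisecond.
print(packets_to_drain(1, 500))  # → 2000
```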
A clock-tracking PLL circuit is essential
It would never do to create a DAC that makes noises every few seconds (or even every few dozen seconds depending upon the clock precision). In order to avoid this, it is necessary to have the receive clock track the send clock. For this reason, we determined that it was necessary to have an excellent C/N performance phase-lock loop on the chip... By integrating the PLL, we were able to develop a USB-DAC that did not make popping noises. At this point in our development, it was human nature to want various people to hear the results. So we demonstrated it to all of those purported to be golden ears. The audio signal came through the PCM1716, a DAC with an industry-wide reputation. The PLL was the PLL1700, which has excellent C/N performance...
The distortion is an order of magnitude too high
When the guys in charge listened to the prototype, I saw dubious faces. I was asked a variety of questions such as "Is the source coming from the PC corrupted?" In the end, I was told to measure the audio performance. When I announced the results in a subsequent meeting, I was told the distortion was an order of magnitude too high; THD+N was 0.03%. I wondered what was wrong with 0.03%. I was told that "we could never sell a device with this performance as one of our own." For a 16bit/48kHz system, I would have to achieve at least 0.003%!
I was faced with a problem. Some asked, "Is the digital data getting corrupted somewhere?" But rigorous VHDL simulations did not turn up any such bug. For the first time I felt the terror of analog. I had gone into this project thinking "since we are processing digital signals, we can expect good sound as a matter of course; from here on, everything is digital!" So this experience was a real shock...
Even for a sample rate of 44.1kHz, the USB isochronous mode packets have a period of 1ms (1kHz). In order to distribute 44.1kHz across 1ms intervals, one 45-sample packet is sent for every nine 44-sample packets. The tracking pulse (as we will call it here) for every 45-sample packet occurs once every 10 packets, or with a frequency of 100Hz. Since the PLL loop filter, a so-called low-pass filter, has its corner in the tens of kHz range, this 100Hz tracking pulse goes right on through and shows up on the PLL's VCO control voltage. It appears as frequency jitter...
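The 9-plus-1 packet schedule can be written out explicitly. The USB spec does not dictate where in the cycle the 45-sample packet falls; this sketch (our own) simply places it last:

```python
def packet_sizes_44k1(n_packets):
    """Sample counts per 1ms packet for 44.1kHz audio: nine 44-sample
    packets followed by one 45-sample packet, repeating every 10ms."""
    return [45 if (i % 10) == 9 else 44 for i in range(n_packets)]

sizes = packet_sizes_44k1(10)
assert sum(sizes) == 441        # 441 samples per 10ms = 44,100 samples/s
assert sizes.count(45) == 1     # the "tracking pulse" recurs at 100Hz
```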
The terrors of the isochronous mode
There's another problem with USB: in the adaptive isochronous audio transmission mode, the receiver has to determine the bit rate. This means that the bit rate is unknown prior to the time the data arrives. It cannot be known prior to actually observing the packet. Another terror of USB is that, according to the specification, it would not be unusual for the bit rate to change when the operating system is busy. Since the packets arrive at 1ms intervals (a 1kHz rate), the PLL must lock within 1ms. With most PLLs, if we decrease the loop gain on the grounds that 1kHz fluctuations are clearly audible, we can no longer track! Terror of terrors, we have just bumped into a brick wall. Upon doing some investigation, we were actually able to observe fluctuations in the audio frequency characteristics of one company's USB-DAC. Upon listening, this could be detected as a disruption in the rhythm of the music...
Also, for isochronous USB data, a buffer is necessary for the time between the beginning of the packet until PLL lock. The more audio quality is pursued, the longer the necessary buffer and the longer the time lag when playback begins...
Feedback control alone doesn't cut it
... The biggest problem is that the 1kHz feedback frequency is smack dab in the middle of the audio range. If the loop filter characteristic is shifted toward the low end, the lock-up time becomes too long. If the PLL loop filter does not receive a reference signal for several clock cycles, it does not lock. For several days I debated this within my own head: "If I don't use feedback the sound skips. If I do, distortion arises..."
The author describes the final solution: "SpAct deals with this using a two-stage structure: (1) The Time-optimal PLL concept is used and the sender's frequency is estimated. (2) After estimating the frequency, stabilization is accomplished using feed-forward control techniques and crystal oscillator-like performance is preserved..."
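SpAct's actual algorithm is proprietary and not disclosed in the article; the following sketch (function name and details entirely ours) only illustrates the first-stage idea of estimating the sender's frequency by averaging over many packets, so that the 100Hz packet-size modulation cancels out instead of reaching the audio clock:

```python
def estimate_sender_rate(packet_sample_counts, packet_period_s=0.001):
    """Stage 1 (sketch): estimate the sender's sample rate by averaging
    the sample count over many 1ms packets, instead of chasing each one."""
    total_samples = sum(packet_sample_counts)
    return total_samples / (len(packet_sample_counts) * packet_period_s)

# One full 10ms cycle of the 44.1kHz pattern averages to exactly 44,100Hz,
# even though individual packets alternate between 44 and 45 samples.
pattern = [44] * 9 + [45]
print(estimate_sender_rate(pattern * 10))  # → 44100.0
```

In a second stage, per the article, the estimated frequency would drive the output clock open-loop (feed-forward), preserving crystal-like stability rather than letting per-packet feedback modulate it.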
Art Dudley's review. Further measurements -- on a 1543-based DAC that, like the Shigaraki converter, uses an S/PDIF input but was then also measured with a USB link -- can be found at Pedja Rogic's site and in post #42 and higher on the DIYHiFi.org pages [his DAC pictured above]. Pedja was surprised to find that the USB-linked jitter measured slightly superior to the S/PDIF feed. (He also measured the 1kHz spike already reported on by Mr. Kondoh above, a function of the USB data transmission protocol.) In the Serbian designer's words, "USB proved not to be a bottleneck". He further concluded that previously published poorer measurements were "mostly caused by the implementations and not by the USB interface as such." Comparing different TDA1543 chips of different origins, he also found that "the distortion content differences between particular TDA1543 samples are more important than those between S/PDIF and USB DAC versions."
What does all this technical background data suggest? Simply that present-day implementations of USB DACs have earned measured equality with their traditional S/PDIF brethren. They now mandate consideration on an even footing. Though USB is still predominantly computer-based, that doesn't make it an inferior format. Naturally, there's another variable when using a CPU-based "transport" - the noisier fan-cooled PC environment versus traditional digital disc spinners, which lack non-audio computing circuits and fans. This has given rise to an emerging category of products: the dedicated hard-disk music server. Think VRS Systems and Linn, for example.
Adds Gordon Rankin to link the above historical overview with the present state of affairs: "Jitter is mostly caused by the clocking of data from the receiver chip (be it S/PDIF or USB) and the converter chip (the TDA1543/N2 in our case). In S/PDIF land, this is compounded by the fact that the clock and data are integrated into one signal. For years I thought -- as others did -- that most S/PDIF cables were the same. But S/PDIF receivers are really taxed by the cable and the transmitter feeding the cable. That is why there is a difference in sound quality when DACs are driven by different transports and cables.
Mr. Kondoh's article deals with the first generation of silicon for USB Audio, the PCM2702. At that time, the engineers were thinking more like S/PDIF than USB. That's why there was more jitter in the PCM2702 part and why the engineers had such problems getting it to work to their satisfaction back then. The second and third generation parts are much better at developing the low jitter clocks. The engineers realized, hey, we only have 3 frequencies to deal with - 32kHz, 44.1kHz and 48kHz. Let's not look at this like S/PDIF. Let's look at this like USB.
I have a calibrated frequency meter that shows variations in clocks. With the 2nd and 3rd generation silicon from TI/Burr Brown, these (word) clocks are rock-solid on the correct frequencies. I have several S/PDIF transports that vary widely in (word) clock output between the first and last track of a CD. Clearly the USB technology is moving along in the right direction.
And one thought on the jitter testing done by Stereophile and Pedja. This technique is called sideband testing and is performed in the analog domain. Therefore the outcome of these tests is completely dependent on the analog output stage. It has been said that non-OS type DACs will not give as accurate a set of jitter measurements because of this. I also agree with Pedja in regards to DAC chips and differences. This is why we socket the TDA1543 so I can hand-select the converter chip for each of my products."
Phew. Enough already with all that tech stuff. The next and most important step of this review would have involved the ripping of some of my own music to Gordon's Mac. Ditto for my personal laptop (Windows) to prepare apples-to-apples comparisons against my Zanden front-end for a price-no-object I²S reference, then against the far more sanely priced tube-buffered Canary Audio CD-100 and Consonance CDP-5.0 Droplet players. Alas, at this juncture of the process, Mr. Rankin expressed his severe displeasure at my "train wreck" of a presentation above. In his mind, "the real killer thing about this technology is the error-free rips to hard drive, streaming, play lists, movies. Sure the Brick is great but really, people can enjoy this without it and that is the story that needs to be told."
That's exactly where I was headed - to discover the merits of this approach in actual use. Alas, I'm extremely uncomfortable with having a manufacturer tell me how to conduct a review - especially when it was he who approached me in the first place with the review solicitation, suggesting he trusted me enough to handle this subject appropriately. Apparently not. I have thus decided to return the review equipment to its maker while still publishing what now amounts to a mere introduction. However, it might prove useful to readers who consider this very subject: traditional CD-based digital vs. computer-based music server. All the technical evidence I was able to collect suggests strongly that USB as a digital interface today is as valid an approach as the Sony/Philips connection. Hard-disk servers with their menu-driven interfaces certainly offer all manner of snazzy new programming and access features the old digital could merely dream of. What this new way sounds like you'll unfortunately have to find out elsewhere. My apologies...