Wait just a minute....

Geddes:"But then there is the recent study (Geddes and Lee, Dec 2005 JAES) on the audiblity of nonlinear distortion in compression drivers - it was inaudible at all drive levels up to the thermal limit of the highest power drivers available."

I just recently saw this post.

So I looked back over my JAES and re-read your Engineering Report "Subjective Testing of Compression Drivers", by Earl R. Geddes, Lidia W. Lee, and Roberto Magalotti.

In it, you state flat out in the Summary and Conclusions section: "...nonlinear distortion in a compression driver is simply not a factor in its sound quality."

I think these are some pretty bold statements, and they should be considered quite carefully before anyone accepts them as some sort of fact.

You seem to be basing your statements on the subjective listening test you performed as the subject of the engineering report. However, I was shocked and appalled at how incomplete and sketchy the details were for this controversial result.

There were several things that leaped out at me when I read this report.

The biggest one was that there was no information on the listening test subjects: how many there were, who they were, the extent of their listening test training (if any), how the listening test data was organized, whether it was averaged across the entire listening subject population, or whether individuals were examined for individual statistical significance, etc. In other words, we only get to see the final result of whatever you chose to present, without any details on the inner workings. This is not typical of serious articles on listening test results; it is more typical of casual listening test reports that aren't meant to be considered as evidence.

As one of your results, you state that p = 0.203 was indicative of no significant differences for level. Yet that figure means there was some degree of correlation in the responses: a result this strong would be expected to come up by random chance only about once in every five listening tests, and if this kind of result came up more often than 1 in 5 times, it could be considered potentially significant. Since we do not know how many listening subjects were lumped together to reach this statistic, it could simply be the case that sensitive listeners were averaged in with the rest of the listening population. Or not, but we cannot tell from the contents of the paper.
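To illustrate the pooling concern, here is a minimal sketch using entirely hypothetical trial counts (the report gives none), assuming the comparisons are scored as binomial trials against chance performance of 50%. A single sensitive listener can be individually significant yet disappear into a pooled "no significant difference":

    # Hypothetical numbers only -- the report gives no per-subject trial counts,
    # so this merely illustrates the pooling concern, not the paper's actual data.
    # Assumes a same/different task scored as binomial trials with chance = 0.5.
    from scipy.stats import binomtest

    # One hypothetical sensitive listener, analyzed individually:
    ind = binomtest(k=21, n=27, p=0.5, alternative='greater')
    print(f"individual listener: 21/27 correct, p = {ind.pvalue:.3f}")   # ~0.003, significant

    # The same listener pooled with six listeners guessing at chance (13/27 each):
    pooled = binomtest(k=21 + 6 * 13, n=7 * 27, p=0.5, alternative='greater')
    print(f"pooled group: {21 + 6 * 13}/{7 * 27} correct, p = {pooled.pvalue:.3f}")   # ~0.28, "no difference"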

Second, a big assumption was made that checking distortion on the compression driver at (or near) full power (approx. 100 W, 28 V of drive stated), then just around 3 dB down from that, and only another 3 dB down from that, were the best levels to use. These levels exclude where most real listening occurs, at power levels of a few watts and below. The distortion levels shown in Fig. 1, as the drive went from 28 to 20 to 14 V, ranged from -20 to -25 dB at the bottom of the driver's range to around -12 to -18 dB for the upper portion of the driver's frequency range.

These translate to distortion levels of 10% and 25% respectively, a large amount of distortion at any of these elevated drive levels. Yet if I understand the listening test correctly, it was these levels that were compared to one another to see if they sounded any different. At these levels of distortion, wouldn't everything tend to sound the same, since there would only be small changes in the distortion at these power levels, and no opportunity to compare against lower power levels where most real-world listening is done? This seems like stacking the deck in a most aggressive way, such that it would virtually guarantee no significant difference between the different drive levels tested. Aside from the absolute levels used, level changes of roughly only 3 dB and 6 dB are very small changes in operating level, which raises the question of why such small steps were chosen.
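For anyone who wants to check that arithmetic, here is a small sketch (Python, using the voltages and dB figures quoted above) converting the dB distortion figures to percentages and the drive-voltage steps to dB:

    import math

    def db_to_percent(db):
        # Distortion expressed in dB below the fundamental, converted to a percentage.
        return 100 * 10 ** (db / 20)

    def level_step_db(v, v_ref):
        # Drive-level change in dB for a voltage step relative to the 28 V reference.
        return 20 * math.log10(v / v_ref)

    print(db_to_percent(-20))      # 10.0  -> -20 dB is 10% distortion
    print(db_to_percent(-12))      # ~25.1 -> -12 dB is about 25% distortion
    print(level_step_db(20, 28))   # ~-2.9 dB for the 28 V to 20 V step
    print(level_step_db(14, 28))   # ~-6.0 dB for the 28 V to 14 V step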

Other important issues concerning the listening test details need to be addressed as well: was there any form of control for these tests? In other words, just how sensitive were these listening tests, and what were they inherently capable of detecting? If no controls were run, then the only thing that can be said is that they appear to have detected the frequency response deviations from driver to driver as audible. Given that we do not even know what those deviations are, we cannot even use that information to judge how sensitive the listening test was; the paper gives zero information on just how sensitive these listening tests actually were.

Since I myself have conducted and participated in controlled listening tests, I know how hard it can be to hear past all the procedure and hoopla, the mechanics of "taking the test," so I find myself concerned about several of the other details that were given in the engineering report. The source material consisted of just one 15 second musical segment: 15 seconds from "Burning Down the House" (live) by the Talking Heads. This was the sole piece of music used to determine whether or not the musical segments being presented for evaluation were the same. I find the use of this single source of stimulus utterly amazing, as if we can all hang our hats on this one snippet of a pop rock cut!

It is common practice to use more than one song for serious listening tests, just to make sure that the musical segment did not fail to excite the potential sonic differences. Then, on top of that, it is only 15 seconds long. This is a relatively short period of time, and even if it were carefully chosen to be as "busy" as possible, this is still not a lot of musical information to be making judgements on.

In my own experience with listening tests, it takes a certain minimum amount of time to "latch on" to what is going on sonically; that is, you cannot instantly determine what sounds different, and it takes some mental legerdemain and analysis to begin to notice what is actually going on. With a 15 second segment, you would be just about settling in, and WHAM, it's over. This would not lend itself to detecting subtle differences at all. Rapid short comparison pieces are typically used for CODEC testing, because they can be selected to highlight one particular misbehavior of the CODEC, but these same short segments are NOT very good for general purpose listening tests, as they do not explore all the possible problems or sonic issues.

Adding to this, it seems that the minimum number of comparisons was 27 pairs, each consisting of the 15 second segment played twice? If the listening subject failed to achieve a certain amount of consistency, then apparently you made them listen to more comparisons, up to a total of 45 paired comparisons? That is approx. 13 1/2 minutes of serious, hard listening for just the 27 pairs, and if you had to go the full 45 pairs, that's 22 1/2 minutes of sheer listening.
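A quick check of that arithmetic (assuming each paired comparison presents the 15 second segment twice, back to back, which the report does not spell out):

    segment_s = 15  # length of the single musical excerpt, in seconds
    for pairs in (27, 45):
        minutes = pairs * 2 * segment_s / 60
        print(f"{pairs} paired comparisons -> {minutes} minutes of listening")  # 13.5 and 22.5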

In my own experience, it is hard to continue past about 10-15 minutes of hard, serious listening at one sitting; after that, listening fatigue sets in, and the results tend toward random. Your amount of listening time is borderline here, and this, combined with the additional stress of trying to analyze such a short segment, would tend to reduce the sensitivity of the listening subjects.

There were other concerns as well, such as the use of a Turtle Beach Santa Cruz sound card as both playback and recording device: it played the test cut, recorded the output of one of the compression drivers, and then played that recording back for the listening test. Thus the test signal passed through the sound card three times, out and in and out again, before it was finally heard by the listening subject.
I also question the use of a Crown Macro-Tech 5000VZ to drive the compression drivers; this is a 5000 W power amp typically used to drive subwoofers to deafening levels. Such high power amplifiers are usually NOT known for their finesse in driving tweeters or high resolution full range speakers with the utmost delicacy. That is another probable loss of resolving capability.

So it looks to me like there were many steps that were not taken to maximize the resolving power of the listening test, no steps taken to determine what resolving power the listening test actually had, and no way of knowing if this particular listening test could have detected the difference between an 8 track tape and an SACD, or anything else.

But let's put all of that aside and look at just the result of the listening test: statistically significant results for driver-to-driver differences (assumed to be only differences in frequency response), but a failure to achieve statistically significant differences with drive level changes. This is merely a failure to reach the pre-selected level of statistical significance, and nothing else. But wait, you are not merely stating the factual and scientific conclusion of the test results; you are doing an additional thing: you are taking a null result (a failure to achieve positive results) and claiming that it was a negative, that there WERE no differences due to level. Then you take this and further embellish it by stating that this means there were no differences due to nonlinear distortion.

Even if we ignore all of the concerns voiced above and accept the assumptions made by the experimenter, we still cannot turn a simple null into a very specific, ironclad negative. This is NOT true science; it is not accepted practice among statisticians, and it is not accepted anywhere as a valid way of looking at test results, except by folks with an agenda in hand.
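To make the point concrete, here is a sketch of why a null is not a negative, again with hypothetical numbers: assume a binomial same/different task of 27 trials and a listener who is genuinely, but only modestly, sensitive, answering correctly 65% of the time. Even that real effect would fail to reach p < 0.05 most of the time:

    from scipy.stats import binom

    n, p_true, alpha = 27, 0.65, 0.05   # hypothetical trial count, true ability, criterion

    # Smallest number of correct answers that reaches significance under chance (p = 0.5):
    k_crit = min(k for k in range(n + 1) if binom.sf(k - 1, n, 0.5) <= alpha)

    # Probability that the genuinely sensitive listener actually reaches that criterion:
    power = binom.sf(k_crit - 1, n, p_true)
    print(f"need {k_crit}/{n} correct; chance a real 65% listener gets there: {power:.2f}")  # ~0.35

With numbers like these, a genuinely audible (but modest) difference would still produce a "no significant difference" result roughly two times out of three, which is exactly why a failure to reach significance cannot be read as proof of inaudibility.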

I question these results so strenuously because they go against my considerable experience, both professional experience with compression driver distortion and experience as a practitioner of controlled listening tests, and because of my concerns over the details of the listening test itself.

I find it distressing that the AES would allow this engineering report to be published with the wording it contains, that of a solid and certain negative result, when it would be a tottering house of cards anywhere else.

I reject your conclusions as stated, and feel that this engineering report does a grave disservice to the audio community, as well as the engineering community, due to the exaggerated nature of the results as explained.



Jon Risch

