WHAT'S THIS THEN?

Since no one in my real life wants to listen to me drone on about DSD - most people really don't have a clue about the ins and outs of digital music, let alone something obscure like DSD - and the bits involved, I thought a dedicated page for my musings and ramblings surrounding Deffy and its (her?) development wouldn't be a bad idea. I do suspect this page won't get frequent updates, since Deffy is not my only 'spare time' project. But once in a while... and if you're interested...

MIXING TWO BITSTREAMS



I wrote about this in 'The Dreaded PCM - Part III': a method described in a paper I discovered during my research, to mix two single bitstreams into one.

It's based on a logical adder system: add the bits of stream A + B and throw in the sum-bit of a previous iteration as a carry-in. Then use the carry-out bit to build the new stream that reflects the combination of A and B. Simple, elegant, easy to implement.
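For the curious, the adder idea above can be sketched in a few lines of Python. This is only an illustration of the description given here, not Deffy's code and not taken from the paper: the function name `mix_bitstreams` and the choice to start with a carry of 0 are assumptions.

```python
def mix_bitstreams(stream_a, stream_b):
    """Mix two 1-bit streams with a full adder, as described in the text.

    Assumption: the carry starts at 0 and both streams have the same length.
    """
    carry_in = 0
    mixed = []
    for a, b in zip(stream_a, stream_b):
        total = a + b + carry_in   # full adder: value between 0 and 3
        mixed.append(total >> 1)   # carry-out bit goes into the new stream
        carry_in = total & 1       # sum bit becomes the next carry-in
    return mixed


# Where A and B agree the output follows them; where they disagree
# the output alternates, which is the 1-bit way of saying "half way".
print(mix_bitstreams([1, 1, 1, 1], [0, 0, 0, 0]))  # [0, 1, 0, 1]
```

A stream mixed with itself comes out unchanged, which is a quick sanity check for an implementation like this.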

Well... it does work. You can hear the two pieces of music playing as one, which is kind of funny. No PCM involved at all. But there's too much added noise. And the noise sits in the audible band. So filtering this out? There is some mention of noise, in the paper that describes this method, but I was hoping it would be limited. Sadly it isn't. This is really not acceptable for any serious mixing.

Here you can listen to the amazing Sarah Vaughan being interrupted by a pianist playing Scarlatti, using this method. The overlap lasts for about 30 seconds, then Sarah cuts out (rather abruptly) and the pianist takes over: no more added noise.



And the mix here to download as DFF, since it was converted to WAV to make it playable online.

As you can hear for yourself: this is no alternative to the clarity of a PCM conversion and back.

This paper does mention a second method for mixing bit streams, but that one seems more complicated. It's next on my list to have a go at, if I can figure out how it's supposed to work.

3 August 2024









THE DREADED PCM - PART III



I'm getting very nice results.

But not with the plan from part II.

Although my hunch that it had to do with the sigma-delta proved to be correct, the whole business of upsampling and decimating the DSD failed miserably.

I was getting fine DSD out of any upsampling, even with a crazy 8 times (22MHz), but I couldn't seem to decimate the bitstream properly. Even DSD128 (2 times upsampling) decimated to DSD64 turned into a horrific mess. I tried several approaches... simple cutting of samples, moving averages, a CIC filter... it was to no avail. Clearly I was missing something: decimation of DSD is not that easy?

What mainly puzzled me were the incredibly bad results. I expected decimated DSD to maybe distort a bit, to sound not too great... but I was not getting off-sounding DSD: I was getting complete audio garbage.

So after many attempts and some frustrating hours I ditched that approach and dove into the sigma-delta some more. It was a 2nd order implementation, so why not try 3rd order? Didn't seem too difficult to implement in code. Lo and behold: the noise was moving up! From 15kHz or so it now started at 20kHz or so. Finally the result I was aiming for.

Then I went overboard and added another integrator... 4th order and no looking back... voilà... the noise now starts at the same level as the original DSD. On some files even higher.

Not that simple though, because the integrators need to be scaled down. Apart from possible overflow errors, that seems to be a requirement. But scaling too much and the noise starts to drip down again, scaling too little and the music disappears completely. That happens when the sigma-delta collapses - sometimes after working fine for a while - and it never recovers, like it's pushed off a cliff.

Several technical papers I read do warn about 3rd and 4th order being potentially unstable and describe exactly this behaviour of total collapse.

From a thesis on sigma-delta modulators:

"Type III systems have a characteristic abrupt onset of instability: the system blows up as the A-min point is reached i.e., the quantizer input suddenly jumps to a high or even unbounded magnitude. The loss of stability is irreversible that means a decreased input magnitude does not reestablish stability unless the system is also reset i.e. the state variables are set to zero." Page 84 - https://core.ac.uk/download/pdf/13738084.pdf

Does this thesis offer a solution? Well, scaling does seem to be part of it. Just how to determine the right value? It's all rather technical so this will take me some time to digest. In the end this might be a deal breaker though, because I now seem to venture into some very complicated territory and I'm not sure I can follow.

I'm also lacking the tools to do some proper analysis on all of my results, but what I can tell by gazing at the spectrum and listening to the results: I think this is good enough for a simple fade out or fade in, where I'm only messing with a tiny portion of the DSD. Provided I can get an easy grip on that instability of the sigma-delta.

In the meantime, whilst researching all of this, I came across an interesting paper explaining how you can mix two bitstreams without resorting to conversions. Bits to bits, stream to stream. Seriously? Because if that works, who needs PCM? Just overlap the files to be glued - for a second or so - and mix them together... silence to silence without clicking and plopping?

Now they tell me...

I'll ramble a bit about that one after trying the proposed method...

28 July 2024









THE DREADED PCM - PART II



To my own amazement, it's actually working... I can now fade out a DSD file by the procedure described in part I.

The last 15 seconds of the file are turned into 2822400Hz PCM, faded out, and then put back as DSD.

Funny enough, the most difficult part was getting the fading algorithm right - still not very happy with it.
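As an illustration of what a fading step on the PCM can look like, here is a minimal Python sketch of a raised-cosine fade-out. It is one common curve, not necessarily the one used in Deffy: the function name `fade_out` and the choice of curve are assumptions.

```python
import math


def fade_out(samples, fade_len):
    """Apply a raised-cosine fade-out to the last `fade_len` samples.

    Assumption: `samples` is a list of floats; a copy is returned.
    """
    n = min(fade_len, len(samples))
    start = len(samples) - n
    faded = list(samples)
    for i in range(n):
        # Gain runs smoothly from 1.0 (first faded sample) to 0.0 (last one).
        gain = 0.5 * (1.0 + math.cos(math.pi * i / (n - 1))) if n > 1 else 0.0
        faded[start + i] *= gain
    return faded
```

The raised cosine has no sharp corner at the start or the end of the fade, which is the usual reason to prefer it over a straight line.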

So far also no clicking and plopping.

I'm quite surprised by all of this. I mean, in theory it sounds nice enough, but to get it to actually work... not just baking your own bread, but now also making pastry!

Although it's not all peaches and croissants.

A problem seems to be a bit too much noise generated compared to the original DSD. It starts somewhere around 15kHz, albeit at -80dB... so when listening to the files it's not noticeable - and obviously only on the fading part, in the final seconds, since the rest of the DSD is untouched - but I am trying to figure out how to get rid of it anyway. After experimenting with six different filters and no change - even a 25kHz filter didn't eliminate it, nor a second low pass filter after creating the PCM - I concluded it's the sigma-delta and not the filtering to PCM. The problem might be that the sigma-delta isn't oversampling, since I go from 2822400Hz DSD to 2822400Hz PCM and then straight to the sigma-delta. Might it be that the humongous PCM actually needs to be upsampled first to even higher and larger and the DSD produced then needs to be decimated? Does that push out the noise to high enough frequencies sufficiently?

All in all not a bad result, however, quality-wise it's not what I was hoping for. So next is trying to upsample the PCM before I push it through the sigma-delta. It does spoil my initial idea: we then do need upsampling and decimating anyway. But at least we go up in the MHz and not down to the kHz.

It all seems to become a bit nutty though, cause do I then need a 22MHz filter (8x upsampling)? Hmmmm?

Well, since I clearly know nothing, I just have to try and see if this somewhat nutty procedure - if I can get it to work - does shift the noise higher up...

26 July 2024









THE DREADED PCM - PART I



After all the trouble with glueing, and a bit tired of the messing around with bits and bytes and not much progress, I had another brilliant idea on how to solve this business of sticking DSD together. Because let's face it: glueing in itself is already a bit nutty, but this medieval method of just pasting DSD together isn't the most sophisticated, no matter how well you glue the bytes.

So I got interested in conversion to PCM (yes, the dreaded PCM that we do not want in our conversions...). Is it really that bad? Because - apart from probably still clicking and plopping when trying to put converted PCM back into a DSD file - wouldn't it be cool to crossfade DSD files and make a truly spectacular DSD mix-tape?

(If you really need that now, check out Tascam Hires Editor...)

Of course for editing we first need PCM. Can't change the volume for instance on DSD directly... so a fade out or a fade in or overlapping... it first needs to be PCM (or at least some type of representation that can be edited). But just that tiny bit of overlap! Not the whole file. Possible? Of course. But can it then be seamlessly integrated in the original DSD when that little bit of edited PCM is turned back into DSD? Hmmm... that's glueing bits and bytes again... plopping and clicking... apart from any delay a filter caused: finding the exact bit?

What did I get myself into...

By now I do have the DSD to PCM conversion working. With some FIR filters designed with RePhase (free software for designing filters, check it out if you're interested in this stuff...). A DSD64 file turns into a 352800Hz .WAV file (32-bit floats). It's the same method FFMPEG is using (based on dsd2pcm), but I went for a bit more taps in the filter, experimented with different cut off frequencies, wrote my own functions - because I needed to understand the basics - and haven't settled yet on which filter to use.
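To make the DSD to PCM step a bit more concrete, here is a minimal Python sketch: bits become +1/-1, go through a low-pass FIR filter, and only every 8th result is kept (2822400Hz down to 352800Hz). Everything specific in it is an assumption for illustration: the Hamming-windowed sinc, the 64 taps, the 30kHz cutoff and the function names. Real filters (RePhase designs, dsd2pcm) use far more taps.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sample rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for k in range(num_taps):
        x = k - mid
        if x == 0:
            sinc = 2.0 * cutoff
        else:
            sinc = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to a DC gain of 1


def dsd_to_pcm(bits, taps, decimation=8):
    """Filter a 1-bit stream and keep every `decimation`-th output sample."""
    samples = [1.0 if b else -1.0 for b in bits]
    pcm = []
    for start in range(0, len(samples) - len(taps) + 1, decimation):
        window = samples[start:start + len(taps)]
        pcm.append(sum(t * s for t, s in zip(taps, window)))
    return pcm


taps = lowpass_taps(64, 30000 / 2822400)
silence = dsd_to_pcm([0, 1] * 128, taps)  # the 0101... pattern filters to ~0
```

Setting `decimation=1` gives the "PCM at the full DSD rate" variant discussed further down, at the cost of eight times as many output samples.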

And what about back, PCM to DSD? Yes that too. Upsample, low-pass filter to interpolate the empty space between samples, and then a 2nd order sigma-delta modulator. Mind you, that one wasn't easy to get right. The noise! But when it finally all came together it was very satisfying to play my self created DSD. Bit like baking your own bread I suppose. Provided it's edible.
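For reference, a textbook 2nd order sigma-delta modulator fits in a dozen lines of Python. This is a generic sketch (two integrators, 1-bit quantizer, feedback of the previous output), not the modulator used in Deffy, and it expects input that is already at the DSD rate and stays well inside -1.0 to +1.0. The function name is an assumption.

```python
def sigma_delta_2nd_order(pcm):
    """Turn PCM samples (floats, roughly -0.9..0.9) into a list of 0/1 bits."""
    v1 = 0.0   # first integrator
    v2 = 0.0   # second integrator
    y = 1.0    # previous quantizer output, fed back as +1.0 or -1.0
    bits = []
    for x in pcm:
        v1 += x - y                      # integrate the error
        v2 += v1 - y                     # integrate it once more
        y = 1.0 if v2 >= 0.0 else -1.0   # 1-bit quantizer
        bits.append(1 if y > 0.0 else 0)
    return bits


# A constant input of 0.5 should come out as a stream whose average is 0.5
# when the bits are read as +1 / -1.
bits = sigma_delta_2nd_order([0.5] * 8000)
average = sum(2 * b - 1 for b in bits) / len(bits)
```

The quantization error ends up shaped towards high frequencies, which is why the input needs to be heavily oversampled for the audible band to stay clean.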

But here's the rub: the step to PCM is decimating, and the upsampling requires interpolation. No matter how wonderful my filters, or how smart my sigma-delta, we're losing information. Or at least: we're replacing the original information with a bit of a gamble.

I would prefer a lossless conversion.

Is that possible?

What does 'lossless' even mean?

I've seen endless online debates on that one, so let me first explain my definition (feel free to disagree with it, but at least you then know what I'm talking about when I write 'lossless'): if I go from DSD to PCM and back to DSD, I consider it lossless if the original is the same as the end result, bit for bit. Like going from DSD to DST and back. You should not be able to tell that the DSD went through a filter if you compared the original and the filtered one. Simple: exactly the same. No difference on the bit level.

So can it?

Well... what if we do not use decimation and forego the need to upsample+interpolate and ditch the sigma-delta? What if we produce PCM at the whopping sample rate of DSD? Throw every bit through the filter. The files produced - if you actually would write it as a .WAV file - would be humongous. But writing monstrous .WAV files is not the goal here: the goal is to be able to e.g. fade out at the end or fade in at the beginning and perhaps produce overlap between two files and then turn that PCM back into DSD by running the filter backwards.

Running the filter backwards? Whoa... is that even possible?

Let me disappoint you (unless you're well versed in this stuff and by now think I'm crazy): no, that's not possible. We can't run a FIR filter backwards, since we have no idea what the exact filter component was on the output sample for any given input sample. There's no inverse on a FIR filter. Once it's done it's done (of course, we still have our original DSD, so no real harm)...

So for now, the best I can hope for: turn a little bit of DSD to PCM at that really high sample rate (at least no need to decimate, upsample, interpolate, when we want our DSD back). Then use sigma-delta to get that little piece back to DSD after editing, and hope it fits seamlessly into the original.

My oh my...

Let's start with trying to gradually lower the volume at the end of a given DSD file... a nice fade out... on the last 5 seconds or so and see if I can get some acceptable results...

24 July 2024









THE TROUBLE WITH GLUEING - PART IV



I was hoping to have glueing for DSF files ready for the next release, but decided to hold off for now.

I did experiment as written in Part III, but as also predicted: it was all rather unsatisfactory.

However, I do have some new ideas after messing around for days with the encoding modules (to get from DSD to DST). I was chasing for hours after a bug this week, which I finally squashed, only to realize how extremely sensitive DACs actually are to misplaced zeros and ones. The music exhibited continuous crackles, after I had made some changes. Then finally after hours of searching, I realized it sounded a lot like the glued seams, which then led me to the culprit (and a satisfactory hurray after implementing what I thought should fix the problem, which it did): no more than 7 (or less) misplaced bits at the end of each frame. Not 7 bytes... 7 bits. At the most 7 bits with a misplaced 0 or 1. And since there are 75 frames per second, the clicks and clacks started to add up.

So, 7 wrong bits (out of almost 38000) in a frame and your music starts to sound like a vinyl from the 40's. I know it probably wasn't vinyl back then, but to the point: that's how sensitive DSD is.

So what does this have to do with the glueing? Well, seeing the sensitivity of DACs and the effect only 7 bits can have on my peace of mind - cause frustration ran high when I couldn't find the cause - I decided we need to go deep with the glueing. This simply needs a much more methodical approach. We need to analyse the last byte, the one right before the new part that is glued. I'm by now convinced it's that one byte causing the audible sputter. I do have a solution that makes the sputter very very faint, but I really want to try to get rid of it completely. On every DAC.

Not sure if I set the bar too high here. We'll find out.

I think glueing silence might not be a bad idea - so far it did give me the best results - but I think now that it has to follow the bitstream. Do not simply paste it in, but make sure if the last bit is a 1, it's followed by a 0. And if the last bit is a 0, let it be followed by a 1. Somewhat like the voodoo I wrote about in part III: but it might not be voodoo after all!
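A small Python sketch of that "follow the bitstream" rule: pick the silence byte so that its first bit is the opposite of the last bit before the seam. The function name is hypothetical, and the bit order is an assumption: DSF files normally store the oldest bit in the least significant position (hence the reading from right to left), while DFF stores it in the most significant position.

```python
def next_silence_byte(last_byte, lsb_first=True):
    """Return 0x55 or 0xAA so the 0101... alternation continues across the seam.

    Assumption: lsb_first=True means bit 0 of a byte is played first (DSF),
    lsb_first=False means bit 7 is played first (DFF).
    """
    if lsb_first:
        last_bit = (last_byte >> 7) & 1          # bit 7 is played last
        starts_with_zero, starts_with_one = 0xAA, 0x55
    else:
        last_bit = last_byte & 1                 # bit 0 is played last
        starts_with_zero, starts_with_one = 0x55, 0xAA
    # A trailing 1 must be followed by a 0, and a trailing 0 by a 1.
    return starts_with_zero if last_bit else starts_with_one


print(hex(next_silence_byte(0x80)))  # last played bit is 1 -> 0xaa
```

Every following silence byte can then simply repeat the first one, since both 0x55 and 0xAA continue their own pattern.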

(Of course, what about the first byte of the glued on part then? Can you actually ensure that that one doesn't trip up the DAC if it doesn't fit your neat 010101 or 101010 stream of silence? Probably not... but let's focus on one byte at a time and postpone desperation...).

So I need some more time to experiment. No glueing yet for DSF files in the next release (which will be version 4.5.0). Although I don't think anyone is clamouring for it anyway...

30 June 2024









FOLLOW THE DST BIT ROAD - PART I



So how do you get from DSD to DST?

It's quite brilliant and combines a lot of different techniques that are not in any way part of my background. So it's been an ongoing steep learning curve to try to understand how this code works and what the magic is behind an almost 60% lossless compression rate.

This 'trying to understand' usually starts with figuring out what is actually happening in the code: what bits and bytes are moved to where. Once that's understood I can more or less safely try to optimize that code, or rewrite it, without major accidents.

But then comes the more difficult task of figuring out 'why'. Not strictly necessary, but very helpful. That's mainly a lot of research on the internet (ChatGPT has been very helpful too... I've really bonded with Chat over all this...).

It's all part of signal processing: a finite impulse response (FIR) prediction filter - with prediction signals and residual signals (difference between original and prediction) - through matrix decomposition and linear solving, then on to filter coefficients, probability tables, half probabilities, and then an additional step Philips left out of their original code - not sure if they did that on purpose or designed it in later code - to pack all those results and compress them in their own way, using Rice Coding (as in the guy who invented it, Robert F. Rice, from the Jet Propulsion Laboratory) - without that step we still get very nice compression, but not to the max - and then to top it off at the end, arithmetic encoding on the actual music bits, which as I understand it now, packs that bitstream in one gigantic number (as long as necessary decided by ever decreasing intervals that come from probabilities), based on all that earlier gathered information.
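Of all the steps in that list, Rice coding is the easiest one to show in isolation. Below is a minimal Python sketch of the general technique: a number is split into a quotient, written in unary, and a remainder of k bits. The exact bit conventions inside DST may well differ. The unary style used here (ones closed by a zero) and the function names are assumptions.

```python
def rice_encode(value, k):
    """Encode a non-negative integer as a string of '0'/'1' with parameter k."""
    quotient = value >> k
    bits = "1" * quotient + "0"           # unary part, closed by a 0
    if k:
        remainder = value & ((1 << k) - 1)
        bits += format(remainder, "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Decode a single Rice code word produced by rice_encode."""
    quotient = 0
    pos = 0
    while bits[pos] == "1":
        quotient += 1
        pos += 1
    pos += 1                              # skip the closing 0
    remainder = int(bits[pos:pos + k], 2) if k else 0
    return (quotient << k) | remainder


print(rice_encode(9, 2))  # 9 = 2 * 4 + 1 -> '110' + '01' = '11001'
```

Small numbers get short codes, so this pays off when most of the values to be stored are small, and costs extra when they are not.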

During the encoding there's also a step that figures out if the compression is worth the effort - apparently in some cases this might all lead to more bytes than we started with or just a minor reduction that's not worth the effort. In such a case, the original DSD is written into the frame and on to the next one...

So yeah... I just throw it out there for you to grasp the mountain of stuff that needs to be conquered to understand this. Unless your background is in this field of course, then my rambling will most likely sound familiar (and leave much to be desired). But hey, we're never too old. I view this as an excellent opportunity to learn some new things (actually a lot of this is very established techniques... arithmetic coding for instance can be traced back all the way to the 1960's).

So a lot of preparation, pack that preparation and compress it in its own way, then have all that preparation serve the arithmetic encoding, which neatly compresses the actual bitstream to a minimum.

I think that sums it up.

When decoding, all that stuff is unpacked and then worked out the other way around (which luckily can be done a lot faster than the encoding, else your hardware DAC or SACD player would struggle). But since the decoding is very much dependent on the encoding, there aren't many ways you can achieve the encoding part. It all needs to follow very strict rules basically set by Philips.

So far I've 'decoded' the first steps of the encoding. I can't say those are the most important steps, because none of the steps can be skipped: they're all important. This is tight code. Make one mistake, misplace a brace or skip an integer, shift a bit too far or a byte too little and it collapses. Not just ending with weird sounding DST, but immediate mayhem, as in complete garbage coming out of the speakers, with no inkling as to how it should sound. Buzzing, hissing, or some crackles and then complete silence... I've heard it all. But assuming the code is fine when the DST sounds okay is also not safe, because there are more subtle ways to distort the music: e.g. with little static noises. I have to be very careful and listen with headphones after I make changes, to make sure I haven't messed up anything. And sometimes it sounds fine but the compression completely failed: back to the last changes and what did I break now...

If you think about it, it's also more or less proof that the compression is lossless (not that I ever doubted that claim, but some people always do...). It's a bit stream. You cannot have any degradation from a lossy compression, because it will manifest itself immediately in the output in a very nasty way. You'd completely lose the music after two or three compression / decompression cycles (or perhaps even after one cycle already) and end up with just noise.

That dot product of vectors I wrote about in an earlier ramble, that's the vectors for the correlation matrix to be built. That code has completely changed and I'm now using SSE instructions (at least in the 64-bit version). My initial idea that the slowness of the compression was mainly due to that first part of the code was wrong. Not sure how I reached that faulty conclusion: it's also all the stuff that follows.

So... after that Toeplitz matrix (a diagonally repeating matrix, thanks Chat) is set up, it gets decomposed into its lower triangular half (via Cholesky) and then solved to its coefficients, by combining the decomposition and the correlation vectors in a neat calculation step of linear solving.
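In Python, the general shape of that step can be sketched as follows: build the Toeplitz matrix from the correlation values, decompose it with Cholesky, and solve for the filter coefficients with a forward and a backward substitution. This is the textbook version of the technique, not the Philips reference code. All function names are assumptions, and the real encoder works with far larger orders and its own scaling and rounding.

```python
import math


def toeplitz(r, order):
    """Matrix with r[|i - j|] on every diagonal."""
    return [[r[abs(i - j)] for j in range(order)] for i in range(order)]


def cholesky(a):
    """Lower triangular L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (L * L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def prediction_coefficients(r, order):
    """Coefficients a with sum(a[j] * r[|i - j|]) = r[i + 1] for each row i."""
    return solve_cholesky(cholesky(toeplitz(r, order)), r[1:order + 1])


# Correlations 1, 0.5, 0.25 describe a signal where each sample is about
# half the previous one, so the first coefficient comes out as 0.5.
coeffs = prediction_coefficients([1.0, 0.5, 0.25], 2)
```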

We then have a prediction filter and its coefficients, based on that vector dot-product and its 128 shifted steps, which in its turn was based on the original bitstream of one frame per channel.

So far so good...

Hopefully in Part II I can tell you what we then do with that filter...

25 June 2024









THE TROUBLE WITH GLUEING - PART III



I've finally gotten around to testing my wonderful 'Click and Plop protection' on a hardware DAC (that doesn't seem such a difficult thing but fact is I'm a bit of a nomad, and currently unable to access all the stuff in my man cave - I have to do it with an Astell & Kern portable DAC, which requires a bit more setup).

To not keep you in suspense any longer: the news is not good.

The seams with plop protection play marvelously on my Foobar setup... not a peep or squeak, but my A&K, a design that plays DSD natively, doesn't seem to care about my protection and plops away rather nastily.

I'll have to revisit this subject once more.

First off, what if I made the following mistake: I look back into the last frame, to count the zeros. Then insert silence to cover up those zeros on a 1024 alignment. But then the next channel comes. What if that block has less or more zeros? The DSD silence might be misaligned between the channels.

So before I try anything new, let's first look into that problem, and figure out if that might be causing my A&K spitting in my ears.

If that's sorted and things still don't work then I'll try three things I think.

1 - Extend the frame with silence to cover the whole frame, no matter how many zeros. However, I do not really see the benefit. Clearly my A&K is not charmed by inserted silence... shifting it to earlier, how's that gonna help? So yeah, I'll try, but I predict failure on this one (unless it's some frame alignment issue...).
2 - This is a long shot and seems a bit like voodoo, but desperate times: try silence starting with a 0. Yes, officially DSD-silence is defined as 0x55. That's decimal 85 and binary 01010101. Reading the byte from right to left, it's a 1 first. Let's try 10101010. That's decimal 170 and hex 0xAA. I got this idea from the DST decoder, which initializes status bytes for the filter alternating between the two. So I can't exclude it might make a difference. Perhaps it should even be a choice depending on the last bit, before pasting any silence bytes. If the last bit is a 1, paste 0x55 (reading from right to left) else paste 0xAA.
3 - Try a quick 'music shifting'.

Essentially the third idea I already mentioned in a previous ramble: shift the start of one track into the last frame of the previous track. That resembles the glueing of DFF files. The problem there is, if you think about it: the zeros might not start for all channels at the same time (as mentioned earlier). Which is actually a perfect excuse to not try to fiddle too much with bytes and offsets, but just sacrifice that last frame and then glue. However, similar to my objections of idea 1... why would that not plop and crack..?

Frankly I do not have high hopes my A&K will be happy with any of these solutions, but I won't know until I try...

23 June 2024









A BIT ABOUT DST



So, quite a bunch of bugs fixed in the last two weeks. Most of it collateral damage from release 4.0.0, which was a big one, with many changes and additions. Hence, quite a few bugs too.

I'm preparing now for version 4.5, which has several smaller improvements and one big change in the DST implementation. But you won't notice an under-the-hood upgrade like that. And you shouldn't. Where conversion from DST to DSD is concerned (in converting a compressed DFF to DSF or stripping the compression away), everything should work as it did. However, in development this ate away quite a bit of time, since the DST to DSD conversion code isn't the easiest to work on.

As written before, I've also gotten my hands on the reverse code: from DSD to DST. And it's working. Albeit incredibly slow. I wonder how computers 20 years ago liked this. Having a 386 or so, churning away the whole night on a few gigabytes that needed to go onto an SACD? Seeing the speed now, a night might have been too short (depending a bit on which season).

The strange thing is that one of the slowdowns occurs quite early on in the algorithm. I've tried six different approaches to that piece of code (not done yet trying), but wasn't able to shave off more than perhaps 5%. I guess trying it through SSE instructions might work, although most compilers are already optimized to use those instructions, even if the programmer doesn't. The code is calculating a simple dot product of vectors. A typical problem where SSE shines. But these vectors are long (around 38000 elements), and the whole procedure is repeated 128 times, shifting a step. So walking 38000 elements, multiplying them with other elements, and that 128 times over. And that's only for one channel. Now do a file with 6 channels? Or worse, try a DSD128 with 6 channels, and the elements double to 76000. You see the conundrum. And this kind of code can be optimized of course, but you can't change the calculation itself. We need that data. There's a limit to speed gain and at that point you just have to endure it.
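The calculation itself is tiny when written out, which is what makes its cost so annoying. A minimal Python sketch, assuming the bits are mapped to +1 and -1 before multiplying (the mapping and the function name are assumptions, not taken from the encoder). One DSD64 frame of 1/75th of a second holds 2822400 / 75 = 37632 bits per channel, which is the "around 38000" mentioned above.

```python
def autocorrelation(bits, max_lag=128):
    """Dot product of the frame with itself, repeated for each shift (lag)."""
    s = [1 if b else -1 for b in bits]
    n = len(s)
    return [
        sum(s[i] * s[i + lag] for i in range(n - lag))
        for lag in range(max_lag)
    ]


# Tiny example: an alternating stream correlates negatively with itself
# shifted by one bit, and positively shifted by two.
print(autocorrelation([0, 1] * 4, max_lag=3))  # [8, -7, 6]
```

For a full frame that is roughly 37632 multiply-adds per lag, times 128 lags, per channel, per frame, 75 times per second of music.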

In the end (at least to be able to test with some speed) I decided to speed it up with threads. Twenty-five of them attacking one file. That works. But due to the overhead it's still not very fast. So why not fifty threads, or a hundred? Well, then you get bogged down a bit by the creation and destruction of threads. There's a balance to these things. And to be able to compress really large files you also can't do this 'in memory', but have to read it in chunks from the file. That's also not the fastest. As a comparison: the decoding, with just one thread, is still about three times faster than the encoding with twenty-five threads.

Anyway, it's far from ready to introduce and frankly I'm also unsure where I stand with this one. Am I violating any licenses or patents, when I throw it out there? On the other hand, I don't see the big difference with decoding, and the code on how to do this was made public years ago.

I've searched for patents surrounding DST and could only find one, from 2004. It was held by Philips and talks not so much about DST itself, but about the CRC capabilities within the DFF specs. Apparently they believed DST in combination with CRC - and its block nature - could be used as a secure transmission system (secure as in error free) for anything, not just music. But seeing the amount of processing power this compression needs, I'm not so sure how practical the idea was, especially back in those days. I mean, if this is still slow on a 13th generation i7? I'm not using CRC and that particular patent expired in 2010: Philips didn't pay the legal fees (which somehow strikes me as quite funny - apparently they moved on). So I believe on that issue I'm okay. Patents in general have a shelf life of I believe 20 years, so even if there are others on this issue, they shouldn't be alive for much longer.

Still I wonder why the compression can't be found in abundance if there are no issues? I'll search some more... In the meantime I'll keep reworking this code. It's quite a fun project.

17 June 2024









THE TROUBLE WITH GLUEING - PART II



After writing part I (scroll a bit down for that one), I thought: what I actually should do is shift the music from the next track into the ending block of the previous one...

But although that would probably work, and is quite elegant a solution, that would land me in a lot of offset headaches and complicated code to work out the new sizes, alignment issues etc... a bit of a nightmare, with no actual guarantee it will be plop and crackle free, until I try.

So no, from a coding perspective that's a nasty solution. Let's stick to these Sony blocks for now.

What I did try was to replace the zeros with silence. That worked, a bit. To my surprise not all of the time and not on all channels (also a reason I do not want to get into music shifting - no guarantees). Then I thought: what if this is an alignment issue? So I tried silence in the block but only the last 1024 bytes. That worked. Of course then the problem became: what if there's more zeros than 1024? Then it needs to be 2048 bytes of silence.

And so it was implemented: an algorithm that does precisely that. Either 1024, 2048 or 3072 bytes of silence, depending on the amount of closing zeros. And if there are more than 3072 zero bytes in the block, then the whole block gets replaced by silence.
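Written out in Python, the rule described above looks roughly like this. It is a sketch of the description, not the code in Deffy: the function name is hypothetical and 0x55 is assumed as the silence byte.

```python
BLOCK_SIZE = 4096
SILENCE = 0x55


def patch_trailing_zeros(block):
    """Replace the zero-filled tail of one DSF channel block with DSD silence.

    The silence always covers a multiple of 1024 bytes, counted from the end
    of the block; with more than 3072 zero bytes the whole block is replaced.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected one 4096-byte block")
    zeros = len(block) - len(block.rstrip(b"\x00"))
    if zeros == 0:
        return bytes(block)
    if zeros > 3072:
        silence_len = BLOCK_SIZE
    else:
        silence_len = -(-zeros // 1024) * 1024   # round up to 1024, 2048, 3072
    return bytes(block[:BLOCK_SIZE - silence_len]) + bytes([SILENCE]) * silence_len


# 96 closing zero bytes still cost the last 1024 bytes of the block.
patched = patch_trailing_zeros(b"\x69" * 4000 + bytes(96))
```

Note that the rounding overwrites some non-zero bytes in front of the zeros as well, which is harmless at the quiet end of a track but not in the middle of music.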

The good news: so far it works like a charm. No more plops and clicks on the seams. And I didn't have to add an extra block of silence. It works fine without additional bytes. Of course this might be a bit of a coincidence. I'm not fully sure if it's really alignment or some other issue with some tracks or simply a matter of a minimum amount of silence required.

The bad news: we can't use this if we want to directly glue music to music (e.g. from a split file somewhere in the middle of a track), cause even with only 8 bits of 0 at the end, the algorithm would insert 1024 bytes of silence. Of course, ideally you'd never want to glue music to music and your glueing will always involve tracks that begin and end in 'silence' (not the dead silence of DSD silence, but recording silence). However, I can't exclude you might want to do that anyway, so I can't make this solution a permanent one (the music shifting solution would actually solve this problem... no need to choose - maybe somewhere in the future I'll try...). It will be an option in the Glue window. Turn it on if you want to glue regular tracks with a beginning and ending in silence. Turn it off if you do want to glue broken up pieces of music (or turn it off anyway if you want to hear what you then get).

I do have some more testing to do, mainly on a real DAC. So far I've only been testing this on a computer with the SACD stuff in Foobar. I still have to see if this also holds up on the real thing...

11 June 2024









THE TROUBLE WITH GLUEING - PART I

So, have a look at this...



You're still here!

It's a representation of a DSD bit stream (you might have guessed as much). It's from a .DSF file: the right channel of a Vivaldi track. The final 1/75th of a second or so. Isn't DSD amazing? That's music! Well, actually not... just some faint noise as the recording comes to an end... but still!

I was working on glueing .DSF files. For .DFF files that functionality already exists in Deffy, so now it was time to do the .DSF files. But I noticed quickly that the glued end result seemed more prone to clicking and plopping on the seams than glued .DFF files were. The clicking and plopping is a well known problem with DSD and has to do with the nature of those bits and the way they're interpreted by a DAC. I wrote on the homepage of Deffy, under 'About glueing', that this annoyance is not something I can fix, but in hindsight that might have been a bit of a lazy approach...

I then tried (against my own better judgment) to insert some DSD silence, but that only made things worse (I told you so). So I decided to have a look at the actual bits to see what was going on.

The above result is what I saw.

I didn't count them, but there should be 32768 (4096 times 8) characters in there.

What's striking about that bit stream is the end of it: all zeros. But DSD-silence isn't all zeros. It's 0101010101010101 etc. So I realized: this is the problem with inserting silence, that bunch of zeros. If you add silence to those zeros, the DAC doesn't recognize it as silence. And in general, if you glue DSFs together, it's those zeros causing the plops and clicks on the seams.

To get my proof I decided - as an experiment - to sacrifice that very last block (there's no music in it, Vivaldi wouldn't mind) to get rid of those zeros, and then insert silence.

Voilà! There it was... a perfect transition from one track to the next. Inserting silence works without clicks and plops, even in the middle of the track, but not after those zeros (preliminary conclusion).

So why are there these zeros if that's not silence?

Well that's thanks to Sony (sometimes I think: did they use an intern to write the DSF specs?). DSF files go per block of 4096 bytes per channel. So in a stereo file, the left channel gets 4096 bytes with bits to process and then the next block of 4096 bytes is fed to the right channel, and so on. But due to this block nature, you end up with a problem at the end. The DSF specs then state: make sure you end with a 4096 block per channel, but if you do not have any more samples left, fill the block with zeros...
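The block layout described above can be sketched in Python like this. It only shows the sample data part of a DSF file (no headers), and the function name is an assumption.

```python
BLOCK_SIZE = 4096


def dsf_sample_data(channels):
    """Interleave per-channel DSD bytes in 4096-byte blocks, zero-filling the end.

    `channels` is a list of bytes objects, one per channel (left, right, ...).
    """
    longest = max(len(ch) for ch in channels)
    block_count = -(-longest // BLOCK_SIZE)      # round up
    data = bytearray()
    for block in range(block_count):
        for ch in channels:
            chunk = ch[block * BLOCK_SIZE:(block + 1) * BLOCK_SIZE]
            data += chunk
            data += bytes(BLOCK_SIZE - len(chunk))   # the infamous zeros
    return bytes(data)


# 5000 bytes per channel: one full block each, then 904 bytes of music
# followed by 3192 zero bytes in the last block of each channel.
data = dsf_sample_data([b"\x69" * 5000, b"\x96" * 5000])
```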

Now my question: shouldn't that have been DSD silence? Fill it up with 0101010101 and not 0000000000? Alas, too late. The millions of DSFs out there are already all ending with zeros... Of course, in defense of Sony: they probably never counted on some nutcase with a Deffy, who would start glueing their DSF files together! Those zeros are perfectly fine if a DAC realizes the music stops there (it does, through the sample count field in the header of the DSF file - but that header for glued files is changed to reflect the total glued file - so the seams suddenly become part of the music and then those zeros start to interfere).

But if the music doesn't end...? Indeed...

DFF files do not have this mandatory fill with zeros. They do not go by block but by byte. One byte for left, one byte for right, one byte for left, etc. (of course you could argue that's a block too: of 8 bits. And a last byte might end in zeros too - e.g. 10000000 -, but still, from a programmer's perspective, the byte option seems more logical). When the samples end, the bytes end. I'm not sure why Sony felt the need to stuff it into blocks. Maybe they just didn't want to follow Philips? Perhaps a technical reason on the hardware side? Or did they simply follow the frame size according to the DFF rules (that one might actually have something to do with it...)? Unless they read this and feel the urge to comment we'll never know.

Anyway, since I'm never sure after just one experiment, I'll make this optional in the DSF glueing: not sacrificing the last block, but looking back into the last block and replacing those zeros at the end with silence. Then probably also add one block of silence per channel. I think that option will take care of the plops and clicks between glued .DSF files.

I'll know more when that's working...

9 June 2024

... Deffy is freeware and does not contain adware or spyware...

Deffy icon created by Freepik - Flaticon / Coloring by mymymyohmy.com
