U235 SoundEngine 0.24 released!

We are proud to release the best, most stable version of our sound engine to date! This release is more than a handful of extra features: it brings some significant improvements, and some annoying bugs have been squashed along the way.

Unfortunately, some of the improvements and fixes have required changes to how some commands are called. These have been kept to a minimum, but sometimes they simply have to happen. Both breaking changes are detailed in the changelog (available here and on the obtain page) and relate to SoundBank selection and the U235SE_modaddy/modaddy variable. The changes are small, but they will cause existing code to break, so please read and understand them before attempting to use this version.

With that bad news out of the way, what are the exciting new improvements? Well, better interaction with the Jaguar’s memory bus! The amount of sample data pre-fetched into the cache has been increased, which means fewer interruptions to bus access, as sample data is fetched in larger chunks. For small samples this could even mean keeping the DSP off the bus almost completely! AND improved playback accuracy, not just slightly, but by a factor of 10! This results in notes and fractional notes sounding much better and more in tune.

There are other improvements as always. Thanks to code provided by CJ (Reboot) demonstrating a bug that would cause module playback to stop, that bug has now been fixed, allowing module changes without the risk of the music going away.

We hope you enjoy this update and thank you for your continued use of our software.

Object list chart

I have just added a new page to the Developer section of this website. The Object Lists page only has a simple chart to aid in creating objects for the Object Processor. I did think about making it a simple news post, but didn’t want it to be harder to find than it needed to be, so it now has its own page.

Hopefully others will find this useful.

U235 SoundEngine 0.23 released!

After a 4 year break, why not go for the other extreme? So here we are, roughly a week since the last release, and quite a bit has been done too (full details in the changelog).

A significant change has been made to the way fine tunes in modules are handled. This has removed the need for a 4K LUT, and with it the impact on the system bus when working with samples that have a fine tune applied. It also gives much more accurate playback of these samples when combined with other effects.

As well as optimisations to the code to free up some extra cycles on the DSP, we have exposed a couple of SE status registers. These allow users to monitor for successful stopping of the DSP and to check the status of sample voices, so now you can look for a voice that is doing nothing and fire a sample on it, instead of overwriting an existing sample.

There is also the ability to stop a looping sample when it reaches the end of the loop, rather than immediately.

As always, it can be downloaded here

SoundEngine 0.22 released

After over 4 years! Unfortunately, most of those years were spent languishing in a variety of code repositories and not being developed, although along the way some bugs and omissions have been patched and a small amount of tidying up has been done. Nothing too ground breaking: the equates for Joysticks are correct in this release, and the DSP stop code finally works thanks to Shamus. We are hoping to get through the backlog of bugs and push another release out much sooner (although 4 years shouldn’t be hard to beat 🙂 ).

As always, get the latest version here

Minor update to U-235 Disassembler

A new version of the Disassembler has been made available, with only some minor improvements. The project that this was being developed for has been put on hold due to other commitments. However, I have pushed this to release because it resolves a reported bug in the handling of the JRISC GPU pack/unpack instructions.

If you are using this disassembler against Jaguar RISC GPU code, I would recommend you update to the latest version.

Details of other changes are in the changelog.

(Want this for a platform other than Windows? If so, please let us know)

RISC Disassembler

We have just made available a little tool that has been in the works for the last few weeks.  It’s by no means perfect but has served us well in our endeavours so far.

We are pleased to present to you, the U-235 Disassembler.  A simple disassembler that understands the Atari Jaguar RISC cores, and can even scan through a binary and locate possible locations for RISC code within it!  Handy if you are trying to find where your code has ended up in that 4MB binary.

In addition to the Jaguar RISCs there is also limited support for MIPS R3000 code. The output from this is missing a significant number of opcodes and, at this time, is unlikely to be 100% syntactically correct code, but it has already proven useful.

If you are interested in this tool it is available to download here.

Updated version of Jaguar Graphics Convertor released

Our graphics convertor tool has had a minor update to allow it to function without needing you to install additional libraries.  It still makes use of the excellent ImageMagick libraries, but these are now statically linked into the binary.

So now it can still read all of the image formats, but is all contained in the single (larger) exe.

It can be downloaded from here

GPU in Main… Science!

This can be somewhat of a taboo in the Jaguar world, and it seems to crop up every once in a while, sometimes heralded as the ultimate fix, sometimes just mentioned as an interesting quirk. The RISC CPUs in the Jag have their fair share of bugs. One of these relates to the GPU executing its code from the system’s main RAM, which is why the official line restricts it to running code out of the limited 4K of local RAM built onto the chip. Naturally no one ever abides by the manufacturer’s rules, and it was soon discovered that it is in fact possible to run code from main memory! There are a few caveats about address restrictions when it comes to jumps, but nothing too complex. It is most likely a simple cock-up that snuck past in the final design of the chip, and Atari at the time thought it easier to simply say “do not do this” rather than come up with workarounds. Needless to say, there are a few commercially released games on the Jag that actually run code from main RAM (Rayman being one of them).

Anyway, that’s all by the by. There is a lot of passion surrounding this technique and, unfortunately, the FUD that comes with passion. So I sat down and decided to try and shine some sciency light on it after all! (I may as well put that BSc Computer Science (Hons) to use I guess 😀 )

So here are a few facts:

  1. The Atari Jaguar has a single shared bus between all of its devices and the main memory
  2. Main memory is 2MB of DRAM (120ns)
  3. The local RAM on the RISC devices is SRAM and has its own local bus to the RISC core, 32 bits wide
  4. If the GPU is accessing main RAM it is tying up the bus, so unless a higher priority device comes along and nabs it, the GPU has the bus and nothing else gets to play with the main RAM.

What does this mean performance wise? Well, DRAM is significantly slower than SRAM and requires regular refreshing. So instruction reads are going to be slower, and that is assuming nothing else has the bus (there are 4 other devices that could grab it or want it).

The performance aspects, however, always seem to be overlooked. Some rules suggest avoiding “tight loops” in main RAM, but to be honest this is irrelevant, as everything you run will take longer. To prove this point (here comes the science) I have crafted a simple little piece of code.

My aim is to accurately time the GPU running code in local RAM and then the exact same code in main RAM. To do this I am using the programmable timers available within the Jag: once set, the JPIT counters decrement based on ticks from the system clock (~25MHz). The idea is simple:

  1. Set up a counter
  2. Read the counter’s value at the start
  3. Do some busy work (taking care not to access registers in a way that causes a pipeline stall)
  4. Read the counter’s value at the end
  5. Save both counter values and subtract one from the other

The final value will be the number of ticks of the counter to complete the busy work.
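The subtraction in the last step can be sketched off-target in a few lines of Python. This is only a minimal sketch: it assumes the counter is read as a 16-bit value (the test code uses loadw) and that it wraps through the full 16-bit range when it passes zero. If the timer instead reloads from a programmed value, the modulus would need to be that period. The name elapsed_ticks is made up for this illustration.

```python
COUNTER_BITS = 16                      # assumption: 16-bit reading, as fetched by loadw
COUNTER_MODULUS = 1 << COUNTER_BITS    # assumption: counter wraps through all 65536 values


def elapsed_ticks(start: int, end: int) -> int:
    """Ticks between two readings of a free-running down counter.

    The counter decrements, so the start reading is normally the larger
    one. Taking the difference modulo 2**16 still gives the right answer
    if the counter wrapped past zero once between the two readings.
    """
    return (start - end) % COUNTER_MODULUS


print(elapsed_ticks(0x0100, 0x0040))   # no wrap between the readings
print(elapsed_ticks(0x0010, 0xFFF0))   # counter wrapped past zero once
```

The modulo only helps with a single wrap, so the busy work has to be short enough to finish well within one full pass of the counter.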

To remove any question about loops etc I made a nice simple flat piece of code for the testing:

gpucode:
.GPU
.ORG G_RAM

    movei    #$F10036,r0     ; The JPIT Readable counter
    movei    #startval,r1    ; where we are going to store our start counter
    movei    #endval,r2      ; where we are going to store our end counter
    moveq    #0,r3           ; start counter value reg
    moveq    #0,r4           ; end counter value reg

    ; get the current counter value
    loadw    (r0),r3         ; save this in the start counter reg

    ; now for some busy work
    rept    400           ; 400 repetitions
        moveq    #4,r10
        move    r12,r13
        moveq    #6,r11
        move    r14,r15
    endr

    ; get the counter now
    loadw    (r0),r4
    nop
    nop
    nop

    ; save our counters
    store    r3,(r1)
    store    r4,(r2)

    ; lots of pointless faffing just to make sure the writes have completed
    nop
    nop
    nop
    nop

    ; change the screen colour so we know we have finished faffing
    movei    #BG,r20
    movei    #$4400000,r21
    nop
    nop
    store    r21,(r20)

    moveq    #0,r5        ; stop the GPU
    movei    #G_CTRL,r6
    nop
    nop
    store    r5,(r6)
    nop
    nop
    nop

As you can see, nothing amazingly complex. The busy work performs no memory reads or writes; these are pure and simple register instructions which should each complete in a single operation. The results from this little test are quite telling, but not really surprising:

I ran the test 3 times for each case. The values output are the hex values of the timer; as I simply reset the Jaguar with the jcp -r command, the JPIT counter doesn’t actually reset but carries on regardless! (I didn’t know that until now! learning! isn’t science great! 😀 ) This is why the values move around, but the interesting part is the difference between the two values, which represents how long it took to complete our 1600 lines of code (4*400). So first up, running the code in local RAM on the chip:

$d44c – $cae2 = $96a = 2410
$988c – $8f4f = $93d = 2365
$8d48 – $83e6 = $962 = 2402

Average of about 2392 ticks to complete 1600 instructions

And now EXACTLY the same code in Main RAM

$f519 – $a335 = $51e4 = 20964
$7567 – $23a5 = $51c2 = 20930
$86e9 – $3519 = $51d0 = 20944

Average of about 20946!!

That is more than 8 times slower!! And these instructions don’t really do ANYTHING! And this is on a system where the only other thing running is the 68K, which is sat patiently waiting for results to appear. If additional padding nops were added to the code to make jumps work, or there were instructions that actually accessed other areas of main RAM, or perhaps even WROTE to main RAM.. well, things are going to get slower and, I dare say, more messy as the RAM page is flipped back and forth..
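For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the averages, the cost per instruction and the slowdown straight from the raw counter readings listed above. The instruction count is taken from the rept block (4 instructions, 400 repetitions); the 16-bit modulo and the helper name average_ticks are assumptions made for this sketch, not part of the original test.

```python
# Raw (start, end) readings of the down counter, copied from the results above.
LOCAL_RAM = [(0xD44C, 0xCAE2), (0x988C, 0x8F4F), (0x8D48, 0x83E6)]
MAIN_RAM = [(0xF519, 0xA335), (0x7567, 0x23A5), (0x86E9, 0x3519)]

# The rept block emits 4 instructions, 400 times over.
INSTRUCTIONS = 4 * 400


def average_ticks(readings):
    """Mean elapsed ticks over a list of (start, end) counter readings."""
    # Assumption: 16-bit counter, so differences are taken modulo 0x10000.
    deltas = [(start - end) % 0x10000 for start, end in readings]
    return sum(deltas) / len(deltas)


local_avg = average_ticks(LOCAL_RAM)
main_avg = average_ticks(MAIN_RAM)

print(f"local RAM: {local_avg:.0f} ticks, {local_avg / INSTRUCTIONS:.2f} per instruction")
print(f"main RAM:  {main_avg:.0f} ticks, {main_avg / INSTRUCTIONS:.2f} per instruction")
print(f"slowdown:  {main_avg / local_avg:.2f}x")
```

Working from the raw readings rather than the hand-converted decimals keeps any slip in the hex conversion out of the averages; the slowdown comes out between 8 and 9 times.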

So my verdict.. run it in local RAM, people. There may be some situations where it is necessary to run in main; I would view these as the edge cases, the minority. It should be possible to run pretty much everything in local: a bit of thought and some paging of code, if required, should be all that’s needed to keep your GPU code running in a tip-toppety fashion.

Hopefully people will find this an informative and useful read. At the end of the day this is a hobby. If you want to run your code in main, go for it! Have fun! Enjoy what you are doing! Just don’t expect it to be the most snappy code.

SoundEngine 0.21 released

A much smaller update than originally planned: I was just too excited to get the new pad reading release out there, along with the updated manual.

As well as the release of this version I have also updated the website to include a list of known and resolved bugs.  If you find a bug that’s not already listed here, please let me know.

As always the latest version can be downloaded here

Enjoy