Experimental very fast tape loading

Anything related to the tools Tap2Wav, Tap2CD, Tap2Dsk, Sedoric Disc Manager, Tape Header Creator, WriteDsk, and, generally speaking, tools related to the management of Oric data files and devices.
Dbug
Site Admin
Posts: 4444
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Experimental very fast tape loading

Post by Dbug »

NekoNoNiaow wrote: Sat Mar 31, 2018 7:03 am Huhu.
Don't worry, I do the same. Just ask DBug about the length of my emails/posts. :D
I don't think I need to say anything, people can read what you wrote on the forum already :lol:
Symoon
Archivist
Posts: 2307
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Experimental very fast tape loading

Post by Symoon »

Ok, the problem is more complicated than I thought.
Actually, some Atmos run slower than others. Don't ask me why but I can clearly see it with the latest additions to the code:
- when decoding a byte that is in a dictionary, the Oric runs the longest loop in the code (something like 100 cycles)
- if four bytes consecutively have to be read using this loop, it runs fine on two of my Atmos, but fails on a third one

The symptom is that the 3rd Atmos misses a sinusoid. If I put some neutral (stop bit) delay between those bytes, the problem vanishes.

From there, I have several options:
- knowing that the loop will be even longer at the end of a filled RAM page (4 more cycles to go to the next page), I can try optimising the loop but I don't think I can save enough cycles and only have two spare bytes to code ( :lol: )
- I could add an option for "old Atmos" that will generate a slower signal for those bytes (adding options is confusing, the user will not know if his Atmos is "old" or not, or if it fails because of an audio problem)
- I will have to slow down the speed to try being compatible with more Atmos
- old Atmos can die, it won't work with them !

:?
NekoNoNiaow
Flight Lieutenant
Posts: 272
Joined: Sun Jan 15, 2006 10:08 pm
Location: Montreal, Canadia

Re: Experimental very fast tape loading

Post by NekoNoNiaow »

Interesting, any idea why this Atmos is slower?

I am asking because if this can be traced back to a particular hardware difference (like a different version of the video chip which steals more cycles from the CPU) maybe it can be detected at run time in order to change the parameters of the routine for this particular machine.

From what you say about the code, you may not have much to tweak but one never knows. Once you publish the code I am sure some kittens will find ways to optimize it even further. ;)
Symoon wrote: Sat Mar 31, 2018 11:37 am From there, I have several options:
- knowing that the loop will be even longer at the end of a filled RAM page (4 more cycles to go to the next page), I can try optimising the loop but I don't think I can save enough cycles and only have two spare bytes to code ( :lol: )
- I could add an option for "old Atmos" that will generate a slower signal for those bytes (adding options is confusing, the user will not know if his Atmos is "old" or not, or if it fails because of an audio problem)
- I will have to slow down the speed to try being compatible with more Atmos
- old Atmos can die, it won't work with them !
I find that asking the question from the point of view of user experience is often worth it.
What would users prefer as an experience?
Very slightly (probably imperceptible) slower loading but full compatibility with their Atmos, "It just works".
Or full speed loading that might randomly fail on their Atmos, in which case they would have to figure out whether it is because the tape is corrupted or because of their Atmos model, and then test with the "slower speed" tape.
Dbug wrote: Sat Mar 31, 2018 7:27 am I don't think I need to say anything, people can read what you wrote on the forum already :lol:
:lol: I am my own worst enemy. ;)
Dbug
Site Admin
Posts: 4444
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Experimental very fast tape loading

Post by Dbug »

Considering that the encoded data runs at a constant speed, generated (I guess) assuming a standard 6502 running at 1 MHz (which in theory means 1 million clock cycles per second), and considering that the actual 1 MHz is just an approximation derived from a quartz crystal, some machines do run faster than others. Since all the components in a machine use the same derived clock, we normally never see that as a problem.

That's my theory at least (and that's why in the thread about merging multiple oric outputs, the first step was to have them all use the same clock)
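
To put rough numbers on that theory, here is a minimal Python sketch. It assumes the tape signal was generated for a nominal 1 MHz CPU and asks how much longer a loop of about 100 cycles (the figure Symoon quoted) takes in real time when the crystal runs slightly slow. The function names and the deviation values are hypothetical illustrations, not measurements of any Atmos.

```python
NOMINAL_HZ = 1_000_000  # nominal 6502 clock assumed by the encoder


def loop_duration_us(cycles, clock_hz):
    """Wall-clock time, in microseconds, taken by `cycles` CPU cycles."""
    return cycles * 1_000_000 / clock_hz


def extra_time_us(cycles, deviation):
    """Extra microseconds needed when the clock runs `deviation` slower
    than nominal (0.01 means 1 % slower)."""
    slow_hz = NOMINAL_HZ * (1 - deviation)
    return loop_duration_us(cycles, slow_hz) - loop_duration_us(cycles, NOMINAL_HZ)


if __name__ == "__main__":
    # Hypothetical deviations, chosen only to show the order of magnitude
    for deviation in (0.001, 0.01, 0.02):
        extra = extra_time_us(100, deviation)
        print(f"{deviation:.1%} slower: +{extra:.2f} us per 100-cycle loop")
```

Under these assumptions a clock that is 1 % slow costs roughly one cycle's worth of time per 100-cycle loop, so a decoder with only a few cycles of margin could plausibly fall behind after several long loops in a row.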
Symoon
Archivist
Posts: 2307
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Experimental very fast tape loading

Post by Symoon »

NekoNoNiaow wrote: Sat Mar 31, 2018 8:05 pm Interesting, any idea why this Atmos is slower?
Nope, but this Atmos was modified by the previous owner, so I guess it might affect something. I also noticed that the key sound is lower.
I don't dare open it since there are several switches (one to reboot, another to switch from ROM 1.1 to ROM 1.0) and I wouldn't like to damage the wires.
NekoNoNiaow wrote: Sat Mar 31, 2018 8:05 pm I find that asking the question from the point of view of user experience is often worth it.
What would users prefer as an experience?
Very slightly (probably imperceptible) slower loading but full compatibility with their Atmos, "It just works".
Or full speed loading that might randomly fail on their Atmos, in which case they would have to figure out whether it is because the tape is corrupted or because of their Atmos model, and then test with the "slower speed" tape.
You're probably right. I'll have to convince myself, since in my mind so far, TAP2CD was the "fastest safe" option; I first wanted to push the limits as far as I could, even if only in "laboratory conditions" ;)
I'm currently between 2 and 4 times faster than TAP2CD. Slowing down could mean up to a 25% speed loss, for an unknown number of machines :-/

Anyway I will first try to salvage my code; my 1st idea is to try removing a JSR / RTS, which could save the situation...
(not that I don't want to share the code, but I don't like the idea of publishing unfinished code as long as I still have ideas to try ;) )
NekoNoNiaow
Flight Lieutenant
Posts: 272
Joined: Sun Jan 15, 2006 10:08 pm
Location: Montreal, Canadia

Re: Experimental very fast tape loading

Post by NekoNoNiaow »

Symoon wrote: Sat Mar 31, 2018 8:44 pm Nope, but this Atmos was modified by the previous owner, so I guess it might affect something. I also noticed that the key sound is lower.
I don't dare open it since there are several switches (one to reboot, another to switch from ROM 1.1 to ROM 1.0) and I wouldn't like to damage the wires.
Oh, if these modifications indeed do affect the speed as you indicate, then that would be worth knowing because that would mean you can safely keep using your current setup since real Atmos capturable in the wild would run fine with it.

That said, from an electronics standpoint, neither of these two mods would explain why the machine is slower, since they should both be passive.
It is puzzling though that the sound volume is lower, maybe there are other modifications on this machine which would deserve investigation.

DBug might be right that natural quartz frequency variations might push the machine too close to the edge where your system stops working but it would be interesting to run other timing tests to see if this affects other aspects of your machine (like timers/interrupts). Maybe you could run some of the timing tests used to validate emulators. ;)
Symoon wrote: Sat Mar 31, 2018 8:44 pm You're probably right. I'll have to convince myself, since in my mind so far, TAP2CD was the "fastest safe" option; I first wanted to push the limits as far as I could, even if only in "laboratory conditions" ;)
I'm currently between 2 and 4 times faster than TAP2CD. Slowing down could mean up to a 25% speed loss, for an unknown number of machines :-/
Well, as long as you are in the validation phase, you can still send the program out for testing by a larger number of people and see what the results are.
If your machine is the only one which suffers from the issue then you would be good to go. ;)
Symoon wrote: Sat Mar 31, 2018 8:44 pm Anyway I will first try to salvage my code; my 1st idea is to try removing a JSR / RTS, which could save the situation...
(not that I don't want to share the code, but I don't like the idea of publishing unfinished code as long as I still have ideas to try ;) )
I understand.
I cannot help too much with 6502 ASM yet but if you are stuck you could still release just part of the interrupt code to see if people have ideas to make it faster. If they do, then you could try it and continue experimenting without needing to do a full monty. ;)
Symoon
Archivist
Posts: 2307
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Experimental very fast tape loading

Post by Symoon »

I will run a few tests on other machines today (I have 4 or 5 more to test, which I can only do on some weekends), but I think I'll have the problem on other machines. I also still have to set the threshold values correctly - I kept modifying them, not understanding that the problem was elsewhere... So it's a real mess now.
I also noticed that the previous tests I asked for on forums gave rather constant results... But when loading a longer program on my machines, it wasn't so clear... Then again, I think the threshold and Atmos speed problems interacted.
So now that I have this in mind, I'm busy trying to find a more visual and global way to look at the problem:
[Attached image: seuils.png]
NekoNoNiaow wrote: Sun Apr 01, 2018 5:33 am I cannot help too much with 6502 ASM yet but if you are stuck you could still release just part of the interrupt code to see if people have ideas to make it faster. If they do, then you could try it and continue experimenting without needing to do a full monty. ;)
Good idea ;) Here we go:
The Oric waits for the interrupt in an infinite loop (this is Fabrice's TAP2CD idea!). This is mandatory for precision: the loop only lasts 3 cycles, and if it lasted longer there would not be enough precision to separate the different sinusoids on all machines.
It means that after the interrupt, I need to go back to the code after the infinite loop, which requires a trick that costs time.

Code:

(...)
0460   2     38        SEC
0461   2/3   B0 FE     BCS -2     infinite loop (waiting for interrupt)
(...)

Interrupt code:
04D0   4     AE 00 03  LDX 0300   Reset flag on CB1
04D3   4     AE 08 03  LDX 0308   Read timer (sinusoid duration) in X
04D6   4     8E 09 03  STX 0309   Reset timer counter (writing in #309 sets #308 with #F5 once instruction executed)
04D9   4     28        PLP        Get the system flags saved by the interrupt
04DA   2     18        CLC        Set C to 0 to leave the loop
04DB   3     08        PHP        Save the system flags
04DC   6     40        RTI        Back to the loop
Note: the 2nd column is the cycle cost.
If anyone has an idea to go faster, I'd be more than happy :)
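
As a quick sanity check on the cycle budget, the counts in the interrupt listing above can be tallied with a small Python sketch. The variable and function names are made up for illustration; the cycle values are copied from the listing's second column.

```python
# (mnemonic, cycles) pairs copied from the interrupt listing
HANDLER = [
    ("LDX 0300", 4),  # reset CB1 flag
    ("LDX 0308", 4),  # read timer
    ("STX 0309", 4),  # reset timer counter
    ("PLP", 4),
    ("CLC", 2),
    ("PHP", 3),
    ("RTI", 6),
]


def total_cycles(instructions):
    """Sum the cycle cost of a list of (mnemonic, cycles) pairs."""
    return sum(cycles for _, cycles in instructions)


handler_total = total_cycles(HANDLER)   # whole handler
exit_tail = total_cycles(HANDLER[3:])   # PLP/CLC/PHP/RTI only

if __name__ == "__main__":
    print("handler:", handler_total, "cycles")
    print("PLP/CLC/PHP/RTI:", exit_tail, "cycles")
```

This tally leaves out the 6502's own 7-cycle interrupt entry sequence, whatever vectoring happens before 04D0 is reached, and the final pass through the BCS, so the real cost per sinusoid is higher than the handler total alone.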
Dbug
Site Admin
Posts: 4444
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Experimental very fast tape loading

Post by Dbug »

If the idea is to get out of the loop as fast as possible, instead of PLP/CLC/PHP you could use code that takes about the same amount of time to patch the program counter return address stored on the stack, adding +2 to it.

Since you have full control over the location of your code and the initial stack value, you know exactly which byte of the stack you can patch to achieve that.

The advantage is that instead of having RTI bring you back at the start of the BCS to detect that now the carry is cleared and then exit, you directly return after the BCS itself, which gives you a 3 clock cycles advantage.

Also, if you can afford to have A or Y containing a value before your waiting loop, the entire address change code becomes a simple STA or STY before the RTI, which takes only 4 clock cycles (instead of the 9 cycles taken by PLP/CLC/PHP), which means you ultimately exit your waiting loop 8 clock cycles earlier than with your current code.

Admittedly, it's ugly :D

EDIT: You can maybe also move the CB1 reset (first LDX) later in the code so you can read the timer sine value 4 cycles earlier, that should not impact the IRQ behavior.
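
The arithmetic behind that estimate can be sketched in Python. This is only an illustration under assumptions: the stack byte is patched with a single absolute-addressed store, and the names below are hypothetical. The cycle costs are the standard 6502 figures.

```python
# Standard 6502 cycle costs for the instructions involved
PLP, CLC, PHP = 4, 2, 3
STA_ABS = 4         # STA/STY to an absolute address, e.g. a byte in the stack page
BCS_NOT_TAKEN = 2   # branch falls through
BCS_TAKEN = 3       # branch taken, same page


def cycles_saved(skipped_branch_cost):
    """Cycles gained by replacing PLP/CLC/PHP with one store and
    returning directly after the BCS instead of re-executing it."""
    flag_trick = PLP + CLC + PHP  # 9 cycles in the current code
    return (flag_trick - STA_ABS) + skipped_branch_cost


if __name__ == "__main__":
    print("counting the BCS as 3 cycles:", cycles_saved(BCS_TAKEN))
    print("counting the BCS as 2 cycles:", cycles_saved(BCS_NOT_TAKEN))
```

Counting the skipped BCS as 3 cycles reproduces the 8-cycle figure. Since the current code returns to the BCS with the carry cleared, the branch falls through, so the 2-cycle cost (7 cycles saved in total) may be the more accurate number.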
Symoon
Archivist
Posts: 2307
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Experimental very fast tape loading

Post by Symoon »

Dbug wrote: Sun Apr 01, 2018 8:10 am If the idea is to get out of the loop as fast as possible, instead of PLP/CLC/PHP you could use code that takes about the same amount of time to patch the program counter return address stored on the stack, adding +2 to it.
(...)
The advantage is that instead of having RTI bring you back at the start of the BCS to detect that now the carry is cleared and then exit, you directly return after the BCS itself, which gives you a 3 clock cycles advantage.
Is it a 3-cycle or a 2-cycle advantage, since I'm exiting from the BCS?
I think I tried that, but it took more time, or the same time with more bytes ;) Here the PLP/CLC/PHP/RTI is 15 cycles, and I couldn't find a shorter sequence...

Problem here is that:
- sadly registers A and Y must remain unaffected (or restored)
- I can't use many more bytes (I almost fill the page), but let's forget that for now

Oh BTW it's not just the interrupt that has to be faster, that would be too easy :D It's the whole loop decoding a byte (about 100 cycles + interrupt time). I need to save about 6 cycles I think.
Dbug
Site Admin
Posts: 4444
Joined: Fri Jan 06, 2006 10:00 pm
Location: Oslo, Norway

Re: Experimental very fast tape loading

Post by Dbug »

And that's why having the entire code would be handy ;)
Symoon
Archivist
Posts: 2307
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Experimental very fast tape loading

Post by Symoon »

You're right Dbug, it's just that the code is so messy, with comments that are in French or outdated, that it would take a while to clean it up into a proper presentation for everyone. For now, I'll use my spare time to try alternatives first.
That being said, I can send the rough code without comments at all if you guys wish me to.

I think I found something. I'm calling (JSR) a ROM routine at E56C, which is only something like 18 bytes long.
So I have to find 15 bytes and I can save 12 cycles per loop (JSR/RTS), which I hope would save the thing ;)
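
For reference, that trade-off works out as follows in a small Python sketch. It takes the approximate 18-byte figure at face value; the function name is hypothetical, while the JSR and RTS costs are the standard 6502 values.

```python
JSR_CYCLES = 6  # JSR absolute
RTS_CYCLES = 6
JSR_BYTES = 3   # opcode + 16-bit address


def inline_tradeoff(routine_bytes):
    """Return (cycles saved per call, extra bytes needed) when a routine
    is copied inline in place of the JSR that called it."""
    cycles_saved = JSR_CYCLES + RTS_CYCLES
    extra_bytes = routine_bytes - JSR_BYTES
    return cycles_saved, extra_bytes


if __name__ == "__main__":
    saved, extra = inline_tradeoff(18)
    print(f"saves {saved} cycles per loop, costs {extra} extra bytes")
```

If the 18 bytes include the routine's final RTS, the inline copy could drop that byte as well, so the real cost might be slightly lower.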
Trying...
Symoon
Archivist
Posts: 2307
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Experimental very fast tape loading

Post by Symoon »

Anyone willing to do a test?
Haven't tested it on my "slow" machine yet, but it works on 3 other Atmos and fails on one (which already failed almost all the previous tests).

Just do HIRES first, then CLOAD"" :)

[Attachment: oricium_hires_test-file.zip]
Symoon
Archivist
Posts: 2307
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Experimental very fast tape loading

Post by Symoon »

Tested and working on my "slow" Atmos! :mrgreen:
ibisum
Wing Commander
Posts: 1646
Joined: Fri Apr 03, 2009 8:56 am
Location: Vienna, Austria

Re: Experimental very fast tape loading

Post by ibisum »

Wow! That is very fast indeed. What we need now: a time machine, to go back and demo this to us as 13-year-olds. ;)
Symoon
Archivist
Posts: 2307
Joined: Sat Jan 14, 2006 12:44 am
Location: Paris, France

Re: Experimental very fast tape loading

Post by Symoon »

ibisum wrote: Mon Apr 02, 2018 3:05 pm Wow! That is very fast indeed. What we need now: a time machine, to go back and demo this to us as 13-year-olds. ;)
Lol, we would have to bring our computers with us, WAV players didn't exist back then, did they? ;)

I recall that as a child, my brothers and I would start the Xenon-1 tape in SLOW mode before going to lunch, so it was loaded when we came back (about half an hour of loading IIRC)