Actually, I was thinking of something a little more radical.

thrust26 wrote:
Steve Judd has developed a quite brilliant routine (C=Hacking, issue 10) for exactly this, which works without any tables in the inner loop.

JamesD wrote:
I think the only significant improvement will be in devising a way to calculate/write all pixels for a byte in one shot, possibly reducing the number of OR operations with screen memory.

It saves a lot of time for lines where dX > 2 * dY on the C64 and the Atari 2600 (~33% saving there). Maybe it can somehow be adapted for the Oric?

Code:
    initialize x, dx, etc.
    xold = x
    take a step in x: LSR X
    have we hit the end of a column? If so, then plot and check on y
    is it time to take a step in y?
    if not, take another step in x
    if it is, then let a = x EOR xold
                   plot a into the buffer
                   let xold = x
    keep on going until we're finished
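To make the EOR step in that pseudocode concrete, here is a minimal Python sketch. It assumes x is held as a mask with every bit set from the current pixel to the right edge of the byte (for example %00111111), which is my reading of the routine rather than something stated in the quote. The function name `eor_chunk` is made up for illustration.

```python
def eor_chunk(xold, steps):
    """Take `steps` steps in x (one LSR each), then EOR with the old mask.

    Assumption: xold has all bits set from the current pixel to the right
    edge of the byte, e.g. 0b00111111.
    """
    x = xold
    for _ in range(steps):
        x >>= 1              # LSR X: one step to the right
    chunk = x ^ xold         # only the pixels passed over stay set
    return chunk, x


chunk, x = eor_chunk(0b00111111, 2)
print(f"{chunk:08b} {x:08b}")
```

The bits that differ between the old and new masks are exactly the pixels of the horizontal run, so one EOR yields the whole chunk to plot.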
If you have the starting X, you can load a mask from a table with all bits set from that pixel to the end of the byte.
Then loop without the LSR until it's time to step in Y.
Use the new X for a second table lookup, a mask with all the pixels set up to and including the last pixel, and AND that with the first mask.
Where the two bitmasks overlap, you have set exactly the pixels you need to write to the screen for that byte.
Which is faster comes down to up to 8 LSRs (or 6 on the Oric) versus 2 table lookups.
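A rough sketch of the two-table idea in Python, just to show the masks. The names `build_masks` and `span_mask` are made up for illustration, and I'm assuming pixel 0 is the leftmost (highest) pixel bit. Only the pixel bits are modelled; any extra non-pixel bits in an Oric screen byte are ignored here.

```python
def build_masks(pixels_per_byte):
    """Return (from_start, up_to_end) mask tables for one screen byte."""
    full = (1 << pixels_per_byte) - 1
    # from_start[i]: pixel i and everything to its right
    from_start = [full >> i for i in range(pixels_per_byte)]
    # up_to_end[j]: pixel j and everything to its left
    up_to_end = [(full << (pixels_per_byte - 1 - j)) & full
                 for j in range(pixels_per_byte)]
    return from_start, up_to_end


def span_mask(first, last, tables):
    """Pixels first..last inclusive: AND of the two table lookups."""
    from_start, up_to_end = tables
    return from_start[first] & up_to_end[last]


C64_TABLES = build_masks(8)    # 8 pixels per byte
ORIC_TABLES = build_masks(6)   # 6 pixels per byte

print(f"{span_mask(2, 3, C64_TABLES):08b}")
print(f"{span_mask(1, 4, ORIC_TABLES):06b}")
```

On the 6502 each table would be 8 (or 6) bytes, so the cost per run is two indexed loads and an AND regardless of how long the run is.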
However, I was thinking there must be a way to skip the inner loop mathematically (hopefully simple math) to save clock cycles.
I need to sit down and play with it but I think it's possible.
That is what I was referring to.
So instead of the inner loop you would:
lookup1
math (table???)
AND lookup2
write to screen.
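One possible way the "math" step could look, sketched in Python. This is only a guess at what could replace the inner loop, not a worked-out routine: it assumes a standard error-accumulator line stepper with 0 < dy <= dx, and works out how many x steps fit before the next y step with a single division. The division is the part that would presumably need a table on a 6502. Both function names are made up, and clamping the run at the byte boundary is left out.

```python
def runs_by_stepping(dx, dy):
    """Reference: per-pixel loop, returns the length of each horizontal run."""
    runs, run, err = [], 0, dx // 2
    for _ in range(dx):
        run += 1             # step in x
        err -= dy
        if err < 0:          # time to step in y
            err += dx
            runs.append(run)
            run = 0
    if run:                  # trailing run with no y step after it
        runs.append(run)
    return runs


def runs_by_math(dx, dy):
    """Same runs, but each run length is computed directly, no inner loop."""
    runs, err, remaining = [], dx // 2, dx
    while remaining > 0:
        n = err // dy + 1            # smallest n with err - n*dy < 0
        if n > remaining:            # line ends before the next y step
            runs.append(remaining)
            break
        err += dx - n * dy           # apply n x steps and one y step
        remaining -= n
        runs.append(n)
    return runs


print(runs_by_math(10, 3))
```

Each run length would then feed the second mask lookup (last pixel = first pixel + run length - 1), with the column test done once per run instead of once per pixel.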
<edit>
Correction to my earlier edit.
If I can come up with a way to replace the loop with math/table logic, I can move the column test to the bottom of that sequence. That means it's not just eliminating the LSR; it also cuts the test/branch combo out of the loop.