When I check back the code of 4KKong and Cyclotron, I notice that many things could have been improved, specially in the category of parameter passing.
Typically, routines are called in some similar way:
Code: Select all
lda #value1
ldx #value2
ldy #value3
jsr SomeRoutine
I was thinking that since the routine knows in wich order parameters are supposed to be passed, that we could do some trickery and avoid the specification of registers, like this:
Code: Select all
jsr SomeRoutine
.byte value1,value2,value3
For such a method to be efficient, the size taken by this stack correction code needs to be smaller that the sum of bytes we don't waste at each function call.
In the minigames, typical routines that are called very often are things like text display routines, memory copy, graphical primitives, sound replay things, keyboard testing, ... a way to use less memory would be to actually also not have to use a 3 bytes long absolute JSR to reach these routines. So came to me the idea that perhaps we could use BRK for such a purpose.
Let's imagine we have a table of adresses of all routines we want to use with this optimised parameter calling method, and of the number of parameters the routine needs, we could get something like this:
Code: Select all
#define ROUTINE_CLEAR_SCREEN 0
#define ROUTINE_SWITCH_TEXT 1
#define ROUTINE_SWITCH_HIRES 2
#define ROUTINE_DISPLAY_TEXT 3
#define ROUTINE_READ_KEYBOARD 4
TableRoutineAdressLow
.byte <ClearScreen
.byte <SwitchToText
.byte <SwitchToHires
.byte <DisplayText
.byte <ReadKeyboard
TableRoutineAdressHigh
.byte >ClearScreen
.byte >SwitchToText
.byte >SwitchToHires
.byte >DisplayText
.byte >ReadKeyboard
TableRoutineParamCount
.byte 0
.byte 0
.byte 0
.byte 2 ; Message index, screen position index
.byte 0
Code: Select all
; Display text
brk
.byte ROUTINE_DISPLAY_TEXT
.byte MESSAGE_HELLO_WORLD,POSITION_STATUS_LINE
I'm not sure of how compact or efficient this would be, and I'm not very sure of how to actually make this BRK code handling routine...
So let's do a bit of maths, to compare the efficiency of normal code and of this 'optimized' code. Let's define some values:
- ROUTS the number of vectorized routines
- CALLS the number of times functions are called.
- PARAMS the number of parameters passed during the various calls
In Cyclotron, I have many calls to the following routines:
- 2x KeyboardGetChar (0 parameter)
- 3x DrawDisc (1 parameter)
- 4x KeyboardDebounce (0 parameter)
- 4x IncrementScore (0 parameter)
- 4x Temporize (0 parameter)
- 6x EraseMemory (1 parameter)
- 7x Arena_DrawRectangle (1 parameter)
- 9x Display_Message (1 parameter)
There are some others, but let's say that for the sake of discussion that I just vectorized these ones. This gives us the following:
- ROUTS=8 (because we have 8 routines).
- CALLS =39 (because it's the result of 2+3+4+4+4+6+7+9)
- PARAMS=25 (because it's what (3x1)+(7x1)+(9x1) is equal to)
With the old routine, the total memory consumption is simple to compute, since this is just the sum of all JSR and LDA/LDX/LDY used when calling the functions:
(CALLS*3)+(PARAMS*2)
=(39*3)+(25*2)
=117+50
=167 bytes
With the new routine, the cost of function calls and parameters storing is equal to one less byte for each JSR, and one less byte for each LDA/LDX/LDY used when calling the functions, plus the one time cost of the tables defining the functions, plus the size of the BRK handling code:
(CALLS*2)+PARAMS+(ROUTS*3)+BRKCODE
=(39*2)+25+8+BRKCODE
=78+25+8+BRKCODE
=111+BRKCODE
If we want (in this example code) to actually get a memory decrease, the BRK handling code has to be less than 56 bytes long.
Now, in the whole code of Cyclotron I had 133 jsr in total, some being repeated, some other not. Speculating that these 94 additional function calls are all going to different functions (worse case for the new routine because it increases the parameter table), and have no parameters either (worse case because there is no win on parameters passing), this would add to the original code 94*3=282 bytes, while it would add to the new routine 94*2+94*3=470 bytes, which I admit is not pretty.
In order to get a good candidate for vectorization, a routine needs to be called a minimum number of time. This number depends of how many parameters the routine has:
The cost for the standard routine is: CALLS*(3+PARAMS*2)
The cost for the BRK routine is: CALLS*(2+PARAMS)+3
By comparing the cost for the various number of parameters, we can find the number of calls under which the new routine is more efficient:
- For 0 parameters it need to be called at least 4 times.
- For 1 parameter it need to be called at least 3 times.
- For 2 parameters it need to be called at least 2 times.
- For 3 parameters it need to be called at least 1 time (but then it takes as much room as a standard call, benefits arrive with the second call)
So, does it looks like something doable, or does it looks like very painfull idea without much room for positive points ? And how would be this BRK handling routine coded on the Oric ?
Yuck, looks like a decently long post, thanks for reading