Basic TAP file format

jtiai · Post by **jtiai** » Mon Aug 26, 2019 8:32 pm

Hi all,

I'm trying to replicate bas2tap (which is written in quite terrible C++), and since I'm a Python programmer, I'm writing it in Python, of course.

The only thing I really don't understand is a few bytes in the tokenized BASIC line.

The first word (2 bytes) is a starting address, but the starting address of what? In the bas2tap source code it's counted from 0x501, but when I check a file saved from Oricutron with a hex editor, the starting address for the first line is always 0x50F. And I never managed to match the start address of the next line to anything sensible. How should this value be calculated?

As I understand it, the next word (2 bytes) is the line number, followed by the actual tokenized line, ending with a single 00 byte.

christian · Post by **christian** » Mon Aug 26, 2019 9:34 pm

Hi,

Basic:

Code:

10 REM LIGNE 10
20 PRINT "TEST"

Tokenized lines:

Code:

          +-----+----> Address for next line ($0510)
          |   | +----+----> Line number (10)
          |   | |   | +----> Token for REM
          |   | |   | |  +------------------------+----> LIGNE 10
          |   | |   | |  |                        |  +----> End Of Line
00000501  10 05 0a 00 9d 20 4c 49  47 4e 45 20 31 30 00       |..... LIGNE 10.|


          +--+----> Address for next line ($051d)
          |   | +----+----> Line number (20)
          |   | |   | +----> Token for PRINT
          |   | |   | |  +-------------------+----> "TEST"
          |   | |   | |  |                   | +----> End Of Line
00000510  1d 05 14 00 ba 20 22 54  45 53 54 22 00                |..... "TEST".|

           +---+---> End of Program
0000051d   00 00
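
To make the layout above concrete, here is a minimal Python sketch of how the "address for next line" values could be computed. It assumes the program sits at $0501 as in the dump; `build_program` and the shape of its argument are made up for this illustration, and the line bodies are assumed to be tokenized already.

```python
import struct

BASIC_START = 0x0501  # address of the first line in the dump above


def build_program(lines, start=BASIC_START):
    """Lay out already-tokenized lines the way they appear in the dump.

    `lines` is a list of (line_number, body) pairs; body is the tokenized
    text without its trailing 00 byte.
    """
    out = bytearray()
    addr = start
    for number, body in lines:
        # 2 bytes link + 2 bytes line number + body + 1 byte end of line
        next_addr = addr + 4 + len(body) + 1
        out += struct.pack("<HH", next_addr, number)  # low byte first
        out += body + b"\x00"
        addr = next_addr
    out += b"\x00\x00"  # end of program
    return bytes(out)


program = build_program([
    (10, b"\x9d LIGNE 10"),  # $9d = REM
    (20, b"\xba \"TEST\""),  # $ba = PRINT
])
print(program.hex(" "))
```

Each link is simply the address of the line that follows, so it depends on where the program starts in memory and on the length of every earlier line.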

Tape file (CSAVE "TESTSAVE"):

Code:

                                      +---+--> End address
                                      |   | +---+--> Start address
00000000  16 16 16 16 24 ff ff 00  00 05 1f 05 01 03 54 45  |....$.........TE|
00000010  53 54 53 41 56 45 00 10  05 0a 00 9d 20 4c 49 47  |STSAVE...... LIG|
00000020  4e 45 20 31 30 00 1d 05  14 00 ba 20 22 54 45 53  |NE 10...... "TES|
00000030  54 22 00 00 00 0b                                 |T"....|
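
As a rough sketch, the tape image above could be rebuilt in Python like this. The function name `make_tap` is hypothetical, and the header bytes other than the two addresses and the file name are copied verbatim from the dump, since their meaning is not explained here.

```python
import struct

SYNC = b"\x16\x16\x16\x16\x24"  # sync bytes and the $24 marker, as in the dump
# Copied as-is from the dump; this sketch does not interpret them.
BEFORE_ADDRESSES = b"\xff\xff\x00\x00"
AFTER_ADDRESSES = b"\x03"


def make_tap(name, payload, start=0x0501):
    """Prefix `payload` (the saved bytes) with a header like the one above."""
    end = start + len(payload) - 1            # address of the last saved byte
    header = SYNC + BEFORE_ADDRESSES
    header += struct.pack(">HH", end, start)  # high byte first in the header
    header += AFTER_ADDRESSES + name.encode("ascii") + b"\x00"
    return header + payload


program = bytes.fromhex(
    "10 05 0a 00 9d 20 4c 49 47 4e 45 20 31 30 00 "
    "1d 05 14 00 ba 20 22 54 45 53 54 22 00 "
    "00 00"
)
# The dump contains one extra byte ($0b) after the end-of-program marker.
tap = make_tap("TESTSAVE", program + b"\x0b")
print(tap.hex(" "))
```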

I wrote two utilities in Python, bas2txt and txt2bas, to translate from tokenized to text and from text to tokenized.
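
This is not the bas2txt utility itself, just a minimal sketch of how the tokenized-to-text direction could work by following the link addresses. `list_program` is a hypothetical name, the token table holds only the two tokens visible in the dump, and for simplicity the sketch does not treat quoted strings or REM text specially.

```python
import struct

# Only the two tokens shown in the dump; a real table is much larger.
TOKENS = {0x9D: "REM", 0xBA: "PRINT"}


def list_program(data, start=0x0501):
    """Yield (line_number, text) for each line of a tokenized program."""
    pos = 0
    while True:
        (next_addr,) = struct.unpack_from("<H", data, pos)
        if next_addr == 0:                  # 00 00 marks the end of the program
            break
        (number,) = struct.unpack_from("<H", data, pos + 2)
        end = data.index(b"\x00", pos + 4)  # end-of-line byte
        body = data[pos + 4:end]
        text = "".join(TOKENS.get(b, chr(b)) for b in body)
        yield number, text
        pos = next_addr - start             # follow the link to the next line


program = bytes.fromhex(
    "10 05 0a 00 9d 20 4c 49 47 4e 45 20 31 30 00 "
    "1d 05 14 00 ba 20 22 54 45 53 54 22 00 "
    "00 00"
)
for number, text in list_program(program):
    print(number, text)
```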

jtiai · Post by **jtiai** » Tue Aug 27, 2019 5:02 am

Thank you very much for such an illustrative explanation.

Apparently it was pure luck that the next-line addresses were the same in both of my test cases.

Once I get my tokenizer working, my plan is to build a higher-level BASIC abstraction, like:

Code:

if a > 100 then
   do something
   do something else
endif

or

Code:

switch a then
  case 1: print"foobar": break
  case 2: print"barfoo": break
endswitch

jtiai · Post by **jtiai** » Tue Aug 27, 2019 7:23 pm

\o/ got it working!

Got some small issues with byte ordering. For some strange reason (I blame English engineers back in the day), the header addresses use MSB-first order but the actual code lines use LSB-first order. Ever heard of consistency..?
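
For anyone hitting the same thing, the two byte orders can be shown side by side with Python's `struct` module. The values are the ones from the dumps earlier in the thread; the variable names are just for illustration.

```python
import struct

# Header addresses: most significant byte first (big-endian).
header_addrs = struct.pack(">HH", 0x051F, 0x0501)  # end, start
# Line link and line number: least significant byte first (little-endian).
line_prefix = struct.pack("<HH", 0x0510, 10)       # next line, line number

print(header_addrs.hex(" "))  # 05 1f 05 01
print(line_prefix.hex(" "))   # 10 05 0a 00
```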

Also, I couldn't figure out why on earth there are 2 random bytes at the end of the file...

But now the journey continues and I can start to implement a much higher-level syntax on top of the standard Oric syntax.

christian · Post by **christian** » Tue Aug 27, 2019 8:52 pm

Good news.

You're right, the header uses MSB-first order, and the ROM saves BASIC programs with one more byte than necessary ($0b in my previous post).
So the end address in the header also needs to be one more than the real end of the BASIC program in memory.
You can add an arbitrary byte.
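
A quick way to check this against the dump from my previous post is a small Python sketch along these lines. `parse_tap` is a hypothetical helper, and it assumes a header shaped exactly like that dump (four bytes between the $24 marker and the addresses, one byte between the addresses and the name).

```python
import struct


def parse_tap(data):
    """Split a single-program tape image into (name, start, end, payload)."""
    pos = data.index(b"\x24") + 1  # skip the 16 sync bytes and the $24 marker
    pos += 4                       # ff ff 00 00 in the dump, not interpreted
    end, start = struct.unpack_from(">HH", data, pos)  # high byte first
    pos += 4 + 1                   # the addresses, then one byte (03 in the dump)
    name_end = data.index(b"\x00", pos)
    name = data[pos:name_end].decode("ascii")
    payload = data[name_end + 1:]
    return name, start, end, payload


tap = bytes.fromhex(
    "16 16 16 16 24 ff ff 00 00 05 1f 05 01 03 54 45 "
    "53 54 53 41 56 45 00 10 05 0a 00 9d 20 4c 49 47 "
    "4e 45 20 31 30 00 1d 05 14 00 ba 20 22 54 45 53 "
    "54 22 00 00 00 0b"
)
name, start, end, payload = parse_tap(tap)
program = payload[:-1]  # drop the extra trailing byte
print(name, hex(start), hex(end), len(payload), len(program))
```

With this dump the saved data is `end - start + 1` bytes long, and the last of those bytes is the extra $0b that follows the 00 00 end-of-program marker.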

forum.defence-force.org
