Introduction
ARM processors from the ARM 3 onwards provide identification and internal configuration facilities through an internal co-processor, co-processor 15 (CP15), which you can read from and write to.
The setup is controlled by the CP15 registers, which are accessed with the MRC and MCR instructions from a non-user (privileged) mode.
The registers are specific to each processor, as described below.
ARM 3
- Register 0 - Processor identification (read only)
Bits 0 - 7    Revision of processor
Bits 8 - 15   Should be '3', identifying processor as an ARM3
Bits 16 - 23  Manufacturer code (&56 = VLSI Technology Inc.)
Bits 24 - 31  Designer code (&41 = ARM Ltd)
- Register 1 - Cache flush (write only)
Write-sensitive: writing anything to register 1 will cause the cache to be flushed.
- Register 2 - Miscellaneous control
Bit 0 - Turns the cache on (1) or off (0)
Bit 1 - Determines whether user mode and non-user mode use the same address mapping: 1 if they do, 0 if not. Should be 1 for use with MEMC.
Bit 2 - 0 for normal operation, 1 for special monitor mode (the processor runs at memory speed and address/data are always put on the external pins, even if the data was fetched from the cache, so that a logic analyser can trace the program properly).
Other bits are reserved.
- Register 3 - Which areas are cachable
Controls which areas of memory are cachable, in 2MB chunks.
Bit 0 - 1 if virtual addresses &0000000-&01FFFFF are cachable, 0 if not
Bit 1 - 1 if virtual addresses &0200000-&03FFFFF are cachable, 0 if not
...
Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are cachable, 0 if not
- Register 4 - Which areas are updateable
Controls which areas of memory are updateable, in 2MB chunks. Writes to non-updateable memory go to the real memory, not the cache. This is suitable for things like ROMs, since you don't want the cached data to be altered by attempted writes.
Bit 0 - 1 if virtual addresses &0000000-&01FFFFF are updateable, 0 if not
Bit 1 - 1 if virtual addresses &0200000-&03FFFFF are updateable, 0 if not
...
Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are updateable, 0 if not
- Register 5 - Which areas are disruptive
Controls which areas of memory are disruptive, in 2MB chunks. Writes to disruptive areas of memory cause the cache to be flushed. For example, on a MEMC system the physical memory at &2000000-&2FFFFFF is usually also mapped at cached virtual addresses. If you write to a location through its physical address while it is cached through its virtual address, a later read of the virtual address would return the old contents; marking the physical area as disruptive avoids this.
Bit 0 - 1 if virtual addresses &0000000-&01FFFFF are disruptive, 0 if not
Bit 1 - 1 if virtual addresses &0200000-&03FFFFF are disruptive, 0 if not
...
Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are disruptive, 0 if not
Register 2 is set to zero after power-up, and registers 3-5 are undefined. Registers 3-5 should be set up correctly before the cache is switched on. You should always check the processor identity before setting up the registers, unless you are completely certain your code will only ever be executed on an ARM3 processor.
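Registers 3 to 5 share one layout: bit n covers the 2MB block starting at n × &200000. The C below is a minimal sketch, assuming you want a mask for an inclusive range of 26 bit virtual addresses; the helper `arm3_area_mask` and the `ARM3_CTRL_*` names are hypothetical, and the resulting values would still have to be written to CP15 with MCR from a privileged mode.

```c
#include <stdint.h>

/* ARM3 CP15 register 2 bits, as listed above (names are hypothetical). */
#define ARM3_CTRL_CACHE_ON (1u << 0)  /* cache on */
#define ARM3_CTRL_SHARED   (1u << 1)  /* user and non-user modes share mapping */
#define ARM3_CTRL_MONITOR  (1u << 2)  /* special monitor mode */

/* Registers 3-5: bit n covers the 2MB block at n * &200000 in the
   64MB (26 bit) virtual address space. */
#define ARM3_CHUNK_SHIFT 21u          /* 2MB = 1 << 21 */
#define ARM3_ADDR_LIMIT  0x4000000u   /* 64MB */

/* Hypothetical helper: mask with a bit set for every 2MB block that
   overlaps the inclusive range [start, end]; 0 if the range is invalid. */
uint32_t arm3_area_mask(uint32_t start, uint32_t end)
{
    if (start > end || end >= ARM3_ADDR_LIMIT)
        return 0;

    uint32_t mask = 0;
    for (uint32_t n = start >> ARM3_CHUNK_SHIFT; n <= (end >> ARM3_CHUNK_SHIFT); n++)
        mask |= 1u << n;   /* n is at most 31 here */
    return mask;
}
```

For example, marking the physical memory at &2000000-&2FFFFFF as disruptive would need the mask &00FF0000 (bits 16 to 23). Once registers 3 to 5 are set, writing ARM3_CTRL_CACHE_ON | ARM3_CTRL_SHARED (that is, 3) to register 2 would turn the cache on with the shared mapping a MEMC system needs.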
ARM 610
- Register 0 - Processor identification (read only)
The value returned for an ARM610 processor should be &4156061x.
Bits 0 - 7    Revision of processor (&1x)
Bits 8 - 15   Processor identity
Bits 16 - 23  Manufacturer code (&56 = VLSI Technology Inc.)
Bits 24 - 31  Designer code (&41 = ARM Ltd)
- Register 1 - Control (write only)
All values are set to 0 at power-up.
Bit 0 - On-chip MMU turned off (0) or on (1)
Bit 1 - Address alignment fault disabled (0) or enabled (1)
Bit 2 - Instruction/data cache turned off (0) or on (1)
Bit 3 - Write buffer turned off (0) or on (1)
Bit 4 - 26 bit program space if 0, 32 bit program space if 1
Bit 5 - 26 bit data space if 0, 32 bit data space if 1
Bit 6 - Early abort mode if 0, late abort mode if 1
Bit 7 - Little-endian operation if 0, big-endian if 1
Bit 8 - System bit - controls the ARM610 permission system
- Register 2 - Translation Table Base (write only)
Bits 14-31 hold the base of the currently active Level One page table.
- Register 3 - Domain Access Control (write only)
This register holds the current access control for domains 0 to 15. Each domain has two bits (domain 0 bits 0,1 ... domain 15 bits 30,31) which may be set as follows:
00 No Access - Any access generates a domain fault
01 Client - Accesses are checked against permission bits in the section/page descriptor
10 Reserved - Currently behaves like no access mode
11 Manager - Accesses are NOT checked, permission faults cannot be generated
- Register 4 - Reserved - do not attempt to access
- Register 5 - Page fault status / TLB flush
When reading, this holds the status of the last data fault (not updated for pre-fetch fault). Only the bottom byte is of significance.
Bits 0 - 3    Status
Bits 4 - 7    Domain
Bits 8 - 11   Set to zero
Bits 12 - 31  Whatever was the last value on the internal data bus
When writing to this register, any value written will cause the Translation Look-aside Buffer to be flushed.
- Register 6 - Data fault address / TLB purge
When reading this register, you can determine the virtual address of the last page fault. When writing this register, the value given (in bits 14-31) is treated as an address. The TLB will be searched for a corresponding address and, if it is found, that entry is marked as invalid. This allows the page table in main memory to be updated and the now-invalid entries in the on-chip TLB to be purged without incurring the penalty of flushing the entire TLB.
- Register 7 - IDC flush (write only)
Any data written to this location will cause the IDC (Instruction/Data cache) to be flushed.
- Registers 8 to 15 - Reserved
Accessing these registers will cause the undefined instruction trap to be taken.
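Because each domain occupies two bits, a register 3 value is built by shifting. A minimal C sketch, assuming domains are numbered 0 to 15 as above; `domain_set`, `domain_get` and the enum names are hypothetical, and the result would be written to CP15 register 3 with MCR from a privileged mode. The ARM710 uses the same layout.

```c
#include <stdint.h>

/* Two-bit access values for CP15 register 3 (names are hypothetical). */
enum domain_access {
    DOMAIN_NO_ACCESS = 0, /* %00: any access causes a domain fault        */
    DOMAIN_CLIENT    = 1, /* %01: checked against descriptor permissions  */
    DOMAIN_RESERVED  = 2, /* %10: currently behaves like no access        */
    DOMAIN_MANAGER   = 3  /* %11: not checked, no permission faults       */
};

/* Return `dacr` with domain `d` (0-15) set to `access`.
   Domain d occupies bits 2d and 2d+1. */
uint32_t domain_set(uint32_t dacr, unsigned d, enum domain_access access)
{
    unsigned shift = 2u * (d & 15u);
    dacr &= ~(3u << shift);                     /* clear the old two bits */
    dacr |= ((uint32_t)access & 3u) << shift;   /* insert the new value   */
    return dacr;
}

/* Extract the two-bit access value for domain `d`. */
enum domain_access domain_get(uint32_t dacr, unsigned d)
{
    return (enum domain_access)((dacr >> (2u * (d & 15u))) & 3u);
}
```

For example, `domain_set(0, 0, DOMAIN_CLIENT)` gives &00000001: domain 0 is a client and every other domain has no access.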
ARM 710
This is similar to the ARM610.
- Register 0 - Processor identification (read only)
The value returned for an ARM710 processor should be &4104710x.
Bits 0 - 3    Revision of processor?
Bits 4 - 15   Processor identity - &710
Bits 16 - 23  Manufacturer code
Bits 24 - 31  Designer code (&41 = ARM Ltd)
- Register 1 - Control (write only)
All values are set to 0 at power-up.
Bit 0 - On-chip MMU turned off (0) or on (1)
Bit 1 - Address alignment fault disabled (0) or enabled (1)
Bit 2 - Instruction/data cache turned off (0) or on (1)
Bit 3 - Write buffer turned off (0) or on (1)
Bit 4 - 26 bit program space if 0, 32 bit program space if 1
Bit 5 - 26 bit data space if 0, 32 bit data space if 1
Bit 6 - Early abort mode if 0, late abort mode if 1
Bit 7 - Little-endian operation if 0, big-endian if 1
Bit 8 - System bit - controls the ARM710 permission system
Bit 9 - ROM bit - controls the ARM710 permission system
- Register 2 - Translation Table Base (write only)
Bits 14-31 hold the base of the currently active Level One page table.
- Register 3 - Domain Access Control (write only)
This register holds the current access control for domains 0 to 15. Each domain has two bits (domain 0 bits 0,1 ... domain 15 bits 30,31) which may be set as follows:
00 No Access - Any access generates a domain fault
01 Client - Accesses are checked against permission bits in the section/page descriptor
10 Reserved - Currently behaves like no access mode
11 Manager - Accesses are NOT checked, permission faults cannot be generated
- Register 4 - Reserved - do not attempt to access
- Register 5 - Page fault status / TLB flush
When reading, this holds the status of the last data fault (not updated for pre-fetch fault). Only the bottom byte is of significance.
Bits 0 - 3    Status
Bits 4 - 7    Domain
Bits 8 - 11   Set to zero
Bits 12 - 31  Whatever was the last value on the internal data bus
When writing to this register, any value written will cause the Translation Look-aside Buffer to be flushed.
- Register 6 - Data fault address / TLB purge
When reading this register, you can determine the virtual address of the last page fault. When writing this register, the value given (in bits 14-31) is treated as an address. The TLB will be searched for a corresponding address and, if it is found, that entry is marked as invalid. This allows the page table in main memory to be updated and the now-invalid entries in the on-chip TLB to be purged without incurring the penalty of flushing the entire TLB.
- Register 7 - IDC flush (write only)
Any data written to this location will cause the IDC (Instruction/Data cache) to be flushed.
- Registers 8 to 15 - Reserved
Accessing these registers will cause the undefined instruction trap to be taken.
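When handling a data abort, the useful part of register 5 is its bottom byte. A minimal C sketch of splitting that byte into the two fields described above for the ARM610/ARM710; the struct and function names are hypothetical, and the meaning of each status code is not covered here.

```c
#include <stdint.h>

/* Fields of CP15 register 5 (fault status); names are hypothetical. */
struct fault_status {
    unsigned status;  /* bits 0-3 */
    unsigned domain;  /* bits 4-7 */
};

struct fault_status decode_fault_status(uint32_t fsr)
{
    struct fault_status fs;
    fs.status = fsr & 0xFu;
    fs.domain = (fsr >> 4) & 0xFu;
    /* Bits 8-11 read as zero and bits 12-31 are whatever was last on the
       internal data bus, so they are deliberately ignored. */
    return fs;
}
```

A raw value of &ABCDE035, for instance, decodes to status 5 in domain 3; the upper bits carry no fault information.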
ARM 7500
The registers are exactly the same as the ARM710, except the processor ID (register 0) will be different. The datasheet did not specify what should be expected.
ARM 7500FE
The registers are exactly the same as the ARM710, except the processor ID (register 0) will be different. The datasheet did not specify what should be expected; however, interrogation of the Bush set-top box reveals &41077100.
StrongARM SA110
- Register 0 - Processor identification (read only)
The value returned for an SA110 processor should be &4401A10x.
Bits 0 - 3    Processor revision number
- Register 1 - Control (read/write)
All values are set to 0 at power-up.
Bit 0 - On-chip MMU turned off (0) or on (1)
Bit 1 - Address alignment fault disabled (0) or enabled (1)
Bit 2 - Data cache turned off (0) or on (1)
Bit 3 - Write buffer turned off (0) or on (1)
Bit 7 - Little-endian operation if 0, big-endian if 1
Bit 8 - System bit - controls the MMU permission system
Bit 9 - ROM bit - controls the MMU permission system
Bit 12 - Instruction cache turned off (0) or on (1)
- Register 2 - Translation Table Base (read/write)
Bits 14-31 hold the base of the currently active Level One page table.
- Register 3 - Domain Access Control (read/write)
This register holds the current access control for domains 0 to 15.
The document I have contains no further details, though I would assume it is similar to the ARM610/710 usage.
- Register 4 - Reserved - do not attempt to access
- Register 5 - Fault status (read/write)
When reading, this holds the status of the last data fault (not updated for pre-fetch fault). Only the bottom byte is of significance.
Bits 0 - 3    Status
Bits 4 - 7    Domain
Bit 8         Zero
Bits 9 - 31   Undefined on read, ignored on write
- Register 6 - Fault address (read/write)
When reading this register, you can determine the virtual address of the last page fault.
- Register 7 - Cache control (write only)
Writing to this register performs the selected cache operation; for the single-entry operations the data written is the virtual address. The OPC_2 and CRm co-processor fields select which cache operation should occur:
Function          OPC_2  CRm    Data
Flush I + D       %0000  %0111  -
Flush I           %0000  %0101  -
Flush D           %0000  %0110  -
Flush D single    %0001  %0110  Virtual address
Clean D entry     %0001  %1010  Virtual address
Drain write buf.  %0100  %1010  -
- Register 8 - TLB operations (write only)
Writing to this register performs the selected TLB flush operation. The OPC_2 and CRm co-processor fields select which TLB operation should occur:
Function          OPC_2  CRm    Data
Flush I + D       %0000  %0111  -
Flush I           %0000  %0101  -
Flush D           %0000  %0110  -
Flush D single    %0001  %0110  Virtual address
- Registers 9 to 14 - Reserved
Accessing these registers will cause the undefined instruction trap to be taken.
- Register 15 - Test, Clock, and Idle (write only)
The OPC_2 and CRm co-processor fields select the following:
Function                                  OPC_2  CRm
Enable odd word loading of Icache LFSR    %0001  %0001
Enable even word loading of Icache LFSR   %0001  %0010
Clear Icache LFSR                         %0001  %0100
Move LFSR to R14 (Abort mode)             %0001  %1000
Enable clock switching                    %0010  %0001
Disable clock switching                   %0010  %0010
Disable nMCLK output                      %0010  %0100
Wait for interrupt                        %0010  %1000
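All of the register 7 and 8 operations above are MCR instructions to CP15 which differ only in the CRm and OPC_2 fields (CRn selects register 7 or 8). The C below is a sketch that assembles the instruction word from those fields using the standard ARM co-processor register transfer layout, with the <op> field set to 0; `mcr_cp15` and the `SA110_*` macro names are hypothetical, and the field values are taken from the tables above.

```c
#include <stdint.h>

/* Hypothetical helper: word for MCR CP15, 0, Rd, CRn, CRm, opc2 with the
   AL (always) condition. Layout: cond[31:28] 1110[27:24] op[23:21]=0
   L[20]=0 CRn[19:16] Rd[15:12] cp#[11:8] opc2[7:5] 1[4] CRm[3:0]. */
uint32_t mcr_cp15(unsigned rd, unsigned crn, unsigned crm, unsigned opc2)
{
    return 0xEE000000u               /* AL condition + %1110 */
         | ((crn  & 0xFu) << 16)
         | ((rd   & 0xFu) << 12)
         | (15u << 8)                /* co-processor 15 */
         | ((opc2 & 0x7u) << 5)
         | (1u << 4)                 /* register transfer, not CDP */
         | (crm  & 0xFu);
}

/* SA110 operations from the register 7 and 8 tables (hypothetical names). */
#define SA110_FLUSH_ID_CACHE(rd)  mcr_cp15((rd), 7, 7, 0)
#define SA110_FLUSH_D_SINGLE(rd)  mcr_cp15((rd), 7, 6, 1)
#define SA110_CLEAN_D_ENTRY(rd)   mcr_cp15((rd), 7, 10, 1)
#define SA110_DRAIN_WB(rd)        mcr_cp15((rd), 7, 10, 4)
#define SA110_FLUSH_ID_TLB(rd)    mcr_cp15((rd), 8, 7, 0)
```

With R0 as the source register, flushing both caches gives &EE070F17 (MCR CP15, 0, R0, C7, C7, 0) and draining the write buffer gives &EE070F9A (MCR CP15, 0, R0, C7, C10, 4).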
ARM9...XScale
Unfortunately I do not have details of these registers.
Try http://www.arm.com/.
How to read these registers
The code I knocked up for the Bush box processor ID was:
10 DIM code% 32
20 P% = code%
30 [ OPT 3
40 SWI "OS_EnterOS"
50 MRC CP15, 0, R0, C0, C0
60 TSTP PC, #&F0000000
70 MOV R0, R0
80 MOV PC, R14
90 ]
100 PRINT ~USR(code%)
When run, this would print:
>RUN
00008FAC            OPT 3
00008FAC EF000016   SWI "OS_EnterOS"
00008FB0 EE100F10   MRC CP15, 0, R0, C0, C0
00008FB4 E31FF20F   TSTP PC, #&F0000000
00008FB8 E1A00000   MOV R0, R0
00008FBC E1A0F00E   MOV PC, R14
41077100
>
Note that this code must run in a privileged mode.
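The value printed above can be split into its fields with a few shifts and masks. A minimal C sketch, assuming the ARM710-style layout given earlier (revision in bits 0-3, processor identity in bits 4-15); the ARM3 and ARM610 entries describe the low 16 bits as two bytes instead, so for those processors `part` combines the identity byte with the top four bits of the revision byte. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a CP15 register 0 value, ARM710-style split (hypothetical names). */
struct cpu_id {
    unsigned designer;  /* bits 24-31: &41 = ARM Ltd, &44 on the SA110 value */
    unsigned code16;    /* bits 16-23: manufacturer code */
    unsigned part;      /* bits 4-15: processor identity, e.g. &710 or &A10 */
    unsigned revision;  /* bits 0-3 */
};

struct cpu_id decode_cpu_id(uint32_t id)
{
    struct cpu_id c;
    c.designer = (id >> 24) & 0xFFu;
    c.code16   = (id >> 16) & 0xFFu;
    c.part     = (id >> 4)  & 0xFFFu;
    c.revision =  id        & 0xFu;
    return c;
}
```

Applied to &41077100 from the Bush box, this gives designer &41 (ARM Ltd), &07 in bits 16-23, identity &710 and revision 0, so the ARM7500FE reports an ARM710-style identity.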
Co-processors
There are between zero and three possible co-processors. Most desktop ARM systems do not have logic for external co-processors, so we may either use that which is built into the ARM itself, or an emulated co-processor.
CP15 is reserved on the ARM 3 and later processors for internal configuration, as described in this document.
CP1 and CP2 are used by the floating point system. This may be an external floating point chip (as used with the ARM 3), hardware built into the processor (as in the ARM 7500FE), or a totally software-based emulation (as with the FPEmulator that we all know).
Here is a short exercise for you:
10 DIM code% 16
20 P% = code%
30 [ OPT 3
40 CDP CP1, 0, C0, C1, C2, 0
50 ADFS F0, F1, F2
60 MOV PC, R14
70 ]
>RUN
00008F78            OPT 3
00008F78 EE010102   CDP CP1, 0, C0, C1, C2
00008F7C EE010102   ADFS F0, F1, F2
00008F80 E1A0F00E   MOV PC, R14
>
What do you notice? 🙂
When the ARM executes a co-processor instruction, or an undefined instruction, it will offer it to any co-processors which may be presently attached. If hardware is available to process the given instruction, then it is expected to do so. If it is busy at the time the instruction is offered, the ARM will wait for it.
If there is no co-processor capable of executing the instruction, the ARM will take its undefined instruction trap, in which case the following will happen:
- The PSR and PC are both saved (the method differs for 26 bit and 32 bit ARMs)
- SVC mode (26 bit) / UND mode (32 bit) is entered, and the I bit of the PSR is set
- The instruction at address &00000004 is executed
This trap may be used to add instructions to the instruction set by emulation, or to implement a software emulation of hardware that isn't fitted. The Floating Point Emulator works by doing this.
To return, restore the saved PC and PSR (how this is done depends on whether the ARM is 26 bit or 32 bit), for example with MOVS PC, R14 on a 26 bit system. Execution then resumes with the instruction following the one which caused the trap.
All of the co-processor instructions can be executed conditionally. Please note that the conditions relate to the status of the ARM processor, not to the status of any of the co-processors. This is because the ARM always handles the instruction first: it checks the condition, then offers the instruction to the co-processors, and may take the undefined instruction trap. The conditions are therefore ARM related.
To make this clearer:
10 DIM code% 32
20 P% = code%
30 [ OPT 3
40 FLTS F0, R0
50 FLTS F1, R1
60 FMLS F2, F0, F1
70 FIX R0, F2
80 MOVS PC, R14
90 ]
100 INPUT "First number : "A%
110 INPUT "Second number: "B%
120 PRINT USR(code%)
This probably won't assemble without an enhanced BASIC assembler.
Anyway, you might think the ARM will hand over to the floating point co-processor to do the four FP instructions, then hand back afterwards.
If you did, you would be incorrect!
What is actually executed is:
MCR CP1, 0, R0, C0, C0
MCR CP1, 0, R1, C1, C0
CDP CP1, 9, C2, C0, C1
MRC CP1, 0, R0, C0, C2
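Each of these is offered to the co-processor only after the ARM has tested the instruction's condition field (bits 28-31) against its own N, Z, C and V flags, which is why the conditions are ARM related. The C below is a sketch of that standard ARM condition test as a model, not code that would run on the ARM itself; the `arm_flags` struct and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The ARM's own status flags (hypothetical struct). */
struct arm_flags { bool n, z, c, v; };

/* True if an instruction with 4-bit condition `cond` would execute. */
bool condition_passed(unsigned cond, struct arm_flags f)
{
    switch (cond & 0xFu) {
    case 0x0: return f.z;                    /* EQ */
    case 0x1: return !f.z;                   /* NE */
    case 0x2: return f.c;                    /* CS */
    case 0x3: return !f.c;                   /* CC */
    case 0x4: return f.n;                    /* MI */
    case 0x5: return !f.n;                   /* PL */
    case 0x6: return f.v;                    /* VS */
    case 0x7: return !f.v;                   /* VC */
    case 0x8: return f.c && !f.z;            /* HI */
    case 0x9: return !f.c || f.z;            /* LS */
    case 0xA: return f.n == f.v;             /* GE */
    case 0xB: return f.n != f.v;             /* LT */
    case 0xC: return !f.z && (f.n == f.v);   /* GT */
    case 0xD: return f.z || (f.n != f.v);    /* LE */
    case 0xE: return true;                   /* AL */
    default:  return false;                  /* NV: never, on these ARMs */
    }
}

/* Test the condition of a whole instruction word (bits 28-31). */
bool instruction_executes(uint32_t word, struct arm_flags f)
{
    return condition_passed(word >> 28, f);
}
```

For example, an MRCEQ executed with the ARM's Z flag clear is skipped without being offered to any co-processor.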
It is worth pointing out that objasm specifies co-processor registers using the CR notation (i.e. CR0 - CR15), where each name must first be defined with the CN directive. Nick Roberts' ASM does not appear to define co-processor register names by default, though I have only looked at the "defined symbols" section of its documentation.
Darren Salt's ExtBASICasm provides the register names C0 - C15 to refer to co-processor registers. So if any of these examples fail to assemble, please check which format your assembler expects for these instructions.
MRC
The instruction MRC transfers a co-processor register to an ARM register. It takes the form:
MRC <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>
The co-processor is denoted in most assemblers by CPx. The register <co-pro reg> is written to <ARM reg> using operation <op>. This may be further modified by <co-pro reg2> and <op2>; for an idea of the sorts of times when this might be necessary, consider instructions of the form LDR Ra, [Rb], #x. The final <op2> may be omitted, as it is in the earlier MRC CP15, 0, R0, C0, C0 example, but the other parts of the MRC instruction must be supplied.
MCR
The instruction MCR transfers an ARM register to a co-processor register. It takes the form:
MCR <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>
The co-processor is free to interpret the fields as it desires, but the standard interpretation is that the contents of the ARM register are written to the co-processor register using the operation code given, which may be further modified by the second co-processor register and/or the second operation code.
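To see how these fields sit in the instruction itself, the C sketch below pulls an MRC/MCR word apart using the standard ARM register transfer layout. The struct and function names are hypothetical. Given the word &EE100F10 from the listing under "How to read these registers", it recovers MRC CP15, 0, R0, C0, C0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an MRC/MCR word (hypothetical names):
   cond[31:28] 1110[27:24] op[23:21] L[20] CRn[19:16] Rd[15:12]
   cp#[11:8] op2[7:5] 1[4] CRm[3:0] */
struct cp_transfer {
    bool     is_mrc;  /* L bit: 1 = MRC (co-pro to ARM), 0 = MCR (ARM to co-pro) */
    unsigned op;      /* <op>                      */
    unsigned crn;     /* <co-pro reg>              */
    unsigned rd;      /* <ARM reg>                 */
    unsigned cp;      /* co-processor number (CPx) */
    unsigned op2;     /* <op2>                     */
    unsigned crm;     /* <co-pro reg2>             */
};

/* True if bits 27-24 are %1110 and bit 4 is set (bit 4 clear means CDP). */
bool is_cp_transfer(uint32_t word)
{
    return (word & 0x0F000010u) == 0x0E000010u;
}

struct cp_transfer decode_cp_transfer(uint32_t word)
{
    struct cp_transfer t;
    t.is_mrc = (word >> 20) & 1u;
    t.op     = (word >> 21) & 0x7u;
    t.crn    = (word >> 16) & 0xFu;
    t.rd     = (word >> 12) & 0xFu;
    t.cp     = (word >> 8)  & 0xFu;
    t.op2    = (word >> 5)  & 0x7u;
    t.crm    =  word        & 0xFu;
    return t;
}
```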
LDC and STC
The instruction LDC loads data from memory into a co-processor register, while STC saves data from a co-processor register to memory.
The ARM supplies the address; the co-processor accepts the data and controls how much is transferred.
LDC  <co-pro>, <co-pro reg>, <address>
LDCL <co-pro>, <co-pro reg>, <address>
STC  <co-pro>, <co-pro reg>, <address>
STCL <co-pro>, <co-pro reg>, <address>
If the 'L' flag is specified, a long transfer is performed; otherwise a short transfer is performed. The 'L' flag follows the condition code, as in LDCEQL. The address is an expression which generates an address; examples are:
[Rx]
[Rx, #x]
[Rx, #x]!
[Rx], #x
These are like the addressing modes used by the LDR instruction. However, the offset is only eight bits wide and counts words (the LDR offset is 12 bits wide and counts bytes).
The 8 bit unsigned offset is shifted left two bits and added to or subtracted from the base register. This may be done before or after the base is used as the transfer address, and the new base value can be written back or left unmodified.
The next difference is that post-indexed addressing requires explicit setting of the W bit of the instruction (unlike LDR/STR, which always write back when post-indexed). You set the 'W' bit with the '!' flag, like
STC CP0, CR1, [R2, #16]!
The base register is used for the first transfer. If there are any further transfers, the base will be incremented by one word for each of those additional transfers.
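The offset arithmetic can be sketched in C: the 8 bit field counts words, so it is multiplied by four, giving a reach of up to 1020 bytes either side of the base. This models the description above (including the explicit W bit for write-back) rather than emulating any particular ARM; the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of LDC/STC address generation (hypothetical names). */
struct cp_address {
    uint32_t transfer;  /* address of the first word transferred */
    uint32_t new_base;  /* base register value afterwards */
};

/* base:      value of the base register
   offset8:   8 bit offset field (0-255), counted in words
   pre:       true for pre-indexed ([Rx, #x]), false for post-indexed ([Rx], #x)
   up:        true to add the offset, false to subtract it
   writeback: the W bit, set with '!' */
struct cp_address ldc_address(uint32_t base, unsigned offset8,
                              bool pre, bool up, bool writeback)
{
    uint32_t offset  = (uint32_t)(offset8 & 0xFFu) << 2;  /* words to bytes */
    uint32_t indexed = up ? base + offset : base - offset;
    struct cp_address a;
    a.transfer = pre ? indexed : base;        /* post-indexed uses the old base */
    a.new_base = writeback ? indexed : base;  /* only updated when W is set */
    return a;
}
```

For STC CP0, CR1, [R2, #16]! the offset field holds 4 (16 bytes is 4 words); with R2 = &8000 the first word goes to &8010 and R2 becomes &8010.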
CDP
The instruction CDP instructs the co-processor to do some processing. It takes the form:
CDP <co-pro>, <op>, <co-pro reg1>, <co-pro reg2>, <co-pro reg3>, <op2>
This tells the co-processor to do something. The ARM will not wait for it to finish, nor is any sort of status sent back to the ARM. It is possible for a co-processor to maintain a queue of instructions, allowing it and the ARM to process in parallel.
A variant of this may be obtained with the floating point hardware. While it does not (to my knowledge) support a queue of instructions, it is true that the ARM will wait for the FPU to finish an operation before providing the next one. With careful coding, it would therefore be possible to get the ARM to do some other processing (a few instructions) between sending an instruction to the FPU and reading its result back.
So instead of:
FLTE F0, R0
FLTE F1, R1
MUFE F2, F0, F1
FIX R0, F2
MOV R1, #0
you could save a small amount of time with:
FLTE F0, R0
FLTE F1, R1
MUFE F2, F0, F1
MOV R1, #0
FIX R0, F2
as the FPU could be finishing the MUF while you MOV. The hardware FPU (as in the 7500FE) runs asynchronously; you can switch it to synchronous operation by setting a bit in the FPSR. The software emulation always runs synchronously, and as it uses the ARM itself to emulate the FP instructions, there is no advantage to be gained.
Obviously the above example is somewhat contrived; however, it is only an example. Real-life code, such as an MP3 decoder, could well benefit from careful arrangement of instructions.
There are no rules for the register types and/or the operation codes. These depend upon the co-processor.
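To see how the CDP fields end up in the instruction word, here is a C sketch of the encoding using the standard ARM co-processor data operation layout; `encode_cdp` is a hypothetical name and its parameters follow the assembler operand order. It reproduces the word from the earlier exercise: &EE010102 for CDP CP1, 0, C0, C1, C2, 0, which is exactly what the BASIC assembler produced for ADFS F0, F1, F2.

```c
#include <stdint.h>

/* Hypothetical helper: encode CDP <cp>, <op>, <CRd>, <CRn>, <CRm>, <op2>
   with the AL condition. Layout: cond[31:28] 1110[27:24] op[23:20]
   CRn[19:16] CRd[15:12] cp#[11:8] op2[7:5] 0[4] CRm[3:0]. */
uint32_t encode_cdp(unsigned cp, unsigned op, unsigned crd,
                    unsigned crn, unsigned crm, unsigned op2)
{
    return 0xEE000000u              /* AL condition + %1110 */
         | ((op  & 0xFu) << 20)
         | ((crn & 0xFu) << 16)
         | ((crd & 0xFu) << 12)
         | ((cp  & 0xFu) << 8)
         | ((op2 & 0x7u) << 5)
         | (crm  & 0xFu);           /* bit 4 stays 0, marking a CDP */
}
```

Likewise, CDP CP1, 9, C2, C0, C1 from the floating point example encodes as &EE902101.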