A program is a set of instructions for getting a specific task done. These instructions are usually written down and stored so that they can be looked up again and again and followed exactly the same way every time.
Programs can be written in several different languages, for example Python, C, C++ or Java. The interesting thing is that most of these languages cannot be understood directly by computers (specifically, by the CPU). They need some sort of translation, one that keeps the logic of the program intact while converting it into a form the CPU can understand.
Not all CPUs understand the same language either. Each CPU has certain features or hardware components that other CPUs don't have, so the instructions you write to use a given feature or component may differ from one CPU to another. It depends on who manufactured the CPU, what application it was designed for, what architecture it follows, and so on.
So on one end of the spectrum we have humans writing a bunch of instructions that CPUs have to follow. On the other end, CPUs don't directly understand what those instructions mean; they have to be converted into a language the CPU understands (machine code).
Humans almost never write code directly in machine language. They write their instructions in a form that is easier for them to understand and change than an obscure bunch of numbers. These programs are then fed into another program called the "compiler", whose job is to convert the instructions written by the human into a set of instructions that can be executed by the CPU.
The building blocks you use in many modern programming languages like C, C++ or Python include conditional statements, for and while loops, object-oriented programming, functions, modules, memory structures and so on. These building blocks exist more or less exclusively in the realm of the languages used by humans. The language understood by the CPU does not recognise any of them; it only recognises the 100 or 150 instructions in its instruction set and nothing more.
This means that concepts like functions and memory structures have to be built out of machine language instructions to get the expected behaviour.
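As a rough illustration of that point, here is a minimal sketch of how a `while` loop might look once it is expressed only with simple instructions and a conditional jump. The toy machine, its opcodes (`LOAD`, `ADD`, `DEC`, `JNZ`, `HALT`) and the `run` function are all invented for this example and do not correspond to any real CPU.

```python
# A toy machine invented for illustration. It is not any real ISA.

def run(program, registers):
    """Execute a list of (opcode, *operands) tuples until HALT."""
    pc = 0  # program counter: index of the next instruction
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "LOAD":      # LOAD reg, constant
            registers[args[0]] = args[1]
        elif op == "ADD":     # ADD dst, src  ->  dst = dst + src
            registers[args[0]] += registers[args[1]]
        elif op == "DEC":     # DEC reg  ->  reg = reg - 1
            registers[args[0]] -= 1
        elif op == "JNZ":     # JNZ reg, target  ->  jump if reg != 0
            if registers[args[0]] != 0:
                pc = args[1]
        elif op == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {op}")

# High-level version of the same behaviour:
#     total = 0
#     n = 4
#     while n != 0:
#         total += n
#         n -= 1
# The lowering below tests the condition at the bottom of the loop, which is
# only equivalent because n starts out non-zero.
SUM_LOOP = [
    ("LOAD", "total", 0),   # 0
    ("LOAD", "n", 4),       # 1
    ("ADD", "total", "n"),  # 2: loop body starts here
    ("DEC", "n"),           # 3
    ("JNZ", "n", 2),        # 4: jump back to the body while n != 0
    ("HALT",),              # 5
]

result = run(SUM_LOOP, {})
print(result["total"])
```

Nothing in `SUM_LOOP` says "loop". The loop only exists as a jump back to an earlier instruction.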
Coming back to why we write programs: it is to get the required behaviour from a computer. The "required behaviour" is first written in a language that is more understandable to humans. The compiler then takes that definition of the "required behaviour" and converts it into machine code without changing the "required behaviour".
A CPU essentially has two things: a finite amount of memory and the ability to execute instructions (micro behaviours). These "micro behaviours" can be chained together to give rise to the "required behaviour". All of this seems pretty straightforward so far. The problem comes when you want the "required behaviour" implemented using the least memory possible, or executed as fast as possible. This is the job of "compiler optimizations", which make your program faster or smaller without changing what it does.
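One of the simplest optimizations of this kind is constant folding: anything that can be computed at compile time is computed once, so the CPU never has to do it. The sketch below assumes a made-up expression format (nested tuples) and hypothetical helper names `fold` and `count_ops`. Real compilers work on much richer representations.

```python
# Expressions are nested tuples: ("+", left, right) or ("*", left, right).
# Leaves are ints (constants) or strs (variable names). This tiny format is
# an assumption made for the sketch.

def fold(expr):
    """Return an equivalent expression with constant sub-expressions evaluated."""
    if not isinstance(expr, tuple):
        return expr  # a constant or a variable name: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    return (op, left, right)

def count_ops(expr):
    """Count the arithmetic operations left to perform at run time."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + count_ops(expr[1]) + count_ops(expr[2])

# x * (60 * 60) + (2 + 3)
original = ("+", ("*", "x", ("*", 60, 60)), ("+", 2, 3))
optimized = fold(original)

print(optimized)
print(count_ops(original), "->", count_ops(optimized))
```

The folded expression computes the same value for every `x`, but needs fewer operations at run time.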
All modern programming languages have different ways of "expressing" this "required behaviour". Often the "required behaviour" has already been written in one programming language, but because you don't have a compiler for that language, you cannot use it.
Humans are not perfect at issuing instructions for the CPU to follow, and this leads to behaviour that was not expected. It can happen in two ways. First, the CPU behaves in a way that the programmer does not fully understand, so the assumptions made while writing the instructions turn out to be wrong. Second, the programmer forgets to specify how to handle the corner cases while writing the program. These unexpected behaviours reduce the reliability of the system. The best case is when we know the behaviour of the system under all possible inputs, but checking a program under all possible inputs is often not feasible.
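A small sketch of the first kind of mistake, assuming an 8-bit CPU whose registers silently wrap around at 256. The function names are hypothetical. `average_naive` looks correct to someone who forgets about the wrap-around, and `average_safe` is one possible way to avoid the overflowing intermediate sum.

```python
def add_u8(a, b):
    """Add two values the way an 8-bit register would: keep only the low 8 bits."""
    return (a + b) & 0xFF

def average_naive(a, b):
    """Looks correct, but the intermediate sum can overflow 8 bits."""
    return add_u8(a, b) // 2

def average_safe(a, b):
    """Avoids the overflowing sum; every intermediate stays within 0..255."""
    return (a // 2) + (b // 2) + (a & b & 1)

print(average_naive(10, 20), average_safe(10, 20))      # small inputs agree
print(average_naive(200, 100), average_safe(200, 100))  # large inputs do not

# With only 256 * 256 input pairs, checking every input is feasible here.
mismatches = sum(
    1
    for a in range(256)
    for b in range(256)
    if average_naive(a, b) != (a + b) // 2
)
print("inputs where the naive version is wrong:", mismatches)
```

Exhaustive checking works here only because the input space is tiny. With two 32-bit inputs the same loop would have to cover 2^64 pairs.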
The 8051 and MSP430 CPUs have the following classes of instructions:
- Data Transfer Instructions - Transfer data between RAM and registers.
- Arithmetic Instructions - Perform arithmetic operations.
- Logical Instructions - Perform boolean operations like AND, OR, XOR, NOT etc.
- Boolean Instructions - Manipulate single bits, for example the Set, Clear and Complement instructions.
- Program Branching Instructions - Change the flow of execution, either depending on a condition or to a pre-determined location.
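To make the classes above a little more concrete, the sketch below files a handful of 8051 mnemonics under each class. The list is deliberately partial and classifies each mnemonic by its most common form. Some real mnemonics (for example `CLR` and `CPL`) operate on either the accumulator or a single bit, so they are left out. The names `InstructionClass`, `EXAMPLES` and `classify` are made up for this example.

```python
from enum import Enum

class InstructionClass(Enum):
    DATA_TRANSFER = "data transfer"
    ARITHMETIC = "arithmetic"
    LOGICAL = "logical"
    BOOLEAN = "boolean"
    BRANCHING = "program branching"

# A few 8051 mnemonics per class (partial, for illustration only).
EXAMPLES = {
    InstructionClass.DATA_TRANSFER: ["MOV", "PUSH", "POP"],
    InstructionClass.ARITHMETIC: ["ADD", "SUBB", "INC"],
    InstructionClass.LOGICAL: ["ANL", "ORL", "XRL"],
    InstructionClass.BOOLEAN: ["SETB"],
    InstructionClass.BRANCHING: ["SJMP", "JZ", "DJNZ"],
}

# Invert the table so a mnemonic can be looked up directly.
CLASS_OF = {
    mnemonic: cls
    for cls, mnemonics in EXAMPLES.items()
    for mnemonic in mnemonics
}

def classify(mnemonic):
    """Return the class of a known mnemonic, or None if it is not in the table."""
    return CLASS_OF.get(mnemonic.upper())

for m in ["mov", "subb", "setb", "djnz"]:
    print(m, "->", classify(m).value)
```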
A lower-level representation of instructions is required if you need your program to run on multiple CPUs that understand different machine code. The instruction set of each CPU is different: each CPU supports different instructions and different addressing modes (the ways in which you specify the operands of an instruction). Suppose we had a language generic enough that every Instruction Set Architecture (ISA) could be treated as a specialized implementation of a "General Purpose Instruction Set Architecture (GPISA)". We could then compile each program down to the GPISA and use the feature set available in each ISA to generate the machine code for that CPU.
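A minimal sketch of that idea, under heavy assumptions: the generic instructions are plain tuples of the form `(op, dst, a, b)`, and the two targets are hypothetical, one with a single accumulator `A` and one with three-operand register instructions. The output strings only imitate assembly syntax and are not valid code for any real CPU.

```python
# Generic, ISA-agnostic instructions: (operation, destination, source1, source2).
GENERIC_PROGRAM = [
    ("add", "r2", "r0", "r1"),  # r2 = r0 + r1
    ("sub", "r3", "r2", "r0"),  # r3 = r2 - r0
]

def lower_accumulator(program):
    """Hypothetical target where arithmetic always goes through accumulator A."""
    out = []
    for op, dst, a, b in program:
        out += [f"MOV A, {a}", f"{op.upper()} A, {b}", f"MOV {dst}, A"]
    return out

def lower_three_register(program):
    """Hypothetical target with three-operand register instructions."""
    return [f"{op.upper()} {dst}, {a}, {b}" for op, dst, a, b in program]

print("\n".join(lower_accumulator(GENERIC_PROGRAM)))
print()
print("\n".join(lower_three_register(GENERIC_PROGRAM)))
```

The same generic program turns into six instructions on one target and two on the other, and each lowering function only needs to know about its own target.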
Having a language that is close to machine code and ISA-agnostic would be helpful. Because the language is close to machine code, translating it into the actual machine code would be easier.
Here is a list of common microcontrollers. The most commonly used ISAs are:
- RISC-V
- MCS-51 (8051 core)
- ARM
- 6502
- x86 (haven't seen this in any microcontroller)
- ARM64
- AVR
- AVR32
- MIPS
- Z80
One way of defining a close-to-machine-code language is to list out the variations across different machine codes (all the different instructions, all the possible addressing modes and so on) and then build a generic machine code language that covers them. This is similar to an Intermediate Representation (IR) in compiler design lingo.
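The "list out the variations, then build something generic" step could be sketched roughly as below. The two targets `TOY_A` and `TOY_B`, their operation sets and their addressing modes are invented for illustration and are not descriptions of real ISAs. The helper names `generic_language` and `missing_on` are likewise hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Mode(Enum):
    """Addressing modes the generic language knows about."""
    IMMEDIATE = auto()
    REGISTER = auto()
    DIRECT = auto()
    INDIRECT = auto()

@dataclass(frozen=True)
class Target:
    name: str
    ops: frozenset    # operations the target supports
    modes: frozenset  # addressing modes the target supports

# Two made-up targets with different feature sets.
TOY_A = Target("toy_a", frozenset({"mov", "add"}),
               frozenset({Mode.IMMEDIATE, Mode.REGISTER}))
TOY_B = Target("toy_b", frozenset({"mov", "add", "mul"}),
               frozenset({Mode.REGISTER, Mode.DIRECT, Mode.INDIRECT}))

def generic_language(targets):
    """Union of every operation and addressing mode seen on any target."""
    ops = frozenset().union(*(t.ops for t in targets))
    modes = frozenset().union(*(t.modes for t in targets))
    return ops, modes

def missing_on(target, op, modes):
    """List what a target lacks for one generic instruction."""
    problems = []
    if op not in target.ops:
        problems.append(f"op {op}")
    for mode in modes:
        if mode not in target.modes:
            problems.append(f"mode {mode.name}")
    return problems

ops, modes = generic_language([TOY_A, TOY_B])
print(sorted(ops), sorted(m.name for m in modes))
print(missing_on(TOY_A, "mul", [Mode.REGISTER, Mode.INDIRECT]))
print(missing_on(TOY_B, "mul", [Mode.REGISTER, Mode.INDIRECT]))
```

Anything reported by `missing_on` is something the translation to that target would have to emulate with the instructions the target does have.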
The idea is to implement this