Program

Published December 11, 2022
Author
Apoorva Singh
A program is a set of instructions for getting a specific task done. These instructions are usually written down and stored so that they can be looked up again and followed in exactly the same way every time.
Programs can be written in several different languages, for example Python, C, C++ and Java. The interesting thing is that most of these languages cannot be understood directly by computers (specifically, by the CPU). They need some sort of translation, which keeps the logic of the program intact but converts the program into a form that the CPU can understand.
Not all CPUs understand the same language either. Each CPU has certain features or hardware components that other CPUs don’t have, and the instructions you need to write to make use of a feature or component may differ from one CPU to another. It depends on who manufactured the CPU, what application it was designed for, what architecture it follows, and so on.
So at one end of the spectrum we have humans writing a bunch of instructions that CPUs have to follow. At the other end, CPUs don’t directly understand what those instructions mean, so the instructions have to be converted into a language that CPUs do understand (machine code).
Humans rarely write code directly in machine language. They write their instructions in a form that is easier for them to understand and change than an obscure bunch of numbers. These programs are then fed into another program called the “compiler”, whose job is to convert the instructions written by the human into a set of instructions that can be executed by the CPU.
The building blocks that you use in many modern programming languages like C, C++ or Python include conditional statements, for and while loops, object-oriented programming, functions, modules, memory structures and so on. These building blocks exist more or less exclusively in the realm of the languages used by humans. The language understood by the CPU does not recognise any of them; it recognises only the instructions in its own instruction set (roughly 100 to 150 of them on a small CPU) and nothing more.
This means that concepts such as functions and memory structures have to be implemented in terms of machine instructions to get the expected behaviour.
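As a rough illustration of a loop being rebuilt out of simpler instructions, here is a minimal sketch of a toy CPU in Python. The instruction names (`LOAD`, `ADD`, `DEC`, `JNZ`) and the two registers are made up for this example and do not correspond to any real instruction set. The point is only that a "for loop" disappears and is replaced by an add, a decrement and a conditional jump.

```python
# A toy CPU: two registers and four made-up instructions.
# The mnemonics (LOAD, ADD, DEC, JNZ) are hypothetical, not a real ISA.
def run(program):
    regs = {"A": 0, "B": 0}
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LOAD":      # LOAD reg, constant
            regs[args[0]] = args[1]
        elif op == "ADD":     # ADD dst, src     -> dst = dst + src
            regs[args[0]] += regs[args[1]]
        elif op == "DEC":     # DEC reg          -> reg = reg - 1
            regs[args[0]] -= 1
        elif op == "JNZ":     # JNZ reg, target  -> jump if reg != 0
            if regs[args[0]] != 0:
                pc = args[1]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


# High-level version: total = 0; for i in (4, 3, 2, 1): total += i
loop_as_instructions = [
    ("LOAD", "A", 0),   # 0: total = 0
    ("LOAD", "B", 4),   # 1: i = 4
    ("ADD", "A", "B"),  # 2: total += i
    ("DEC", "B"),       # 3: i -= 1
    ("JNZ", "B", 2),    # 4: go back to instruction 2 while i != 0
]

result = run(loop_as_instructions)
print(result["A"])  # 10
```

Nothing in `loop_as_instructions` says "loop". The repetition comes entirely from the jump at the end.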
 
Coming back to why we write programs: we write them to get a required behaviour from a computer. The “required behaviour” is first written in a language that is more understandable to humans. The compiler then takes that definition of the “required behaviour” and converts it into machine code without changing the “required behaviour”.
A CPU essentially has two things: some finite amount of memory and the ability to execute instructions (micro behaviours). These “micro behaviours” can be chained together to give rise to the “required behaviour”. All of this seems pretty straightforward so far. The problem comes when you want the “required behaviour” to be implemented using the least amount of memory possible, or executed as fast as possible. This is the job of “Compiler Optimizations”, which make your program faster or smaller without changing what it does.
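One simple optimization of this kind is constant folding: any operation whose inputs are already known is computed ahead of time, so the CPU has fewer instructions to execute. The sketch below is only an illustration. The tuple-based expression format and the helper names `fold` and `count_ops` are assumptions made for this example, not how any particular compiler represents code.

```python
import operator

# Supported operations in this toy expression format.
OPS = {"+": operator.add, "*": operator.mul}


def fold(expr):
    """Replace operations on two constants with their result."""
    if not isinstance(expr, tuple):
        return expr  # a constant (int) or a variable name (str)
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # both inputs known: compute now
    return (op, left, right)


def count_ops(expr):
    """Count how many operations would have to run on the CPU."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + count_ops(expr[1]) + count_ops(expr[2])


# x * (60 * 60) + (2 + 3)
original = ("+", ("*", "x", ("*", 60, 60)), ("+", 2, 3))
optimized = fold(original)

print(optimized)                                 # ('+', ('*', 'x', 3600), 5)
print(count_ops(original), count_ops(optimized)) # 4 2
```

The “required behaviour” is the same for every value of `x`, but the optimized form needs two operations instead of four.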
 
All modern programming languages have different ways of “expressing” this “required behaviour”. Often the “required behaviour” has already been written in one programming language, but since you don’t have a compiler for that language, you cannot use it.
 
Humans are not perfect at issuing instructions for the CPU to follow, so the CPU sometimes produces behaviour that was not expected. This can happen in two ways. First, the CPU behaves in a way that is not well understood by the programmer, so the assumptions made while writing the instructions lead to unexpected behaviour. Second, the programmer forgets to specify how to handle the corner cases while writing the program. These unexpected behaviours reduce the reliability of the system. The best case is when we know the behaviour of the system under all possible inputs, but checking the behaviour of a program under all possible inputs might not be feasible.
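Both kinds of mistake can be shown with a small sketch. It assumes an 8-bit CPU whose registers hold values from 0 to 255 and which sets a carry flag when an addition does not fit. The function names (`add8`, `average_naive`, `average_checked`) are hypothetical and exist only for this example.

```python
def add8(a, b):
    """Add the way an 8-bit register does: keep only the low 8 bits."""
    total = a + b
    carry = total > 0xFF       # the CPU would set a carry flag here
    return total & 0xFF, carry


def average_naive(a, b):
    total, _ = add8(a, b)      # carry is ignored: the forgotten corner case
    return total // 2


def average_checked(a, b):
    total, carry = add8(a, b)
    if carry:
        total += 0x100         # restore the bit that did not fit in 8 bits
    return total // 2


print(average_naive(10, 20))      # 15
print(average_naive(200, 100))    # 22
print(average_checked(200, 100))  # 150

# With two 8-bit inputs there are only 256 * 256 cases, so checking
# every possible input is still feasible here.
naive_failures = sum(
    1
    for a in range(256)
    for b in range(256)
    if average_naive(a, b) != (a + b) // 2
)
checked_failures = sum(
    1
    for a in range(256)
    for b in range(256)
    if average_checked(a, b) != (a + b) // 2
)
print(naive_failures > 0, checked_failures == 0)  # True True
```

The exhaustive check works only because the input space is tiny. With two 32-bit inputs the same loop would need 2^64 iterations, which is the feasibility problem described above.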
 
The 8051 and MSP430 CPUs have the following classes of instructions:
  1. Data Transfer Instructions - Transfer data between RAM and registers.
  1. Arithmetic Instructions - Perform arithmetic operations.
  1. Logical Instructions - Perform boolean operations like AND, OR, XOR, NOT etc.
  1. Boolean Instructions - Single-bit manipulations, for example the Set, Clear and Complement instructions.
  1. Program Branching Instructions - Change the flow of execution, either depending on a condition or to a predetermined location.
 
A lower-level representation of instructions is required if you need your program to run on multiple CPUs that understand different machine code. The instruction set of each CPU is different: each CPU supports different instructions and different addressing modes (the ways in which you specify the operands of an instruction). Suppose we had a language generic enough that every Instruction Set Architecture (ISA) could be treated as a specialized implementation of a “General Purpose Instruction Set Architecture (GPISA)”. This would allow us to compile each program down to the GPISA and then use the feature set available in each ISA to generate the machine code for that CPU.
 
Having a language that is close to machine code and is ISA-agnostic would be helpful. Because the language is close to machine code, translating it into the actual machine code would be easier.
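The GPISA idea can be sketched as one generic program being lowered to two different targets. Everything here is hypothetical: the generic instruction format, the two targets (an accumulator-style machine and a three-operand machine) and their mnemonics are invented for illustration and are not real ISAs.

```python
# Generic, ISA-agnostic instructions:
# (operation, destination, first source, second source)
generic_program = [
    ("add", "t0", "x", "y"),
    ("and", "t1", "t0", "mask"),
]


def lower_accumulator(program):
    """Hypothetical target A: every operation goes through one accumulator."""
    lines = []
    for op, dst, src1, src2 in program:
        lines.append(f"LOAD {src1}")         # accumulator = src1
        lines.append(f"{op.upper()} {src2}")  # accumulator = accumulator op src2
        lines.append(f"STORE {dst}")         # dst = accumulator
    return lines


def lower_three_operand(program):
    """Hypothetical target B: one instruction names all three operands."""
    return [
        f"{op.upper()} {dst}, {src1}, {src2}"
        for op, dst, src1, src2 in program
    ]


target_a = lower_accumulator(generic_program)
target_b = lower_three_operand(generic_program)

print("\n".join(target_a))
print("\n".join(target_b))
```

The same two generic instructions become six instructions on target A and two on target B. Each lowering function is small because the generic form is already close to what the targets execute.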
 
Here is a list of common microcontrollers. The most commonly used ISAs are:
  1. RISC-V
  1. MCS-51 (8051 core)
  1. ARM
  1. 6502
  1. x86 (haven’t seen this in any microcontroller)
  1. ARM64
  1. AVR
  1. AVR32
  1. MIPS
  1. Z80
 
One way of defining a close-to-machine-code language is to list the variations across different machine codes, such as all the different instructions and all the possible addressing formats, and then build a generic machine code language from them. This is similar to an Intermediate Representation (IR) in compiler design lingo.
 
The idea is to implement this