Designed by macrovector / Freepik

Hello friends! This is the seventh part of my article series about building my own OS as an experiment. If you have already read my previous articles in this series, they will help you understand this part.

Virtual Memory and Paging

Let’s talk about virtual memory. What is it?

This image is from Wikipedia

In computing, virtual memory, or virtual storage is a memory management technique that provides an “idealized abstraction of the storage resources that are actually available on a given machine” which “creates the illusion to users of a very large (main) memory”.

The computer’s operating system, using a combination of hardware and software, maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage, as seen by a process or task, appears as a contiguous address space or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit (MMU), automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer.

The primary benefits of virtual memory include freeing applications from having to manage a shared memory space, ability to share memory used by libraries between processes, increased security due to memory isolation, and being able to conceptually use more memory than might be physically available, using the technique of paging or segmentation.

You can find more information from Wikipedia.

Virtual Memory Through Segmentation?

You could skip paging entirely and simply use segmentation for virtual memory. Every user mode process would get its own segment, with the base address and limit properly set up. This way no process can see the memory of another process. A problem with this is that the physical memory for a process must be contiguous (or at least it is very convenient if it is). Either we need to know in advance how much memory the program will require (unlikely), or we can move the memory segments to places where they can grow when the limit is reached (expensive, and it causes fragmentation, which can result in “out of memory” even though enough memory is available). Paging solves both of these problems.

Paging

In computer operating systems, memory paging is a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory. In this scheme, the operating system retrieves data from secondary storage in same-size blocks called pages. Paging is an important part of virtual memory implementations in modern operating systems, using secondary storage to let programs exceed the size of available physical memory.

For simplicity, main memory is called “RAM” (an acronym for random-access memory) and secondary storage is called “disk” (a shorthand for hard disk drive, drum memory, solid-state drive, and so on), but as with many aspects of computing, the concepts are independent of the technology used.

Paging in x86

Image from Little OS book

Paging in x86 consists of a page directory (PDT) that can contain references to 1024 page tables (PT), each of which can point to 1024 sections of physical memory called page frames (PF). Each page frame is 4096 bytes large. In a virtual (linear) address, the highest 10 bits specify the offset of a page directory entry (PDE) in the current PDT, and the next 10 bits the offset of a page table entry (PTE) within the page table pointed to by that PDE. The lowest 12 bits of the address are the offset within the page frame to be addressed.
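As a small illustration (the helper names below are my own, not from any real kernel), the 10/10/12 split of a 32-bit virtual address is plain bit arithmetic:

```c
#include <stdint.h>

/* Split a 32-bit virtual (linear) address into its x86 paging parts. */
static uint32_t pd_index(uint32_t vaddr)  { return vaddr >> 22; }           /* highest 10 bits: PDE offset */
static uint32_t pt_index(uint32_t vaddr)  { return (vaddr >> 12) & 0x3FF; } /* next 10 bits: PTE offset */
static uint32_t pf_offset(uint32_t vaddr) { return vaddr & 0xFFF; }         /* lowest 12 bits: offset in frame */
```

For example, `pd_index(0xC0100000)` is 768 and `pt_index(0xC0100000)` is 256.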

All page directories, page tables, and page frames need to be aligned on 4096-byte addresses. This makes it possible to address a PDT, PT, or PF with just the highest 20 bits of a 32-bit address, since the lowest 12 must be zero.

The PDE and PTE structures are very similar to each other: 32 bits (4 bytes), where the highest 20 bits point to a PTE or PF, and the lowest 12 bits control access rights and other configuration. 4 bytes times 1024 equals 4096 bytes, so a page directory and a page table both fit in a page frame themselves.
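To make the entry layout concrete, here is a sketch in C (the flag names are hypothetical, chosen to mirror bits 0 through 2 of a real PDE/PTE) that packs a 4096-byte-aligned address into the highest 20 bits and the flags into the lowest 12:

```c
#include <stdint.h>

enum {
    E_PRESENT  = 1u << 0, /* bit 0: entry is present */
    E_WRITABLE = 1u << 1, /* bit 1: page is writable */
    E_USER     = 1u << 2  /* bit 2: accessible from user mode */
};

/* Build a 32-bit PDE/PTE: highest 20 bits are the address,
 * lowest 12 bits are access rights and other configuration. */
static uint32_t make_entry(uint32_t phys, uint32_t flags)
{
    return (phys & 0xFFFFF000u) | (flags & 0xFFFu);
}
```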

The translation of linear addresses to physical addresses is described in the figure above.

While pages are normally 4096 bytes, it is also possible to use 4 MB pages. A PDE then points directly to a 4 MB page frame, which needs to be aligned on a 4 MB address boundary. The address translation is almost the same as in the figure, with just the page table step removed. It is possible to mix 4 MB and 4 KB pages.

Identity Paging and Enabling Paging

The simplest kind of paging is when we map each virtual address onto the same physical address, called identity paging. This can be done at compile time by creating a page directory where each entry points to its corresponding 4 MB frame. In NASM this can be done with macros and commands (%rep, times, and dd). It can of course also be done at run-time by using ordinary assembly code instructions.
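For clarity, the same table can be sketched in C (my own illustrative model; in the actual kernel it would be built with NASM’s %rep/dd or with run-time assembly). Each of the 1024 entries maps one 4 MB virtual region onto the identical physical region using the PS bit (bit 7):

```c
#include <stdint.h>

#define PDE_PRESENT  (1u << 0)
#define PDE_WRITABLE (1u << 1)
#define PDE_4MB      (1u << 7) /* PS bit: the PDE maps a 4 MB page directly */

/* Identity-map all 4 GB with 4 MB pages: entry i covers virtual
 * addresses (i << 22) .. (i << 22) + 4 MB - 1 and points at the
 * same physical range. */
static void identity_map(uint32_t pd[1024])
{
    for (uint32_t i = 0; i < 1024; i++)
        pd[i] = (i << 22) | PDE_4MB | PDE_WRITABLE | PDE_PRESENT;
}
```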

Paging is enabled by first writing the address of a page directory to cr3 and then setting bit 31 (the PG “paging-enable” bit) of cr0 to 1. To use 4 MB pages, set the PSE bit (Page Size Extensions, bit 4) of cr4. The following assembly code shows an example:

; eax has the address of the page directory
mov cr3, eax

mov ebx, cr4 ; read current cr4
or ebx, 0x00000010 ; set PSE
mov cr4, ebx ; update cr4

mov ebx, cr0 ; read current cr0
or ebx, 0x80000000 ; set PG
mov cr0, ebx ; update cr0

; now paging is enabled

Note that all addresses within the page directory, page tables, and cr3 need to be physical addresses of the structures, never virtual. This will be more relevant in later sections where we dynamically update the paging structures (see the section “User Mode”).

An instruction that is useful when updating a PDT or PT is invlpg. It invalidates the Translation Lookaside Buffer (TLB) entry for a virtual address. The TLB is a cache for translated addresses, mapping virtual addresses to their corresponding physical addresses. This is only required when changing a PDE or PTE that was previously mapped to something else. If the PDE or PTE had previously been marked as not present (bit 0 was set to 0), executing invlpg is unnecessary. Changing the value of cr3 will cause all entries in the TLB to be invalidated.

; invalidate any TLB references to virtual address 0
invlpg [0]

This section will describe how paging affects the OS kernel. We encourage you to run your OS using identity paging before trying to implement a more advanced paging setup since it can be hard to debug a malfunctioning page table that is set up via assembly code.

If the kernel is placed at the beginning of the virtual address space, that is, the virtual address space (0x00000000, “size of kernel”) maps to the location of the kernel in memory, there will be problems when linking the user mode process code. Normally, during linking, the linker assumes that the code will be loaded into memory position 0x00000000. Therefore, when resolving absolute references, 0x00000000 will be the base address for calculating the exact position. But if the kernel is mapped onto the virtual address space (0x00000000, “size of kernel”), the user mode process cannot be loaded at virtual address 0x00000000; it must be placed somewhere else. Therefore, the linker’s assumption that the user mode process is loaded into memory at position 0x00000000 is wrong. This can be corrected by using a linker script which tells the linker to assume a different starting address, but that is a very cumbersome solution for the users of the operating system.

This also assumes that we want the kernel to be part of the user mode process’s address space. As we will see later, this is a nice feature, since during system calls we don’t have to change any paging structures to get access to the kernel’s code and data. The kernel pages will of course require privilege level 0 for access, to prevent a user process from reading or writing kernel memory.

Virtual Address for the Kernel

Ideally, the kernel should be placed at a very high virtual memory address, for example 0xC0000000 (3 GB). The user mode process is not likely to be 3 GB large, which is now the only way that it can conflict with the kernel. When the kernel uses virtual addresses at 3 GB and above, it is called a higher-half kernel. 0xC0000000 is just an example; the kernel can be placed at any address higher than 0 to get the same benefits. Choosing the correct address depends on how much virtual memory should be available for the kernel (it is easiest if all memory above the kernel virtual address belongs to the kernel) and how much virtual memory should be available for the user mode process.

Placing the Kernel

Placing the Kernel at 0xC0100000

First of all, it is better to place the kernel at 0xC0100000 than 0xC0000000, since this makes it possible to map (0x00000000, 0x00100000) to (0xC0000000, 0xC0100000). This way, the entire range (0x00000000, “size of kernel”) of memory is mapped to the range (0xC0000000, 0xC0000000 + "size of kernel").
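Because the mapping is a constant offset, translating a kernel virtual address back to its physical (load) address is a single subtraction; this is the same arithmetic the AT(ADDR(.text)-0xC0000000) directive in the linker script performs. A minimal sketch, assuming a 0xC0000000 base (the macro and function names are my own):

```c
#include <stdint.h>

#define KERNEL_VIRTUAL_BASE 0xC0000000u /* assumed higher-half base */

/* Kernel virtual address -> physical address under the constant-offset
 * mapping (0xC0000000, 0xC0000000 + "size of kernel") -> (0x00000000, ...). */
static uint32_t virt_to_phys(uint32_t vaddr)
{
    return vaddr - KERNEL_VIRTUAL_BASE;
}
```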

Placing the kernel at 0xC0100000 isn’t hard, however, it requires some thought. This is once again a linking problem. When the linker resolves all absolute references in the kernel, it will assume that our kernel is loaded at physical memory location 0x00100000, not 0x00000000, since relocation is used in the linker script (see the section “Linking the kernel”). However, we want the jumps to be resolved using 0xC0100000 as the base address, since otherwise a kernel jump will jump straight into the user mode process code (remember that the user mode process is loaded at virtual memory 0x00000000).

However, we can’t simply tell the linker to assume that the kernel starts (is loaded) at 0xC0100000, since we want it to be loaded at the physical address 0x00100000. The reason for having the kernel loaded at 1 MB is that it can’t be loaded at 0x00000000, since there is BIOS and GRUB code loaded below 1 MB. Furthermore, we cannot assume that we can load the kernel at 0xC0100000, since the machine might not have 3 GB of physical memory.

This can be solved by using both relocation (. = 0xC0100000) and the AT instruction in the linker script. Relocation specifies that non-relative memory references should use the relocation address as the base in address calculations. AT specifies where the kernel should be loaded into memory. Relocation is done at link time by GNU ld; the load address specified by AT is handled by GRUB when loading the kernel, and is part of the ELF format.

We can modify the first linker script to implement this:

ENTRY(loader)           /* the name of the entry symbol */

. = 0xC0100000          /* the code should be relocated to 3 GB + 1 MB */

/* align at 4 KB and load at 1 MB */
.text ALIGN (0x1000) : AT(ADDR(.text)-0xC0000000)
{
*(.text) /* all text sections from all files */
}
/* align at 4 KB and load at 1 MB + . */
.rodata ALIGN (0x1000) : AT(ADDR(.rodata)-0xC0000000)
{
*(.rodata*) /* all read-only data sections from all files */
}
/* align at 4 KB and load at 1 MB + . */
.data ALIGN (0x1000) : AT(ADDR(.data)-0xC0000000)
{
*(.data) /* all data sections from all files */
}
/* align at 4 KB and load at 1 MB + . */
.bss ALIGN (0x1000) : AT(ADDR(.bss)-0xC0000000)
{
*(COMMON) /* all COMMON sections from all files */
*(.bss) /* all bss sections from all files */
}

When GRUB jumps to the kernel code, there is no paging table yet. Therefore, all references to 0xC0100000 + X won’t be mapped to the correct physical address, and will therefore cause a general protection exception (GPE) at the very best; otherwise (if the computer has more than 3 GB of memory) the computer will just crash.

Therefore, assembly code that doesn’t use relative jumps or relative memory addressing must be used to do the following:

Set up a page table.

Add identity mapping for the first 4 MB of the virtual address space.

Add an entry for 0xC0100000 that maps to 0x00100000.
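The three steps above can be sketched as follows (an illustrative C model of the entry values only; in the real boot path the table is built in assembly before paging is on, and every address stored must be physical):

```c
#include <stdint.h>

#define PDE_PRESENT  (1u << 0)
#define PDE_WRITABLE (1u << 1)
#define PDE_4MB      (1u << 7)

/* Sketch of a boot-time page directory using 4 MB pages:
 * - entry 0 identity-maps the first 4 MB, so the code currently
 *   executing around 0x00100000 keeps working, and
 * - entry 768 (= 0xC0100000 >> 22) maps the higher-half range
 *   onto the same first 4 MB of physical memory, so virtual
 *   0xC0100000 reaches the kernel at physical 0x00100000. */
static void setup_boot_directory(uint32_t pd[1024])
{
    for (int i = 0; i < 1024; i++)
        pd[i] = 0;                       /* not present */
    pd[0] = 0x00000000 | PDE_4MB | PDE_WRITABLE | PDE_PRESENT;
    pd[0xC0100000 >> 22] = 0x00000000 | PDE_4MB | PDE_WRITABLE | PDE_PRESENT;
}
```

Note that both entries point at physical 0x00000000; the 0x00100000 part of the address is covered by the offset within the 4 MB page.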

If we skip the identity mapping for the first 4 MB, the CPU would generate a page fault immediately after paging was enabled, when trying to fetch the next instruction from memory. After the table has been created, a jump can be done to a label to make eip point to a virtual address in the higher half:

; assembly code executing at around 0x00100000
; enable paging for both the physical location of the kernel
; and its higher-half virtual location

lea ebx, [higher_half] ; load the address of the label in ebx
jmp ebx ; jump to the label

higher_half:
; code here executes in the higher half kernel
; eip is larger than 0xC0000000
; can continue kernel initialisation, calling C code, etc.

The register eip will now point to a memory location somewhere right after 0xC0100000; all the code can now execute as if it were located in the higher half. The entry mapping the first 4 MB of virtual memory to the first 4 MB of physical memory can now be removed from the page table and its corresponding entry in the TLB invalidated with invlpg [0].

There are a few more details we must deal with when using a higher-half kernel. We must be careful when using memory-mapped I/O that uses specific memory locations. For example, the framebuffer is located at 0x000B8000, but since there is no entry in the page table for the address 0x000B8000 anymore, the address 0xC00B8000 must be used, since the virtual address 0xC0000000 maps to the physical address 0x00000000.
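With the higher-half mapping in place, converting such physical I/O addresses to their virtual counterparts is a constant-offset addition. A minimal sketch (0xC0000000 is my assumed higher-half base; adjust it for your kernel):

```c
#include <stdint.h>

#define KERNEL_VIRTUAL_BASE 0xC0000000u /* assumed higher-half base address */

/* Translate a physical memory-mapped I/O address to the virtual
 * address the kernel must use once only the higher-half mapping exists. */
static uint32_t phys_to_virt(uint32_t paddr)
{
    return paddr + KERNEL_VIRTUAL_BASE;
}
```

For example, the framebuffer at physical 0x000B8000 is then accessed through `phys_to_virt(0x000B8000)`, i.e. 0xC00B8000.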

Any explicit references to addresses within the multiboot structure need to be changed to reflect the new virtual addresses as well.

Mapping 4 MB pages for the kernel is simple, but wastes memory (unless you have a really big kernel). Creating a higher-half kernel mapped in as 4 KB pages saves memory but is harder to set up. Memory for the page directory and one page table can be reserved in the .bss section, but one needs to configure the mappings from virtual to physical addresses at run-time. The size of the kernel can be determined by exporting labels from the linker script, which we’ll need to do later anyway when writing the page frame allocator.

Virtual Memory Through Paging

Paging enables two things that are good for virtual memory. First, it allows for fine-grained access control to memory. You can mark pages as read-only, read-write, only for PL0, etc. Second, it creates the illusion of contiguous memory. User mode processes, and the kernel, can access memory as if it were contiguous, and the contiguous memory can be extended without the need to move data around in memory. We can also allow the user mode programs access to all memory below 3 GB, but unless they actually use it, we don’t have to assign page frames to the pages. This allows processes to have code located near 0x00000000 and the stack just below 0xC0000000, and still not require more than two actual pages.

This is the end of this part. For more information, you can always check these references: Wikipedia & The Little OS Book.

Thank you.
