Bootstrapping the blue Windows when you have random AArch64 devices in the backyard
Last year I mentioned my attempt to bootstrap Windows 10 on the Dragonboard 410c. This year I ported EDK2 to the Nintendo Switch and successfully booted the Windows 10 arm64 installation ramdisk (rs4, rs5, and 19H1 tested at the time of writing). I will briefly introduce a common way to port EDK2 using an existing codebase (e.g. U-Boot), as well as the case of booting in EL2 (hypervisor).
Background
While this article applies to most ARM SoCs, the following content uses Tegra as the example. NVIDIA developed a few solutions for Windows on ARM in the Windows 8 era: Tegra 3 (Tegra30) and Tegra 4 (Tegra114). No later model has an official Windows BSP (Board Support Package) released publicly, due to the low market acceptance of those Windows RT products.
Despite that, general AArch64 processors are capable of running Windows 10 without an additional HAL extension library if the following conditions are satisfied:
- Architecture Timer with ACPI GTDT table description. Either CP15 or MMIO clock is okay.
- Generic Interrupt Controller v2/v3 (we are not yet aware of v4 support) with ACPI MADT (APIC) table description, or Broadcom Interrupt Controller
- AArch64 instruction set (crypto extension is not required)
- ARM Performance Monitor Unit with ACPI MADT (APIC) table description
One notable exception is the initial generation of Qualcomm Kryo (Snapdragon 820, 821), due to the faulty cache design in its large core cluster. Windows removed the required erratum workaround for it because of the complexity of maintaining the patch.
The Tegra X1 satisfies all the conditions outlined above. I used an old-bootrom Nintendo Switch as my experiment platform since it is much cheaper than a Jetson TX1. Additionally, there is verified coreboot and U-Boot source code for these Tegra X1 devices, including the Nintendo Switch.
I assume you are familiar with the NVIDIA RCM exploit (Fusée Gelée) as well as the Tegra boot flow. If you are not familiar with the Tegra boot flow, please refer to the Tegra Technical Reference Manual available on the NVIDIA developer site.
Port U-Boot Code to EDK2
There are a few environment assumptions that need to be addressed while porting U-Boot device/driver code to EDK2:
- U-Boot runs in an AArch64 context, but in most circumstances it only uses a small amount of memory at the bottom of the address space. EDK2/TianoCore loads everything as high as possible, per the UEFI specification. Certain peripheral operations are not 64-bit addressing aware. Truncating 64-bit pointers to 32 bits loses no data under U-Boot's assumptions, but in EDK2 it can lead to issues. One case is SDMA (single operation DMA): Tegra SDHCI controller SDMA operations are not 64-bit addressing aware. To address the issue, I slightly modified the DMA bounce buffer allocation library (also ported from U-Boot) to allocate from low memory instead.
- Coding style. U-Boot follows the Linux naming convention for functions and types; EDK2 follows the Windows style. It might be a good idea to write a shim that provides functions like readl/writel as well as udelay/mdelay.
- There is probably no need for porting generic classes (e.g. udevice). You might not need them in EDK2 context.
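The shim idea from the list above can be sketched as follows. This is a minimal, standalone illustration and not the code from my port: it uses plain volatile accesses so that it compiles on its own, whereas in an actual EDK2 build the bodies would more likely wrap MmioRead32()/MmioWrite32() from IoLib, and udelay/mdelay would map onto MicroSecondDelay() from TimerLib. The alignment asserts are my own addition, and the read-modify-write helper assumes a little-endian target.

```c
#include <assert.h>
#include <stdint.h>

/* U-Boot-style 32-bit MMIO read. Plain volatile access is used here so the
 * sketch is self-contained; an EDK2 port would likely call MmioRead32(). */
static inline uint32_t readl(const volatile void *addr)
{
    assert(((uintptr_t)addr & 3u) == 0);  /* 32-bit registers are 4-byte aligned */
    return *(const volatile uint32_t *)addr;
}

/* U-Boot-style 32-bit MMIO write. Note the (value, address) argument order,
 * which is the reverse of EDK2's MmioWrite32(Address, Value). */
static inline void writel(uint32_t value, volatile void *addr)
{
    assert(((uintptr_t)addr & 3u) == 0);
    *(volatile uint32_t *)addr = value;
}

/* Read-modify-write helper with the same argument order as U-Boot's
 * clrsetbits_le32(): clear the bits in `clear`, then set the bits in `set`. */
static inline void clrsetbits_le32(volatile void *addr, uint32_t clear, uint32_t set)
{
    writel((readl(addr) & ~clear) | set, addr);
}
```

With a shim like this in a shared header, most U-Boot register-poking code can be dropped into an EDK2 library with few changes beyond type names.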
To save myself some time bootstrapping the microSD slot, I ported the clock and device framework from U-Boot to EDK2. Here are a few suggestions while porting U-Boot code to EDK2:
- Address issues mentioned above.
- Put device-specific definitions into the “Include” directory, and use the PCD database when necessary.
- Install these services as DXE drivers whenever possible. Invoke them using protocols.
- For board/machine-dependent library code (e.g. mach-tegra), decide based on how it is used whether to integrate it into the driver or keep it as a separate library.
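For the first suggestion, the 32-bit DMA limitation described earlier boils down to one check before every SDMA transfer. The sketch below is an assumption about how such a check could look, not the code from my port: the function names are hypothetical, and the low bounce buffer is assumed to have been allocated once beforehand (in EDK2, for instance, with AllocatePages using AllocateMaxAddress and a limit of 0xFFFFFFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First address a 32-bit DMA engine cannot reach. */
#define DMA32_LIMIT 0x100000000ULL

/* True when the whole range [addr, addr + len) lies below 4 GiB. */
static bool dma32_reachable(uint64_t addr, size_t len)
{
    assert(len > 0);
    return addr < DMA32_LIMIT && len <= DMA32_LIMIT - addr;
}

/* Pick the address to program into the 32-bit SDMA address register:
 * the caller's buffer when it is reachable, otherwise the low bounce
 * buffer. *bounced tells the caller whether data must be copied. */
static uint32_t sdma_target(uint64_t buf, size_t len,
                            uint32_t bounce_buf, bool *bounced)
{
    *bounced = !dma32_reachable(buf, len);
    return *bounced ? bounce_buf : (uint32_t)buf;
}
```

When `*bounced` is set, the caller copies the data into the bounce buffer before a write, or out of it after a read.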
From Device Tree to ACPI
Device Tree is the de facto standard on ARM for describing the system and peripheral hierarchy. Windows RT introduced extensive use of ACPI on ARM platforms. I will cover some of the tables required for a successful Windows startup on ARM platforms. For tables such as CSRT and DSDT, check out the Microsoft documentation.
GTDT (Generic Timer Description Table)
For SoCs with an architectural timer, ARM defines the GTDT table to describe platform timer information. In the device tree, an architectural timer may look like this:
timer {
    compatible = "arm,armv8-timer";
    interrupts = <GIC_PPI 13 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>,
                 <GIC_PPI 14 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>,
                 <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>,
                 <GIC_PPI 10 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>;
    interrupt-parent = <&gic>;
};
And it looks like this in ACPI GTDT table:
....
[024h 0036 8] Counter Block Address : FFFFFFFFFFFFFFFF
[02Ch 0044 4] Reserved : 00000000
[030h 0048 4] Secure EL1 Interrupt : 0000001D
[034h 0052 4] EL1 Flags (decoded below) : 00000002
    Trigger Mode : 0
    Polarity : 1
    Always On : 0
[038h 0056 4] Non-Secure EL1 Interrupt : 0000001E
[03Ch 0060 4] NEL1 Flags (decoded below) : 00000002
    Trigger Mode : 0
    Polarity : 1
    Always On : 0
[040h 0064 4] Virtual Timer Interrupt : 0000001B
[044h 0068 4] VT Flags (decoded below) : 00000002
    Trigger Mode : 0
    Polarity : 1
    Always On : 0
[048h 0072 4] Non-Secure EL2 Interrupt : 0000001A
[04Ch 0076 4] NEL2 Flags (decoded below) : 00000002
    Trigger Mode : 0
    Polarity : 1
    Always On : 0
[050h 0080 8] Counter Read Block Address : FFFFFFFFFFFFFFFF
...
- If your platform does not have an MMIO architectural timer, write the address as 0xFFFFFFFFFFFFFFFF.
- If you boot from EL2, you are required to supply all timer values. Otherwise only the EL1 timers are needed.
- PPIs start at interrupt ID 16, so add 16 to every interrupt number you have in the device tree (for example, PPI 13 becomes 0x1D). The four interrupts are Secure EL1, Non-secure EL1, virtual timer and hypervisor, in that order.
- If your platform has a watchdog, you may supply it in the GTDT table too (see the Qualcomm example). It is not mandatory for booting Windows though.
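To make the arithmetic concrete, here is a small sketch of the device tree to GTDT conversion. The helper names are hypothetical; the flag bit positions follow the decoded fields in the dump above (bit 0 trigger mode, bit 1 polarity, bit 2 always-on).

```c
#include <assert.h>
#include <stdint.h>

/* PPIs occupy GIC interrupt IDs 16..31. */
#define GIC_PPI_BASE 16u

/* GTDT per-timer flag bits. */
#define GTDT_FLAG_EDGE_TRIGGERED (1u << 0)  /* 0 = level, 1 = edge */
#define GTDT_FLAG_ACTIVE_LOW     (1u << 1)  /* 0 = active high, 1 = active low */
#define GTDT_FLAG_ALWAYS_ON      (1u << 2)

/* Convert a device tree "GIC_PPI n" number into the GSIV used by ACPI. */
static uint32_t ppi_to_gsiv(uint32_t dt_ppi)
{
    assert(dt_ppi < 16u);  /* only 16 PPIs exist per core */
    return GIC_PPI_BASE + dt_ppi;
}

/* Build the 32-bit flags value for one GTDT timer entry. */
static uint32_t gtdt_timer_flags(int edge, int active_low, int always_on)
{
    uint32_t flags = 0;
    if (edge)       flags |= GTDT_FLAG_EDGE_TRIGGERED;
    if (active_low) flags |= GTDT_FLAG_ACTIVE_LOW;
    if (always_on)  flags |= GTDT_FLAG_ALWAYS_ON;
    return flags;
}
```

For the timer node shown earlier, `ppi_to_gsiv(13)` gives 0x1D for the Secure EL1 interrupt, and IRQ_TYPE_LEVEL_LOW maps to `gtdt_timer_flags(0, 1, 0)`, which is the 00000002 seen in every flags field of the dump.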
MADT (Multiple APIC Description Table)
Most AArch64 SoC systems have one or more GIC-compatible interrupt controllers. Windows has inbox GIC support; all that is needed is to supply proper information in the MADT table. The table also describes the ARM Performance Monitor Unit for the system's reference. In the device tree, the GIC and PMU look like this:
gic: interrupt-controller@50041000 {
    compatible = "arm,gic-400";
    #interrupt-cells = <3>;
    interrupt-controller;
    reg = <0x0 0x50041000 0x0 0x1000>,
          <0x0 0x50042000 0x0 0x2000>,
          <0x0 0x50044000 0x0 0x2000>,
          <0x0 0x50046000 0x0 0x2000>;
    interrupts = <GIC_PPI 9 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_HIGH)>;
    interrupt-parent = <&gic>;
};
arm_pmu: arm-pmu {
    compatible = "arm,armv8-pmuv3";
    interrupts = <GIC_SPI 144 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 145 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 146 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 147 IRQ_TYPE_LEVEL_HIGH>;
};
An example of the MADT table can be found here.
- In the MADT table, each processor core has a table entry. Make sure you have the same CPU object in the DSDT table, with a matching and unique UID and CPU interface ID.
- If your platform supports ARM PSCI, the parking address field can be ignored.
- The four register regions in the GIC device tree node are the GIC distributor, the GIC base address (CPU interface), the hypervisor GIC base address (GICH) and the virtual GIC base address (GICV).
- You might need to supply the GIC redistributor address on GICv3 architecture.
- SPI interrupt numbers start at 32, so add 32 to every performance interrupt number when filling in the MADT table (for example, SPI 144 becomes 176).
- The MPIDR value needs to be taken from platform resources.
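The mapping from the device tree nodes above to one per-core MADT entry can be sketched like this. The struct is a hypothetical, simplified view and not the packed ACPI layout or the EDK2 type. It also assumes that PMU SPI 144 + n belongs to core n and that the UID equals the core index; both should be checked against your platform. The MPIDR is passed in by the caller because it has to come from platform resources.

```c
#include <assert.h>
#include <stdint.h>

#define GIC_PPI_BASE 16u  /* PPIs occupy interrupt IDs 16..31 */
#define GIC_SPI_BASE 32u  /* SPIs start at interrupt ID 32 */

/* Hypothetical, simplified subset of the fields of a MADT GICC entry. */
struct gicc_info {
    uint32_t cpu_interface_number;
    uint32_t acpi_processor_uid;     /* must match the CPU object in the DSDT */
    uint32_t performance_gsiv;       /* PMU interrupt */
    uint64_t physical_base;          /* GIC CPU interface */
    uint64_t gicv;                   /* virtual GIC base */
    uint64_t gich;                   /* hypervisor GIC base */
    uint32_t vgic_maintenance_gsiv;
    uint64_t mpidr;
};

/* Fill one entry from the values in the device tree nodes shown above. */
static struct gicc_info make_gicc(uint32_t core, uint64_t mpidr)
{
    assert(core < 4u);  /* the example device tree describes four cores */
    struct gicc_info g = {
        .cpu_interface_number  = core,
        .acpi_processor_uid    = core,                       /* assumption */
        .performance_gsiv      = GIC_SPI_BASE + 144u + core, /* assumption: SPI 144+n is core n */
        .physical_base         = 0x50042000,                 /* second reg entry */
        .gicv                  = 0x50046000,                 /* fourth reg entry */
        .gich                  = 0x50044000,                 /* third reg entry */
        .vgic_maintenance_gsiv = GIC_PPI_BASE + 9u,          /* GIC_PPI 9 in the gic node */
        .mpidr                 = mpidr,
    };
    return g;
}
```

The distributor address (0x50041000, the first reg entry) does not appear here because it goes into the separate GIC distributor structure of the MADT rather than into the per-core entries.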
DBG2 (Microsoft Debug Table 2)
Microsoft defines the DBG2 table for ARM platforms. Although the Microsoft docs mark the DBG2 table as mandatory, you do not need to supply debug device information if you just want to boot Windows as a proof-of-concept :P. An empty DBG2 table is enough for booting.
For debugging purposes, it is necessary to define at least one debug device (8250/16550 serial or USB) in the DSDT and DBG2 tables. More information can be found here.
FADT (Fixed ACPI Description Table)
Indicate PSCI support and hardware-reduced ACPI mode in this table, and you are good to go.
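In terms of bits, that means the hardware-reduced flag in the FADT Flags field and the PSCI flags in the ARM Boot Architecture Flags field. The sketch below uses hypothetical macro and function names; whether PSCI calls go through HVC or SMC depends on your firmware, so it is left as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* FADT "Flags" field: hardware-reduced ACPI is bit 20. */
#define FADT_FLAG_HW_REDUCED_ACPI (1u << 20)

/* FADT "ARM Boot Architecture Flags" field. */
#define FADT_ARM_PSCI_COMPLIANT   (1u << 0)
#define FADT_ARM_PSCI_USE_HVC     (1u << 1)  /* clear means PSCI uses SMC */

/* Build the ARM boot architecture flags for a PSCI-capable platform. */
static uint16_t fadt_arm_boot_arch(int psci_via_hvc)
{
    assert(psci_via_hvc == 0 || psci_via_hvc == 1);
    uint16_t flags = FADT_ARM_PSCI_COMPLIANT;
    if (psci_via_hvc)
        flags |= FADT_ARM_PSCI_USE_HVC;
    return flags;
}

/* Check the two properties this section asks for. */
static int fadt_ready(uint32_t flags, uint16_t arm_boot_arch)
{
    return (flags & FADT_FLAG_HW_REDUCED_ACPI) != 0 &&
           (arm_boot_arch & FADT_ARM_PSCI_COMPLIANT) != 0;
}
```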
Debugging ACPI
It’s incredibly difficult to debug early ACPI startup if you don’t have serial or debug access on the platform. Fortunately, Linux provides some utilities for it. On 5.0+ kernels, you can enable the UEFI framebuffer early printk support to simplify the debugging process.
Conclusion
With much effort, Windows on ARM can run on a variety of AArch64 devices. There’s still much work between “just-booted” and “usable”, and it may cost you countless nights to achieve your marvel, even if there are always people asking you “why”:
https://www.reddit.com/r/SwitchHacks/comments/awukbx/windows_on_switch_being_worked_on/