The case of UEFI for Windows on ARM, and comparison with LK/ABoot

Nights before trips are always boring, and I decided to draft some words to spend the time. So we have Windows 10 on ARM running on Dragonboard 410c, and Lumia 950 XL (Article in Chinese, sorry). It will be helpful to write down some firmware-related information for platform bring-ups for further reference. Meanwhile, the comparison of Little Kernel, the common Linux Android (well, Qualcomm says so) bootloader will provide useful information for Android on Lumia project.

I recommended you read this article if you are not familiar with UEFI.

Assumptions, assumptions

Compared to Linux, Windows Kernel assumes its platform firmware and bootloader (aka. Windows Boot Manager) prepare the basic environment for successful kernel initializations. If certain components are not initialized, bugchecks may occur. Even the system successfully launches, it may have some unexpected behaviors (weird things). An official document explains these a lot.

Little Kernel initializes basic hardware too (at least you need serial for debugging). Certain periapical, including clocks, regulators, and USB are initialized too for application purposes (e.g. Fastboot). Anyway, it initializes less periapical as possible. Sometimes even the panel is not brought up (I’ve seen a case on Android phone).

In short, you have to do more for a successful Windows bring-up:

  • If you know certain components are in the usable state already, skip initialization procedures. For example, on Lumia 950 XL, our UEFI implementation does not need to initialize USB since our bootstrapper (Qualcomm UEFI) did so.
  • If your platform has PCIe components, clocks them up, set regulators and mappings, etc.
  • Initialize at least one debug resource described in your DBG2 table (if applicable, likely on all ARM platforms)
  • Bring up the panel, set basic display parameters and pass a framebuffer pointer for Windows.

So how about Linux? If your Linux platform uses DT instead of ACPI, you are likely not required to do most of the stuff Windows requires. On Qualcomm platforms, Linux kernel will clock up PCIe cores, set regulators and mappings to make it in the usable state. If your platform uses standard ACPI and platform drivers do not perform additional initialization procedures, initialize these components in firmware.

Fill the hole

Both UEFI w/ ACPI and LK will perform fix-up tasks before transferring control to the kernel. On Qualcomm platforms, chipset metadata (revision, foundry ID, etc.) will be filled in DSDT. Certain logic in DSDT depends on them. Typical Linux Android device will ship with a large DT for multiple variants. LK selects the best fit using chipset ID/PMIC ID/board ID, then fill in some memory region information for kernel use.

ACPI tables in the firmware for Windows 10 on ARM is pre-patched. So I don’t implement the fix-up logic additionally.

Multi-processor Startup, Again

Why am I discussing the thing again? Because it is important.

Little Kernel (and likely other Linux Android bootloaders) will only use a single processor in its lifecycle (a notable exception is Raspberry Pi, which uses spin table except 3+). When it transfers control to Linux, Linux will bring other cores out of reset state and make them available for use.

Windows platforms that implement ACPI Multi-Processor Parking Protocol behaves differently. Although firmware uses a single core, other CPU cores are brought out of the reset state and being instructed to run a special piece of code. The code flow is like this:

parking:
    Wait for an interrupt.
    Am I the processor being waked up?
    If yes, go to the address that OS told me
    If not, go back to parking.

(Interrupt acknowledgment and memory barriers ignored. Sorry, I don’t want to write assembly at 11 PM.)

Because different platforms handle core startup differently (on Qualcomm platforms, TrustZone has participated), booting Linux Kernel and starts cores the Linux way with a UEFI firmware that implements this protocol may fail. Someone told me he was unable to bring up other three cores on 640. It is reasonable since LK on recent Lumia phones is launched via a special UEFI application in Windows Boot Application form. Qualcomm UEFI put the other three cores in running state (and WFI). Both LK and Linux are not aware of that (they have the assumption of core state). Finally, core startup fails.

Since it is not possible to ditch Qualcomm UEFI (unlike the exploit for first-generation Lumia WP8 devices), we have to comfort the parking protocol in AArch32 mode (You have PSCI for AArch64 SoCs):

  • Ignore other cores Unicore is the best
  • Implement parking protocol for unsupported systems (not too hard). Linux has the protocol support; you have to enable it.
  • Go AArch64 and use PSCI (remember to use HVC mode for 8992/8994)

 

Good night (And to my girlfriend: If you see this article, sorry that I say “Good Night” too early.)

 

Give TianoCore/EDK2 on AArch64 a hand in 2018

Windows 10 PE runs on a Spandragon 410 processor.

Also posted here: Bringing up Windows 10 AArch64 on a $50 Single Board Computer

Windows on ARM is not a new topic. There are some guys attempted to bring up Windows RT and Windows 10 on Qemu (ARM/AArch64 target). It even runs on Raspberry Pi 3. Obviously it is not a Snapdragon 835-only thing. We can give it a hand on our own Single Board Computers.

This article covers some important details in Dragonboard 410c SBC’s aa64 UEFI implementation.

Contents

    • Windows Boot Requirements
    • Bootstrapping your own EDK2/TianoCore UEFI
    • Memory Allocation / Memory Management Unit
    • UEFI Flash Definition
    • First-stage Bootloader (Little Kernel)
    • Persistent NVRAM Support
    • A “Working” RTC
    • Multi-processor startup (PSCI)

Windows Boot Requirements (AArch64)

  • AArch64 architecture processor. It seems that AArch64 cryptography extension is required too (Raspberry Pi 3 randomly throws UNSUPPORTED_PROCESSOR bugcheck, rs4 fixed the issue). The bugcheck is raised in Errata Check (a hardcoded ID check).
  • For multi-processor systems, either Microsoft ARM Multi-processor Parking Protocol or ARM PSCI interface shall be implemented. All current Windows 10 IoT ARM32 platforms implement former one.
  • A working interrupt controller. Most AArch64 SoC cores include ARM GIC, so there’s little work to do here. The only exception I know is BCM2837. Windows has inbox Broadcom interrupt controller support (for the sake of Raspberry Pi). But if your SoC has additional third party interrupt controller, you need to supply your own HAL extension library. There is few documentation for this available though…
  • A working processor timer. If not, supply your own HAL extension library.
  • Complete ACPI 5.1/6.0 and UEFI 2.3+ implementation. Do not try to use Das U-Boot’s EFI implementation; it’s broken.

These requirements are fairly similar to ARM SBBR certification requirements. If your SBC has a working EDK2/TianoCore UEFI, then you are probably good to go. Bootstrapping your own EDK2 is pretty easy too.

Bootstrapping your own EDK2/TianoCore

The board I used (DragonBoard 410c) doesn’t have a known EDK2/TianoCore implementation. So I have to build my own. This repository for Raspberry Pi 3 is a good start point and reference for you.

You need to do these things in UEFI:

  • Initialize serial output (for debugging) and Memory Management Unit (MMU). Refer to your platform datasheet for device memory address allocation.
  • Retrieve required information from pre-UEFI environment and build Hand-off Blocks (HOB) for DXE phase
  • Initialize processor (exception vector, etc.) in DXE phase.
  • Initialize required peripherals (GPIO, GIC, eMMC, USB, RTC, Display…) in DXE phase.
  • Initialize UEFI services (variable services) in DXE phase.
  • Jump to BDS phase, start Windows Boot Manager or something else.

Memory Allocation / Memory Management Unit

Memory allocation is a platform-specific thing. Check your platform HRD to get some idea about MMU and memory allocation. For Snapdragon 410, check out Qualcomm LM80-P0436-13.

UEFI Flash Definition

Our UEFI FD starts at 0x80200000. Update your tokens in platform definition and flash definition:

And the first piece code should be your SEC initialization code (without relocation).

Little Kernel (mentioned below) will be responsible for jumping into UEFI FD at 0x80200000 and handing off execution. If you want, you can actually removes Android-specific header and device tree validation in LK (apps/aboot.c).

First-stage bootloader (Little Kernel)

DragonBoard 410c uses ARM Secure Monitor Call to switch to AArch64 mode (See Qualcomm LM80-P0436-1 for more information). The stock close-sourced SBL doesn not recognize AArch64 ELF files (later model should). LK performs basic platform initialization (UART, eMMC, MMU, etc.) A modified variant LK also initializes FrameBuffer for U-Boot. We can make it work for our UEFI too.
Windows requires UEFI provide a BGRA FrameBuffer. To achieve this, we need to modify pixel unpack pattern in platform/msm_shared/mdp5.c:


case 32:
/* Windows requires a BGRA FB */
writel(0x000236FF, pipe_base + PIPE_SSPP_SRC_FORMAT);
writel(0x03020001, pipe_base + PIPE_SSPP_SRC_UNPACK_PATTERN);

You can either specify a hard-coded address for FrameBuffer, or have a random piece of memory block to transfer information (pixel format, width, height, etc.) to UEFI. UEFI SEC phase retrieve the information, allocate HOB block and transfer information to DXE phase. A simple FrameBuffer driver retrieve information from HOB block, initializes UEFI Graphics Output Protocol. For optimal performance, initialize this piece of memory block as write-through cache memory in MMU initialization.

Persistent NVRAM Support

For persistent NVRAM support, it’s a good idea to use eMMC as storage device. This implementation demonstrates how to simulate NVRAM using eMMC and a piece of memory. I slightly modified it make it work for Qualcomm devices:

  • If eMMC NVRAM region is corrupted or uninitialized, provision it and perform a platform warm reset so I don’t get a synchronous exception in volatile variable initialization phase.
  • Modify dependency relationship to prevent “device not found” error in BlockRamVariable DXE initialization.

Windows Boot Manager depends on a “working” Real Time Clock for miscellaneous purposes. APQ8016/MSM8916 has a RTC on its PMIC processor PM8916. To access RTC services, read/write SPMI registers (see Qualcomm LM80-P0436-36). If you are lazy, just use Xen fake RTC in ArmVirtPkg.
To enable PM8916 RTC, set SPMI register 0x6046 to enabled state, then read 0x6048 and three following bits.

Note: I implemented my own PMIC protocol called PM8916Protocol that read/writes PMIC register on SPMI bus, slave #0. This RTC library is based on Xen face RTC library from ArmVirtPkg.

4KB / 64KB Page Table

For most single board computers, you will probably hit issues in ExitBootServices. EDK2 assumes runtime world follows 64KB/Page memory allocation, while most single board computers supply only less than 2GB memory. On these boards, MMU will run in 4KB PT mode. To resolve the issue, go to MdePkg/Include/AArch64/ProcessorBind.h:


///
/// The stack alignment required for AARCH64
///
#define CPU_STACK_ALIGNMENT 16

///
/// Page allocation granularity for AARCH64
///
#define DEFAULT_PAGE_ALLOCATION_GRANULARITY (0x1000)

///
/// For the sake of our SBCs
///
#define RUNTIME_PAGE_ALLOCATION_GRANULARITY (0x1000)

Set runtime page allocation granularity to 0x1000 (4KB). If your board has memory larger than 2GB, you should not modify this value; instead, check your memory allocation. There’s another interesting case with 4KB/64KB page on Cortex A53.

ARM Erratum

I randomly hit crashes (synchronous exception) during my UEFI development. After some investigation, it seems that the problem is related to load/store commands. (See ARM Errata 835769, 843419) To prevent random crashes, add these two flags to your GCC compiler:

Multi-Processor Startup (PSCI)

For platforms that implement ARM PSCI, indicate PSCI support in ACPI FADT table:


EFI_ACPI_6_0_HW_REDUCED_ACPI | EFI_ACPI_6_0_LOW_POWER_S0_IDLE_CAPABLE, // UINT32 Flags
{
EFI_ACPI_6_0_EMBEDDED_CONTROLLER,
0,
0,
EFI_ACPI_6_0_DWORD,
0x009020B4
}, // EFI_ACPI_6_0_GENERIC_ADDRESS_STRUCTURE ResetReg
1, // UINT8 ResetValue
EFI_ACPI_6_0_ARM_PSCI_COMPLIANT, // UINT16 ArmBootArchFlags
EFI_ACPI_6_0_FIXED_ACPI_DESCRIPTION_TABLE_MINOR_REVISION, // UINT8 MinorRevision

Typically you don’t need HVC call for PSCI. If you did so (and your platform doesn’t support HVC call for PSCI), you will get a INTERNAL_POWER_ERROR bugcheck with first parameter of 0x0000BEEF.
If you indicates PSCI support, you don’t have to provide parking protocol version in your ACPI MADT table. Simply set it to 0. Here’s one example:


[02Ch 0044 1] Subtable Type : 0B [Generic Interrupt Controller]
[02Dh 0045 1] Length : 50
[02Eh 0046 2] Reserved : 0000
[030h 0048 4] CPU Interface Number : 00000000
[034h 0052 4] Processor UID : 00000000
[038h 0056 4] Flags (decoded below) : 00000001
Processor Enabled : 1
Performance Interrupt Trigger Mode : 0
Virtual GIC Interrupt Trigger Mode : 0
[03Ch 0060 4] Parking Protocol Version : 00000000
[040h 0064 4] Performance Interrupt : 00000017
[044h 0068 8] Parked Address : 0000000080301000
[04Ch 0076 8] Base Address : 0000000000000000
[054h 0084 8] Virtual GIC Base Address : 0000000000000000
[05Ch 0092 8] Hypervisor GIC Base Address : 0000000000000000
[064h 0100 4] Virtual GIC Interrupt : 00000000
[068h 0104 8] Redistributor Base Address : 0000000000000000
[070h 0112 8] ARM MPIDR : 0000000000000000
[078h 0120 1] Efficiency Class : 00
[079h 0121 3] Reserved : 000000

See ARM Juno reference platform to get some idea about crafting ACPI tables.

That’s it! Welcome to Windows 10 Userland.

Spend some nights writing Windows drivers. 😛

Windows 10 PE runs on a Spandragon 410 processor.
Windows 10 PE runs on a Spandragon 410 processor.

DragonBoard 410c w/ UEFI

DragonBoard 410c runs Windows RT 8.1

I purchased a Qualcomm DragonBoard 410c for some specific purpose (generally, for fun). All DragonBoard 410c ship with Android, but I want Windows IoT and UEFI so I flashed it.

Dragonboard 410c runs Windows RT 8.1
Dragonboard 410c runs Windows RT 8.1

Well, you can boot Windows RT 8.1 on it, as long as you got critical HAL extensions. But the most important one, USB controller, is not available on DragonBoard 410c. It utilizes USB Role Switch, which is officially supported in Windows 10, not in Windows 8.1. I posted how to boot Windows RT 8.1 installation disk on XDA Developers forum, if you are interested in that, please search it.

Note: USB will not be a problem for DragonBoard 800 because Snapdragon 800 has more than one USB channel, while Snapdragon 410 has only one USB channel.

Windows failed to load with Bugcheck code 0x5c
Windows failed to load with Bugcheck code 0x5c

I think there’s something weird with the internal EFI shell. It seems that all GOP operations will let the firmware hang, however, you can run EFI applications with GOP in the internal EBL(yet another lightweight EFI shell). What’s more, if you boot any Windows OS releases from EFI shell, it will crash during HAL initialization. Such situation doesn’t exist on any Qualcomm-based Windows Phones (MSM8960), so I believe it is a bug specific to 410c’s firmware.

Huaji PC Boot Logo
Huaji PC Boot Logo

All ACPI-related files, including boot logo, is stored in a small hidden FAT16 partition called PLAT. You can replace files – but I haven’t tested customized ACPI DSDT table. Maybe we can let USB controller work by removing device URS0 and expose device USB0 to root.

GRUB-EFI (ARMHF) works on Qemu emulator, but it will hang on 410c’s firmware. I haven’t got a USB UART cable yet, so I didn’t know what happened.