Play “Overcooked” efficiently

I got a Nintendo Switch from my friend (for a research project). Meanwhile, I enjoyed the game “Overcooked” on Switch. In this game, you control cooks to perform variety tasks and then deliver orders in time. If orders are delivered in advance, some tip will be given. 1-4 players can play the game simultaneously.

It’s clear that you have to do everything as quick as possible to achieve high score in the game. Every task (e.g. cutting meat) need some time to complete. Certain task (e.g. frying) depends on other tasks. To eliminate unnecessary time cost (i.e. waiting for cutting to complete), I use the following strategies:

  • Minimize workers’ stall time (doing nothing). For example, it is not necessary for workers to wait for frying process (polling is not efficient). Like interrupts in modern machines, they can do something else like washing dishes and cutting meats while waiting for cooking. Once interrupt signals (frying completes), they enter interrupt servicing routine: get the food put into a plate. In most cases, the food is ready to serve then. Finally, they returned to what they were doing.
  • Again, make sure everything is doing something. This is especially important if you are playing with your friends. You had better analyze the dependency chain and discuss strategies with your friend before starting the game. Of course you should issue instructions to your friend during game if necessary.

Not all kitchens are easy to deal with. Some have dynamic arrangements – contents may change their location during the game session. Some kitchens have no constant light source. Other have isolated workspaces with conveyor belts or tables for swapping materials (I call it a “bus”).

  • Tables for swapping are usually space-constrained. If you are playing with your friends, you are probably simulating a Symmetric Multiprocessing system, and the bottleneck will be bus bandwidth. In such cases, you should consider the priority of materials. Once transfer finishes, get them as soon as possible.
  • Conveyor belts are high-latency bus, but they have relatively high bandwidth (Hey DDR4, I am looking at you). In some kitchen scenarios (e.g. making burgers), you can put everything on the belt in batches and fetch in batches too.
  • Some conveyor belts connect to trash can, which means materials must be fetched before the expiration. But some cook utilities will appear again if you put them into the trash can. In this way, you can prioritize the transfer of contents on the conveyor belt.
  • Try achieve full-duplex transfer and prefetch to save time. Consider the following scenario: you have a pot that cooks rice at once side, and food materials (rice and flour tortilla) on the other side. For the first time, you get rice and put them into pot. Once rice finishes, you carry cooked rice to the other side and wrap them with tortilla. Don’t get tortilla separately in another transfer. If you really have to do that, you can instruct other cooks (if exist) to prefetch some for you.
  • Prefetch might not work for all kitchens. In the case of cooking soup, mice will steal your food if it is unattended for a while. But you can secure processed food in pots so it won’t get stolen.

Get familiar with your kitchen and good luck! (Well, it is a bit boring if you have learned Machine Architecture and Operating System internals).

Give TianoCore/EDK2 on AArch64 a hand in 2018

Windows 10 PE runs on a Spandragon 410 processor.

Also posted here: Bringing up Windows 10 AArch64 on a $50 Single Board Computer

Windows on ARM is not a new topic. There are some guys attempted to bring up Windows RT and Windows 10 on Qemu (ARM/AArch64 target). It even runs on Raspberry Pi 3. Obviously it is not a Snapdragon 835-only thing. We can give it a hand on our own Single Board Computers.

This article covers some important details in Dragonboard 410c SBC’s aa64 UEFI implementation.

Contents

    • Windows Boot Requirements
    • Bootstrapping your own EDK2/TianoCore UEFI
    • Memory Allocation / Memory Management Unit
    • UEFI Flash Definition
    • First-stage Bootloader (Little Kernel)
    • Persistent NVRAM Support
    • A “Working” RTC
    • Multi-processor startup (PSCI)

Windows Boot Requirements (AArch64)

  • AArch64 architecture processor. It seems that AArch64 cryptography extension is required too (Raspberry Pi 3 randomly throws UNSUPPORTED_PROCESSOR bugcheck, rs4 fixed the issue). The bugcheck is raised in Errata Check (a hardcoded ID check).
  • For multi-processor systems, either Microsoft ARM Multi-processor Parking Protocol or ARM PSCI interface shall be implemented. All current Windows 10 IoT ARM32 platforms implement former one.
  • A working interrupt controller. Most AArch64 SoC cores include ARM GIC, so there’s little work to do here. The only exception I know is BCM2837. Windows has inbox Broadcom interrupt controller support (for the sake of Raspberry Pi). But if your SoC has additional third party interrupt controller, you need to supply your own HAL extension library. There is few documentation for this available though…
  • A working processor timer. If not, supply your own HAL extension library.
  • Complete ACPI 5.1/6.0 and UEFI 2.3+ implementation. Do not try to use Das U-Boot’s EFI implementation; it’s broken.

These requirements are fairly similar to ARM SBBR certification requirements. If your SBC has a working EDK2/TianoCore UEFI, then you are probably good to go. Bootstrapping your own EDK2 is pretty easy too.

Bootstrapping your own EDK2/TianoCore

The board I used (DragonBoard 410c) doesn’t have a known EDK2/TianoCore implementation. So I have to build my own. This repository for Raspberry Pi 3 is a good start point and reference for you.

You need to do these things in UEFI:

  • Initialize serial output (for debugging) and Memory Management Unit (MMU). Refer to your platform datasheet for device memory address allocation.
  • Retrieve required information from pre-UEFI environment and build Hand-off Blocks (HOB) for DXE phase
  • Initialize processor (exception vector, etc.) in DXE phase.
  • Initialize required peripherals (GPIO, GIC, eMMC, USB, RTC, Display…) in DXE phase.
  • Initialize UEFI services (variable services) in DXE phase.
  • Jump to BDS phase, start Windows Boot Manager or something else.

Memory Allocation / Memory Management Unit

Memory allocation is a platform-specific thing. Check your platform HRD to get some idea about MMU and memory allocation. For Snapdragon 410, check out Qualcomm LM80-P0436-13.

UEFI Flash Definition

Our UEFI FD starts at 0x80200000. Update your tokens in platform definition and flash definition:

And the first piece code should be your SEC initialization code (without relocation).

Little Kernel (mentioned below) will be responsible for jumping into UEFI FD at 0x80200000 and handing off execution. If you want, you can actually removes Android-specific header and device tree validation in LK (apps/aboot.c).

First-stage bootloader (Little Kernel)

DragonBoard 410c uses ARM Secure Monitor Call to switch to AArch64 mode (See Qualcomm LM80-P0436-1 for more information). The stock close-sourced SBL doesn not recognize AArch64 ELF files (later model should). LK performs basic platform initialization (UART, eMMC, MMU, etc.) A modified variant LK also initializes FrameBuffer for U-Boot. We can make it work for our UEFI too.
Windows requires UEFI provide a BGRA FrameBuffer. To achieve this, we need to modify pixel unpack pattern in platform/msm_shared/mdp5.c:


case 32:
/* Windows requires a BGRA FB */
writel(0x000236FF, pipe_base + PIPE_SSPP_SRC_FORMAT);
writel(0x03020001, pipe_base + PIPE_SSPP_SRC_UNPACK_PATTERN);

You can either specify a hard-coded address for FrameBuffer, or have a random piece of memory block to transfer information (pixel format, width, height, etc.) to UEFI. UEFI SEC phase retrieve the information, allocate HOB block and transfer information to DXE phase. A simple FrameBuffer driver retrieve information from HOB block, initializes UEFI Graphics Output Protocol. For optimal performance, initialize this piece of memory block as write-through cache memory in MMU initialization.

Persistent NVRAM Support

For persistent NVRAM support, it’s a good idea to use eMMC as storage device. This implementation demonstrates how to simulate NVRAM using eMMC and a piece of memory. I slightly modified it make it work for Qualcomm devices:

  • If eMMC NVRAM region is corrupted or uninitialized, provision it and perform a platform warm reset so I don’t get a synchronous exception in volatile variable initialization phase.
  • Modify dependency relationship to prevent “device not found” error in BlockRamVariable DXE initialization.

Windows Boot Manager depends on a “working” Real Time Clock for miscellaneous purposes. APQ8016/MSM8916 has a RTC on its PMIC processor PM8916. To access RTC services, read/write SPMI registers (see Qualcomm LM80-P0436-36). If you are lazy, just use Xen fake RTC in ArmVirtPkg.
To enable PM8916 RTC, set SPMI register 0x6046 to enabled state, then read 0x6048 and three following bits.

Note: I implemented my own PMIC protocol called PM8916Protocol that read/writes PMIC register on SPMI bus, slave #0. This RTC library is based on Xen face RTC library from ArmVirtPkg.

4KB / 64KB Page Table

For most single board computers, you will probably hit issues in ExitBootServices. EDK2 assumes runtime world follows 64KB/Page memory allocation, while most single board computers supply only less than 2GB memory. On these boards, MMU will run in 4KB PT mode. To resolve the issue, go to MdePkg/Include/AArch64/ProcessorBind.h:


///
/// The stack alignment required for AARCH64
///
#define CPU_STACK_ALIGNMENT 16

///
/// Page allocation granularity for AARCH64
///
#define DEFAULT_PAGE_ALLOCATION_GRANULARITY (0x1000)

///
/// For the sake of our SBCs
///
#define RUNTIME_PAGE_ALLOCATION_GRANULARITY (0x1000)

Set runtime page allocation granularity to 0x1000 (4KB). If your board has memory larger than 2GB, you should not modify this value; instead, check your memory allocation. There’s another interesting case with 4KB/64KB page on Cortex A53.

ARM Erratum

I randomly hit crashes (synchronous exception) during my UEFI development. After some investigation, it seems that the problem is related to load/store commands. (See ARM Errata 835769, 843419) To prevent random crashes, add these two flags to your GCC compiler:

Multi-Processor Startup (PSCI)

For platforms that implement ARM PSCI, indicate PSCI support in ACPI FADT table:


EFI_ACPI_6_0_HW_REDUCED_ACPI | EFI_ACPI_6_0_LOW_POWER_S0_IDLE_CAPABLE, // UINT32 Flags
{
EFI_ACPI_6_0_EMBEDDED_CONTROLLER,
0,
0,
EFI_ACPI_6_0_DWORD,
0x009020B4
}, // EFI_ACPI_6_0_GENERIC_ADDRESS_STRUCTURE ResetReg
1, // UINT8 ResetValue
EFI_ACPI_6_0_ARM_PSCI_COMPLIANT, // UINT16 ArmBootArchFlags
EFI_ACPI_6_0_FIXED_ACPI_DESCRIPTION_TABLE_MINOR_REVISION, // UINT8 MinorRevision

Typically you don’t need HVC call for PSCI. If you did so (and your platform doesn’t support HVC call for PSCI), you will get a INTERNAL_POWER_ERROR bugcheck with first parameter of 0x0000BEEF.
If you indicates PSCI support, you don’t have to provide parking protocol version in your ACPI MADT table. Simply set it to 0. Here’s one example:


[02Ch 0044 1] Subtable Type : 0B [Generic Interrupt Controller]
[02Dh 0045 1] Length : 50
[02Eh 0046 2] Reserved : 0000
[030h 0048 4] CPU Interface Number : 00000000
[034h 0052 4] Processor UID : 00000000
[038h 0056 4] Flags (decoded below) : 00000001
Processor Enabled : 1
Performance Interrupt Trigger Mode : 0
Virtual GIC Interrupt Trigger Mode : 0
[03Ch 0060 4] Parking Protocol Version : 00000000
[040h 0064 4] Performance Interrupt : 00000017
[044h 0068 8] Parked Address : 0000000080301000
[04Ch 0076 8] Base Address : 0000000000000000
[054h 0084 8] Virtual GIC Base Address : 0000000000000000
[05Ch 0092 8] Hypervisor GIC Base Address : 0000000000000000
[064h 0100 4] Virtual GIC Interrupt : 00000000
[068h 0104 8] Redistributor Base Address : 0000000000000000
[070h 0112 8] ARM MPIDR : 0000000000000000
[078h 0120 1] Efficiency Class : 00
[079h 0121 3] Reserved : 000000

See ARM Juno reference platform to get some idea about crafting ACPI tables.

That’s it! Welcome to Windows 10 Userland.

Spend some nights writing Windows drivers. 😛

Windows 10 PE runs on a Spandragon 410 processor.
Windows 10 PE runs on a Spandragon 410 processor.

玩物丧志 · Yubikey 4 (智能卡登录)

Yubikey 是很早就有的东西了,然而受限于经济能力还有海关,直到最近有人提出来要团购我才买了个一个玩玩。我主要拿来做智能卡登录。

为什么呢?

虽然我用的电脑都装备有了 TPM ,但是这样还是有一些不方便的:

  • TPM 的数据基本上只留在这台电脑了(这也是优点)
  • 虽然我的电脑很轻,但是我不一定天天带出去,万一有事情需要连接到服务器上还是比较麻烦
  • 虚拟智能卡基本上一直在域账号里使用比较有效,然而我平时还是以登录个人账号为主(虽然电脑已经加入域)
 Yubikey 可以解决移动性的问题,而且出门不需要输入一个几十位复杂脑残的密码。
也许我哪天要玩玩 U2F 呢?这大概是以后的文章了。

准备智能卡证书

假定电脑已经加入域,域环境中已经部署了 Active Directory Certificate Services,并且 CS 服务器已经发布到 AD 中。

登录域账号,打开 Certificate Management,并 Request New Certificate。
一路往下,选择 Smart Card User / Smart Card  Logon 模板(我选的是前者)。
在 Private Key Configuration 下把私钥长度设置为 2048/4096,然后 CSP 不用动。
注:如果你选择了 Smart Card Crypto Service Provider,那么默认情况下在有 TPM 而且虚拟智能卡开启的情况下,会选择的是虚拟智能卡,在 Windows 证书选择里表示为“安全设备凭据”而不是“智能卡凭据”,但是功能类似。然后勾选允许导出私钥。
申请完证书,导出证书。
打开 Yubikey PIV Manager,并导入证书。
Screenshot of Yubikey PIV Manager
Screenshot of Yubikey PIV Manager

基本工作到此结束。下面来试试智能卡登录。

如果是要给 Active Directory Federation Services 打开智能卡登录支持呢?

很简单。进入 AD CS 管理界面,找到认证选项,在对应的 Intranet / External 认证模式里勾选证书登录,AD FS Web Proxy 启用49443端口即可。

Smartcard and ADFS
Smartcard and ADFS

效果大概如图所示。

未完待续