I got a Nintendo Switch from a friend (for a research project), and in the meantime I enjoyed the game “Overcooked” on it. In this game, you control cooks who perform a variety of tasks and deliver orders in time. Orders delivered early earn a tip. 1-4 players can play the game simultaneously.
It’s clear that you have to do everything as quickly as possible to achieve a high score. Every task (e.g. cutting meat) needs some time to complete, and certain tasks (e.g. frying) depend on other tasks. To eliminate unnecessary time cost (e.g. waiting for cutting to complete), I use the following strategies:
Minimize workers’ stall time (time spent doing nothing). For example, workers don’t need to wait for the frying process (polling is not efficient). Like interrupts in modern machines, they can do something else, such as washing dishes or cutting meat, while the food cooks. Once the interrupt fires (frying completes), they enter the interrupt service routine: put the food on a plate. In most cases, the food is then ready to serve. Finally, they return to what they were doing.
Again, make sure everyone is doing something. This is especially important when you play with friends. You had better analyze the dependency chain and discuss strategies with your friends before starting the game. Of course, you should also give your friends instructions during the game if necessary.
Not all kitchens are easy to deal with. Some have dynamic layouts, where kitchen contents change location during the session. Some kitchens have no constant light source. Others have isolated workspaces connected by conveyor belts or tables for swapping materials (I call these a “bus”).
Conveyor belts are a high-latency bus, but they have relatively high bandwidth (Hey DDR4, I am looking at you). In some kitchens (e.g. making burgers), you can put everything on the belt in batches and fetch it in batches too.
Some conveyor belts lead to a trash can, which means materials must be fetched before they fall in. However, some cooking utensils reappear after being dropped into the trash can, so you can decide which items on the belt to prioritize.
Try to achieve full-duplex transfer and prefetch to save time. Consider the following scenario: a pot that cooks rice sits on one side, and the ingredients (rice and flour tortillas) are on the other side. First, you fetch rice and put it into the pot. Once the rice is done, you carry the cooked rice to the other side and wrap it in a tortilla. Don’t fetch tortillas in a separate trip. If you really have to, ask another cook (if there is one) to prefetch some for you.
Prefetching might not work in every kitchen. When cooking soup, mice will steal your food if it is left unattended for a while. But you can secure processed food in pots so it won’t get stolen.
Get familiar with your kitchen and good luck! (Well, it is a bit boring if you have learned Machine Architecture and Operating System internals).
Nights before trips are always boring, so I decided to write something to pass the time. We now have Windows 10 on ARM running on the Dragonboard 410c and the Lumia 950 XL (article in Chinese, sorry). It will be helpful to write down some firmware-related information about platform bring-up for future reference. Meanwhile, a comparison with Little Kernel, the common Linux Android bootloader (well, Qualcomm says so), will provide useful information for the Android on Lumia project.
Compared to Linux, the Windows kernel assumes that its platform firmware and bootloader (aka. Windows Boot Manager) have prepared the basic environment for kernel initialization. If certain components are not initialized, bugchecks may occur. Even if the system launches successfully, it may behave unexpectedly (weird things happen). An official document explains these requirements in detail.
Little Kernel initializes basic hardware too (at least you need serial for debugging). Certain peripherals, including clocks, regulators, and USB, are also initialized for application purposes (e.g. Fastboot). Still, it initializes as few peripherals as possible. Sometimes even the panel is not brought up (I’ve seen this on an Android phone).
In short, you have to do more for a successful Windows bring-up:
If you know certain components are in the usable state already, skip initialization procedures. For example, on Lumia 950 XL, our UEFI implementation does not need to initialize USB since our bootstrapper (Qualcomm UEFI) did so.
If your platform has PCIe components, clock them up, set up regulators and mappings, etc.
Initialize at least one debug resource described in your DBG2 table (if applicable; likely on all ARM platforms).
Bring up the panel, set basic display parameters and pass a framebuffer pointer for Windows.
So how about Linux? If your Linux platform uses DT instead of ACPI, you are likely not required to do most of what Windows requires. On Qualcomm platforms, the Linux kernel clocks up PCIe cores and sets up regulators and mappings itself to make them usable. If your platform uses standard ACPI and its platform drivers do not perform additional initialization, initialize these components in firmware.
Fill the hole
Both UEFI w/ ACPI and LK perform fix-up tasks before transferring control to the kernel. On Qualcomm platforms, chipset metadata (revision, foundry ID, etc.) is filled into the DSDT, and certain DSDT logic depends on it. A typical Linux Android device ships with a large DT covering multiple variants. LK selects the best fit using the chipset ID, PMIC ID, and board ID, then fills in some memory region information for the kernel.
The ACPI tables in the firmware for Windows 10 on ARM are pre-patched, so I didn’t implement the fix-up logic.
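To make “selects the best fit” concrete, here is a simplified matcher in C. The fields and rules are assumptions for illustration, not LK’s actual algorithm, which considers more fields. This sketch requires an exact match on the chipset, PMIC, and board IDs, then prefers the highest SoC revision that is not newer than the running chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified description of one device tree in a multi-variant image.
 * Field set and matching rules are assumptions for illustration. */
struct dt_entry {
    uint32_t chipset_id;
    uint32_t soc_rev;
    uint32_t pmic_id;
    uint32_t board_id;
};

/* Return the index of the best-fitting entry for the running board, or -1.
 * Chipset, PMIC and board IDs must match exactly; among the matches, pick
 * the highest SoC revision that does not exceed the actual revision. */
static int select_dt(const struct dt_entry *table, size_t count,
                     const struct dt_entry *board)
{
    int best = -1;

    assert(table != NULL || count == 0);
    assert(board != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct dt_entry *e = &table[i];
        if (e->chipset_id != board->chipset_id ||
            e->pmic_id != board->pmic_id ||
            e->board_id != board->board_id ||
            e->soc_rev > board->soc_rev)
            continue;                     /* not usable on this board */
        if (best < 0 || e->soc_rev > table[best].soc_rev)
            best = (int)i;                /* closer revision match */
    }
    return best;
}
```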
Multi-processor Startup, Again
Why am I discussing this again? Because it is important.
Little Kernel (and likely other Linux Android bootloaders) uses only a single processor during its lifecycle (a notable exception is Raspberry Pi, which uses a spin table, except for the 3+). When it transfers control to Linux, Linux brings the other cores out of reset and makes them available for use.
Windows platforms that implement the ACPI Multi-Processor Parking Protocol behave differently. Although the firmware uses a single core, the other CPU cores are brought out of reset and instructed to run a special piece of code. The code flow is like this:
Wait for an interrupt.
Am I the processor being woken up?
If yes, jump to the address the OS gave me.
If not, go back to parking.
(Interrupt acknowledgment and memory barriers ignored. Sorry, I don’t want to write assembly at 11 PM.)
Because different platforms handle core startup differently (on Qualcomm platforms, TrustZone is involved), booting the Linux kernel and starting cores the Linux way on a UEFI firmware that implements this protocol may fail. Someone told me he was unable to bring up the other three cores on the 640. That is expected, since LK on recent Lumia phones is launched via a special UEFI application in Windows Boot Application form. Qualcomm UEFI has already put the other three cores in a running state (waiting in WFI). Neither LK nor Linux is aware of that (they make assumptions about core state), so core startup fails.
Since it is not possible to ditch Qualcomm UEFI (unlike with the exploit for first-generation Lumia WP8 devices), we have to conform to the parking protocol in AArch32 mode (AArch64 SoCs have PSCI):
Ignore the other cores (unicore is the best).
Implement parking protocol for unsupported systems (not too hard). Linux has the protocol support; you have to enable it.
Go AArch64 and use PSCI (remember to use HVC mode for 8992/8994)
Good night (And to my girlfriend: If you see this article, sorry that I say “Good Night” too early.)
Windows on ARM is not a new topic. Some people have attempted to bring up Windows RT and Windows 10 on QEMU (ARM/AArch64 targets). It even runs on the Raspberry Pi 3. Clearly it is not a Snapdragon 835-only thing, so we can give it a try on our own single board computers.
This article covers some important details in Dragonboard 410c SBC’s aa64 UEFI implementation.
Windows Boot Requirements
Bootstrapping your own EDK2/TianoCore UEFI
Memory Allocation / Memory Management Unit
UEFI Flash Definition
First-stage Bootloader (Little Kernel)
Persistent NVRAM Support
A “Working” RTC
Multi-processor startup (PSCI)
Windows Boot Requirements (AArch64)
An AArch64 processor. It seems that the AArch64 cryptography extension is required too (Raspberry Pi 3 randomly throws an UNSUPPORTED_PROCESSOR bugcheck; RS4 fixed the issue). The bugcheck is raised in the errata check (a hardcoded ID check).
A working interrupt controller. Most AArch64 SoCs include an ARM GIC, so there’s little work to do here. The only exception I know of is the BCM2837. Windows has inbox Broadcom interrupt controller support (for the sake of the Raspberry Pi), but if your SoC has an additional third-party interrupt controller, you need to supply your own HAL extension library. There is little documentation available for this, though…
A working processor timer. If not, supply your own HAL extension library.
These requirements are fairly similar to the ARM SBBR certification requirements. If your SBC has a working EDK2/TianoCore UEFI, you are probably good to go. Bootstrapping your own EDK2 is pretty easy too.
Bootstrapping your own EDK2/TianoCore
The board I used (DragonBoard 410c) doesn’t have a known EDK2/TianoCore implementation, so I had to build my own. This repository for Raspberry Pi 3 is a good starting point and reference.
You need to do these things in UEFI:
Initialize serial output (for debugging) and the Memory Management Unit (MMU). Refer to your platform datasheet for device memory address allocation.
Retrieve the required information from the pre-UEFI environment and build Hand-off Blocks (HOBs) for the DXE phase.
Initialize the processor (exception vectors, etc.) in the DXE phase.
Initialize UEFI services (variable services) in the DXE phase.
Jump to the BDS phase and start Windows Boot Manager or something else.
Memory Allocation / Memory Management Unit
Memory allocation is platform-specific. Check your platform HRD for details about the MMU and memory allocation. For Snapdragon 410, see Qualcomm LM80-P0436-13.
UEFI Flash Definition
Our UEFI FD starts at 0x80200000. Update your tokens in platform definition and flash definition:
The first piece of code in the FD should be your SEC initialization code (without relocation).
Little Kernel (described below) is responsible for jumping into the UEFI FD at 0x80200000 and handing off execution. If you want, you can remove the Android-specific header and device tree validation in LK (apps/aboot.c).
First-stage bootloader (Little Kernel)
DragonBoard 410c uses an ARM Secure Monitor Call to switch to AArch64 mode (see Qualcomm LM80-P0436-1 for more information). The stock closed-source SBL does not recognize AArch64 ELF files (later models should). LK performs basic platform initialization (UART, eMMC, MMU, etc.). A modified LK variant also initializes the framebuffer for U-Boot, and we can make it work for our UEFI too.
Windows requires UEFI to provide a BGRA framebuffer. To achieve this, modify the pixel unpack pattern in platform/msm_shared/mdp5.c:
/* Windows requires a BGRA FB */
writel(0x000236FF, pipe_base + PIPE_SSPP_SRC_FORMAT);
writel(0x03020001, pipe_base + PIPE_SSPP_SRC_UNPACK_PATTERN);
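What “BGRA” means for memory layout can be shown with a small C sketch. Blue is the first byte of each pixel, then green, red, and a reserved/alpha byte. UEFI GOP calls this layout PixelBlueGreenRedReserved8BitPerColor. The helper names are hypothetical, and the sketch does not attempt to decode the MDP register values above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one pixel for a BGRA framebuffer. On a little-endian CPU such as
 * AArch64, the in-memory byte order B, G, R, A is the 32-bit value 0xAARRGGBB. */
static uint32_t bgra_pixel(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Write one pixel byte by byte, independent of host endianness, so the
 * memory layout is always B, G, R, A. */
static void put_bgra(uint8_t *fb, size_t stride_bytes, size_t x, size_t y,
                     uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    uint8_t *p;

    assert(fb != NULL);
    p = fb + y * stride_bytes + x * 4;
    p[0] = b;
    p[1] = g;
    p[2] = r;
    p[3] = a;
}
```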
You can either hard-code the framebuffer address, or reserve a piece of memory to transfer information (pixel format, width, height, etc.) to UEFI. The UEFI SEC phase retrieves the information, allocates a HOB, and passes the information to the DXE phase. A simple framebuffer driver then retrieves the information from the HOB and initializes the UEFI Graphics Output Protocol. For optimal performance, map this memory block as write-through cacheable memory during MMU initialization.
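The “piece of memory” option can be sketched as a small tagged structure that LK fills in and SEC validates before copying it into a HOB. This sketch is hypothetical: the struct name, magic value, and field set are assumptions, not an existing LK or EDK2 format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hand-off block left by LK at an agreed address. */
#define FB_HANDOFF_MAGIC 0x46424844u  /* arbitrary tag chosen for this sketch */

struct fb_handoff {
    uint32_t magic;
    uint32_t width;         /* pixels */
    uint32_t height;        /* pixels */
    uint32_t stride;        /* bytes per scanline */
    uint32_t bytes_per_px;  /* 4 for a BGRA framebuffer */
    uint32_t reserved;
    uint64_t base;          /* physical address of the framebuffer */
};

/* SEC-side validation before the values are copied into a HOB for DXE. */
static bool fb_handoff_valid(const struct fb_handoff *h)
{
    assert(h != NULL);
    if (h->magic != FB_HANDOFF_MAGIC || h->base == 0)
        return false;
    if (h->width == 0 || h->height == 0 || h->bytes_per_px != 4)
        return false;
    /* A scanline must be wide enough to hold one row of pixels. */
    return (uint64_t)h->stride >= (uint64_t)h->width * h->bytes_per_px;
}

/* Size of the framebuffer region, e.g. for mapping it during MMU setup. */
static uint64_t fb_handoff_size(const struct fb_handoff *h)
{
    return (uint64_t)h->stride * h->height;
}
```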
Persistent NVRAM Support
For persistent NVRAM support, it’s a good idea to use the eMMC as the storage device. This implementation demonstrates how to simulate NVRAM using eMMC and a piece of memory. I slightly modified it to make it work on Qualcomm devices:
If eMMC NVRAM region is corrupted or uninitialized, provision it and perform a platform warm reset so I don’t get a synchronous exception in volatile variable initialization phase.
Modify the dependency relationship to prevent a “device not found” error during BlockRamVariable DXE initialization.
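The “corrupted or uninitialized” test can be approximated by checking the firmware volume header at the start of the NVRAM region. This sketch assumes the region begins with a standard EFI_FIRMWARE_VOLUME_HEADER, as in the PI specification:

- the signature “_FVH” is at offset 40;
- HeaderLength is at offset 48;
- a 16-bit checksum over the header must sum to zero.

The function name is hypothetical. The actual provisioning and warm reset are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FVH_SIGNATURE     0x4856465Fu  /* '_','F','V','H' read little-endian */
#define FVH_SIG_OFFSET    40
#define FVH_HDRLEN_OFFSET 48
#define FVH_FIXED_LEN     56           /* fixed part before the block map */

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* True if the region read from eMMC does not look like a valid firmware
 * volume, i.e. it should be provisioned (followed by a warm reset). */
static bool nvram_needs_provisioning(const uint8_t *region, size_t len)
{
    uint16_t hdr_len, sum = 0;

    assert(region != NULL);
    if (len < FVH_FIXED_LEN ||
        rd_le32(region + FVH_SIG_OFFSET) != FVH_SIGNATURE)
        return true;                   /* blank (all 0xFF) or corrupted */
    hdr_len = rd_le16(region + FVH_HDRLEN_OFFSET);
    if (hdr_len < FVH_FIXED_LEN || hdr_len > len || (hdr_len & 1))
        return true;                   /* implausible header length */
    for (size_t i = 0; i < hdr_len; i += 2)
        sum = (uint16_t)(sum + rd_le16(region + i));
    return sum != 0;                   /* checksum mismatch */
}
```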
A “Working” RTC
Windows Boot Manager depends on a “working” Real Time Clock for miscellaneous purposes. APQ8016/MSM8916 has an RTC in its PMIC, the PM8916. To access RTC services, read/write SPMI registers (see Qualcomm LM80-P0436-36). If you are lazy, just use the Xen fake RTC in ArmVirtPkg.
To enable the PM8916 RTC, set SPMI register 0x6046 to the enabled state, then read 0x6048 and the three following bytes (a 32-bit seconds counter).
Note: I implemented my own PMIC protocol, PM8916Protocol, which reads/writes PMIC registers on the SPMI bus, slave #0. This RTC library is based on the Xen fake RTC library from ArmVirtPkg.
4KB / 64KB Page Table
On most single board computers, you will probably hit issues in ExitBootServices. EDK2 assumes the runtime world uses 64KB page allocation granularity, while most single board computers have less than 2GB of memory. On these boards, the MMU runs in 4KB page table mode. To resolve the issue, go to MdePkg/Include/AArch64/ProcessorBind.h:
/// The stack alignment required for AARCH64
#define CPU_STACK_ALIGNMENT 16
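To see why a 64KB runtime granularity clashes with a 4KB layout, here is the alignment arithmetic involved, as a small C sketch. The helper names and example values are illustrative. A region that is perfectly aligned for 4KB pages can be misaligned for 64KB granularity, or may need rounding up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIZE_4KB  0x1000ull
#define SIZE_64KB 0x10000ull

/* Round a size or address up to the allocation granularity (a power of two). */
static uint64_t align_up(uint64_t value, uint64_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return (value + granularity - 1) & ~(granularity - 1);
}

/* True if value is a multiple of the (power-of-two) granularity. */
static bool is_aligned(uint64_t value, uint64_t granularity)
{
    return (value & (granularity - 1)) == 0;
}
```

For example, a 12KB runtime region costs 12KB at 4KB granularity but a full 64KB at 64KB granularity. A base address like 0x8FF3000 is valid for the former but not the latter.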
I randomly hit crashes (synchronous exceptions) during my UEFI development. After some investigation, the problem seems related to load/store instructions (see ARM errata 835769 and 843419). To prevent random crashes, add these two flags to your GCC compiler options:
Multi-Processor Startup (PSCI)
For platforms that implement ARM PSCI, indicate PSCI support in ACPI FADT table:
Typically you don’t need the HVC conduit for PSCI. If you set it anyway (and your platform doesn’t support HVC calls for PSCI), you will get an INTERNAL_POWER_ERROR bugcheck with a first parameter of 0x0000BEEF.
If you indicate PSCI support, you don’t have to provide a parking protocol version in your ACPI MADT table; simply set it to 0. Here’s an example:
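The two table settings above can be expressed as a C sketch. The flag values are the ARM_BOOT_ARCH bits defined by ACPI 5.1 and later: bit 0 means PSCI compliant, and bit 1 means use HVC. The gicc_parking_fields struct is a hypothetical subset of the MADT GICC entry, not its real binary layout.

```c
#include <assert.h>
#include <stdint.h>

/* FADT ARM_BOOT_ARCH flags (ACPI 5.1 and later). */
#define ARM_BOOT_ARCH_PSCI_COMPLIANT 0x0001u
#define ARM_BOOT_ARCH_PSCI_USE_HVC   0x0002u

/* Only the MADT GICC fields discussed in the text; not a drop-in definition. */
struct gicc_parking_fields {
    uint32_t parking_protocol_version;  /* 0 when PSCI is used */
    uint64_t parked_address;            /* unused when PSCI is used */
};

/* ARM_BOOT_ARCH for a PSCI platform. Set use_hvc only if the platform's PSCI
 * implementation really answers HVC calls; otherwise Windows bugchecks with
 * INTERNAL_POWER_ERROR (first parameter 0xBEEF), per the text. */
static uint16_t arm_boot_arch_flags(int use_hvc)
{
    uint16_t flags = ARM_BOOT_ARCH_PSCI_COMPLIANT;

    if (use_hvc)
        flags = (uint16_t)(flags | ARM_BOOT_ARCH_PSCI_USE_HVC);
    return flags;
}

/* GICC parking fields for a PSCI platform: no parking protocol. */
static struct gicc_parking_fields gicc_for_psci(void)
{
    struct gicc_parking_fields f = { 0, 0 };
    return f;
}
```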
First of all, Windows “Gatekeeper” doesn’t block applications that don’t require installation. I tried to run PuTTY, a popular tool on Windows, and it works.
Secondly, Windows “Gatekeeper” is based on Microsoft SmartScreen, which means disabling SmartScreen turns it off too. Before an application runs, SmartScreen sends the file hash and publisher information (including the certificate thumbprint) to Microsoft’s server, and the SmartScreen server sends back metadata including the application’s reputation. The response is signed with a specific key, which the client checks to verify message integrity.
Unlike on macOS, starting an application from a console (e.g. Command Prompt or PowerShell) also triggers “Gatekeeper”.
The prompt window is web-based. Although you can’t modify the response directly (no one wants to deal with sha256RSA unless the key leaks), you can attach a debugger and have some fun with it.
Microsoft claims that this feature is opt-in for most Windows SKUs (except Windows 10 Cloud, AFAIK), and that it is not relevant to UMCI (User-mode Code Integrity), which is enforced in Windows 10 Cloud.
Someone asked me if I could extract some images from a popular Chinese mobile game. I accepted the challenge, but things were far more complicated than I expected.
What I knew
This game is Unity3D-based.
Original assets were encrypted with known algorithm and key. DISCLAIMER: I will not tell you details about encryption.
The story began
I thought I could extract the assets I needed with existing tools (e.g. Disunity), but I was proved wrong. Disunity is being refactored, and the remaining work was still in progress at the time of writing. Since resource extraction had not been implemented yet, Disunity wasn’t an option.
Then I turned to a tool called Unity Assets Bundle Extractor. It did a great job of extracting the resources I needed through its GUI. However, extracting thousands of texture assets from 2000+ separate files by hand is not easy. I tried the command-line support but failed (maybe I was too stupid).
Luckily, this toolkit provides an API and documentation. Since it was compiled with Microsoft Visual C++ 2010, I was unable to use it directly (the C++ ABI changes with every MSVC release), and I was too lazy to write a C wrapper for P/Invoke. But its C++ header files pointed to a perfect solution: parse the files and implement my own UnityFS parser/reader.
Special thanks to the UABE project: without these generous headers, I would not have been able to implement my own parser and write this article.
UnityFS is a new asset bundle format introduced in Unity 5. I am not a Unity3D developer, and I have no idea why Unity introduced a new bundle format. But anyway, let’s analyze it.
Things you need to know
UnityFS is just a bundle of several Unity assets. Each asset contains a collection of serialized Unity objects (e.g. 2D textures, text resources, scene objects, etc.).
UnityFS follows a standard Unity file header structure. Let’s call it AssetsBundleHeader06
You have to parse the asset files in order to extract what you need. There’s a bunch of documentation about this; look into the old Disunity source code for ideas.
The header looks like this. There’s a DWORD flags field that matters: it contains critical information required for decompression and directory parsing. The rules are:
(Flags & 0x3F) is the compression mode: 0 means no compression, 1 means LZMA, and 2/3 mean LZ4/LZ4HC.
(Flags & 0x40) says whether the bundle has directory info.
(Flags & 0x80) says whether the block and directory list is at the end of this bundle file.
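The flag rules above translate directly into a decoder. Here is a C sketch; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum unityfs_compression {
    UNITYFS_NONE  = 0,
    UNITYFS_LZMA  = 1,
    UNITYFS_LZ4   = 2,
    UNITYFS_LZ4HC = 3
};

struct unityfs_flags {
    unsigned compression;  /* flags & 0x3F */
    bool has_directory;    /* flags & 0x40 */
    bool info_at_end;      /* flags & 0x80: block/directory list at end of file */
};

static struct unityfs_flags unityfs_decode_flags(uint32_t flags)
{
    struct unityfs_flags f;

    f.compression   = flags & 0x3Fu;
    f.has_directory = (flags & 0x40u) != 0;
    f.info_at_end   = (flags & 0x80u) != 0;
    return f;
}

/* True for the compression modes listed in the text; other values are
 * unknown to this sketch and should be rejected. */
static bool unityfs_compression_known(unsigned mode)
{
    return mode <= UNITYFS_LZ4HC;
}
```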
C# provides a good BinaryReader that makes things a bit easier, but it can be improved with null-terminated string and big-endian support. Be careful with endianness: Unity uses both big endian and little endian in a single file, and personally I don’t get why. For convenience, I extended the original BinaryReader with this support. The length of each data type matters, but that’s basic stuff for CS students.
UnityFS uses optional block-based compression for streaming (you can read a specific bundle without downloading the whole file). Both LZMA and LZ4* (LZ4HC, etc.) are supported. Unity’s proprietary parser and Disunity respect this design. But I just wanted the contents of these bundle files, so I decided to read all blocks at once and decompress them into a single memory stream.
You can implement your own block-based reader, but my time budget didn’t allow it.
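The endianness point can be shown in C terms. The same four bytes decode to very different values depending on byte order, and a null-terminated string read should be bounded by the buffer. This is a sketch of the kind of helpers the extended BinaryReader provides, not the author’s C# code; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode four bytes as big-endian or little-endian. Which one applies
 * depends on the field, so the caller must choose per read. */
static uint32_t u32_be(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

static uint32_t u32_le(const uint8_t *p)
{
    return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
           (uint32_t)p[1] << 8  | (uint32_t)p[0];
}

/* Length of a null-terminated string within `avail` bytes, or -1 if no
 * terminator is found, so a corrupt file cannot make us read past the end. */
static ptrdiff_t cstr_len(const uint8_t *p, size_t avail)
{
    assert(p != NULL || avail == 0);
    for (size_t i = 0; i < avail; i++)
        if (p[i] == '\0')
            return (ptrdiff_t)i;
    return -1;
}
```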
There we go…block and file information!
Following an unknown 16-byte block, there’s a big-endian UInt32 representing the number of blocks in the package. Each block info entry contains a big-endian UInt32 decompressed size, a big-endian UInt32 compressed size, and a flag we are probably not interested in.
Then comes a BE UInt32 representing the number of files in the package. Each file info entry contains the file offset we need (BE UInt64), the decompressed size (BE UInt64), a BE UInt32 flag, and a null-terminated file name.
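The two paragraphs above can be combined into a C sketch of a parser for the decompressed block and directory info. It makes three assumptions:

- the buffer starts at the 16-byte field;
- the block flag is 16 bits wide (the text doesn’t give its size, but other open-source UnityFS readers use a 16-bit value);
- the fixed MAX_ENTRIES capacity exists only to keep the sketch allocation-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16  /* illustrative fixed capacity */

struct block_info { uint32_t decompressed, compressed; uint16_t flags; };
struct node_info  { uint64_t offset, size; uint32_t flags; const char *name; };

struct blocks_info {
    struct block_info blocks[MAX_ENTRIES];
    size_t n_blocks;
    struct node_info nodes[MAX_ENTRIES];
    size_t n_nodes;
};

struct cursor { const uint8_t *buf; size_t len, pos; };

static bool has(const struct cursor *c, size_t n)
{
    return c->pos <= c->len && c->len - c->pos >= n;
}

static bool be16(struct cursor *c, uint16_t *v)
{
    if (!has(c, 2)) return false;
    *v = (uint16_t)(c->buf[c->pos] << 8 | c->buf[c->pos + 1]);
    c->pos += 2;
    return true;
}

static bool be32(struct cursor *c, uint32_t *v)
{
    const uint8_t *p;
    if (!has(c, 4)) return false;
    p = c->buf + c->pos;
    *v = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
    c->pos += 4;
    return true;
}

static bool be64(struct cursor *c, uint64_t *v)
{
    uint32_t hi, lo;
    if (!has(c, 8)) return false;
    be32(c, &hi);
    be32(c, &lo);
    *v = (uint64_t)hi << 32 | lo;
    return true;
}

/* Null-terminated string inside the buffer, or NULL if unterminated. */
static const char *cstr(struct cursor *c)
{
    for (size_t i = c->pos; i < c->len; i++) {
        if (c->buf[i] == '\0') {
            const char *s = (const char *)(c->buf + c->pos);
            c->pos = i + 1;
            return s;
        }
    }
    return NULL;
}

/* Parse the decompressed block/directory info. Returns false on truncated
 * input or more entries than this sketch can hold. Names point into buf. */
static bool parse_blocks_info(const uint8_t *buf, size_t len, struct blocks_info *out)
{
    struct cursor c = { buf, len, 0 };
    uint32_t count, i;

    assert(out != NULL);
    if (!has(&c, 16)) return false;
    c.pos += 16;                                  /* the unknown 16-byte field */

    if (!be32(&c, &count) || count > MAX_ENTRIES) return false;
    for (i = 0; i < count; i++) {
        struct block_info *b = &out->blocks[i];
        if (!be32(&c, &b->decompressed) || !be32(&c, &b->compressed) ||
            !be16(&c, &b->flags))                 /* 16-bit flag: an assumption */
            return false;
    }
    out->n_blocks = count;

    if (!be32(&c, &count) || count > MAX_ENTRIES) return false;
    for (i = 0; i < count; i++) {
        struct node_info *n = &out->nodes[i];
        if (!be64(&c, &n->offset) || !be64(&c, &n->size) ||
            !be32(&c, &n->flags) || (n->name = cstr(&c)) == NULL)
            return false;
    }
    out->n_nodes = count;
    return true;
}
```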
Parse your assets now
With the information retrieved, we were able to extract the raw asset files from a UnityFS bundle. Next, search the Internet for ideas on extracting objects (text resources, 2D textures, etc.) from Unity assets. Good luck exploring!
In this article, we discussed the structure and parsing of the UnityFS resource bundle file. For more information about UnityFS and Unity asset files, take a look at the projects mentioned in this article.
We have to specify the routing table priority on the client server (not the IPsec Access Server), since we want to manage the routing table manually: --with-routing-table-prio=32800. Also, a TAP/TUN device is used instead of StrongSwan’s own kernel module, which simplifies the configuration later.
Then make and install it.
Configure IPsec Access Server
Just a reminder: if you have any questions about IP ranges, please refer to the demo topology graph. The image is unavailable right now for some reason; I’ll fix it ASAP.
Go to /usr/local/etc. Edit ipsec.secrets:
: PSK "<IPsec PRE SHARED KEY, PLEASE REMEMBER TO REPLACE IT WITH YOUR OWN KEY>"
s2s : XAUTH "<ANY PASSWORD YOU WANT, PLEASE REMEMBER TO REPLACE IT WITH YOUR OWN PASSWORD>"
load_modular = yes
duplicheck.enable = no
compress = yes
# In China, please consider replacing 22.214.171.124/126.96.36.199 with 188.8.131.52. It returns correct DNS results for sites outside mainland China, as long as you have configured the Chinese routing exception for it (route it to the non-mainland-China outbound server)
dns1 = 184.108.40.206
dns2 = 220.127.116.11
nbns1 = 18.104.22.168
nbns2 = 22.214.171.124
Turn on IPv4 forwarding in /etc/sysctl.conf (net.ipv4.ip_forward = 1), then apply it with sysctl -p.
Edit iptables. The following configuration is for Ubuntu 14.04 LTS; for other Linux distros, check their documentation.
iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -s 126.96.36.199/24 -j ACCEPT
iptables -A FORWARD -s 188.8.131.52/24 -j ACCEPT
iptables -A FORWARD -s 184.108.40.206/24 -j ACCEPT
iptables -A INPUT -i eth0 -p esp -j ACCEPT
iptables -A INPUT -i eth0 -p udp --dport 500 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 500 -j ACCEPT
iptables -A INPUT -i eth0 -p udp --dport 4500 -j ACCEPT
iptables -A INPUT -i eth0 -p udp --dport 1701 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 1723 -j ACCEPT
iptables -A FORWARD -j REJECT
iptables -t nat -A POSTROUTING -s 220.127.116.11/24 -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 18.104.22.168/24 -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 22.214.171.124/24 -o eth0 -j MASQUERADE
iptables-save > /etc/iptables.rules
cat > /etc/network/if-up.d/iptables <<EOF
#!/bin/sh
iptables-restore < /etc/iptables.rules
EOF
chmod +x /etc/network/if-up.d/iptables
Configure IPsec client
This assumes you have installed StrongSwan with TAP/TUN and routing table priority support, as described above.
Go to /usr/local/etc, edit ipsec.secrets, and put in the same contents as in the previous step.
Go to /usr/local/etc, edit ipsec.conf:
# Enabling this will cause authentication failure
left=<eth0's IP address>
# Ask your server
right=<Your IPsec VPN Server's public IP address>
# For Microsoft Azure and other service providers who use SNAT, specify that to prevent IKE_SA failure
On your IPsec VPN server, run sudo ipsec start.
On your client server, run:
sudo ipsec start
sudo ipsec up s2sbj1
It should connect shortly. Run ifconfig and you should find a new network adapter called ipsec0.
Configure rule-based routing
Create a new routing table (named bj1s2s in this example) by adding an entry for it to /etc/iproute2/rt_tables:
user@ibntwkstgepbj1:~$ sudo vim /etc/iproute2/rt_tables
Get the routing configuration in table 220 (IPsec table).
user@ibntwkstgepdm1:~$ sudo ip route list table 220
default dev ipsec0 proto static src 126.96.36.199
188.8.131.52 via 10.0.0.1 dev eth0 proto static src 10.0.0.4
Specify the default route for this table (copy it from 220):
user@ibntwkstgepdm1:~$ sudo ip route add default dev ipsec0 proto static src 184.108.40.206 table bj1s2s
(You probably don’t have to copy the second line, but I added it.)
Add IP rules.
user@ibntwkstgepdm1:~$ sudo ip rule add from <IP Range> table bj1s2s
Refresh routing table.
user@ibntwkstgepdm1:~$ sudo ip route flush cache
Write a shell script if you want this to happen automatically on every reconnection.
We implemented simple line optimization using rule-based routing in this example. For application-based services, you are all set and ready to go. For VPN access services, iptables configuration is needed to tag data packets and route them correctly. Check out this article for more details.
I didn’t provide an automatic routing script in this example. I strongly recommend writing one, since configuring the routing table automatically saves time.
For multiple IPsec connections and routings, just specify the source IP, which is 220.127.116.11 and 18.104.22.168 in routing tables. They use the same adapter.
The Windows Audio Session API (WASAPI) was first introduced in Windows Vista. It offers advanced audio control and playback features for Windows apps. Since it avoids sample rate conversion (SRC) issues in some respects, WASAPI Exclusive Mode has gained popularity among music lovers. The Windows Runtime supports a small subset of the WASAPI APIs, including Exclusive Mode (in Windows 10).
However, there is one thing you should know: Windows 10 Mobile doesn’t support WASAPI Exclusive Mode (by design). As far as I know, only the desktop platform supports it so far. Luckily, modern Windows Phone devices pick the best shared-mode format based on the input wave format, so you don’t have to worry about that (at least on the Lumia 950 and Lumia 950 XL).
There’s no IMMDevice available in the Windows Runtime. Attempting to create it via CLSID & IID fails with the HRESULT “Class not registered”, which makes it harder to get all natively supported formats for Exclusive Mode. A possible solution is to use the Properties property of the DeviceInformation class in the Windows Runtime instead of the IMMDevice class and its OpenPropertyStore method. Then query formats using IsFormatSupported in IAudioClient2. Remember that in Exclusive Mode, the system won’t return the closest-match wave format from this method, so you have to try the formats yourself and select the best one. In my sample, I specified 44.1kHz / 16-bit / WaveFormat = 0x1 (PCM), which my Surface Pro’s audio subsystem supports.
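The sample’s 44.1kHz / 16-bit PCM format can be built as a WAVEFORMATEX-style structure. To keep the sketch compilable outside Windows, it uses a local mirror of the WAVEFORMATEX fields instead of mmreg.h. The stereo channel count is an assumption, since the text doesn’t mention it.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the WAVEFORMATEX fields; on Windows, fill a real
 * WAVEFORMATEX the same way. */
struct wave_format {
    uint16_t wFormatTag;       /* 1 = WAVE_FORMAT_PCM */
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;           /* 0: no extra bytes for plain PCM */
};

#define WAVE_FORMAT_PCM_TAG 1

static struct wave_format make_pcm_format(uint32_t rate, uint16_t bits,
                                          uint16_t channels)
{
    struct wave_format wf;

    assert(bits % 8 == 0 && channels > 0);
    wf.wFormatTag      = WAVE_FORMAT_PCM_TAG;
    wf.nChannels       = channels;
    wf.nSamplesPerSec  = rate;
    wf.wBitsPerSample  = bits;
    wf.nBlockAlign     = (uint16_t)(channels * bits / 8);  /* bytes per frame */
    wf.nAvgBytesPerSec = rate * wf.nBlockAlign;            /* bytes per second */
    wf.cbSize          = 0;
    return wf;
}
```

On Windows, you would pass such a structure to IsFormatSupported with AUDCLNT_SHAREMODE_EXCLUSIVE. The closest-match pointer can be NULL in exclusive mode. Repeat for each candidate format until one is accepted.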
Initialize Audio Client
Then you can initialize IAudioClient2 in exclusive mode: pass AUDCLNT_SHAREMODE_EXCLUSIVE and specify the buffer length and wave format. You may get HRESULTs like AUDCLNT_E_BUFFER_SIZE_NOT_ALIGNED or AUDCLNT_E_BUFFER_SIZE_ERROR; check out this page for solutions.
As in HW-offload mode, you don’t have to calculate available frames yourself in event-driven playback mode: the padding frame value is exactly the same as the number of available frames. Then supply audio samples just as you do in shared mode.
It should support Windows 8.1 desktop, but I haven’t tested it there since I don’t have a Windows 8.1 desktop device.
I often noticed weird noise during playback under high system load. I believe the root cause is process priority.
Windows 10 removed the Background Audio category from the WASAPI headers. To implement background playback, a custom out-of-process COM server or Media Foundation extension is required.
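For AUDCLNT_E_BUFFER_SIZE_NOT_ALIGNED, the IAudioClient::Initialize documentation describes the recovery:

1. Call GetBufferSize to get the aligned frame count.
2. Release the client.
3. Convert the frame count back to a duration in 100-ns units.
4. Activate a new client.
5. Call Initialize again.

The conversion step can be sketched in C as follows. Here reference_time mirrors REFERENCE_TIME, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t reference_time;  /* mirrors REFERENCE_TIME: 100-ns units */

/* Convert an aligned frame count (from GetBufferSize after
 * AUDCLNT_E_BUFFER_SIZE_NOT_ALIGNED) into the buffer duration for the next
 * Initialize call, rounded to the nearest 100-ns unit. */
static reference_time aligned_duration(uint32_t frames, uint32_t samples_per_sec)
{
    assert(samples_per_sec != 0);
    return (reference_time)(10000.0 * 1000 / samples_per_sec * frames + 0.5);
}
```

In event-driven exclusive mode, pass the result as both hnsBufferDuration and hnsPeriodicity, since that mode requires the two to be equal.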
I don’t offer a demo program here, because it’s pretty easy to adapt the official sample to WASAPI Exclusive mode. 🙂