I have been noticing some hiccups on my Ugoos AM6B and have been following them very closely. I saw that the Linux scheduler was moving a process around from CPU#0 in cluster 0 to CPU#X in cluster 1. I got deep into the weeds and found that lscpu doesn’t show any cache information.
I read the ARM documentation, the S922X datasheet, and the arch/arm64/include/xxxx files to understand how cache detection happens. I was seeing VIPT cache detection on all cores 0-5, but then dmesg showed the error “no cache hierarchy found for cpu 0”.
Anyway, enough of the backstory. I wondered whether there are programs running in CoreELEC that would benefit from the Linux scheduler doing a better job. I am still searching for an answer to this.
Regardless of the benefit, I created a new dtb file and, lo and behold, the Linux kernel can now detect the cache at runtime:
CoreELEC:~ £ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: ARM
Model name: Cortex-A53
Model: 4
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
Stepping: r0p4
CPU(s) scaling MHz: 100%
CPU max MHz: 1800.0001
CPU min MHz: 500.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Model name: Cortex-A73
Model: 2
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r0p2
CPU(s) scaling MHz: 100%
CPU max MHz: 2208.0000
CPU min MHz: 500.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Caches (sum of all):
L1d: 192 KiB (6 instances)
L1i: 320 KiB (6 instances)
L2: 1.3 MiB (2 instances)
CoreELEC:~ £ lscpu --output-all -e
BOGOMIPS CPU CORE SOCKET CLUSTER NODE BOOK DRAWER L1d:L1i:L2 POLARIZATION ADDRESS CONFIGURED ONLINE MHZ SCALMHZ% MAXMHZ MINMHZ MODELNAME
48.00 0 0 0 - - - - 0:0:0 - - - yes 1800.0001 100% 1800.0001 500.0000 Cortex-A53
48.00 1 1 0 - - - - 1:1:0 - - - yes 1800.0001 100% 1800.0001 500.0000 Cortex-A53
48.00 2 0 0 - - - - 2:2:1 - - - yes 2208.0000 100% 2208.0000 500.0000 Cortex-A73
48.00 3 1 0 - - - - 3:3:1 - - - yes 2208.0000 100% 2208.0000 500.0000 Cortex-A73
48.00 4 2 0 - - - - 4:4:1 - - - yes 2208.0000 100% 2208.0000 500.0000 Cortex-A73
48.00 5 3 0 - - - - 5:5:1 - - - yes 2208.0000 100% 2208.0000 500.0000 Cortex-A73
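For anyone who wants to cross-check the L1d:L1i:L2 column without lscpu, here is a minimal sketch that reads the same information from sysfs. It assumes the kernel exposes the usual /sys/devices/system/cpu/cpuN/cache/indexM/ files (level, type, size, shared_cpu_list); the function names are my own and not part of any tool.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or '?' if it is missing."""
    return path.read_text().strip() if path.exists() else "?"


def read_cache_topology(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU number to its caches as (level, type, size, shared_cpu_list)."""
    topology = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        caches = []
        # One indexM directory per cache the kernel knows about for this CPU.
        for index_dir in sorted((cpu_dir / "cache").glob("index[0-9]*")):
            caches.append(tuple(
                _read(index_dir / name)
                for name in ("level", "type", "size", "shared_cpu_list")
            ))
        topology[int(cpu_dir.name[3:])] = caches
    return topology


if __name__ == "__main__":
    for cpu, caches in sorted(read_cache_topology().items()):
        for level, kind, size, shared in caches:
            print(f"cpu{cpu}: L{level} {kind:<12} {size:>6}  shared with CPUs {shared}")
```

If the dtb is doing its job, I would expect the L2 entry of CPUs 0-1 to list only 0-1 under shared_cpu_list and the L2 entry of CPUs 2-5 to list only 2-5. Without cache information, the cache directories are likely to be empty and the script prints nothing.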
Now I see that the Linux scheduler no longer moves the process from cluster 0 (A53) to cluster 1 (A73).
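One way to watch for this kind of migration is to poll the "processor" field (field 39) of /proc/<pid>/stat, which holds the CPU a task last ran on. The sketch below is only an illustration: the cluster ranges are taken from the lscpu output in this post, and the helper names are hypothetical.

```python
import os
import time

# Cluster layout taken from the lscpu output in this post (CPUs 0-1 A53, 2-5 A73).
CLUSTERS = {"A53": range(0, 2), "A73": range(2, 6)}


def last_cpu_from_stat(stat_line):
    """Return the CPU a task last ran on (field 39 of /proc/<pid>/stat)."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split only after the last ')'. rest[0] is then field 3.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[36])


def cluster_of(cpu):
    """Name the cluster a CPU number belongs to, or 'unknown'."""
    for name, cpus in CLUSTERS.items():
        if cpu in cpus:
            return name
    return "unknown"


def watch(pid, samples=10, interval=0.5):
    """Print where the task last ran, once per interval."""
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as fh:
            cpu = last_cpu_from_stat(fh.read())
        print(f"pid {pid} last ran on CPU#{cpu} ({cluster_of(cpu)})")
        time.sleep(interval)


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    watch(os.getpid(), samples=3, interval=0.2)
```

Pointing watch() at the pid of the player or decoder process should show whether it stays inside one cluster or hops between them. Note that this reports the main thread only; per-thread data lives under /proc/<pid>/task/<tid>/stat.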
ANECDOTE (not a data-backed experiment): AV1 decoding now looks better than before. I still cannot decode 4K smoothly, but it drops fewer frames.
Data: when running a CPU-intensive task before the device tree change, CPU#0 would reach 89% usage and CPU#4/5 would reach 60%. After the device tree change, CPU#0 reaches 35% usage and CPU#4/5 are at 80+%.
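If someone wants to reproduce per-CPU numbers like these, a rough sketch is to take two snapshots of /proc/stat and compare the busy and total jiffies for each cpuN line. This is not how the figures above were measured (the post does not say which tool was used); the function names are my own.

```python
import os
import time


def parse_proc_stat(text):
    """Return {cpu: (busy_jiffies, total_jiffies)} for every 'cpuN' line."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if not parts or not parts[0].startswith("cpu") or not parts[0][3:].isdigit():
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        values = [int(v) for v in parts[1:9]]
        total = sum(values)
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        counters[int(parts[0][3:])] = (total - idle, total)
    return counters


def usage_percent(before, after):
    """Per-CPU busy percentage between two parse_proc_stat() snapshots."""
    usage = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as fh:
        first = parse_proc_stat(fh.read())
    time.sleep(1.0)
    with open("/proc/stat") as fh:
        second = parse_proc_stat(fh.read())
    for cpu, pct in sorted(usage_percent(first, second).items()):
        print(f"CPU#{cpu}: {pct:5.1f}%")
```

Running it while the CPU-intensive task is active, once with each dtb, would give a before/after comparison on the same footing.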
The Linux scheduler appears to work better now that it can tell the L2 cache is not shared between the A53 and A73 clusters. Within each cluster, the L2 cache is shared by all of that cluster’s cores.
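As a sanity check, the "Caches (sum of all)" totals in the lscpu output are consistent with two separate L2 instances. The sketch below adds up per-cluster sizes; the 256 KiB A53 L2 comes from this post, while the remaining sizes are assumptions I picked because they reproduce the reported totals, not values read from the dtb.

```python
# Sizes in KiB. Only the A53 L2 size (256) is stated in the post;
# every other size here is an assumption chosen to match lscpu's totals.
CLUSTER_CACHES = {
    "Cortex-A53": {"cores": 2, "l1d": 32, "l1i": 32, "l2": 256},
    "Cortex-A73": {"cores": 4, "l1d": 32, "l1i": 64, "l2": 1024},
}


def cache_totals(clusters):
    """Sum per-core L1 caches and per-cluster L2 caches, in KiB."""
    return {
        "L1d": sum(c["cores"] * c["l1d"] for c in clusters.values()),
        "L1i": sum(c["cores"] * c["l1i"] for c in clusters.values()),
        "L2": sum(c["l2"] for c in clusters.values()),  # one L2 per cluster
    }


if __name__ == "__main__":
    totals = cache_totals(CLUSTER_CACHES)
    print(totals)                        # {'L1d': 192, 'L1i': 320, 'L2': 1280}
    print(totals["L2"] / 1024, "MiB")    # 1.25 MiB
```

Under these assumptions the L1 totals match the 192 KiB and 320 KiB figures exactly, and 1.25 MiB of L2 is about 1.3 MiB when shown with one decimal place.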
I am attaching two files here for testing. The file whose name ends in “all” has the final, complete form. The file whose name ends in “A53-Cache” defines a 256 KB cache for all 6 cores, and with it you can see the scheduler bump the process out of cluster 0 because it assumes that the cache is shared between all 6 processor cores.
I am looking for this community’s opinion on whether this is useful (e.g. for gaming or software decoding).
g12b_s922x_ugoos_am6b_all.dtb (74.1 KB)
g12b_s922x_ugoos_am6b-A53-Cache.dtb (74.0 KB)
Attachment removed! It will be available in nightly build!