Use script from here, reported working here
Report back about results & issues
Optimization for the S922 is complete, right? I.e., no more potential speed improvements from DTB modifications.
Or is discovery still ongoing to improve performance even more? I didn't follow the recent conversation, but it seems to have shifted to the S905X4.
OK, I applied the script in autostart.sh
CoreELEC:~ # lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A55
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r2p0
CPU(s) scaling MHz: 100%
CPU max MHz: 2004.0000
CPU min MHz: 100.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Spec store bypass: Not affected
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
CoreELEC:~ # lscpu --output-all -e
BOGOMIPS CPU CORE SOCKET CLUSTER NODE BOOK DRAWER CACHE POLARIZATION ADDRESS CONFIGURED ONLINE MHZ SCALMHZ% MAXMHZ MINMHZ MODELNAME
48.00 0 0 0 0 - - - - - - - yes 2004.0000 100% 2004.0000 100.0000 Cortex-A55
48.00 1 1 0 0 - - - - - - - yes 2004.0000 100% 2004.0000 100.0000 Cortex-A55
48.00 2 2 0 0 - - - - - - - yes 2004.0000 100% 2004.0000 100.0000 Cortex-A55
48.00 3 3 0 0 - - - - - - - yes 2004.0000 100% 2004.0000 100.0000 Cortex-A55
CoreELEC:~ #
Then you did it wrong. The script is a one-time execution, not autostart… After execution, it needs a reboot to take effect.
OK, after applying the script and rebooting:
CoreELEC:~ # lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A55
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r2p0
CPU(s) scaling MHz: 100%
CPU max MHz: 2004.0000
CPU min MHz: 100.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 256 KiB (4 instances)
L3: 500 KiB (1 instance)
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Spec store bypass: Not affected
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
CoreELEC:~ # lscpu --output-all -e
BOGOMIPS CPU CORE SOCKET CLUSTER NODE BOOK DRAWER L1d:L1i:L2:L3 POLARIZATION ADDRESS CONFIGURED ONLINE MHZ SCALMHZ% MAXMHZ MINMHZ MODELNAME
48.00 0 0 0 0 - - - 0:0:0:0 - - - yes 2004.0000 100% 2004.0000 100.0000 Cortex-A55
48.00 1 1 0 0 - - - 1:1:1:0 - - - yes 2004.0000 100% 2004.0000 100.0000 Cortex-A55
48.00 2 2 0 0 - - - 2:2:2:0 - - - yes 2004.0000 100% 2004.0000 100.0000 Cortex-A55
48.00 3 3 0 0 - - - 3:3:3:0 - - - yes 2004.0000 100% 2004.0000 100.0000 Cortex-A55
CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d 32K 128K 16 Data 1 32 64
L1i 32K 128K 16 Instruction 1 32 64
L2 64K 256K 2 Unified 2 512 64
L3 500K 500K 15 Unified 3 512 64
CoreELEC:~ #
You are the first reporter who has an L3 cache that has 15 ways instead of 16. Your L2 is 2 ways instead of 4.
What CPU do you have?
You confirmed my findings as well: the cache size does not matter much and can even be wrong… what is important is to get the hierarchy right. That is especially important if a cache is NOT shared, as with big.LITTLE and the S905X4 L2. The kernel pins some instructions and data of a stale process in higher levels of cache. The kernel might use the information that a cache is not shared and push that data to L3 when it wants to switch the process between cores, thereby reducing the penalty of a process bump on a processor with non-shared caches.
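To see the hierarchy the kernel actually ended up with (which caches are private and which are shared), the standard sysfs cache entries can be read directly. This is only a rough sketch: the function name `cache_sharing` is made up for illustration, and it assumes the usual `/sys/devices/system/cpu/cpuN/cache/indexM/` layout is populated on the box.

```python
from pathlib import Path


def cache_sharing(cpu_root="/sys/devices/system/cpu"):
    """Map (level, type) to the distinct CPU groups sharing that cache.

    Hypothetical helper: reads level, type and shared_cpu_list from every
    cpuN/cache/indexM directory below cpu_root.
    """
    groups = {}
    for index_dir in sorted(Path(cpu_root).glob("cpu[0-9]*/cache/index[0-9]*")):
        try:
            level = int((index_dir / "level").read_text().strip())
            ctype = (index_dir / "type").read_text().strip()
            shared = (index_dir / "shared_cpu_list").read_text().strip()
        except (OSError, ValueError):
            continue  # entry missing or unreadable on this kernel
        groups.setdefault((level, ctype), set()).add(shared)
    return {key: sorted(value) for key, value in groups.items()}


if __name__ == "__main__":
    for (level, ctype), cpus in sorted(cache_sharing().items()):
        print(f"L{level} {ctype}: CPU groups {cpus}")
```

A private cache shows one group per core (for example "0", "1", "2", "3"), while a shared one shows a single group such as "0-3".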
There is a benchmark written in Fortran/C you can run. It does 100 million iterations to average out the values.
I'll let the admins speak to simplification and how to implement it. I saw one post below that had 15 ways of cache. That's not a number divisible by 2. Who knows what's happening there.
Btw, separately, I think you may be I/O bound on your Cube. The CPU might be spending a lot of time waiting for reads and writes. I installed iotop to check, and it seems that at the beginning of every video it still reads/writes at 10-20 KB/s.
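One way to check the "CPU waiting on reads and writes" theory without iotop is to compare two samples of the aggregate `cpu` line from `/proc/stat`; the fifth counter on that line is iowait. A minimal sketch, with `iowait_fraction` being a made-up helper name:

```python
def iowait_fraction(before, after):
    """Share of CPU time spent in iowait between two 'cpu ...' lines.

    Both arguments are the first line of /proc/stat, sampled some seconds
    apart. Field order: user nice system idle iowait irq softirq steal ...
    """
    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(value) for value in parts[1:]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    # Sum only user..steal; guest time is already included in user/nice.
    total = sum(deltas[:8])
    return deltas[4] / total if total else 0.0
```

On the box you would read the first line of /proc/stat, wait a few seconds while a video starts, read it again, and pass both lines in. A value near zero would suggest the CPU is not stalled on I/O.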
Edit: the 15-way cache is a kernel NE vs NG difference. Haven't looked into it further.
Amlogic S905X4-K
A55
Device: Dune HD Homatics
Maybe that's just the way it is for that box. You'll have to find someone with the same device and compare. All other 905X4 have different values.
Interesting results, different cache strategy (sets/ways), but looks OK.
What bothers me is the L3 at 500K. Maybe something reserved 12K? Or wrong detection? Or some chip yield improvement by AML?
But the cache strategy roughly fits the detected 500K (64 bytes x 512 sets x 15 ways = 491520 bytes = 480 KiB)
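The arithmetic above can be checked in a couple of lines. This sketch assumes the "500K" printed by lscpu means 500 KiB (512000 bytes); `cache_bytes` is just an illustrative helper.

```python
def cache_bytes(line_size, sets, ways):
    """Capacity implied by a cache geometry: line size x sets x ways."""
    return line_size * sets * ways


geometry = cache_bytes(64, 512, 15)   # values from the lscpu --caches output
reported = 500 * 1024                 # assumption: "500K" is 500 KiB

print(geometry)             # 491520
print(reported - geometry)  # 20480 bytes not covered by 15 full ways
```

So 15 ways account for 480 KiB, and the remaining 20 KiB is less than one full way (64 x 512 = 32 KiB), which is consistent with the way count having been rounded down.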
@Zuma Which kernel are you running?
I was running mine with NG yesterday; today I tried NE on my Homatics Box R 4K Plus (S905X4). I am getting the same result as you.
CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d 32K 128K 16 Data 1 32 64
L1i 32K 128K 16 Instruction 1 32 64
L2 64K 256K 2 Unified 2 512 64
L3 500K 500K 15 Unified 3 512 64
@rho-bot Is there a place in /sys/devices or elsewhere that I can read the DDR frequency from for Amlogic SOCs?
OK, after installing CPM's latest build, on am6b+:
CoreELEC:~ # lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: ARM
Model name: Cortex-A53
Model: 4
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
Stepping: r0p4
CPU(s) scaling MHz: 100%
CPU max MHz: 1800.0000
CPU min MHz: 500.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Model name: Cortex-A73
Model: 2
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r0p2
CPU(s) scaling MHz: 100%
CPU max MHz: 2208.0000
CPU min MHz: 500.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
Caches (sum of all):
L1d: 192 KiB (6 instances)
L1i: 320 KiB (6 instances)
L2: 1.3 MiB (2 instances)
CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d 32K 192K 4 Data 1 128 64
L1i 32K 320K 2 Instruction 1 256 64
L2 256K 1.3M 16 Unified 2 256 64
CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d 32K 192K 4 Data 1 128 64
L1i 32K 320K 2 Instruction 1 256 64
L2 256K 1.3M 16 Unified 2 256 64
CoreELEC:~ # ^C
CoreELEC:~ # lscpu --output-all -e
BOGOMIPS CPU CORE SOCKET CLUSTER NODE BOOK DRAWER L1d:L1i:L2 POLARIZATION ADDRESS CONFIGURED ONLINE MHZ SCALMHZ% MAXMHZ MINMHZ MODELNAME
48.00 0 0 0 - - - - 0:0:0 - - - yes 1800.0000 100% 1800.0000 500.0000 Cortex-A53
48.00 1 1 0 - - - - 1:1:0 - - - yes 1800.0000 100% 1800.0000 500.0000 Cortex-A53
48.00 2 0 0 - - - - 2:2:1 - - - yes 2208.0000 100% 2208.0000 500.0000 Cortex-A73
48.00 3 1 0 - - - - 3:3:1 - - - yes 2208.0000 100% 2208.0000 500.0000 Cortex-A73
48.00 4 2 0 - - - - 4:4:1 - - - yes 2208.0000 100% 2208.0000 500.0000 Cortex-A73
48.00 5 3 0 - - - - 5:5:1 - - - yes 2208.0000 100% 2208.0000 500.0000 Cortex-A73
/sys/class/aml_ddr might be what you're looking for.
My box says the same.
Is the ram really running at 752 MHz?
Ugoos says 3733 Mbps, so it must be one of these Samsung chips:
Changing L3 size to 0x80000 fixes the 500K and 15 ways issue.
CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d 32K 128K 16 Data 1 32 64
L1i 32K 128K 16 Instruction 1 32 64
L2 64K 256K 2 Unified 2 512 64
L3 512K 512K 16 Unified 3 512 64
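Why 0x80000 gives 16 ways while the old value gave 15 can be reproduced with integer division. This is a sketch under two assumptions: that the way count is derived as size / sets / line size with truncation, and that the earlier "500K" corresponds to 512000 bytes. `ways_and_leftover` is a made-up name.

```python
def ways_and_leftover(size_bytes, line_size, sets):
    """Derive ways from a cache size, plus the bytes that fit no full way."""
    ways = (size_bytes // sets) // line_size   # truncating division (assumed)
    leftover = size_bytes - ways * sets * line_size
    return ways, leftover


print(ways_and_leftover(0x80000, 64, 512))     # (16, 0)
print(ways_and_leftover(500 * 1024, 64, 512))  # (15, 20480)
```

0x80000 is 524288 bytes (512 KiB), which divides evenly into 16 ways of 512 sets x 64 bytes, so nothing is truncated.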
I ran some memcpy benchmarks from lmbench (Entware).
/sys/class/aml_ddr # cat usage_stat
MAX bandwidth: 9754163 KB/s, usage: 83.01%, tick:305010215 us
AVG bandwidth: 470363 KB/s, usage: 3.99%, samples:3599
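Since each line gives both a bandwidth and a usage percentage, the bandwidth that would correspond to 100% can be back-calculated. A rough sketch that parses exactly the two lines pasted above; the regex is fitted to this output only, and the format on other kernels is an assumption.

```python
import re

SAMPLE = """MAX bandwidth: 9754163 KB/s, usage: 83.01%, tick:305010215 us
AVG bandwidth: 470363 KB/s, usage: 3.99%, samples:3599"""

_LINE = re.compile(r"(MAX|AVG) bandwidth:\s*(\d+) KB/s, usage:\s*([\d.]+)%")


def parse_usage_stat(text):
    """Return bandwidth, usage and implied 100% bandwidth per line kind."""
    result = {}
    for kind, kb_per_s, pct in _LINE.findall(text):
        kb_per_s, pct = int(kb_per_s), float(pct)
        result[kind] = {
            "kb_per_s": kb_per_s,
            "usage_pct": pct,
            # bandwidth / (usage / 100) = what the driver treats as 100%
            "implied_peak_kb_per_s": kb_per_s / (pct / 100) if pct else None,
        }
    return result


if __name__ == "__main__":
    for kind, row in parse_usage_stat(SAMPLE).items():
        print(kind, round(row["implied_peak_kb_per_s"]))
```

Both lines work out to roughly 11.7 to 11.8 million KB/s at 100%, so the driver appears to use one fixed peak figure for its percentage.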
The A53's integrated memory controller maxes out at 1600 MT/s, which is 800 MHz in DDR mode. The RAM chip may be rated higher, but the processor's memory controller doesn't go that high.
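The MHz vs MT/s relation is just the double data rate factor of two. A tiny sketch with made-up helper names, applied to the numbers mentioned in this thread:

```python
def transfers_per_second(clock_mhz):
    """DDR moves data on both clock edges: MT/s = 2 x clock in MHz."""
    return 2 * clock_mhz


def clock_mhz(mt_per_s):
    """Inverse: the I/O clock needed for a given transfer rate."""
    return mt_per_s / 2


print(transfers_per_second(800))  # 1600
print(transfers_per_second(752))  # 1504
print(clock_mhz(3733))            # 1866.5
```

So a reported 752 MHz would mean about 1504 MT/s, well below the 3733 figure quoted for the RAM chip itself.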
Homatics Box R 4K Plus silver, after modifying the dtb.img files:
NE nightly:
/flash$ lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d 32K 128K 16 Data 1 32 64
L1i 32K 128K 16 Instruction 1 32 64
L2 64K 256K 2 Unified 2 512 64
L3 512K 512K 16 Unified 3 512 64
NG nightly:
/flash$ lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d 32K 128K 4 Data 1 128 64
L1i 32K 128K 4 Instruction 1 128 64
L2 64K 256K 4 Unified 2 256 64
L3 512K 512K 16 Unified 3 512 64
Currently testing…