S922X Ugoos AM6B Device Tree - Performance/Efficiency - Testing Needed

Use script from here, reported working here
Report back about results & issues


Optimization for the S922 is complete, right? I.e., no more potential speed improvements from DTB modifications?

Or is discovery still ongoing to improve performance even more? I didn't follow the recent conversation, but it seems to have shifted to the S905X4.

OK, I applied the script in autostart.sh:

CoreELEC:~ # lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A55
    Model:               0
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r2p0
    CPU(s) scaling MHz:  100%
    CPU max MHz:         2004.0000
    CPU min MHz:         100.0000
    BogoMIPS:            48.00
    Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Spec store bypass:     Not affected
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
CoreELEC:~ # lscpu --output-all -e
BOGOMIPS CPU CORE SOCKET CLUSTER NODE BOOK DRAWER CACHE POLARIZATION ADDRESS CONFIGURED ONLINE       MHZ SCALMHZ%    MAXMHZ   MINMHZ MODELNAME
   48.00   0    0      0       0    -    -      - -     -            -       -             yes 2004.0000     100% 2004.0000 100.0000 Cortex-A55
   48.00   1    1      0       0    -    -      - -     -            -       -             yes 2004.0000     100% 2004.0000 100.0000 Cortex-A55
   48.00   2    2      0       0    -    -      - -     -            -       -             yes 2004.0000     100% 2004.0000 100.0000 Cortex-A55
   48.00   3    3      0       0    -    -      - -     -            -       -             yes 2004.0000     100% 2004.0000 100.0000 Cortex-A55
CoreELEC:~ #

Then you did it wrong. The script is meant for one-time execution, not autostart… After execution, a reboot is needed for it to take effect.

OK, after applying the script and rebooting:

CoreELEC:~ # lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A55
    Model:               0
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r2p0
    CPU(s) scaling MHz:  100%
    CPU max MHz:         2004.0000
    CPU min MHz:         100.0000
    BogoMIPS:            48.00
    Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    256 KiB (4 instances)
  L3:                    500 KiB (1 instance)
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Spec store bypass:     Not affected
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
CoreELEC:~ # lscpu --output-all -e
BOGOMIPS CPU CORE SOCKET CLUSTER NODE BOOK DRAWER L1d:L1i:L2:L3 POLARIZATION ADDRESS CONFIGURED ONLINE       MHZ SCALMHZ%    MAXMHZ   MINMHZ MODELNAME
   48.00   0    0      0       0    -    -      - 0:0:0:0       -            -       -             yes 2004.0000     100% 2004.0000 100.0000 Cortex-A55
   48.00   1    1      0       0    -    -      - 1:1:1:0       -            -       -             yes 2004.0000     100% 2004.0000 100.0000 Cortex-A55
   48.00   2    2      0       0    -    -      - 2:2:2:0       -            -       -             yes 2004.0000     100% 2004.0000 100.0000 Cortex-A55
   48.00   3    3      0       0    -    -      - 3:3:3:0       -            -       -             yes 2004.0000     100% 2004.0000 100.0000 Cortex-A55
CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d       32K     128K   16 Data            1   32                      64
L1i       32K     128K   16 Instruction     1   32                      64
L2        64K     256K    2 Unified         2  512                      64
L3       500K     500K   15 Unified         3  512                      64
CoreELEC:~ #




You are the first reporter who has an L3 cache that has 15 ways instead of 16. Your L2 is 2 ways instead of 4.

What CPU do you have?

You also confirmed my finding that the cache size does not matter; it can be wrong… What is important is getting the hierarchy right, especially when a cache is NOT shared, as with big.LITTLE and the 905X4 L2. The kernel pins some instructions and data of a stale process in higher levels of cache. The kernel might use the information that a cache is not shared and push that data to L3 when it wants to move a process between cores, thereby reducing the penalty of bumping a process on a processor with non-shared cache.
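To make the "hierarchy" point concrete, here is a minimal Python sketch that groups CPUs by the cache instance they use at each level. The IDs are copied from the `L1d:L1i:L2:L3` column of the `lscpu --output-all -e` listing posted earlier for the S905X4; the helper name `sharing_map` is made up for illustration.

```python
from collections import defaultdict

# Cache-ID column copied from the post-reboot `lscpu --output-all -e` listing:
# one "L1d:L1i:L2:L3" entry per CPU.
CACHE_IDS = {0: "0:0:0:0", 1: "1:1:1:0", 2: "2:2:2:0", 3: "3:3:3:0"}
LEVELS = ["L1d", "L1i", "L2", "L3"]


def sharing_map(cache_ids, levels):
    """Group CPUs by the cache instance they use at each level."""
    groups = {level: defaultdict(list) for level in levels}
    for cpu, ids in sorted(cache_ids.items()):
        for level, instance in zip(levels, ids.split(":")):
            groups[level][int(instance)].append(cpu)
    return {level: dict(by_id) for level, by_id in groups.items()}


if __name__ == "__main__":
    for level, by_id in sharing_map(CACHE_IDS, LEVELS).items():
        for instance, cpus in by_id.items():
            kind = "shared" if len(cpus) > 1 else "private"
            print(f"{level} #{instance}: CPUs {cpus} ({kind})")
```

With these IDs, L1 and L2 come out private per core and the single L3 is shared by all four CPUs. This is the topology the kernel is told about, independent of whether the reported sizes are right.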

There is a benchmark written in Fortran/C you can run. It does 100 million iterations to average out the values.

I'll let the admins speak to simplification and how to implement it. I saw one post below that had 15 ways of cache. That's not a number divisible by 2. Who knows what's happening there.

Btw, separately, I think you may be I/O bound on your Cube. The CPU might be spending a lot of time waiting for reads and writes. I installed iotop to check, and it seems that at the beginning of every video it still reads/writes at 10-20 KB/s.

Edit: the 15-way cache is a kernel NE vs NG difference. Haven't looked into it further.

Amlogic S905X4-K
A55
Device: Dune HD Homatics

Maybe that's just the way it is for that box. You'll have to find someone with the same device and compare. All other 905X4 have different values.

Interesting results: a different cache strategy (sets/ways), but it looks OK.
What puzzles me is the L3 of 500K. Maybe something reserved 12K? Or wrong detection? Or some chip yield improvement by AML? :wink:
But the cache strategy roughly fits the detected 500K (64 bytes x 512 sets x 15 ways = 491520 bytes, i.e. 480 KiB).
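The arithmetic can be checked for every row of the `lscpu --caches` table, not just L3. A small Python sketch, assuming the `K` suffix in lscpu means KiB; the helper names are hypothetical and the numbers are copied from the table above:

```python
# Rows copied from the `lscpu --caches` output above:
# (name, ONE-SIZE in KiB, WAYS, SETS, COHERENCY-SIZE in bytes)
ROWS = [
    ("L1d", 32, 16, 32, 64),
    ("L1i", 32, 16, 32, 64),
    ("L2", 64, 2, 512, 64),
    ("L3", 500, 15, 512, 64),
]


def geometry_kib(ways, sets, line_bytes):
    """Capacity implied by the geometry, in KiB."""
    return ways * sets * line_bytes / 1024


def mismatches(rows):
    """Return {name: (reported_kib, implied_kib)} for rows that disagree."""
    bad = {}
    for name, reported_kib, ways, sets, line_bytes in rows:
        implied = geometry_kib(ways, sets, line_bytes)
        if implied != reported_kib:
            bad[name] = (reported_kib, implied)
    return bad


if __name__ == "__main__":
    print(mismatches(ROWS))
```

Only L3 is flagged: 15 ways x 512 sets x 64 bytes gives 480 KiB, which is 20 KiB short of the reported 500K. L1 and L2 are self-consistent.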

@Zuma Which kernel are you running?


Yesterday I was running NG on mine; today I tried NE on my Homatics Box R 4K Plus (S905X4). I am getting the same result as you.

CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d       32K     128K   16 Data            1   32                      64
L1i       32K     128K   16 Instruction     1   32                      64
L2        64K     256K    2 Unified         2  512                      64
L3       500K     500K   15 Unified         3  512                      64

@rho-bot Is there a place in /sys/devices or elsewhere that I can read the DDR frequency from on Amlogic SoCs?

OK, after installing CPM's latest build, AM6B+:

CoreELEC:~ # lscpu
Architecture:           aarch64
  Byte Order:           Little Endian
CPU(s):                 6
  On-line CPU(s) list:  0-5
Vendor ID:              ARM
  Model name:           Cortex-A53
    Model:              4
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           r0p4
    CPU(s) scaling MHz: 100%
    CPU max MHz:        1800.0000
    CPU min MHz:        500.0000
    BogoMIPS:           48.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
  Model name:           Cortex-A73
    Model:              2
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r0p2
    CPU(s) scaling MHz: 100%
    CPU max MHz:        2208.0000
    CPU min MHz:        500.0000
    BogoMIPS:           48.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
Caches (sum of all):
  L1d:                  192 KiB (6 instances)
  L1i:                  320 KiB (6 instances)
  L2:                   1.3 MiB (2 instances)


CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d       32K     192K    4 Data            1  128                      64
L1i       32K     320K    2 Instruction     1  256                      64
L2       256K     1.3M   16 Unified         2  256                      64

CoreELEC:~ # ^C
CoreELEC:~ # lscpu --output-all -e
BOGOMIPS CPU CORE SOCKET CLUSTER NODE BOOK DRAWER L1d:L1i:L2 POLARIZATION ADDRESS CONFIGURED ONLINE       MHZ SCALMHZ%    MAXMHZ   MINMHZ MODELNAME
   48.00   0    0      0       -    -    -      - 0:0:0      -            -       -             yes 1800.0000     100% 1800.0000 500.0000 Cortex-A53
   48.00   1    1      0       -    -    -      - 1:1:0      -            -       -             yes 1800.0000     100% 1800.0000 500.0000 Cortex-A53
   48.00   2    0      0       -    -    -      - 2:2:1      -            -       -             yes 2208.0000     100% 2208.0000 500.0000 Cortex-A73
   48.00   3    1      0       -    -    -      - 3:3:1      -            -       -             yes 2208.0000     100% 2208.0000 500.0000 Cortex-A73
   48.00   4    2      0       -    -    -      - 4:4:1      -            -       -             yes 2208.0000     100% 2208.0000 500.0000 Cortex-A73
   48.00   5    3      0       -    -    -      - 5:5:1      -            -       -             yes 2208.0000     100% 2208.0000 500.0000 Cortex-A73



/sys/class/aml_ddr might be what you're looking for.

@rome1931

# cat /sys/class/aml_ddr/freq
752 MHz
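For anyone who wants to log this value from a script, here is a minimal Python sketch. The path is the one given in this thread, and the `<number> MHz` format is assumed from the single output line above; other kernels or SoCs may expose something different. The function names are made up for illustration.

```python
import re
from pathlib import Path

# Path as reported in this thread; it may not exist on every kernel or SoC.
DDR_FREQ_PATH = Path("/sys/class/aml_ddr/freq")


def parse_ddr_freq(text):
    """Return the frequency in MHz from text such as '752 MHz'."""
    match = re.fullmatch(r"\s*(\d+)\s*MHz\s*", text)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    return int(match.group(1))


def read_ddr_freq(path=DDR_FREQ_PATH):
    """Read and parse the sysfs node (assumed format: '<number> MHz')."""
    return parse_ddr_freq(Path(path).read_text())


if __name__ == "__main__":
    if DDR_FREQ_PATH.exists():
        print(read_ddr_freq(), "MHz")
    else:
        print("aml_ddr node not found on this system")
```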

This cannot be true, can it?

My box says the same.

Is the ram really running at 752 MHz?

Ugoos says 3733 Mbps, so it must be one of these Samsung chips:

Changing L3 size to 0x80000 fixes the 500K and 15 ways issue.

CoreELEC:~ # lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d       32K     128K   16 Data            1   32                      64
L1i       32K     128K   16 Instruction     1   32                      64
L2        64K     256K    2 Unified         2  512                      64
L3       512K     512K   16 Unified         3  512                      64
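A quick sanity check of why 0x80000 cleans this up, as a Python sketch. It assumes the ways figure is derived from the size by integer division over sets x line size; that is a reading of the numbers in this thread, not something confirmed here. The 500K value is taken as 500 x 1024 bytes.

```python
def ways_from_size(size_bytes, sets=512, line_bytes=64):
    """Return (whole ways, leftover bytes) for a given cache size.

    Assumption: associativity = size // (sets * line size), with any
    remainder silently dropped.
    """
    return divmod(size_bytes, sets * line_bytes)


if __name__ == "__main__":
    for label, size in (("500K", 500 * 1024), ("0x80000", 0x80000)):
        ways, leftover = ways_from_size(size)
        print(f"{label}: {size} bytes -> {ways} ways, {leftover} bytes left over")
```

Under that assumption, 500K yields 15 whole ways with 20480 bytes that fit nowhere, while 0x80000 (524288 bytes, 512 KiB) divides into exactly 16 ways.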

I ran some memcpy benchmarks from lmbench (entware).

/sys/class/aml_ddr # cat usage_stat
MAX bandwidth:  9754163 KB/s, usage: 83.01%, tick:305010215 us
AVG bandwidth:   470363 KB/s, usage:  3.99%, samples:3599
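These two lines also let you back out what the driver treats as 100% bandwidth. A rough Python sketch, assuming `usage` is simply measured bandwidth divided by a fixed total (not verified against the driver source); the regex only covers the two line shapes shown above.

```python
import re

# Sample copied from the usage_stat output above.
SAMPLE = """\
MAX bandwidth:  9754163 KB/s, usage: 83.01%, tick:305010215 us
AVG bandwidth:   470363 KB/s, usage:  3.99%, samples:3599
"""

LINE_RE = re.compile(r"(MAX|AVG) bandwidth:\s*(\d+) KB/s, usage:\s*([\d.]+)%")


def parse_usage_stat(text):
    """Return {'MAX': (kb_per_s, usage_pct), 'AVG': (...)}."""
    return {kind: (int(kbps), float(pct))
            for kind, kbps, pct in LINE_RE.findall(text)}


def implied_total_kbps(kbps, usage_pct):
    """Total bandwidth that would make `kbps` equal `usage_pct` percent."""
    return kbps / (usage_pct / 100.0)


if __name__ == "__main__":
    for kind, (kbps, pct) in parse_usage_stat(SAMPLE).items():
        total = implied_total_kbps(kbps, pct)
        print(f"{kind}: implied total of about {total:,.0f} KB/s")
```

Both lines point at a total somewhere between 11.7 and 11.8 million KB/s, so the percentages are at least internally consistent.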

The A53's integrated memory controller maxes out at 1600 MT/s, which is 800 MHz in DDR mode. The RAM chip may be rated higher, but the processor's memory controller doesn't go that high.
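The MT/s versus MHz confusion is easy to untangle with the factor of two for double data rate. A small Python sketch using the figures quoted in this thread; whether the sysfs `freq` value is the I/O clock is an assumption here.

```python
TRANSFERS_PER_CLOCK = 2  # double data rate: one transfer per clock edge


def clock_mhz(rate_mt_s):
    """I/O clock in MHz for a given data rate in MT/s."""
    return rate_mt_s / TRANSFERS_PER_CLOCK


def rate_mt_s(clock_in_mhz):
    """Data rate in MT/s for a given I/O clock in MHz."""
    return clock_in_mhz * TRANSFERS_PER_CLOCK


if __name__ == "__main__":
    print("1600 MT/s controller limit ->", clock_mhz(1600), "MHz")
    print("3733 Mbps chip rating      ->", clock_mhz(3733), "MHz")
    print("752 MHz from sysfs         ->", rate_mt_s(752), "MT/s")
```

If the 752 MHz reading is the I/O clock, it corresponds to 1504 MT/s, which sits below the 1600 MT/s limit mentioned above and well below the 3733 Mbps the chip is rated for.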

Homatics Box R 4K Plus silver, after modifying the dtb.img files:

NE nightly:

/flash$ lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d       32K     128K   16 Data            1   32                      64
L1i       32K     128K   16 Instruction     1   32                      64
L2        64K     256K    2 Unified         2  512                      64
L3       512K     512K   16 Unified         3  512                      64

NG nightly:

/flash$ lscpu --caches
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
L1d       32K     128K    4 Data            1  128                      64
L1i       32K     128K    4 Instruction     1  128                      64
L2        64K     256K    4 Unified         2  256                      64
L3       512K     512K   16 Unified         3  512                      64

Currently testing…