S905X4 Device Tree - Performance/Efficiency - Testing Needed

I installed the System Tools add-on from the CoreELEC repo and ran some tests with the stress-ng tool.

Not patched, original dtb.

CoreELEC (official): 21.1.1-Omega_nightly_20241003 (Amlogic-ng.arm)
      Machine model: Amlogic
     CoreELEC dt-id: sc2_s905x4_kinhank_g1
      Linux version: 4.9.269 (docker@894e25749f63) #1 Thu Oct 3 04:52:34 IDT 2024
      Kodi compiled: 2024-10-03 04:15:47 +0200

CoreELECg1:~ # stress-ng --skip-silent --cache-enable-all --class cpu-cache --all 1 -t 100
stress-ng: info:  [5912] setting to a 1 min, 40 secs run per stressor
stress-ng: info:  [5912] far-branch: architecture not supported
stress-ng: info:  [5912] flushcache: architecture not supported
stress-ng: info:  [5912] icache: architecture not supported
stress-ng: info:  [5912] dispatching hogs: 1 bitonicsort, 1 bsearch, 1 cache, 1 cacheline, 1 dekker, 1 heapsort, 1 hsearch, 1 insertionsort, 1 l1cache, 1 list, 1 llc-affinity, 1 lockbus, 1 lsearch, 1 malloc, 1 matrix, 1 matrix-3d, 1 membarrier, 1 memcpy, 1 mergesort, 1 misaligned, 1 peterson, 1 prefetch, 1 qsort, 1 radixsort, 1 shellsort, 1 skiplist, 1 sparsematrix, 1 spinmem, 1 str, 1 stream, 1 tree, 1 tsearch, 1 vecfp, 1 vecmath, 1 vecshuf, 1 vecwide, 1 wcs, 1 zlib
stress-ng: info:  [5915] cache: cache flags used: prefetch fence
stress-ng: info:  [5915] cache: unavailable unused cache flags: flush sfence clflushopt cldemote clwb
stress-ng: info:  [5916] cacheline: using built-in defaults as no suitable cache found
stress-ng: info:  [5916] cacheline: to fully exercise a 64 byte cache line, 32 instances are required
stress-ng: info:  [5918] heapsort: using method 'heapsort-nonlibc'
stress-ng: info:  [5935] qsort: using method 'qsort-libc'
stress-ng: info:  [5936] radixsort: using method 'radixsort-nonlibc'
stress-ng: info:  [5931] mergesort: using method 'mergesort-nonlibc'
stress-ng: info:  [5934] prefetch: using built-in defaults as no suitable cache found
stress-ng: info:  [5942] stream: using built-in defaults as no suitable cache found
stress-ng: info:  [5942] stream: stressor loosely based on a variant of the STREAM benchmark code
stress-ng: info:  [5942] stream: do NOT submit any of these results to the STREAM benchmark results
stress-ng: info:  [5942] stream: Using cache size of 2048K
stress-ng: info:  [5939] sparsematrix: 10000 items in 500 x 500 sparse matrix (4.00% full)
stress-ng: info:  [5934] prefetch: using a 4096 KB L3 cache with prefetch method 'builtin'
stress-ng: info:  [5932] misaligned: exercised all int16rd int16wr int16inc int32rd int32wr int32inc
stress-ng: info:  [5942] stream: memory rate: 259.05 MB read/sec, 172.70 MB write/sec, 22.64 double precision Mflop/sec (instance 0)
stress-ng: info:  [5912] skipped: 7: far-branch (1) flushcache (1) icache (1) judy (1) l1cache (1) llc-affinity (1) wcs (1)
stress-ng: info:  [5912] passed: 35: bitonicsort (1) bsearch (1) cache (1) cacheline (1) dekker (1) heapsort (1) hsearch (1) insertionsort (1) list (1) lockbus (1) lsearch (1) malloc (1) matrix (1) matrix-3d (1) membarrier (1) memcpy (1) mergesort (1) misaligned (1) peterson (1) prefetch (1) qsort (1) radixsort (1) shellsort (1) skiplist (1) sparsematrix (1) spinmem (1) str (1) stream (1) tree (1) tsearch (1) vecfp (1) vecmath (1) vecshuf (1) vecwide (1) zlib (1)
stress-ng: info:  [5912] failed: 0
stress-ng: info:  [5912] metrics untrustworthy: 0
stress-ng: info:  [5912] successful run completed in 1 min, 40.58 secs

CoreELECg1:~ # stress-ng --skip-silent --cache-enable-all --class memory --all 1 -t 100
stress-ng: info:  [8670] setting to a 1 min, 40 secs run per stressor
stress-ng: info:  [8670] dispatching hogs: 1 atomic, 1 bad-altstack, 1 bitonicsort, 1 bsearch, 1 context, 1 full, 1 heapsort, 1 hsearch, 1 insertionsort, 1 list, 1 lockbus, 1 lsearch, 1 malloc, 1 matrix, 1 matrix-3d, 1 mcontend, 1 membarrier, 1 memcpy, 1 memfd, 1 memrate, 1 memthrash, 1 mergesort, 1 mincore, 1 misaligned, 1 null, 1 pipe, 1 pipeherd, 1 prefetch, 1 qsort, 1 radixsort, 1 randlist, 1 remap, 1 resources, 1 rmap, 1 shellsort, 1 skiplist, 1 sparsematrix, 1 spinmem, 1 stack, 1 stackmmap, 1 str, 1 stream, 1 tlb-shootdown, 1 tmpfs, 1 tree, 1 tsearch, 1 vm, 1 vm-addr, 1 vm-rw, 1 vm-segv, 1 wcs, 1 zero, 1 zlib
stress-ng: info:  [8678] heapsort: using method 'heapsort-nonlibc'
stress-ng: info:  [8700] mergesort: using method 'mergesort-nonlibc'
stress-ng: info:  [8699] memthrash: no NUMA nodes or maximum NUMA nodes, ignoring numa memthrash method
stress-ng: info:  [8699] memthrash: starting 4 threads on each of the 1 stressors on a 4 CPU system
stress-ng: info:  [8698] memrate: using buffer size of 262144K, cache flushing disabled
stress-ng: info:  [8698] memrate: cache flushing can be enabled with --memrate-flush option
stress-ng: info:  [8706] prefetch: using built-in defaults as no suitable cache found
stress-ng: info:  [8703] null: exercising /dev/null with writes, lseek, ioctl, fcntl, fallocate, fdatasync and mmap; for just write benchmarking use --null-write
stress-ng: info:  [8707] qsort: using method 'qsort-libc'
stress-ng: info:  [8708] radixsort: using method 'radixsort-nonlibc'
stress-ng: info:  [8706] prefetch: using a 4096 KB L3 cache with prefetch method 'builtin'
stress-ng: info:  [8731] sparsematrix: 10000 items in 500 x 500 sparse matrix (4.00% full)
stress-ng: info:  [8736] stream: using built-in defaults as no suitable cache found
stress-ng: info:  [8736] stream: stressor loosely based on a variant of the STREAM benchmark code
stress-ng: info:  [8736] stream: do NOT submit any of these results to the STREAM benchmark results
stress-ng: info:  [8736] stream: Using cache size of 2048K
stress-ng: info:  [8746] zero: exercising /dev/zero with reads, mmap, lseek, and ioctl; for just read benchmarking use --zero-read
stress-ng: info:  [8702] misaligned: exercised all int16rd int16wr int16inc int32rd int32wr int32inc
stress-ng: info:  [8736] stream: memory rate: 41.85 MB read/sec, 27.90 MB write/sec, 3.66 double precision Mflop/sec (instance 0)
stress-ng: info:  [8670] skipped: 4: judy (1) numa (1) oom-pipe (1) wcs (1)
stress-ng: info:  [8670] passed: 52: atomic (1) bad-altstack (1) bitonicsort (1) bsearch (1) context (1) full (1) heapsort (1) hsearch (1) insertionsort (1) list (1) lockbus (1) lsearch (1) malloc (1) matrix (1) matrix-3d (1) mcontend (1) membarrier (1) memcpy (1) memfd (1) memrate (1) memthrash (1) mergesort (1) mincore (1) misaligned (1) null (1) pipe (1) pipeherd (1) prefetch (1) qsort (1) radixsort (1) randlist (1) remap (1) resources (1) rmap (1) shellsort (1) skiplist (1) sparsematrix (1) spinmem (1) stack (1) stackmmap (1) str (1) stream (1) tlb-shootdown (1) tmpfs (1) tree (1) tsearch (1) vm (1) vm-addr (1) vm-rw (1) vm-segv (1) zero (1) zlib (1)
stress-ng: info:  [8670] failed: 0
stress-ng: info:  [8670] metrics untrustworthy: 0
stress-ng: info:  [8670] successful run completed in 1 min, 41.53 secs

Patched dtb.

CoreELEC (official): 21.1.1-Omega_nightly_20241003 (Amlogic-ng.arm)
      Machine model: Amlogic
     CoreELEC dt-id: sc2_s905x4_kinhank_g1
      Linux version: 4.9.269 (docker@894e25749f63) #1 Thu Oct 3 04:52:34 IDT 2024
      Kodi compiled: 2024-10-03 04:15:47 +0200

CoreELECg1:~ # stress-ng --skip-silent --cache-enable-all --class cpu-cache --all 1 -t 100
stress-ng: info:  [5430] setting to a 1 min, 40 secs run per stressor
stress-ng: info:  [5430] far-branch: architecture not supported
stress-ng: info:  [5430] flushcache: architecture not supported
stress-ng: info:  [5430] icache: architecture not supported
stress-ng: info:  [5430] dispatching hogs: 1 bitonicsort, 1 bsearch, 1 cache, 1 cacheline, 1 dekker, 1 heapsort, 1 hsearch, 1 insertionsort, 1 l1cache, 1 list, 1 llc-affinity, 1 lockbus, 1 lsearch, 1 malloc, 1 matrix, 1 matrix-3d, 1 membarrier, 1 memcpy, 1 mergesort, 1 misaligned, 1 peterson, 1 prefetch, 1 qsort, 1 radixsort, 1 shellsort, 1 skiplist, 1 sparsematrix, 1 spinmem, 1 str, 1 stream, 1 tree, 1 tsearch, 1 vecfp, 1 vecmath, 1 vecshuf, 1 vecwide, 1 wcs, 1 zlib
stress-ng: info:  [5433] cache: cache flags used: prefetch fence
stress-ng: info:  [5433] cache: unavailable unused cache flags: flush sfence clflushopt cldemote clwb
stress-ng: info:  [5436] heapsort: using method 'heapsort-nonlibc'
stress-ng: info:  [5434] cacheline: to fully exercise a 64 byte cache line, 32 instances are required
stress-ng: info:  [5439] l1cache: l1cache: size: 32.0K, sets: 128, ways: 4, line size: 64 bytes
stress-ng: info:  [5457] sparsematrix: 10000 items in 500 x 500 sparse matrix (4.00% full)
stress-ng: info:  [5449] mergesort: using method 'mergesort-nonlibc'
stress-ng: info:  [5453] qsort: using method 'qsort-libc'
stress-ng: info:  [5460] stream: stressor loosely based on a variant of the STREAM benchmark code
stress-ng: info:  [5460] stream: do NOT submit any of these results to the STREAM benchmark results
stress-ng: info:  [5460] stream: Using cache size of 512K
stress-ng: info:  [5454] radixsort: using method 'radixsort-nonlibc'
stress-ng: info:  [5441] llc-affinity: using LLC cache size of 512K
stress-ng: info:  [5452] prefetch: using a 512 KB L3 cache with prefetch method 'builtin'
stress-ng: info:  [5460] stream: memory rate: 230.35 MB read/sec, 153.57 MB write/sec, 20.13 double precision Mflop/sec (instance 0)
stress-ng: info:  [5450] misaligned: exercised all int16rd int16wr int16inc int32rd int32wr int32inc
stress-ng: info:  [5430] skipped: 5: far-branch (1) flushcache (1) icache (1) judy (1) wcs (1)
stress-ng: info:  [5430] passed: 37: bitonicsort (1) bsearch (1) cache (1) cacheline (1) dekker (1) heapsort (1) hsearch (1) insertionsort (1) l1cache (1) list (1) llc-affinity (1) lockbus (1) lsearch (1) malloc (1) matrix (1) matrix-3d (1) membarrier (1) memcpy (1) mergesort (1) misaligned (1) peterson (1) prefetch (1) qsort (1) radixsort (1) shellsort (1) skiplist (1) sparsematrix (1) spinmem (1) str (1) stream (1) tree (1) tsearch (1) vecfp (1) vecmath (1) vecshuf (1) vecwide (1) zlib (1)
stress-ng: info:  [5430] failed: 0
stress-ng: info:  [5430] metrics untrustworthy: 0
stress-ng: info:  [5430] successful run completed in 1 min, 41.61 secs

CoreELECg1:~ # stress-ng --skip-silent --cache-enable-all --class memory --all 1 -t 100
stress-ng: info:  [7384] setting to a 1 min, 40 secs run per stressor
stress-ng: info:  [7384] dispatching hogs: 1 atomic, 1 bad-altstack, 1 bitonicsort, 1 bsearch, 1 context, 1 full, 1 heapsort, 1 hsearch, 1 insertionsort, 1 list, 1 lockbus, 1 lsearch, 1 malloc, 1 matrix, 1 matrix-3d, 1 mcontend, 1 membarrier, 1 memcpy, 1 memfd, 1 memrate, 1 memthrash, 1 mergesort, 1 mincore, 1 misaligned, 1 null, 1 pipe, 1 pipeherd, 1 prefetch, 1 qsort, 1 radixsort, 1 randlist, 1 remap, 1 resources, 1 rmap, 1 shellsort, 1 skiplist, 1 sparsematrix, 1 spinmem, 1 stack, 1 stackmmap, 1 str, 1 stream, 1 tlb-shootdown, 1 tmpfs, 1 tree, 1 tsearch, 1 vm, 1 vm-addr, 1 vm-rw, 1 vm-segv, 1 wcs, 1 zero, 1 zlib
stress-ng: info:  [7391] heapsort: using method 'heapsort-nonlibc'
stress-ng: info:  [7413] null: exercising /dev/null with writes, lseek, ioctl, fcntl, fallocate, fdatasync and mmap; for just write benchmarking use --null-write
stress-ng: info:  [7409] memthrash: no NUMA nodes or maximum NUMA nodes, ignoring numa memthrash method
stress-ng: info:  [7409] memthrash: starting 4 threads on each of the 1 stressors on a 4 CPU system
stress-ng: info:  [7417] qsort: using method 'qsort-libc'
stress-ng: info:  [7425] sparsematrix: 10000 items in 500 x 500 sparse matrix (4.00% full)
stress-ng: info:  [7416] prefetch: using a 512 KB L3 cache with prefetch method 'builtin'
stress-ng: info:  [7410] mergesort: using method 'mergesort-nonlibc'
stress-ng: info:  [7418] radixsort: using method 'radixsort-nonlibc'
stress-ng: info:  [7430] stream: stressor loosely based on a variant of the STREAM benchmark code
stress-ng: info:  [7430] stream: do NOT submit any of these results to the STREAM benchmark results
stress-ng: info:  [7430] stream: Using cache size of 512K
stress-ng: info:  [7440] zero: exercising /dev/zero with reads, mmap, lseek, and ioctl; for just read benchmarking use --zero-read
stress-ng: info:  [7408] memrate: using buffer size of 262144K, cache flushing disabled
stress-ng: info:  [7408] memrate: cache flushing can be enabled with --memrate-flush option
stress-ng: info:  [7430] stream: memory rate: 47.53 MB read/sec, 31.69 MB write/sec, 4.15 double precision Mflop/sec (instance 0)
stress-ng: info:  [7412] misaligned: exercised all int16rd int16wr int16inc int32rd int32wr int32inc
stress-ng: info:  [7384] skipped: 4: judy (1) numa (1) oom-pipe (1) wcs (1)
stress-ng: info:  [7384] passed: 52: atomic (1) bad-altstack (1) bitonicsort (1) bsearch (1) context (1) full (1) heapsort (1) hsearch (1) insertionsort (1) list (1) lockbus (1) lsearch (1) malloc (1) matrix (1) matrix-3d (1) mcontend (1) membarrier (1) memcpy (1) memfd (1) memrate (1) memthrash (1) mergesort (1) mincore (1) misaligned (1) null (1) pipe (1) pipeherd (1) prefetch (1) qsort (1) radixsort (1) randlist (1) remap (1) resources (1) rmap (1) shellsort (1) skiplist (1) sparsematrix (1) spinmem (1) stack (1) stackmmap (1) str (1) stream (1) tlb-shootdown (1) tmpfs (1) tree (1) tsearch (1) vm (1) vm-addr (1) vm-rw (1) vm-segv (1) zero (1) zlib (1)
stress-ng: info:  [7384] failed: 0
stress-ng: info:  [7384] metrics untrustworthy: 0
stress-ng: info:  [7384] successful run completed in 1 min, 52.12 secs

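From the command output, the patched dtb appears to describe the CPU caches to the system: stress-ng no longer reports "no suitable cache found", it detects a 32K L1 cache and a 512K last-level cache, and the l1cache and llc-affinity stressors now run instead of being skipped. In this single pair of runs the stream stressor is about 11% slower in the cpu-cache class (259.05 vs 230.35 MB read/sec) and about 14% faster in the memory class (41.85 vs 47.53 MB read/sec). So the patch could go upstream if nobody sees drawbacks.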
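To make the comparison easier to repeat, here is a small Python sketch that pulls the "stream: memory rate" lines out of the logs above and prints the percentage change between the original and the patched dtb. The helper names (`stream_rates`, `percent_change`, `compare`) are made up for this post, and the script only looks at the stream stressor, so treat it as a rough check on one run per configuration and not as a proper benchmark.

```python
import re

# Matches the summary line printed by the stress-ng stream stressor.
STREAM_RE = re.compile(
    r"stream: memory rate: ([\d.]+) MB read/sec, ([\d.]+) MB write/sec"
)


def stream_rates(log_text):
    """Return (read, write) in MB/sec from the first stream summary line."""
    match = STREAM_RE.search(log_text)
    if match is None:
        raise ValueError("no 'stream: memory rate' line found")
    return float(match.group(1)), float(match.group(2))


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Stream summary lines copied from the four runs above: (original, patched).
RUNS = {
    "cpu-cache": (
        "stress-ng: info:  [5942] stream: memory rate: 259.05 MB read/sec, "
        "172.70 MB write/sec, 22.64 double precision Mflop/sec (instance 0)",
        "stress-ng: info:  [5460] stream: memory rate: 230.35 MB read/sec, "
        "153.57 MB write/sec, 20.13 double precision Mflop/sec (instance 0)",
    ),
    "memory": (
        "stress-ng: info:  [8736] stream: memory rate: 41.85 MB read/sec, "
        "27.90 MB write/sec, 3.66 double precision Mflop/sec (instance 0)",
        "stress-ng: info:  [7430] stream: memory rate: 47.53 MB read/sec, "
        "31.69 MB write/sec, 4.15 double precision Mflop/sec (instance 0)",
    ),
}


def compare(runs):
    """Map each stressor class to (read change %, write change %)."""
    results = {}
    for name, (original, patched) in runs.items():
        read0, write0 = stream_rates(original)
        read1, write1 = stream_rates(patched)
        results[name] = (
            percent_change(read0, read1),
            percent_change(write0, write1),
        )
    return results


if __name__ == "__main__":
    for name, (read_pct, write_pct) in compare(RUNS).items():
        print(f"{name}: read {read_pct:+.1f}%, write {write_pct:+.1f}%")
```

If someone reruns the tests a few times on both dtbs, replacing the strings in `RUNS` with the new stream lines would show whether the difference holds up or is just run-to-run noise.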