S922X Ugoos AM6B Device Tree - Performance/Efficiency - Testing Needed

I think some of the compiled dtb files you uploaded may not include the changes you made to the dts.

g12b_w400_a in g12b_s922x_odroid_n2.dtb didn’t seem to have any changes when I tried it. That may explain why others saw no difference when they tried some of the other dtb files.

Using the following shell script:

#!/bin/sh

(
    for scaling_min_freq in $(find /sys/devices/system/cpu -name scaling_min_freq); do
        echo 667000 | tee "${scaling_min_freq}" > /dev/null
    done

    for scaling_governor in $(find /sys/devices/system/cpu -name scaling_governor); do
        echo 'ondemand' | tee "${scaling_governor}" > /dev/null
    done

    # Default: 0 (= ignore (0) or include (1) I/O activity in CPU activity calculations)
    for io_is_busy in $(find /sys/devices/system/cpu -name io_is_busy); do
        echo 1 | tee "${io_is_busy}" > /dev/null
    done

    # Default: 95 (= percentage CPU usage required to increase frequency to maximum allowed for policy)
    for up_threshold in $(find /sys/devices/system/cpu -name up_threshold); do
        echo 100 | tee "${up_threshold}" > /dev/null
    done

    # Default: 50000 (= time in μs of intervals where CPU usage is evaluated and frequency adjusted)
    for sampling_rate in $(find /sys/devices/system/cpu -name sampling_rate); do
        # echo 100000 | tee "${sampling_rate}" > /dev/null
        echo 50000 | tee "${sampling_rate}" > /dev/null
    done

    # Default: 1 (= temporary factor (1 to 100) applied to sampling_rate after CPU frequency is at maximum, effectively keeps CPU frequency at max for longer)
    for sampling_down_factor in $(find /sys/devices/system/cpu -name sampling_down_factor); do
        # echo 50 | tee "${sampling_down_factor}" > /dev/null
        echo 1 | tee "${sampling_down_factor}" > /dev/null
    done
)
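To confirm the writes actually took effect, the values can be read back from sysfs. This is only a minimal sketch: `show_tunables` is a hypothetical helper name, and it assumes the same file names the script above searches for.

```shell
#!/bin/sh

# Print every cpufreq setting the script above writes, one "path = value" per line.
# The first argument is the sysfs CPU directory; it defaults to the live system.
show_tunables() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for name in scaling_min_freq scaling_governor io_is_busy \
                up_threshold sampling_rate sampling_down_factor; do
        find "$cpu_root" -name "$name" | sort | while read -r path; do
            printf '%s = %s\n' "$path" "$(cat "$path")"
        done
    done
}
```

Calling `show_tunables` with no argument reads the live tree. A tunable that does not appear in the output was not found by `find`, which usually means the governor that provides it is not active.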

With those settings, typical CPU frequency usage on my system looks something like this:

 == 2x A53 cores (ondemand, 1896 MHz) ===
|  500 MHz |   0d:00h:00m:00s |   0.00 % |
|  667 MHz |   0d:16h:17m:36s |  52.61 % |===================================================
| 1000 MHz |   0d:05h:34m:35s |  18.01 % |=================
| 1200 MHz |   0d:02h:27m:02s |   7.91 % |======
| 1398 MHz |   0d:03h:50m:01s |  12.38 % |===========
| 1512 MHz |   0d:01h:44m:29s |   5.62 % |====
| 1608 MHz |   0d:00h:25m:36s |   1.38 % |
| 1704 MHz |   0d:00h:09m:50s |   0.53 % |
| 1800 MHz |   0d:00h:06m:48s |   0.37 % |
| 1896 MHz |   0d:00h:22m:07s |   1.19 % |
 == 4x A73 cores (ondemand, 1000 MHz) ===
|  500 MHz |   0d:00h:00m:00s |   0.00 % |
|  667 MHz |   1d:03h:37m:16s |  89.19 % |========================================================================================
| 1000 MHz |   0d:00h:15m:39s |   0.84 % |
| 1200 MHz |   0d:00h:10m:35s |   0.57 % |
| 1398 MHz |   0d:00h:09m:10s |   0.49 % |
| 1512 MHz |   0d:00h:09m:30s |   0.51 % |
| 1608 MHz |   0d:00h:15m:24s |   0.83 % |
| 1704 MHz |   0d:00h:49m:04s |   2.64 % |=
| 1800 MHz |   0d:01h:31m:27s |   4.92 % |===
 ========================================
| Total    |   1d:06h:58m:10s | 30.60 °C |
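For anyone wanting comparable numbers, the percentages can be derived from the kernel's `time_in_state` statistics. The sketch below is not the tool that produced the table above. `residency` is a hypothetical helper, and it assumes the kernel exposes cpufreq stats (a `stats/time_in_state` file per policy).

```shell
#!/bin/sh

# Print the share of time spent at each frequency for one cpufreq policy.
# Argument: path to a time_in_state file, for example
#   /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state
# Each input line is "<frequency in kHz> <time spent at that frequency>".
residency() {
    awk '
        { freq[NR] = $1; ticks[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++) {
                pct = (total > 0) ? 100 * ticks[i] / total : 0
                printf "%4d MHz  %6.2f %%\n", freq[i] / 1000, pct
            }
        }
    ' "$1"
}
```

Because only the ratio of each entry to the total matters, the result does not depend on the time unit the kernel uses in that file.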

Used the following shell script to make and apply the device tree changes on the fly:

#!/bin/sh

(
    /bin/mount -o rw,remount /flash

    fdtput -p /flash/dtb.img /cpus/l2-cache0 compatible cache -t s
    fdtput -p /flash/dtb.img /cpus/l2-cache0 cache-level 2 -t i
    fdtput -p /flash/dtb.img /cpus/l2-cache0 cache-unified
    fdtput -p /flash/dtb.img /cpus/l2-cache0 cache-size 0x40000 -t x
    fdtput -p /flash/dtb.img /cpus/l2-cache0 cache-line-size 64 -t i
    fdtput -p /flash/dtb.img /cpus/l2-cache0 cache-sets 512 -t i
    fdtput -p /flash/dtb.img /cpus/l2-cache0 phandle 14 -t i

    fdtput -p /flash/dtb.img /cpus/l2-cache1 compatible cache -t s
    fdtput -p /flash/dtb.img /cpus/l2-cache1 cache-level 2 -t i
    fdtput -p /flash/dtb.img /cpus/l2-cache1 cache-unified
    fdtput -p /flash/dtb.img /cpus/l2-cache1 cache-size 0x100000 -t x
    fdtput -p /flash/dtb.img /cpus/l2-cache1 cache-line-size 64 -t i
    fdtput -p /flash/dtb.img /cpus/l2-cache1 cache-sets 1024 -t i
    fdtput -p /flash/dtb.img /cpus/l2-cache1 phandle 15 -t i

    fdtput -p /flash/dtb.img /cpus/cpu@0 capacity-dmips-mhz 491 -t i
    fdtput -p /flash/dtb.img /cpus/cpu@0 next-level-cache 14 -t i

    fdtput -p /flash/dtb.img /cpus/cpu@1 capacity-dmips-mhz 491 -t i
    fdtput -p /flash/dtb.img /cpus/cpu@1 next-level-cache 14 -t i

    fdtput -p /flash/dtb.img /cpus/cpu@100 capacity-dmips-mhz 1024 -t i
    fdtput -p /flash/dtb.img /cpus/cpu@100 next-level-cache 15 -t i

    fdtput -p /flash/dtb.img /cpus/cpu@101 capacity-dmips-mhz 1024 -t i
    fdtput -p /flash/dtb.img /cpus/cpu@101 next-level-cache 15 -t i

    fdtput -p /flash/dtb.img /cpus/cpu@102 capacity-dmips-mhz 1024 -t i
    fdtput -p /flash/dtb.img /cpus/cpu@102 next-level-cache 15 -t i

    fdtput -p /flash/dtb.img /cpus/cpu@103 capacity-dmips-mhz 1024 -t i
    fdtput -p /flash/dtb.img /cpus/cpu@103 next-level-cache 15 -t i


    /bin/sync
    /bin/mount -o ro,remount /flash

    # read -p is not POSIX, so print the prompt separately to work under any /bin/sh
    printf 'Restart now? [Y/N]: '
    read -r KEYINPUT
    if [ "$KEYINPUT" != "${KEYINPUT#[Yy]}" ]; then
        /sbin/reboot
    fi
)
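After the reboot it is worth checking that the kernel actually booted with the edited tree. The sketch below reads `capacity-dmips-mhz` back from `/proc/device-tree`, where each property is stored as raw big-endian bytes. `read_cell` and `show_capacity` are hypothetical helper names, and the sketch assumes the property is a single 32-bit cell.

```shell
#!/bin/sh

# Decode a single-cell (32-bit big-endian) device tree property to decimal.
# Argument: path to the property file under /proc/device-tree.
read_cell() {
    hex=$(od -An -tx1 "$1" | tr -d ' \n')
    echo "$((0x$hex))"
}

# Print capacity-dmips-mhz for every CPU node below the given directory
# (defaults to the live tree the kernel booted with).
show_capacity() {
    cpus_dir="${1:-/proc/device-tree/cpus}"
    for prop in "$cpus_dir"/cpu@*/capacity-dmips-mhz; do
        [ -f "$prop" ] || continue
        node=${prop%/capacity-dmips-mhz}
        printf '%s %s\n' "${node##*/}" "$(read_cell "$prop")"
    done
}
```

If `show_capacity` prints nothing, the running tree has no `capacity-dmips-mhz` properties, which would suggest the modified dtb.img was not the one loaded.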

I initially tried it using the parameters identified by @YadaYada.
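As a quick sanity check on the cache parameters, the associativity they imply can be worked out from size, line size and set count. This is just arithmetic on the values in the script above. `implied_ways` is a hypothetical helper, and nothing here says what the hardware actually has.

```shell
#!/bin/sh

# Implied associativity (ways) = cache-size / (cache-line-size * cache-sets).
# Arguments: size in bytes (hex or decimal), line size in bytes, number of sets.
implied_ways() {
    echo "$(( $1 / ($2 * $3) ))"
}

implied_ways 0x40000 64 512     # l2-cache0 values from the script above
implied_ways 0x100000 64 1024   # l2-cache1 values from the script above
```

This prints 8 for l2-cache0 and 16 for l2-cache1. It may be worth comparing those figures against the technical reference manuals for the two core types before settling on the cache-sets values.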

Profiling some Python plugins indicates a fairly consistent speed improvement of ~30%.

I will see how CPU usage over time is impacted, and also what effect your original values have.
