S922X Ugoos AM6B Device Tree - Performance/Efficiency - Testing Needed

MasterKeyxda · 25 September 2024 05:53

Short answer: it depends. We'd have to experiment to form an opinion and to see whether it is worth it. I have no inclination either way.

Longer answer:
In theory it is a single cluster, so processes are never moved between clusters. The scheduler is free to place a process on any core.

However, exposing the cache information (size, line size, number of sets, etc.) to userspace could benefit programs that use __builtin_prefetch, provided their programmers wrote them to take advantage of it.
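As a rough illustration of what "exposed to userspace" means: on Linux the cache geometry ends up under /sys/devices/system/cpu/cpuN/cache/indexM/ (files such as size, coherency_line_size, number_of_sets and ways_of_associativity). A program could read the line size from there and tune its prefetch distance or blocking to it. This is a minimal sketch, assuming those sysfs files are present; whether the kernel fills them in depends on what the DTB describes.

```python
import glob
import os

# Base path of the kernel's per-CPU directory in sysfs.
SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def parse_size(text):
    """Convert a sysfs size string such as '32K' or '1M' into bytes."""
    text = text.strip()
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


def _read_attr(index_dir, name):
    """Return a stripped attribute value, or None if the file is absent or empty."""
    try:
        with open(os.path.join(index_dir, name)) as f:
            return f.read().strip() or None
    except OSError:
        return None


def read_cache_info(cpu=0, root=SYSFS_CPU_ROOT):
    """List the caches the kernel exposes for one CPU, one dict per cache.

    Fields the kernel could not fill in (for example because the device
    tree lacks the matching property) come back as None.
    """
    pattern = os.path.join(root, f"cpu{cpu}", "cache", "index*")
    caches = []
    for index_dir in sorted(glob.glob(pattern)):
        size = _read_attr(index_dir, "size")
        line = _read_attr(index_dir, "coherency_line_size")
        sets = _read_attr(index_dir, "number_of_sets")
        ways = _read_attr(index_dir, "ways_of_associativity")
        caches.append({
            "level": _read_attr(index_dir, "level"),
            "type": _read_attr(index_dir, "type"),
            "size_bytes": parse_size(size) if size else None,
            "line_size": int(line) if line else None,
            "sets": int(sets) if sets else None,
            "ways": int(ways) if ways else None,
        })
    return caches


if __name__ == "__main__":
    # Prints whatever the running kernel exposes for cpu0; may be empty.
    for cache in read_cache_info():
        print(cache)
```

Comparing this output under the default DTB and a DTB with cache nodes is a quick way to see whether the extra information actually reaches userspace.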

The kernel would detect the separate L1 data and L1 instruction caches by itself, and we would then define a unified L2 that holds both data and instructions. The kernel would detect the size as it does for the S922X. I have not done enough research on the S905X4 to say what it could do. Architecturally the A55 is similar to the A53, with an in-order execution pipeline. Basically, lots of ifs and buts.
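For reference, the device tree describes each cache with properties such as cache-size, cache-line-size and cache-sets (d-cache-* and i-cache-* for split L1 caches, cache-unified and cache-level for a unified L2), and the number of ways follows from size = sets × ways × line size. As far as I can tell this is the same relation the kernel's generic cacheinfo code uses to derive ways_of_associativity. The helper below is a sketch for sanity-checking values before putting them in a DTB; the example numbers are illustrative only, not the actual S922X or S905X4 cache sizes.

```python
def ways_of_associativity(cache_size, cache_sets, cache_line_size):
    """Derive the number of ways from device-tree style cache properties.

    Returns None when the ways cannot be derived: a missing/zero property,
    or a single set (fully associative cache).
    """
    if cache_size <= 0 or cache_sets <= 0 or cache_line_size <= 0:
        return None
    if cache_sets == 1:
        return None
    return cache_size // cache_sets // cache_line_size


def is_consistent(cache_size, cache_sets, cache_line_size):
    """Check that size == sets * ways * line size with a whole number of ways."""
    ways = ways_of_associativity(cache_size, cache_sets, cache_line_size)
    return ways is not None and cache_sets * ways * cache_line_size == cache_size


if __name__ == "__main__":
    # Illustrative values only, not measured on any particular SoC.
    print(ways_of_associativity(32 * 1024, 128, 64))   # 32 KiB L1d -> 4 ways
    print(ways_of_associativity(512 * 1024, 512, 64))  # 512 KiB L2 -> 16 ways
    print(is_consistent(32 * 1024, 100, 64))           # not a whole number of ways -> False
```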

To make the most of an A55 core, I'd propose using a Latency-criticality Aware Virtual Deadline (LAVD) scheduler. With it we could use memory-latency data, which benefits “small” programs. Once programs get large, that benefit goes away and the CPU is saturated.
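To make "virtual deadline" a bit more concrete, here is a toy model. It is not the real LAVD algorithm (scx_lavd in the sched_ext project), just the general idea: each runnable task gets a deadline of its virtual runtime plus its time slice, and the scheduler always runs the earliest deadline first. The latency_factor parameter is a hypothetical stand-in for whatever latency signal such a scheduler measures; a larger value shortens the effective slice and pulls the deadline earlier. Task names and numbers are made up.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    # Only the deadline takes part in comparisons, so the heap orders by it.
    deadline: float
    name: str = field(compare=False)


def virtual_deadline(vruntime, slice_ns, latency_factor=1.0):
    """Toy rule: deadline = vruntime + slice / latency_factor.

    latency_factor is hypothetical: >1 marks a latency-critical task,
    which shortens its effective slice and moves its deadline earlier.
    """
    return vruntime + slice_ns / latency_factor


def run_order(tasks):
    """Return task names in earliest-virtual-deadline-first order."""
    heap = [Task(virtual_deadline(v, s, f), name) for name, v, s, f in tasks]
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]


if __name__ == "__main__":
    tasks = [
        # (name, vruntime, slice, latency_factor), made-up numbers
        ("encoder", 1000, 3000, 1.0),  # deadline 4000
        ("ui", 1200, 3000, 4.0),       # deadline 1950
        ("logger", 900, 6000, 1.0),    # deadline 6900
    ]
    print(run_order(tasks))  # ['ui', 'encoder', 'logger']
```

This only shows the ordering rule; it says nothing about whether such a scheduler would actually help on an A55.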

Edit:
@frodo19 there may be a way to find out whether cache information speeds up other boxes. This is mostly an exercise in inference. If someone is willing to try the -A53-Cache DTB and report back any speed-up compared to the CE default, we would have data and could then work out how to make the A55-based S905X4 behave the same way.

Edit 2:

Dead end. Do not experiment.

Unfortunately, I have become so used to the full DTB file that my opinion would be biased.