ImageFetch offsets for 2D array coordinates have a different composite size than the coordinates. The rescaling pass was not taking this into account.
Fixes broken shaders when scaling is enabled in Astral Chain, and likely other titles.
Adds detection of additional CPU flags to cpu_detect and additions to telemetry output.
This is not exhaustive but guided by features that [dynarmic utilizes](bcfe377aaa/src/dynarmic/backend/x64/host_feature.h (L12-L33)) as well as features that are currently utilized but not reported to telemetry(invariant_tsc). This is intended to guide future optimizations.
AVX512 in particular is broken up into its individual subsets and some other processor features such as [sha](https://en.wikipedia.org/wiki/Intel_SHA_extensions) and [gfni](https://en.wikipedia.org/wiki/AVX-512#GFNI) are added to have some forward-facing data-points.
What used to be a single `CPU_Extension_x64_AVX512` telemetry field
is also broken up into individual `CPU_Extension_x64_AVX512{F,VL,CD,...}` fields.
Set the zero-enum value to Unknown
Move the Manufacterer enum into the CPUCaps structure namespace
Add "ParseManufacturer" utility-function
Fix cpu/brand string buffer sizes(!)
- Instead of randomization, choose in-order addresses for where to map NROs into memory.
- This results in predictable behavior when debugging and consistent behavior when reproducing issues.
Thanks to @asLody for optimizing this function. This raised the focus that this function should be optimized more.
The current table assumes that the host GPU is able to invert for free, so only AND,OR,XOR are accumulated in the performance metrik.
Performance results:
Instructions
0: 8
1: 30
2: 114
3: 80
4: 24
Latency
0: 8
1: 30
2: 194
3: 24
- This makes these functions more accurate to the real HOS implementations.
- Fixes memory access issues in Super Smash Bros. Ultimate that occur when un/mapping NROs.
When CreateRenderer fails, the GraphicsContext that was std::move'd into
it is destroyed before the Scoped that was created to manage its
currency. In that case, the GraphicsContext::Scoped will still call its
destructor at the ending of the function. And because the context is
destroyed, the Scoped will cause a crash as it attempts to call a
destroyed object's DoneCurrent function.
Since we know when the call would be invalid, call the Scoped's Cancel
method. This prevents it from calling a method on a destroyed object.
If a GraphicsContext is destroyed before its Scoped is destroyed, this
causes a crash as the Scoped tries to call a method in the destroyed
context on exit.
Add a way to Cancel the call when we know that calling the
GraphicsContext will not work.
When CreateGPU fails, yuzu would try and shutdown the GPU instance
regardless of whether any instance was actually created.
Check for nullptr before calling its methods to prevent a crash.
* gl_graphics_pipeline: Improve shader builder synchronization
Make use of GLsync objects to ensure better synchronization between shader builder threads and the main context
* gl_graphics_pipeline: Make built_fence access threadsafe
* gl_graphics_pipeline: Use GLsync objects only when building in parallel
* gl_graphics_pipeline: Replace GetSync calls with non-blocking waits
The spec states that a ClientWait on a Fence object ensures the changes propagate to the calling context
It is possible for virtual_offset to not be 0 when the iterator is at the beginning, and thus, std::prev(it) may be evaluated, leading to a crash in debug mode.
Co-Authored-By: Fernando S. <1731197+FernandoS27@users.noreply.github.com>
- Updates the KMemoryManager implementation against latest documentation.
- Reworks KMemoryLayout to be accessed throughout the kernel.
- Fixes an issue with pool sizes being incorrectly reported.
Was getting an unhandled `invalid_argument` [exception](https://en.cppreference.com/w/cpp/thread/thread/join) during
shutdown on my linux machine. This removes the need for a `StopBackendThread` function entirely since `jthread`
[automatically handles both checking if the thread is joinable and stopping the token before attempting to join](https://en.cppreference.com/w/cpp/thread/jthread/~jthread) in the case that `StartBackendThread` was never called.
Per the spec, bufSize is the number of integers that will be written, in this case, 1.
Also, the length argument is optional if the information of the number of elements written is not needed.
Inlines implementation of exclusive instructions into JITted code,
improving performance of applications relying heavily on these
instructions.
We also fastmem these instructions for additional speed, with
support for appropriate recompilation on fastmem failure.
An unsafe optimization to disable the intercore global_monitor is also
provided, should one wish to rely solely on cmpxchg semantics for
safety.
See also: merryhime/dynarmic#664
RDNA2 devices running under the RADV driver were crashing when VK_EXT_vertex_input_dynamic_state was enabled.
Blacklisting these devices until a proper fix is established.