Texture unregistration completes on the GPU thread. The FlutterTexture implementation might not know when it has finished, which leads to a race condition. This adds a callback so that the FlutterTexture is made aware of the end of the unregistration process.
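A minimal sketch of the callback idea, under stated assumptions: `Texture`, `CountingTexture`, `TextureRegistry`, `OnTextureUnregistered` and `DrainGpuTasks` are hypothetical names for illustration, and the GPU thread is modelled as a plain task queue rather than a real thread.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical texture interface; names are illustrative, not the engine's API.
class Texture {
 public:
  virtual ~Texture() = default;
  // Invoked from the GPU task once the registry has dropped the texture.
  virtual void OnTextureUnregistered() = 0;
};

// Example implementation that only counts notifications.
class CountingTexture final : public Texture {
 public:
  void OnTextureUnregistered() override { ++unregistered_count; }
  int unregistered_count = 0;
};

class TextureRegistry {
 public:
  void Register(int64_t id, std::shared_ptr<Texture> texture) {
    assert(texture != nullptr);
    textures_[id] = std::move(texture);
  }

  // Called by the embedder. The removal itself is deferred to the GPU queue.
  void Unregister(int64_t id) {
    gpu_tasks_.push_back([this, id]() {
      auto found = textures_.find(id);
      if (found == textures_.end()) {
        return;
      }
      std::shared_ptr<Texture> texture = std::move(found->second);
      textures_.erase(found);
      // The texture learns that unregistration has actually finished.
      texture->OnTextureUnregistered();
    });
  }

  // Stand-in for the GPU thread draining its task queue.
  void DrainGpuTasks() {
    while (!gpu_tasks_.empty()) {
      std::function<void()> task = std::move(gpu_tasks_.front());
      gpu_tasks_.pop_front();
      task();
    }
  }

 private:
  std::unordered_map<int64_t, std::shared_ptr<Texture>> textures_;
  std::deque<std::function<void()>> gpu_tasks_;
};
```

In this model, a texture that releases its resources inside `OnTextureUnregistered` rather than right after calling `Unregister` avoids the race described above.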
This reverts commit c2879cae2ee3707ad07af1118bf4862dc1d82bb7.
Additionally, we fix https://github.com/flutter/flutter/issues/40863 by adding a secondary VSYNC callback.
Unit tests are updated to provide VSYNC mocking and check the fix of https://github.com/flutter/flutter/issues/40863.
The root cause of https://github.com/flutter/flutter/issues/40863 is the false assumption that each input event must trigger a new frame. That was true in the framework PR https://github.com/flutter/flutter/pull/36616 because the input events there are all scrolling move events. Once the PR was ported to the engine, we could no longer distinguish between different types of events, and tap events may not trigger a new frame.
Therefore, this PR directly hooks into the `VsyncWaiter` and uses its (newly added) secondary callback to dispatch the pending input event.
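The following sketch illustrates why a secondary callback helps. It is a simplified model, not the engine's `VsyncWaiter`: the class `SimpleVsyncWaiter` and its methods are hypothetical, and vsync is simulated by calling `FireVsync()` by hand.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, simplified waiter; not the engine's actual VsyncWaiter API.
class SimpleVsyncWaiter {
 public:
  using Callback = std::function<void()>;

  // Primary callback: begins a frame. Only set when a frame was requested.
  void AsyncWaitForVsync(Callback callback) { primary_ = std::move(callback); }

  // Secondary callback: runs on the next vsync even if no frame was requested.
  void ScheduleSecondaryCallback(Callback callback) {
    secondary_ = std::move(callback);
  }

  // Simulates the vsync signal. Both callbacks are one-shot.
  void FireVsync() {
    Callback primary = std::exchange(primary_, nullptr);
    Callback secondary = std::exchange(secondary_, nullptr);
    if (primary) {
      primary();
    }
    if (secondary) {
      secondary();
    }
  }

 private:
  Callback primary_;
  Callback secondary_;
};
```

A pending tap event registered through `ScheduleSecondaryCallback` is dispatched on the next vsync even though nothing requested a frame, which is the case the original assumption missed.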
This reverts commit fcc4ab32301396986dd5103d6d444bff35fe0f63.
Fixes https://github.com/flutter/flutter/issues/41394 and other
related correctness issues.
TBR: @arbreng @jason-simmons @mehmetf
On Fuchsia, add a build flag for compositing OpacityLayers using the system
compositor vs Skia, which exposes a fastpath for opacity via Scenic.
This will only work under certain circumstances, in particular nested
OpacityLayers will not render correctly!
On Fuchsia, add a build flag for compositing PhysicalShapeLayers using
the system compositor vs Skia. Set to off by default, which restores
performant shadows on Fuchsia.
Remove the opacity exposed from ChildView, as that was added mistakenly.
Finally, we centralize the logic for switching between the
system-composited and in-process-composited paths inside of
ContainerLayer. We also centralize the logic for computing elevation
there. This allows the removal of many OS_FUCHSIA-specific code-paths.
Test: Ran workstation on Fuchsia; benchmarked before and after
Bug: 23711
Bug: 24163
* Fix broken tests
The earlier design speculated that embedders could effect the same
transformations on the layers post engine compositor presentation but before
final composition.
However, the linked issue points out that this design is not suitable for use
with hardware overlay planes. When rendering to such planes, to effect the
transformation before composition, embedders would have to render to an
off-screen render target and then apply the transformation before presentation.
This patch negates the need for that off-screen render pass.
To be clear, the previous architecture is still fully viable. Embedders still
have full control over layer transformations before composition. This is an
optimization for the hardware overlay planes use-case.
Fixes b/139758641
Additionally, we now use the engine directly as a delegate instead of storing a potentially dead runtime_controller.
Unit tests have been updated to include an engine restart check which would fail before the fix.
This fixes https://github.com/flutter/flutter/issues/40303
This change sets up a "spying canvas" to try and detect empty canvases.
When using platform views with a custom embedder, if a platform view
overlay canvas is known to be empty we skip creating a compositor layer
for that overlay.
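A rough sketch of the spying idea, assuming a tiny hypothetical `Canvas` interface (the engine draws through Skia, whose canvas has far more entry points). The spy forwards every call and records whether any of them actually drew something.

```cpp
#include <cassert>

// Hypothetical minimal canvas interface, for illustration only.
class Canvas {
 public:
  virtual ~Canvas() = default;
  virtual void DrawRect(float left, float top, float right, float bottom) = 0;
  virtual void Translate(float dx, float dy) = 0;
};

// A target that discards everything; stands in for the real overlay canvas.
class NoOpCanvas final : public Canvas {
 public:
  void DrawRect(float, float, float, float) override {}
  void Translate(float, float) override {}
};

// Forwards calls to the wrapped canvas and remembers whether anything drew.
class SpyingCanvas final : public Canvas {
 public:
  explicit SpyingCanvas(Canvas& target) : target_(target) {}

  void DrawRect(float left, float top, float right, float bottom) override {
    did_draw_ = true;
    target_.DrawRect(left, top, right, bottom);
  }

  // State changes alone do not make the canvas non-empty.
  void Translate(float dx, float dy) override { target_.Translate(dx, dy); }

  bool DidDraw() const { return did_draw_; }

 private:
  Canvas& target_;
  bool did_draw_ = false;
};
```

An embedder-side compositor could then skip creating a layer for an overlay whose spy reports `DidDraw() == false`.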
The engine's activity_running flag tracks whether the app is in the paused or
running lifecycle state. The engine had been defaulting activity_running to
false (meaning paused). But the animator had been defaulting its paused flag
to false, which allowed frames to render at startup. If the engine loses and
regains its surface, then frames would stop rendering because activity_running
is false (even though frames had been rendering when the engine initially
acquired its surface).
This change puts the engine and the animator into a consistent state at
startup. Frames will continue to render until the embedder sends a lifecycle
message that will pause both the engine and the animator.
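The consistent startup state can be modelled as below. This is only a sketch: `Engine`, `Animator`, `LifecycleState` and `CanRenderFrames` are simplified, hypothetical stand-ins for the real classes.

```cpp
#include <cassert>

enum class LifecycleState { kResumed, kPaused };

// Hypothetical stand-in for the animator.
struct Animator {
  bool paused = false;  // Frames may render at startup.
};

// Hypothetical stand-in for the engine.
class Engine {
 public:
  // A lifecycle message from the embedder updates both flags together.
  void HandleLifecycleMessage(LifecycleState state) {
    activity_running_ = (state == LifecycleState::kResumed);
    animator_.paused = !activity_running_;
  }

  void OnSurfaceCreated() { have_surface_ = true; }
  void OnSurfaceDestroyed() { have_surface_ = false; }

  bool CanRenderFrames() const {
    return have_surface_ && activity_running_ && !animator_.paused;
  }

 private:
  // Defaults to running so that it agrees with Animator::paused == false.
  bool activity_running_ = true;
  bool have_surface_ = false;
  Animator animator_;
};
```

With both defaults aligned, losing and regaining the surface before any lifecycle message arrives no longer stops rendering.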
See https://github.com/flutter/flutter/issues/32624
The dynamic linker on some older versions of Android on x86 fails when doing
dlsym(RTLD_DEFAULT) lookups of symbols exported by the engine library itself.
The engine needs to do this for some data files that are linked into the engine
library (ICU data and Dart snapshot blobs).
To work around this, the engine will declare static symbols for these data
objects on the affected platforms.
Fixes https://github.com/flutter/flutter/issues/20091
This patch allows embedders to split the Flutter layer tree into multiple
chunks. These chunks are meant to be composed one on top of another. This gives
embedders a chance to interleave their own contents between these chunks.
The Flutter embedder API already provides hooks for the specification of
textures for the Flutter engine to compose within its own hierarchy (for camera
feeds, video, etc.). However, not all embedders can render the contents of such
sources into textures the Flutter engine can accept. Moreover, this composition
model may have overheads that are non-trivial for certain use cases. In such
cases, the embedder may choose to specify multiple render targets for Flutter to
render into instead of just one.
The use of this API allows embedders to perform composition very similar to the
iOS embedder. This composition model is used on that platform for the embedding
of UIKit views such as web views and map views within the Flutter hierarchy.
However, do note that iOS also has threading configurations that are currently
not available to custom embedders.
The embedder API updates in this patch are ABI stable and existing embedders
will continue to work as normal. For embedders that want to enable this
composition mode, the API is designed to make it easy to opt into the same in an
incremental manner.
Rendering of contents into the “root” rendering surface remains unchanged.
However, now the application can push “platform views” via a scene builder.
These platform views need to be handled by a FlutterCompositor specified in a new
field at the end of the FlutterProjectArgs struct.
When a new platform view is introduced within the layer tree, the compositor
will ask the embedder to create a new render target for that platform view.
Render targets can currently be OpenGL framebuffers, OpenGL textures or software
buffers. The type of the render target returned by the embedder must be
compatible with the root render surface. That is, if the root render surface is
an OpenGL framebuffer, the render target for each platform view must either be a
texture or a framebuffer in the same OpenGL context. New render target types as
well as root renderers for newer APIs like Metal & Vulkan can and will be added
in the future. The addition of these APIs will be done in an ABI & API stable
manner.
As Flutter renders frames, it gives the embedder a callback with information
about the position of the various platform views in the effective hierarchy.
The embedder is then meant to put the contents of the render targets that it
set up and had previously given to the engine onto the screen (of course
interleaving the contents of the platform views).
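The presentation step can be pictured with the sketch below. The `Layer` and `LayerContent` types and the `PresentLayers` function are hypothetical simplifications and not the embedder API structs; the function only records the order in which an embedder would put content on screen.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified layer description handed to the embedder.
enum class LayerContent { kBackingStore, kPlatformView };

struct Layer {
  LayerContent content;
  int64_t id;  // Render target id or platform view id.
};

// Walks the layers bottom to top and returns the resulting draw order.
// A real embedder would blit render targets and position native views here.
std::vector<std::string> PresentLayers(const std::vector<Layer>& layers) {
  std::vector<std::string> draw_order;
  for (const Layer& layer : layers) {
    if (layer.content == LayerContent::kBackingStore) {
      draw_order.push_back("flutter:" + std::to_string(layer.id));
    } else {
      draw_order.push_back("platform_view:" + std::to_string(layer.id));
    }
  }
  return draw_order;
}
```

For a layer tree split around one platform view, the embedder sees a Flutter-rendered chunk, then the platform view, then a second Flutter-rendered chunk, and composes them in that order.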
Unit-tests have been added that test not only the structure and properties of
layer hierarchy given to the compositor, but also the contents of the texels
rendered by a test compositor using both the OpenGL and software rendering
backends.
Fixes b/132812775
Fixes flutter/flutter#35410
After pre-roll we know if there have been any mutations made to the iOS embedded UIViews. If there are any mutations and the thread configuration is such that the mutations will be committed on an illegal thread (GPU thread), we merge the threads and keep them merged until the lease expires. The lease is currently set to expire after 10 frames of no mutations. If there are any mutations in the interim we extend the lease.
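The lease bookkeeping can be sketched as a simple counter. `ThreadMergeLease`, `OnFrame` and `kLeaseFrames` are hypothetical names; only the 10-frame term comes from the description above, and the actual merging of task runners is left out.

```cpp
#include <cassert>

// Lease term from the description: 10 frames without mutations.
constexpr int kLeaseFrames = 10;

// Hypothetical helper that tracks whether the threads should stay merged.
class ThreadMergeLease {
 public:
  // Call once per frame after pre-roll.
  void OnFrame(bool had_mutations) {
    if (had_mutations) {
      // Merge the threads, or extend the lease if already merged.
      remaining_frames_ = kLeaseFrames;
      return;
    }
    if (remaining_frames_ > 0) {
      --remaining_frames_;
    }
  }

  bool IsMerged() const { return remaining_frames_ > 0; }

 private:
  int remaining_frames_ = 0;
};
```

In this model a mutation on any frame resets the count, so the threads un-merge only after ten consecutive frames without mutations.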
TaskRunnerMerger will ultimately be responsible for enforcing the correct thread configurations.
This configuration will be inactive even after this change since we still use the same thread when we create the iOS engine. That is slated to change in the coming PRs.
When |OS_FUCHSIA| is defined (even when |FUCHSIA_SDK| is defined as
well), use the Fuchsia SDK trace macros rather than the Dart timeline.
Reasons for doing this include:
Fuchsia's trace macros support categories. This allows one to
distinguish between (e.g.) "flutter" and "skia" trace events for trace
recording and trace visualization.
Fuchsia has existing in tree benchmarks that depend on finding certain
events under category "flutter".
See the Fuchsia performance mailing list discussion for more context.
The rasterizer may only be accessed safely on the GPU task runner. The test was instead accessing it on a task runner unknown to the engine (i.e. the test's main task runner).
Crashes previously reproducible on all platforms with the following filters: `--gtest_filter="*ShellTest.SetResourceCacheSize*" --gtest_repeat=-1 --gtest_shuffle --gtest_random_seed=1988` at run ~400.
Fixes https://github.com/flutter/flutter/issues/37629
Debug builds log invalid file errors on launch of anything using the
embedding API due to an unconditional use of assets_dir, even though
only one of assets_dir or assets_path needs to be set (and the embedding
API currently uses the latter). This checks that the FD has been set
before trying to use it to create an asset resolver.
Also eliminates a duplicate code path in embedder.cc, where it was
calling RunConfiguration::InferFromSettings, then running exactly the
same asset manager creation code again locally.
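The guard described above amounts to something like the following sketch. The `Settings` struct here is a cut-down, hypothetical version with only the two fields mentioned, and resolvers are represented as strings so that the example stays self-contained.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of the settings; -1 means "no file descriptor set".
struct Settings {
  int assets_dir = -1;
  std::string assets_path;
};

// Returns a description of each resolver that would be created.
std::vector<std::string> CreateAssetResolvers(const Settings& settings) {
  std::vector<std::string> resolvers;
  // Only use the directory FD if the embedder actually provided one.
  if (settings.assets_dir >= 0) {
    resolvers.push_back("fd:" + std::to_string(settings.assets_dir));
  }
  if (!settings.assets_path.empty()) {
    resolvers.push_back("path:" + settings.assets_path);
  }
  return resolvers;
}
```

When only `assets_path` is set, no resolver is built from the unset descriptor, so the invalid file error is not logged.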
libapp.so contains compiled application Dart code. On most Android systems,
this library can be loaded by calling dlopen("libapp.so"), which will search
Android's default library directories.
On some Android devices this does not work as expected. As a workaround, this
patch provides a fallback path to libapp.so based on ApplicationInfo.nativeLibraryDir.
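A sketch of the fallback logic, with the loader injected so the example runs anywhere. `LoadFirstAvailable`, `Loader` and `LibraryHandle` are hypothetical names; on a device the loader would presumably wrap `dlopen`, and the second candidate would be built from the native library directory.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using LibraryHandle = void*;
// Returns a non-null handle on success. A real loader might call dlopen().
using Loader = std::function<LibraryHandle(const std::string&)>;

// Tries each candidate path in order and returns the first handle that loads.
LibraryHandle LoadFirstAvailable(const std::vector<std::string>& candidates,
                                 const Loader& loader,
                                 std::string* loaded_path) {
  assert(loader);
  for (const std::string& candidate : candidates) {
    if (LibraryHandle handle = loader(candidate)) {
      if (loaded_path != nullptr) {
        *loaded_path = candidate;
      }
      return handle;
    }
  }
  return nullptr;
}
```

Passing the bare name first and the absolute path second keeps the default behaviour on devices where the bare name already resolves.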
Fixes https://github.com/flutter/flutter/issues/35838
This patch reworks image decompression and collection in the following ways
because of misbehavior in the described edge cases.
The current flow for realizing a texture on the GPU from a blob of compressed
bytes is to first pass it to the IO thread for image decompression and then
upload to the GPU. The handle to the texture on the GPU is then passed back to
the UI thread so that it can be included in subsequent layer trees for
rendering. The GPU contexts on the Render & IO threads are in the same
sharegroup so the texture ends up being visible to the Render Thread context
during rendering. This works fine and does not block the UI thread. All
references to the image are owned on UI thread by Dart objects. When the final
reference to the image is dropped, the texture cannot be collected on the UI
thread (because it has no GPU context). Instead, it must be passed to either
the GPU or IO threads. The GPU thread is usually in the middle of a frame
workload so we redirect the same to the IO thread for eventual collection. While
texture collections are usually (comparatively) fast, texture decompression and
upload are slow (order of magnitude of frame intervals).
For applications that end up creating (but not necessarily using) numerous large
textures in straight-line execution, it could be the case that texture
collection tasks are pending on the IO task runner after all the image
decompressions (and upload) are done. Put simply, the collection of the first
image could be waiting for the decompression and upload of the last image in the
queue.
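The queueing effect can be made concrete with a toy model of a single FIFO task runner. The task names and costs are made up for illustration; `Task` and `CompletionTimeMs` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Task {
  std::string name;
  int cost_ms;
};

// Time at which the named task finishes when tasks run strictly in FIFO order
// on one runner. Returns -1 if the task is not in the queue.
int CompletionTimeMs(const std::vector<Task>& fifo, const std::string& name) {
  int elapsed_ms = 0;
  for (const Task& task : fifo) {
    elapsed_ms += task.cost_ms;
    if (task.name == name) {
      return elapsed_ms;
    }
  }
  return -1;
}
```

With three 30 ms decode-and-upload tasks queued ahead of a 1 ms collection, the collection finishes only after 91 ms in this model, even though it could have freed memory almost immediately on an idle runner.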
This is exacerbated by two other hacks added to work around unrelated issues.
* First, creating a codec with a single image frame immediately kicks off
decompression and upload of that frame image (even if the frame was never
requested from the codec). This hack was added because we wanted to get rid of
the compressed image allocation ASAP. The expectation was codecs would only be
created with the sole purpose of getting the decompressed image bytes.
However, for applications that only create codecs to get image sizes (but
never actually decompress the same), we would end up replacing the compressed
image allocation with a larger allocation (device resident no less) for no
obvious use. This issue is particularly insidious when you consider that the
codec is usually asked for the native image size first before the frame is
requested at a smaller size (usually using a new codec with same data but new
targetsize). This would cause the creation of a whole extra texture (at 1:1)
when the caller was trying to “optimize” for memory use by requesting a
texture of a smaller size.
* Second, all image collections were delayed by the unref queue by 250ms
because of observations that the calling thread (the UI thread) was being
descheduled unnecessarily when a task with a timeout of zero was posted from
the same (recall that a task has to be posted to the IO thread for the
collection of that texture). 250ms is multiple frame intervals worth of
potentially unnecessary textures.
The net result of these issues is that we may end up creating textures when all
that the application needs is to ask its codec for details about the same (but
not necessarily access its bytes). Texture collection could also be delayed
behind other jobs to decompress the textures on the IO thread. Also, all texture
collections are delayed for an arbitrary amount of time.
These issues cause applications to be susceptible to OOM situations. These
situations manifest in various ways. Host memory exhaustion causes the usual OOM
issues. Device memory exhaustion seems to manifest in different ways on iOS and
Android. On Android, allocation of a new texture seems to be causing an
assertion (in the driver). On iOS, the call hangs (presumably waiting for
another thread to release textures which we won’t do because those tasks are
blocked behind the current task completing).
To address peak memory usage, the following changes have been made:
* Image decompression and upload/collection no longer happen on the same thread.
All image decompression will now be handled on a workqueue. The number of
worker threads in this workqueue is equal to the number of processors on the
device. These threads have a lower priority than either the UI or Render
threads. These workers are shared between all Flutter applications in the
process.
* Both the images and their codec now report the correct allocation size to Dart
for GC purposes. The Dart VM uses this to pick objects for collection. Earlier
the image allocation was assumed to be 32bpp with no mipmapping overhead
reported. Now, the correct image size is reported and the mipmapping overhead
is accounted for. Image codec sizes were not reported to the VM earlier and
now are. Expect “External” VM allocations to be higher than previously
reported and the numbers in Observatory to line up more closely with actual
memory usage (device and host).
* Decoding images to a specific size used to decode to 1:1 before performing a
resize to the correct dimensions before texture upload. This has now been
reworked so that images are first decompressed to a smaller size supported
natively by the codec before final resizing to the requested target size. The
intermediate copy is now smaller and more promptly collected. Resizing also
happens on the workqueue worker.
* The drain interval of the unref queue is now sub-frame-interval. I am hesitant
to remove the delay entirely because I have not been able to instrument the
performance overhead of the same. That is next on my list. But now, multiple
frame intervals worth of textures no longer stick around.
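Two of the changes above lend themselves to a small worked sketch. Both functions are hypothetical: the first sums a full mip chain to estimate an allocation size, and the second assumes a codec that can natively downscale by 1/2, 1/4 or 1/8 (an assumption made only for this example) and picks the largest reduction that still covers the requested target size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Bytes for a width x height image, optionally including every mip level
// down to 1x1.
std::size_t EstimateAllocationBytes(std::size_t width,
                                    std::size_t height,
                                    std::size_t bytes_per_pixel,
                                    bool mipmapped) {
  assert(width > 0 && height > 0);
  std::size_t total = width * height * bytes_per_pixel;
  if (!mipmapped) {
    return total;
  }
  while (width > 1 || height > 1) {
    width = std::max<std::size_t>(1, width / 2);
    height = std::max<std::size_t>(1, height / 2);
    total += width * height * bytes_per_pixel;
  }
  return total;
}

// Picks the largest assumed native downscale denominator (8, 4, 2 or 1) whose
// output is still at least as large as the target in both dimensions.
int PickNativeDecodeDenominator(int source_width,
                                int source_height,
                                int target_width,
                                int target_height) {
  assert(target_width > 0 && target_height > 0);
  const int denominators[] = {8, 4, 2};
  for (int denominator : denominators) {
    if (source_width / denominator >= target_width &&
        source_height / denominator >= target_height) {
      return denominator;
    }
  }
  return 1;
}
```

The mip chain adds roughly one third on top of the base level, which is why reporting only the base size understates memory use. Decoding at a reduced native scale first keeps the intermediate copy small before the final resize to the target dimensions.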
The following issues have been addressed:
* https://github.com/flutter/flutter/issues/34070 Since this was the first usage
of the concurrent message loops, the number of idle wakes was determined to
be too high and this component has been rewritten to be simpler and not use
the existing task runner and MessageLoopImpl interface.
* Image decoding had no tests. The new `ui_unittests` harness has been added
that sets up a GPU test harness on the host using SwiftShader. Tests have been
added for image decompression, upload and resizing.
* The device memory exhaustion in this benchmark has been addressed. That
benchmark is still not viable for inclusion in any harness however because it
creates 9 million codecs in straight-line execution. Because these codecs are
destroyed in the microtask callbacks, these are referenced till those
callbacks are executed. So now, instead of device memory exhaustion, this will
lead to (slower) exhaustion of host memory. This is expected and working as
intended.
This patch only addresses peak memory use and makes collection of unused images
and textures more prompt. It does NOT address memory use by images referenced
strongly by the application or framework.