31 Commits

Author SHA1 Message Date
Michael Goderbauer
486f26fbf9
Rename trace-whitelist to trace-allowlist (#19047) 2020-06-15 20:52:43 -07:00
Jason Simmons
e8c13aa012
Disable event tracing templates in release mode (#18855) 2020-06-09 10:58:05 -07:00
Kaushik Iska
ede658e2d1
[profiling] CPU Profiling support for iOS (#18087)
See flutter.dev/go/engine-cpu-profiling for details
2020-05-07 08:11:07 -07:00
Kaushik Iska
abc72933e7
[pipeline] Add trace event for lag between target and display times (#17384)
This change also adds TimeRecorder which records time at the start
of each frame to capture the latest vsync target display time and
wires it in to the rasterizer to add trace events when there is a lag.
2020-04-02 17:15:45 -07:00
gaaclarke
fddb0c272e
Made it so you can whitelist what events you want to listen to (#17108) 2020-03-16 11:00:03 -07:00
Chris Bracken
37a283765c
Migrate Fuchsia runners to SDK tracing API (#10478)
Migrates the Fuchsia Flutter and Dart runners off the internal tracing
APIs and onto the public SDK.
2019-08-06 11:26:34 -07:00
Nathan Rogers
e81aa5869c
Use Fuchsia trace macros when targeting Fuchsia SDK (#10634)
When |OS_FUCHSIA| is defined (even when |FUCHSIA_SDK| is defined as
well), use the Fuchsia SDK trace macros rather than the Dart timeline.

Reasons for doing this include:

Fuchsia's trace macros support categories.  This allows one to
distinguish between (e.g.) "flutter" and "skia" trace events for trace
recording and trace visualization.

Fuchsia has existing in tree benchmarks that depend on finding certain
events under category "flutter".

See the Fuchsia performance mailing list discussion for more context.
2019-08-05 12:52:36 -07:00
Chinmay Garde
ad582b5089
Rework image & texture management to use concurrent message queues. (#9486)
This patch reworks image decompression and collection in the following ways
because of misbehavior in the described edge cases.

The current flow for realizing a texture on the GPU from a blob of compressed
bytes is to first pass it to the IO thread for image decompression and then
upload to the GPU. The handle to the texture on the GPU is then passed back to
the UI thread so that it can be included in subsequent layer trees for
rendering. The GPU contexts on the Render & IO threads are in the same
sharegroup so the texture ends up being visible to the Render Thread context
during rendering. This works fine and does not block the UI thread. All
references to the image are owned on the UI thread by Dart objects. When the final
reference to the image is dropped, the texture cannot be collected on the UI
thread (because it has no GPU context). Instead, it must be passed to either
the GPU or IO threads. The GPU thread is usually in the middle of a frame
workload so we redirect the same to the IO thread for eventual collection. While
texture collections are usually (comparatively) fast, texture decompression and
upload are slow (order of magnitude of frame intervals).

For applications that end up creating (but not necessarily using) numerous large
textures in straight-line execution, it could be the case that texture
collection tasks are pending on the IO task runner after all the image
decompressions (and upload) are done. Put simply, the collection of the first
image could be waiting for the decompression and upload of the last image in the
queue.
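
The head-of-line blocking described above can be illustrated with a minimal sketch. It assumes a single FIFO queue drained by one thread; `Task`, `StraightLineQueue`, and `FirstCollectionFinishMs` are hypothetical names for illustration, not engine APIs, and the costs are made-up numbers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a task on a single-threaded FIFO task runner.
struct Task {
  bool is_collection;  // true for a texture collection, false for a decode+upload
  int cost_ms;         // assumed time the task occupies the thread
};

// Builds the queue an application would produce by creating `images` textures
// in straight-line execution and then dropping all of them: every decode is
// enqueued before any collection.
std::vector<Task> StraightLineQueue(int images, int decode_ms, int collect_ms) {
  assert(images >= 0);
  std::vector<Task> queue;
  for (int i = 0; i < images; ++i) queue.push_back({false, decode_ms});
  for (int i = 0; i < images; ++i) queue.push_back({true, collect_ms});
  return queue;
}

// Returns the time at which the first collection finishes when the queue is
// drained in order on one thread, or -1 if the queue holds no collection.
int FirstCollectionFinishMs(const std::vector<Task>& queue) {
  int now_ms = 0;
  for (const Task& task : queue) {
    now_ms += task.cost_ms;
    if (task.is_collection) return now_ms;
  }
  return -1;
}
```

With 10 images, a 16 ms decode, and a 1 ms collection, `FirstCollectionFinishMs(StraightLineQueue(10, 16, 1))` returns 161: the cheap collection of the first image cannot run until all ten decodes have finished.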

This is exacerbated by two other hacks added to work around unrelated issues.
* First, creating a codec with a single image frame immediately kicks off
  decompression and upload of that frame image (even if the frame was never
  requested from the codec). This hack was added because we wanted to get rid of
  the compressed image allocation ASAP. The expectation was codecs would only be
  created with the sole purpose of getting the decompressed image bytes.
  However, for applications that only create codecs to get image sizes (but
  never actually decompress the same), we would end up replacing the compressed
  image allocation with a larger allocation (device resident no less) for no
  obvious use. This issue is particularly insidious when you consider that the
  codec is usually asked for the native image size first before the frame is
  requested at a smaller size (usually using a new codec with same data but new
  targetsize). This would cause the creation of a whole extra texture (at 1:1)
  when the caller was trying to “optimize” for memory use by requesting a
  texture of a smaller size.
* Second, all image collections were delayed by the unref queue by 250ms
  because of observations that the calling thread (the UI thread) was being
  descheduled unnecessarily when a task with a timeout of zero was posted from
  the same (recall that a task has to be posted to the IO thread for the
  collection of that texture). 250ms is multiple frame intervals worth of
  potentially unnecessary textures.

The net result of these issues is that we may end up creating textures when all
that the application needs is to ask its codec for details about the same (but
not necessarily access its bytes). Texture collection could also be delayed
behind other jobs to decompress the textures on the IO thread. Also, all texture
collections are delayed for an arbitrary amount of time.

These issues cause applications to be susceptible to OOM situations. These
situations manifest in various ways. Host memory exhaustion causes the usual OOM
issues. Device memory exhaustion seems to manifest in different ways on iOS and
Android. On Android, allocation of a new texture seems to be causing an
assertion (in the driver). On iOS, the call hangs (presumably waiting for
another thread to release textures which we won’t do because those tasks are
blocked behind the current task completing).

To address peak memory usage, the following changes have been made:
* Image decompression and upload/collection no longer happen on the same thread.
  All image decompression will now be handled on a workqueue. The number of
  worker threads in this workqueue is equal to the number of processors on the
  device. These threads have a lower priority than either the UI or Render
  threads. These workers are shared between all Flutter applications in the
  process.
* Both the images and their codec now report the correct allocation size to Dart
  for GC purposes. The Dart VM uses this to pick objects for collection. Earlier
  the image allocation was assumed to be 32bpp with no mipmapping overhead
  reported. Now, the correct image size is reported and the mipmapping overhead
  is accounted for. Image codec sizes were not reported to the VM earlier and
  now are. Expect “External” VM allocations to be higher than previously
  reported and the numbers in Observatory to line up more closely with actual
  memory usage (device and host).
* Decoding images to a specific size used to decode to 1:1 before performing a
  resize to the correct dimensions before texture upload. This has now been
  reworked so that images are first decompressed to a smaller size supported
  natively by the codec before final resizing to the requested target size. The
  intermediate copy is now smaller and more promptly collected. Resizing also
  happens on the workqueue worker.
* The drain interval of the unref queue is now sub-frame-interval. I am hesitant
  to remove the delay entirely because I have not been able to instrument the
  performance overhead of the same. That is next on my list. But now, multiple
  frame intervals worth of textures no longer stick around.
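
To make the allocation-size point in the list above concrete, here is a minimal sketch of how a full mip chain adds to the size of the base image. It assumes each mip level halves both dimensions (rounding down, with a minimum of 1) until a 1x1 level is reached; `BaseLevelBytes` and `MipChainBytes` are hypothetical helpers, and the patch itself may compute the overhead differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed for the base (1:1) level only.
std::size_t BaseLevelBytes(std::size_t width, std::size_t height,
                           std::size_t bytes_per_pixel) {
  return width * height * bytes_per_pixel;
}

// Hypothetical helper: bytes needed for the base level plus every smaller
// mip level, assuming each level halves both dimensions down to 1x1.
std::size_t MipChainBytes(std::size_t width, std::size_t height,
                          std::size_t bytes_per_pixel) {
  assert(width > 0 && height > 0);
  std::size_t total = 0;
  while (true) {
    total += width * height * bytes_per_pixel;
    if (width == 1 && height == 1) break;
    width = std::max<std::size_t>(1, width / 2);
    height = std::max<std::size_t>(1, height / 2);
  }
  return total;
}
```

For a 4x4 image at 4 bytes per pixel, the base level is 64 bytes and the full chain is 84 bytes, so reporting only the base level would understate the allocation by roughly a third.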

The following issues have been addressed:
* https://github.com/flutter/flutter/issues/34070 Since this was the first usage
  of the concurrent message loops, the number of idle wakes was determined to
  be too high and this component has been rewritten to be simpler and not use
  the existing task runner and MessageLoopImpl interface.
* Image decoding had no tests. The new `ui_unittests` harness has been added
  that sets up a GPU test harness on the host using SwiftShader. Tests have been
  added for image decompression, upload and resizing.
* The device memory exhaustion in this benchmark has been addressed. That
  benchmark is still not viable for inclusion in any harness however because it
  creates 9 million codecs in straight-line execution. Because these codecs are
  destroyed in the microtask callbacks, these are referenced till those
  callbacks are executed. So now, instead of device memory exhaustion, this will
  lead to (slower) exhaustion of host memory. This is expected and working as
  intended.

This patch only addresses peak memory use and makes collection of unused images
and textures more prompt. It does NOT address memory use by images referenced
strongly by the application or framework.
2019-07-09 14:59:34 -07:00
Zachary Anderson
0a2e28d797
Revert tracing changes (#9296)
* Revert "[fuchsia] Fix alignment of Fuchsia/non-Fuchsia tracing (#9289)"

This reverts commit f80ac5f571479053b134e60bca77603269b2ce2a.

* Revert "Align fuchsia and non-fuchsia tracing (#9199)"

This reverts commit 78265484623037c6544dfd5380367bca29fa27b0.
2019-06-12 10:25:49 -07:00
Dan Field
f80ac5f571
[fuchsia] Fix alignment of Fuchsia/non-Fuchsia tracing (#9289) 2019-06-11 18:02:15 -07:00
Dan Field
7826548462
Align fuchsia and non-fuchsia tracing (#9199) 2019-06-05 15:14:27 -07:00
Chinmay Garde
f6e6d39860
Wire up Fuchsia SDK related updated for shell dependencies. #8869
This does not actually import the runners into the engine. It only sets up the targets so that no modifications are necessary when the migration is done. The engine has been verified to build in both buildroots.
2019-05-06 18:01:59 -07:00
Zachary Anderson
ce9ea58694
[fuchsia] Disable FML_TRACE_COUNTER events to unblock roll (#8325) 2019-03-27 10:36:34 -07:00
Zachary Anderson
42d062f95e
[fuchsia] Add missing trace macros (#8297) 2019-03-25 16:03:41 -07:00
Chinmay Garde
ad5b722a72
Simplify the fallback waiter and add traces for vsync scheduling overhead. (#8185) 2019-03-18 15:49:16 -07:00
Chinmay Garde
4b01d795fe
Add frame and target time metadata to vsync events and connect platform vsync events using flows. (#8172)
This will allow us to easily visualize the time the platform informed the engine of a vsync event, its arguments, and when the engine began its UI thread workload using this information.
2019-03-14 16:48:01 -07:00
Jason Simmons
403337ebb8
Do not pass short-lived buffers as labels to Dart_TimelineEvent (#8166)
Dart no longer makes a copy of the label string when recording events.

See https://github.com/flutter/engine/pull/8152
2019-03-14 12:49:42 -07:00
Chinmay Garde
906d684a77
Reland ""Add support for trace counters with variable arguments and instrument the raster cache." (#8145)
This reverts commit bc901324faf5a24f9220cc7ecbcf5b97b96ae09f and fixes the
issue discovered on Windows builds.
2019-03-13 13:53:22 -07:00
Chinmay Garde
bc901324fa
Revert "Add support for trace counters with variable arguments and instrument the raster cache. (#8094)" (#8122)
This reverts commit 2a0d3542851ae59c2d2f490d1111eeb57b0da388.
2019-03-11 15:09:24 -07:00
Chinmay Garde
2a0d354285
Add support for trace counters with variable arguments and instrument the raster cache. (#8094) 2019-03-11 14:44:43 -07:00
Michael Goderbauer
70a1106b50
Unify copyright lines (#6757) 2018-11-07 12:24:35 -08:00
Chinmay Garde
ef98dcb11f
Add support for counter timeline traces from the engine. (#6315) 2018-09-26 13:26:23 -07:00
Chinmay Garde
57fd394e59
Ensure that objects on stack that close traces have unique variable names. (#6298) 2018-09-22 14:46:29 -07:00
Chinmay Garde
33b412313e
Fix sundry Fuchsia build issues after the tonic/fxl migration. (#5920) 2018-08-01 13:29:45 -07:00
Chinmay Garde
336c23f846
Remove //flutter/glue and use FML directly. (#5862) 2018-07-25 13:20:48 -07:00
Chinmay Garde
4691a0b23e
Import intrusively ref counted shared pointers into FML. (#5062) 2018-04-21 20:50:03 -07:00
Ben Konyi
4d718568b2
Updated fml to build on Windows. (#4415) 2017-12-08 10:40:10 -08:00
George Kulakowski
3aa7522c11
Rename ftl to fxl in Fuchsia specific code (#4090) 2017-09-11 15:58:48 -07:00
Chinmay Garde
61c4898a9b
Add support for flow traces in fml/trace_event. (#3903) 2017-07-18 19:00:29 -07:00
Chinmay Garde
1c07ea530f
Remove uses of //base from all //flutter projects and replace them with //fml variants. (#3492) 2017-03-22 15:42:51 -07:00
Chinmay Garde
26a6615dad
Implementations of platform agnostic portions of FML. (#3487) 2017-03-20 13:41:41 -07:00