Investigating Unity iOS executable bloat

We’ve been busy getting Resynth ready for submission, and part of that process has involved a lot of optimisation for both runtime performance and binary size. We made a lot of fixes to improve performance on low-end iOS devices, and crunched down textures and sounds where we could to reduce the installed app size. The last thing left to look at is the iOS executable binary size.

According to the Unity build logs, our data size breakdown looks like this:

Textures      26.1 mb    71.1%
Meshes        0.0 kb     0.0%
Animations    156.0 kb   0.4%
Sounds        1.6 mb     4.3%
Shaders       47.7 kb    0.1%
Other Assets  3.4 mb     9.2%
Levels        324.1 kb   0.9%
Scripts       1.1 mb     3.1%
Included DLLs 3.9 mb     10.7%
File headers  77.1 kb    0.2%
Complete size 36.8 mb    100.0%

We’ve got a bunch of textures that we want looking crisp on high resolution devices so there’s not much more we can do here. It’s pretty small compared to a lot of games anyway.

The iTunes Connect file size estimate for iPhone 6S is 61.3 MB. That’s the full install size after the IPA file is uncompressed on the device. If we subtract the total data size Unity gave us (36.8 MB), that leaves 24.5 MB for the code and any other miscellaneous files in the IPA like icons and launch screens.
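As a quick sanity check on that subtraction, here is a minimal Python sketch that parses the breakdown from the build log and works out what is left over. It assumes Unity's "mb" and "kb" are related by a factor of 1024, which the log does not state, and the names `parse_breakdown` and `install_estimate` are mine, not Unity's.

```python
import re

# Breakdown exactly as printed in the Unity build log above.
BUILD_LOG = """\
Textures      26.1 mb    71.1%
Meshes        0.0 kb     0.0%
Animations    156.0 kb   0.4%
Sounds        1.6 mb     4.3%
Shaders       47.7 kb    0.1%
Other Assets  3.4 mb     9.2%
Levels        324.1 kb   0.9%
Scripts       1.1 mb     3.1%
Included DLLs 3.9 mb     10.7%
File headers  77.1 kb    0.2%
Complete size 36.8 mb    100.0%
"""

# <category> <value> <kb|mb> <percentage>%
LINE_RE = re.compile(r"^(.+?)\s+([\d.]+) (kb|mb)\s+[\d.]+%$")


def parse_breakdown(log):
    """Return {category: size in MB}. Assumption: 1 mb = 1024 kb."""
    sizes = {}
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, value, unit = match.groups()
            sizes[name] = float(value) / (1024 if unit == "kb" else 1)
    return sizes


sizes = parse_breakdown(BUILD_LOG)
data_total = sizes.pop("Complete size")  # Unity's own total
install_estimate = 61.3                   # iTunes Connect estimate, in MB
remainder = install_estimate - data_total

print(f"data: {data_total:.1f} MB, everything else: {remainder:.1f} MB")
```

The individual categories only add up to roughly the reported total, because each line in the log is already rounded.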

Looking again at the Unity breakdown, we see that scripts seem to contribute only 1.1 MB. That’s a far cry from our estimate of 24.5 MB! Scripts are compiled to native code, of course, and the iTunes estimate covers the full binary, including the Unity engine code, as well as the launch screens and icons. Let’s see if we can get a more detailed breakdown, work out what’s going on, and maybe even reduce the iOS executable binary size.

Examining the IPA

To investigate this, let’s start with the build that we uploaded to iTunes Connect. The IPA file is really just a zip file with its contents arranged in a specific way. If we unzip the IPA and take a look in the Payload/resynth.app directory, we see this:

-rw-r--r--   1 buildbot  staff       2371 Jun  9 14:14 AppIcon57x57.png
                               ..etc
-rw-r--r--   1 buildbot  staff      13313 Jun  9 14:14 AppIcon83.5x83.5@2x~ipad.png
drwxr-xr-x  32 buildbot  staff       1088 Jun  9 14:14 Data
-rw-r--r--   1 buildbot  staff       2763 Jun  9 14:15 Info.plist
-rw-r--r--   1 buildbot  staff       4617 Jun  9 14:14 LaunchImage-568h@2x.png
                               ..etc
-rw-r--r--   1 buildbot  staff       3935 Jun  9 14:14 LaunchImage@2x.png
-rw-r--r--   1 buildbot  staff       2053 Jun  9 14:15 LaunchScreen-iPad.nib
-rw-r--r--   1 buildbot  staff     221960 Jun  9 14:14 LaunchScreen-iPad.png
-rw-r--r--   1 buildbot  staff       4357 Jun  9 14:15 LaunchScreen-iPhone.nib
-rw-r--r--   1 buildbot  staff     133646 Jun  9 14:14 LaunchScreen-iPhoneLandscape.png
-rw-r--r--   1 buildbot  staff     133646 Jun  9 14:14 LaunchScreen-iPhonePortrait.png
-rw-r--r--   1 buildbot  staff          8 Jun  9 14:15 PkgInfo
drwxr-xr-x   3 buildbot  staff        102 Jun  9 14:20 _CodeSignature
-rw-------   1 buildbot  staff        667 Jun  9 14:20 archived-expanded-entitlements.xcent
drwxr-xr-x   4 buildbot  staff        136 Jun  9 14:14 de.lproj
-rw-------   1 buildbot  staff       8247 Jun  9 14:21 embedded.mobileprovision
drwxr-xr-x   4 buildbot  staff        136 Jun  9 14:14 en.lproj
drwxr-xr-x   4 buildbot  staff        136 Jun  9 14:14 es.lproj
drwxr-xr-x   4 buildbot  staff        136 Jun  9 14:14 fr.lproj
drwxr-xr-x   4 buildbot  staff        136 Jun  9 14:14 it.lproj
drwxr-xr-x   4 buildbot  staff        136 Jun  9 14:14 ja.lproj
drwxr-xr-x   4 buildbot  staff        136 Jun  9 14:14 ko.lproj
-rwx------   1 buildbot  staff  271362480 Jun  9 14:21 resynth
drwxr-xr-x   4 buildbot  staff        136 Jun  9 14:14 zh.lproj

There are multiple variations of the app icon and launch screens, so I’ve removed those lines for brevity.

The executable binary itself is the file named resynth near the bottom of the listing, and it clocks in at 271 MB! That doesn’t seem right!

Well actually, there are two reasons this binary is so big. Firstly, I built a universal app, which means the binary includes two versions of the executable, one for armv7 (32-bit) CPUs and one for arm64 (64-bit) CPUs. Secondly, I enabled bitcode, which increases the binary size significantly. Bitcode is an intermediate representation of a program though; it’s not the final, machine-readable binary version. Apple’s servers recompile this bitcode into an armv7 or arm64 binary depending on the device of the user, so you don’t have to worry: your users aren’t going to get a massive binary file!

Non-bitcode builds

We can get more useful information from a non-bitcode build. This will give us a better approximation of the final binary size on a user’s device. Here is the binary from a universal build with bitcode disabled:

-rwx------   1 buildbot  staff  38247008 Jun 15 15:19 resynth

38 MB, much more reasonable! We can use Apple’s otool program to examine this in a bit more detail. At a shell prompt we can run:

$ otool -fv resynth

This will show us the headers for the binary:

Fat headers
fat_magic FAT_MAGIC
nfat_arch 2
architecture armv7
    cputype CPU_TYPE_ARM
    cpusubtype CPU_SUBTYPE_ARM_V7
    capabilities 0x0
    offset 16384
    size 17976384
    align 2^14 (16384)
architecture arm64
    cputype CPU_TYPE_ARM64
    cpusubtype CPU_SUBTYPE_ARM64_ALL
    capabilities 0x0
    offset 18006016
    size 20240992
    align 2^14 (16384)

Here we see that for the arm64 architecture, the size is around 20 MB. This is much closer to our iPhone 6S estimate of 24.5 MB. Note that the header gives us the full size, which includes code as well as constant, static, and global data in the binary.
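To make the numbers in the fat header easier to read, here is a small Python sketch that pulls the per-architecture sizes out of the otool output shown above. The function name `parse_fat_headers` is my own; the only thing it relies on is the "architecture / offset / size" layout visible in that output.

```python
# Output of `otool -fv resynth`, copied from above.
OTOOL_OUTPUT = """\
Fat headers
fat_magic FAT_MAGIC
nfat_arch 2
architecture armv7
    cputype CPU_TYPE_ARM
    cpusubtype CPU_SUBTYPE_ARM_V7
    capabilities 0x0
    offset 16384
    size 17976384
    align 2^14 (16384)
architecture arm64
    cputype CPU_TYPE_ARM64
    cpusubtype CPU_SUBTYPE_ARM64_ALL
    capabilities 0x0
    offset 18006016
    size 20240992
    align 2^14 (16384)
"""


def parse_fat_headers(text):
    """Return a list of (architecture, offset, size) tuples, sizes in bytes."""
    slices = []
    arch = offset = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "architecture":
            arch = fields[1]
        elif fields[0] == "offset":
            offset = int(fields[1])
        elif fields[0] == "size":
            # "size" is the last field we need for each architecture.
            slices.append((arch, offset, int(fields[1])))
    return slices


slices = parse_fat_headers(OTOOL_OUTPUT)
for arch, offset, size in slices:
    print(f"{arch}: {size / 1e6:.1f} MB at offset {offset}")

# The last slice should end exactly where the file does.
last_arch, last_offset, last_size = slices[-1]
print("end of last slice:", last_offset + last_size)
```

The end of the arm64 slice comes out at 38247008 bytes, which matches the file size in the directory listing, so the two slices plus alignment padding account for the whole universal binary.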

Remember that this size also includes the Unity engine itself. Let’s see if we can figure out what code is actually included in our binary, and if Unity is doing a good job of only including the code that we use.

Unity compilation refresher

Before we go on, let’s talk about how Unity generates your iOS binary.

When you build your project, Unity takes all your non-editor C# scripts and compiles them with the Mono compiler. This process generates several DLLs. Game scripts will end up in two DLLs called Assembly-CSharp.dll and Assembly-CSharp-firstpass.dll.

These DLLs are actually not in any machine-specific binary format; they are in an intermediate language (IL), which is a bytecode that is interpreted by the Mono (or .NET) runtime and turned into platform-specific binary code dynamically, while the program is running. This process is called just-in-time (JIT) compiling.

Unfortunately Apple doesn’t allow binary code to be dynamically generated and executed, so for iOS Unity must convert the bytecode into a platform-specific, machine-readable binary format ahead-of-time (AOT), that is, at build time. Before IL2CPP (Intermediate Language 2 CPP) existed, Unity used Mono’s AOT compiler to do this. IL2CPP sidesteps that process and instead turns the IL bytecode into C++ code which can be compiled with any C++ compiler.

This makes it much easier for the Unity engine to be ported to new platforms. Unity no longer has to add support for the platform to Mono’s AOT and JIT compilers; instead they just rely on the platform’s native C++ compiler. This also results in better performance.

For iOS, Unity writes out an Xcode project that includes all the generated C++ files. This is then compiled and linked and the result is your iOS binary executable. Xcode’s compiler and linker will optimise the code and strip out anything that is not referenced.

Digging in to symbols

The symbol files contain the names of all symbols (functions, global variables, classes, etc) in the app and where they are located in memory. This is useful because the binary that Apple distributes to users is stripped of most symbol names, so if we get a crash report from a user or from Apple, it will contain a bunch of meaningless memory addresses. We need to convert the memory addresses to human readable names to get anything useful out of the crash info, and we can do this using the symbol files (see here for how to do this).

In iTunes Connect we can download the symbols (dSYMs) for the IPA that we uploaded. If our app is a universal app, when we download and extract the symbols we should find two directories with random GUIDs that correspond to the two architectures, armv7 and arm64 (if it’s not a universal app there will be only one directory). Drilling all the way down into the directories we should find a single binary. We can determine what architecture the binary is for using the file command:

$ file resynth

This will output:

resynth: Mach-O 64-bit dSYM companion file

This tells us that this binary is the symbol file for the arm64 version of the app. It’s an iOS Mach-O binary but it doesn’t contain any executable code, only symbol information.

We can also get the symbols after building in Xcode. In the Xcode DerivedData directory for your game, find the Release-iphoneos directory. Inside it there should be a directory with a name like resynth.app.dSYM. The full path will be something like:

~/Library/Developer/Xcode/DerivedData/Unity-iPhone/Build/Products/Release-iphoneos/resynth.app.dSYM/Contents/Resources/DWARF
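If you don’t want to drill down by hand, a few lines of Python can find the symbol binaries for you. This is only a sketch: `find_dwarf_files` is a name I made up, and it assumes every bundle follows the `*.dSYM/Contents/Resources/DWARF/` layout seen in the path above.

```python
from pathlib import Path


def find_dwarf_files(search_root):
    """Return the symbol binaries inside any *.dSYM bundle below search_root.

    Assumes the bundle layout <name>.dSYM/Contents/Resources/DWARF/<binary>.
    """
    pattern = "*.dSYM/Contents/Resources/DWARF/*"
    return sorted(p for p in Path(search_root).rglob(pattern) if p.is_file())
```

For example, calling `find_dwarf_files(Path.home() / "Library/Developer/Xcode/DerivedData")` should list the resynth symbol binary from the path above, and pointing it at the extracted iTunes Connect download should list one binary per architecture.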

To view all symbols in the binary we can use the nm command:

$ nm -U -arch=arm64 resynth | less

This will pipe the output through the less program, which will let you scroll through the list using the arrow, home, end, page up, and page down keys (or ‘j’ and ‘k’ if your terminal is not configured correctly!).

Here’s a part of the output for Resynth:

000000010006f0c8 t _PackButton_get_IsComingSoon_m833010487
000000010006f0b8 t _PackButton_get_IsUnlocked_m755995454
000000010006f0a8 t _PackButton_get_Pack_m3924171162
000000010006f0b0 t _PackButton_set_Pack_m3180976721
00000001000720f0 t _PackManager_Awake_m1581176858
0000000100071998 t _PackManager_CanPlayerBuyLevelPack_m1143483345
00000001000719a0 t _PackManager_CanPlayerBuyPack_m3419497270
0000000100071aa8 t _PackManager_CanPlayerBuyThemePack_m3160178262
0000000100073834 t _PackManager_CreatePackButton_m3207230420

This is showing some of the symbols for Resynth’s DLC pack management code. Although IL2CPP has generated native code with mangled names, we can still make out the C# classes and methods that correspond to the above code (PackButton and PackManager).

These are things that we’d expect to see in here, because this is code that is used!
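Scrolling through thousands of symbols gets old quickly, so it can help to group them by class. Below is a rough Python sketch that does this for the excerpt above. It assumes IL2CPP symbols have the shape `_<Class>_<method>_m<digits>` and that class names contain no underscores, which holds for these lines but certainly not for every symbol in the binary; `count_methods_by_class` is my own helper name.

```python
import re
from collections import Counter

# The nm excerpt from above.
NM_OUTPUT = """\
000000010006f0c8 t _PackButton_get_IsComingSoon_m833010487
000000010006f0b8 t _PackButton_get_IsUnlocked_m755995454
000000010006f0a8 t _PackButton_get_Pack_m3924171162
000000010006f0b0 t _PackButton_set_Pack_m3180976721
00000001000720f0 t _PackManager_Awake_m1581176858
0000000100071998 t _PackManager_CanPlayerBuyLevelPack_m1143483345
00000001000719a0 t _PackManager_CanPlayerBuyPack_m3419497270
0000000100071aa8 t _PackManager_CanPlayerBuyThemePack_m3160178262
0000000100073834 t _PackManager_CreatePackButton_m3207230420
"""

# Assumed symbol shape: _<Class>_<method>_m<digits>
SYMBOL_RE = re.compile(r"^_(?P<cls>[A-Za-z0-9]+)_(?P<method>\w+)_m\d+$")


def count_methods_by_class(nm_text):
    """Count symbols per class name, skipping lines that don't fit the shape."""
    counts = Counter()
    for line in nm_text.splitlines():
        fields = line.split()
        if len(fields) != 3:  # expect: address, type, symbol name
            continue
        match = SYMBOL_RE.match(fields[2])
        if match:
            counts[match.group("cls")] += 1
    return counts


counts = count_methods_by_class(NM_OUTPUT)
print(counts.most_common())  # [('PackManager', 5), ('PackButton', 4)]
```

Run against the full nm output instead of the excerpt, the classes at the top of the list are a reasonable place to start looking for code you didn’t expect to ship.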

Just for sanity’s sake, let’s look for some other things that probably shouldn’t be in here:

$ nm -U -arch=arm64 resynth | grep UnityWebRequest | wc -l
     367

Resynth doesn’t use UnityWebRequest, so it’s strange that it’s included here.

Let’s try physics colliders, which Resynth also does not use:

$ nm -U -arch=arm64 resynth | grep Collider | wc -l
     203

Okay, also a bit strange. It seems that Unity isn’t stripping out some unused components. Additionally, even a lot of core .NET functionality is still being included: things like ArrayList, Hashtable, TLS code, and FTP code are still present, and the game definitely doesn’t use these.
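Rather than running one grep per suspect, the same check can be done for a whole list of keywords in one pass. Here is a minimal Python sketch of that; `count_symbols` is a hypothetical helper, and the two input lines are taken from the nm excerpt earlier just to show the shape of the data.

```python
def count_symbols(nm_text, keywords):
    """Count nm output lines whose symbol name contains each keyword."""
    counts = dict.fromkeys(keywords, 0)
    for line in nm_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        symbol = fields[-1]  # the symbol name is the last column
        for keyword in keywords:
            if keyword in symbol:
                counts[keyword] += 1
    return counts


# Two lines from the earlier excerpt; in practice, feed in the full nm output.
excerpt = """\
000000010006f0a8 t _PackButton_get_Pack_m3924171162
00000001000720f0 t _PackManager_Awake_m1581176858
"""
print(count_symbols(excerpt, ["PackManager", "UnityWebRequest", "Collider"]))
```

On the full output, the UnityWebRequest and Collider counts should line up with the grep results above.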

Unused code

Before going any further, we should make sure that we have removed all unnecessary non-editor scripts from the project. Unfortunately Unity will compile and link all non-editor scripts even if no GameObjects reference them. This means that if, for example, you imported a plugin from the asset store and it had some example code, that code might end up in your final build on the app store!

It’s a shame that Unity has no way to disable these files or exclude them from being built. For now the only way seems to be to either delete them entirely or move them into an “editor” folder.

Method tables

One of the auto-generated C++ files that Unity writes out is a file called Il2CppMethodPointerTable.cpp. At the bottom of this file is a big array of pointers to methods called the “method pointer table”. We can perform our own “stripping” by replacing entries in this table with a NULL or 0. This will cause the linker to avoid linking the code, and reduce the size of the final binary!

We have to be very careful though: if we remove methods that are used, the game will crash, so we have to test carefully. It’s not obvious that some methods are required. For example, I wanted to remove all ArrayList and Hashtable methods since Resynth only uses the generic collections, but it turns out that these are used deep in the Mono base class libraries during application startup (fortunately I found this out quickly!).

We can do this stripping process semi-automatically in a build post-process step with some code like this:
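The snippet below is not the original script, just a rough Python illustration of the idea (the real thing would live in a Unity build post-process step). It assumes each table entry looks like `(Il2CppMethodPointer)&SomeMethod_m123/* 0*/,` and that the array is called `g_MethodPointers`; both are assumptions about the generated file, and the `FtpWebRequest_Abort_m1` entry is a made-up example name.

```python
import re

# Assumed entry shape in Il2CppMethodPointerTable.cpp (not confirmed here):
#     (Il2CppMethodPointer)&SomeClass_SomeMethod_m123/* 42*/,
ENTRY_RE = re.compile(r"\(Il2CppMethodPointer\)&(\w+)")


def strip_methods(cpp_source, prefixes):
    """Replace table entries whose method name starts with a prefix by NULL.

    Returns the modified source and the list of method names that were removed.
    """
    removed = []
    prefixes = tuple(prefixes)

    def replace(match):
        name = match.group(1)
        if name.startswith(prefixes):
            removed.append(name)
            return "NULL"
        return match.group(0)  # leave every other entry untouched

    return ENTRY_RE.sub(replace, cpp_source), removed


# Tiny stand-in for the generated file; the first entry is hypothetical.
SAMPLE = """\
extern const Il2CppMethodPointer g_MethodPointers[2] =
{
    (Il2CppMethodPointer)&FtpWebRequest_Abort_m1/* 0*/,
    (Il2CppMethodPointer)&PackManager_Awake_m1581176858/* 1*/,
};
"""

stripped_source, removed = strip_methods(SAMPLE, ["FtpWebRequest_"])
print(stripped_source)
print("removed:", removed)
```

Keeping the table the same length and only nulling out entries matters, because everything else refers to methods by their index in the table. The list of prefixes is where all the risk is: anything in it that turns out to be called at runtime will crash the game.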

Final thoughts

In the gist I’ve only stripped a handful of methods, but when I was testing in Resynth, I removed quite a few more. Unfortunately I was only able to reduce the binary by around 630 KB. There is also a lot of corresponding metadata for types and methods which is still included. Removing this by parsing the generated C++ files would be very messy and difficult.

Unfortunately, stripping code manually from the method table doesn’t seem worth it. It’s a time-consuming process, plus it has the added potential of crashing your game if you aren’t careful! Although it’s obvious that some methods are not used, for others, you have to play the entire game and ensure that all code paths are tested just in case you strip out something that is used. It’s not worth it for the small gain.

8 thoughts on “Investigating Unity iOS executable bloat”

  1. Hi Sam,

    I’m Jonas from Unity, and (among other things) I’m doing some work on Unity’s modularization and code stripping systems, with the goal to have Unity get better at removing unused code.

    Assuming that “Strip Engine Code” is checked in Player settings, Unity should be able to strip out unused subsystems (like eg Physics, WebRequest) on iOS and WebGL. However, the problem is that the stripping system works pretty much as a black box today, and you can’t really easily figure out what gets stripped and why. For instance a single Collider in a scene or a single Physics.Raycast call is enough to drag in the whole 3d physics subsystem.

    But we are building tooling to help you with that. One upcoming feature in Unity (no ETA, sorry) is the Build Report, which will provide you with a lot of information on your build, including what files got included, and which native subsystems were included in the build and why. While this feature is WIP, the backend already exists in Unity 5.4+ and you can preview the data here: http://files.unity3d.com/build-report – try making an iOS or WebGL build and check the stripping tab. For WebGL builds, it can also calculate the binary code size of each module (disclaimer: this is all WIP, and may not work at all for you, but many people have found it useful already in its current state).

    The Build Report will help you profile and optimize your build size, but another thing we are working on (again, no ETA, sorry) is to let you configure the Unity editor to disable subsystems completely, preventing you from accidentally adding stuff you don’t need in the first place.

    I hope these tools will help make this easier in the future.

    jonas

  2. Hi Jonas,

    Thanks very much for your comments! I actually realised just recently (after I had already written this post – I should update it) that UnityWebRequest is being included because I’m using Facebook analytics. But I can’t for the life of me find where any physics is being used.

    It’s also a shame that so many versions of different container types are being included, like ArrayList vs generic List. But this seems unavoidable, and I consider it just as a cost of using Mono and Unity.

    It would be really great if it were easier to exclude scripts from being built as well (e.g. demo/example scripts).

    It’s great to hear about the WIP tools though. I’ll check out that build report tool too!

    Thanks again!
    Sam

  3. >But I can’t for the life of me find where any physics is being used.

    Ideally, the Build Report tool should help you find out (though if it is because it is used by scripts, it will currently not give you any details on where exactly – might add more info on that in the future).

    >It would be really great if it were easier to exclude scripts from being built as well (e.g. demo/example scripts).

    I looked at automatically excluding user scripts not referenced anywhere – but found this to be causing too much trouble with breaking projects because people access their own code through reflection, etc. Also, typically the user code itself is not very big compared to the total binary size. But user code might of course drag in .NET APIs which cause unnecessary code bloat.

    1. > But user code might of course drag in .NET APIs which cause unnecessary code bloat.

      That’s the main case I was thinking about. I had some unused example code from some asset store package that was dragging in some .NET generic classes that I otherwise didn’t use, among other things. If there were properties associated with a folder or a script where you could flag it as “not included for build” or something that would be really helpful (the way you can include/exclude DLL plugins for different platforms).

  4. Hi Sam,
    I’m Mantas from the Unity Mobile team. I would like to add a couple of comments in addition to what Jonas already mentioned:
    – an empty Unity app currently has a 9-10 MB executable size (for a single arch); yours being twice as big probably means that many subsystems are touched either from scripting code or from data stored in the scenes. Third party plugins contributing noticeably is also an option (most of them are linked statically into the main application).
    – Il2CppMethodPointerTable.cpp is mostly responsible for managed code, which is usually the smaller part of the engine. Native component stripping is controlled via the Classes/Native/UnityClassRegistration.cpp and Classes/Native/UnityICallRegistration.cpp files. Look for the RegisterAllClasses() and RegisterAllStrippedInternalCalls() functions. You might use them to experiment with how much of the code would be stripped, but note that if these components are registered there is a great chance that they will be called and the app might crash.
    – regarding excluding specific scripts from the build: you can simulate that with scripting defines. For example, surround your rarely used scripting code with a guard like #if !DISABLE_UNIMPORTANT_SCRIPTS #endif and then put DISABLE_UNIMPORTANT_SCRIPTS into “Scripting Define Symbols” in the Player Settings inspector.
    – I would also like to note that the footprint of UnityWebRequest should be quite small, as on iOS it shares its native implementation with the WWW class; also, most of the functions are lightweight wrappers around iOS NSURLConnection API calls.
    – ArrayList, Hashtable and many other container classes are quite common building blocks for many standard .NET libraries; it’s a design limitation not unique to Mono, but similar across the whole .NET ecosystem. We are actively exploring how to make a .NET profile that would be easily strippable and suitable for mobile.

    Hopefully it helps you improve build size.
    Mantas

  5. Thanks Mantas! With the physics example, I’m definitely not including any colliders or physics code anywhere in the game (I’ve searched through the entire project – assets/metafiles/scripts – with both Visual Studio and Sublime Text). I actually reported a bug in June, it’s case number 921427, which IIRC includes the full project. But I’ll see if the build report tool helps!

  6. I got a chance to look at the bug report in question now – since this was publicly discussed on this blog before, I’ll post my reply here:
    —————————————————————–

    I can probably explain at least some of the dependencies you are seeing.

    >I see RakNet references in there, as well as things like SmallXmlParser, BigInteger, Hashtable, UnityWebRequest, NetworkPlayer, Physics, Physics2D – none of these are used by the game.

    -RakNet, NetworkPlayer: these are part of our old networking system. This was never modularized to be strippable, as it is on schedule to be removed from Unity as it is outdated, and superseded by UNET.

    -UnityWebRequest: Used internally by Unity Analytics and HW Stats (Build Report can show this). Turn those off to get rid of the dependency.

    -Physics, Physics2D: There was a dependency from the new UI system to Physics and Physics2D to avoid UI interactions when the UI is occluded by colliders. This has been fixed now (so it became a soft dependency), in 5.6, IIRC. But I also see Physics being used because of a dependency from VR to Physics. Are you using VR code in your game?

    -SmallXmlParser, BigInteger, Hashtable: these managed types are used by internal dependencies in mscorlib (.NET APIs), which are not really possible to break down. SmallXmlParser and BigInteger are used by crypto code, which mscorlib references in some places we cannot break down without building our own .NET profile (which we may do at some point). Hashtable is used in many places, also by UnityEngine APIs.

    Hope this helps explain, not sure it will help you solve it all – though you should be able to get rid of Physics and Physics2D, which are also the biggest potential gains of any of the items you mentioned.
