Because much of EFS performance depends on usage patterns, this document should serve as a starting point. Be aware that tuning EFS will be necessary once customer behavior has been observed, and ongoing monitoring will likely be required to maintain long-term performance.

See the https://app.tettra.co/teams/rstudio/pages/efs-research-fsbench#header-fav4i-results page for test methods and results.

In the results below, all times are in seconds (lower is better).

Max I/O vs. GeneralPurpose

Summary: Use GeneralPurpose

When creating a filesystem you must choose a performance mode, which cannot be changed later. We strongly recommend using GeneralPurpose.

AWS recommends:

File systems in the Max I/O mode can scale to higher levels of aggregate throughput and operations per second. This scaling is done with a tradeoff of slightly higher latencies for file metadata operations. Highly parallelized applications and workloads, such as big data analysis, media processing, and genomic analysis, can benefit from this mode.

In our testing, Max I/O performed dramatically worse because of the increased latency, especially in the “many small files” scenario.

| task | General Purpose | Max I/O |
| --- | --- | --- |
| Write CSV, 100MB over 1000 files | 24.559524 | 89.26881 |
| Read CSV, 100MB over 1000 files | 7.436476 | 18.80406 |
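
For reference, a minimal sketch of creating a filesystem with the recommended mode using boto3 (the creation token and other settings here are illustrative assumptions, not required values):

```python
import boto3

efs = boto3.client("efs")

# The performance mode is fixed at creation time; "generalPurpose" is the
# recommended choice and "maxIO" is the only alternative.
fs = efs.create_file_system(
    CreationToken="rsw-home-dirs",      # hypothetical idempotency token
    PerformanceMode="generalPurpose",   # cannot be changed after creation
    ThroughputMode="bursting",          # default; see the next section
    Encrypted=True,
)
print(fs["FileSystemId"], fs["LifeCycleState"])
```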

Bursting vs. Provisioned Throughput

Summary: Similar performance under normal conditions, but Provisioned lets you pay extra to avoid surprises

The default bursting behavior is likely how we will want customers to start using EFS, because we have no way of predicting how much throughput they will need. Customers will need to monitor their Burst Credit balance and permitted throughput via CloudWatch so that they are not surprised by throttling if the credits run out. We highly recommend setting alarms based on these metrics.
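
As a sketch of such an alarm (the filesystem ID, SNS topic, and threshold below are placeholder assumptions to adapt), a CloudWatch alarm on BurstCreditBalance might look like:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alert when the burst credit balance stays below ~1 TiB for a full hour.
cloudwatch.put_metric_alarm(
    AlarmName="efs-burst-credits-low",
    Namespace="AWS/EFS",
    MetricName="BurstCreditBalance",
    Dimensions=[{"Name": "FileSystemId", "Value": "fs-12345678"}],
    Statistic="Minimum",
    Period=300,                    # 5-minute periods...
    EvaluationPeriods=12,          # ...evaluated over one hour
    Threshold=1 * 1024 ** 4,       # BurstCreditBalance is reported in bytes
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:efs-alerts"],
)
```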

Throttling is remedied either by generating more Burst Credits (adding data to the filesystem to raise the baseline, or waiting for credits to accumulate) or by converting to Provisioned Throughput mode. Large filesystems (> 1 TB) should theoretically be able to burst for 50% of the time. For smaller filesystems, Provisioned Throughput can be set to maintain a constant performance level.

Note that generating large files to bump into a larger tier of burst performance is both time-consuming and expensive; weigh these options carefully. Creating > 1 TB of data could cost hundreds of dollars just to store the initial data.

If migrating to EFS, Provisioned Throughput can help save time if you wish to move a lot of data. In our tests, moving from Bursting to 500 MiB/s Provisioned Throughput improved speed by 5x and preserved Burst Credits.
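
Switching an existing filesystem between throughput modes does not require downtime. A minimal boto3 sketch, using a placeholder filesystem ID and the 500 MiB/s figure from our tests:

```python
import boto3

efs = boto3.client("efs")

# Switch the throughput mode in place for the duration of a bulk migration.
efs.update_file_system(
    FileSystemId="fs-12345678",          # placeholder ID
    ThroughputMode="provisioned",
    ProvisionedThroughputInMibps=500.0,
)

# Once the migration is done the filesystem can be switched back to bursting.
# AWS limits how often the throughput mode can be changed (roughly once per
# 24 hours), so plan the switch accordingly.
```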

In most of our testing for Multi AZ EFS, bursting performs better than provisioned. For One Zone, the difference appeared to be minimal.

| task | Bursting | Provisioned |
| --- | --- | --- |
| Read CSV, 100MB | 0.84550 | 0.879125 |
| Read 14 days of CRAN logs with fread | 51.62412 | 51.420417 |

| task | parallelism | Bursting | Provisioned |
| --- | --- | --- | --- |
| DD write, 10MB over 1000 files | 2 | 12.749688 | 12.918125 |
| DD write, 10MB over 1000 files | 4 | 34.989000 | 36.452625 |
| DD write, 10MB over 1000 files | 8 | 69.847063 | 71.607333 |
| DD write, 10MB over 1000 files | 16 | 142.516063 | 146.054792 |
| DD read, 10MB over 1000 files | 2 | 5.400938 | 5.503292 |
| DD read, 10MB over 1000 files | 4 | 5.667313 | 5.742042 |
| DD read, 10MB over 1000 files | 8 | 8.373500 | 8.297542 |
| DD read, 10MB over 1000 files | 16 | 15.613875 | 15.381583 |

| task | parallelism | Bursting | Provisioned |
| --- | --- | --- | --- |
| DD write, 1GB | 2 | 18.924750 | 5.714708 |
| DD write, 1GB | 4 | 39.551812 | 14.736875 |
| DD write, 1GB | 8 | 79.081250 | 26.482083 |
| DD write, 1GB | 16 | 158.753500 | 50.825208 |
| DD read, 1GB | 2 | 5.820938 | 5.409708 |
| DD read, 1GB | 4 | 13.352562 | 8.297000 |
| DD read, 1GB | 8 | 26.826000 | 15.804083 |
| DD read, 1GB | 16 | 53.789563 | 32.500833 |

Multi AZ vs One Zone

Summary: One Zone is significantly faster (and cheaper); Multi AZ has higher availability

AWS currently quotes 99.99% availability for Multi AZ and 99.9% for One Zone.

Most of our customers who want failover will prefer a Multi AZ filesystem. However, there are major performance gains if they are willing to tolerate a single availability zone. A One Zone filesystem is still durable; however, if that availability zone goes down, there is no failover. It might be a great candidate for fast development environments.

One Zone performance may be indistinguishable from a conventional NFS server.

Read more about EFS storage classes in the AWS documentation.

| task | One Zone | Multi AZ |
| --- | --- | --- |
| Write CSV, 100MB over 1000 files | 14.865020 | 40.90557 |
| Read CSV, 100MB over 1000 files | 6.773882 | 10.66935 |

Instance Types

In general for EFS, AWS recommends preferring instance types with more CPU or memory depending on the workload. Prefer memory-optimized or compute-optimized over general purpose instance types.

For fsbench workloads, we have observed performance gains by using memory-optimized instance types, e.g. r5. For UI-related tasks like “Install BH” this could provide a nicer user experience.

For servers which utilize many NFS client connections (e.g. Launcher) the enhanced networking might prove to be noticeably better. Consider using the “n” variants, e.g. r5n.

| task | t3.large | i3.large | i3en.large | c5.xlarge | c5n.xlarge | m5.large | m5n.large | r5.large | r5n.large |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Install BH | 297.649077 | 296.10031 | 296.10031 | 320.9155 | 291.9650 | 305.9505 | 301.7615 | 273.391882 | 261.468 |
| Write CSV, 100MB over 1000 files | 22.630077 | 22.27608 | 22.27608 | 23.2695 | 23.1700 | 23.7610 | 22.4840 | 21.745412 | 20.028 |
| Read CSV, 100MB over 1000 files | 7.743615 | 7.21200 | 7.21200 | 7.8495 | 6.8405 | 7.3100 | 7.7515 | 6.870118 | 7.167 |

| task | parallelism | t3.large | i3.large | i3en.large | c5.xlarge | c5n.xlarge | m5.large | m5n.large | r5.large | r5n.large |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DD read, 1GB | 2 | 5.423615 | 5.540231 | 5.540231 | 6.0005 | 5.9540 | 5.8845 | 5.9995 | 5.639824 | 7.1650 |
| DD read, 1GB | 4 | 9.089462 | 9.245846 | 9.245846 | 13.4175 | 13.3075 | 13.3950 | 13.2725 | 10.268941 | 15.3465 |
| DD read, 1GB | 8 | 17.563923 | 17.707692 | 17.707692 | 26.8515 | 26.9025 | 26.9140 | 26.8560 | 19.823353 | 30.2875 |
| DD read, 1GB | 16 | 35.753846 | 35.969923 | 35.969923 | 53.9250 | 53.9265 | 53.9315 | 53.8575 | 39.928000 | 53.5820 |
| DD read, 10MB over 1000 files | 2 | 5.640154 | 5.163615 | 5.163615 | 5.7280 | 4.6690 | 5.2370 | 5.4005 | 5.073765 | 5.6865 |
| DD read, 10MB over 1000 files | 4 | 6.145077 | 5.713846 | 5.713846 | 5.5230 | 4.5725 | 5.7755 | 5.9100 | 5.754588 | 6.2995 |
| DD read, 10MB over 1000 files | 8 | 10.072461 | 9.230615 | 9.230615 | 6.2870 | 5.6185 | 9.2360 | 9.4405 | 9.604059 | 10.4465 |
| DD read, 10MB over 1000 files | 16 | 19.851154 | 18.062308 | 18.062308 | 10.2755 | 9.4545 | 18.0845 | 18.2430 | 19.030471 | 20.7270 |

Instance sizes

We have observed significant gains in going from large to xlarge instance sizes, primarily under parallelized load. For servers with many users, increasing the instance size is recommended. Do not attempt to use smaller instance sizes, e.g. c5.large with only 4 GB of memory.

| task | t3.large | t3.xlarge |
| --- | --- | --- |
| Write CSV, 100MB over 1000 files | 23.364000 | 25.19867 |
| Read CSV, 100MB over 1000 files | 7.306667 | 8.63100 |

| task | parallelism | t3.large | t3.xlarge |
| --- | --- | --- | --- |
| DD read, 1GB | 2 | 5.300000 | 5.153333 |
| DD read, 1GB | 4 | 8.446000 | 7.598333 |
| DD read, 1GB | 8 | 15.882000 | 15.978333 |
| DD read, 1GB | 16 | 32.297667 | 32.665667 |
| DD read, 10MB over 1000 files | 2 | 5.335667 | 6.285000 |
| DD read, 10MB over 1000 files | 4 | 5.897333 | 5.806333 |
| DD read, 10MB over 1000 files | 8 | 9.516000 | 6.873000 |
| DD read, 10MB over 1000 files | 16 | 18.621667 | 11.671333 |

read_ahead_kb vs. default

Linux kernels (5.4.*) default read_ahead_kb to 128; however, the AWS docs recommend 15000. The efs-utils package will set this correctly, but customers who wish to use only standard NFS utilities will need to set it manually.
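
A minimal sketch of that manual tuning (assuming the filesystem is mounted at /mnt/efs and the script runs as root):

```python
import os

MOUNT_POINT = "/mnt/efs"   # assumed mount point
READ_AHEAD_KB = "15000"    # value recommended by the AWS documentation

# The kernel exposes read-ahead for a mount under /sys/class/bdi/<major>:<minor>,
# keyed by the mount's device number (the major number is 0 for NFS mounts).
dev = os.stat(MOUNT_POINT).st_dev
bdi_path = f"/sys/class/bdi/{os.major(dev)}:{os.minor(dev)}/read_ahead_kb"

# Requires root; the setting does not persist across remounts or reboots,
# so it is typically applied from a boot script or systemd unit.
with open(bdi_path, "w") as f:
    f.write(READ_AHEAD_KB)
```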

| task | Without efs-utils | With efs-utils |
| --- | --- | --- |
| Write CSV, 100MB over 1000 files | 20.89300 | 23.364000 |
| Read CSV, 100MB over 1000 files | 7.46175 | 7.306667 |

| task | parallelism | Without efs-utils | With efs-utils |
| --- | --- | --- | --- |
| DD write, 1GB | 2 | 5.95375 | 5.881333 |
| DD write, 1GB | 4 | 13.58900 | 18.950000 |
| DD write, 1GB | 8 | 25.71275 | 27.258000 |
| DD write, 1GB | 16 | 46.19775 | 50.150667 |
| DD write, 10MB over 1000 files | 2 | 12.41725 | 12.789000 |
| DD write, 10MB over 1000 files | 4 | 32.88250 | 40.746667 |
| DD write, 10MB over 1000 files | 8 | 65.42150 | 71.439000 |
| DD write, 10MB over 1000 files | 16 | 134.31875 | 146.629667 |

Mounting considerations

We strongly recommend using efs-utils to mount the EFS filesystem. If this is not feasible, a standard NFS client can be used, but there are separate mounting instructions and additional considerations to take into account.
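
As a sketch of the two approaches (the filesystem ID, DNS name, and mount point below are placeholders; the NFS options follow AWS's published recommendations for EFS):

```python
import subprocess

MOUNT_POINT = "/mnt/efs"   # placeholder mount point


def mount_with_efs_utils(fs_id: str = "fs-12345678") -> None:
    """Preferred: efs-utils provides the 'efs' mount type, TLS, and sane defaults."""
    subprocess.run(
        ["mount", "-t", "efs", "-o", "tls", f"{fs_id}:/", MOUNT_POINT],
        check=True,
    )


def mount_with_plain_nfs(dns: str = "fs-12345678.efs.us-east-1.amazonaws.com") -> None:
    """Fallback: a standard NFSv4.1 client with the options AWS documents for EFS.
    Remember that read_ahead_kb must then be tuned separately (see above)."""
    opts = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
    subprocess.run(
        ["mount", "-t", "nfs4", "-o", opts, f"{dns}:/", MOUNT_POINT],
        check=True,
    )
```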

Multiple Users

When using an EFS filesystem for many users, we recommend splitting the data up between users as much as possible, because writing large files will block metadata operations in that directory until the write operation is complete. Try to keep users isolated to separate directories whenever possible.

Special Considerations and Product Limitations

Operations that read or write many small files will not perform well in most EFS configurations.

We recommend pre-installing R packages so that users do not have to repeatedly install them.

Prefer reading large files over splitting data between many small files.

Project sharing in RSW (in its current state) will not work because EFS does not support NFS ACLs.

For RSW, the default link-based lock type won’t work; the advisory lock type must be used instead.
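
A minimal sketch of that change (the path and setting name come from RSW's file-locks configuration; verify them against the admin guide for your version):

```python
from pathlib import Path

# Switch RSW from the default link-based locks (which rely on NFS behavior
# that EFS does not provide) to advisory locks. Restart RSW afterwards.
Path("/etc/rstudio/file-locks").write_text("lock-type=advisory\n")
```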

Monitoring usage

Please read the AWS documentation on the available CloudWatch metrics for EFS and on creating customized metrics using metric math.

If using Bursting mode, be sure to monitor the BurstCreditBalance metric. If it begins to decrease substantially over time, it will be time to consider adding data to bump the filesystem size into a larger tier with more burst credits, or moving to Provisioned Throughput to establish a consistent baseline.

If using Bursting mode, you can use metric math to compare MeteredIOBytes to PermittedThroughput to know whether you are using all of your available throughput. If so, it might be an indication that you should move to Provisioned Throughput.
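
A sketch of that comparison using boto3's get_metric_data (the filesystem ID and time window are placeholders; the expression follows the throughput-utilization example in the AWS metric math documentation):

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
fs_dimension = [{"Name": "FileSystemId", "Value": "fs-12345678"}]  # placeholder
now = datetime.now(timezone.utc)

response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            "Id": "metered",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EFS",
                    "MetricName": "MeteredIOBytes",
                    "Dimensions": fs_dimension,
                },
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        {
            "Id": "permitted",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EFS",
                    "MetricName": "PermittedThroughput",
                    "Dimensions": fs_dimension,
                },
                "Period": 60,
                "Stat": "Average",
            },
            "ReturnData": False,
        },
        {
            # Percent of permitted throughput actually consumed in each period.
            "Id": "pct_used",
            "Expression": "(metered / PERIOD(metered)) / permitted * 100",
            "Label": "Throughput utilization (%)",
        },
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
)
result = response["MetricDataResults"][0]
print(result["Label"], result["Values"])
```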

If using Provisioned Throughput, the PermittedThroughput metric can be used to determine whether or not your stored data volume has bumped you above your designated throughput setting.