Aeron Practice

Design Overview

https://github.com/real-logic/aeron/wiki/Design-Overview


TCP or UDP

TCP

For a real-time video conference application, how do you choose between TCP and UDP?

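The key to this question is to compare the characteristics of TCP and UDP and then choose for the specific scenario of real-time video conferencing. TCP's retransmission mechanism adds latency, so it is a poor fit here.

Second, video and audio codecs can tolerate corrupted or even lost data, so reliable delivery over TCP is not required. When packets of a video frame are lost, the player can skip that frame or keep showing the previous one.

Third, when the network becomes congested, the sender should proactively drop some data. Even if those video frames did reach the receiver, they might already be "stale" and would never be decoded or displayed.

A custom protocol on top of UDP makes it easier to control individual packets. Even with UDP, however, some of TCP's building blocks still have to be implemented: flow control and congestion control to track the receiver's playback state and the network conditions, plus a feedback mechanism to learn what the receiver has actually received. Although this scenario does not need an ACK for every packet, the receiver can report the sequence number of the latest complete video frame it has received. If packets are then lost, the sender can encode subsequent video relative to that latest frame.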
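
The sender-side idea of dropping frames that are already too old can be sketched as follows. This is a minimal illustration, not a real codec or transport: the Frame type, the dropStaleFrames function, and the 150 ms age budget are all hypothetical names and values chosen for the example.

```kotlin
// Hypothetical types and names, for illustration only.
data class Frame(val seq: Int, val captureTimeMs: Long)

/** Keeps only the frames that are still young enough to be worth sending. */
fun dropStaleFrames(queue: List<Frame>, nowMs: Long, maxAgeMs: Long): List<Frame> =
    queue.filter { frame -> nowMs - frame.captureTimeMs <= maxAgeMs }

fun main() {
    val queue = listOf(Frame(1, 0), Frame(2, 40), Frame(3, 80), Frame(4, 120))
    // At t=200ms with a 150ms budget, frames 1 and 2 are older than the budget.
    val toSend = dropStaleFrames(queue, nowMs = 200, maxAgeMs = 150)
    println(toSend.map { it.seq }) // prints [3, 4]
}
```

A real sender would also have to respect codec dependencies between frames, which this sketch ignores.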

UDP

If you are designing a reliable UDP, what should you do?

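In general, "reliable" means that the receiver can report back to the sender what it has received. Since we already know one reliable transport protocol, TCP, a reliable UDP design can borrow directly from TCP and add ACKs, flow control, and congestion control; each of these modules can be modelled on its TCP counterpart. The core of reliable UDP is the feedback mechanism, and several possible implementations are given here.

Because reliability requires the receiver to restore the order of packets, the sender must put a sequence number on every packet. The feedback mechanism itself can take several forms:

  1. Naive ACK: the receiver returns an ACK for every packet sent. On timeout, the sender resends the packet until the receiver ACKs it. This is inefficient, because every later packet is blocked behind the current one and each ACK adds overhead.
  2. Block/bitmap ACK: the sender transmits a batch of packets, for example 32 of them numbered 0 to 31. The receiver's ACK carries a 32-bit (4-byte) bitmap that marks which packets arrived, and the sender then resends all missing packets in one go. This makes better use of bandwidth, because the sender can have more data in flight at once. The drawback is that both sender and receiver need deeper buffers to hold all the data in transit.
  3. ACK last packet: the sender asks the receiver for an ACK only when it sends the last packet, then resends whatever was lost. This reduces the data overhead caused by ACKs, but the data still has to be held in a buffer.

In practice, methods 2 and 3 can be combined: set a request-ACK flag on the last packet of each batch, asking the receiver to return a bitmap ACK. Going further, the sender can estimate network conditions from the loss rate and latency and adjust the bitmap size dynamically: use a larger bitmap (more data in flight) when the network is healthy, and send less otherwise. This adaptation to network conditions effectively implements congestion control.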
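
The bitmap ACK from method 2 can be sketched as follows, assuming a fixed batch of 32 packets numbered 0 to 31. The names BATCH_SIZE, buildAckBitmap, and missingPackets are hypothetical; they are not part of Aeron or of any real protocol.

```kotlin
// Hypothetical helpers for illustration; a batch is 32 packets numbered 0..31.
const val BATCH_SIZE = 32

/** Receiver side: set bit i when packet i of the batch has arrived. */
fun buildAckBitmap(received: Collection<Int>): Int {
    var bitmap = 0
    for (seq in received) {
        require(seq in 0 until BATCH_SIZE) { "sequence number out of range: $seq" }
        bitmap = bitmap or (1 shl seq)
    }
    return bitmap
}

/** Sender side: list the packets whose bit is still 0 and must be resent. */
fun missingPackets(ackBitmap: Int): List<Int> =
    (0 until BATCH_SIZE).filter { seq -> (ackBitmap and (1 shl seq)) == 0 }

fun main() {
    // Simulate a batch in which packets 3 and 17 were lost in transit.
    val received = (0 until BATCH_SIZE).filter { it != 3 && it != 17 }
    val ack = buildAckBitmap(received)
    println(missingPackets(ack)) // prints [3, 17]
}
```

The whole ACK fits in a single 4-byte Int, which is why the feedback overhead stays small even when many packets are in flight.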


Operating system socket buffers have an impact on some of the settings within Aeron.

  • SO_RCVBUF can impact loss rates when too small for the given processing. If too large, this buffer can increase latency. Values that tend to work well with Aeron are 2MB to 4MB. This setting must be large enough for the MTU of the sender. If not, persistent loss can result. In addition, the receiver window length should be less than or equal to this value to allow plenty of space for burst traffic from a sender.

  • SO_SNDBUF can impact loss rate. Loss can occur on the sender side due to this buffer being too small. This buffer must be large enough to accommodate the MTU as a minimum. In addition, some systems, most notably Windows, need plenty of buffering on the send side to reach adequate throughput rates. If too large, this buffer can increase latency or cause loss. This usually should be less than 2MB.

Linux

As was mentioned above, changing the location of the buffers for Aeron can be a good thing. For Linux, this means that /dev/shm will be the location of the buffers if present.

Linux normally requires some sysctl values to be set: net.core.rmem_max to allow larger SO_RCVBUF values, and net.core.wmem_max to allow larger SO_SNDBUF values.
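
One way to check whether the kernel limits are high enough is to request a buffer size on a UDP socket and read back what was actually granted. The sketch below uses only the standard java.nio API; the function name effectiveReceiveBuffer is hypothetical. On Linux the reported value can be larger than the requested one, because the kernel doubles the value to account for bookkeeping overhead, so treat the comparison as a rough hint rather than an exact test.

```kotlin
import java.net.StandardSocketOptions
import java.nio.channels.DatagramChannel

/** Requests a receive buffer size and returns what the OS reports afterwards. */
fun effectiveReceiveBuffer(requestedBytes: Int): Int =
    DatagramChannel.open().use { channel ->
        channel.setOption(StandardSocketOptions.SO_RCVBUF, requestedBytes)
        channel.getOption(StandardSocketOptions.SO_RCVBUF)
    }

fun main() {
    val requested = 2 * 1024 * 1024 // 2MB, the low end of the range suggested above
    val granted = effectiveReceiveBuffer(requested)
    println("requested=$requested granted=$granted")
    if (granted < requested) {
        // The OS clamped the request; on Linux, net.core.rmem_max is the usual cause.
        println("Granted less than requested; consider raising net.core.rmem_max")
    }
}
```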

Windows

Windows tends to use SO_SNDBUF values that are too small. It is recommended to use values more like 1MB or so.

Mac/Darwin

Mac tends to use SO_SNDBUF values that are too small. It is recommended to use larger values, like 16KB.


Media Driver

Media Driver instances sit on a box and send/receive UDP packets over the network, whilst ensuring that the mapped files are cleaned and rotated.

If you've got multiple publishers or subscribers sitting in different processes on the same box, then probably your best bet is to run a separate instance of the media driver and have it manage all the processes.

If you're going to have just a single process on a machine with the publishers and subscribers inside, then it's probably easiest to keep the driver embedded within the process. Just my opinion, of course.

Aeron instances in applications, commonly referred to as "clients", communicate with Media Drivers via a set of buffers.

The location of these buffers is normally in the OS file system. By default, the java.io.tmpdir or /dev/shm/ is used to hold these files.

How to Run Aeron Media Driver

To run the Aeron Media Driver as a foreground process, use the script provided with the driver. The script provides the appropriate configuration for the driver. You can provide your own configuration via environment variables:

  • AERON_DIR (method: aeronDirectoryName())

The path to the directory where the Aeron Media Driver needs to store its files. On Linux, a directory inside /dev/shm/ is recommended. If you provide your own path, make it the same for the driver and any microservice that operates with this driver.

If it is not specified, the default value provided by Aeron is used.

  • AERON_SO_BUFFER

The size in bytes of the send and receive socket buffers. The length of the buffer must be a power of two. On Linux, it must not exceed the kernel configuration parameters:

net.core.wmem_max

net.core.rmem_max

The default value is 4194304.

  • AERON_TERM_BUFFER

The size in bytes of the Term (a section of data within a stream) buffer. The length of the buffer must be a power of two and must be the same length on both ends. The default value is 67108864.

  • AERON_MTU
    The length of MTU in bytes.

The default value is 65504.
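
Since the buffer lengths above must be powers of two, a value can be checked before it is exported as an environment variable. This is a minimal sketch; isPowerOfTwo is a hypothetical helper, not something provided by the driver script.

```kotlin
/** True when n is a positive power of two, i.e. exactly one bit is set. */
fun isPowerOfTwo(n: Int): Boolean = n > 0 && (n and (n - 1)) == 0

fun main() {
    val soBuffer = 4_194_304     // default AERON_SO_BUFFER quoted above (2^22)
    val termBuffer = 67_108_864  // default AERON_TERM_BUFFER quoted above (2^26)
    println(isPowerOfTwo(soBuffer))   // prints true
    println(isPowerOfTwo(termBuffer)) // prints true
    println(isPowerOfTwo(5_000_000))  // prints false
}
```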

https://docs.genesys.com/Documentation/EZP/9.0.0/Deploy/AeronMediaDriver

shm on Linux System

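In Docker, /dev/shm defaults to 64MB, which is far too small, so its size has to be increased. Kubernetes, however, does not support the shm-size parameter, so the only option is to resize the container's /dev/shm from a startup script.

Prerequisite: the container runs in privileged mode (--privileged=true).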

# Remount /dev/shm with a larger size; replace <size> with a value such as 500M or 1G.
shm_dir=/dev/shm
umount "$shm_dir"
mount -t tmpfs -o rw,nosuid,nodev,noexec,relatime,size=<size> shm "$shm_dir"

Running the docker command:

docker run -d --privileged image:tag
# or
docker run -it --privileged image:tag /bin/bash

https://tw.saowen.com/a/d4e0d2129b3fcaa1597c3860ac1f4e77753e738073b67ae599b8b4b10d0b8ee2

Media Driver on Docker

Dockerfile:

FROM java:openjdk-8-alpine
COPY init.sh /init.sh
COPY boot.jar /java/boot.jar

RUN chmod +x /init.sh

ENTRYPOINT ["/init.sh"]
CMD ["1G", "/java/boot.jar"]

init.sh:

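#!/bin/sh
# Usage: init.sh <shm size, e.g. 1G> <path to jar>
echo "init shm size: $1"
echo "args: $*"

# Remount /dev/shm with the requested size (requires a privileged container).
shm_dir=/dev/shm
umount "$shm_dir"
mount -t tmpfs -o rw,nosuid,nodev,noexec,relatime,size="$1" shm "$shm_dir"

# Run the Java process; exec lets it receive container signals directly.
exec java -jar "$2"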

Unit Testing Notes

  • Before each unit test case runs, it is best to restart the MediaServer and clear its buffers; otherwise leftover data may be read by the next test case.

MediaServer Code

Shm Memory

Each node will reserve aeron.term.buffer.length * 12. Its default value is 16m, hence it occupies 192m per JVM in the cluster per shared driver (1 per physical machine). That is to say, if you have 10 JVMs spread out over 4 physical machines, each machine with a shared driver, each machine will need 192m * 10 JVMs, which is 1920m, making each machine require at least 4g+ of RAM.

Why 4g+? It is not only 1920m; it is 1920m + some overhead > 2g. That IMHO was too much, hence I tuned aeron.term.buffer.length down to 4m, making it possible to run 10 JVMs among 4 physical machines (in my case virtual machines) with 2g RAM.
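
The arithmetic in the quote can be reproduced with a short sketch. It simply applies the x12 multiplier and the 10-JVM count from the text above; MIB and perJvmBytes are hypothetical names, and the multiplier is taken from the quoted issue rather than verified against Aeron itself.

```kotlin
const val MIB = 1024L * 1024L

/** Memory reserved per JVM, using the x12 multiplier quoted in the text above. */
fun perJvmBytes(termBufferLength: Long): Long = termBufferLength * 12

fun main() {
    val jvms = 10
    for (termMiB in listOf(16L, 4L)) {
        val perJvm = perJvmBytes(termMiB * MIB)
        println("term=${termMiB}m perJvm=${perJvm / MIB}m total=${perJvm * jvms / MIB}m")
    }
    // prints:
    // term=16m perJvm=192m total=1920m
    // term=4m perJvm=48m total=480m
}
```

With the term buffer reduced to 4m, the same 10 JVMs reserve 480m instead of 1920m, which is why the setup fits into 2g of RAM.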

https://github.com/akka/akka/issues/21923

LowLatency

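import io.aeron.driver.MediaDriver
import io.aeron.driver.ThreadingMode
import org.agrona.concurrent.BusySpinIdleStrategy

// Low-latency setup: dedicated threads that busy-spin instead of parking.
val ctx = MediaDriver.Context()
    .termBufferSparseFile(false) // do not use sparse files for term buffers
    .threadingMode(ThreadingMode.DEDICATED)
    .conductorIdleStrategy(BusySpinIdleStrategy())
    .receiverIdleStrategy(BusySpinIdleStrategy())
    .senderIdleStrategy(BusySpinIdleStrategy())
ctx.errorHandler {
    it.printStackTrace()
}
MediaDriver.launch(ctx)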

JVM Options

  • -XX:+UnlockDiagnosticVMOptions
  • -XX:GuaranteedSafepointInterval=300000
  • -XX:BiasedLockingStartupDelay=0

Aeron

Buffering Considerations

The length of term buffers is controlled by the aeron.term.buffer.length, aeron.ipc.term.buffer.length, and aeron.term.buffer.max.length properties.

  • aeron.term.buffer.length
  • aeron.term.buffer.max.length

Kubernetes


Configuration

How to increase the FragmentHandler's directBuffer size

val context = MediaDriver.Context()
// default 16777216
context.publicationTermBufferLength(16777216 * 2)

Multiple Destinations

Both Publications and Subscriptions in Aeron can support the concept of multiple simultaneous destinations.

For Publications, this means the outgoing stream is sent to each destination individually.

For Subscriptions, this means the incoming stream(s) may be received by a number of individual endpoints.

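Test results show that point-to-point transfer speed can reach 150 MB/s to 160 MB/s, and that the per-second throughput drops when more publishers are added.
With a message size of 75 bytes, this corresponds to roughly 1.9 to 2.1 million messages per second.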

ExclusivePublication and Publications

Publications

Publications are thread safe within and across processes. A Publication object can be used concurrently from many threads.

When separate processes add the same Publication (channel and stream id) then they will map to the same underlying memory-mapped file and can be safely used concurrently.

Messages offered to the same publication will be serialised. Publications with a different channel and stream id are completely independent from each other and operate in parallel.

ExclusivePublication


References

https://github.com/real-logic/aeron/wiki/Best-Practices-Guide
https://stackoverflow.com/questions/32243664/what-is-the-largest-message-aeron-can-process
https://github.com/real-logic/aeron/wiki/Multiple-Destinations