This TURN server uses a mixed operation model similar to the one used by node.js and nginx: asynchronous IO is combined with a threading model to utilize multi-CPU hardware. The number of threads is limited (close to the number of CPUs) and the resource consumption is optimal. Many client connections can be handled concurrently. The TURN Server tells the operating system, through epoll (Linux), kqueue (BSD and Mac OS X), event ports (Solaris), poll (Cygwin) or select, that it should be notified when a new event arises, and then it goes to sleep. When a new packet arrives, or when a timer event fires, the TURN Server wakes up and executes the callback. Each per-event operation is very short, so the system has near-real-time reaction time (as is necessary for efficient media traffic handling).
Of course, in a high-load scenario where many TURN sessions are involved, the turnserver process has virtually no sleep time, and it operates in a semi-pull mode.
For relatively long-running actions (database interaction, CLI handling), the TURN Server uses separate threads, so the traffic-handling processing is not affected by these longer operations.
The system is implemented in the C language, for efficiency and portability. For IO and timer event multiplexing, the libevent2 library is used.
The TURN Server also incorporates some optimizations specific to a network protocol server; for example, some elements of "pull" processing are included (as opposed to the usual "push" approach).
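To make the event-driven model described above more concrete, here is a minimal, purely illustrative libevent2 sketch (not the actual turnserver code): one UDP listener event and one repeating timer, with short callbacks that run only when the OS reports an event. The port number, buffer size and callback names are assumptions for the example.

```c
/* Illustrative sketch only: the async-IO callback model described above,
 * using libevent2. Not the real turnserver implementation. */
#include <event2/event.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

static void on_udp_readable(evutil_socket_t fd, short events, void *arg) {
    (void)events; (void)arg;
    char buf[1500];
    /* Short, non-blocking work per event: read one datagram and handle it. */
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n > 0) {
        /* ... parse the STUN/TURN message, relay the payload, etc. ... */
    }
}

static void on_timer(evutil_socket_t fd, short events, void *arg) {
    (void)fd; (void)events; (void)arg;
    /* ... expire allocations, refresh permissions, etc. ... */
}

int main(void) {
    struct event_base *base = event_base_new(); /* picks epoll/kqueue/... */

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(3478);            /* assumed example port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    evutil_make_socket_nonblocking(fd);

    struct event *udp_ev = event_new(base, fd, EV_READ | EV_PERSIST,
                                     on_udp_readable, NULL);
    event_add(udp_ev, NULL);

    struct timeval one_second = {1, 0};
    struct event *timer_ev = event_new(base, -1, EV_PERSIST, on_timer, NULL);
    event_add(timer_ev, &one_second);

    event_base_dispatch(base);  /* sleep until the OS reports an event */

    event_free(timer_ev);
    event_free(udp_ev);
    event_base_free(base);
    close(fd);
    return 0;
}
```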
Each TURN session (allocation) consumes relatively few resources. Of course, TURN sessions still require resources: they consume CPU power and network throughput, and each system, no matter how large, can still handle only a limited number of TURN allocations (thousands or tens of thousands of typical traffic streams). This document is about maximizing that number.
The factors that determine that number fall into three groups: the media group; the network topology and protocols group; and the hardware, OS and software settings group.
The fastest database option is SQLite: because no inter-process or network communication is involved, the database management library is linked directly into the TURN server process. But you may need a more robust database management system (like PostgreSQL, MySQL, Redis, or MongoDB) if the database has to be accessible from outside the TURN server box - for example, when it is shared among several TURN server instances or managed remotely.
In a moderately sized setup, SQLite is the simplest and fastest option. SQLite also has an extra security benefit: it cannot be accessed from outside the TURN server box.
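As an illustration, here are hedged turnserver.conf fragments for the two cases; the option names reflect recent releases of this TURN server, and the file path and connection string are placeholders - check the turnserver documentation of your version for the exact syntax.

```
# SQLite (simplest and fastest): the database file is read in-process,
# with no inter-process or network hop.
userdb=/var/db/turndb

# PostgreSQL (example of a networked database, for deployments where the
# user database must be reachable from outside the TURN server box):
#psql-userdb="host=db.example.com dbname=turndb user=turn password=secret"
```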
TURN Server performance depends very significantly on how efficiently the operating system handles the TCP/IP stack operations. Performance optimization may be achieved by a combination of several approaches:
If you have nice expensive NICs with optimal drivers, then you can turn ON the RSS feature, and that will help to distribute the network load among multiple CPUs. The TURN server is an inherently multi-threaded application that was designed to utilize multiple CPUs, but the kernel's packet-receiving path may become a bottleneck. RSS is a way to overcome that bottleneck.
See also the next section.
If your network card does not support RSS, then you can use an OS that emulates that functionality in software - for example, recent versions of BSD and Linux. Read the documentation for your OS to find out how to turn ON RSS and RPS (a minimal Linux sketch follows the link below). A general description of what RSS and RPS are can be found here:
Scaling in the Linux Networking Stack
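The sketch below shows the usual Linux mechanism for enabling RPS in software by writing a CPU bitmask into sysfs. The interface name "eth0" and the mask value are assumptions; adjust them for your hardware and consult your OS documentation, as noted above.

```
# Spread receive processing of queue rx-0 across CPUs 0-3 (bitmask 0xf).
# "eth0" is a placeholder for your actual interface name.
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

# A NIC with real hardware RSS support exposes several rx-* queues here:
ls /sys/class/net/eth0/queues/
```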
This is the best and most efficient performance tuning approach. It allows near-linear performance improvement - proportional to the number of available systems; virtually unlimited scalability can be achieved.
You have three options here:
This section is mostly for Linux systems with pre-2.6.32-431 kernels, or for kernels from 3.0 to 3.8. Examples are CentOS 6.4 and Debian Wheezy. Cygwin users can apply the same technique, too.
This section is not applicable to:
If you are using an older Linux system, great performance is still achievable - read on.
As was said before, the TURN Server performance mostly depends on how efficiently the operating system handles the TCP/IP stack operations. Usually, the TCP part of the stack is properly optimized, but the UDP handling is often suboptimal: a typical UDP stack implementation is not very well tuned for "persistent" UDP sessions like the ones used in TURN. For example, by default the Linux kernel hashes all UDP sockets into just 128 buckets; if you have thousands of UDP sessions, then you have lots of UDP sockets which are handled inefficiently.
In other words, the UDP implementation in the Linux kernel makes use of a hash table to store socket structures. In older 2.6.32 kernels (the default in CentOS 6.0-6.4, for example) this hash table is hardcoded to 128 entries, so with a large number of sockets the lookup degrades to a linked-list traversal for each incoming packet.
If you have a Linux kernel before 2.6.33, then you can change the hardcoded hash size in the kernel code and re-compile the kernel. If you have a more recent kernel, then you can do that without kernel recompilation: kernel 2.6.33 introduced a configurable UDP hash table size, and a second UDP hash table keyed by IP+port (previously it was keyed by port only). You can configure the hash table size by setting the "uhash_entries" boot-time kernel variable (for example, in /etc/grub.conf), as shown below. For best performance, set it to 65536.
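For illustration, a GRUB legacy fragment as used on CentOS 6-era systems; the kernel image and root device are placeholders for your existing entry, and only the trailing uhash_entries parameter is being added:

```
# /etc/grub.conf (GRUB legacy syntax) - append uhash_entries to the kernel line:
kernel /vmlinuz-2.6.32-431.el6.x86_64 ro root=/dev/mapper/vg_root-lv_root uhash_entries=65536

# After a reboot, the kernel boot log should confirm the larger table:
dmesg | grep -i "UDP hash table"
```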
Other OSs have similar issues. Check the documentation for your OS for configuration instructions.
The TURN Server is designed as a multi-threaded network packet routing system. But multi-threading is not always the best option for a particular system configuration. The default start-up TURN Server configuration is a compromise between memory, thread affinity, cache and CPU resources, and between typical "vanilla" OS networking stack implementations of TCP and UDP. It is not a specialized system tuned for particular hardware, a particular OS or a particular load pattern. The TURN Server design is tuned for a very wide range of possible applications, so it may not be 100% optimal for a particular application on a particular platform, but it should be good enough to be usable everywhere.
The TURN server parameter -m allows tuning of the threading configuration. You can turn off multi-threading with the "-m 0" parameter (or "--relay-threads=0"). It keeps all network packet processing within one thread, eliminating context switches, etc. - it may be a more efficient option for your system. There will still be separate authentication threads, because the authentication process must not hold up normal packet routing.
You can use "-m 0" option and run multiple TURN servers on your system, one per CPU core. That would be probably the best performance option in terms of scalability. Each TURN server must have its own network listening address and its own relay IP and/or relay ports range - so the configuration will be complicated - but the performance will be the best possible. You can use ALTERNATE-SERVER mechanism to present the whole "pack" as a single "initial" TURN server front-end to the external world.
For a large-scale TURN server, efficient handling of multiple sockets may become the most serious problem. The TURN server uses the libevent2 library for that purpose. Libevent2 utilizes the most efficient event multiplexing facility available on the current platform: epoll on Linux, kqueue on BSD and Mac OS X, event ports on Solaris, or poll/select as a fallback.
Among those facilities, kqueue is probably the most advanced:
Scalable Event Multiplexing: epoll vs. kqueue
So with a huge number of sockets, a BSD system may be an attractive option, especially when TCP protocol performance is critical. For example, DragonFly BSD is a performance-oriented BSD variant. FreeBSD is a good choice, too.
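If you want to confirm which multiplexing backend libevent2 has actually picked on your platform, a tiny illustrative C program (not part of the TURN server itself) can report it:

```c
/* Illustrative only: print the event multiplexing backend libevent2 selects
 * on this platform (e.g. "epoll" on Linux, "kqueue" on BSD and Mac OS X). */
#include <stdio.h>
#include <event2/event.h>

int main(void) {
    struct event_base *base = event_base_new();
    printf("libevent %s is using: %s\n",
           event_get_version(), event_base_get_method(base));
    event_base_free(base);
    return 0;
}
```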
When the client traffic is mostly over UDP (or DTLS), then the most recent Linux kernel versions (with the Google networking patch) may have an advantage. Recent Linux kernels allow UDP processing in a multi-threaded environment over just a few "front-end" sockets (all sessions belonging to the same thread share the same UDP socket). The fewer sockets there are, the more efficiently the event multiplexing library can process the traffic. So systems with the Google kernel patch have an edge (CentOS 6.5, Arch Linux and Fedora, at the time of this writing).
Our combined tests (which include various types of load) usually show that FreeBSD 9.x and DragonFly BSD are the best performers as a platform for the TURN Server. It is very difficult to analyze why. But, of course, your mileage may vary.