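DNS server performance testing

I had the chance to build a caching DNS server for an event NOC, so I tried tuning it.

Because the configuration below forwards every query to Google Public DNS (8.8.8.8), what I measured ended up close to a load test of Google Public DNS itself. Please treat only the method as a reference.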

Server-side environment setup

Specs

  • OS: CentOS Linux release 7.7.1908 (Core)
  • CPU: 4 cores
  • Memory: 4 GB
  • Storage: 100 GB
    • Swap: 1.6 GB
  • Network: GigabitEthernet x1

Installing Unbound

The packaged version is old, so ideally Unbound should be built from source.

yum update
yum install unbound

Create the configuration file

/etc/unbound/unbound.conf

## The server clause sets the main parameters.
server:
	verbosity: 1
	statistics-interval: 0
	statistics-cumulative: no
	extended-statistics: yes

	interface: 0.0.0.0
	interface: ::0

	interface-automatic: no
	so-reuseport: yes
	ip-transparent: yes

	# use all CPUs
	# equal to `ls /dev/cpu/ | wc -l`
	num-threads: 4

	# power of 2 close to num-threads
	msg-cache-slabs: 4096
	rrset-cache-slabs: 4096
	infra-cache-slabs: 4096
	key-cache-slabs: 4096

	# more cache memory, rrset=msg*2
	rrset-cache-size: 100m
	msg-cache-size: 50m

	# more outgoing connections
	# depends on number of cores: 1024/cores - 50
	outgoing-range: 206

	# Larger socket buffer.  OS may need config.
	so-rcvbuf: 4m

	# extend tcp connections
	incoming-num-tcp: 1000
	outgoing-num-tcp: 1000

	# IPv4
	access-control: 0.0.0.0/0 refuse
	access-control: 127.0.0.0/8 allow

	# IPv6
	access-control: ::0/0 refuse
	access-control: ::1 allow
	access-control: ::ffff:127.0.0.1 allow

	username: "unbound"
	directory: "/etc/unbound"
	chroot: ""

	# Log identity to report. if empty, defaults to the name of argv[0]
	# (usually "unbound").
	# log-identity: ""

	# print UTC timestamp in ascii to logfile, default is epoch in seconds.
	log-time-ascii: yes

	# print one line with time, IP, name, type, class for every query.
	log-queries: yes

	# print one line per reply, with time, IP, name, type, class, rcode,
	# timetoresolve, fromcache and responsesize.
	# log-replies: no

	harden-glue: yes
	harden-dnssec-stripped: yes
	harden-below-nxdomain: yes
	harden-referral-path: yes

	unwanted-reply-threshold: 10000000

	prefetch: yes
	prefetch-key: yes
	rrset-roundrobin: yes

	# if yes, Unbound doesn't insert authority/additional sections
	# into response messages when those sections are not required.
	minimal-responses: yes
	module-config: "ipsecmod validator iterator"
	trust-anchor-signaling: yes
	trusted-keys-file: /etc/unbound/keys.d/*.key
	auto-trust-anchor-file: "/var/lib/unbound/root.key"

	val-clean-additional: yes
	val-permissive-mode: no
	val-log-level: 1

	include: /etc/unbound/local.d/*.conf

	ipsecmod-enabled: no
	ipsecmod-hook: "/usr/libexec/ipsec/_unbound-hook"

## Remote control config section.
remote-control:
	control-enable: no

## Stub and Forward zones
include: /etc/unbound/conf.d/*.conf

## Forward zones
forward-zone:
	name: "."
	forward-addr: 8.8.8.8
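
The comments in the configuration above give rules of thumb for a few thread-related values. As a minimal sketch, the shell functions below reproduce that arithmetic for a given core count. The function names and the CORES variable are hypothetical, and rounding down to a power of two is only one reading of "power of 2 close to num-threads". The tested configuration sets the slab values to 4096, so the slab figure printed here is what the comment's rule suggests, not what was measured.

```shell
#!/bin/sh
# Sketch: derive thread-related unbound.conf values from a core count.
# Nothing is written to disk; the values are only printed.

# Largest power of two that does not exceed the thread count
# (an assumed interpretation of "power of 2 close to num-threads").
slabs_for_threads() {
    slabs=1
    while [ $((slabs * 2)) -le "$1" ]; do
        slabs=$((slabs * 2))
    done
    echo "$slabs"
}

# Rule quoted in the config comment: 1024 / cores - 50.
outgoing_range_for_cores() {
    echo $((1024 / $1 - 50))
}

cores=${CORES:-4}   # assumption: 4 cores, as on the test server
echo "num-threads: $cores"
echo "msg-cache-slabs: $(slabs_for_threads "$cores")"
echo "outgoing-range: $(outgoing_range_for_cores "$cores")"
```

With 4 cores the outgoing-range rule gives 206, which matches the value in the configuration.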

Checking file descriptors

Reference: https://tweeeety.hateblo.jp/entry/20131220/1387508776

# cat /proc/sys/fs/file-max
381622
#
# cat /proc/sys/fs/file-nr
1248	0	381622
# ps aux | grep unb
unbound  13949  0.2  8.2 1066060 321520 ?      Ssl  22:04   0:00 /usr/sbin/unbound -d
root     13958  0.0  0.0 112728   968 pts/0    S+   22:07   0:00 grep --color=auto unb
#
# cat /proc/13949/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15065                15065                processes
Max open files            12886                12886                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15065                15065                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
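
To check the limit without reading the whole table, the open-files line can be pulled out on its own. This is a minimal sketch: the function name max_open_files is hypothetical, and the pidof call in the usage comment assumes a single running unbound process.

```shell
# Print the soft "Max open files" limit from /proc/<pid>/limits-style
# text supplied on standard input.
max_open_files() {
    awk '/^Max open files/ { print $4 }'
}

# Usage (assumes one running unbound process):
#   max_open_files < "/proc/$(pidof unbound)/limits"
```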

Client-side environment setup

Specs

  • OS: CentOS Linux release 7.7.1908 (Core)
  • CPU: 4 cores
  • Memory: 16 GB
  • Storage: 100 GB
    • Swap: 7.8 GB
  • Network: GigabitEthernet x1

Installing the load testing tool

I used dnsperf, a load testing tool for DNS servers.

https://www.dns-oarc.net/tools/dnsperf

Install the build dependencies

yum install -y bind-devel krb5-devel openssl-devel libcap-devel libxml2-devel json-c-devel GeoIP-devel

Build

git clone https://github.com/DNS-OARC/dnsperf.git
cd dnsperf
./autogen.sh
./configure
make
make install

Creating the query list

I took a list of frequently queried domains from the following repository.

https://github.com/opendns/public-domain-lists

git clone https://github.com/opendns/public-domain-lists
cd public-domain-lists/

# Fix up the dataset: append the record type A to every line
sed -i 's/$/ A/g' opendns-top-domains.txt
sed -i 's/$/ A/g' opendns-random-domains.txt
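
The sed commands above rewrite the lists in place, so running them a second time would append the type twice. As an alternative sketch, the function below writes dnsperf input lines (name, then record type) to standard output and leaves the source list untouched. The function name and the output file name in the usage comment are hypothetical.

```shell
# Turn a list of domain names (one per line, on stdin) into dnsperf
# input lines of the form "name type". Blank lines are skipped and only
# the first field of each line is used, so re-running it is harmless.
# The record type defaults to A.
to_dnsperf_queries() {
    qtype=${1:-A}
    awk -v t="$qtype" 'NF { print $1, t }'
}

# Usage:
#   to_dnsperf_queries A < opendns-top-domains.txt > top-queries.txt
```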

Running the load tests

Run 1

$ dnsperf -s 10.1.1.21 -S 1 -d public-domain-lists/opendns-top-domains.txt

Statistics:

  Queries sent:         10000
  Queries completed:    9988 (99.88%)
  Queries lost:         12 (0.12%)

  Response codes:       NOERROR 9107 (91.18%), SERVFAIL 158 (1.58%), NXDOMAIN 723 (7.24%)
  Average packet size:  request 30, response 65
  Run time (s):         34.363365
  Queries per second:   290.658380

  Average Latency (s):  0.299928 (min 0.001580, max 4.893059)
  Latency StdDev (s):   0.487916

With a cold cache, throughput was 290 qps.
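
The reported rate is consistent with completed queries divided by run time, which is handy when comparing runs of different lengths. A small sketch of that arithmetic using awk; the function name qps is hypothetical.

```shell
# Completed queries divided by run time in seconds, to one decimal place.
qps() {
    awk -v n="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", n / secs }'
}

qps 9988 34.363365   # figures from the first run
```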

Run 2

$ dnsperf -s 10.1.1.21 -S 1 -d public-domain-lists/opendns-top-domains.txt

Statistics:

  Queries sent:         10000
  Queries completed:    9968 (99.68%)
  Queries lost:         32 (0.32%)

  Response codes:       NOERROR 9121 (91.50%), SERVFAIL 113 (1.13%), NXDOMAIN 734 (7.36%)
  Average packet size:  request 30, response 65
  Run time (s):         8.277772
  Queries per second:   1204.188760

  Average Latency (s):  0.039892 (min 0.000070, max 4.847847)
  Latency StdDev (s):   0.269064

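Thanks to the cache, throughput rose to 1204 qps. To clear the cache between runs, restart Unbound or flush it with unbound-control (for example unbound-control reload or unbound-control flush_zone .). Note that unbound-control requires control-enable: yes, which is set to no in the configuration above.

Run 3

I cleared the cache before this run.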

$ dnsperf -s 10.1.1.21 -q 200 -d public-domain-lists/opendns-top-domains.txt

Statistics:

  Queries sent:         10000
  Queries completed:    9991 (99.91%)
  Queries lost:         9 (0.09%)

  Response codes:       NOERROR 9116 (91.24%), SERVFAIL 144 (1.44%), NXDOMAIN 731 (7.32%)
  Average packet size:  request 30, response 65
  Run time (s):         15.859268
  Queries per second:   629.978635

  Average Latency (s):  0.230750 (min 0.000303, max 4.774134)
  Latency StdDev (s):   0.379951

CPU and memory still had headroom, so I raised the -q option (maximum number of outstanding queries, default 100) to 200. The result was 629 qps.
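
One way to see why -q matters: dnsperf keeps at most q queries in flight, so by Little's law the throughput cannot exceed q divided by the average latency. The sketch below computes that rough ceiling. The function name is hypothetical, and the bound ignores timeouts and ramp-up, so treat it as a back-of-the-envelope check rather than a prediction.

```shell
# Rough throughput ceiling: outstanding-query limit divided by the
# average latency in seconds, rounded to a whole number.
qps_ceiling() {
    awk -v q="$1" -v lat="$2" 'BEGIN { printf "%.0f\n", q / lat }'
}

qps_ceiling 100 0.299928   # first run, default -q 100
qps_ceiling 200 0.230750   # third run, -q 200
```

Both measured cold-cache rates (290 and 629 qps) sit below the ceilings computed this way.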

Run 4

$ dnsperf -s 10.1.1.21 -q 200 -d public-domain-lists/opendns-top-domains.txt

Statistics:

  Queries sent:         10000
  Queries completed:    9957 (99.57%)
  Queries lost:         43 (0.43%)

  Response codes:       NOERROR 9120 (91.59%), SERVFAIL 103 (1.03%), NXDOMAIN 734 (7.37%)
  Average packet size:  request 30, response 65
  Run time (s):         6.805150
  Queries per second:   1463.156580

  Average Latency (s):  0.047700 (min 0.000070, max 4.923105)
  Latency StdDev (s):   0.261557

Thanks to the cache, throughput was 1463 qps.

Run 5

I cleared the cache before this run.

$ dnsperf -s 10.1.1.21 -q 300 -d public-domain-lists/opendns-top-domains.txt

Statistics:

  Queries sent:         10000
  Queries completed:    9958 (99.58%)
  Queries lost:         42 (0.42%)

  Response codes:       NOERROR 9102 (91.40%), SERVFAIL 137 (1.38%), NXDOMAIN 719 (7.22%)
  Average packet size:  request 30, response 65
  Run time (s):         17.690169
  Queries per second:   562.911524

  Average Latency (s):  0.406831 (min 0.004011, max 4.923452)
  Latency StdDev (s):   0.575371

Throughput was 562 qps, which showed no real difference from -q 200.

Run 6

$ dnsperf -s 10.1.1.21 -q 300 -d public-domain-lists/opendns-top-domains.txt

Statistics:

  Queries sent:         10000
  Queries completed:    9958 (99.58%)
  Queries lost:         42 (0.42%)

  Response codes:       NOERROR 9120 (91.58%), SERVFAIL 104 (1.04%), NXDOMAIN 734 (7.37%)
  Average packet size:  request 30, response 65
  Run time (s):         5.823530
  Queries per second:   1709.959423

  Average Latency (s):  0.048118 (min 0.000083, max 4.958302)
  Latency StdDev (s):   0.261235

Thanks to the cache, throughput was 1709 qps.
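
To compare cold-cache and warm-cache runs without copying numbers by hand, the rate can be extracted from saved dnsperf output. This is a sketch that assumes the output format shown above; the function names and log file names are hypothetical.

```shell
# Extract the "Queries per second" value from dnsperf output on stdin.
dnsperf_qps() {
    awk -F': *' '/Queries per second:/ { print $2 }'
}

# Warm-cache rate divided by cold-cache rate, to two decimal places.
speedup() {
    awk -v cold="$1" -v warm="$2" 'BEGIN { printf "%.2f\n", warm / cold }'
}

# Usage (run1.log and run2.log are hypothetical saved outputs):
#   speedup "$(dnsperf_qps < run1.log)" "$(dnsperf_qps < run2.log)"
```

For the first two runs (290.658380 and 1204.188760 qps) this gives a ratio of about 4.1.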

Related articles

Material published by someone at SCSK was a useful reference.