BXadmin:AFS performance testing
From CCGB
Conversation with Matt Benjamin detailing testing work to be done for cache-bypass on 1.5.x:
(12:58:08) matt: Well, then 1.5.x is your branch. I'd be interested in getting comparative results first with the unmodified branch, on memcache with an appropriate chunksize (e.g., 18?), with and without cache bypass enabled. You can use one cm binary, built with --enable-cache-bypass, setting a low fs bypassthreshold to enable bypassing. Read vs. mixed read-write workloads are interesting; the latter should have a significant penalty, so ideally the workload should be read-heavy. Then, repeating with the cache-bypass refcounting patch in gerrit.
(12:58:30) matt: When we have a connection pooling patch worth running, repeating with that.
(12:59:31) matt: The refcounting patch should have no measurable effect on performance, so starting from 1.5.x + refcounting would probably be reasonable.
(13:00:34) phalenor: do you care about disk cache performance?
(13:02:14) phalenor: and when you say with and without cache bypass, do you mean with and without --enable-cache-bypass at build time?
(13:02:59) matt: with and without: no, with fs bypassthreshold at the default (0? -1? don't recall...) vs. with some small value (which enables bypassing)
(13:05:02) phalenor: okay. so the only stumbling block I see then is I don't have any machines handy with a new enough autoconf, etc. to run regen.sh, though that could be rectified, I suppose
(13:05:52) matt: But as regards disk: you get massive "improvement", but relative to memcache, the results are unrealistically scaled. It should work regardless, and it reduces disk workload, obviously, though that's now background work since Simon's changes of last summer.
(13:06:01) matt: Yeah, you just need to put an autoconf somewhere...
(13:07:05) matt: And parallel fetching is still happening, of course. I'm just admitting that I barely ran cache-bypassing with disk cache.
(13:07:49) phalenor: for the most part, our 'big' machines run with around half a gig of memcache because they have the memory to spare (some 16G, most 32, one 64); workstations and machines on 100Mb are still disk cache, as even with 1.4, disk cache becomes network bound
(13:08:02) matt: Yes, that's the ticket.
(13:08:29) matt: Oh, and you want to increase -daemons, esp. with the more-calls patch to come.
(13:09:00) phalenor: right now we're running with 12, so more than that?
(13:09:26) shadow@gmail.com/owl1EA1D463: -daemons 12 is probably fine for now.
(13:09:28) matt: Actually, that's probably fine. Worth looking into, perhaps.
(13:10:09) phalenor: I haven't tested while varying that number, but I suppose I could fiddle with it a bit
(13:10:15) matt: With clones, I never used more than 3xRX_MAXCALLS = 12 anyway, btw.
(13:10:33) matt: So there would be no improvement, unless we were starving something else.
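The setup Matt describes above might look roughly like the following. This is a sketch, not a verified recipe: the exact spelling of the fs subcommand and its disable value vary across builds (the chat itself is unsure whether the default is 0 or -1), so check the binaries you built.

 # Build one cache manager binary with cache bypass compiled in
 ./configure --enable-cache-bypass
 make && make install
 
 # Bypassing stays off until a small threshold is set; files at or
 # above the threshold are then fetched directly, skipping the cache.
 fs bypassthreshold 1     # enable: bypass fetches for files >= 1 byte
 fs bypassthreshold -1    # disable again (assumed default, per the chat)

With one binary, the "with vs. without bypass" comparison is then just a matter of toggling the threshold between runs, as Matt notes at 13:02:59.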
iozone
http://www.bx.psu.edu/~phalenor/afs_performance_results/
/afs/bx.psu.edu/user/phalenor/afs_performance/results
Apache
1.4.x vs. 1.5.x vs. 1.6.x
web-1 and web-2 are 32-bit CentOS 5.5 VMs running under VMWare ESXi.
Test was performed with wget against a 500MB ISO living in a public_html directory in my home directory, served out by 1.4.11 on Solaris 10u8, ZFS on 4GFC.
- web-1: 1.4.12.1, 1GB 2GFC disk cache, 4 vCPUs, pcnet32, -stat 9600 -daemons 6 -volumes 512 -chunksize 19
  - wget: 2.03MB/s
  - iperf: ~80Mb/s
  - (high load average, possibly because of pcnet32 emulation?)
- web-2: 1.5.77, 1GB 2GFC disk cache, 4 vCPUs, vmxnet, -stat 9600 -daemons 12 -volumes 512 -chunksize 19 -rxpck 2048
  - wget: 14.0MB/s
  - iperf: >400Mb/s
  - wget (cold cache): 3.11MB/s
- 1.6.0pre2
  - wget (cold cache): 5.53MB/s
  - curl (warm cache): 32MB/s
  - curl (cache bypass): 35MB/s
- web-1: 1.4.12.1, 1GB 2GFC disk cache, 4 vCPUs, vmxnet, -stat 9600 -daemons 6 -volumes 512 -chunksize 19
  - iperf: >400Mb/s
  - wget: 2.99MB/s
- web-2: bypassthreshold=1 (crashed)
  - wget: 44.9MB/s
- web-2: 1.6.0pre4, 1GB 2GFC disk cache, 4 vCPUs, vmxnet, -stat 9600 -daemons 12 -volumes 512 -chunksize 19 -rxpck 2048
  - Test file is 1GB in size, in a volume on our fastest fileserver (fs8)
  - wget (cold cache): 6.12MB/s; httpd was using ~30% CPU
  - wget (warm cache): 30.6MB/s (peaked around 40MB/s); httpd was using ~97% CPU
  - wget (cache bypassthreshold=1, cachesize of 1): 26.2MB/s, peaking around 40MB/s; httpd was using ~95% CPU
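For a sense of scale, the wall-clock time these rates imply for the 500MB test file can be recomputed with a quick awk one-liner (rates taken straight from the results above):

 # Seconds to transfer a 500 MB file at the measured wget rates
 # (web-1 on 1.4.12.1 vs. web-2 on 1.5.77)
 awk 'BEGIN {
   size = 500;                          # MB
   printf "1.4.x: %.0f s\n", size / 2.03;
   printf "1.5.x: %.0f s\n", size / 14.0;
 }'

That is roughly 246 s vs. 36 s for the same file, though note the 1.4.x number is confounded by the pcnet32 vs. vmxnet NIC difference until the later web-1 vmxnet run.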
misc
1.6.0pre3 64-bit Linux client, 655360-block memcache; 1.4.14 fileserver on osiris
$ time dd if=/dev/zero of=test bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 134.894 seconds, 79.6 MB/s

real    2m15.352s
user    0m0.008s
sys     1m26.993s

$ ls -lh ../../software/samfs/*
-rw-r--r-- 1 phalenor phalenor 465M Apr 8 2010 ../../software/samfs/SUN_SAM-FS_5.1.iso

$ time cp ../../software/samfs/SUN_SAM-FS_5.1.iso ./

real    0m22.332s
user    0m0.050s
sys     0m4.742s

$ time dd if=SUN_SAM-FS_5.1.iso of=/dev/null bs=1M
464+1 records in
464+1 records out
487266304 bytes (487 MB) copied, 0.187631 seconds, 2.6 GB/s

real    0m0.189s
user    0m0.001s
sys     0m0.188s

$ fs flush SUN_SAM-FS_5.1.iso
$ time dd if=SUN_SAM-FS_5.1.iso of=/dev/null bs=1M
464+1 records in
464+1 records out
487266304 bytes (487 MB) copied, 6.86487 seconds, 71.0 MB/s

real    0m6.867s
user    0m0.001s
sys     0m0.829s

$ time dd if=SUN_SAM-FS_5.1.iso of=/dev/null bs=1M
464+1 records in
464+1 records out
487266304 bytes (487 MB) copied, 0.188059 seconds, 2.6 GB/s

real    0m0.189s
user    0m0.001s
sys     0m0.190s
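The rates dd reports can be sanity-checked from the byte counts and elapsed times in the transcript (dd uses decimal MB/GB):

 # Recompute dd's reported read rates from the numbers above
 awk 'BEGIN {
   bytes = 487266304;
   printf "flushed:   %.1f MB/s\n", bytes / 6.86487  / 1e6;  # after fs flush
   printf "unflushed: %.1f GB/s\n", bytes / 0.188059 / 1e9;  # repeat read
 }'

The ~37x gap between the flushed and unflushed reads is the point of the exercise: the repeat reads at 2.6 GB/s are presumably served from the client's memcache/page cache, while the post-flush read at 71 MB/s is refetching from the fileserver.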