Conversation with Matt Benjamin detailing testing work to be done for cache-bypass on 1.5.x:

<pre>(12:58:08) matt: Well, then 1.5.x is your branch.  I'd be interested in getting comparative results first with the unmodified branch, on memcache with appropriate
chunksize (e.g., 18?), with and without cache bypass enabled.  You can use one cm binary, built with --enable-cache-bypass, setting a low fs bypasthreshold to enable bypassing.  Read vs. mixed read-write workloads are interesting--the latter should have a significant penalty, should ideally, workload should be read-heavy. Then, repeating with cache-bypass refcounting patch in gerrit.
(12:58:30) matt: When we have a connection pooling patch worth running, repeating with thiat.
(12:58:32) matt: that
(12:59:31) matt: The refcounting patch should have no measurable effect on performance, starting from 1.5.x + refcounting would probably be reasonable.
(13:00:34) phalenor: do you care about disk cache performance?
(13:02:14) phalenor: and when you say with and without cache bypass, do you mean with and without --enable-cache-bypass at build time?
(13:02:59) matt: with and without:  no, with fs bypassthresh default (0?, -1?  don't recall...) or vs. with some small value (which enables bypassing)
(13:05:02) phalenor: okay. so the only stumbling block I see then is I don't have any machines handy with new enough autoconf, etc to run regen.sh, though that could be rectified I suppose
(13:05:52) matt: But as regards disk.  You get massive "improvement" but relative to memcache, the results are unrealistically scaled.  But it should work regardless, and reduces disk workload, obviously, though it's now background work, since Simon's changes of summer.
(13:06:01) matt: Yeah, you just need to put an autoconf somewhere...
(13:07:05) matt: And parallel fetching is still happening, of course.  I'm just admitting that I barely ran cache-bypassing with disk cache.
(13:07:49) phalenor: for the most part, our 'big' machines run with around a half gig of memcache because they have the memory to spare (some 16G, most 32, one 64), workstations and machines on 100Mb are still disk cache, as even with 1.4 disc cache becomes network bound
(13:08:02) matt: Yes, that's the ticket.
(13:08:29) matt: Oh, and you want to increase -daemons, esp. with more calls patch to come.
(13:09:00) phalenor: right now we're running with 12, so more than that?
(13:09:26) shadow@gmail.com/owl1EA1D463: -daemons 12 is probably fine for now.
(13:09:28) matt: Actually, that's probably fine.  Worth looking into, perhaps.
(13:10:09) phalenor: I haven't tested while varying that number, but I suppose I could fiddle with it a bit
(13:10:15) matt: With clones, I never used more than 4xRX_MAXCALLS = 12 anyway, btw.
(13:10:33) matt: So there would be no improvement, unless we were starving something else.
(13:10:55) matt: sorry, 3x</pre>
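For reference, a minimal sketch of the client setup the conversation describes: build the 1.5.x cache manager with cache bypass compiled in, then toggle bypassing at runtime with a low threshold. Everything beyond regen.sh, --enable-cache-bypass, and fs bypassthreshold is an assumption, not the exact build that was used:

<pre>
# Build a 1.5.x cache manager with cache-bypass support compiled in
# (other configure options omitted; site-specific choices are assumed).
./regen.sh
./configure --enable-cache-bypass
make && make install

# With the client running, a small threshold turns bypassing on for files
# larger than that many bytes; the default (0 or -1, per the chat) leaves it off.
fs bypassthreshold 1
</pre>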

= iozone =

http://www.bx.psu.edu/~phalenor/afs_performance_results/

/afs/bx.psu.edu/user/phalenor/afs_performance/results
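The raw numbers are at the links above. For context, a hedged sketch of the kind of iozone invocation used for this sort of testing; the file-size cap and target path are illustrative assumptions rather than the exact parameters behind the published results:

<pre>
# Illustrative only: automatic mode across record sizes, test file in AFS,
# file sizes capped at 2 GB, fsync included in timings (-e).
iozone -a -e -g 2g -f /afs/bx.psu.edu/user/phalenor/iozone.tmp > iozone.out
</pre>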

= Apache =

== 1.4.x vs 1.5.x vs 1.6.x ==

web-1 and web-2 are 32-bit CentOS 5.5 VMs running under VMware ESXi.

Test was performed with wget against a 500MB iso living in a public_html directory in my home directory, served out by 1.4.11 on Solaris 10u8, ZFS on 4GFC.
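The numbers below were gathered roughly as follows; the hostnames and exact URL are assumptions (any iperf server on the same network segment and any file under public_html would do):

<pre>
# Raw network throughput from the web VM (assumes an iperf server is
# already listening on the target host).
iperf -c fs8

# HTTP throughput for the AFS-backed file: fetch through Apache, discard output.
wget -O /dev/null http://web-2/~phalenor/test.iso
</pre>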

* web-1: 1.4.12.1, 1GB 2GFC disk cache, 4 vCPUs, pcnet32, -stat 9600 -daemons 6 -volumes 512 -chunksize 19
** wget: 2.03MB/s
** iperf: ~80Mb/s
** (high load average because of pcnet32 emulation?)
* web-2: 1.5.77, 1GB 2GFC disk cache, 4 vCPUs, vmxnet, -stat 9600 -daemons 12 -volumes 512 -chunksize 19 -rxpck 2048
** wget: 14.0MB/s
** iperf: >400Mb/s
** wget (cold cache): 3.11MB/s
** 1.6.0pre2
*** wget (cold cache): 5.53MB/s
*** curl (warm cache): 32MB/s
*** curl (cache bypass): 35MB/s
* web-1: 1.4.12.1, 1GB 2GFC disk cache, 4 vCPUs, vmxnet, -stat 9600 -daemons 6 -volumes 512 -chunksize 19
** iperf: >400Mb/s
** wget: 2.99MB/s
* web-2: bypassthreshold=1 (crashed)
** wget: 44.9MB/s
* web-2: 1.6.0pre4, 1GB 2GFC disk cache, 4 vCPUs, vmxnet, -stat 9600 -daemons 12 -volumes 512 -chunksize 19 -rxpck 2048
** Test file is 1GB in size, in a volume on our fastest fileserver (fs8)
** wget (cold cache): 6.12MB/s; httpd was using ~30% CPU
** wget (warm cache): 30.6MB/s (peaked around 40MB/s); httpd was using 97% CPU
** wget (cache bypassthreshold=1, cachesize of 1; client settings sketched below): 26.2MB/s (peaked around 40MB/s); httpd was using ~95% CPU
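A sketch of how the bypass measurement in the last item can be set up on the client. The afsd options are the ones listed above for web-2; the options-file location and the AFS path being flushed are assumptions:

<pre>
# afsd options for web-2 as listed above (file location is an assumption,
# e.g. /etc/sysconfig/afs on CentOS).
AFSD_ARGS="-stat 9600 -daemons 12 -volumes 512 -chunksize 19 -rxpck 2048"

# Once the client is up: shrink the disk cache and enable bypassing so
# large reads go straight to the fileserver instead of through the cache.
fs setcachesize 1        # cache size in 1K blocks, effectively no caching
fs bypassthreshold 1     # bypass the cache for files larger than 1 byte

# For cold-cache runs, flush the test file first (path illustrative).
fs flush /afs/bx.psu.edu/user/phalenor/public_html/test.iso
</pre>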

= misc =

1.6.0pre3 64-bit Linux client, 655360-block memcache

1.4.14 fileserver on osiris
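The client configuration for these runs, sketched with assumed afsd options; the 655360-block memcache is from above, chunk size 18 is the value suggested for memcache in the conversation, and the rest is illustrative:

<pre>
# Memory cache instead of disk cache; -blocks is in 1K units, so 655360
# blocks is roughly 640 MB of memcache. Remaining options are assumed.
afsd -memcache -blocks 655360 -chunksize 18 -stat 9600 -daemons 12
</pre>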

<pre>$ time dd if=/dev/zero of=test bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 134.894 seconds, 79.6 MB/s

real    2m15.352s
user    0m0.008s
sys     1m26.993s

$ ls -lh ../../software/samfs/*
-rw-r--r-- 1 phalenor       phalenor 465M Apr  8  2010 ../../software/samfs/SUN_SAM-FS_5.1.iso

$ time cp ../../software/samfs/SUN_SAM-FS_5.1.iso ./

real    0m22.332s
user    0m0.050s
sys     0m4.742s

$ time dd if=SUN_SAM-FS_5.1.iso of=/dev/null bs=1M
464+1 records in
464+1 records out
487266304 bytes (487 MB) copied, 0.187631 seconds, 2.6 GB/s

real    0m0.189s
user    0m0.001s
sys     0m0.188s

$ fs flush SUN_SAM-FS_5.1.iso
$ time dd if=SUN_SAM-FS_5.1.iso of=/dev/null bs=1M
464+1 records in
464+1 records out
487266304 bytes (487 MB) copied, 6.86487 seconds, 71.0 MB/s

real    0m6.867s
user    0m0.001s
sys     0m0.829s

$ time dd if=SUN_SAM-FS_5.1.iso of=/dev/null bs=1M
464+1 records in
464+1 records out
487266304 bytes (487 MB) copied, 0.188059 seconds, 2.6 GB/s

real    0m0.189s
user    0m0.001s
sys     0m0.190s
</pre>
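For repeatable runs, the flush-and-reread sequence above can be scripted; a minimal bash sketch, assuming the fs utility is in PATH and the working directory contains the ISO copied in above:

<pre>
#!/bin/bash
# Cold read after flushing the file from the AFS cache, then a warm re-read.
ISO=SUN_SAM-FS_5.1.iso

fs flush "$ISO"                      # drop the file's cached chunks
dd if="$ISO" of=/dev/null bs=1M      # cold-cache read from the fileserver
dd if="$ISO" of=/dev/null bs=1M      # warm re-read from the cache
</pre>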