dnetc on GeForce (CUDA) - だ・か・らっ、Dia“l”yだってばさ!

I check the CUDA client page every single day. (^^;A
Today, something changed.
That said, all that happened is that "(beta)" was added to the description.
That is no progress at all. orz
　
Reading Bugzilla Bug 4030 more carefully, I found benchmarks for the 8600GT and the 8800GT.
Both appear to have been tested with the client on Windows.
Here are the 8600GT results:

dnetc v2.9102-508-GTR-08121316 for Win32 (WindowsNT 5.1).
Using email address (distributed.net ID) 'philippe@faure.ca'
[Dec 19 00:05:16 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd busy wait).
[Dec 19 00:05:35 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd busy wait)
0.00:00:16.25 [81,527,027 keys/sec]
[Dec 19 00:05:35 UTC] RC5-72: using core #1 (CUDA 1-pipe 64-thd sleep dynamic).
[Dec 19 00:05:54 UTC] RC5-72: Benchmark for core #1 (CUDA 1-pipe 64-thd sleep dynamic)
0.00:00:16.25 [81,366,291 keys/sec]
[Dec 19 00:05:54 UTC] RC5-72: using core #2 (CUDA 1-pipe 128-thd busy wait).
[Dec 19 00:06:13 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 128-thd busy wait)
0.00:00:16.40 [81,815,103 keys/sec]
[Dec 19 00:06:13 UTC] RC5-72: using core #3 (CUDA 1-pipe 128-thd sleep dynamic).
[Dec 19 00:06:31 UTC] RC5-72: Benchmark for core #3 (CUDA 1-pipe 128-thd sleep dynamic)
0.00:00:16.48 [80,047,542 keys/sec]
[Dec 19 00:06:31 UTC] RC5-72: using core #4 (CUDA 1-pipe 256-thd busy wait).
[Dec 19 00:06:51 UTC] RC5-72: Benchmark for core #4 (CUDA 1-pipe 256-thd busy wait)
0.00:00:16.53 [79,930,236 keys/sec]
[Dec 19 00:06:51 UTC] RC5-72: using core #5 (CUDA 1-pipe 256-thd sleep dynamic).
[Dec 19 00:07:10 UTC] RC5-72: Benchmark for core #5 (CUDA 1-pipe 256-thd sleep dynamic)
0.00:00:16.73 [73,565,144 keys/sec]
[Dec 19 00:07:10 UTC] RC5-72: using core #6 (CUDA 2-pipe 64-thd busy wait).
[Dec 19 00:07:28 UTC] RC5-72: Benchmark for core #6 (CUDA 2-pipe 64-thd busy wait)
0.00:00:16.14 [82,719,683 keys/sec]
[Dec 19 00:07:28 UTC] RC5-72: using core #7 (CUDA 2-pipe 64-thd sleep dynamic).
[Dec 19 00:07:48 UTC] RC5-72: Benchmark for core #7 (CUDA 2-pipe 64-thd sleep dynamic)
0.00:00:17.10 [79,758,307 keys/sec]
[Dec 19 00:07:48 UTC] RC5-72: using core #8 (CUDA 2-pipe 128-thd busy wait).
[Dec 19 00:08:06 UTC] RC5-72: Benchmark for core #8 (CUDA 2-pipe 128-thd busy wait)
0.00:00:16.21 [71,020,560 keys/sec]
[Dec 19 00:08:06 UTC] RC5-72: using core #9 (CUDA 2-pipe 128-thd sleep dynamic).
[Dec 19 00:08:26 UTC] RC5-72: Benchmark for core #9 (CUDA 2-pipe 128-thd sleep dynamic)
0.00:00:16.93 [70,620,523 keys/sec]
[Dec 19 00:08:26 UTC] RC5-72: using core #10 (CUDA 4-pipe 64-thd busy wait).
[Dec 19 00:08:45 UTC] RC5-72: Benchmark for core #10 (CUDA 4-pipe 64-thd busy wait)
0.00:00:16.15 [76,264,222 keys/sec]
[Dec 19 00:08:45 UTC] RC5-72: using core #11 (CUDA 4-pipe 64-thd sleep dynamic).
[Dec 19 00:09:04 UTC] RC5-72: Benchmark for core #11 (CUDA 4-pipe 64-thd sleep dynamic)
0.00:00:16.39 [76,697,616 keys/sec]
[Dec 19 00:09:04 UTC] RC5-72: using core #12 (CUDA 4-pipe 128-thd busy wait).
[Dec 19 00:09:23 UTC] RC5-72: Benchmark for core #12 (CUDA 4-pipe 128-thd busy wait)
0.00:00:17.03 [70,631,611 keys/sec]
[Dec 19 00:09:23 UTC] RC5-72: using core #13 (CUDA 4-pipe 128-thd sleep dynamic).
[Dec 19 00:09:42 UTC] RC5-72: Benchmark for core #13 (CUDA 4-pipe 128-thd sleep dynamic)
0.00:00:16.51 [69,902,125 keys/sec]

And here are the 8800GT results:

MY System:
XP Pro 64, Version 2003, SP2
AMD X2 5200
8800GT

bench

distributed.net client for Win32 Copyright 1997-2008, distributed.net
Please visit http://www.distributed.net/ for up-to-date contest information.
Start the client with '-help' for a list of valid command line options.

dnetc v2.9102-508-CTR-08121911 for Win32 (WindowsNT 5.2).
Please provide the *entire* version descriptor when submitting bug reports.
The distributed.net bug report pages are at http://bugs.distributed.net/
Using email address (distributed.net ID) 'nodcs4me@yahoo.com'

[Dec 20 15:08:02 UTC] RC5-72: using core #0 (CUDA 1-pipe 128-thd).
[Dec 20 15:08:19 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 128-thd)
0.00:00:14.53 [299,875,439 keys/sec]
[Dec 20 15:08:19 UTC] RC5-72: using core #1 (CUDA 2-pipe 64-thd).
[Dec 20 15:08:36 UTC] RC5-72: Benchmark for core #1 (CUDA 2-pipe 64-thd)
0.00:00:14.28 [303,366,378 keys/sec]
[Dec 20 15:08:36 UTC] RC5-72: using core #2 (CUDA 1-pipe 64-thd).
[Dec 20 15:08:54 UTC] RC5-72: Benchmark for core #2 (CUDA 1-pipe 64-thd)
0.00:00:15.93 [273,870,661 keys/sec]
[Dec 20 15:08:54 UTC] RC5-72: using core #3 (CUDA 1-pipe 256-thd).
[Dec 20 15:09:11 UTC] RC5-72: Benchmark for core #3 (CUDA 1-pipe 256-thd)
0.00:00:14.93 [290,828,685 keys/sec]
[Dec 20 15:09:11 UTC] RC5-72: using core #4 (CUDA 2-pipe 128-thd).
[Dec 20 15:09:30 UTC] RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd)
0.00:00:16.06 [252,730,234 keys/sec]
[Dec 20 15:09:30 UTC] RC5-72: using core #5 (CUDA 1-pipe 64-thd sleep 100us).
[Dec 20 15:09:48 UTC] RC5-72: Benchmark for core #5 (CUDA 1-pipe 64-thd sle ...
0.00:00:16.18 [265,328,395 keys/sec]
[Dec 20 15:09:48 UTC] RC5-72: using core #6 (CUDA 1-pipe 64-thd sleep dynamic).
[Dec 20 15:10:07 UTC] RC5-72: Benchmark for core #6 (CUDA 1-pipe 64-thd sle ...
0.00:00:15.96 [274,440,495 keys/sec]
[Dec 20 15:10:07 UTC] RC5-72: using core #7 (CUDA 1-pipe 64-thd busy wait).
[Dec 20 15:10:25 UTC] RC5-72: Benchmark for core #7 (CUDA 1-pipe 64-thd bus ...
0.00:00:14.57 [300,691,526 keys/sec]
[Dec 20 15:10:25 UTC] RC5-72: using core #9 (CUDA 4-pipe 64-thd).
[Dec 20 15:10:43 UTC] RC5-72: Benchmark for core #9 (CUDA 4-pipe 64-thd)
0.00:00:16.18 [262,444,878 keys/sec]
[Dec 20 15:10:43 UTC] RC5-72: using core #10 (CUDA 4-pipe 128-thd).
[Dec 20 15:11:02 UTC] RC5-72: Benchmark for core #10 (CUDA 4-pipe 128-thd)
0.00:00:16.60 [202,495,659 keys/sec]
[Dec 20 15:11:02 UTC] RC5-72: using core #12 (CUDA 1-pipe 64-thd sleep gput ...
[Dec 20 15:11:21 UTC] RC5-72: Benchmark for core #12 (CUDA 1-pipe 64-thd sl ...
0.00:00:16.11 [267,409,707 keys/sec]

As for the 8600GT's raw key-crunching speed, it falls well short of the 9600GT result I showed on 12/20.
The 8800GT, on the other hand, easily outpaces the 9600GT.
Roughly compared:

8600GT: 80 Mkeys/s
8800GT: 270 Mkeys/s
9600GT: 170 Mkeys/s

That is how they stack up.
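The rough figures above can be cross-checked against the quoted logs. Here is a minimal sketch in Python, assuming the output looks like the logs pasted in this post (a "Benchmark for core #N" line followed by a line with "[... keys/sec]"). The function name `parse_benchmarks` and the two-entry `sample` are just for illustration, not anything that ships with dnetc.

```python
import re

# Patterns based only on the log lines quoted in this post.
CORE_RE = re.compile(r"Benchmark for core #(\d+)")
RATE_RE = re.compile(r"\[([\d,]+) keys/sec\]")

def parse_benchmarks(log_text):
    """Return {core number: keys/sec} from dnetc benchmark output."""
    rates = {}
    core = None
    for line in log_text.splitlines():
        m = CORE_RE.search(line)
        if m:
            core = int(m.group(1))  # remember which core the next rate belongs to
            continue
        m = RATE_RE.search(line)
        if m and core is not None:
            rates[core] = int(m.group(1).replace(",", ""))
            core = None
    return rates

# Two entries copied from the 8600GT log above.
sample = """\
[Dec 19 00:05:35 UTC] RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd busy wait)
0.00:00:16.25 [81,527,027 keys/sec]
[Dec 19 00:07:28 UTC] RC5-72: Benchmark for core #6 (CUDA 2-pipe 64-thd busy wait)
0.00:00:16.14 [82,719,683 keys/sec]
"""

rates = parse_benchmarks(sample)
best = max(rates, key=rates.get)
print(f"{len(rates)} cores, best is #{best} at {rates[best] / 1e6:.1f} Mkeys/s")
# prints: 2 cores, best is #6 at 82.7 Mkeys/s
```

Feeding it a whole pasted log instead of `sample` gives the number of cores benchmarked and the fastest one for that card.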
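Looking at prices, the 8800GT runs around ¥20k, so it is far more expensive than the 9600GT.
Is the difference in crunching speed worth that price gap...
Hard to say.
I will have to actually go to a store and check the price difference.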
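To put a number on "worth the price gap", here is a small sketch that works out the break-even price using only the rough figures from this post (8800GT at about ¥20k and 270 Mkeys/s, 9600GT at 170 Mkeys/s). The helper name `breakeven_price` is made up for this example, and the actual 9600GT price is still unknown until I check a store.

```python
def breakeven_price(price_fast, rate_fast, rate_slow):
    """Price at which the slower card gives the same keys/sec per yen as the faster one."""
    return price_fast * rate_slow / rate_fast

price_8800gt = 20_000  # yen, "around 20k" as stated in this post
rate_8800gt = 270      # Mkeys/s, rough figure from this post
rate_9600gt = 170      # Mkeys/s, rough figure from this post

limit = breakeven_price(price_8800gt, rate_8800gt, rate_9600gt)
print(f"speed ratio: {rate_8800gt / rate_9600gt:.2f}x")
print(f"9600GT break-even price: about {limit:,.0f} yen")
# prints:
# speed ratio: 1.59x
# 9600GT break-even price: about 12,593 yen
```

By these rough numbers, a 9600GT priced below that line gives more keys per yen, and one priced above it gives fewer.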
　
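But a bigger problem, or rather a question, comes up.
In the logs quoted above, it looks as though 14 cores on the 8600GT and 11 cores on the 8800GT are running in parallel.
In the example I was pointed to the other day, a 9600GT running under CUDA on Linux, it does not look like that many cores are in use.
In the upper photo on the linked page it appears to run on only 2 cores, and in the lower photo on only 1 core.
I suppose there is no telling until I actually try it myself...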