Analytics and In-Memory Databases Are Changing Data Centers
Posted: Tue Jan 04, 2011 11:39 am
ECRM Guide article
Interesting article... though most of us probably saw this coming.
Power and Cooling, Memory Needs Drive Change
The trend toward in-memory analytics is being driven in part by power, cooling and memory demands.
Power and cooling needs expand as the database grows: more and more servers are required, and power and cooling requirements grow with the server count. Larger and larger amounts of memory are also needed, given the number of cores and the processing power available in each CPU, and the more memory you require, the more power and cooling you need. The wattage of DDR3 populating even 4 DIMM slots can exceed the wattage of a single processor, depending on the processor, and using more than 4 DIMM slots per processor is becoming more common while core counts keep growing with no end in sight. If you assume you need 2 GB of memory per core for efficient analytics processing, a 12-core CPU will need 24 GB of memory. Matching the performance of main memory with flash drives or PCIe flash devices is not cost-effective, given the number of PCIe buses needed, the cost of the devices and the complexity of using them compared to just using memory.
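Just to put numbers on that sizing rule, here is a quick C sketch using the article's 2 GB-per-core assumption; the 12-core and two-socket figures are only illustrative examples, not a configuration from the article.
[code]
#include <stdio.h>

/* Rough per-node memory sizing using the article's 2 GB-per-core rule of
 * thumb. The core and socket counts below are illustrative only. */
int main(void)
{
    const int gb_per_core     = 2;   /* article's assumption            */
    const int cores_per_cpu   = 12;  /* e.g. a 12-core processor        */
    const int sockets_per_node = 2;  /* hypothetical two-socket server  */

    int gb_per_cpu  = gb_per_core * cores_per_cpu;     /* 24 GB */
    int gb_per_node = gb_per_cpu * sockets_per_node;   /* 48 GB */

    printf("Memory per CPU:  %d GB\n", gb_per_cpu);
    printf("Memory per node: %d GB\n", gb_per_node);
    return 0;
}
[/code]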
I/O Complexity Slows the Data Path
Another reason that many data analytics programs have moved to memory-only methods is I/O complexity and latency.
Historically, I/O has been thousands of times slower than memory in both latency and bandwidth. Even with flash devices, I/O latency is measured in tens to hundreds of microseconds, compared to on the order of 100 nanoseconds for main memory. Even if flash device latency improves, every access still has to go through the OS and the PCIe bus, compared to a direct hardware translation to address memory.
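To see the gap for yourself, a minimal (and very rough) C timing sketch like the one below compares a single 4 KB read through the normal I/O path with a 4 KB copy from RAM. The file path is a placeholder, and the numbers will swing wildly depending on whether the read is served from the page cache, so treat it as an illustration of the measurement, not a benchmark.
[code]
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Single-shot latency comparison: 4 KB via pread() vs. 4 KB via memcpy(). */
static double elapsed_us(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

int main(void)
{
    static char buf[4096], src[4096];   /* zero-initialized static buffers */
    struct timespec t0, t1;

    int fd = open("/tmp/testfile", O_RDONLY);   /* placeholder path */
    if (fd < 0) { perror("open"); return 1; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (pread(fd, buf, sizeof(buf), 0) < 0) perror("pread");
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("4 KB via the I/O path: %8.1f us\n", elapsed_us(t0, t1));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(buf, src, sizeof(src));              /* same 4 KB from RAM */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("4 KB from main memory: %8.1f us (byte 0 = %d)\n",
           elapsed_us(t0, t1), buf[0]);

    close(fd);
    return 0;
}
[/code]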
Knowledge of how to do I/O efficiently is limited because I/O programming is not taught in schools. Beyond the C library's fopen/fread/fwrite, not much more is taught from what I have seen, and even if high-performance, low-latency I/O programming were taught, the minimal interfaces available still impose significant limits on performance.
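As an example of what going beyond stdio can look like, here is a sketch of one large, aligned pread() opened with O_DIRECT to bypass the page cache. O_DIRECT is Linux-specific, its alignment rules vary by device, and the path and sizes here are placeholders, so this is only an illustration of the approach.
[code]
#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* One large, aligned read that bypasses the page cache, instead of many
 * small buffered fread() calls. Buffer, offset and length must be aligned
 * to the device's block size. */
int main(void)
{
    const size_t align  = 4096;        /* common logical block / page size */
    const size_t iosize = 1 << 20;     /* one 1 MB request                 */
    void *buf;

    if (posix_memalign(&buf, align, iosize) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);  /* placeholder path */
    if (fd < 0) { perror("open"); return 1; }

    ssize_t n = pread(fd, buf, iosize, 0);  /* one syscall, no stdio buffering */
    if (n < 0) perror("pread");
    else printf("read %zd bytes in a single request\n", n);

    close(fd);
    free(buf);
    return 0;
}
[/code]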
The cost of I/O in terms of operating system interrupt overhead, latency and the path through the I/O stack is another limitation. Whenever I/O is done, the operating system must be called to service it, which adds significant overhead and latency that cannot be eliminated given current operating system implementations.
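One common way to reduce the per-request system call cost is to map a file and touch it like memory: page faults still go through the OS, but there is no explicit call per access. A minimal C sketch of that idea follows; the path is a placeholder and the file must be non-empty.
[code]
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file and walk it with ordinary loads instead of read() calls. */
int main(void)
{
    int fd = open("/tmp/testfile", O_RDONLY);   /* placeholder path */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += (unsigned char)data[i];          /* plain memory loads */

    printf("checksum %ld over %lld bytes\n", sum, (long long)st.st_size);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
[/code]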
The problem is that the I/O stack has not changed much at all in 30 years. The data path today still runs from the application, through a call into the operating system and the POSIX file system layer, down to the SCSI/SATA driver and the device.
There are no major changes on the horizon for the I/O stack, which means that any application still has to go through interrupting the operating system, the POSIX file system layer and the SCSI/SATA driver. Some PCIe flash vendors have developed changes to the I/O stack, but they are proprietary, and I see nothing on the standards horizon that looks like a proposal, much less something all of the vendors can agree upon. The standards process is controlled by a myriad of different groups, so I have little hope of change, which is why storage will be relegated to checkpointing and restarting these in-memory applications. It is clear that data analytics cannot be accomplished efficiently using disk drives, or even flash drives, as you will have expensive CPUs sitting idle waiting for I/O.
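If storage really does end up relegated to checkpoint and restart, the access pattern is simple: one big sequential write forced to stable media, and one big sequential read on restart. A bare-bones C sketch of that pattern is below; the path and size are placeholders, and a production version would loop on short writes and reads.
[code]
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define WORKING_SET (64 * 1024 * 1024)   /* 64 MB stand-in for in-memory data */

int main(void)
{
    char *data = malloc(WORKING_SET);
    if (!data) return 1;
    memset(data, 0xAB, WORKING_SET);     /* pretend analytics state */

    /* Checkpoint: one sequential write, then force it to stable storage. */
    int fd = open("/tmp/checkpoint.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, data, WORKING_SET) != WORKING_SET) perror("write");
    if (fsync(fd) < 0) perror("fsync");
    close(fd);

    /* Restart: read the whole image back into memory in one pass. */
    fd = open("/tmp/checkpoint.img", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    ssize_t n = read(fd, data, WORKING_SET);
    printf("restored %zd bytes\n", n);
    close(fd);

    free(data);
    return 0;
}
[/code]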
Future Analytics Architectures
As data analytics demands more and more memory because of increasing CPU core counts, new memory technologies will need to be developed and brought to market to address the requirements. Technologies such as double-stacked DDR3, phase-change memory (PCM) and memristor are going to be required to meet the needs of this market. Data analytics is a memory-intensive application, and even high-performance storage does not have enough bandwidth to address the requirements. Combine that with the fact that data analytics applications are latency intolerant, and you have a memory-based application: the storage stack is high latency and OS hungry compared to memory, even slower memory such as PCM or memristor, and that latency cannot be engineered away.