Getting my geek on today... - Page 3

[ARCHIVED THREAD] - Getting my geek on today... (Page 3 of 13)

Posted: 6/7/2015 10:09:47 AM EDT

[#1]

Quoted:
Get massively parallel, download some free AI software from Google and IBM, and build an intelligent being

Eh, I've never seen those two words used in a sentence like that before. You are joking, right?

Posted: 6/7/2015 10:34:13 AM EDT

[#2]

Quoted:
Eh, I've never seen those two words used in a sentence like that before. You are joking, right?

IBM is letting people try Watson's analytics for free, but they're doing it on the SoftLayer infrastructure (IBM bought SL a year and a half ago). You can hand it a big ol' chunk of data and let it do its thing. I don't think they allow that code to leave and run on anything else.

Posted: 6/7/2015 11:00:46 AM EDT

[#3]

It does. It never ceases to amaze me.

You should consider writing about your adventures with the parallella.

Posted: 6/7/2015 12:52:40 PM EDT

[#4]

Quoted:
It does. It never ceases to amaze me.

You should consider writing about your adventures with the parallella.

I just posted over on their forum. I was able to squeeze out some performance in the eprime example app by unrolling the loop a bit in e_prime.c.
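
For anyone wondering what unrolling looks like, here is a minimal sketch. It is not the actual e_prime.c code; is_prime_simple and is_prime_unrolled are made-up names. It shows a trial-division loop where two candidate divisors are tested per pass, so the loop overhead is paid half as often.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical baseline: trial division by odd numbers, one per pass. */
bool is_prime_simple(unsigned n)
{
    assert(n <= 1000000000u);        /* keeps d * d from overflowing */
    if (n < 2) return false;
    if (n < 4) return true;          /* 2 and 3 */
    if (n % 2 == 0) return false;
    for (unsigned d = 3; d * d <= n; d += 2) {
        if (n % d == 0) return false;
    }
    return true;
}

/* Same test, unrolled by two: each pass checks d and d + 2. */
bool is_prime_unrolled(unsigned n)
{
    assert(n <= 1000000000u);        /* keeps d * d from overflowing */
    if (n < 2) return false;
    if (n < 4) return true;
    if (n % 2 == 0) return false;
    for (unsigned d = 3; d * d <= n; d += 4) {
        if (n % d == 0) return false;
        /* d + 2 may overshoot sqrt(n), but since d * d <= n and d >= 3 it is
           always smaller than n, so the extra check cannot give a false hit. */
        if (n % (d + 2) == 0) return false;
    }
    return true;
}
```

Whether this kind of change actually helps depends on the compiler and the core, so it is worth timing both versions.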

Posted: 6/7/2015 5:01:20 PM EDT

[#5]

If I could go back in time, I would have gotten into computational astrophysics. I love this stuff.

This simulation took a couple of months to process on a supercomputer.

It would probably take 10,000 years on a parallella.

Posted: 6/7/2015 5:03:15 PM EDT

[#6]

Cool.

I upgraded my home Cisco CUCM, Unity Connection and IM&P servers from version 9.1 to version 10.5 last night.

Posted: 6/7/2015 6:17:15 PM EDT

[#7]

Quoted:
Eh, I've never seen those two words used in a sentence like that before. You are joking, right?

Here is an example of free AI software from Google:

https://code.google.com/p/word2vec/

IBM's Watson, which won at Jeopardy, was built entirely with open source software.

Here is one article, but there are others describing exactly how to build it with free software:

http://www.aaai.org/Magazine/Watson/watson.php

Posted: 6/8/2015 3:05:37 PM EDT

[#8]

If any fellow geeks are interested in developing for FPGAs in C, my latest blog post just went up:

http://forums.xilinx.com/t5/Xcell-Daily-Blog/Adam-Taylor-s-MicroZed-Chronicles-Part-85-SDSoC-the-first/ba-p/633707

Posted: 6/8/2015 3:08:07 PM EDT

[#9]

Quoted:
If any fellow geeks are interested in developing for FPGAs in C, my latest blog post just went up:

http://forums.xilinx.com/t5/Xcell-Daily-Blog/Adam-Taylor-s-MicroZed-Chronicles-Part-85-SDSoC-the-first/ba-p/633707

Nice!

You misspelled 'installment'

Posted: 6/8/2015 3:18:40 PM EDT

[#10]

I blame the editor

Posted: 6/8/2015 3:31:21 PM EDT

[#11]

Quoted:
If any fellow geeks are interested in developing for FPGAs in C, my latest blog post just went up:

http://forums.xilinx.com/t5/Xcell-Daily-Blog/Adam-Taylor-s-MicroZed-Chronicles-Part-85-SDSoC-the-first/ba-p/633707

Thanks! You most surely need to be more active in the nerd threads! Everything from Arduino to FPGA gets a thread every other week or so around here.

Posted: 6/8/2015 3:40:02 PM EDT

[#12]

Got my server version parallella in the mail today.

Next week I will order the cluster case and start building that out.

Posted: 6/8/2015 3:47:08 PM EDT

[#13]

I will try to. I have a load of Arduinos floating around the study.

Posted: 6/8/2015 3:48:05 PM EDT

[#14]

Quoted:
Got my server version parallella in the mail today.

Next week I will order the cluster case and start building that out.

Very exciting! Any ideas what application you are going to run on it?

Posted: 6/8/2015 4:03:51 PM EDT

[#15]

Quoted:
Very exciting! Any ideas what application you are going to run on it?

lol, no.

I just like to tinker, doubt I will do anything that puts Adapteva on the map.

I like astronomy and astrophysics stuff and have found source code repositories for that on the web. I may start out trying to port some of those to parallella, though I think the lack of double precision in Epiphany may hinder some of that.

Posted: 6/8/2015 4:11:16 PM EDT

[#16]

Quoted:
lol, no.

I just like to tinker, doubt I will do anything that puts Adapteva on the map.

I like astronomy and astrophysics stuff and have found source code repositories for that on the web. I may start out trying to port some of those to parallella, though I think the lack of double precision in Epiphany may hinder some of that.

Interesting. My day job involves astronomy and space systems.

Posted: 6/8/2015 4:15:30 PM EDT

[#17]

Quoted:
Interesting. My day job involves astronomy and space systems.

I know nothing about the science of it. I just like a programming challenge, and those topics fascinate me enough to get my butt in gear and start coding.

I have a Wall Street background, so that is another possibility.

Posted: 6/8/2015 4:26:55 PM EDT

[#18]

"Wall Street background"? That sounds pretty cool.

Posted: 6/8/2015 4:34:50 PM EDT

[#19]

Quoted:
"Wall Street background"? That sounds pretty cool.

I worked on some forex trading systems, exotic option pricing, etc., for a big investment bank 20 years ago.

Too long ago to remember much, but I saw some CUDA example option pricing code I may play with.

Again, I think the lack of double precision is the biggest hindrance to Adapteva on these versions of Epiphany.

The low power requirements are cool, but serious work needs serious precision, and people who need that can pay for the power (Wall St) or get grants (academia).
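
To put the double precision point in concrete terms, here is a small sketch in plain C. Nothing in it is Epiphany-specific, and the function names are made up. A 32-bit float has a 24-bit significand, so above 2^24 it cannot represent every integer, and adding 1 can be lost entirely.

```c
#include <assert.h>

/* Adds 1 to 2^24 in single precision. volatile forces a real 32-bit store,
   so the result is rounded to float even if wider registers are used. */
float float_add_one_at_2_24(void)
{
    volatile float big = 16777216.0f;   /* 2^24 */
    volatile float sum = big + 1.0f;    /* 16777217 is not representable as a float */
    return sum;
}

/* Same sum in double precision, which has a 53-bit significand. */
double double_add_one_at_2_24(void)
{
    volatile double big = 16777216.0;
    volatile double sum = big + 1.0;
    return sum;
}

/* Self-check: the float sum loses the +1, the double sum keeps it. */
void precision_demo(void)
{
    assert(float_add_one_at_2_24() == 16777216.0f);
    assert(double_add_one_at_2_24() == 16777217.0);
}
```

In a long-running simulation or a pricing loop, the same effect shows up as small contributions silently vanishing from large running totals.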

Posted: 6/8/2015 4:41:15 PM EDT

[#20]

Quoted:
Too long ago to remember much, but I saw some CUDA example option pricing code I may play with.

CUDA coding... Maybe you can tell me why an app I wrote works on a GTX 460 but won't work on a GTX 960, even after being recompiled?

Am I missing something?

Posted: 6/8/2015 4:42:26 PM EDT

[#21]

Quoted:
CUDA coding... Maybe you can tell me why an app I wrote works on a GTX 460 but won't work on a GTX 960, even after being recompiled?

Am I missing something?

No idea. I'm just looking at the example code and trying to figure out how to shoehorn it into Epiphany.

Posted: 6/9/2015 4:07:15 PM EDT

[#22]

Just working on a pretty simple nbody simulation for now, since astronomy subjects interest me.

Once I get it working ok on one parallella, I will work on trying to get MPI to distribute it across multiple boards.

For me, the programming challenges are more interesting than the end result sometimes.

geeks are easily amused.

When I am kayaking or boating I am thinking about solutions to these kinds of challenges all the time.

Others are thinking 'tits' the whole time.

Posted: 6/9/2015 4:42:21 PM EDT

[#23]

Haven't shipped the other board yet. It should go out tomorrow.

Quoted:
Just working on a pretty simple nbody simulation for now, since astronomy subjects interest me.

Once I get it working ok on one parallella, I will work on trying to get MPI to distribute it across multiple boards.

For me, the programming challenges are more interesting than the end result sometimes.

geeks are easily amused.

When I am kayaking or boating I am thinking about solutions to these kinds of challenges all the time.

Others are thinking 'tits' the whole time.

Posted: 6/9/2015 5:03:39 PM EDT

[#24]

Quoted:
Haven't shipped the other board yet. It should go out tomorrow.

No problem, I really appreciate you sending it to me.

Posted: 6/9/2015 5:45:43 PM EDT

[#25]

Quoted:
Cool.

I upgraded my home Cisco CUCM, Unity Connection and IM&P servers from version 9.1 to version 10.5 last night.

I'm in the middle of going from 8.5 to 10.5 at work.

It's way past time.

Posted: 6/9/2015 7:36:01 PM EDT

[#26]

Quoted:
I'm in the middle of going from 8.5 to 10.5 at work.

It's way past time.

If you need any advice, let me know. I've done 6 or 8 of those.

Posted: 6/9/2015 10:14:15 PM EDT

[#27]

dbltap

Posted: 6/9/2015 10:15:11 PM EDT

[#28]

Quoted:
The code that runs on the epiphany cores

#include <stdio.h>
#include <stdlib.h>
#include "e_lib.h"

int main(void) {
    e_coreid_t coreid;
    unsigned row, col;
    char *outbuf;
    int sqr;
    int core;

    // Output buffer at offset 0x0000 of this core's own local memory
    outbuf = (char *) 0x0000;

    // Who am I? Query the CoreID from hardware.
    coreid = e_get_coreid();
    e_coords_from_coreid(coreid, &row, &col);

    // 1-based core number within the 4x4 workgroup, and its square
    core = (4 * row) + (col + 1);
    sqr = core * core;

    sprintf(outbuf, "Greetings from core 0x%03x! These are my coordinates: %u, %u - sqr = %d", coreid, row, col, sqr);

    return EXIT_SUCCESS;
}

Question: you have outbuf as a pointer to 0x0000. How does that give output? Or is it going to the 0x0100000 address from the C code?

I'm a little slow tonight.

Posted: 6/10/2015 6:29:50 AM EDT

[#29]

Quoted:
Question: you have outbuf as a pointer to 0x0000. How does that give output? Or is it going to the 0x0100000 address from the C code?

I'm a little slow tonight.

It is storing the string created by the sprintf() call in that epiphany core's local memory at address 0. This code runs on all 16 cores.

The host program running on the ARM can read each core's memory to retrieve the string and print it out.

In the host code, this is where I defined it to run on all 16 cores. That e_open() call creates a workgroup of cores that is 4 cores wide by 4 cores deep.

e_open(&dev, 0, 0, 4, 4);
e_reset_group(&dev);

And this is where it reads the string from the core's memory. row and col are the core's position in the 4x4 workgroup.

e_read(&dev, row, col, 0x0000, emsg, _BufSize);

I don't need this line in the code posted on page 2: e_alloc(&emem, _BufOffset, _BufSize);

The original example I was hacking up was passing the string back in shared DRAM, which is another way of passing data between the host and the cores.
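
On the 0x0000 question, here is a rough sketch of how one local offset can mean sixteen different places. It assumes the Epiphany addressing scheme as commonly described: the top 12 bits of a 32-bit global address are the core ID (6 bits of row, 6 bits of column) and the low 20 bits are the offset inside that core. The helper name is made up, and the (32, 8) origin used in the usage note is an assumption about the 16-core parallella board.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the global address for an offset in one core's
   local memory. Assumes 6-bit row, 6-bit column, 20-bit local offset. */
uint32_t global_addr_sketch(unsigned row, unsigned col, uint32_t offset)
{
    assert(row < 64 && col < 64);      /* 6 bits each */
    assert(offset < (1u << 20));       /* 20-bit local window */
    uint32_t coreid = ((uint32_t)row << 6) | (uint32_t)col;
    return (coreid << 20) | offset;
}
```

Under those assumptions, offset 0x0000 on the core at (32, 8) and offset 0x0000 on the core at (35, 11) come out as two different global addresses, which is why the cores do not step on each other.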

Posted: 6/10/2015 8:42:46 AM EDT

[#30]

Quoted:
If you need any advice, let me know. I've done 6 or 8 of those.

Will do. I'm getting rusty, having not worked for a Cisco partner in 6 years.

Posted: 6/10/2015 8:31:44 PM EDT

[#31]

Quoted:
It is storing the string created by the sprintf() call in that epiphany core's local memory at address 0. This code runs on all 16 cores.

The host program running on the ARM can read each core's memory to retrieve the string and print it out.

In the host code, this is where I defined it to run on all 16 cores. That e_open() call creates a workgroup of cores that is 4 cores wide by 4 cores deep.

e_open(&dev, 0, 0, 4, 4);
e_reset_group(&dev);

And this is where it reads the string from the core's memory. row and col are the core's position in the 4x4 workgroup.

e_read(&dev, row, col, 0x0000, emsg, _BufSize);

I don't need this line in the code posted on page 2: e_alloc(&emem, _BufOffset, _BufSize);

The original example I was hacking up was passing the string back in shared DRAM, which is another way of passing data between the host and the cores.

Isn't your current program passing the core info back and forth in RAM? My point of confusion is that all cores are writing to the same hard-coded outbuf address, so they'd overwrite the data at 0x0000 when called from each core. I would use the sprintf return value to advance outbuf so each write lands in "clean" RAM. Or are you just printing whichever core wrote to that area most recently? I tend to veer away from using the 0x0 RAM address myself; on architectures different from this one, I go from the top down.

Is there an installed RAM call on the parallel cores, or is that something the ARM would pass to them?

Posted: 6/11/2015 5:01:19 AM EDT

[#32]

Quoted:
Isn't your current program passing the core info back and forth in RAM? My point of confusion is that all cores are writing to the same hard-coded outbuf address, so they'd overwrite the data at 0x0000 when called from each core. I would use the sprintf return value to advance outbuf so each write lands in "clean" RAM. Or are you just printing whichever core wrote to that area most recently? I tend to veer away from using the 0x0 RAM address myself; on architectures different from this one, I go from the top down.

Is there an installed RAM call on the parallel cores, or is that something the ARM would pass to them?

Each core has its own local memory it is writing to, and that is what the host program is reading from.

This is telling the host program to read the local memory of the core at row, col in the workgroup:

e_read(&dev, row, col, 0x0000, emsg, _BufSize);
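
A toy model of that, in plain C on one machine. It does not use the real host or device library calls; core_mem, core_main, host_read and run_all_cores are made-up stand-ins. Each simulated core gets its own buffer, every core writes at offset 0 of its own buffer, and the host picks which buffer to read by row and col.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ROWS 4
#define COLS 4
#define LOCAL_SIZE 128   /* toy size, not the real per-core memory size */

/* One private buffer per simulated core. */
static char core_mem[ROWS][COLS][LOCAL_SIZE];

/* Stand-in for the device program: writes at offset 0 of its own memory. */
void core_main(unsigned row, unsigned col)
{
    char *outbuf = core_mem[row][col];      /* "address 0" of this core only */
    unsigned core = (4 * row) + (col + 1);
    snprintf(outbuf, LOCAL_SIZE, "core %u at %u, %u - sqr = %u",
             core, row, col, core * core);
}

/* Stand-in for the host read: copies from the chosen core's memory. */
void host_read(unsigned row, unsigned col, size_t offset, char *dst, size_t len)
{
    assert(row < ROWS && col < COLS && offset + len <= LOCAL_SIZE);
    memcpy(dst, &core_mem[row][col][offset], len);
}

/* Runs every simulated core once, as the 4x4 workgroup would. */
void run_all_cores(void)
{
    for (unsigned r = 0; r < ROWS; r++)
        for (unsigned c = 0; c < COLS; c++)
            core_main(r, c);
}
```

After run_all_cores(), reading offset 0 for (0, 0) and for (3, 3) returns two different strings, because the same offset lives in two different buffers.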

Posted: 6/11/2015 5:12:07 AM EDT

[#33]

Freakin parallel processing...and here I am having trouble with an Arduino project that is missing keypad scanning inputs at times.

I'm a tard.

Posted: 6/11/2015 5:57:57 AM EDT

[#34]

Just snagged a brand new parallella on eBay for $88

Going to order the 4 board case tomorrow and have a nice little mini cluster built out soon.

The case comes from UK though so may take a while to get.

Posted: 6/11/2015 10:24:43 AM EDT

[#35]

Quoted:
Each core has its own local memory it is writing to, and that is what the host program is reading from.

This is telling the host program to read the local memory of the core at row, col in the workgroup:

e_read(&dev, row, col, 0x0000, emsg, _BufSize);

Ahh, got it. I thought it was all using the same RAM pool and addresses.

Posted: 6/11/2015 11:29:10 AM EDT

[#36]

How much local RAM does each core get? Is that merged as part of the main RAM, or part of the FPGA?

I guess I should start saving for one.

Posted: 6/11/2015 11:31:29 AM EDT

[#37]

Quoted:
How much local RAM does each core get? Is that merged as part of the main RAM, or part of the FPGA?

I guess I should start saving for one.

This is member AD_UK's Parallella Chronicles entry on the memory:

https://www.parallella.org/2015/02/28/parallella-chronicles-part-five/

Posted: 6/11/2015 3:22:43 PM EDT

[#38]

Getting close to having version 0.0001 of an nbody astronomy simulation coded.

That is the stars-orbiting-each-other type of simulation.

No graphics yet, just sending the 'stars' to each core to process.

Once I get this working ok on a single board, I will work on getting it distributed across 4 parallella boards and using all 64 cores (using MPI I guess), and add some graphics to display the motions of the stars.
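
One way the per-core split could look, as a sketch only. The function name and the even block split are assumptions, not the code from this thread: each core gets a contiguous slice of the N bodies, and the first N % cores slices get one extra body.

```c
#include <assert.h>

/* Hypothetical helper: compute the half-open range [*first, *last) of body
   indices owned by `core` when n_bodies are split across n_cores. */
void body_range(unsigned n_bodies, unsigned n_cores, unsigned core,
                unsigned *first, unsigned *last)
{
    assert(n_cores > 0 && core < n_cores);
    unsigned base  = n_bodies / n_cores;   /* bodies every core gets */
    unsigned extra = n_bodies % n_cores;   /* leftovers go to the first cores */
    unsigned start = core * base + (core < extra ? core : extra);
    *first = start;
    *last  = start + base + (core < extra ? 1u : 0u);
}
```

The same helper works for 16 cores on one board or 64 cores across four boards; only n_cores and the core index change.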

This is an NVIDIA example nbody simulation, in case anyone is wondering 'WTF is an nbody simulation?' Mine won't be nearly as nice or fast, lol.

Posted: 6/11/2015 3:59:25 PM EDT

[#39]

Here is an nbody simulation with 10^10 stars

For the math challenged, that is a big ass number

It is cool how stars and galaxies seem to align in filaments on huge scales. My guess is that dark matter is probably causing that.

Even pics of the real universe show these filaments

Posted: 6/11/2015 5:51:41 PM EDT

[#40]

This might be something to tackle first: a fluid simulation with N particles (no gravity, just collisions). It's written for Processing, an easy-to-use language for PC, but the algorithms are there.

I'd like to see the results; the native code is a bit slow once the particle count goes up. There is also a full visual water simulation in WebGL that runs in the browser if you have a 3D card (fun to play with even if you don't want to code it).

Just a couple of ideas. Is just the C source code for the parallella Bernoulli set program available somewhere without downloading the entire package?

Posted: 6/11/2015 7:12:25 PM EDT

[#41]

Quoted:
This might be something to tackle first: a fluid simulation with N particles (no gravity, just collisions). It's written for Processing, an easy-to-use language for PC, but the algorithms are there.

I'd like to see the results; the native code is a bit slow once the particle count goes up. There is also a full visual water simulation in WebGL that runs in the browser if you have a 3D card (fun to play with even if you don't want to code it).

Just a couple of ideas. Is just the C source code for the parallella Bernoulli set program available somewhere without downloading the entire package?

Those are cool.

You want the mandelbrot one?

Here:

http://github.com/parallella/parallella-examples/tree/master/mandelbrot

Posted: 6/12/2015 1:22:18 PM EDT

[#42]

This line is killing my nbody sim performance on the 16 cores

invDist = 1.0f / sqrtf(distSqr);

Epiphany, being a RISC processor, doesn't have a hardware divide instruction, and God knows how that sqrtf() function is being handled.

Sigh, the joys of being a coder.

Tons more googling to do on this.

I did order the 4 board cluster case for it today, $167 after exchange rates and shipping from UK, lol.

Posted: 6/12/2015 1:35:06 PM EDT

[#43]

My parallel code is running on the 16 cores and I am getting the same results from code running strictly on the ARM processor, so I am pretty confident my parallel logic is sound. There is just that division and square root headache to overcome.

And people think computer geeks have a cushy job in an air conditioned cube all day.

Posted: 6/12/2015 4:37:03 PM EDT

[#44]

Quoted:
This line is killing my nbody sim performance on the 16 cores

invDist = 1.0f / sqrtf(distSqr);

Epiphany, being a RISC processor, doesn't have a hardware divide instruction, and God knows how that sqrtf() function is being handled.

Sigh, the joys of being a coder.

Don't use the library sqrt function. Use an iterative Newton's approximation of the square root to the accuracy you desire. It will run faster than the library function and be accurate to a few significant digits.

It all depends on how accurate the sqrt must be and how much error can be tolerated. For visual things, you can use a square root good to 2 decimal places and be close, while being extremely fast.

There are other sqrt algorithms, depending on accuracy vs. speed. The sqrtf() function goes for accuracy.

Newton's approximation uses the handy ½, which is a right shift in integer or fixed-point math, and the rest is addition and multiplication.
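
A minimal sketch of the Newton idea, with made-up names and a caller-chosen iteration count. One caveat worth flagging: the textbook Newton step for a square root, x = (x + n/x) / 2, still contains a division, so on a core without a hardware divider the reciprocal-square-root form, y = y * (1.5 - 0.5 * n * y * y), is the one that really is only multiplies and subtracts.

```c
#include <assert.h>

/* Newton iteration for 1/sqrt(n): multiplies and subtracts only.
   y0 is the caller's starting guess; it must be positive and below
   sqrt(3/n) for the iteration to converge. */
float newton_rsqrt(float n, float y0, int iterations)
{
    assert(n > 0.0f && y0 > 0.0f && iterations >= 0);
    float half_n = 0.5f * n;
    float y = y0;
    for (int k = 0; k < iterations; k++) {
        y = y * (1.5f - half_n * y * y);
    }
    return y;
}

/* sqrt(n) recovered as n * (1/sqrt(n)), again without dividing. */
float newton_sqrt(float n, float y0, int iterations)
{
    return n * newton_rsqrt(n, y0, iterations);
}
```

How many iterations are needed depends almost entirely on how good the starting guess is, so a cheap way of producing a close first guess matters more than the loop itself.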

Posted: 6/12/2015 4:39:07 PM EDT

[#45]

Quoted:
Don't use the library sqrt function. Use an iterative Newton's approximation of the square root to the accuracy you desire. It will run faster than the library function and be accurate to a few significant digits.

It all depends on how accurate the sqrt must be and how much error can be tolerated. For visual things, you can use a square root good to 2 decimal places and be close, while being extremely fast.

There are other sqrt algorithms, depending on accuracy vs. speed. The sqrtf() function goes for accuracy.

Newton's approximation uses the handy ½, which is a right shift in integer or fixed-point math, and the rest is addition and multiplication.

Thanks, I will look into that.

The division is a problem too, though.

Posted: 6/12/2015 5:36:44 PM EDT

[#46]

Quoted:
Thanks, I will look into that.

The division is a problem too, though.

The div isn't a patch on the slowness of most sqrt libraries. From what I've read, I believe the cores have that instruction built in. Is there a code profiler in the SDK, or do you need to do it the old-fashioned way?

Posted: 6/12/2015 5:43:23 PM EDT

[#47]

Quoted:
The div isn't a patch on the slowness of most sqrt libraries. From what I've read, I believe the cores have that instruction built in. Is there a code profiler in the SDK, or do you need to do it the old-fashioned way?

No hardware division.

Posted: 6/12/2015 7:45:33 PM EDT

[#48]

I love the internet.

Found this. It is very fast in my code; it loses a little accuracy, but it is orders of magnitude faster.

Does exactly what I am trying to do without division or square root calls.

Used in Quake III Arena code.

https://en.wikipedia.org/wiki/Fast_inverse_square_root

That wiki page even has the C code. I uncommented that last line in my copy for a little better accuracy.

The constant 0x5f3759df below is the magic sauce, and it is kind of a mystery how someone derived it.

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
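
One portability note on the Q_rsqrt listing: it reads the float through a long pointer, which assumes long is 32 bits. That holds on 32-bit ARM, but on a typical 64-bit Linux box long is 64 bits and the cast reads past the float. Below is a sketch of the same trick using uint32_t and memcpy, with the second iteration enabled. The function name is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same algorithm as Q_rsqrt, with the bit reinterpretation done through
   memcpy on a fixed-width integer so it does not depend on sizeof(long). */
float rsqrt_portable(float number)
{
    assert(number > 0.0f);
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y  = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);              /* float bits -> integer */
    i = 0x5f3759dfu - (i >> 1);            /* magic-constant initial guess */
    memcpy(&y, &i, sizeof y);              /* integer bits -> float */
    y = y * (threehalfs - (x2 * y * y));   /* 1st Newton iteration */
    y = y * (threehalfs - (x2 * y * y));   /* 2nd iteration, optional */
    return y;
}
```

In the nbody loop this would stand in for the divide and the sqrtf call, along the lines of invDist = rsqrt_portable(distSqr);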

Posted: 6/12/2015 7:58:07 PM EDT

[#49]

So if you all can plot out the universe's movements on this thing, can you model or simulate how drug molecules dock into proteins? Now that'd be cool and very useful in the pharma world. Just sayin'

Posted: 6/12/2015 8:03:43 PM EDT

[#50]

Quoted:
What I did with my Raspberry Pi, and of course I finished it one month before the Raspberry Pi 2 came out which can do N64 games... So I will need to do an update.

http://i.imgur.com/cDN8fre.jpg?1

http://i.imgur.com/MhW43K4.jpg?1

http://i.imgur.com/PWcIOqX.jpg?1

Edit: Forgot my startup video!

Vertical cam warning!

http://youtu.be/xqBEzHY7xy4

How can a retard make this?

I've been looking at buying one off the internet that has hundreds of games and will cost about 4k.
