I'd forgotten about the fast inverse square root function; I remember it only because of that strange constant in the algorithm. At least Newton's method is in there somewhere.
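For anyone else trying to place it, this is the classic fast inverse square root (the "strange constant" is 0x5f3759df), written here as a minimal sketch using memcpy instead of the original pointer cast so it is well-defined C:

[code]
#include <stdint.h>
#include <string.h>

/* Classic fast inverse square root: the magic constant seeds an
   initial guess from the float's bit pattern, then one step of
   Newton's method refines it. */
float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);          /* reinterpret float bits as an int */
    i = 0x5f3759df - (i >> 1);         /* the "strange constant" */
    memcpy(&x, &i, sizeof x);
    return x * (1.5f - half * x * x);  /* one Newton iteration */
}
[/code]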
Quoted:
I'd forgotten about the fast inverse square root function; I remember it only because of that strange constant in the algorithm. At least Newton's method is in there somewhere.

My code running on the 16 cores is now noticeably faster than the same code running on the ARM alone. Before I found this magic code it was way slower. There are still performance tweaks to make, but the parallel programming model is a lot less 'black magic voodoo' to me now.
Quoted:
So if you all can plot out the universe movements on this thing, can you model or simulate how drug molecules dock into proteins? Now that'd be cool and very useful in the pharma world. Just sayin'

There are already accelerated ASIC solutions that are faster than the board talked about in this thread. The Parallella (the board in this thread) gives you 16 or 64 "general purpose" CPUs to work together on a problem, and they can all be programmed in C, making them relatively easy to use. The USB "miners" for bitcoin (or protein folding) are ASICs (Application-Specific Integrated Circuits): they can only do what they were designed to do (fold proteins or mine bitcoins). They are parallel computers with only one possible task. The Epiphany chip on the Parallella board, on the other hand, gives you general-purpose CPUs that all run at the same time (in parallel), letting you do any calculation faster as long as it can be broken into parts, such as each core rendering part of an image.

Not all programs scale easily to parallel computing. On the PC, for example, Firefox still mostly runs on one core of your CPU, even if you have 8 cores available. Software needs to be rewritten substantially to run on several cores at once. Photoshop is an example: different "chunks" of the image and the effect you are applying are sent to different cores, so with 8 of them working on a blur, it finishes almost 8 times faster. That's the beauty of parallel computing. Going beyond 8 cores, a video card (like an NVIDIA GTX 960) has thousands of cores, each a wizard at math (floating point and matrix), all designed to render video quickly.
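To make the "chunks to cores" idea concrete, here is a minimal sketch (not from the thread) of splitting an array across worker threads in C with pthreads; the 8-way Photoshop blur works on the same principle:

[code]
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 8
#define N 80000

static float data[N];

struct slice { int start, end; };

/* Each worker processes its own contiguous slice of the array. */
static void *work(void *arg)
{
    struct slice *s = arg;
    for (int i = s->start; i < s->end; i++)
        data[i] *= 2.0f;   /* stand-in for a real per-element effect */
    return NULL;
}

int main(void)
{
    pthread_t tid[NWORKERS];
    struct slice slices[NWORKERS];
    int per = N / NWORKERS;

    for (int w = 0; w < NWORKERS; w++) {
        slices[w].start = w * per;
        slices[w].end = (w == NWORKERS - 1) ? N : (w + 1) * per;
        pthread_create(&tid[w], NULL, work, &slices[w]);
    }
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);   /* wait for every slice to finish */
    printf("done\n");
    return 0;
}
[/code]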
Crappy video demo of the difference between the nbody code running strictly on the CPU and the parallel version on the 16-core Epiphany chip.
The first run is CPU only, the second run is on the 16 cores. Both are calculating an 800-star system over 10 iterations. The more stars I add, the bigger the difference between the two.
http://www.youtube.com/watch?v=7z6G5J-OB9Y
Quoted:
Crappy video demo of the difference between the nbody code running strictly on the CPU and the parallel version on the 16-core Epiphany chip. The first run is CPU only, the second run is on the 16 cores. Both are calculating an 800-star system over 10 iterations. The more stars I add, the bigger the difference between the two. http://www.youtube.com/watch?v=7z6G5J-OB9Y

That is a remarkable speed-up. How much do the "answers" differ between the sqrt library function and the fast inverse root function? Is the error small enough that you are still modeling a galaxy once you are billions of iterations in?
Quoted:
That is a remarkable speed-up. How much do the "answers" differ between the sqrt library function and the fast inverse root function? Is the error small enough that you are still modeling a galaxy once you are billions of iterations in?

I am currently doing 5 Newton iterations and the numbers are close enough for me; I am not doing any kind of rigorous scientific analysis. For comparison, after 10 iterations the x coordinate for star #1 is -0.319215 on the CPU, and -0.319304 on the 16-core run.
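A quick way to quantify that drift, as a sketch (not from the posted repo), assuming the fast_rsqrt routine sketched earlier is linked in:

[code]
#include <math.h>
#include <stdio.h>

float fast_rsqrt(float x);   /* the sketch from earlier in the thread */

int main(void)
{
    /* Compare libm against the fast routine across several magnitudes
       and print the relative error of each result. */
    for (float v = 0.25f; v < 1.0e6f; v *= 10.0f) {
        float exact = 1.0f / sqrtf(v);
        float fast  = fast_rsqrt(v);
        printf("v=%12.2f  exact=%.7f  fast=%.7f  rel err=%+.2e\n",
               v, exact, fast, (fast - exact) / exact);
    }
    return 0;
}
[/code]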
I bumped the code up to take the 800 stars through 1000 iterations instead of 10.
The CPU took 6 minutes 24 seconds; the Epiphany took 1 minute 17 seconds (about a 5x speed-up). There is still room to squeeze performance out of the parallel code. This is just my first attempt, and I am sure there are tricks and techniques I can apply to make it even faster.
Quoted:
I bumped the code up to take the 800 stars through 1000 iterations instead of 10. The CPU took 6 minutes 24 seconds; the Epiphany took 1 minute 17 seconds. There is still room to squeeze performance out of the parallel code.

Is any of that time spent writing to the frame buffer / display?
Quoted:
Is any of that time spent writing to the frame buffer / display?

One thing I did change: previously I was writing the 800-star data to all 16 cores' local memory. I changed that to write only to core #1's memory, and all the cores read from there, to cut down on data moving back and forth. I also have the Epiphany code set up so that core #1 sums up the results from all the other cores when they signal they are done; previously I was downloading all the data to the ARM and summing there. One more thing I noticed with the fast inverse square root code: if I do an even number of Newton iterations the values diverge rapidly, while any odd number of iterations keeps everything on track.
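For experimenting with the iteration count, the Newton refinement can be parameterized; this is a sketch under the same assumptions as the earlier one. In principle each extra step only tightens the estimate, so even counts diverging may be worth a second look at the surrounding code:

[code]
#include <stdint.h>
#include <string.h>

/* fast_rsqrt with a configurable number of Newton iterations */
float fast_rsqrt_n(float x, int iters)
{
    float half = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    i = 0x5f3759df - (i >> 1);       /* initial bit-level guess */
    memcpy(&x, &i, sizeof x);
    for (int k = 0; k < iters; k++)
        x = x * (1.5f - half * x * x);  /* each pass refines the guess */
    return x;
}
[/code]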
Someone requested to see the code; it's too big for IM, so I'm posting it here.
The code isn't cleaned up yet. I tend to code to get something working, then go back and make it 'pretty'. It is a work in progress, and the logic will change as I get more experience with these things. Cut/paste screwed up the formatting. ETA: code formatted properly below.
Let me see if the [code] / [/code] tags work.
This part of the Epiphany code, is it running on one core or all cores? I lost myself in the braces and think it is under the core 1 code:

[code]
// all cores done processing this iteration; calculate new x, y, z for next iteration
for (i = 0; i < *n; i++) {
    p[i].x += p[i].vx * dt;
    p[i].y += p[i].vy * dt;
    p[i].z += p[i].vz * dt;
}
[/code]

Why not have each core add the velocity * time step only to the objects in its own "sector"? Maybe make it an extra subroutine? With a small object count and time step it doesn't matter, but when you get to millions of points across 64 cores, falling back on one core to compute all the new positions may slow things down a good deal.

Second, this part:

[code]
for (i = 0; i < 4; i++) {
    for (j = 0; j < 4; j++) {
        z[num++] = e_get_global_address(i, j, o);
[/code]

Isn't that something that would only need to happen once, outside the loop, rather than running on all cores every pass? That's a lot of calls to e_get_global_address(). Again, my understanding of the architecture is imperfect here.

Lastly, can you make the cores into two workgroups, so that the cores don't all have to check whether they are core 1? Define 15 cores to run the worker code without the check, and the 16th core as core 1. Do they still get to share memory the way you are using it when you make groups of cores into virtual CPUs? And did you time the code to see if summing on the ARM was appreciably slower than summing on core #1?
Quoted:
This part of the Epiphany code, is it running on one core or all cores? ... Did you time the code to see if summing on the ARM was appreciably slower than summing on core #1?

I am keeping it on core #1 to avoid the data move from local memory to ARM memory; there is a performance penalty in moving it over and moving it back after summing. The algorithm is brute force, comparing every body to every other body to get new vx, vy, vz values from x, y, z, and I don't want to start calculating new x, y, z positions until all bodies have been compared to all the others. I am keeping the code the same on all cores for simplicity.
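On the address-table point above, the hoist itself is small; here is a device-side sketch reusing the names z, num, and o from the posted snippet:

[code]
/* Build the table of remote addresses once, before the main loop,
   instead of recomputing it every iteration. */
void *z[16];
int num = 0;
for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
        z[num++] = e_get_global_address(i, j, o);  /* e-lib, as in the posted code */

/* The main simulation loop then just indexes z[] each pass,
   with no further e_get_global_address() calls. */
[/code]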
Just saw this from the airport bar (off on a cruise). I am impressed. Can we talk when I get back?
Quoted:
Just saw this from the airport bar (off on a cruise). I am impressed. Can we talk when I get back?

I am mostly just piecing together logic I find online. Here is a run with just 64 stars; it shows the gravitational entanglement of the stars better.
That's pretty awesome! Nice work adding the display to it. I assume the ones that zip off the display at the start are "slingshots"?
What are your planned addition/expansion ideas?
Quoted:
That's pretty awesome! Nice work adding the display to it. I assume the ones that zip off the display at the start are "slingshots"? What are your planned addition/expansion ideas?

Kinda like NASA using a planet's gravity to fling a space probe off towards its final destination. Once I get the 4-board cluster built, I will try to get this scaled up to 64 processors. One thing I will add is the ability to assign a mass to each star; right now they are all treated as equal mass. Then I can create a few massive stars that should attract clouds of lighter stars, forming mini galaxies. Then, as 'God' of my digital universe, I can send a couple of mini galaxies on a collision course or try to get them to orbit each other. All kinds of possibilities.
Having mass increase as stars 'glom' together would make for an awesome simulation.
Maybe make the boundaries toroidal, so no mass is lost? Toroidal space gives wraparound, like the Asteroids game: if something goes off the left side of the display, it reappears at the right side with the same mass and velocity, and the same for the top and bottom edges. ETA: You don't have to actually model and transform the objects as traveling on the surface of a toroid, just do edge checking, like if x > 720 then x = 0, and if x < 0 then x = 720 (or whatever your display width/height is).
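A minimal sketch of that edge check, assuming float positions p[i].x / p[i].y as in the posted code and an 800x600 display. Subtracting the dimension instead of snapping to zero keeps the overshoot, so a fast mover doesn't lose distance on the wrap:

[code]
#define WIDTH  800.0f
#define HEIGHT 600.0f

/* Toroidal wraparound: leaving one edge re-enters at the opposite
   edge, with mass and velocity unchanged. */
if (p[i].x >= WIDTH)  p[i].x -= WIDTH;
if (p[i].x <  0.0f)   p[i].x += WIDTH;
if (p[i].y >= HEIGHT) p[i].y -= HEIGHT;
if (p[i].y <  0.0f)   p[i].y += HEIGHT;
[/code]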
I am having so much fun with this.
Here is a 192-star system with 2 mega stars at 100,000x the mass of the others. I was wondering why more stars were not captured in orbit, but now I realize a stable orbit in the universe is probably rare; unless I had billions of stars to play with, or let it run for a hundred years, I am not going to get many at random. The two mega stars are slowly being pulled together as well, though it would need to run a long time to get there. Also, the space has 'depth', so a star may look like it passes near a mega star without being affected much by its gravity when in reality it is far in front of or behind it. What's interesting is that I have been letting this run for quite a while now, and stars that left the screen are coming back, so there are some long elliptical orbits going on. It's down to just a handful of stars now, but more keep coming back from off screen periodically.
http://www.youtube.com/watch?v=GDupI2e5SQo
I am running it now with 800 stars. I think that with more stars they keep each other in check, since each star gravitationally influences all the others. The star system seems more stable now, or is just moving more slowly due to the extra stars, lol.
If I could scale this up to 50,000 stars or so, it would probably make a pretty good simulation. There is the Barnes-Hut algorithm (Josh Barnes and Piet Hut, the latter from the Institute for Advanced Study of Einstein fame) that works with larger datasets and reduces the number of calculations required. I will look into implementing that.
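For reference, the heart of Barnes-Hut is a tree plus an "opening test": if a cluster of stars is small relative to its distance, treat it as a single point mass at its center of mass; otherwise descend into it. That turns the O(n^2) all-pairs loop into roughly O(n log n). Here is a hypothetical sketch; the node structure and names are illustrative, tree construction is omitted, and G is folded into the units:

[code]
#include <math.h>

#define THETA 0.5f   /* opening angle; smaller = more accurate, slower */

struct bh_node {
    float cx, cy, cz;         /* center of mass */
    float mass;               /* total mass inside this node */
    float size;               /* side length of this node's cube */
    struct bh_node *child[8]; /* NULL for empty octants */
    int is_leaf;
};

/* Accumulate the acceleration on body (x,y,z) from the tree.
   A self-interaction check is omitted for brevity. */
static void bh_accel(const struct bh_node *n, float x, float y, float z,
                     float *ax, float *ay, float *az)
{
    if (!n || n->mass == 0.0f)
        return;
    float dx = n->cx - x, dy = n->cy - y, dz = n->cz - z;
    float dist = sqrtf(dx * dx + dy * dy + dz * dz) + 1e-9f;

    if (n->is_leaf || n->size / dist < THETA) {
        /* Far enough away: treat the whole node as one point mass. */
        float f = n->mass / (dist * dist * dist);
        *ax += f * dx; *ay += f * dy; *az += f * dz;
    } else {
        for (int i = 0; i < 8; i++)   /* too close: open the node */
            bh_accel(n->child[i], x, y, z, ax, ay, az);
    }
}
[/code]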
Quoted:
I am having so much fun with this. Here is a 192-star system with 2 mega stars at 100,000x the mass of the others. ... It's down to just a handful of stars now, but more keep coming back from off screen periodically. http://www.youtube.com/watch?v=GDupI2e5SQo

That's actually pretty amazing considering you've had the board less than a week and weren't doing a ton of parallel code before that! Keep the vids going! I have no idea how to capture video directly on the board. You could use fbgrab for screenshots, but I think you would need to run it through another computer to save it as a video file; the camera at the display works well enough. It's pretty cool to see how the objects react to the large-mass objects.
Quoted:
That's actually pretty amazing considering you've had the board less than a week and weren't doing a ton of parallel code before that! Keep the vids going!

Google helps! I have found most of the solutions for this via Google and the Parallella forum.
Quoted:
Google helps! I have found most of the solutions for this via Google and the Parallella forum.

You can't give Google all the credit; don't sell yourself short as a True Geek¹. You've shown that you fully comprehend the code you are running. All Google may have done is save you some typing and let you skip past a couple of rounds of trial and error.
¹ Meant as a badge of pride.
Quoted:
You can't give Google all the credit; don't sell yourself short as a True Geek¹. You've shown that you fully comprehend the code you are running.

Yeah, I showed this sim running to the wife, and she rolled her eyes and called me a geek! lol
Achieved a stable orbit! I also found a defect in my code: I have 768 layers of depth in the virtual space, but all the stars were sitting at depths between 1 and 2. Now there are not so many stars slingshotting off into space, and more orbits going on.
Here is one stable orbit around a mega star that has been going for 10 minutes now.
http://www.youtube.com/watch?v=NVhRDWDPjmI
Quoted:
Achieved a stable orbit! I also found a defect in my code: I have 768 layers of depth in the virtual space, but all the stars were sitting at depths between 1 and 2. Here is one stable orbit around a mega star that has been going for 10 minutes now. http://www.youtube.com/watch?v=NVhRDWDPjmI

Can you add a splash of color to show Z-buffer (in/out) distance? A varying shade of blue or something, using the top 3 bits of depth for a blue intensity from 128-255?
Quoted:
Can you add a splash of color to show Z-buffer (in/out) distance? A varying shade of blue or something, using the top 3 bits of depth for a blue intensity from 128-255?

I was thinking about adding some 'red shift' / 'blue shift' depending on whether the star is moving toward the screen or away from it. Will try to tackle that today sometime.
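A minimal sketch of brass's depth cue, assuming the sim's 768 depth layers and a hypothetical put_pixel(x, y, r, g, b) framebuffer helper (not from the posted code):

[code]
/* Map a 0..767 depth value into a blue intensity of roughly 128..255:
   take the top 3 bits of the 10-bit depth and scale. Nearer stars
   stay pale; deeper stars get noticeably bluer, in coarse steps. */
unsigned depth = (unsigned)p[i].z;                  /* 0..767 here */
unsigned blue  = 128 + ((depth >> 7) & 0x7) * 18;
put_pixel((int)p[i].x, (int)p[i].y, 0x80, 0x80, blue);
[/code]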
This thread is interesting, but confusing. Can someone explain in plain, non-tech speak what is happening?
Quoted:
This thread is interesting, but confusing. Can someone explain in plain, non-tech speak what is happening?

I bought a little computer that has a custom 16-core processor. Think of the 16-core processor as 16 little computers. I coded up a simulation that shows stars moving around, with gravity from all the stars affecting the speed and direction of motion of every other star. Say I have 1600 stars in the simulation: instead of 1 CPU having to do all the calculations for all 1600 stars, my code has 100 stars processed on each of the 16 cores. All 16 are processing at the same time, so it finishes the 1600-star calculation much faster than one CPU chewing through it alone.
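In code, the split is just each core taking its own contiguous slice of the star array; a sketch where core_id and update_star are illustrative names:

[code]
#define CORES 16

/* With n = 1600 stars, each core handles a 100-star slice. */
int per_core = n / CORES;
int start = core_id * per_core;
int end   = (core_id == CORES - 1) ? n : start + per_core;

for (int i = start; i < end; i++)
    update_star(i);   /* this core's share of the gravity math */
[/code]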
Quoted:
I bought a little computer that has a custom 16-core processor. Think of the 16-core processor as 16 little computers. ... All 16 are processing at the same time, so it finishes the 1600-star calculation much faster than one CPU chewing through it alone.

Thank you, and that is fucking awesome. Kind of like what I read a while back, that a bunch of regular computers linked together can be equivalent to one supercomputer... Could this be done for large Excel files that crash regular Windows?
Quoted:
Thank you, and that is fucking awesome. Kind of like what I read a while back, that a bunch of regular computers linked together can be equivalent to one supercomputer... Could this be done for large Excel files that crash regular Windows?

Only if a version of Excel is written that takes advantage of multiple cores. Fucking Bill Gates.
Quoted:
I was thinking about adding some 'red shift' / 'blue shift' depending on whether the star is moving toward the screen or away from it. Will try to tackle that today sometime.

It makes it a little easier to visualize what is going on. Now a guy in Australia wants the code; he saw the vid I posted on the Parallella forum. I told him I would email it to him. I guess I need to get on GitHub.
That's great! GitHub is free for public repositories, and it will walk you through how to set it up and such. Pretty easy.
Don't know if you got this resolved, but you don't need hardware multiply/divide for powers of two. That's what shift is for. Shifts and masking are awesome speed-ups over library calls.
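For anyone following along, the standard equivalences are below. They only apply to powers of two, a right shift rounds differently than division for negative signed values, and modern compilers usually make these rewrites automatically:

[code]
unsigned x = 100;

unsigned a = x << 3;   /* x * 8  */
unsigned b = x >> 4;   /* x / 16 */
unsigned c = x & 15;   /* x % 16 (mask with 2^k - 1) */
[/code]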
Quoted:
That's great! GitHub is free for public repositories, and it will walk you through how to set it up and such. Pretty easy.

https://github.com/capnrob97/rob_nbody
Quoted:
https://github.com/capnrob97/rob_nbody

Yep. Now you can see the colorized and formatted code without the forum messing it up.

Following on from what was posted above about mashing bits for speed, I've been thinking of a way to do the color shift quickly. Rather than comparing the velocity to 100, it would essentially compare to zero, but you wouldn't notice. Start all particles out grey (127,127,127, or 7F,7F,7F in RGB). If the MSB of the velocity vector is set (negative velocity), fold the z-velocity into the blue value; otherwise fold it into the red value. When the velocities are under 10 or so, the color shift won't be very noticeable; if they are higher, clear the sign bit and shift right before folding it in. It might be quicker or slower than a couple of compares followed by addition and subtraction, or the compiler may optimize either form to something similar anyway, resulting in code that is just a bit harder to read without much gain. A sketch of the idea follows below.

Do you have a video of your colorized output version? Lastly, you may want to change the comments in the code to match yours a bit, or at least make the title in the comments match your file name instead of hello_world.c.
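Here is a hedged sketch of one way to read that suggestion, reusing the assumed p[i] fields and the hypothetical put_pixel helper from earlier. Setting the high bit and ORing in the masked magnitude gives the 128-255 intensity range suggested above (an AND against the grey 0x7F could only darken the channel):

[code]
/* Start grey; tint blue if approaching (vz < 0), red if receding. */
unsigned char r = 0x7F, g = 0x7F, b = 0x7F;
int vz = (int)p[i].vz;

if (vz < 0)
    b = 0x80 | ((unsigned)(-vz) & 0x7F);   /* blue shift: moving toward viewer */
else if (vz > 0)
    r = 0x80 | ((unsigned)vz & 0x7F);      /* red shift: moving away */

put_pixel((int)p[i].x, (int)p[i].y, r, g, b);
[/code]

Note the magnitude is masked to 7 bits rather than clamped, so very fast stars wrap back toward grey; a saturating clamp would be the more careful choice.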
Yeah, I'll clean up the code at some point; I just used the hello_world example as a starting point.
I turned off the red shift / blue shift; it didn't look as good as I thought it would. I also had to slow things down for smaller numbers of stars, since they zip around so fast you can't see what is happening. The dt variable in e_rob_nbody.c controls how much time advances each iteration; I made it 100x smaller with 32 stars, which makes the motion easier to watch. I'm going to code it to auto-set dt based on the number of stars created. The next step is to try shared memory for the star data, to see if I can scale past 800 stars; shared memory can be 32 MB, while local memory on the Epiphany is 32 KB per core. Then I'll try to implement the Barnes-Hut algorithm. I think this will get me started:
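A tiny sketch of the auto-set idea; the scaling law is purely illustrative (linear in star count, with the 800-star run as the reference) and BASE_DT is a hypothetical name:

[code]
/* Fewer stars -> smaller dt -> slower, easier-to-watch motion. */
#define BASE_DT 0.01f   /* hypothetical dt tuned for the 800-star run */
float dt = BASE_DT * ((float)nstars / 800.0f);
[/code]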
Quoted:
Excel has used multiple cores for a number of years now.

Oh?? OK, I take it back, Mr. Gates... Is there an easy way that you know of that a non-tech guy like me could explain to our IT guys how to implement that on our network? I.e., any simple-to-follow instructions / guidelines out there?
Quoted:
Oh?? OK, I take it back, Mr. Gates... Is there an easy way that you know of that a non-tech guy like me could explain to our IT guys how to implement that on our network? I.e., any simple-to-follow instructions / guidelines out there?

If you have Office 2010 or higher, just running it will use all the cores on your CPU (typically 2 to 8). Hitting CTRL+SHIFT+ESC and clicking the "Performance" tab will show the CPU load for each core. If only one is running at peak, look at your processes to see which one that is.
I got a shared-memory version going, but it is very slow; shared memory access is far more time-consuming than local memory.
If I can't scale above 800 stars, I probably won't fiddle with the Barnes-Hut algorithm either, as the brute-force code is handling that size fine.
Not sure how I've missed this thread until now.
I did some MPI programming in college but haven't touched it since. I might have to pick up one of these boards, but I don't have a very good use for one other than as a toy.
Quoted:
I got a shared-memory version going, but it is very slow; shared memory access is far more time-consuming than local memory. If I can't scale above 800 stars, I probably won't fiddle with the Barnes-Hut algorithm either.

Could you work it in quadrants: use local RAM for 1/4 of the stars, write back to main RAM, move to the next quadrant, work it in local RAM and write back, and so on?
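That quadrant idea is classic tiling: stream the big shared-memory array through the 32 KB local SRAM in chunks. A sketch with hypothetical names (star_t mirrors the posted fields; local_buf and process_chunk are illustrative):

[code]
#include <string.h>

typedef struct { float x, y, z, vx, vy, vz; } star_t;  /* matches the posted fields */

#define CHUNK 200   /* stars per tile, sized to fit the 32 KB local SRAM */

void process_chunk(star_t *s, int n);   /* per-tile physics, defined elsewhere */

void tile_pass(star_t *shared, star_t *local_buf, int nstars)
{
    for (int base = 0; base < nstars; base += CHUNK) {
        int cnt = (nstars - base < CHUNK) ? nstars - base : CHUNK;
        memcpy(local_buf, &shared[base], cnt * sizeof(star_t));  /* pull tile into local RAM */
        process_chunk(local_buf, cnt);                           /* fast local math */
        memcpy(&shared[base], local_buf, cnt * sizeof(star_t));  /* write tile back */
    }
}
[/code]

For the all-pairs gravity step you would stream two tiles at a time (the stars being updated and the stars being compared against), but the pull-in / compute / write-back shape stays the same.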