Link Posted: 6/12/2015 8:11:03 PM EDT
[#1]
Link Posted: 6/12/2015 8:32:44 PM EDT
[#2]

Quoted:


I'd forgotten about the inverse root function, I remember it only because of that strange constant in the algorithm.



At least Newton's Method is somewhere in there.
View Quote
Yeah it is a slick piece of code.

 



My code running on the 16 cores is now noticeably faster than the similar code running on the ARM only.




Before I found this magic code it was way slower.




Still performance tweaks to make but the parallel programming model is a lot less 'black magic voodoo' to me now.
Link Posted: 6/12/2015 8:34:26 PM EDT
[#3]
Link Posted: 6/13/2015 6:34:55 AM EDT
[#4]
Crappy video demo of the difference of the nbody code running strictly on the CPU vs parallel version on the 16 core epiphany chip.





The first run is the CPU only, second run on the 16 cores







Both are calculating an 800 star system over 10 iterations. The more stars I add, the bigger the difference between the two.












 
Link Posted: 6/13/2015 9:55:33 AM EDT
[#5]
Link Posted: 6/13/2015 11:12:40 AM EDT
[#6]




Quoted:
Quoted:




Crappy video demo of the difference of the nbody code running strictly on the CPU vs parallel version on the 16 core epiphany chip.
The first run is the CPU only, second run on the 16 cores
Both are calculating an 800 star system over 10 iterations. The more stars I add, the bigger the difference between the two.
http://www.youtube.com/watch?v=7z6G5J-OB9Y
 

That is a remarkable speed-up.  How much do the "answers" differ from the sqrt lib function and the fast inverse root function?  Is it small enough to still be modeling a galaxy once you are into the billions of iterations?
They are slightly off. The more Newton iterations I do, the more accurate it is; it's just a speed trade-off.

 









I am currently doing 5 Newton iterations and the numbers are close enough for me; I am not doing any kind of rigorous scientific analysis.








For comparison, the x coordinate for star # 1 is -0.319215 on the CPU after 10 iterations


The same x coordinate on the 16 core run is -0.319304 after 10 iterations.
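
For readers who have not seen the trick under discussion, here is a minimal C sketch of the fast inverse square root with a configurable number of Newton steps. It is not the code from this thread: the name fast_rsqrt and the newton_steps parameter are made up for illustration, and it assumes 32-bit IEEE 754 floats. The magic constant is the well-known one from the Quake III source.

```c
#include <stdint.h>
#include <string.h>

/* Approximates 1/sqrt(x) for x > 0.
 * Hypothetical helper for illustration; assumes 32-bit IEEE 754 floats. */
static float fast_rsqrt(float x, int newton_steps)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);       /* read the float's bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);     /* bit trick: rough first guess */
    memcpy(&y, &bits, sizeof y);

    /* Each Newton step on f(y) = 1/y^2 - x refines the guess. */
    for (int i = 0; i < newton_steps; i++)
        y = y * (1.5f - 0.5f * x * y * y);

    return y;
}
```

Each extra step costs a few multiplies, which is the speed versus accuracy trade-off described above. One step already lands within a fraction of a percent of the exact value.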





 
Link Posted: 6/13/2015 11:34:51 AM EDT
[#7]
I bumped up the code to take the 800 stars through 1000 iterations instead of 10



CPU took 6 minutes 24 seconds

Epiphany took 1 minute 17 seconds




There is still room to squeeze out performance in the parallel code. It is just my first attempt, and I am sure there are tricks and techniques I can apply to make it even faster.
Link Posted: 6/13/2015 2:59:06 PM EDT
[#8]
Link Posted: 6/13/2015 3:00:36 PM EDT
[#9]
goloud, I got the parallella you sent me in the mail today, thank you very much!





Once I get my 4 board case I will build out the cluster.











 
Link Posted: 6/13/2015 3:05:28 PM EDT
[#10]




Quoted:
Quoted:




I bumped up the code to take the 800 stars through 1000 iterations instead of 10
CPU took 6 minutes 24 seconds




Epiphany took 1 minute 17 seconds
There is still room to squeeze out performance in the parallel code, it is just my first attempt, I am sure there are tricks, techniques, etc I can apply to make it even faster.





Is any of that time spent writing to the frame buffer / display?
I turned off all the printing to the screen; it didn't make a difference speed-wise.

 









One thing I did change: previously I was writing the 800 star data to the local memory of all 16 cores.






I changed that to only write to core #1 memory, and all the cores read from there. Trying to cut down on all the data moving back and forth.






I also now have the code running on the Epiphany make core #1 sum up the results from all the other cores when they signal they are done. Previously I was downloading all the data to the ARM and summing there.






I also noticed with that fast inverse square root code that if I run an even number of Newton iterations, the values diverge rapidly. Any odd number of iterations keeps everything on track.






 
Link Posted: 6/13/2015 7:59:29 PM EDT
[#11]
Someone requested to see the code, too big for IM so posting here
Code not cleaned up yet, I tend to code to get something working, then go back and make it 'pretty'
It is a work in progress, logic will change as I get more experience with these things.





cut / paste screwed up the formatting




ETA: Code formatted properly below




vvvvvvvvvvvvvvvvvvvvvvv
Link Posted: 6/13/2015 10:17:49 PM EDT
[#12]
Link Posted: 6/14/2015 1:38:06 AM EDT
[#13]

Quoted:


Let me see if the code /code tags work.



View Quote
Thanks!

 
Link Posted: 6/14/2015 12:00:46 PM EDT
[#14]
Link Posted: 6/14/2015 3:06:27 PM EDT
[#15]


Quoted:



This part from the epiphany code, is it running on one core or all cores?  I lost myself in the braces and think this is under core 1 code:
    // all cores done processing this iterations, calculate new x, y, z coordinates for next iteration
    for (i = 0; i < *n; i++) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }






Where each core only adds the velocity * time step to the objects in their "sector".   Maybe make it an extra subroutine?   With the small object and step, it doesn't matter, but when you get to millions of points by 64 cores, falling back on one for doing all new positions may lag it down a good deal.





Second, this part:




    for(i = 0; i < 4; i++){
        for(j = 0; j < 4; j++){
            z[num++] = e_get_global_address(i, j, o);






Isn't that something that would only need to happen once, and be outside a loop, rather than run on all cores?  That's a lot of calls to e_get_global_address()  





Again, my understanding of the architecture  is imperfect here.  





Lastly, can you make the cores into two workgroups, so that all the cores don't do the check to see if they are core 1?  Define 15 cores to run that code without the check for core 1, and the 16th core as core 1.  Do they still get to share memory the way you are using it when making groups of cores into virtual CPUs?





Did you time the code to see if the summing on the ARM was appreciably slower than summing on Core #1?
View Quote
That first for() only happens on core #1

 





I am keeping it on core #1 to avoid the data move from local memory to ARM memory, there is a performance penalty to moving it over and moving it back after summing.







The algorithm is a brute force algorithm, comparing every body to every other body to get new vx, vy, vz values based on x, y, z. I don't want to start calculating new x, y, z until all bodies have been compared to all the others.



I am keeping the code for all cores the same for simplicity.







 
Link Posted: 6/14/2015 5:24:51 PM EDT
[#16]
My God! It's full of stars!





Sorry for my crappy iPhone video, wish I could capture the HDMI output, but here is a crude first cut at some graphics.







Most of the stars escape each other, but a small cluster of stars starts orbiting each other.












 
Link Posted: 6/14/2015 5:29:50 PM EDT
[#17]
Just saw this from an airport bar (off on a cruise). I am impressed; can we talk when I get back?
Link Posted: 6/14/2015 7:13:03 PM EDT
[#18]



Quoted:




Just seen this from airport bar ( off on a cruise) I am impressed can we talk when I get back?
View Quote
Thanks, when something interests me I get motivated.

 







I am mostly just piecing together logic I am finding online.










Here is a run with just 64 stars, it shows the gravity entanglement of the stars better.


















 
Link Posted: 6/14/2015 10:58:51 PM EDT
[#19]
Link Posted: 6/15/2015 3:13:08 AM EDT
[#20]


Quoted:



That's pretty awesome!  Nice work adding the display to it.   I assume the ones that zip off the display at the start are "slingshots"?





What are your planned addition/expansion ideas?


View Quote
Yes, those are 2 stars that whipped around each other close by in a way to accelerate them both off in opposite directions at high speed.

 





Kinda like NASA using a planet's gravity to fling a space probe off towards its final destination.







Once I get the 4 board cluster built I will try to get this scaled up to 64 processors.


 



One thing I will add too is the ability to assign a mass to each star; right now they are all treated as equal mass.




Then what I can do is create a few massive stars that should attract clouds of lighter stars forming mini galaxies.  




Then, as 'God' of my digital universe I can send a couple mini galaxies on a collision course or try to get them to orbit each other.




All kinds of possibilities.









Link Posted: 6/15/2015 11:48:00 AM EDT
[#21]
Link Posted: 6/15/2015 2:36:05 PM EDT
[#22]
I am having so much fun with this.


Here is a 192 star system, 2 mega stars with 100,000x the mass of the others.


I was wondering why more stars are not captured in orbit, but now I realize a stable orbit in the universe is probably kind of rare. Unless I had billions of stars to play with here, or let it run for a hundred years, I am not going to randomly get many.



The two mega stars are slowly being pulled together as well, would need to let it run a long time to get there though.





Also, the space has 'depth', so it may look like a star passes near a mega star without being affected much by its gravity when in reality it is far in front of or behind the mega star.








What's interesting is I have been letting this run for quite a while now, and stars that left off the screen are coming back, so some long elliptical orbits going on there.







Down to just a handful of stars now but more keep coming back from off screen periodically.













 







 
Link Posted: 6/15/2015 3:38:46 PM EDT
[#23]
I am running it now with 800 stars. I think that with more stars they keep each other in check, since each star has a gravitational influence on all the others. The star system seems more stable now, or just moving slower due to more stars lol.



If I could scale this up to 50,000 stars or so that probably would be a pretty good simulation.
 













There is a Barnes-Hut algorithm (Piet Hut is from the Institute for Advanced Study, of Einstein fame) that works with larger datasets and reduces the number of calculations required, from roughly n^2 for brute force to roughly n log n.








I will look into implementing that.
 
Link Posted: 6/15/2015 4:17:36 PM EDT
[#24]
Link Posted: 6/15/2015 5:12:01 PM EDT
[#25]

Quoted:



Quoted:



Down to just a handful of stars now but more keep coming back from off screen periodically.
http://www.youtube.com/watch?v=GDupI2e5SQo



 

 




That's actually pretty amazing considering you've had the board less than a week, and weren't doing a ton of parallel code prior to that!  



Keep the vids going!  I have no idea how to capture video direct on board, you could use fbgrab for screenshots, but I think you would need to run through another computer to save it as a video file.  The camera at display works well enough.   It's pretty cool to see how the objects react to the large mass objects.
Google helps! I have found most of the solutions for this via Google and the Parallella forum.

 
Link Posted: 6/15/2015 5:20:53 PM EDT
[#26]
Link Posted: 6/15/2015 5:31:08 PM EDT
[#27]

Quoted:



Quoted:


Quoted:
google helps! I have found most of the solutions for this via google and the parallella forum.




You can't give google all the credit, don't sell yourself short as a True Geek [1].  You've shown that you fully comprehend the code that is running.  All google may have done is save you some typing and let you skip past a couple parts of trial and error.





1: Meant as a badge of pride.
Yeah, I showed this sim running to the wife, she rolled her eyes and called me a geek! lol
Link Posted: 6/15/2015 6:20:04 PM EDT
[#28]

Quoted:



Quoted:


Quoted:


Quoted:
google helps! I have found most of the solutions for this via google and the parallella forum.




You can't give google all the credit, don't sell yourself short as a True Geek [1].  You've shown that you fully comprehend the code that is running.  All google may have done is save you some typing and let you skip past a couple parts of trial and error.





1: Meant as a badge of pride.
Yeah, I showed this sim running to the wife, she rolled her eyes and called me a geek! lol
I posted the last vid over at the Parallella forum. We will see what they say, if anything; they are mostly serious hard-core geeks over there.

 
Link Posted: 6/15/2015 7:15:12 PM EDT
[#29]
Achieved a stable orbit. I found a defect in my code: I have 768 layers of depth in the virtual space, but all the stars were at a depth between 1 and 2. With that fixed, not so many stars are slingshotting off into space anymore, and there are more orbits going on.





Here is one stable orbit around a mega star that has been going on for 10 minutes now












 
Link Posted: 6/15/2015 9:38:02 PM EDT
[#30]
Link Posted: 6/16/2015 5:26:37 AM EDT
[#31]

Originally Posted By brass





Can you add a splash of color  to show Z buffer (in-out) distance? Varying shade of blue or something, top 3 bits of depth for the blue intensity from 128-255?
View Quote
I was thinking about adding some 'red shift' 'blue shift' depending on if the star is moving toward the screen or away from it.

 



Will try and tackle that today sometime.
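
For comparison, the depth-shading idea quoted at the top of this post (top 3 bits of depth mapped to a blue intensity from 128 to 255) could be sketched in C as below. The depth_to_blue name and the assumption that depth fits in 10 bits (0 to 1023, which covers 768 layers) are mine, not from the thread.

```c
#include <stdint.h>

/* Maps a depth value to a blue channel intensity in 128..255.
 * Assumption: depth is a 10-bit value, so its top 3 bits are bits 7..9. */
static uint8_t depth_to_blue(uint16_t depth)
{
    unsigned level = (depth >> 7) & 0x7u;           /* 8 bands: 0..7 */
    return (uint8_t)(128u + (level * 127u) / 7u);   /* 128 at band 0, 255 at band 7 */
}
```

With only 768 layers in use, the top two bands are never reached, so the brightest shade actually drawn would be 218 rather than 255. Which end of the depth range counts as "near" is a separate choice this sketch does not make.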
Link Posted: 6/16/2015 10:03:20 AM EDT
[#32]
This thread is interesting, but confusing.  Can someone explain in plain, non tech speak, what is happening?
Link Posted: 6/16/2015 11:32:35 AM EDT
[#33]

Quoted:


This thread is interesting, but confusing.  Can someone explain in plain, non tech speak, what is happening?
View Quote
I bought a little computer that has a custom 16 core processor.

 



Think of the 16 core processor as 16 little computers.




I coded up a simulation that shows stars moving around with gravity from all the stars affecting the speed, direction of motion, etc of each other star. Say I have 1600 stars in the simulation, instead of 1 CPU having to do all those calculations for 1600 stars, my code has 100 stars being processed on each of the 16 cores.




All 16 are processing at the same time, so much faster to finish 1600 star calculations that way than one CPU chewing through the calculations.
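
The split described above (1600 stars, 16 cores, 100 stars each) can be sketched in C as follows. The Range struct and the core_slice name are hypothetical and not from the posted code; the sketch also handles star counts that do not divide evenly by handing the leftover stars to the lowest-numbered cores.

```c
#include <stddef.h>

typedef struct { size_t begin, end; } Range;   /* half-open: [begin, end) */

/* Returns the slice of stars that core number `core` (0-based) works on. */
static Range core_slice(size_t n_stars, unsigned n_cores, unsigned core)
{
    size_t base  = n_stars / n_cores;   /* stars every core gets */
    size_t extra = n_stars % n_cores;   /* leftovers, one each to the first cores */
    Range r;

    r.begin = core * base + (core < extra ? core : extra);
    r.end   = r.begin + base + (core < extra ? 1 : 0);
    return r;
}
```

With 1600 stars and 16 cores, core 0 gets stars 0 to 99 and core 15 gets stars 1500 to 1599. Every core still has to read the positions of all the stars, but it only computes new velocities for its own slice.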
Link Posted: 6/16/2015 12:32:58 PM EDT
[#34]
Quoted:
Quoted:
This thread is interesting, but confusing.  Can someone explain in plain, non tech speak, what is happening?
I bought a little computer that has a custom 16 core processor.  

Think of the 16 core processor as 16 little computers.


I coded up a simulation that shows stars moving around with gravity from all the stars affecting the speed, direction of motion, etc of each other star. Say I have 1600 stars in the simulation, instead of 1 CPU having to do all those calculations for 1600 stars, my code has 100 stars being processed on each of the 16 cores.


All 16 are processing at the same time, so much faster to finish 1600 star calculations that way than one CPU chewing through the calculations.


Thank you, and that is fucking awesome.  Kind of like I read a while back that a bunch of regular computers linked together could be equivalent to one super computer...

Could this be done for large excel files that crash regular Windows?


Link Posted: 6/16/2015 1:46:18 PM EDT
[#35]

Quoted:



Quoted:


Quoted:

This thread is interesting, but confusing.  Can someone explain in plain, non tech speak, what is happening?
I bought a little computer that has a custom 16 core processor.  



Think of the 16 core processor as 16 little computers.





I coded up a simulation that shows stars moving around with gravity from all the stars affecting the speed, direction of motion, etc of each other star. Say I have 1600 stars in the simulation, instead of 1 CPU having to do all those calculations for 1600 stars, my code has 100 stars being processed on each of the 16 cores.





All 16 are processing at the same time, so much faster to finish 1600 star calculations that way than one CPU chewing through the calculations.





Thank you, and that is fucking awesome.  Kind of like I read a while back that a bunch of regular computers linked together could be equivalent to one super computer...



Could this be done for large excel files that crash regular Windows?





Only if a version of Excel is written that takes advantage of multiple cores.

 
Link Posted: 6/16/2015 1:49:16 PM EDT
[#36]

Quoted:



Originally Posted By brass





Can you add a splash of color  to show Z buffer (in-out) distance? Varying shade of blue or something, top 3 bits of depth for the blue intensity from 128-255?
I was thinking about adding some 'red shift' 'blue shift' depending on if the star is moving toward the screen or away from it.  



Will try and tackle that today sometime.

I did some color coding: if the velocity towards the screen is > 100, I turn the star blue to simulate blue shift; if it is moving > 100 away from the screen, I red shift it.

 



Makes it a little easier to visualize what is going on.
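
The color rule above could look roughly like this in C. This is a sketch, not the code from the repository: the star_color name, the 0x00RRGGBB pixel format, white as the default color, and the sign convention (positive vz means moving toward the viewer) are all assumptions. Only the threshold of 100 comes from the post.

```c
#include <stdint.h>

#define SHIFT_THRESHOLD 100.0f   /* from the post: speed beyond which to tint */

/* Returns a 0x00RRGGBB color for a star.
 * Assumption: positive vz means moving toward the screen. */
static uint32_t star_color(float vz)
{
    if (vz > SHIFT_THRESHOLD)
        return 0x000000FFu;   /* approaching fast: blue shift */
    if (vz < -SHIFT_THRESHOLD)
        return 0x00FF0000u;   /* receding fast: red shift */
    return 0x00FFFFFFu;       /* otherwise plain white */
}
```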




Now a guy in Australia wants the code, he saw the vid I posted on the parallella forum.




I told him I will email it to him.




I guess I need to get on github.
Link Posted: 6/16/2015 2:59:13 PM EDT
[#37]
Link Posted: 6/16/2015 6:08:14 PM EDT
[#38]
Don't know if you got this resolved, but you don't need hardware "multiply/divide". That's what shift is for.  Shifts and masking are awesome speed-ups over library calls.
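
A quick C illustration of what is meant by shifts and masks, with made-up helper names. It only applies to unsigned integers and powers of two, so it does not by itself replace a floating-point square root; compilers also typically make these substitutions on their own when the operand is a constant.

```c
#include <stdint.h>

/* Hypothetical helpers: each matches the arithmetic named in its comment
 * for unsigned values and power-of-two operands only. */
static uint32_t times_8(uint32_t x)   { return x << 3; }    /* x * 8  */
static uint32_t div_by_4(uint32_t x)  { return x >> 2; }    /* x / 4  */
static uint32_t modulo_16(uint32_t x) { return x & 15u; }   /* x % 16 */
```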
Link Posted: 6/16/2015 6:10:46 PM EDT
[#39]

Quoted:


Don't know if you got this resolved, but you don't need hardware "multiply/divide". That's what shift is for.  Shifts and masking are awesome speed-ups over library calls.
View Quote
I found some code from the old Quake video game that did exactly what I needed to do.

 
Link Posted: 6/16/2015 7:17:26 PM EDT
[#40]


Quoted:



That's great!   github is free for one repository, and it will walk you through how to set it up and such.  Pretty easy.





View Quote
Ok, I think I got this set up properly.

 




https://github.com/capnrob97/rob_nbody


 
Link Posted: 6/16/2015 8:20:14 PM EDT
[#41]
Link Posted: 6/17/2015 6:13:46 AM EDT
[#42]
Yeah, I'll clean up the code at some point, just used the example hello_world as a starting point.









I turned off the red shift / blue shift; it didn't look as good as I thought it would.






I also had to slow it down for smaller numbers of stars; they zip around so fast you can't see what is happening. The dt variable in e_rob_nbody.c controls how much time advances each iteration. I made it 100 times smaller with 32 stars, which makes the motion easier to watch.


Gonna code to auto set that based on the number of stars created.






Next step is to try shared memory for the star data to see if I can scale up past 800 stars. Shared memory can be 32 MB; local memory on the Epiphany is 32 KB per core.






Then try to implement the Barnes-Hut algorithm. I think this will get me started:



 
Link Posted: 6/17/2015 8:00:55 AM EDT
[#43]
Quoted:


Only if a version of excel is written that takes advantage of multi cores.  
View Quote


Fucking Bill Gates

Link Posted: 6/17/2015 8:20:36 AM EDT
[#44]
Quoted:
Quoted:


Only if a version of excel is written that takes advantage of multi cores.  


Fucking Bill Gates



Excel has used multiple cores for a number of years now.
Link Posted: 6/17/2015 9:37:58 AM EDT
[#45]
Quoted:
Quoted:
Quoted:


Only if a version of excel is written that takes advantage of multi cores.  


Fucking Bill Gates



Excel has used multiple cores for a number of years now.


Oh??  Ok, I take it back, Mr. Gates...

Is there an easy way that you know of that a non tech guy like me could explain to our IT guys how to implement that on our network?  I.e. any simple to follow instructions / guidelines out there?
Link Posted: 6/17/2015 10:52:52 AM EDT
[#46]
Link Posted: 6/17/2015 11:27:21 AM EDT
[#47]
I got a shared memory version going, but it is very slow; shared memory access is far more time-consuming than local memory access.



If I can't scale above 800 stars I probably won't fiddle with the Barnes-Hut algorithm either, as the brute force code is handling it.
Link Posted: 6/17/2015 12:01:09 PM EDT
[#48]
Not sure how I've missed this thread until now.

I did some MPI programming in college but haven't touched it since. I might have to pick up one of these boards, but I don't have a very good use for one other than as a toy.
Link Posted: 6/17/2015 12:04:27 PM EDT
[#49]
Link Posted: 6/17/2015 12:35:41 PM EDT
[#50]

Quoted:



Quoted:

I got a shared memory version going but it is very slow, shared memory access is way more time consuming than local memory.



If I can't scale above 800 stars I probably won't fiddle with the Barnes-Hut algorithm either, as the brute force code is handling it.





Could you work it in quadrants, using local RAM for 1/4 of the stars, writeback to main RAM, go to next quadrant work it in local RAM and writeback, etc. etc. ?



There is a dma_copy() function I think I can use to speed things up.
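
The chunked approach suggested in the quote can be sketched in portable C like this. Everything here is an assumption for illustration: the Pos struct, the CHUNK size and the function name are invented, plain memcpy stands in for the DMA copy mentioned above, and summing x coordinates is only a placeholder for the real per-star force work.

```c
#include <stddef.h>
#include <string.h>

typedef struct { float x, y, z; } Pos;   /* hypothetical star position record */

#define CHUNK 256   /* assumption: how many stars fit in the local buffer */

/* Streams a large array from slow memory through a small fast buffer,
 * one chunk at a time, and does the work on the local copy. */
static float sum_x_chunked(const Pos *shared, size_t n)
{
    Pos local[CHUNK];
    float sum = 0.0f;

    for (size_t off = 0; off < n; off += CHUNK) {
        size_t count = (n - off < CHUNK) ? (n - off) : CHUNK;

        /* Stand-in for a DMA transfer from shared to local memory. */
        memcpy(local, shared + off, count * sizeof local[0]);

        for (size_t k = 0; k < count; k++)
            sum += local[k].x;           /* placeholder for the real work */
    }
    return sum;
}
```

The idea is that one bulk copy per chunk replaces many individual slow reads, so the inner loop only ever touches the fast local buffer.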

 