Quoted:
Got the example MPI hello world working across all 4 boards in the cluster. Spending way too much time playing 'Linux system admin' to get to this point, though, which is cutting into coding time.
Hello World from MPI Process 3 on machine parallella1
Hello World from MPI Process 1 on machine parallella1
Hello World from MPI Process 2 on machine parallella1
Hello World from MPI Process 9 on machine parallella3
Hello World from MPI Process 0 on machine parallella1
Hello World from MPI Process 8 on machine parallella3
Hello World from MPI Process 10 on machine parallella3
Hello World from MPI Process 5 on machine parallella2
Hello World from MPI Process 4 on machine parallella2
Hello World from MPI Process 13 on machine parallella4
Hello World from MPI Process 6 on machine parallella2
Hello World from MPI Process 14 on machine parallella4
Hello World from MPI Process 12 on machine parallella4
Hello World from MPI Process 7 on machine parallella2
Hello World from MPI Process 15 on machine parallella4
Hello World from MPI Process 11 on machine parallella3
I'm honestly unsure how to code that with the epiphany cores. Similar base code, same with the epiphany code (I think, other than a common RAM pool), but getting the right work blocks to the correct epiphany core and back to the display is where I'm hitting a speed choke point in my head. I've only administered clusters; I haven't coded across them since they were acting as a virtual server (code ready-made)...
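For reference, the classic MPI hello world that produces output of that shape is only a few lines. This is a generic sketch, not the exact example shipped with the boards, and the hostfile name below is an assumption:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* which process am I?      */
    MPI_Get_processor_name(name, &len);         /* which board am I on?     */
    printf("Hello World from MPI Process %d on machine %s\n", rank, name);
    MPI_Finalize();
    return 0;
}

Launched with something like "mpirun -np 16 --hostfile hosts ./hello" (where hosts lists parallella1 through parallella4), each rank prints its own line, which is why the ordering above comes out scrambled.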
Quoted:
I'm honestly unsure how to code that with the epiphany cores...
Me too, I have a steep learning curve I think.
The MPI part I understand; it's the MPI + coprocessor supervisor arrangement that I'm twisting on. One board should be the "supervisor", just like the ARM side of the Zynq is to the Epiphany, but the amount of data you are working with is what's making me think that in cluster mode, offline/file rendering might be the simplest way to go.
That's interesting, I might have to look at it to see if it fits with anything I've been working on.
I do have Windows 10 IoT Core currently running on a Raspberry Pi 2. Basically I've just been piddling with the settings, management, and deployment, and getting something on there to be able to browse to a web app. It may prove to be a very economical replacement for some of the thin client stuff we currently have.
I am thinking I will need controller programs running on each board, waiting for MPI messages to kick off processing of the current iteration's data.
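A minimal sketch of what such a controller loop could look like, assuming a simple scheme where rank 0 hands out work blocks and each board's process blocks on MPI_Recv; the tags, buffer size, and payload layout here are invented purely for illustration:

#include <mpi.h>

#define BLOCK_FLOATS 1024
#define TAG_WORK     1
#define TAG_DONE     2

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != 0) {                        /* controller on each worker board */
        float block[BLOCK_FLOATS];
        MPI_Status st;
        for (;;) {
            /* wait for this iteration's data (or a shutdown message) */
            MPI_Recv(block, BLOCK_FLOATS, MPI_FLOAT, 0, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE)
                break;
            /* ... hand 'block' to the Epiphany cores, wait for results ... */
            MPI_Send(block, BLOCK_FLOATS, MPI_FLOAT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    } else {
        /* rank 0 would MPI_Send work blocks out here and collect the replies */
    }

    MPI_Finalize();
    return 0;
}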
Quoted:
I am thinking I will need controller programs running on each board, waiting for MPI messages to kick off processing of the current iteration's data.
How much data is one computation? bodies * [x,y,z,dx,dy,dz], so six values per star (~24 bytes as floats), or roughly 24k for 1024 bodies? Send the whole body file to all the boards, have each one work out 1/4 of the list, and send the results back; merge the quarters into a new body list, and repeat. There shouldn't be any overlap/merge problems if the same math is done against the entire field; you would, hopefully, end up with the same number of stars...
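That scheme maps directly onto MPI collectives. A rough sketch of one iteration, assuming a flat array of six floats per body, a body count divisible by the number of ranks, and an illustrative step_range() physics routine defined elsewhere:

#include <mpi.h>

#define N    1024      /* bodies */
#define FPB  6         /* floats per body: x,y,z,dx,dy,dz */

/* advance bodies[first .. first+count) using the whole field; writes the
   updated states into out[0 .. count*FPB) -- your physics goes here */
void step_range(const float *all, float *out, int first, int count);

void iterate(float *bodies)    /* N*FPB floats; rank 0's copy gets broadcast */
{
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;                 /* e.g. 256 bodies per board for 4 boards */

    /* every board gets the full field */
    MPI_Bcast(bodies, N * FPB, MPI_FLOAT, 0, MPI_COMM_WORLD);

    /* each board advances its own quarter of the list */
    float part[N * FPB];                  /* oversized; only chunk*FPB is used */
    step_range(bodies, part, rank * chunk, chunk);

    /* the quarters come back together into the new body list on every rank */
    MPI_Allgather(part, chunk * FPB, MPI_FLOAT,
                  bodies, chunk * FPB, MPI_FLOAT, MPI_COMM_WORLD);
}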
I am going to go with OpenCL, it looks like; they have a Parallella OpenCL example for matrix multiplication that uses the epiphany cores. Not sure if it spans across boards yet or not, though.
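For anyone following along, the kernel side of a matrix-multiply example generally looks like the sketch below. This is generic OpenCL C, not the actual Parallella example, which may tile the work differently:

/* Naive OpenCL matrix multiply: C = A * B, all N x N, row-major. */
__kernel void matmul(__global const float *A,
                     __global const float *B,
                     __global float       *C,
                     const int N)
{
    int row = get_global_id(1);
    int col = get_global_id(0);

    float sum = 0.0f;
    for (int k = 0; k < N; k++)
        sum += A[row * N + k] * B[k * N + col];
    C[row * N + col] = sum;
}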
Quoted:
I am going to go with OpenCL, it looks like; they have a Parallella OpenCL example for matrix multiplication that uses the epiphany cores. Not sure if it spans across boards yet or not, though.
Are there comments on any of the parallella examples, like the Bernoulli one, about scaling to a cluster? In the parallella source directory, you could try:
egrep -iR 'cluster|beowulf|mpi' --include='*.c' --include='*.h' .
That will search every C file in the directory and all sub-directories for the words "cluster", "beowulf", or "MPI" (case insensitive), which might give you a quick clue in case they tossed some comments in an example...
ETA: Add this to your ~/.bash_profile or ~/.bashrc, so it highlights matches:
alias grep='grep --color=auto'
alias egrep='egrep --color=auto'
Actually, now I found an example N-body sim from Brown Deer Technology that uses MPI to spread the load across the epiphany cores on a single board.
This is confusing. I am having trouble getting their example built; I may go that route if I can get it working on one board, then scale to the others if possible. Too much new stuff to learn.
Quoted:
Actually, now I found an example N-body sim from Brown Deer Technology that uses MPI to spread the load across the epiphany cores on a single board. This is confusing. I am having trouble getting their example built; I may go that route if I can get it working on one board, then scale to the others if possible. Too much new stuff to learn.
It seems like the shift from the single-threaded, sequential code we grew up with (Timex Sinclair ZX80 to Commodore to Atari) to multi-threaded code when dual-processor boards came out. Object-Oriented Programming and multi-threaded libraries solved the first part. Then add in the task of deciding which parts to "send" to the graphics co-processor (now a standard that's easy to "talk to" via DirectX and OpenGL/OpenCL, which handles all of that). The problem is that that API (OpenGL/CL) is for interfacing with multiple GPUs, rather than multiple epiphany cores. The same problems are there, just on a smaller scale.
That cluster is where a dual-CPU Pentium Pro with a graphics card was in 1996. Very little code utilized all three, mostly only video editing stuff, which only supported certain video cards (aka epiphany cores here). Few people here remember games being card-dependent until VESA came out as a standard to communicate with all GPUs. (Remember trying to re-build the Linux X server for your particular video card, if it was supported, back in the pre-Linux-1.0 days? That was a nightmare....) The higher-end boards tended to support OpenGL out of the box way back then, but Microsoft being Microsoft and wanting everything proprietary, MS Windows refused to use that standard and made DirectX instead. In this regard, Mac OS X started at base as OpenGL/X-server driven, like the workstations of the 80s and 90s, but obviously proprietary. People don't realize the sheer amount of abstraction there is when "writing a Windows app".
In the above, substitute "Epiphany" for GPU, and when you are as close to the metal on the cluster as you are, that abstraction API is the sort of thing you need to, well, build. Not a universal API, just enough of one for your app to run, if that makes sense. That's why my brain locks up on it...
Alright, finally got the nbody MPI example built. The coprthr library that Brown Deer has prebuilt for download on their website is not the most current one, so I had to get that library off their GitHub and build it myself, which was another challenge, but I got it done.
Now I have something I can dig into and start hacking away at, which is what I do best.
Quoted:
Alright, finally got the nbody MPI example built. The coprthr library that Brown Deer has prebuilt for download on their website is not the most current one, so I had to get that library off their GitHub and build it myself...
Well, scratch my last post then.
Quoted:
4096 stars using MPI on one board. My code is faster than the MPI version on smaller numbers of stars, but the MPI version is faster on bigger numbers. Not as smooth as I would like yet, but it's a start. http://youtu.be/DOi9BTTJTaY
Is that using the ARM cores' MPI, or one board with the coprocessor lib? I'm confused... Looks good, though!
Quoted:
Is that using the ARM cores' MPI, or one board with the coprocessor lib? I'm confused... Looks good, though!
Using the 16-core Epiphany on one board. I have the MPI Hello World example running across 4 boards, so I may be able to work some magic here and get the star sim spread over 4 boards using MPI.
Quoted:
Using the 16-core Epiphany on one board. I have the MPI Hello World example running across 4 boards, so I may be able to work some magic here and get the star sim spread over 4 boards using MPI.
So the demo above with more stars is using both cores of the ARM plus the "Copper Threads" (COPRTHR), or just both cores plus the same epiphany code you have been using? Too many cores to clearly tell which ones you are using MPI with...
Quoted:
So the demo above with more stars is using both cores of the ARM plus the "Copper Threads" (COPRTHR), or just both cores plus the same epiphany code you have been using? Too many cores to clearly tell which ones you are using MPI with...
The star calculations are all done on the 16 epiphany cores; the ARM is just doing the actual display of the stars. The 16 cores are doing the heavy lifting.
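That split (Epiphany for the math, ARM for the display) is roughly the standard e-hal host pattern. A sketch of the ARM side, under the assumption of the stock Epiphany SDK calls; the device binary name "e_nbody.elf" and the shared-buffer layout are made up for illustration:

/* Host (ARM) side: load a kernel onto all 16 cores and read results back
   from shared external DRAM for display. Sketch only. */
#include <stdio.h>
#include <e-hal.h>

int main(void)
{
    e_platform_t platform;
    e_epiphany_t dev;
    e_mem_t      dram;                                      /* shared DRAM buffer */

    e_init(NULL);
    e_reset_system();
    e_get_platform_info(&platform);

    e_open(&dev, 0, 0, platform.rows, platform.cols);       /* the whole 4x4 group */
    e_alloc(&dram, 0x00000000, 4096 * 6 * sizeof(float));   /* body state */

    e_load_group("e_nbody.elf", &dev, 0, 0, platform.rows, platform.cols, E_TRUE);

    float pos[4096 * 6];
    e_read(&dram, 0, 0, 0x0, pos, sizeof(pos));             /* pull back positions */
    /* ... hand 'pos' to whatever is drawing the frame, then loop ... */

    e_free(&dram);
    e_close(&dev);
    e_finalize();
    return 0;
}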
Quoted:
The star calculations are all done on the 16 epiphany cores; the ARM is just doing the actual display of the stars. The 16 cores are doing the heavy lifting.
The MPI on ARM cores boosted performance that much!? You must have gotten dma_copy working, I take it? I didn't realize that the single core of the ARM was the bottleneck.
An idea to help: get the source to xstar.
It's a *nix program that makes an X11 window and does an n-body simulation using one of a number of algorithms, which you can choose.
Quoted:
XStar is an X11 client that ''solves'' the n-body problem, and displays the results on the screen. It starts by putting a bunch of stars on the screen, and then it lets the inter-body gravitational forces move the stars around. The result is a lot of neat wandering paths, as the stars interact and collide. Try using the display mode options (-c, -C, -R, or -M) to make things more colorful.
Source Code
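For reference while comparing against xstar, the heart of a direct (all-pairs) n-body step is just the double loop below; a plain C sketch, with the softening term and timestep chosen arbitrarily:

/* One direct-summation n-body step: O(N^2) force accumulation, then a
   simple Euler update. EPS softens close encounters; DT is the timestep. */
#include <math.h>

#define N    1024
#define G    1.0f          /* gravitational constant in sim units */
#define EPS  1e-3f
#define DT   1e-2f

typedef struct { float x, y, z, dx, dy, dz, m; } body_t;

void step(body_t *b)
{
    for (int i = 0; i < N; i++) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (int j = 0; j < N; j++) {
            if (j == i) continue;
            float rx = b[j].x - b[i].x;
            float ry = b[j].y - b[i].y;
            float rz = b[j].z - b[i].z;
            float r2 = rx * rx + ry * ry + rz * rz + EPS;
            float inv_r3 = 1.0f / (r2 * sqrtf(r2));
            ax += G * b[j].m * rx * inv_r3;
            ay += G * b[j].m * ry * inv_r3;
            az += G * b[j].m * rz * inv_r3;
        }
        b[i].dx += ax * DT;
        b[i].dy += ay * DT;
        b[i].dz += az * DT;
    }
    for (int i = 0; i < N; i++) {          /* move everyone after all forces */
        b[i].x += b[i].dx * DT;
        b[i].y += b[i].dy * DT;
        b[i].z += b[i].dz * DT;
    }
}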
Quoted:
An idea to help: get the source to xstar. It's a *nix program that makes an X11 window and does an n-body simulation using one of a number of algorithms, which you can choose.
I'll check it out, thanks.
I am working on moving the MPI graphics to the Epiphany cores and DMA-copying the star positions into the frame buffer from there. If I get that working, it should blow the doors off, speed-wise.
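On the device side, that presumably leans on the e-lib DMA call. A rough sketch of what each core would run; the frame-buffer address, star count per core, and packing of the positions are all assumptions (the real code has to map the framebuffer into the Epiphany's address space first):

/* Epiphany core side (e-lib): push this core's star positions out of local
   SRAM into a shared external region with the DMA engine. Sketch only. */
#include <e-lib.h>

#define STARS_PER_CORE 256
#define FB_BASE ((char *)0x8f000000)    /* hypothetical shared/framebuffer address */

float positions[STARS_PER_CORE * 3];    /* x,y,z for this core's stars */

int main(void)
{
    unsigned core = e_group_config.core_row * e_group_config.group_cols
                  + e_group_config.core_col;

    for (;;) {
        /* ... compute this iteration's positions into 'positions' ... */

        /* blocking DMA copy to this core's slot in the shared region */
        e_dma_copy(FB_BASE + core * sizeof(positions), positions, sizeof(positions));
    }
    return 0;
}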
Awesome! Is that using all 64 epiphany cores, or just the single board yet?
Quoted:
Just one board, 16 cores. It won't work on all 4 boards since I can't DMA copy from one board to another's frame buffer. Going to have them generate OpenGL commands, I think, and display on my iMac. Kinda like this guy did with his Parallella and a Raspberry Pi: http://www.youtube.com/watch?v=6S-Epb6m6mI
That's showing more horsepower than I expected! DMA is a wonderful thing.
Yeah, this is fast enough now I am not even going to bother with trying to code a Barnes-Hut algorithm.
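For anyone wondering why Barnes-Hut would matter at larger star counts: it cuts the all-pairs O(N^2) work down to roughly O(N log N) by treating distant clumps of stars as single pseudo-bodies. The whole trick hangs on one test; a sketch with the usual theta parameter:

/* Barnes-Hut cell-opening test: a tree cell of width s at distance d can be
   treated as one pseudo-body when s/d < theta (theta ~ 0.5 is typical);
   otherwise descend into its children. Sketch only. */
static int cell_is_far_enough(float s, float d, float theta)
{
    return s < theta * d;
}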
Now make it X11 enabled.
(same problem with memory/network speed, I know...)
Cool, glad you got it OK.
That is the server version; I am not as up to speed on those. Check parallella.org for more info. They have a good forum, not as active as here (who is?), but tons of info. I got to where I am today with this from there.
Quoted:
Cool, glad you got it OK. That is the server version; I am not as up to speed on those. Check parallella.org for more info...
Will do that. An additional big hump for me is re-learning X11 coding; I haven't done it "raw" for a long time, and even then it was with Motif objects. I'll try to run it through a Raspberry Pi to OpenGL using that guy's code in the video, if I can find it... Off for more Pi...
Quoted:
Will do that. An additional big hump for me is re-learning X11 coding; I haven't done it "raw" for a long time, and even then it was with Motif objects...
Ok. My first job on Wall St was doing X11 Motif coding for trading systems; too long ago (20 years) to remember that stuff, but I may be able to help locate what you need.
Quoted:
Ok. My first job on Wall St was doing X11 Motif coding for trading systems; too long ago (20 years) to remember that stuff, but I may be able to help locate what you need.
It's amazing how, when you wrote code 20 years ago and made some cool stuff, you tend to think it's burned into your head and you won't make the same learning mistakes again. Stop doing it for a decade or two, and you don't even remember the includes. I still look at some code I've written and was proud of (Debian package maintainer for home automation stuff, retired) and curse myself for not enough comments.
Go here for the nBody code of my latest videos:
https://github.com/USArmyResearchLab/mpi-epiphany
The path to get to where it all compiles and runs is kind of a pain, but I can walk you through it. That example has no star output, just text, so it should run on that server version OK once you work through the .lib issues I did.
Link to the Epiphany SDK document:
http://adapteva.com/docs/epiphany_sdk_ref.pdf
Also, the Parallella Chronicles that arfcom member 'AD_UK' creates:
https://www.parallella.org/2014/11/25/parallella-chronicles-part-one-2/
Quoted:
If I could do a 'redo' on my life, I would have skipped Wall St and done this stuff; it fascinates me like no other non-hot-female subject does. https://www.youtube.com/watch?v=-S-T_iTiAxQ
This one is pretty cool too: https://youtu.be/MncUDWhPB_E
Quoted:
This one is pretty cool too: https://youtu.be/MncUDWhPB_E
I have seen that one before. Pretty cool, but it reminds me of a wet paintbrush slinging paint around. I am sure the science behind it is spot on, but the rendering looks like cotton candy; someone could tune that up a bit, I think.
Quoted:
I have seen that one before. Pretty cool, but it reminds me of a wet paintbrush slinging paint around...
They spent 8 months waiting for it. I guess I'd use what I had after that much time too. They could probably re-render it in another color pretty quickly today and re-post it. That reminds me: how much torture would your code go through to run on that nVidia card you showed the smoke simulation on? It would be interesting to compare them on a same-size problem.
Quoted:
They spent 8 months waiting for it... That reminds me: how much torture would your code go through to run on that nVidia card you showed the smoke simulation on?
The Nvidia Jetson would blow the Parallella out of the water for this, but that is not a knock against the Parallella. The Parallella is very cool for what it is, and I am learning a ton playing with it. It just wasn't made to compete with the likes of a GPU; it's more a low-wattage, embedded-system type of chip. Samsung invested $3 million into the company, I believe, so maybe they will make an appearance in cell phones or tablets one of these days. It's tough to be a small company in this space these days; I wish them well, and the founder of Adapteva seems like a great guy. Also, there are 64-core chips out in the wild, not many; some Kickstarter backers got some. I think they could ramp up to 1024 cores pretty quickly if the right customer comes along to pay for it.
Quoted:
Sort of on topic... 5 Star System Found
http://cdnph.upi.com/sv/i/para/upi/UPI-1161436444896/2015/7/14364475097304/Astronomers-find-rare-five-star-system.jpg
Awesome, that must be a complicated orbital pattern.
Quoted:
I have been asked to scale this up to 4096 stars, yikes!
Congratulations! You could fudge it a bit, push to 4096 and increase the time, but.... Have you used the epiphany timers to see how many I/O waits and other "pauses" the cores bump into? That's "old school profiling" that might find the cycles you need.
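Each Epiphany core has hardware event timers exposed through e-lib, so that kind of profiling can be done per core. A sketch of cycle-counting one section, assuming the standard ctimer calls documented in the SDK reference linked above:

/* Device-side (e-lib) sketch: count clock cycles spent in a section of the
   per-core code, e.g. the force loop vs. waiting on the host or DMA. */
#include <e-lib.h>

unsigned profile_section(void)
{
    e_ctimer_set(E_CTIMER_0, E_CTIMER_MAX);      /* load the down-counter   */
    e_ctimer_start(E_CTIMER_0, E_CTIMER_CLK);    /* count core clock cycles */

    /* ... section to measure goes here ... */

    unsigned remaining = e_ctimer_get(E_CTIMER_0);
    e_ctimer_stop(E_CTIMER_0);
    return E_CTIMER_MAX - remaining;             /* elapsed cycles */
}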