Posted: 5/10/2023 5:46:39 PM EDT
We recently discovered a bug in our software.

If you run 2 or 3 instances of our software simultaneously, you can get it to crash.

Leadership somehow got it into their head that it was a system performance issue. So I've been going back and forth with senior leadership for the last two weeks trying to draft the public messaging.

Just today I got a message from them saying "Avoid CPU utilization above 90%" and "communicate system requirements"

The problem is that it will crash on any system with 2 or 3 copies running at the same time. It doesn't matter what the system specs are.

But leadership keeps trying to assign hardware specifications to the issue, even after I told them more than a week ago "It's an operator behavior issue"


Also there is no purpose to running multiple copies at the same time.
Link Posted: 5/10/2023 5:58:35 PM EDT
[#1]
Link Posted: 5/10/2023 6:03:18 PM EDT
[#2]
Quoted:
Sounds like they are trying to do damage prevention but what clients really want is ownership and a fix yesterday. You can actually enhance your value to a client by owning an issue. Too many software OEMs do what your senior management is doing and it has given software companies a reputation akin to used car dealers.

Your bosses aren't going to buy into your outlook until you can prove to them that it affects the bottom line more positively than theirs. Does it make them money, save them money or make the process more efficient? You need to prove at least one of those.
View Quote


The error only occurs when someone is using two copies of our software at the same time, which they should never be doing anyway. It doesn't save them any time.
We do have a long-term fix in the pipeline but it will take quite a while to implement for reasons I can't get into.

Also this is gov't owned and funded software so profit isn't really a consideration.
Link Posted: 5/10/2023 6:28:01 PM EDT
[#3]
Link Posted: 5/10/2023 7:07:29 PM EDT
[#4]
Quoted:
We recently discovered a bug in our software.

If you run 2 or 3 instances of our software simultaneously, you can get it to crash.

Leadership somehow got it into their head that it was a system performance issue. So I've been going back and forth with senior leadership for the last two weeks trying to draft the public messaging.

Just today I got a message from them saying "Avoid CPU utilization above 90%" and "communicate system requirements"

The problem is that it will crash on any system with 2 or 3 copies running at the same time. It doesn't matter what the system specs are.

But leadership keeps trying to assign hardware specifications to the issue, even after I told them more than a week ago "It's an operator behavior issue"


Also there is no purpose to running multiple copies at the same time.
View Quote

1. Make sure you've communicated the actual root cause over email or something else that leaves a record.

2. Give them whatever they ask for.
Link Posted: 5/10/2023 7:30:12 PM EDT
[#5]
Quoted:
Easy fix - Have it try to open a socket at start up. If the socket is already in use, there's already a copy running so shut down. The end user can try to start copies all day long but only one will run at a time.

Make the socket configurable to account for sockets that are in use by other applications.
View Quote



A mutex would work as well and probably be simpler.
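For anyone who wants to see the socket idea from the quoted post in code, here is a minimal Python sketch. The port number and the function name `acquire_instance_guard` are made up for illustration, and the thread never says what language the real software is written in, so treat this as a sketch of the technique rather than a drop-in fix.

```python
import socket

# Hypothetical default port; the quoted post suggests making this configurable
# in case another application already uses it.
DEFAULT_GUARD_PORT = 47653


def acquire_instance_guard(port=DEFAULT_GUARD_PORT):
    """Return a bound socket if no other copy holds the port, else None."""
    guard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Bind to loopback only so the guard is not exposed to the network.
        guard.bind(("127.0.0.1", port))
    except OSError:
        # The port is already taken, most likely by another running copy.
        guard.close()
        return None
    guard.listen(1)
    return guard
```

At startup the program would call `acquire_instance_guard()`, exit if it gets `None`, and otherwise keep the returned socket open until it shuts down. The operating system releases the port when the process ends, so a crashed copy should not block the next launch.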
Link Posted: 5/11/2023 5:45:56 AM EDT
[#6]
Quoted:

1. Make sure you've communicated the actual root cause over email or something else that leaves a record.

2. Give them whatever they ask for.
View Quote


Always CYA!
Link Posted: 5/11/2023 8:15:55 AM EDT
[#7]
Link Posted: 5/11/2023 11:02:30 AM EDT
[#8]
Quoted:
True, if the issue involves protecting a particular resource to prevent a race condition; in that case a mutex would be a good solution. The socket technique, on the other hand, prevents multiple instances of an application from running at the same time.
View Quote


I use Global mutexes in C# to make sure only a single instance of an application can run at a time. I suppose other languages could necessitate a different technique though.
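As one example of a different technique in another language: Python has no direct equivalent of a named Global mutex, but on POSIX systems an exclusive lock on a file gives similar single-instance behavior. This is a rough sketch only. The lock path and the function name `acquire_single_instance_lock` are hypothetical, and `fcntl` is not available on Windows.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path; any location that all copies can reach would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example_app.lock")


def acquire_single_instance_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail right away instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Another open handle, normally in another running copy, holds the lock.
        handle.close()
        return None
    return handle
```

The caller keeps the returned file object open for the life of the process. The lock goes away when the file is closed, including when the process exits or crashes, so there is no stale lock to clean up by hand.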
Link Posted: 5/11/2023 8:04:04 PM EDT
[#9]
Quoted:...Just today I got a message from them saying "Avoid CPU utilization above 90%" and "communicate system requirements"...
View Quote

OK I can give them the "communicate system requirements" but why would management think that you telling the customer "Avoid CPU utilization above 90%" is in any way different than telling them "Avoid running more than 1 instance on a system at a time" ???

LOL if it's merely because the first one there is the wording that *THEY* came up with.
Link Posted: 5/11/2023 10:25:10 PM EDT
[#10]
Ask them to add verbiage that states each instance of the app takes 50% of system resources. That would be compatible with their guidance. Sometimes you just have to manage up.
Link Posted: 5/16/2023 1:25:30 AM EDT
[#11]
Well this gets more fun.

Turns out the tester is now able to get it to fail with only 1 instance of the program. Executing certain memory-heavy tasks underloads the CPU compared to other, less memory-intensive tasks. So with a different scenario he can get the errors.

3% error rate in each simulation scenario.
Errored simulations can be rerun and typically succeed the 2nd time (with 97% success).

And I have to figure out how to communicate this issue to the users.
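One way to put numbers on that for users is to work out how often a simulation would still fail after a rerun. The sketch below assumes every attempt fails independently at the same 3% rate, which matches the 97% rerun success figure above but is still an assumption. The function names are made up for illustration.

```python
def residual_failure_rate(per_run_failure, reruns):
    """Chance that a simulation still fails after `reruns` extra attempts.

    Assumes each attempt fails independently with the same probability.
    """
    return per_run_failure ** (reruns + 1)


def expected_attempts(per_run_failure):
    """Average number of attempts until the first success (same assumption)."""
    return 1 / (1 - per_run_failure)


if __name__ == "__main__":
    # 3% per-run failure rate, taken from the figures in the post above.
    print(f"Still failing after one rerun: {residual_failure_rate(0.03, 1):.4%}")
    print(f"Average attempts per simulation: {expected_attempts(0.03):.3f}")
```

Under that assumption, one rerun cuts the failure rate from 3% to 0.03 × 0.03 = 0.0009, or about 0.09%. If failures cluster on particular scenarios or inputs, the real number would be worse.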