Warning

 

Close
Confirm Action

Are you sure you wish to do this?

Cancel Confirm
AR15.COM
12/6/2011 4:27:31 PM EDT
I am looking for some advice regarding data validation.

I am looking to validate a value against historical min and max to check for reasonableness...  Lets say I am downloading the market cap for a stock every night.  I want to check that this value falls with in a reasonable range for data points prior to the current day.  I can use a simple min and max function but I need to exclude outliers that are caused by invalid data.  If a data point in the history is incorrect I would like to ignore it.  What do you guys think?
12/6/2011 4:28:35 PM EDT
[#1]
Business logic doesn't belong in the database, IMO.
12/6/2011 4:45:53 PM EDT
[#2]
Quoted:
Business logic doesn't belong in the database, IMO.


This everything goes in the database. If you want to manipulate it do so in a query. If you don't trust the initial data, fix it at the source.
12/6/2011 5:13:16 PM EDT
[#3]
If there is a great deal of data, it makes sense to do the work in the database for performance reasons.

If I were you, I would normalize the data, bring the flyers into the group, or delete those data points, then fix your source or import.

Generally I don't approve of business logic being in the database, although we do it in the form of stored procedures.

There are statistical functions in SQLServer that you can use to determine mean and deviation.
12/6/2011 5:22:17 PM EDT
[#4]
Quoted:
If there is a great deal of data, it makes sense to do the work in the database for performance reasons.

If I were you, I would normalize the data, bring the flyers into the group, or delete those data points, then fix your source or import.

Generally I don't approve of business logic being in the database, although we do it in the form of stored procedures.

There are statistical functions in SQLServer that you can use to determine mean and deviation.



yes I am using stored procs.  in theory it would be nice to write a c# validation layer but as a one man show I hardly have the time to do so.
12/6/2011 6:25:46 PM EDT
[#5]
I think your underlying premise is flawed.  I would check to make sure you have correctly downloaded the value, but it is what it is.  A value that might appear to be invalid compared to historical trends could just as readily be valid.  Markets can be volatile.

You might just want to flag it if it varies by more than X% from the previous day's value.
12/7/2011 3:28:56 AM EDT
[#6]
Quoted:
I think your underlying premise is flawed.  I would check to make sure you have correctly downloaded the value, but it is what it is.  A value that might appear to be invalid compared to historical trends could just as readily be valid.  Markets can be volatile.

You might just want to flag it if it varies by more than X% from the previous day's value.


you're right... but there are cases where a data provider just plain screws up.
12/7/2011 3:38:11 AM EDT
[#7]
Quoted:
Quoted:
If there is a great deal of data, it makes sense to do the work in the database for performance reasons.

If I were you, I would normalize the data, bring the flyers into the group, or delete those data points, then fix your source or import.

Generally I don't approve of business logic being in the database, although we do it in the form of stored procedures.

There are statistical functions in SQLServer that you can use to determine mean and deviation.



yes I am using stored procs.  in theory it would be nice to write a c# validation layer but as a one man show I hardly have the time to do so.


BTDT, I feel for ya. I'll never be a one man IT shop again.
12/7/2011 3:42:58 AM EDT
[#8]



Quoted:





BTDT, I feel for ya. I'll never be a one man IT shop again.







 
yep. i'd argue that to be effective, almost anything IT needs a team these days. the complexity has exploded over the past several years. trying to do it alone almost guarantees mediocre results - at best.
12/7/2011 4:01:07 AM EDT
[#9]



Quoted:


I am looking for some advice regarding data validation.



I am looking to validate a value against historical min and max to check for reasonableness...  Lets say I am downloading the market cap for a stock every night.  I want to check that this value falls with in a reasonable range for data points prior to the current day.  I can use a simple min and max function but I need to exclude outliers that are caused by invalid data.  If a data point in the history is incorrect I would like to ignore it.  What do you guys think?


Is this an enterprise level app or at least something complex?  Will other developers need to support/extend it in the future?  If so, I agree with keeping the business logic out of the database, but what I believe in more than that is keeping it in one place.  It's either all in sprocs or all in code.  Otherwise, don't worry about architecture too much.  Just make it do what you need it to do.



Do you own the source data?  If so, don't run this logic each time, fix your source up front with some scrubbing.



Nearly all my SQL experience is OLTP type stuff so I can't help to much off the top of my head with what sounds more like a decision support type problem.  If you don't have one, sign up for a Safari Books Online trial.  I've been a subscriber for years and there are quite a few SQL recipe type books available in there.



JMHO



-Chris