Monday, June 2, 2014

Stop Judging My Data

Data are.  Period.  Hard stop.

Data quality and data governance are powerful discussions and can have a huge impact on the ways in which data can be used in analysis.  It certainly makes things clearer when we have consistency of data entry, code sets, workflow, etc.  It occurred to me today, though, that the big data movement gives us at least one technique for dealing with suboptimal data quality.  (This is especially great for those of us who work in healthcare and like to complain about how contaminated our data environments are.)

At StampedeCon this past week, Kilian Weinberger described machine learning (a key technique in big data analysis) this way:

  • Traditional computer science takes input data and program instructions to generate output data.
  • Machine learning takes input data and output data to generate inferred program instructions.
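
The contrast in those two bullets can be sketched in a few lines of Python. This is only an illustration, assuming a one-variable least-squares line fit stands in for "machine learning"; the names `traditional_program` and `learn_program` are hypothetical and do not come from the talk.

```python
def traditional_program(x):
    """Traditional computing: hand-written instructions turn input into output."""
    return 2.0 * x + 1.0


def learn_program(inputs, outputs):
    """Machine learning (toy version): infer the instructions from examples.

    Fits a straight line by ordinary least squares and returns it as a function.
    """
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Input data plus output data go in; an inferred program comes out.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [traditional_program(x) for x in inputs]
inferred_program = learn_program(inputs, outputs)
```

Here `inferred_program` was never told the rule "double it and add one"; it recovered that rule from the examples alone.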

This approach raises the question: who are we to judge the data?  If we have no program instructions with which the data are expected to comply, or from which they are expected to produce particular results, then who are we to judge the so-called data quality?  Let the machine be the judge of the data and infer from it what there is to infer, regardless of what our preconceived notions of quality might be.

Even the quality of the predictive models that come out of machine learning isn't really a judgement of data quality.  In most cases, the input and output used to train machine learning algorithms are the input and output of some other human process or more complex workflow.  If a good machine learning algorithm can't create a highly predictive model from that input and output, it's probably an indication that the existing process is somewhat indeterminate.  That represents a measure of process quality, not data quality.
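
As a rough illustration of that point, the sketch below fits the same toy model to records from two made-up processes. Everything here is an assumption for the sake of the example: the data are invented, a least-squares line stands in for "a good machine learning algorithm", and R squared stands in for "highly predictive".

```python
def r_squared(inputs, outputs):
    """Fit a least-squares line and return R^2 (1.0 = fully predictable)."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(inputs, outputs))
    ss_tot = sum((y - mean_y) ** 2 for y in outputs)
    return 1.0 - ss_res / ss_tot


inputs = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]

# Hypothetical process A: the same input always leads to the same output.
determinate_outputs = [5.0, 5.0, 8.0, 8.0, 11.0, 11.0]

# Hypothetical process B: identical inputs lead to different outputs,
# even though every value was recorded faithfully.
indeterminate_outputs = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0]

r2_determinate = r_squared(inputs, determinate_outputs)
r2_indeterminate = r_squared(inputs, indeterminate_outputs)
```

Every record in process B is "clean" in the data entry sense, yet no model can predict its output from its input. The low score describes the process, not the data.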

I may be overreaching a bit in my desire to throw data quality arguments out the window.  It's just that I've heard data quality used as an excuse too many times in my career, when many of those cases were just a matter of not trying hard enough to understand what was really going on with the data.
