Monday, June 2, 2014

Stop Judging My Data

Data are.  Period.  Hard stop.

Data quality and data governance are powerful discussions and can have a huge impact on the ways in which data can be used in analysis.  It certainly makes things clearer when we have consistency of data entry, code sets, workflow, etc.  It occurred to me today, though, that the big data movement gives us at least one technique for dealing with suboptimal data quality.  (This is especially great for those of us who work in healthcare and like to complain about how contaminated our data environments are.)

At StampedeCon this past week, Kilian Weinberger described machine learning (a key technique in big data analysis) this way:

  • Traditional computer science takes input data and program instructions to generate output data.
  • Machine learning takes input data and output data to generate inferred program instructions.
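To make that contrast concrete, here is a minimal sketch in Python.  It is my own toy illustration, not something from the talk: the linear rule, the sample inputs, and the names `program` and `infer_program` are all assumptions, and ordinary least squares stands in for "machine learning" only because it fits in a few lines.

```python
# Traditional computer science: input data + program instructions -> output data.
def program(x):
    # Hand-written instructions (a hypothetical rule chosen for illustration).
    return 2.0 * x + 1.0

inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [program(x) for x in inputs]

# Machine learning: input data + output data -> inferred program instructions.
def infer_program(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The returned function is the "inferred program".
    return lambda x: slope * x + intercept

learned = infer_program(inputs, outputs)

# The learned function never saw program(); it only saw the input/output pairs.
print(learned(10.0), program(10.0))
```

Nothing in `infer_program` knows what the rule was supposed to be.  It only sees the pairs, which is the sense in which the machine, rather than the analyst, decides what the data says.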

This approach raises the question: who are we to judge the data?  If we have no program instructions with which the data are expected to comply, or from which they are expected to produce particular results, then who are we to judge the so-called data quality?  Let the machine be the judge of the data and infer from it what there is to infer, regardless of what our preconceived notions of quality might be.

Even the quality of the predictive models that come out of machine learning isn't really a judgement of data quality.  In most cases, the input and output used to train machine learning algorithms are the input and output of some other human process or more complex workflow.  If a good machine learning algorithm can't create a highly predictive model from that input and output, it's probably an indication that the existing process is somewhat indeterminate.  That represents a measure of process quality, not data quality.
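A rough simulation of that last point, under assumptions that are entirely mine: two hypothetical processes are recorded without any data-entry errors, one always making the same decision for the same input and one changing its mind 30% of the time (`flip_rate`).  The "model" is just a majority-vote lookup table, standing in for a real learning algorithm.

```python
import random
from collections import Counter, defaultdict

def consistent_process(x):
    # The same input always produces the same decision.
    return x >= 5

def make_inconsistent_process(rng, flip_rate=0.3):
    # Same underlying rule, but the decision is reversed at random.
    def process(x):
        decision = x >= 5
        return (not decision) if rng.random() < flip_rate else decision
    return process

def fit_lookup_model(pairs):
    # "Learn" the most common output observed for each input value.
    votes = defaultdict(Counter)
    for x, y in pairs:
        votes[x][y] += 1
    return {x: counts.most_common(1)[0][0] for x, counts in votes.items()}

def accuracy(model, pairs):
    return sum(model.get(x) == y for x, y in pairs) / len(pairs)

def evaluate(process, rng, n=2000):
    # Record the process faithfully, train on half, test on the other half.
    pairs = [(x, process(x)) for x in (rng.randrange(10) for _ in range(2 * n))]
    train, test = pairs[:n], pairs[n:]
    return accuracy(fit_lookup_model(train), test)

rng = random.Random(0)
consistent_acc = evaluate(consistent_process, rng)
inconsistent_acc = evaluate(make_inconsistent_process(rng), rng)
print(consistent_acc, inconsistent_acc)
```

Both data sets are "clean" in the sense that every record is exactly what the process did.  Only the second process is hard to predict, and no amount of data cleansing would change that.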

I may be overreaching a bit in my desire to throw data quality arguments out the window.  It's just that I've heard data quality used as an excuse too many times in my career, when many of those cases were just a matter of not trying hard enough to understand what was really going on with the data.
