97 Things About Ethics Everyone in Data Science Should Know: Collective Wisdom from the Experts

Most of the high-profile cases of real or perceived unethical activity in data science aren’t matters of bad intent. Rather, they occur because the ethics simply aren’t thought through well enough. Being ethical takes constant diligence, and in many situations identifying the right choice can be difficult.

In this in-depth book, contributors from top companies in technology, finance, and other industries share experiences and lessons learned from collecting, managing, and analyzing data ethically. Data science professionals, managers, and tech leaders will gain a better understanding of ethics through powerful, real-world best practices.

In the chapter Triage and Artificial Intelligence, Peter Bruce discusses the role of AI in scenarios where it is inappropriate for an algorithm to make final decisions; there, its role is analogous to that of a triage system making rapid intermediate decisions in a medical setting. In his chapter Random Selection at Harvard, he posits random selection as a statistical solution to the problem of human bias in college selection. In the chapter The Ethical Dilemma of Model Interpretability, Grant Fleming discusses the ethical dimensions of the loss of transparency in black-box models and why model interpretability is important.

Contributing Authors: Peter Bruce, Grant Fleming. Published August 25, 2020, O'Reilly Media


Mining Your Own Business - How to Use Analytics in Your Business


If you’re preparing to lead or participate in a data analytics initiative, Mining Your Own Business is the one book you must read.

In this practical guide for organizational leaders and top-level executives, industry experts Jeff Deal and Gerhard Pilcher explain in clear, understandable English:

  • What data mining and predictive analytics are
  • Why they are such powerful management tools
  • How to establish and manage a data science service

Complete with solid advice and instructive case studies, this book demonstrates how to harness the power of data mining and predictive analytics, and avoid costly mistakes.

Use it to gain a quick overview of analytics and as a handy resource to be referred to during a project.

Download the EBook to preview Chapter 3 - Leading a Data Analytics Initiative

The book has received early, strong praise from business and government leaders who are using these powerful management tools to achieve dramatic goals for projects and their organizations.

“Mining Your Own Business is unbelievably helpful in framing the challenges we face in a way that non-data people can understand” —Lauren Purnell, Director of Data Analytics, Locus Health

"Government and Industry Executives, if you have not been comfortable with buying into a program based on advanced analytics, algorithms, and data scientists, this book will be transformational for you and your program. It captures all the critical elements and decades of experiences into a few clear pages that will light the path for predictive improvements. I will be sharing it with my leadership and program managers. Great job, gentlemen, in making a complex equation simple to follow!" — Fred Walker, Technical Director Counterintelligence, National Security Agency

“Data science and big data often don’t live up to their silver bullet hype. Why? Because IT and business are so different, and so hard to harmonize. This book is an excellent remedy; Deal and Pilcher distill a decade of analytics experience into a vital guide to what works. Mining Your Own Business is a must read for anyone interested in being right—by harnessing data to drive decisions.”—Peter Aiken, PhD, Founding Director, Data Blueprint

“This book is a must primer for the business leader wanting to leverage their institutional data to empower their decision making abilities. As someone who is beginning this process for my organization the book helps me understand the various steps one must take to fully appreciate the power of data as a corporate asset and how to maximize its potential as a strong asset. This is not a simple task, and the book does a fine job laying out the various steps that must be taken to reach the ultimate solution of being in a position to make the best decisions possible given the information that organizations already have at their disposal.”—Joby Giacalone, Director of Information Systems and Strategic Technology Solutions, The University of Virginia

Authors: Jeff Deal, Gerhard Pilcher. Published September 19, 2016, Data Science Publishing


Handbook of Statistical Analysis and Data Mining Applications


The Handbook of Statistical Analysis and Data Mining Applications is a comprehensive professional reference book that guides business analysts, scientists, engineers and researchers (both academic and industrial) through all stages of data analysis, model building and implementation. The Handbook helps one discern the technical and business problem, understand the strengths and weaknesses of modern data mining algorithms, and employ the right statistical methods for practical application.

Use this book to address massive and complex datasets with novel statistical approaches and be able to objectively evaluate analyses and solutions. It has clear, intuitive explanations of the principles and tools for solving problems using modern analytic techniques, and discusses their application to real problems, in ways accessible and beneficial to practitioners across industries - from science and engineering, to medicine, academia and commerce. This handbook brings together, in a single resource, all the information a beginner will need to understand the tools and issues in data mining to build successful data mining solutions.

"Rarely do authors succeed in writing THE comprehensive guide to anything, particularly when the subject matter is as complex, multifaceted, and rapidly changing as the field of data mining. The Handbook of Statistical Analysis & Data Mining Applications far exceeds that worthy goal. The text is well-organized, thoughtfully written and intuitive." — Colleen McCue, PhD

Authors: Robert Nisbet, Ph.D.; John Elder IV, Ph.D.; Gary Miner, Ph.D. Published June 5, 2009, Elsevier Publishing

  • Awarded the 2009 American Publishers PROSE (Professional and Scholarly Excellence) award for mathematics! The PROSE awards annually recognize the very best in professional and scholarly publishing.
  • Written "by practitioners for practitioners"
  • Non-technical explanations build understanding without jargon and equations
  • Tutorials in numerous fields of study provide step-by-step instruction on how to use supplied tools to build models using Statistica, SAS and SPSS software
  • Practical advice from successful real-world implementations
  • Includes extensive case studies, examples, MS PowerPoint slides and datasets
  • CD-DVD with valuable fully-working 90-day software included: "Complete Data Miner - QC-Miner - Text Miner" bound with book


Practical Text Mining - Applying Analytics and Modeling


In one comprehensive resource, Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications provides complete coverage of statistical and analytical concepts, techniques, and applications for text mining. Its step-by-step examples will aid professionals, practitioners, researchers, and advanced students—all those who need to learn how to rapidly distill text information into useful insights and actions for good decision making. This thorough reference offers an in-depth examination of core text mining concepts, tools, and operations, and explains advanced techniques for pre-processing, knowledge representation, and visualization. Twenty-eight tutorials demonstrate real-world, mission-critical applications of text mining in such fields as insurance, finance, fraud detection, counter-terrorism, business intelligence, and genomics.

"...the definitive, go-to text mining resource." Eric Siegel, Ph.D.

Only recently has it become practical to begin to tap vast stores of text data for their valuable information. This comprehensive professional reference brings together all the techniques, tools, and methods a professional will need to efficiently use text mining applications. Extensive case studies and tutorials show how to use leading tools to solve real problems in varied fields from corporate finance to business intelligence, and genomics research to counterterrorism. The 1,000-page Handbook divides the field of text mining into seven Practice Areas, defined by the type of text data you have and your goal, and then shows how to accomplish useful tasks in each. Dozens of tutorials, illustrations, and real-world examples make it a tremendous time saver for practitioners seeking to create text-driven solutions.

The book was awarded the 2012 American Publishers PROSE (Professional and Scholarly Excellence) award for Computing and Information Sciences! The PROSE awards annually recognize the very best in professional and scholarly publishing.

Co-Authored by Andrew Fast and John Elder, along with Drs. Gary Miner, Dursun Delen, Thomas Hill, and Robert Nisbet.


Ensemble Methods in Data Mining


Ensemble methods have been called the most influential development in Data Mining and Machine Learning in the past decade. They combine multiple models into one usually more accurate than the best of its components. Ensembles can provide a critical boost to industrial challenges—from investment timing to drug discovery, and fraud detection to recommendation systems—where predictive accuracy is more vital than model interpretability. 

"The practical implementations of ensemble methods are enormous. Most current implementations of them are quite primitive and this book will definitely raise the state of the art. Giovanni Seni's thorough mastery of the cutting-edge research and John Elder's practical experience have combined to make an extremely readable and useful book."  Jaffray Woodriff, Quantitative Investment Management

Ensembles are useful with all modeling algorithms, but this book focuses on decision trees to explain them most clearly. After describing trees and their strengths and weaknesses, the authors provide an overview of regularization -- today understood to be a key reason for the superior performance of modern ensembling algorithms. The book continues with a clear description of two recent developments: Importance Sampling (IS) and Rule Ensembles (RE). IS reveals classic ensemble methods -- bagging, random forests, and boosting -- to be special cases of a single algorithm, thereby showing how to improve their accuracy and speed. REs are linear rule models derived from decision tree ensembles. They are the most interpretable version of ensembles, which is essential to applications such as credit scoring and fault diagnosis. Lastly, the authors explain the paradox of how ensembles achieve greater accuracy on new data despite their (apparently much greater) complexity.
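As a rough illustration of the bagging idea mentioned above (this is not code from the book, whose snippets are in R), the following Python sketch averages one-split regression trees fit to bootstrap resamples of a toy dataset. The function names `fit_stump` and `fit_bagged`, the synthetic data, and all parameter values are assumptions made for this example only.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a "stump") by minimizing squared error."""
    best = None
    # Skip the smallest x so both sides of every candidate split are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to predicting the overall mean.
        overall = statistics.fmean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Bagging: fit one stump per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean(m(x) for m in models)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.random() for _ in range(100)]
    ys = [x * x + rng.gauss(0, 0.05) for x in xs]  # noisy samples of y = x^2

    single = fit_stump(xs, ys)
    bagged = fit_bagged(xs, ys)

    # Compare both models against the noise-free curve on an evenly spaced grid.
    grid = [i / 50 for i in range(51)]

    def mse(model):
        return statistics.fmean((model(x) - x * x) ** 2 for x in grid)

    print(f"single stump MSE: {mse(single):.4f}")
    print(f"bagged stumps MSE: {mse(bagged):.4f}")
```

Averaging many stumps turns a single hard step into a smoother curve, which is one informal way to see why a combined model can track the underlying function more closely than any one of its components. Random forests and boosting build on the same resample-and-combine pattern with additional ingredients that this sketch does not cover.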

Ensemble Methods in Data Mining is aimed at novice and advanced analytic researchers and practitioners -- especially in Engineering, Statistics, and Computer Science. Those with little exposure to ensembles will learn why and how to employ this breakthrough method, and advanced practitioners will gain insight into building even more powerful models. Throughout, snippets of code in R are provided to illustrate the algorithms described and to encourage the reader to try the techniques.

This book is a required textbook for the Advanced Modeling Techniques graduate course in Northwestern University's Master of Science in Predictive Analytics program.


Practical Statistics for Data Scientists: 50 Essential Concepts


Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective.

Practical Statistics explains how to apply key statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.

If you’re familiar with the R or Python programming languages, and have had some exposure to statistics but want to learn a deeper statistical perspective, this quick reference bridges the gap in an accessible, readable format, and covers:

  • Exploratory data analysis
  • Data and sampling distributions
  • Statistical experiments and significance testing
  • Regression and prediction
  • Classification
  • Statistical machine learning
  • Unsupervised learning

The first edition of this book was a best-seller for O’Reilly and was translated into Japanese; the second edition adds Python. Co-Authored by Peter Bruce, Andrew Bruce, and Peter Gedeck.


Data Mining for Business Analytics

Data Mining for Business Analytics is used at over 560 universities and colleges, and has been translated into Korean and Chinese. It has been adapted for four software environments (R, Python, Excel and JMP) and, since it was first published in 2007, has been through 11 editions.

Popular with practitioners, researchers and students, it presents an applied approach to data mining and predictive analytics with clear exposition, hands-on exercises, and real-life case studies. This book serves to anchor the curriculum in predictive analytics, which was developed and is taught by the author team.

The book was co-authored by Galit Shmueli, Peter Bruce, Nitin R. Patel, Inbal Yahav, and Peter Gedeck. This author team teaches 13 courses at The Institute for Statistics Education (an Elder Research Company).

“This book has by far the most comprehensive review of business analytics methods that I have ever seen, covering everything from classical approaches such as linear and logistic regression, through to modern methods like neural networks, bagging and boosting, and even much more business specific procedures such as social network analysis and text mining. If not the bible, it is at least a definitive manual on the subject.” Gareth James, University of Southern California and co-author (with Witten, Hastie and Tibshirani) of the best selling book An Introduction to Statistical Learning, with Applications in R.



Introductory Statistics and Analytics: A Resampling Perspective

Introductory Statistics and Analytics: A Resampling Perspective provides an accessible approach to statistical analytics, resampling, and the bootstrap for readers at multiple levels of exposure to basic probability and statistics. By developing key statistical concepts via resampling and demystifying traditional formulas, the book enhances statistical literacy and understanding and demonstrates the fundamental basis for statistical inference. Unlike most traditional statistics texts, Introductory Statistics and Analytics: A Resampling Perspective provides the linkages needed to connect statistics to the rapidly growing fields of data analytics and data science.

This book constitutes the foundation for the Introductory Statistics curriculum at the Institute for Statistics Education (an Elder Research Company), a curriculum that was developed and is taught by Peter Bruce.
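For readers unfamiliar with the bootstrap mentioned above, here is a minimal Python sketch of a percentile bootstrap confidence interval for a mean. It is offered only as an illustration of the general resampling idea, not as the book's own presentation; the function name `bootstrap_ci`, the sample values, and the default parameters are assumptions made for this example.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat`, at confidence level 1 - alpha."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Read the interval endpoints off the sorted resampled statistics.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical measurements, invented for this example.
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
    lower, upper = bootstrap_ci(sample)
    print(f"sample mean: {statistics.fmean(sample):.2f}")
    print(f"approximate 95% interval: ({lower:.2f}, {upper:.2f})")
```

The interval comes directly from the spread of the resampled means, with no formula for the standard error required. The percentile method shown here is the simplest variant; other bootstrap intervals exist and can behave better for small or skewed samples.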



Journeys to Data Mining: Experiences from 15 Renowned Researchers


Elder Research's John Elder and Dustin Hux are featured in the book Journeys to Data Mining: Experiences from 15 Renowned Researchers, where career journey stories are shared by many who "helped the field to gain the reputation and importance it enjoys today, through the many valuable contributions they have made." Others featured include Elder Research alumni Dean Abbott and Cheryl Howard, and Elder Research friend Colleen McCue.


Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die


Eric Siegel's popular new book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, features John Elder prominently in the first chapter and mentions Elder Research's Stein Kretsinger in several other stories throughout the book.