
Posts

Showing posts from November, 2021

99/1 is the new 80/20

 ILLUSTRATION BY JACKIE FERRENTINO An obvious but often neglected fact is that accuracy is overemphasized as a performance metric. In a two-class problem where 99% of the cases belong to class 0 (not a spam email), achieving an accuracy of 99% is as easy as classifying all emails as safe. Sensitivity, specificity, and other metrics exist for a reason. The story of Waymo, Google's self-driving car, illustrates the value of solving the remaining 1% of the problem, where conventional machine learning gets stuck due to the limitations of training data. If 1% of the error turns into a make-or-break point, one needs to get creative. On a long tail that extends to infinity, walking faster or running probably does not help as much as a leap of imagination. I must note that it's not fair to expect an autonomous car to be "error-free" given that we do not expect human drivers to perform error-free on driver's license exams and road tests. The two will just make different errors. #predic
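The 99% accuracy point is easy to check with a few lines of Python. What follows is a minimal sketch on made-up labels (10 spam emails out of 1,000, a hypothetical split chosen to match the 99/1 ratio), with a "classifier" that marks every email as safe; the function and variable names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = spam)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on spam), and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical data: 1% spam (class 1), 99% not spam (class 0).
y_true = [1] * 10 + [0] * 990
# A "model" that classifies every email as safe.
y_pred = [0] * 1000

result = metrics(y_true, y_pred)
print(result)
```

Accuracy comes out at 99% while sensitivity is zero: the model never catches a single spam email, which is exactly what accuracy alone hides.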

When to normalize / apply weights

To me, this is interesting not because of the lack of transparency in methodology but because of the potential reason the rankings are wrong. I want to believe that this is a mistake, not fraud, but really? Applying the weights before normalizing the scores? And the Bloomberg Businessweek spokesperson says "the magazine’s methodology was vetted by multiple data scientists." I have created a quick scenario as a reminder to my former (and current) students (posted in the comments, as LinkedIn doesn't allow it here). In the example, the scores are standardized across the five items (which are randomly generated and assigned weights). In the Businessweek rankings, standardization is supposed to be across institutions so that the weights proportionately affect each institution's score on the corresponding item. Nevertheless, the source of the error is the same. If the weights are applied before normalizing the data, the scores are adjusted by the weights disproportionately. Ranking
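The order-of-operations problem can be sketched in Python. This is not the Businessweek methodology or the scenario from the comments; it is a hypothetical illustration with three invented institutions, two invented items, and invented weights, assuming z-score standardization across institutions. Under that assumption, weighting first is not just slightly off: a z-score is unchanged when a column is multiplied by a constant, so the weights effectively vanish.

```python
from statistics import mean, pstdev

# Hypothetical raw scores: institution -> [item 1, item 2].
scores = {"A": [90.0, 50.0], "B": [80.0, 60.0], "C": [70.0, 90.0]}
# Hypothetical weights: item 1 is meant to dominate.
weights = [0.9, 0.1]


def zscore_columns(table):
    """Standardize each item (column) across institutions."""
    names = list(table)
    n_items = len(next(iter(table.values())))
    out = {name: [] for name in names}
    for j in range(n_items):
        col = [table[name][j] for name in names]
        mu, sd = mean(col), pstdev(col)
        for name in names:
            out[name].append((table[name][j] - mu) / sd)
    return out


def rank(totals):
    """Institution names ordered from highest to lowest total."""
    return sorted(totals, key=totals.get, reverse=True)


# Intended order: normalize first, then apply the weights.
normalized = zscore_columns(scores)
correct_totals = {
    name: sum(w * z for w, z in zip(weights, zs))
    for name, zs in normalized.items()
}

# Erroneous order: apply the weights first, then normalize.
weighted_first = {
    name: [w * x for w, x in zip(weights, xs)] for name, xs in scores.items()
}
wrong_totals = {name: sum(zs) for name, zs in zscore_columns(weighted_first).items()}

ranking_correct = rank(correct_totals)
ranking_wrong = rank(wrong_totals)
print("normalize, then weight:", ranking_correct)
print("weight, then normalize:", ranking_wrong)
```

With these made-up numbers, the two orders swap the second and third places, because in the erroneous order both items end up counting equally no matter what the weights say.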

Algorithmic fashioning

PHOTOGRAPHER: JUSTIN CHIN/BLOOMBERG For years, Zara has been my go-to case to discuss data centricity in fashion retail. Zara is a staple example of how a focus on data and analytics combined with the right, complementary business processes can create wonders even in a market with high degrees of demand uncertainty due to the hedonic nature of consumption. Shein seems to be emerging as a contender, moving further into data-driven (not only data-informed) fast fashion. Its operation is also called real-time fashion rather than fast fashion. Shein doesn't own any physical stores (none at all) and ships all of its products directly from China. Bloomberg reports that "Shein has developed proprietary technology that harvests customers’ search data from the app and shares it with suppliers, to help guide decisions about design, capacity and production. It generates recommendations for raw materials and where to buy them, and gives suppliers access to a deep database of

“But it would be naïve to predict that unpredictable events won’t happen in the future.”

"Zillow Quits Home-Flipping Business, Cites Inability to Forecast Prices," WSJ reports.* I try to avoid passing along news stories, but it's not every day I receive a predictive analytics story as breaking news. I wonder whether the reason is really "an inability to forecast the prices" or "relying too much on an ability to forecast the prices" for a venture billed as "$20 billion a year" when it debuted. Zillow announced plans for this data-driven venture in 2018 by citing consumers who "expect magic to happen with a simple push of a button." In a statement yesterday, Zillow seems to have realized magic is not happening: “But it would be naïve to predict that unpredictable events won’t happen in the future.” Maybe it is never a good idea to develop a whole business model that grossly underestimates the changes in error (both reducible and irreducible) due to potential bifurcations in market forces. * #nytimes coverage without a paywall #pre