Posts

If tech is everything, then it is nothing

Illustration: George Wylesol for Bloomberg Businessweek. What do #Facebook, #Tesla, #DoorDash, #Nvidia, and #GM* have in common? They are all "tech" companies. Alex Webb of Bloomberg offers a linguistic explanation for why the word "technology" ceased to be meaningful: "English lacked an equivalent to the French technique and German Technik. The English word “technique” hadn’t caught up with the innovations of the Industrial Revolution, and it still applied solely to the way in which an artist or artisan performed a skill." He contrasts technique as in "artistic technique" in English with Technik as in "Lufthansa Technik" in German and argues that the word "technology" emerged in the early 20th century for lack of a better alternative. Whether the reason is linguistic, sheer overhype, or semantic satiation, we may be better off dropping the "tech company" label at this point unless it is elaborated further. For the companies that are more tech than

Data-driven paralysis

Data-driven decision making can lead to paralysis. Last week, the FDA and CDC committees couldn't make a decision about the booster shots because (complete) data was not available. Making decisions in the absence of complete data is a process of imagination and deep thinking. It puts hypothesis development at the center, and humans continue to prevail over machines in that process. To avoid such paralysis, more focus can be put on developing and rethinking hypotheses and their likelihoods. In emergent problems, an in-depth discussion of hypotheses and likelihoods is probably more helpful than an obsession with accessing complete data. Otherwise, by defining complete data as a prerequisite, as it would be in data-driven decision making, we will continue to be paralyzed when looking into the future. If we turn to data-informed decision making, however, hypotheses would take more control (not gut feeling but properly developed hypotheses*). We could then make decisions to be improve

To log or how to log

I avoid posting technical notes here. This is an exception because I have an agenda. Log transformation is widely used in modeling data for several reasons: making data "behave," calculating elasticities, etc. When an outcome variable naturally has zeros, however, log transformation is tricky. Many data modelers (including seasoned researchers) instinctively add a positive constant to each value of the outcome variable. One popular idea is to add 1 to the variable, which turns raw zeros into log-transformed zeros. Another idea is to add a very small constant, especially when the scale of the outcome variable is small. The bad news is that these are arbitrary choices and the resulting estimates may be biased. To me, if an analysis is correlational (as most are), a small bias may not be a big concern. If it is causal and, for example, an estimated elasticity will be used to take action (with an intention to change an outcome), that's trouble waiting to happen. This is a problem
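To make the point about arbitrary constants concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the count outcome drawn from a Poisson distribution, the "true" elasticity of 0.5, the sample size, and the three constants. It is not taken from any real analysis; it only shows how much the estimated slope of log(y + c) on log(x) can move with the choice of c.

```python
import numpy as np

# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
true_elasticity = 0.5

x = rng.lognormal(mean=0.0, sigma=1.0, size=n)
# Count outcome whose mean is x ** true_elasticity; it naturally has zeros.
y = rng.poisson(lam=np.exp(true_elasticity * np.log(x)))

log_x = np.log(x)
slopes = {}
for c in (1.0, 0.1, 0.001):
    # OLS of log(y + c) on log(x); np.polyfit returns [slope, intercept].
    slope, _intercept = np.polyfit(log_x, np.log(y + c), 1)
    slopes[c] = slope

print(f"share of zeros in y: {(y == 0).mean():.2f}")
for c, slope in slopes.items():
    print(f"c = {c:<6} estimated slope = {slope:.3f}")
```

Same data, same model, and the "elasticity" you report depends on a constant nobody can justify. The more zeros in the outcome, the more the constant matters.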

In defense of Amazon (Trends)

#WSJ continues to report on #Amazon's shady practices. An earlier article said Amazon used sales data on third-party sellers to offer copycat, private-label products (like AmazonBasics). It was a coherent story but made hasty generalizations. Another piece showed how Amazon manipulates product search ads to favor its products. Both articles (linked within) underlined a data access problem: Amazon has access to the data on its rivals and exploits it for competitive advantage. This latest article is not as coherent and is a bit all over the place, but Amazon's response is not helping either. Amazon says "Offering products inspired by the trends to which customers are responding is a common practice across the retail industry." Amazon needs to nurture trust in its ecosystem but seems to be doing the opposite. I don't actually see any rampant issues except for access to product search data. Amazon is the dominant leader of the product search market (above Google a

Visualizing the death of James Wolfe

History paintings are like data visualizations. Here, NYT's Jason Farago presents Benjamin West's 1770 painting "The Death of General Wolfe." If your dashboard looks like West's painting, you are in trouble. Then you need a Jason Farago to make it accessible to the management team. Dashboards summarize data, as West did in this history painting in 1770 (accurately or not; see Jason's walkthrough on that). The higher the density of information, the lower the chances of communicating successfully. Businesses increasingly need data translators or communicators, not so much "data artists." West is the data artist. Jason is the data translator. West skillfully abuses ggplot and matplotlib for the sake of art. Jason further masters Plotly, Shiny, and Dash. #dataart #datascience #visualization #dataviz #r #rstats #python #datacentricity archive.gtozer.net

Even guesswork starts with "I don't know"

To guess is to admit not knowing in the first place. The problem with Dilbert's coworkers and with most managerial teams is their resistance to admitting they don't know. Even horoscopes and guesswork should start with the acknowledgment of a knowledge gap. Without such an acknowledgment, the time and effort needed to formulate and solve a problem are not justified. To guess is then to pretend to know. Guesswork supersedes learning from data because there is nothing to learn when it is all known. Successful data-centric companies need a culture that encourages not knowing as much as knowing. #data #analytics #datamining #dataanalysis #datacentricity archive.gtozer.net

Data worker vs. intelligent agent of AI

Absent imagination, data workers perform at best on par with intelligent agents, finding associations but failing at causality. Identifying causal links requires thinking in counterfactuals, which, in turn, requires imagining what could have been. What is absent must be imagined, while what is present remains obvious, even to an algorithm. To create value, data-centric companies should invest at least as much in the thinking skills and imaginative ability of their data workers as in their coding skills. #data #analytics #ai #imagination #causality #causalinference #datacentricity archive.gtozer.net