Tag Archives: data science

April election poll check-in, or “it’s just the fading price shocks in gas and groceries, stupid”

Here’s where we stand as I write this on April 3, 2024. Sure, there are all sorts of reasons the polls might be wrong and it is a long time until election day…but I would rather be ahead in the polls and saying that than behind, wouldn’t you? Or even behind and getting less behind.

STATE          | 2020 RESULT  | Most Recent Real Clear Politics Poll Average (as of 4/3/24)
Arizona        | Biden +0.4%  | Trump +5.2% (March 1: Trump +5.5%)
Georgia        | Biden +0.3%  | Trump +4.5% (March 1: Trump +6.5%)
Wisconsin      | Biden +0.6%  | Trump +0.6% (March 1: Trump +1.0%)
North Carolina | Trump +1.3%  | Trump +4.6% (March 1: Trump +5.7%)
Pennsylvania   | Biden +1.2%  | Trump +0.6% (March 1: Biden +0.8%)
Michigan       | Biden +2.8%  | Trump +3.4% (March 1: Trump +3.6%)
Nevada         | Biden +2.4%  | Trump +3.2% (March 1: Trump +7.7%)

The electoral college vote, as it stands at the moment, would be 312 for Trump to 226 for Biden. (March 1: 293 for Trump to 245 for Biden)

So the verdict is…Biden is behind but getting less behind in six of the seven swing states, the exception being Pennsylvania. The Nevada, Georgia, and North Carolina moves are all more than 1 percentage point towards Biden. The Arizona, Wisconsin, and Michigan moves are less than 1 point towards Biden. The Pennsylvania move is 1.4 points towards Trump, and because this flips the state from slight Biden to slight Trump, Trump now leads in all seven swing states and the electoral college looks even worse for Biden than a month ago.
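For anyone who wants to check my arithmetic, here is a minimal Python sketch that recomputes the month-over-month shift in each state and the electoral college tally. The margins are copied from the table above. The electoral votes per state (2024 apportionment) and the 219/226 split outside these seven states are not stated in this post; I am filling them in as assumptions.

```python
# Margins are Trump-minus-Biden in percentage points (negative = Biden lead),
# copied from the table above. Electoral votes are the 2024 apportionment,
# an assumption added here rather than something stated in the post.
POLLS = {
    # state: (electoral votes, March 1 margin, April 3 margin)
    "Arizona": (11, 5.5, 5.2),
    "Georgia": (16, 6.5, 4.5),
    "Wisconsin": (10, 1.0, 0.6),
    "North Carolina": (16, 5.7, 4.6),
    "Pennsylvania": (19, -0.8, 0.6),
    "Michigan": (15, 3.6, 3.4),
    "Nevada": (6, 7.7, 3.2),
}

# Assumed electoral votes outside these seven states.
SAFE_TRUMP, SAFE_BIDEN = 219, 226


def shifts(polls):
    """Change in Trump's margin since March 1 (negative = toward Biden)."""
    return {s: round(apr - mar, 1) for s, (_, mar, apr) in polls.items()}


def electoral_count(polls, column):
    """Electoral totals if each state goes to its polling leader.

    column: 1 = March 1 margins, 2 = April 3 margins.
    """
    trump, biden = SAFE_TRUMP, SAFE_BIDEN
    for row in polls.values():
        if row[column] > 0:
            trump += row[0]
        else:
            biden += row[0]
    return trump, biden


if __name__ == "__main__":
    for state, change in shifts(POLLS).items():
        direction = "toward Biden" if change < 0 else "toward Trump"
        print(f"{state}: {abs(change):.1f} points {direction}")
    print("March 1 (Trump, Biden):", electoral_count(POLLS, 1))
    print("April 3 (Trump, Biden):", electoral_count(POLLS, 2))
```

On these inputs, six states move toward Biden and only Pennsylvania moves toward Trump, and the tallies reproduce the 293 to 245 and 312 to 226 splits quoted above.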

Have we gone from “it’s the economy, stupid” to “it’s the rate of change in the rate of change in the price of groceries, compared to the rate of change of the rate of change in the price of groceries two years ago, stupid”? Maybe it’s that simple. Sure, there is plenty going on in the world in terms of war and peace and the collapsing biosphere that supports all life. But we are Americans, and we don’t base our votes on these things. At least not enough of us, enough of the time to make a difference compared to the damn price of groceries. All things being equal, I would wager on this trend continuing over the next seven months. Of course, all things will probably not be equal – a significant recession that throws a significant number of voters out of work would be the worst possible thing for Biden. Because it doesn’t matter so much how much the damn groceries cost if you have no money at all. On the other hand, most other crises might tend to give Biden a chance to show some leadership, which at least some voters might like. And of course, Biden and/or Trump could drop dead at any time. I am not predicting any of these things, just defining a range of things that could happen.

weather forecasting

This is interesting. It is not 100% clear to me what the measure of accuracy is below, but the plot shows how much weather forecasting has improved over the last 50 years or so. A 3-5 day forecast is highly accurate now, and the 3-day and 5-day forecasts are not that different. It’s interesting to me that there is such a large drop-off in accuracy between a 7-day and a 10-day forecast – that is not necessarily intuitive, but it is useful even in everyday life. A 10-day forecast is basically a coin flip, but check back 3 days later and you are closer to 80/20 odds. This is based on pressure measured at a certain height, I think, so it doesn’t necessarily mean forecasts of precipitation depth and intensity, rain vs. snow vs. ice, thunder and lightning, tornadoes, etc. are going to be as accurate as this implies.

Our World in Data

There is some suggestion that AI (meaning purely statistical approaches, or AI choosing any blend of statistics and physics it wants?) might make forecasting much faster, cheaper, and easier yet again.

most popular R books of 2023

Here is something useful (to me, personally, and maybe to others), and thankfully not too pessimistic or morally fraught.

A Crash Course in Geographic Information Systems (GIS) using R – yes, please! We must end the tyranny of the monopolistic Environmental Systems “Research Institute”. Okay, they make some nice products, but just admit you are a rapacious for-profit corporation, please!

A ggplot2 Tutorial for Beautiful Plotting in R – Who doesn’t need to improve their data visualization and communication game?

just start your y-axis at zero

Seriously, just do that and it will work out most of the time. The only exception in my mind is if you are comparing the range or spread of two data sets and neither one is close to zero.
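To put a number on why this matters, here is a small Python sketch (my own illustration, not taken from any charting library) comparing how tall two bars are drawn relative to each other when the axis starts at zero versus somewhere else. The function name and the example values are hypothetical.

```python
def apparent_ratio(small, large, axis_start=0.0):
    """Ratio of drawn bar heights when the y-axis begins at axis_start."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)


if __name__ == "__main__":
    a, b = 50.0, 55.0  # hypothetical values, 10% apart
    print(apparent_ratio(a, b))        # axis starting at zero
    print(apparent_ratio(a, b, 45.0))  # axis truncated at 45
```

With the axis at zero, the larger bar is drawn 1.1 times as tall as the smaller one, which matches the data. Start the axis at 45 and the same two values are drawn with one bar twice as tall as the other.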

Snopes

I’ve been to Indonesia, and people there are normal human beings who are in fact somewhat shorter than Europeans on average. But their heads were typically around my shoulder height, not my knees. Some political violence has occurred there in the not-so-distant past, but I found the culture warm and hospitable. As in almost any country not at war, the biggest risk to your physical safety is probably being in a car accident or being hit by a car. The next biggest, if you are there for any length of time, might be air pollution and secondhand smoke. Once an Indonesian woman yelled at me to not sit next to her on a ferry. The ferry was crowded and there was nowhere else to sit, but I was eventually able to solve the problem by swapping seats with another woman (my gender apparently being what made her uncomfortable). Other times I had groups of female Indonesian tourists stop me on the street and ask to take vacation pictures with me to show their friends back home. This was when I was quite a bit younger than I am now.

tile maps

Tile maps, which draw regions of unequal geographic size as equal-sized tiles, are, somewhat obviously, appropriate when you don’t want the unequal geographic area to distort the message you are trying to communicate. An example might be if you want to show a variable by congressional district, since districts have (roughly) equal populations but very different (spatial) areas.

A couple of other ideas with tile maps are (1) to use rectangles of equal area but different length/width ratios, and (2) to use words spatially arranged and with a variety of properties (font, size, color) to denote a variety of variables.
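As a rough illustration of the basic idea, here is a Python sketch that puts each region in one cell of a grid and prints every cell at the same size, whatever the region's real area. The grid positions for these seven states are hand-picked and hypothetical; a real tile map would use a carefully designed layout, and the optional one-character value markers are just my own convention.

```python
# Hypothetical (row, column) grid positions, chosen by hand to roughly
# preserve which states sit north/south and east/west of each other.
TILES = {
    "NV": (1, 0),
    "AZ": (2, 0),
    "WI": (0, 2),
    "MI": (0, 3),
    "PA": (1, 4),
    "NC": (2, 4),
    "GA": (3, 3),
}


def render(tiles, values=None):
    """Return a text tile map; every region gets one equal-sized cell.

    values optionally maps a region name to a short marker (1-2 characters)
    that is printed next to the name inside the cell.
    """
    values = values or {}
    n_rows = max(r for r, _ in tiles.values()) + 1
    n_cols = max(c for _, c in tiles.values()) + 1
    # Empty cells are six spaces wide, the same width as a filled cell.
    grid = [["      " for _ in range(n_cols)] for _ in range(n_rows)]
    for name, (r, c) in tiles.items():
        label = f"{name}{values.get(name, '')}"
        grid[r][c] = f"[{label:<4}]"
    return "\n".join("".join(row).rstrip() for row in grid)


if __name__ == "__main__":
    print(render(TILES))
```

Nevada and Pennsylvania get the same size cell here even though their land areas are very different, which is the whole point.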

accuracy of a model vs. its “decisional quality”

I like the way the abstract of this paper distinguishes between (1) the accuracy of a model as measured by comparing it to physical observations (always assuming those are an accurate or at least unbiased measurement of the true state of the universe) and (2) the appropriateness of a model to be used in decision making. I find these concepts very, very difficult to get across even to scientists and engineers.

Ecological forecasting models: Accuracy versus decisional quality

We consider here forecasting models in ecology or in agronomy, aiming at decision making based upon exceeding a quantitative threshold. We address specifically how to link the intrinsic quality of the model (its accuracy) with its decisional quality, ie its capacity to avoid false decisions and their associated costs. The accuracy of the model can be evaluated by the [Greek symbol rho – I don’t know what they mean by this just from reading the abstract] of the regression of observed values versus estimated ones or by the determination coefficient. We show that the decisional quality depends not only of this accuracy but also of the threshold retained to make the decision as well as on the state of nature. The two kinds of decisional errors consists either in deciding no action while an action is required (false negatives) or to act while it is useless (false positives). We also prove that the costs associated to those decisions depend also both of the accuracy of the model and of the value of the decision threshold.

Ecological Modelling
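The distinction in the abstract can be made concrete with a small Python sketch. This is my own toy illustration, not the paper's method: the paired forecast and observed values, the thresholds, and the unit costs are all made up. The forecasts stay fixed, so the model's accuracy does not change, yet the false negatives, false positives, and resulting cost all move with the decision threshold.

```python
# Hypothetical (forecast, observed) pairs, e.g. some pest density index.
PAIRS = [(2.0, 3.0), (4.0, 3.5), (5.0, 6.5), (6.0, 5.0),
         (7.0, 8.0), (8.0, 7.5), (9.0, 11.0), (10.0, 9.0)]


def decision_errors(pairs, threshold):
    """Count (false negatives, false positives) for one threshold.

    Action is truly needed when the observed value exceeds the threshold;
    we decide to act when the forecast exceeds it.
    """
    false_neg = sum(1 for f, o in pairs if o > threshold and f <= threshold)
    false_pos = sum(1 for f, o in pairs if o <= threshold and f > threshold)
    return false_neg, false_pos


def total_cost(pairs, threshold, cost_fn=10.0, cost_fp=1.0):
    """Cost of wrong decisions; the unit costs are illustrative assumptions."""
    fn, fp = decision_errors(pairs, threshold)
    return fn * cost_fn + fp * cost_fp


if __name__ == "__main__":
    for t in (3.8, 5.5, 6.2):
        print(t, decision_errors(PAIRS, t), total_cost(PAIRS, t))
```

With these made-up numbers, a threshold of 3.8 gives one false positive, 6.2 gives one false negative, and 5.5 gives one of each, so the same model produces costs of 1, 10, and 11 depending only on where the threshold sits.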

(slightly less) depressing stats on the U.S.: suicides

Here are some suicide stats from Our World in Data. It would be nice if they would add some more groupings like OECD, but I have chosen a somewhat arbitrary sample of peer countries. It surprised me that even though we are hearing about “deaths of despair”, the U.S. is not doing terribly on this metric compared to peers. We are doing a bit worse than our close cultural cousins Canada and Australia. The UK does surprisingly well on this metric, even a bit better than Germany and Denmark. Latin America (I picked Mexico because they’re our neighbor and Brazil because they’re big) doesn’t seem to have a big issue with suicide. The two Asian countries I picked do seem to have an issue – Japan has a higher suicide rate than all the European countries I picked. Then there is a big jump to the two worst countries (that I picked arbitrarily), South Korea and Russia. Russia is the worst, but has brought its rate down a lot if you buy into this data analysis.

538 – best charts of 2022

There is nothing in 538’s best charts of 2022 that truly bowled me over. I mean, there are some graphics and maps that are effective at telling a story about their underlying data. There just aren’t any types of charts or applications of old types of charts that were a big surprise to me and that I thought I would want to copy if I could. Just purely for personal interest in the subject matter, the one I found most interesting was the map showing how college football conferences are losing all geographic meaning. I find myself slowly being less interested in college football with each passing year, and this is one reason why. My team’s losing campaign, loss to the NFL or “transfer portal” of many of their best players, blowout of the junior varsity squad in the mid-December bowl game they were lucky to even be selected for, and lackluster recruiting class are other reasons.

measuring inflation is hard

Measuring inflation is hard for a variety of reasons, and it gets even harder when you try to compare across countries and regions. Some of the reasons include methodological choices in averaging, weighting, how housing and transportation are accounted for, how urban and rural consumers are included, and many others. There is a measure called the Harmonized Index of Consumer Prices (HICP) that is used to try to compare across countries and regions. This differs from the U.S. CPI in a variety of ways.
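To illustrate just one of those choices, weighting, here is a small Python sketch with made-up numbers. The same category price changes produce different headline inflation rates depending on the expenditure weights applied to them. The category names, price changes, and both sets of weights are hypothetical and do not represent the actual CPI or HICP baskets.

```python
# Hypothetical year-over-year price changes by category, in percent.
PRICE_CHANGES = {"housing": 6.0, "food": 4.0, "transport": 2.0, "other": 3.0}

# Two hypothetical baskets; each set of weights sums to 1.
BASKET_A = {"housing": 0.40, "food": 0.15, "transport": 0.15, "other": 0.30}
BASKET_B = {"housing": 0.20, "food": 0.20, "transport": 0.20, "other": 0.40}


def weighted_inflation(changes, weights):
    """Expenditure-weighted average of category price changes, in percent."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(changes[k] * w for k, w in weights.items())


if __name__ == "__main__":
    print(f"Basket A: {weighted_inflation(PRICE_CHANGES, BASKET_A):.1f}%")
    print(f"Basket B: {weighted_inflation(PRICE_CHANGES, BASKET_B):.1f}%")
```

Basket A, which gives housing twice the weight, comes out at 4.2% while basket B comes out at 3.6%, even though every underlying price change is identical.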