Tag Archives: data science

flow maps

Here is an interesting paper proposing design principles for flow maps, which “visualize movement using a static image and demonstrate not only which places have been affected by movement but also the direction and volume of movement.”

Design principles for origin-destination flow maps

Origin-destination flow maps are often difficult to read due to overlapping flows. Cartographers have developed design principles in manual cartography for origin-destination flow maps to reduce overlaps and increase readability. These design principles are identified and documented using a quantitative content analysis of 97 geographic origin-destination flow maps without branching or merging flows. The effectiveness of selected design principles is verified in a user study with 215 participants. Findings show that (a) curved flows are more effective than straight flows, (b) arrows indicate direction more effectively than tapered line widths, and (c) flows between nodes are more effective than flows between areas. These findings, combined with results from user studies in graph drawing, lead to the conclusion that effective and efficient origin-destination flow maps should be designed according to the following design principles: overlaps between flows are minimized; symmetric flows are preferred to asymmetric flows; longer flows are curved more than shorter or peripheral flows; acute angles between crossing flows are avoided; sharp bends in flow lines are avoided; flows do not pass under unconnected nodes; flows are radially distributed around nodes; flow direction is indicated with arrowheads; and flow width is scaled with represented quantity.
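
The principles about curvature, arrowheads, and width scaling are easy to experiment with. Below is a minimal sketch in R using ggplot2 (the flows and volumes are invented for illustration, and it assumes ggplot2 3.4 or later for the linewidth aesthetic):

library(ggplot2)
library(grid)  # for arrow() and unit()

# Hypothetical origin-destination flows (coordinates and volumes are made up)
flows <- data.frame(
  x    = c(0, 0, 1),  y    = c(0, 0, 1),   # origins
  xend = c(2, 1, 2),  yend = c(1, 2, 0),   # destinations
  volume = c(50, 20, 35)                   # quantity carried by each flow
)

ggplot(flows) +
  geom_curve(
    aes(x = x, y = y, xend = xend, yend = yend, linewidth = volume),
    curvature = 0.25,                           # curved flows are easier to read than straight ones
    arrow = arrow(length = unit(0.15, "in")),   # arrowheads indicate direction
    lineend = "round"
  ) +
  scale_linewidth(range = c(0.5, 3)) +          # flow width scaled with represented quantity
  theme_minimal()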

Tobler’s first law of geography

Since I seem to be on a kick of writing about key theories I didn’t learn in school (and perhaps I am a bit burned out thinking about politics and climate change, and I don’t have any amazing new technologies to share today), here is the first law of geography:

The first law of geography was developed by Waldo Tobler in 1970, and it makes the observation that ‘everything is related to everything else, but near things are more related than distant things.’ This observation is closely related to the ‘Law of Universal Gravitation’ and the ‘Law of Demand’ as well. Tobler first applied the concept to modeling urban growth, and it was not popularly received when first published. It wasn’t until the 1990s that this formulation of the concept of spatial autocorrelation became an important underlying concept in the field of GIS.
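
Spatial autocorrelation is the statistical expression of this law, and Moran's I is the most common way to measure it. Here is a minimal hand-rolled sketch in R (the coordinates and values are invented for illustration; in practice a package like spdep would do this for you):

# Hypothetical locations along a transect and an attribute measured at each one
x <- c(0, 1, 2, 5, 6)        # coordinates
z <- c(10, 11, 12, 30, 31)   # attribute values: nearby points have similar values

# Inverse-distance spatial weights, zero on the diagonal
d <- as.matrix(dist(x))
w <- ifelse(d > 0, 1 / d, 0)

# Moran's I: values above 0 mean near things are more alike than distant things
n  <- length(z)
zc <- z - mean(z)
moran_i <- (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
moran_i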

best practices for writing code

Here’s another R post I am saving for my own reference – some best practices for writing code. This is something I actually can say I learned in engineering school – it was covered in 15 minutes or so in a required intro to computer science course I took around 1994. Perhaps it’s time to brush up. Again, these are skills that are useful these days in many fields beyond just computer science and software development.

relational algebra

R bloggers has a nice post on the theory behind database organization, and on some tools that can be used to manage and manipulate data through R. Maybe this seems very specialized, but many of our jobs involve dealing with data these days, so this knowledge and these tools are potentially relevant to us, and yet I don’t think many of us, even in technical fields outside math and computer science, learn this stuff in school.
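
As a quick taste of what those relational operations look like from R, here is a minimal sketch using only base R (the tables and column names are invented for illustration):

# Two tiny "relations"
employees   <- data.frame(id = 1:3, name = c("Ana", "Bo", "Cy"), dept_id = c(10, 10, 20))
departments <- data.frame(dept_id = c(10, 20), dept_name = c("Water", "Transport"))

# Selection (filter rows), projection (pick columns), and a natural join
selected  <- employees[employees$dept_id == 10, ]           # selection
projected <- employees[, c("name", "dept_id")]              # projection
joined    <- merge(employees, departments, by = "dept_id")  # join
joined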

November 2016 in Review

Sometimes you look back on a month and feel like nothing very important happened. But November 2016 was obviously not one of those months! I am not going to make any attempt to be apolitical here. I was once a registered independent and still do not consider myself a strong partisan. However, I like to think of myself as being on the side of facts, logic, problem solving, morality and basic goodness. Besides, this blog is about the future of our human civilization and human race. I can’t pretend our chances didn’t just take a turn for the worse.

3 most frightening stories

  • Is there really any doubt what the most frightening story of November 2016 was? The United Nations Environment Program says we are on track for 3 degrees C over pre-industrial temperatures, not the “less than 2” that almost all serious people (a category that excludes 46% of U.S. voters, apparently) agree is needed. This story was released before the U.S. elected an immoral science denier as its leader. One theory is that our culture has lost all ability to separate fact from fiction. Perhaps states could take on more of a leadership role if the federal government is going to be immoral? Washington State voters considered a carbon tax that could have been a model for other states, and voted it down, in part because environmental groups didn’t like that it was revenue neutral. Adding insult to injury, WWF released its 2016 Living Planet Report, which along with more fun climate change info includes fun facts like a 58% average decline in wild animal populations since 1970. There is a 70-99% chance of a U.S. Southwest “mega-drought” lasting 35 years or longer this century. But don’t worry, this is only “if emissions of greenhouse gases remain unchecked”. Oh, and climate change is going to begin to strain the food supply worldwide, which is already strained by population, demand growth, and water resources depletion even without it.
  • Technological unemployment may be starting to take hold, and might be an underlying reason behind some of the resentment directed at mainstream politicians. If you want a really clear and concise explanation of this issue, you could ask a smart person like, say, Barack Obama.
  • According to left wing sources like Forbes, an explosion of debt-financed spending on conventional and nuclear weapons is an expected consequence of the election. Please, Mr. Trump, prove them wrong!

3 most hopeful stories

3 most interesting stories

Nate Silver and college football

I thought Nate Silver only looked at professional sports. I was wrong – here is a cool interactive web page he has put together for college football. The numbers don’t always give you the answers you want to hear though – even if my beloved Gators somehow win all the rest of their games, which would include beating Alabama in the conference championship game, he gives them only a 13% chance of winning the national championship. Another nice thing about Nate Silver – he always explains his methodology.

We’ll be updating the numbers twice weekly: first, on Sunday morning (or very late Saturday evening) after the week’s games are complete; and second, on Tuesday evening after the new committee rankings come out. In addition to a probabilistic estimate of each team’s chances of winning its conference, making the playoff, and winning the national championship, we’ll also list three inputs to the model: their current committee ranking, FPI, and Elo. Let me explain the role that each of these play…

FPI is ESPN’s Football Power Index. We consider it the best predictor of future college games so that’s the role it plays in the model: if we say Team A has a 72 percent chance of beating Team B, that prediction is derived from FPI. Technically speaking, we’re using a simplified version of FPI that accounts for only each team’s current rating and home field advantage; the FPI-based predictions you see on ESPN.com may differ slightly because they also account for travel distance and days of rest…

Our college football Elo ratings are a little different, however. Instead of being designed to maximize predictive accuracy — we have FPI for that — they’re designed to mimic how humans rank the teams instead. Their parameters are set so as to place a lot of emphasis on strength of schedule and especially on recent “big wins,” because that’s what human voters have historically done too. They aren’t very forgiving of losses, conversely, even if they came by a narrow margin under tough circumstances. And they assume that, instead of everyone starting with a truly blank slate, human beings look a little bit at how a team fared in previous seasons. Alabama is more likely to get the benefit of the doubt than Vanderbilt, for example, other factors held equal.
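
For anyone curious what an Elo update actually looks like, here is a minimal generic sketch in R. This is only the textbook formula with invented ratings and K-factor, not FiveThirtyEight's actual college football parameters (which add the strength-of-schedule and season-carryover adjustments described above):

# Generic Elo update for team A after one game (illustrative only)
elo_update <- function(rating_a, rating_b, a_won, k = 20) {
  expected_a <- 1 / (1 + 10^((rating_b - rating_a) / 400))  # expected score for A
  rating_a + k * (a_won - expected_a)                       # new rating for A
}

# Example: a 1500-rated team upsets a 1700-rated opponent and gains about 15 points
elo_update(1500, 1700, a_won = 1)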

R code to read Nate Silver’s data

Thanks to Nate Silver for posting all his polling data in a convenient text file that anyone can read! It’s a nice thing to do. Even though not many of us can do as interesting things with it as Nate Silver, it is a fun data set to play and practice with. Here is an R-bloggers post with some ideas on how to play with it.
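
For reference, pulling a plain-text data file like that into R takes only a line or two. A minimal sketch (the URL below is a placeholder, not the actual location of the file; substitute the link from the FiveThirtyEight post):

# Read a comma-separated text file straight from a URL into a data frame
polls <- read.csv("https://example.com/polls.csv", stringsAsFactors = FALSE)
head(polls)  # peek at the first few rows
str(polls)   # check column names and types before playing with the data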


weather forecasting history

I recently wrote about earthquake forecasting and how many scientists think it is essentially impossible. But it is interesting to compare that with the state of weather forecasting in the 1800s:

Before the Royal Charter storm, FitzRoy had been agitating in London for government funding for collection of weather data. He and other Victorian men of meteorology knew that the more they could parse what the weather had done in the past, the better they could warn what it might do in the future. FitzRoy called the concept “forecasting.” To show just how ludicrous that idea seemed at the time, Moore unearths a telling 1854 Commons debate. When a scientifically enthusiastic member of Parliament suggested that amassing weather observations from sea and land could someday mean “we might know in this metropolis the condition of the weather 24 hours beforehand,” laughter broke out raucously enough to stop the proceeding.