New Twitter Algorithms

November 5, 2017

My Twitter block list had grown unmanageably large, and blocktogether.org couldn’t remove blocks at anything like a reasonable rate to help me fix it. So I used my employer’s mighty-fine search engine to look for Go packages for the Twitter API, and found Anaconda.

I had to spend a little time checking the Twitter API pages and reading the source code, but pretty quickly (spare time in one afternoon) I had a program put together to remove my old block list, printing it as it went. I’m going to include two programs here in case anyone else wants a leg up to do something of their own, because I can only spend so much time on this. Maybe I should toss these back to the Anaconda author as examples.

This program gets your block list. If your list is long, it takes a while because of throttling:

package main

import (
	"fmt"
	"net/url"

	"github.com/ChimeraCoder/anaconda"
)

func main() {
	// Next three lines use secret strings from Twitter developer API.
	// Go there, follow your nose.  See in particular:
	// https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens
	anaconda.SetConsumerKey("your_consumer_key")
	anaconda.SetConsumerSecret("your_consumer_secret")
	api := anaconda.NewTwitterApi("your_access_token", "your_access_token_secret")
	fmt.Println(*api.Credentials) // sanity check: prints your own credentials

	v := url.Values{}

	cursor := "-1" // Initial cursor value
	for cursor != "0" {
		v.Set("cursor", cursor)
		v.Set("count", "5000") // 200 might be a better number
		result, err := api.GetBlocksList(v)

		if err != nil {
			fmt.Println("Err = ", err)
			return
		}
		fmt.Printf("#Users=%d\n", len(result.Users))
		for _, user := range result.Users {
			fmt.Printf("id=%s, name=%s\n", user.IdStr, user.Name)
		}
		cursor = result.Next_cursor_str
	}
}

This program undoes a block list supplied on standard input (one numeric user id per line), printing progress as it goes. I had previously downloaded mine from blocktogether using a shell script someone provided as a workaround on the relevant blocktogether bug. Again, throttling will slow you down; I started it running last night, and it’s still running today (just had breakfast, started writing this):

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/url"
	"os"
	"strings"

	"github.com/ChimeraCoder/anaconda"
)

func main() {
	// Next three lines use secret strings obtained from Twitter developer API.
	// Go there, follow your nose.  See in particular:
	// https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens
	anaconda.SetConsumerKey("your_consumer_key")
	anaconda.SetConsumerSecret("your_consumer_secret")
	api := anaconda.NewTwitterApi("your_access_token", "your_access_token_secret")
	fmt.Println(*api.Credentials) // sanity check: prints your own credentials

	b, err := ioutil.ReadAll(os.Stdin)
	if err != nil {
		fmt.Printf("err=%v\n", err)
		return
	}
	lb := bytes.Split(b, []byte{'\n'})
	fmt.Printf("Number of lines = %d\n", len(lb))

	for i, bb := range lb {
		v := url.Values{}
		u := strings.TrimSpace(string(bb))
		if u == "" {
			continue
		}
		v.Set("user_id", u)
		user, err := api.Unblock(v)
		if err != nil {
			fmt.Printf("Unblock %s, err=%v\n", u, err)
		} else {
			fmt.Printf("Unblock %s ok, id=%s, name=%s, #%d\n", u, user.IdStr, user.Name, i)
		}
	}
}

The old block list had lots of people on it worth blocking, but also lots of people accidentally swept up in the huge pile of blocks. The plan for the new list is to create two sets of Twitter ids, “okay” and “vile”, and use those to obtain a smaller and more accurate block list.

Step one is to create the okay list: anyone I follow is okay, and anyone those people follow is okay. Maybe I take that one iteration further; because these queries are rate-limited, there’s a limit to how quickly I can build these sets.
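Here’s a sketch of step one, in case anyone wants a leg up on that, too. It assumes anaconda’s GetFriendsIds behaves like the GetBlocksList call above (cursor in, a Cursor holding Ids out); check the package before trusting it. Rate limiting makes the second ring very slow:

package main

import (
	"fmt"
	"net/url"
	"strconv"

	"github.com/ChimeraCoder/anaconda"
)

// collectFriends adds every account that userID follows to the okay set,
// following the cursor until Twitter reports no more pages.
func collectFriends(api *anaconda.TwitterApi, userID int64, okay map[int64]bool) error {
	v := url.Values{}
	v.Set("user_id", strconv.FormatInt(userID, 10))
	cursor := "-1"
	for cursor != "0" {
		v.Set("cursor", cursor)
		c, err := api.GetFriendsIds(v)
		if err != nil {
			return err
		}
		for _, id := range c.Ids {
			okay[id] = true
		}
		cursor = c.Next_cursor_str
	}
	return nil
}

func main() {
	anaconda.SetConsumerKey("your_consumer_key")
	anaconda.SetConsumerSecret("your_consumer_secret")
	api := anaconda.NewTwitterApi("your_access_token", "your_access_token_secret")

	okay := make(map[int64]bool)
	var myID int64 = 12345 // your own numeric id goes here

	// First ring: everyone I follow.
	if err := collectFriends(api, myID, okay); err != nil {
		fmt.Println("err =", err)
		return
	}
	// Second ring: everyone they follow.  Protected accounts will error;
	// just log those and keep going.
	firstRing := make([]int64, 0, len(okay))
	for id := range okay {
		firstRing = append(firstRing, id)
	}
	for _, id := range firstRing {
		if err := collectFriends(api, id, okay); err != nil {
			fmt.Println("err for", id, "=", err)
		}
	}
	fmt.Printf("okay set size = %d\n", len(okay))
}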

Step two is to form the vile list.
That is anyone from the old block list whose name satisfies the following (case-insensitive) pattern:

'kek|deplorable|pepe|maga|gamer.*gate'

I did a quick scan of the people on that list; they’re all terrible. Anyone who would follow such a terrible person for any reason other than “what are the horrible crazy people saying today?” is not someone whose opinions I need to read, they won’t listen to mine, and arguably by blocking them I will slightly reduce the noise on Twitter. For all I know they’re fake accounts intended to stir up trouble. So all those people get blocked. Assembling that list will take some time; at one probe per minute (which I think will usually get all of one user’s followers) it will take days. One sanity check is to see whether anyone on the “okay” list appears to be landing on the “block” list; I think the initial treatment is to not block them, but to note the exception for manual correction.
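Step two’s name filter needs only the standard library. This sketch assumes the old block list is available as lines of text (for example, the id=…, name=… lines printed by the first program) and prints the lines that match the pattern:

package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

func main() {
	// (?i) makes the whole pattern case-insensitive.
	vile := regexp.MustCompile(`(?i)kek|deplorable|pepe|maga|gamer.*gate`)

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		if vile.MatchString(line) {
			fmt.Println(line) // candidate for the vile list
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("err =", err)
	}
}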

That’s all I feel confident doing right now; I’ll watch for mistakes (both false positives and false negatives) and see if I can create more refined definitions of “okay” and “vile”. “Vile” is actually easy: look at someone’s profile, and if what they tweet is so horrible that only a horrible person would follow them, then they’re vile. “Okay” is harder because I think it might be much larger and much vaguer; merely being someone I disagree with should not disqualify them from “okay”. The size is also an impediment because of rate-limiting; obviously I need to maintain a cache so I don’t have to refetch information I already have.

Over time I expect I will discover more “vile” people, so I need something into which I can just drop a name and have it automatically alert me if their followers overlap the “okay” list in a big way, and otherwise just block all their followers. This is pretty much what the Twitter Block Chain extension does, but it lacks as comprehensive a definition of “okay”, and I lose track of the difference between the “vile” people and those who are merely blocked, so I’d like to keep this information myself.

Two other programs that would be nice to write would implement time-limited block and mute. Muting, especially, is just to get someone who’s gone off on some stupid rant out of your display so you don’t need to consciously ignore them (for example, if some otherwise sane person decides they want to rehash the 2016 Democratic primaries); they’ll eventually stop ranting, and normally they say worthwhile stuff, which is why you follow them. A time-limited block might be for when an otherwise sane person says something that really pisses you off temporarily.
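A sketch of the time-limited block, with two caveats: I’m assuming a Block counterpart to the Unblock call used above (check the package for the exact name), and a real version would record its deadlines somewhere durable instead of holding a process open:

package main

import (
	"fmt"
	"net/url"
	"time"

	"github.com/ChimeraCoder/anaconda"
)

func main() {
	anaconda.SetConsumerKey("your_consumer_key")
	anaconda.SetConsumerSecret("your_consumer_secret")
	api := anaconda.NewTwitterApi("your_access_token", "your_access_token_secret")

	v := url.Values{}
	v.Set("user_id", "12345") // hypothetical id of the temporarily infuriating account

	// Block is assumed symmetric to the Unblock call in the program above.
	if _, err := api.Block(v); err != nil {
		fmt.Println("block err =", err)
		return
	}
	fmt.Println("blocked; will unblock in 48 hours")

	time.Sleep(48 * time.Hour) // a real version would persist the deadline instead

	if _, err := api.Unblock(v); err != nil {
		fmt.Println("unblock err =", err)
	}
}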

And no, I’m not creating a bubble. I grew up a mile from a KKK bookstore, among plenty of racists and children of racists; I can read the news anytime I want to see what the Nazis and racists are up to and how President Very-Fine-People has excused their vileness this week. I use Twitter for my purposes, not somebody else’s.

I guess people think I am tedious on this subject (people on Facebook certainly do), but bike share is very safe. By design, it can count the number of trips, and we’re pretty good at counting deaths, too. In over 100 million trips there have been two deaths. (Death #1)

On “normal” bicycles, we’d expect to see 20 deaths in that many trips. In cars, we’d expect to see 9 deaths in that many trips. Instead, two. (As a sanity check on those rates, Canadian statistics are similar.)

If driving doesn’t need helmets, then bike share doesn’t need helmets. Don’t be distracted by “look at all the safety devices on cars, look at the safe metal cage”: with all those devices, driving is still about 4 times as likely to kill you per trip (9 versus 2 deaths per 100 million trips), and we don’t think it needs helmets (TBI is the cause or co-cause of death in about 40% of car crashes; car crashes are a leading cause of TBI hospitalization and death).

There are several reasons why this might be true.
Bike share bikes come with daytime running lights.
Upright posture appears to put your head at lower risk in a crash.
Bike share tends to require a credit card, and the bikes are sized for adults. Small children don’t have credit cards and thus don’t ride bike share.
Bike share tends to occur in cities, and biking is safer in cities. (“But wait”, you say, “driving is also safer in cities!” Yes, and that means driving is more dangerous than average in suburbs and rural areas, where we still don’t discuss the need for driving helmets. If that’s not dangerous enough for helmets, then neither is bike share.)
People riding bike share are almost never bombing down a mountainside, and almost never engaged in head-down bicycle racing or race training.

If you claim to pay the slightest attention to math, if you claim to take a rational approach to risk, this ought to be the end of the discussion. Could the CDC’s statistics be biased? Two separate studies found similar car and bicycle crash-death rates for the US and Canada (with Canada slightly safer, as one might expect if one has ever driven or biked in both places).

The last time I did this, I had figures through 2011. Now I have 2012, 2014, and 2015 (2013 seems to be missing), all in a Google spreadsheet, so you can look at the numbers directly and poke at the links if you want to see where the numbers came from.

In words: since 2009, each gallon of gasoline or diesel has been taxed between 40 and 50 cents too low, even if the only purpose of that tax is to pay for road construction and maintenance. Any other taxes (carbon, pollution, noise, congestion, health care) would be on top of that. This also does not include the maintenance or construction that we ought to be doing; this is just what is actually spent.

Totaled over all the fuel sold, the annual shortfall each year since 2009 has been somewhere between 75 and 100 billion dollars.

This is a (very) rare work-related entry. I mostly work on the compiler for a programming language named “Go”, and one of the problems we face is whether and how we should add “generics” to a future version of Go. I can’t easily summarize this for a non-technical reader, but the TLDR version is: (1) lots of other languages have generics; (2) we’re pretty sure they’re useful; (3) they come with associated costs and complexity, and we’re not sure they’re worth it. Also, “generics” is not just a single thing; there are several semantic variants and several ways to implement them (for example, erased versus dictionary-passing versus template-stamping). So our team is collecting example stories of how Go generics would be useful in various situations; the jargon term for this is “use case”. Here’s mine:


The Go compiler uses maps fairly often, but map iteration order changes from one program execution to the next (and there are good reasons for this). It is a requirement that the Go compiler always generate the same output given the same program input, which means that any time a map’s iteration order would affect program output, we can’t iterate the map directly.

The way we usually deal with this is to also build a slice of the elements in the map: each time an insertion changes the map’s size (that is, each time a genuinely new element is added), that element is also appended to the slice, giving us an insertion-ordered map. This makes the code slightly clunkier and adds a speed bump for new work on the compiler (especially for new contributors): new code is first expressed plainly in idiomatic Go, but idiomatic Go lacks a constant order, so the ordering has to be cleaned up in a second pass.
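Concretely, the hand-written pattern looks something like this (the names are made up; Node stands in for whatever is actually stored, and this assumes a key’s element is never replaced):

package main

import "fmt"

// Node stands in for whatever the compiler actually stores.
type Node struct{ name string }

// nodeMap pairs a map (for lookup) with a slice (for deterministic,
// insertion-ordered iteration).
type nodeMap struct {
	m     map[int]*Node
	order []*Node
}

func (nm *nodeMap) put(k int, n *Node) {
	if nm.m == nil {
		nm.m = make(map[int]*Node)
	}
	old := len(nm.m)
	nm.m[k] = n
	if len(nm.m) != old { // insertion changed the size: remember the order
		nm.order = append(nm.order, n)
	}
}

func (nm *nodeMap) get(k int) *Node { return nm.m[k] }

func main() {
	var nm nodeMap
	nm.put(3, &Node{"c"})
	nm.put(1, &Node{"a"})
	nm.put(2, &Node{"b"})
	// Iterate the slice, not the map, so the order is the same every run.
	for _, n := range nm.order {
		fmt.Println(n.name)
	}
}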

We could also solve our problem with a data structure handling interface{} elements, but this would add a bit of storage overhead (storing two-word interface values instead of one-word pointers) and a little time overhead, and our users are, for better or worse, remarkably sensitive to the time it takes to compile their programs.

With generics (either templated or dictionary-passing; templated would provide the best performance for us) we could define, say, OrderedMap[[T,U]], that would take care of all the order bookkeeping, be space-efficient, and be fast. If we could arrange for some “sensible” overloading, the difference between idiomatic Go and the correct Go for compiler implementation would be only the declaration of the data structure itself. I think the overloading has to be part of the use case; if we had to use different syntax for something that is map-like, and thus could not update a program by updating just the declaration/allocation, that would not be as good.

By sensible overloading, I mean that we could define a generic interface type, say MapLike[[T,U]] with methods Get(T)U, GetOk(T)U,bool, Put(T,U), Delete(T), and Len()int, and any data type that supplied those methods could be manipulated using existing Go map syntax.
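For concreteness, here is that interface and an OrderedMap that satisfies it, written in the single-bracket generics syntax Go later adopted (the [[T,U]] above was hypothetical notation when this was written):

package main

import "fmt"

type MapLike[T comparable, U any] interface {
	Get(T) U
	GetOk(T) (U, bool)
	Put(T, U)
	Delete(T)
	Len() int
}

// OrderedMap implements MapLike with deterministic, insertion-ordered keys.
type OrderedMap[T comparable, U any] struct {
	m    map[T]U
	keys []T // insertion order
}

func (o *OrderedMap[T, U]) Get(k T) U { return o.m[k] }

func (o *OrderedMap[T, U]) GetOk(k T) (U, bool) {
	u, ok := o.m[k]
	return u, ok
}

func (o *OrderedMap[T, U]) Put(k T, u U) {
	if o.m == nil {
		o.m = make(map[T]U)
	}
	if _, ok := o.m[k]; !ok {
		o.keys = append(o.keys, k) // new key: remember its position
	}
	o.m[k] = u
}

// Delete removes the entry but, in this sketch, leaves the key in the
// order slice; iteration below skips keys that are no longer present.
func (o *OrderedMap[T, U]) Delete(k T) { delete(o.m, k) }

func (o *OrderedMap[T, U]) Len() int { return len(o.m) }

func main() {
	var om OrderedMap[string, int]
	om.Put("b", 2)
	om.Put("a", 1)
	om.Delete("b")
	for _, k := range om.keys {
		if v, ok := om.GetOk(k); ok { // skip deleted keys
			fmt.Println(k, v)
		}
	}
	var _ MapLike[string, int] = &om // compile-time interface check
}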

Iteration methods are also required, but trickier; a fully general solution would use an iteration state with its own methods, or we could declare that iteration state fits in an integer, which constrains the possible map implementations (though that would be a very Go thing to do). The integer version could use three methods on the data structure itself, say FirstOk(int)int,bool, NextOk(int)int,bool, and GetAt(int)T,U. The you-get-an-integer approach is a little interesting because it allows recursive subdivision as well as iteration, if NextOk works properly given an input that is not itself a valid index.

The iterator-state version could instead use the map method Begin()Iterator[[T,U]] and iterator methods Iterator[[T,U]].Nonempty()bool, Iterator[[T,U]].Next(), Iterator[[T,U]].Key()T, Iterator[[T,U]].Value()U. This doesn’t by itself support recursive subdivision; that would require more methods, perhaps one to split the iterator into a pair of iterators at a point and another to report the length of the iterator itself.
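Here are both iteration shapes from the two paragraphs above as interface sketches, with the same caveat about the later-adopted syntax; there is no concrete implementation here, just the API shape and the loops it supports:

package main

// The integer-position version: three methods on the map itself.
// NextOk tolerating a non-valid index is what enables recursive subdivision.
type IntIterable[T comparable, U any] interface {
	FirstOk(int) (int, bool) // first valid position at or after the argument
	NextOk(int) (int, bool)  // next valid position after the argument
	GetAt(int) (T, U)        // key and value at a valid position
}

// The iterator-state version: the map hands out an iterator object.
type Iterable[T comparable, U any] interface {
	Begin() Iterator[T, U]
}

type Iterator[T comparable, U any] interface {
	Nonempty() bool // more elements remain
	Next()          // advance one element
	Key() T
	Value() U
}

// The two iteration idioms these shapes support:
func sumInts(m IntIterable[string, int]) (total int) {
	for i, ok := m.FirstOk(0); ok; i, ok = m.NextOk(i) {
		_, v := m.GetAt(i)
		total += v
	}
	return
}

func sumIter(m Iterable[string, int]) (total int) {
	for it := m.Begin(); it.Nonempty(); it.Next() {
		total += it.Value()
	}
	return
}

func main() {}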

Right Hook Videos

August 5, 2017

I was trying to explain to someone on Facebook that right hooks are a problem, and a problem caused by drivers, not by people riding bicycles. No dice: cyclists are jerks for yelling at drivers when this happens, and jerks for putting their license plates on the internet, thus spake the driver. But it was a lot of work to collect these videos (seriously, Youtube, can I have a “search my videos” option?), so here they are:

https://vimeo.com/111294266

https://www.youtube.com/watch?v=Ndvz4ZJRoek

https://www.youtube.com/watch?v=wNbvCgEBrLo

https://www.youtube.com/watch?v=RnHfXaYcQ08

https://www.youtube.com/watch?v=GZII3ImASVQ

https://www.youtube.com/watch?v=PJ2pe3_KClU

https://www.youtube.com/watch?v=LA3EQ4Q6avs

https://www.youtube.com/watch?v=4C6jFk3uVm8

https://vimeo.com/111805880

I’ll just put these articles out here. Note that we count overdose deaths per 100,000 while Europe counts them per million. Our best state in 2015, Nebraska, had 69 overdose deaths per million, which would put it at the back of the pack in Europe. Portugal, with decriminalized drugs, had 3 per million. Here are the European stats referenced in that article.

Estonia, the European country with the highest overdose rate (127 per million), would sit at 13th among US states, tied with Georgia. Eight states (New Mexico, Massachusetts, Pennsylvania, Rhode Island, Kentucky, Ohio, New Hampshire, and West Virginia) had an overdose death rate about double Estonia’s in 2015.

And of course, there’s all the violence that comes from pushing drugs into the criminal economy.

Portugal did the experiment, they’re not an especially wealthy nation, and they got great results. We should study what they did and copy it exactly; we’ve never had a “War on Drugs” that came anywhere close to their results, and their approach didn’t require jail sentences, corrupted police forces, or no-knock warrants to “preserve evidence”. The humane choice is humane, AND it works.

Twitter algorithms

July 2, 2017

These are my rules for making Twitter more useful.

My goal, on Twitter, is a combination of finding fun and interesting stuff and exposing myself to (certain) other points of view. At work we have training on bias, unconscious and otherwise, and on techniques for reducing and countering it. One of the instructors mentioned that you can’t just wish unconscious bias away; apparently repeated exposure to normalizing examples is required, but it takes time. (This is yet another disturbing/annoying way that our brains resemble neural nets for machine learning; in this light, unconscious bias is just the result of a lifelong biased training set.)

So as a rule, by default, if I see a post from an interesting woman, an interesting PoC, or an interesting LGBTQIA person, I try to be a little more receptive to pushing the follow button. Lately I’ve decided that someone from a country I don’t usually hear from ought to count, too.

My subject bias is bikes/transit/housing, tech-especially-security, Boston area, Florida, liberal politics, science, cute animals.

But everywhere you go, especially in politics and often in science, you find trolls. I can’t even tell if they’re really people, and there are a lot of them. I won’t learn anything from them and they won’t learn anything from me; it’s annoying to see someone wrong on the internet and not reply, but replying is a total waste of time. I tried blocktogether.org, and that worked pretty well once I had imported a couple of lists, but then I heard mention of something called “blockchain”: not the distributed ledger algorithm, but a Chrome extension for bulk blocking.

So now, if I’m reading replies to an interesting tweet and I see an especially trolly comment, I visit the troll’s profile, and if it also looks especially trolly, I select their followers. If I see that several other people I follow also follow the troll, maybe I stop there. Otherwise I scan a few of the followers to see if they also look slightly troll-aligned (and remember, I’m not sure if these are real people or networks of bots), and if they are, I click the “Run Block Chain” button and wait. For someone with more than about 10,000 followers, this will eventually error out for some reason, but it does add the ones it scanned before the error. Twitter Block Chain is open source, so I have a prayer of figuring out the bug and fixing it in my copious free time if I really cared, but for now it works well enough, and few trolls have that many followers.

Block Chain will not block someone you’re already following, but inevitably you’ll block someone you’d follow if you knew about them (@soledadobrien follows 338k accounts, including quite a few trolls). Sooner or later you’ll notice someone you follow approvingly quote-tweeting someone you’ve blocked (this doesn’t happen often, but it happens), and when that happens I look at the blocked account, maybe unblock them, maybe follow them (this morning it was @deborahblum).
I’m a little nervous that I’m blocking lots of people I might otherwise follow if I knew about them, but after passing 100k blocked accounts the troll chatter is vastly reduced, and that’s a real improvement.

One amusing side-effect is that this method bootstraps itself; once you accumulate a few troll-followers in your block list, you’ll find that any new troll’s followers include quite a few that you’ve already blocked, right now around 50% for me. You can use this to quickly sanity-check whether someone you think might be a troll is likely to be one; if a scan of their followers shows a lot of already-blocked accounts, perhaps the rest are worth blocking as well.
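That check is mechanizable. Here’s a sketch that computes the already-blocked fraction of a suspected troll’s followers; it assumes anaconda has a GetFollowersIds that follows the same cursor idiom as the GetBlocksList call in the other post, so check the package first:

package main

import (
	"fmt"
	"net/url"
	"strconv"

	"github.com/ChimeraCoder/anaconda"
)

// blockedFraction reports what fraction of userID's followers are already
// in the blocked set.
func blockedFraction(api *anaconda.TwitterApi, userID int64, blocked map[int64]bool) (float64, error) {
	v := url.Values{}
	v.Set("user_id", strconv.FormatInt(userID, 10))
	total, hits := 0, 0
	cursor := "-1"
	for cursor != "0" {
		v.Set("cursor", cursor)
		c, err := api.GetFollowersIds(v)
		if err != nil {
			return 0, err
		}
		for _, id := range c.Ids {
			total++
			if blocked[id] {
				hits++
			}
		}
		cursor = c.Next_cursor_str
	}
	if total == 0 {
		return 0, nil
	}
	return float64(hits) / float64(total), nil
}

func main() {
	anaconda.SetConsumerKey("your_consumer_key")
	anaconda.SetConsumerSecret("your_consumer_secret")
	api := anaconda.NewTwitterApi("your_access_token", "your_access_token_secret")

	blocked := map[int64]bool{ /* load your block list here */ }
	f, err := blockedFraction(api, 12345 /* suspected troll's id */, blocked)
	if err != nil {
		fmt.Println("err =", err)
		return
	}
	fmt.Printf("already-blocked fraction: %.0f%%\n", 100*f)
}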

It would be lovely/interesting to do something more nuanced. For example, @deborahblum has 17 followers that I “know” and @soledadobrien has 48; having several followers I know could be a rule for not blocking someone in a followers list. It would be interesting to see how many people on my existing blocklist have more than N “discriminating” (not @soledadobrien) followers that I know, and maybe review/unblock/follow some of them. (This smells like a sort of 2-sided pagerank to me.)

Someone might ask “why block, why not mute?” I don’t want to see these people, and I don’t want them to see me. There are other people who are actually harassed on the internet by networks of trolls; I think this is one way to blunt the effectiveness of those networks.

I use the mute button when someone that I’m following goes off on some tedious unrelated tear and I just don’t want to hear about it for a while. It would be nice if muting had a built-in time limit.