
New Twitter Algorithms

November 5, 2017

My Twitter block list got unmanageably large, and blocktogether.org was not even able to remove blocks at any sort of a reasonable rate to help me fix it. So, I used my employer’s mighty-fine search engine to look for any Go packages for the Twitter API, and found Anaconda.

I had to spend a little time checking back at the Twitter API pages, and reading the source code, but pretty quickly (spare time in one afternoon) I had a program put together to remove my old block list, printing it as it goes. I’m going to include two programs here in case anyone else wants a leg up to do something of their own, because I can only spend so much time on this. Maybe I should toss these back to the Anaconda author as examples.

This program gets your block list. If your list is long, it takes a while because of throttling:

package main

import (
	"fmt"
	"net/url"
	"github.com/ChimeraCoder/anaconda"
)

func main() {
	// Next three lines use secret strings from Twitter developer API.
	// Go there, follow your nose.  See in particular:
	// https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens
	anaconda.SetConsumerKey("your_consumer_key")
	anaconda.SetConsumerSecret("your_consumer_secret")
	api := anaconda.NewTwitterApi("your_access_token", "your_access_token_secret")
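	// Sanity check that the keys loaded. Note that this prints your secrets, so don't paste the output anywhere.
	fmt.Println(*api.Credentials)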

	v := url.Values{}

	cursor := "-1" // Initial cursor value
	for cursor != "0" {
		v.Set("cursor", cursor)
		v.Set("count", "5000") // 200 might be a better number
		result, err := api.GetBlocksList(v)

		if err != nil {
			fmt.Println("Err = ", err)
			return
		}
		fmt.Printf("#Users=%d\n", len(result.Users))
		for _, user := range result.Users {
			fmt.Printf("id=%s, name=%s\n", user.IdStr, user.Name)
		}
		cursor = result.Next_cursor_str
	}
}

This program undoes a block list supplied on standard input (one numeric user ID per line), printing each result as it goes. I had previously downloaded mine from blocktogether using a shell script someone provided as a workaround on the relevant blocktogether bug. Again, throttling will slow you down. I started it running last night, and it’s still running today (just had breakfast, started writing this):

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/url"
	"os"
	"strings"
	"github.com/ChimeraCoder/anaconda"
)

func main() {
	// Next three lines use secret strings obtained from Twitter developer API.
	// Go there, follow your nose.  See in particular:
	// https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens
	anaconda.SetConsumerKey("your_consumer_key")
	anaconda.SetConsumerSecret("your_consumer_secret")
	api := anaconda.NewTwitterApi("your_access_token", "your_access_token_secret")
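	// Sanity check that the keys loaded. Note that this prints your secrets, so don't paste the output anywhere.
	fmt.Println(*api.Credentials)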

	b, err := ioutil.ReadAll(os.Stdin)
	if err != nil {
		fmt.Printf("err=%v\n", err)
		return
	}
	lb := bytes.Split(b, []byte{'\n'})
	fmt.Printf("Number of lines = %d\n", len(lb))

	for i, bb := range lb {
		v := url.Values{}
		u := strings.TrimSpace(string(bb))
		if u == "" {
			continue
		}
		v.Set("user_id", u)
		user, err := api.Unblock(v)
		if err != nil {
			fmt.Printf("Unblock %s, err=%v\n", u, err)
		} else {
			fmt.Printf("Unblock %s ok, id=%s, name=%s, #%d\n", u, user.IdStr, user.Name, i)
		}
	}
}

The old block list had lots of people on it worth blocking, but also lots of people accidentally swept up in the huge pile of blocks. The plan for the new list is to create two sets of Twitter IDs, “okay” and “vile”, and use those to obtain a smaller and more accurate block list.

Step one is to create the okay list: anyone I follow is okay, and anyone those people follow is okay. Maybe I take that one iteration further, but because these queries are rate-limited, there’s a limit to how quickly I can form these sets.
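Here is a minimal sketch of step one. It assumes the follow graph is reachable through some lookup function. `FriendsFunc`, `okaySet`, and the toy graph are hypothetical names for illustration only. In a real program the lookup would wrap a paged, rate-limited API call, probably Anaconda's `GetFriendsIds` with a cursor loop like the one in the first program. Setting `hops` to 2 gives “people I follow and the people they follow”; 3 is the “one iteration further” version.

```go
package main

import "fmt"

// FriendsFunc returns the IDs of the accounts that id follows.
// Hypothetical: a real version would wrap a paged, rate-limited API call
// (probably Anaconda's GetFriendsIds with a cursor loop).
type FriendsFunc func(id int64) ([]int64, error)

// okaySet does a breadth-first walk of the follow graph starting at me.
// hops=1 is "people I follow"; hops=2 adds "people they follow".
func okaySet(me int64, hops int, friends FriendsFunc) (map[int64]bool, error) {
	okay := map[int64]bool{}
	seen := map[int64]bool{me: true}
	frontier := []int64{me}
	for h := 0; h < hops; h++ {
		var next []int64
		for _, id := range frontier {
			ids, err := friends(id) // one (rate-limited) lookup per account
			if err != nil {
				return okay, err
			}
			for _, f := range ids {
				if f == me {
					continue // don't list myself as "okay"
				}
				okay[f] = true
				if !seen[f] {
					seen[f] = true
					next = append(next, f)
				}
			}
		}
		frontier = next
	}
	return okay, nil
}

func main() {
	// Toy follow graph: account -> accounts it follows. I am account 1.
	graph := map[int64][]int64{
		1: {2, 3},
		2: {4, 1},
		3: {5},
		4: {6}, // 6 is three hops out, so it is not included at hops=2
	}
	lookup := func(id int64) ([]int64, error) { return graph[id], nil }
	okay, err := okaySet(1, 2, lookup)
	if err != nil {
		fmt.Println("err =", err)
		return
	}
	fmt.Println(len(okay), okay[4], okay[6]) // 4 true false
}
```

Each extra hop multiplies the number of lookups, which is why the rate limit caps how far this can reasonably go.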

Step two is to form the vile list.
That is anyone from the old block list whose name satisfies the following (case-insensitive) pattern:

'kek|deplorable|pepe|maga|gamer.*gate'
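The pattern is easy to apply with Go's standard `regexp` package, where `(?i)` makes the match case-insensitive. `isVileName` is a hypothetical helper name. This is a substring match, so it also catches innocent names: anything containing “maga”, such as “Magazine”, will match. That is one reason the manual scan matters. Adding `\b` word boundaries would tighten it, at the cost of missing run-together names like “MAGAPatriot”.

```go
package main

import (
	"fmt"
	"regexp"
)

// vilePattern is the pattern above; (?i) makes it case-insensitive, and
// the group makes the flag's scope over every alternative explicit.
var vilePattern = regexp.MustCompile(`(?i)(kek|deplorable|pepe|maga|gamer.*gate)`)

// isVileName reports whether a display name matches the pattern.
func isVileName(name string) bool {
	return vilePattern.MatchString(name)
}

func main() {
	for _, name := range []string{"Proud Deplorable", "GamerGate Fan", "Alice Smith", "Magazine Editor"} {
		fmt.Printf("%q -> %v\n", name, isVileName(name))
	}
}
```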

I did a quick scan of the people on that list; they’re all terrible. Anyone who would follow such a terrible person for any reason other than “what are the horrible crazy people saying today?” is not someone whose opinions I need to read, and they won’t listen to mine; arguably, by blocking them I will slightly reduce the noise on Twitter. For all I know they’re fake accounts intended to stir up trouble. So all of those followers get blocked. Assembling that list will take some time; at one probe per minute (which I think will usually get all of one user’s followers), it will take days. One sanity check is to see whether anyone on the “okay” list appears to be landing on the block list. I think the initial treatment is to not block them, but to note the exception for manual correction.
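The blocking rule and the sanity check fit together in a few lines. The sketch merges the followers of every vile account, holds back anyone on the okay list, and reports those people as exceptions instead of blocking them. The names are hypothetical: `planBlocks` is made up, and in-memory maps stand in for fetched follower lists. It also assumes the vile accounts themselves stay blocked.

```go
package main

import (
	"fmt"
	"sort"
)

// planBlocks merges the followers of every vile account into one block
// list. Anyone on the okay list is held back and returned as an exception
// for manual review. The vile accounts themselves are blocked too (an
// assumption; they came from the old block list in the first place).
func planBlocks(vile []int64, followers map[int64][]int64, okay map[int64]bool) (block, exceptions []int64) {
	seen := map[int64]bool{}
	for _, v := range vile {
		candidates := append([]int64{v}, followers[v]...)
		for _, id := range candidates {
			if seen[id] {
				continue // already handled via another vile account
			}
			seen[id] = true
			if okay[id] {
				exceptions = append(exceptions, id)
			} else {
				block = append(block, id)
			}
		}
	}
	sort.Slice(block, func(i, j int) bool { return block[i] < block[j] })
	sort.Slice(exceptions, func(i, j int) bool { return exceptions[i] < exceptions[j] })
	return block, exceptions
}

func main() {
	vile := []int64{100, 200}
	followers := map[int64][]int64{ // hypothetical fetched follower lists
		100: {1, 2, 3},
		200: {3, 4},
	}
	okay := map[int64]bool{2: true}
	block, exceptions := planBlocks(vile, followers, okay)
	fmt.Println("block:", block)           // block: [1 3 4 100 200]
	fmt.Println("exceptions:", exceptions) // exceptions: [2]
}
```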

That’s all I feel confident doing right now; I’ll watch for mistakes (both false positives and false negatives) and see if I can create more refined definitions of “okay” and “vile”. “Vile” is actually easy: just look at someone’s profile, and if it’s horrible, and what they tweet is so horrible that only a horrible person would follow them, then they’re vile. “Okay” is harder because I think it might be much larger and much vaguer; merely being someone I disagree with should not disqualify anyone from “okay”. The size is also an impediment because of rate-limiting; obviously I need to maintain a cache so I don’t wait to refetch information I already have.
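Here is a minimal in-memory cache around a generic fetch function. `FetchFunc` and `cachedFetcher` are hypothetical names. Errors are deliberately not cached, so a throttled or failed lookup can be retried later. A crawl that takes days would also need the map saved to disk between runs (for example with `encoding/json`); this sketch leaves that out.

```go
package main

import "fmt"

// FetchFunc fetches a list of IDs for an account (followers, friends, ...).
// Hypothetical stand-in for a slow, rate-limited API call.
type FetchFunc func(id int64) ([]int64, error)

// cachedFetcher remembers successful results, so each account is
// fetched from the API at most once per run.
type cachedFetcher struct {
	fetch FetchFunc
	cache map[int64][]int64
}

func newCachedFetcher(f FetchFunc) *cachedFetcher {
	return &cachedFetcher{fetch: f, cache: map[int64][]int64{}}
}

// Get returns the cached result if present, otherwise fetches and stores it.
func (c *cachedFetcher) Get(id int64) ([]int64, error) {
	if ids, ok := c.cache[id]; ok {
		return ids, nil
	}
	ids, err := c.fetch(id)
	if err != nil {
		return nil, err // not cached, so a later retry can succeed
	}
	c.cache[id] = ids
	return ids, nil
}

func main() {
	calls := 0
	slow := func(id int64) ([]int64, error) { // stands in for the API
		calls++
		return []int64{id + 1, id + 2}, nil
	}
	c := newCachedFetcher(slow)
	c.Get(7)
	c.Get(7) // served from the cache
	c.Get(8)
	fmt.Println("API calls:", calls) // API calls: 2
}
```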

Over time I expect I will discover more “vile” people, so I need something into which I can just drop a name and have it automatically alert me if that person’s followers overlap the “okay” list in a big way, and otherwise just block all their followers. This is pretty much what the Twitter Block Chain extension does, but it lacks as comprehensive a definition of “okay”, and I lose track of the difference between the “vile” people and those who are merely blocked, so I’d like to keep this information myself.
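One way to sketch the drop-in-a-name tool is to measure what fraction of the new vile account's followers are already okay. If that fraction exceeds a threshold, the tool alerts; otherwise it returns the rest to block. The function name and the use of a fractional threshold are assumptions. “In a big way” could just as well be an absolute count.

```go
package main

import "fmt"

// assessVile handles a newly found vile account. If more than maxOverlap
// (a fraction) of its followers are on the okay list, it returns
// alert=true and blocks nothing; otherwise it returns the non-okay
// followers to block.
func assessVile(followers []int64, okay map[int64]bool, maxOverlap float64) (alert bool, overlap float64, toBlock []int64) {
	if len(followers) == 0 {
		return false, 0, nil
	}
	okayCount := 0
	for _, id := range followers {
		if okay[id] {
			okayCount++
		} else {
			toBlock = append(toBlock, id)
		}
	}
	overlap = float64(okayCount) / float64(len(followers))
	if overlap > maxOverlap {
		return true, overlap, nil // too much overlap with "okay": look by hand
	}
	return false, overlap, toBlock
}

func main() {
	okay := map[int64]bool{1: true}
	followers := []int64{1, 2, 3, 4}
	for _, limit := range []float64{0.5, 0.1} { // arbitrary example thresholds
		alert, overlap, toBlock := assessVile(followers, okay, limit)
		fmt.Printf("limit=%.2f overlap=%.2f alert=%v block=%v\n", limit, overlap, alert, toBlock)
	}
}
```

The threshold is a knob to tune by hand; the values in the example are arbitrary.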

Two other programs that would be nice to write would implement time-limited blocks and mutes. Muting, especially, is just for getting someone who’s gone on some stupid rant out of your display so you don’t need to consciously ignore them (for example, if some otherwise sane person decides they want to rehash the 2016 Democratic primaries); they’ll eventually stop ranting, and normally they say worthwhile stuff, which is why you follow them. A time-limited block might be for when an otherwise sane person says something that really pisses you off temporarily.

And no, I’m not creating a bubble. I grew up a mile from a KKK bookstore, with plenty of racists and children of racists, and I can read the news anytime I want to see what the Nazis and racists are up to and how President Very-Fine-People has excused their vileness this week. I use Twitter for my purposes, not somebody else’s.

The last time I did this, I had figures through 2011.
Now I have 2012, 2014 and 2015 (2013 seems to be missing).
The numbers are now in a Google spreadsheet, so you can look at them directly and poke at the links if you want to see where they came from.

In words: since 2009, the tax on each gallon of gasoline or diesel has been between 40 and 50 cents too low, even if the only purpose of that tax is to pay for road construction and maintenance. Any other taxes (carbon, pollution, noise, congestion, health care) would be on top of that. This also does not include the maintenance or construction that we ought to be doing; it is just what is actually spent.

Totaled over all the fuel sold, the shortfall in each year since 2009 comes to somewhere between 75 and 100 billion dollars.
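As a rough consistency check on the per-gallon and annual figures, divide the annual shortfall S by the per-gallon gap g to get the volume of taxed fuel V that they imply. This is only arithmetic on the two ranges stated above, not an independent figure:

$$V \approx \frac{S}{g}, \qquad \frac{\$75\ \text{billion}}{\$0.40/\text{gal}} \approx 188\ \text{billion gal}, \qquad \frac{\$100\ \text{billion}}{\$0.50/\text{gal}} = 200\ \text{billion gal}$$

So the two ranges hang together if roughly 190 to 200 billion gallons of gasoline and diesel are sold each year. The spreadsheet's own volume figures are the place to confirm that.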

Twitter algorithms

July 2, 2017

These are my rules for making Twitter more useful.

My goal on Twitter is a combination of finding fun and interesting stuff and exposing myself to (certain) other points of view. At work we have training on bias, unconscious and otherwise, and on techniques for reducing and countering it. One of the instructors mentioned that you can’t just wish unconscious bias away; apparently repeated exposure to normalizing examples is required, but it takes time. (This is yet another disturbing/annoying way that our brains resemble neural nets for machine learning; in this light, unconscious bias is just the result of a lifelong biased training set.)

So as a rule, by default, if I see a post from an interesting woman, an interesting PoC, or an interesting LGBTQIA person, I try to be a little more receptive to pushing the follow button. Lately I’ve decided that someone from another country I don’t necessarily hear from ought to count, too.

My subject bias is bikes/transit/housing, tech-especially-security, Boston area, Florida, liberal politics, science, cute animals.

But everywhere you go, especially politics and often science, you find trolls. I can’t even tell if they’re really people, and there are a lot of them. I won’t learn anything from them, and they won’t learn anything from me; it’s annoying to see someone wrong on the internet and not reply, but replying is a total waste of time. I tried blocktogether.org, and that worked pretty well once I had imported a couple of lists, but then I heard mention of something called “blockchain”: not the distributed ledger algorithm, but a Chrome extension for bulk blocking.

So now, if I’m reading replies to an interesting tweet and I see some especially trolly comment, I visit the troll’s profile, and if it also looks especially trolly, then I select their followers. If I see that several other people I follow also follow the troll, maybe I stop there. I scan a few of the followers to see if they also look slightly troll-aligned (and remember, I’m not sure if these are real people or networks of bots), and if they do, I click the “Run Block Chain” button and wait. For someone with more than about 10,000 followers, this will eventually error out for some reason, but it does block the ones it scanned before the error. Twitter Block Chain is open source, so if I really cared I’d have a prayer of figuring out the bug and fixing it in my copious free time, but for now it works well enough, and few trolls have that many followers.

Block Chain will not block someone you’re already following, but inevitably you’ll pick up someone you’d follow if you knew about them (@soledadobrien follows 338k accounts, including quite a few trolls). Sooner or later you’ll notice someone you’re following approvingly quote-tweeting someone you’ve blocked (this doesn’t happen that often, but it happens). When that happens, I look at the blocked account to see who it is, and maybe unblock them, maybe follow them (this morning it was @deborahblum).
I’m a little nervous that I’m blocking lots of people I might otherwise follow if I knew about them, but after passing 100k blocked accounts the troll chatter is vastly reduced and that’s a real improvement.

One amusing side-effect is that this method bootstraps itself; once you accumulate a few troll-followers in your block list, you’ll find that any new troll’s followers include quite a few that you’ve already blocked, right now around 50% for me. You can use this to quickly sanity-check whether someone you think might be a troll is likely to be one; if a scan of their followers shows a lot of already-blocked accounts, perhaps the rest are worth blocking as well.
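The sanity check is a simple ratio: what share of a suspect's followers are already blocked. A quick scan doesn't need the whole list, so the sketch computes the ratio over a sample, such as the first page of a follower listing. The names are hypothetical, and the 30% threshold is an arbitrary placeholder, not a number from this post.

```go
package main

import "fmt"

// blockedFraction returns the share of a suspect's followers that are
// already on my block list, looking at no more than sample of them
// (sample <= 0 means use them all).
func blockedFraction(followers []int64, blocked map[int64]bool, sample int) float64 {
	if sample > 0 && len(followers) > sample {
		followers = followers[:sample]
	}
	if len(followers) == 0 {
		return 0
	}
	n := 0
	for _, id := range followers {
		if blocked[id] {
			n++
		}
	}
	return float64(n) / float64(len(followers))
}

func main() {
	blocked := map[int64]bool{1: true, 2: true, 3: true}
	suspect := []int64{1, 2, 3, 4, 5, 6, 7, 8}
	const threshold = 0.3 // arbitrary placeholder
	f := blockedFraction(suspect, blocked, 6)
	fmt.Printf("%.0f%% of sampled followers already blocked; block the rest: %v\n", f*100, f >= threshold)
}
```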

It would be lovely/interesting to do something more nuanced. For example, @deborahblum has 17 followers that I “know” and @soledadobrien has 48; a count like that could be a rule for not blocking someone who shows up in a followers list. It would be interesting to see how many people on my existing block list have more than N “discriminating” (not @soledadobrien) followers that I know, and maybe review, unblock, or follow some of them. (This smells like a sort of two-sided PageRank to me.)
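Here is a first cut at the “known, discriminating followers” rule, well short of anything PageRank-like. It counts how many of a blocked account's followers I follow, skips accounts flagged as indiscriminate, and surfaces anyone above N for review. `knownFollowers` and `reviewCandidates` are hypothetical names. How the indiscriminate set gets built is also an assumption; one option would be anyone who follows more than some large number of accounts.

```go
package main

import "fmt"

// knownFollowers counts how many of an account's followers I follow,
// ignoring "indiscriminate" accounts that follow huge numbers of people.
func knownFollowers(followers []int64, iFollow, indiscriminate map[int64]bool) int {
	n := 0
	for _, id := range followers {
		if iFollow[id] && !indiscriminate[id] {
			n++
		}
	}
	return n
}

// reviewCandidates returns blocked accounts with more than n known,
// discriminating followers: candidates to review, unblock, or follow.
func reviewCandidates(blocked []int64, followersOf map[int64][]int64, iFollow, indiscriminate map[int64]bool, n int) []int64 {
	var out []int64
	for _, b := range blocked {
		if knownFollowers(followersOf[b], iFollow, indiscriminate) > n {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	iFollow := map[int64]bool{10: true, 11: true, 12: true, 99: true}
	indiscriminate := map[int64]bool{99: true} // e.g. follows 338k accounts
	followersOf := map[int64][]int64{
		500: {10, 11, 12}, // followed by three people I follow
		501: {99, 7, 8},   // only the indiscriminate account
	}
	fmt.Println(reviewCandidates([]int64{500, 501}, followersOf, iFollow, indiscriminate, 2)) // [500]
}
```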

Someone might ask “why block, why not mute?” I don’t want to see these people, and I don’t want them to see me. There are other people who are actually harassed on the internet by networks of trolls; I think this is one way to blunt the effectiveness of those networks.

I use the mute button when someone that I’m following goes off on some tedious unrelated tear and I just don’t want to hear about it for a while. It would be nice if muting had a built-in time limit.

It’s a little depressing to look at how hard it is to get all the different factions of the Democratic Party excited about helping each other. I wonder a bit if this is a case of scarcity pushing people towards fighting over scraps, and I wonder how much this is a case of Russians/Republicans using the internet to sow dissent on the left.

At minimum, people ought to accept that each other’s problems are worthy. Is there really any question that blacks get a raw deal in this country? Or that people who are openly gay or trans are discriminated against? Or that women don’t get promotions and pay commensurate with their skills, productivity, etc.? Or that unions are necessary in order to give workers an equal footing in negotiations over pay, hours, benefits, and worker safety? Or that many forms of pollution lead to statistically early death? Lack of an adequate social safety net is clearly a problem, and clearly one that can be solved, because countries that are less wealthy do a better job than we do; notably, they deliver longer life expectancy and lower infant mortality for less money per capita. If they can afford it, so can we. Climate change? It’s happening. Slowly, but steadily, and it’s going to continue for decades-to-centuries after we finally decide to take it seriously; the only question is how fast it’s changing when enough of us finally get alarmed enough to really act. Education? College is stupidly, fantastically expensive, and to the extent this is Baumol’s Cost Disease, we should just subsidize it (other, poorer countries manage to do this), and to the extent that it isn’t, we should drive prices down by properly supporting public universities. Etc. These are all problems, and the Republican Party is on the wrong side of all of these issues. We shouldn’t pick just one. We should not be put off because we think labor is important but we’re a little nervous about the gays, or focus only on racism to the exclusion of college costs; there’s nothing wrong with wanting it all, we can have it all, and all of us deserve to have these problems addressed. There’s no mutual incompatibility between any of these issues.

And be a little more skeptical, say, when someone on Fox News tries to tell you that anyone who’s LGBTQ is a threat to women and children. We’ve done plenty to make life unpleasant for people who aren’t “normal”; if someone’s out of the closet and you notice them, they must feel very strongly about it, and must have been truly miserable in the closet. This has nothing to do with your children, and everything to do with them wanting to live happier lives. Anyone who tells you otherwise is trying to con you into being mean to other people for no reason at all; ignore them, they’re evil.

Or, similarly, when someone trots out some bogus statistics to try to make white people nervous about “black crime”. Some of these stats are flat lies; in other cases the data has been tortured into confessing things that aren’t true. In practice, most people are non-violent and most people are law-abiding (well, except for traffic laws, which everyone breaks very often, and traffic violence is actually a big deal). Don’t take the bait. Anyone trying to convince you that blacks are a Big Crime Risk is just plain evil; ignore them, change the channel, turn off the radio. They’re trying to turn you into a racist and create dissent on the left.

There are bullshit artists trying to sow doubt about health care, too. One dishonest clown keeps trying to claim that Medicaid is worse than no health care at all, because people on Medicaid (as a population) are sicker than people who aren’t, never mind that if you’re poor and sick you’re much more motivated to sign up for Medicaid than if you’re merely poor, in which case that might seem like more of a hassle than it’s worth. This is what passes for serious statistical analysis on the right; these guys are sad, lying clowns, don’t let their obvious bullshit make you doubt the worth of providing health care.

And so on. There are probably better examples, but I’m a cis het white guy 1%er descended (father’s side) from a family with strong ties to Dartmouth; clearly I’m a traitor to my gender, race, ancestors, etc., and it’s a wonder I get any of this right. The main theme is to not let one left-wing cause be split from another, and anytime you catch someone trying to do that, think about why. I honestly wonder how many of the alleged “hard-core Bernie-bros” that get noticed on the internet now are actually left-wing or even American; disinformation is a real thing, and sowing dissent is a standard tactic. I supported Bernie, I sent him money, and I like (or liked) his politics. But once he didn’t win the nomination, that was done: support the nominee, stay focused on outcomes. I have several friends who did the same. Ask yourself *why* someone on the left would now be interested in prolonging the primary contest after we lost the general election. It makes no sense; the Republicans are uniformly terrible for everything Bernie Sanders has supported over the years, the Democrats are uniformly better, and we tried plenty hard in the primaries and Bernie didn’t make the cut. If we don’t unite, all of us, we lose ground.

Been meaning to write something, always too distracted to “do a good job”, as if getting nothing written was a good job. So….

Just now read a Copenhagenize article on bikes and trains saying something I had believed, but had no data to support. They have data. They also point out by example yet another way we do bikes wrong here in the US.

Charitable plans

November 12, 2016

SPLC, NAACP, CAIR and/or ICNA, Planned Parenthood, Lambda Legal, Trans Lifeline.

and ACLU, EFF, National Popular Vote.

Any other suggestions? I think I’m a little light on defending rights of immigrants.

Oops: as JF points out in email, ADL.

I’d also like to fund organizations doing voter registration work, especially in swing states, and especially in states where Republicans narrowly control legislatures and/or the executive. We need to reduce the amount of gerrymandering in this country, we need representation in the House of Representatives that more nearly reflects the popular sentiment, and we need to ensure that we are safe from crazy constitutional amendments (a constitutionally mandated balanced budget would be a macroeconomic disaster; recessions would turn into depressions).

I realize I am setting myself up for a deluge of please-help-our-worthy-cause solicitations, both electronic and paper. We get those already; the plan is to set up a spreadsheet and just give once a year, every year.

I was just in Mountain View for most of a week on business, biking to and from work and to work dinners in the evening. The roads are much smoother than here near Boston, the weather was warmer, it did rain once, but wimpily, and it’s flat as a board in Silicon Valley. Biking there ought to be great.

Links go to short YouTube videos illustrating claims/points

However, they blow it. If you need to cover any particular distance, it’s easy to find yourself with no choice but a four-lane road with a door zone bike lane that waxes and wanes with the whim of whoever laid out the road, and parking is prioritized enough that you often find yourself squeezed towards traffic.

One shared-use path is designed with the apparent assumption that bicycles are OMFG deadly dangerous to pedestrians, so it’s considered appropriate to encourage lower speeds by installing barriers that make high speeds deadly, that also make larger bikes (bakfiets, trailers) difficult to pass through, and that guarantee conflicts whenever people are traveling in opposite directions or there’s a pedestrian and a bike traveling in the same direction. Imagine, for cars, that a crosswalk was made safe not just by installing a narrowing bumpout in each lane, but by narrowing the road to a single lane for both directions.

Note that this is on a straight path where everything is completely visible, so all that’s really needed in most cases is a “slow for pedestrians” sign. Not all people will go as slow as they should, but not all people will negotiate those gates without injury or conflict, either. Later on, a blind intersection with plenty of cross traffic on the Google Campus goes completely unremarked, and several curves past that are gratuitously blind, either because of untrimmed vegetation, or because bicycles were routed between two chain link fences, and for no particular reason one side (the one that matters) is intentionally made opaque by slatting installed in the fence so that it’s impossible to see oncoming bicycle or pedestrian traffic on the fence-narrowed path.

Incomprehensibly, an underpass with over 7 feet of clearance (I reached a hand up to measure as I passed under, so that’s an estimate – apparently they couldn’t be tasked with actual measurement, but I ride quite tall and cleared easily) was declared to be dangerously low, and thus we’re told to walk our bikes there, as if.

Actual road crossings are designed with zero thought to the convenience of cyclists. At one there’s a gate to force a U-turn to enter it, then a beg button that imposes an interminable wait despite large gaps in motor traffic (I didn’t wait). A cyclist obeying traffic laws to the letter could not ride back that same way – the returning lane slips onto San Antonio, and returning on the sidewalk instead one is greeted with a WRONG WAY sign specific to bicycles (and the sidewalk is clearly intended for bicycles, else the sign would read “no bike riding”). It’s not much wonder that I just wing it.

At another crossing on the Permanente Creek trail, cyclists are vaguely directed to enter traffic and then make a u-turn at the light, as if that is preferable to looking for a gap (which we’d need to look for anyway, to enter traffic to make that u-turn) and just crossing on foot. There’s a sidewalk, but it’s twisty and too narrow for two-way traffic. Crossing on foot is necessary because there’s a big-ass curb in the middle of the road. The same can be seen on parts of Middlefield, where children crossing to/from school have worn goat paths in the median strip, far from any crosswalk. (Video is not great; there were kids, they were waiting to cross, and the median is cut by little footpaths.)

At a larger level, multilane Alma/Central and the RR tracks make a nasty barrier to traveling (peninsula-compass) east-west in Mountain View. Crossings are not well signed, Google Maps doesn’t seem to know about them, the entry is tight, the mirrors at the bottom make it clear the bicycles are known/expected to be there, but the ramps are quite narrow, guaranteeing conflict if there’s 2-way traffic or pedestrians.

This is all doubly annoying because it could be so nice. Remember, flat topography and a mild climate. If there were good, comfortable, safe routes that led anywhere interesting, lots of people could and almost certainly would use them. But right now, Mountain View is failing both in the small (annoying and insulting inattention to details of intersections and safety) and in the large (arteries are for cars – wide, fast, and with varying-width door-zone bike lanes, sometimes very fast).

And yeah, I know, “reasons”. Y’all ought to look at yourselves: a 10-lane highway jammed up every morning, even with thousands of employees delivered by buses instead of single-occupancy vehicles. I rode a bike to dinner after work and beat the people driving. Here are two free clues as to why Mountain View ought to install a ton of really nice bicycle infrastructure. #1, no matter what you do about traffic, more cars will always arrive to fill the voids that you create, and with high-tech salaries I’m not sure even congestion charges would do the job. #2, if you install really nice bicycle infrastructure, then when you need to get around your own town you won’t care about that traffic, and because the land is so flat and the climate so mild, that’ll be true all year. You might want to knock out a few parking spaces and replace them with bike corrals to make this really be true, but I managed to find bicycle parking a lot closer to the restaurant than anyone driving there could.