Rich Lafferty's LiveJournal (mendel) wrote,


How to Lie with Statistics

I really try to avoid complaining about work here, because I've got a job I love, working with interesting and qualified people on an interesting product, and there's little worth complaining about. But today's story is just surreal enough to share.

So our project manager stops me and asks me to pull statistics out of the ticket-tracking system we've thrown together. The system is an 80/20 solution that got about 30% done, so it's pretty bare-bones -- it lets support follow their tickets, and that's about it. Aside from queue, subject and priority, it lets people record whatever they like -- nothing even enforces consistent usernames, so one guy's "pkn", another's "Chris", and so on.

I point out that there isn't very much we can pull out of that. He settles for number of tickets since 5.1.2 shipped, and of those the number that were spam/duplicate/garbage, the number that were about versions prior to 5.1.2, the number that were about 5.1.2, and the number that were about the Mitel Networks 6000, our hardware bundle.

The first bits are easy, but the last bit isn't.

I point out that we don't record the version anywhere unless someone happens to write it down, and that even then, I can't programmatically tell the difference between someone asking a customer "Are you using 5.1.2?" and a customer writing "I'm using 5.1.2". So he asks me to estimate. I point out that the number will be entirely unreliable and that I'll essentially have to pull it out of the air, and he says that's OK; someone somewhere needs this number.

Now, my academic background is in the social sciences, and generating statistics out of thin air isn't particularly ethically sound, and I point this out to him. He's sort of wavering, so I ask him to run it past my manager, which he has to do anyhow if he wants some of my time. Manager says exactly what I did -- that we don't record that, and that it'll be a fabricated number, but if you want a fabricated number, sure.

Still bugging me.

So I tell the project manager that I'll give him his number, but first I want him to send me an email noting the requirements, acknowledging that it is impossible to determine the version statistics he wants, and confirming that he wants me to estimate them. I don't want to get into a situation where someone comes back and says "This number is wrong, how did you come up with it" and have to answer "Well, see, I had these dice" and then get blamed for fabricating the numbers.

He does. "I fully realize that the percentages that you are going to provide are pure guesses", he writes. And in my response, I write "Based on an unreliably-distributed sample..." and "one might erroneously conclude a distribution like...". And this generates no complaints. If things go pear-shaped and someone starts wondering about the numbers, there's an audit trail of why they were made up. They're probably reasonably accurate guesses, but they could also be absolutely wrong, and it's not my problem.

This is a pretty good example of why I like my job.

