Testing Android Software

Dear frantic and desperate programmers: if you are looking for an example of a populated android activity test file, go here.

Dear everyone else: you may wonder why that was necessary.  Let me explain.  There are five billion tutorials on how to build android apps and two on how to test them.  Both are full lessons that require a lot of android knowledge to make sense of, and both are based on Eclipse (deprecated) rather than Android Studio.  Neither provides a decent sample file for steal-and-mutate learning.  For those of us who want to write tests because we are learning to program for android and are worried a new technique might break our lovingly created app in ways we don’t know enough to fix, this is frustrating.

[For the slightly more advanced among you: yes, you could try searching github for the name of the class you know must be involved in your testing.  Unfortunately the push for testing means that 95% of what you find are stubs with no actual tests.]

To be fair, I’m pretty weird in wanting to write tests so early.  I complained to a stranger at a party that work on Hunger Tracker had stopped because I couldn’t find an android testing tutorial and he expressed surprise, because he’d watched people release much more complicated apps without a single automated test.  I know that works for some people, but it would drive me absolutely nuts.

The point of testing isn’t to make the product work.  The product is going to work, or you’ve failed.  The point of testing is to make your work on the project faster and less stressful.  Every time you change anything, and sometimes even when you’re sure you’ve changed nothing, you risk breaking something. The more time or changes that pass between you breaking something and you noticing it is broken, the more effort it takes to fix.  You could manually test every time you change every little thing, but it’s boring, time consuming and prone to error.  Computer science selects for the opposite of people who are good at doing the same thing over and over again.

So you do the programmer thing and write a program to do it for you.  It’s just that the program is tests of the software.  You can run this new program as often as paranoia or check-in rules compel you.  This frees up your time from manual testing, frees your mind from tracking your changes and worrying what might possibly have broken, and decreases the time to notice breaks. The mere act of writing the tests often encourages good code structure by making you think about it rigorously, the same way writing a program makes you define your problem until you practically don’t need a computer to do it.  And in my particular case, when the project is as much about learning as it is producing anything, it’s good practice.

Good testing is especially critical in my case, because I’m working on this project in little bits at a time.  If I break something and don’t notice it until two CLs later, I might as well be debugging someone else’s code.  I know what people say about the devs whose code they are forced to debug without tests or documentation, because I’ve said it, loudly, and at length.  I may have expressed a wish for weapons.  I would hate for someone to think that about me, especially myself, and testing is the cheapest way out of it.

Hunger Tracker 0.1

When I started this blog I intended to leave programming for medicine fairly soon.  After a long medical leave that let me recover from burnout, I realized that programming is actually a really valuable skill and before I throw it away and spend several years in school, I should see how far it can take me towards my goals.  My goals are unchanged (mostly around nutrition and mental health, but with some bonus input from Effective Altruism), but maybe there’s a better way I can contribute.

Eventually this means founding or joining a company working on something I care about, but I’m still not capable of consistently working sufficient hours to work with other people.  But I can play on my own projects with no coordination costs, so I started learning to program for android.  Meanwhile on the health end, my nutritionist informs me there are states of hunger between “unpleasantly bloated” and “would shank an infant for juice” that are useful to experience, and that the first step to achieving that is tracking my hunger consciously.  She meant with pen and paper, but I am an engineer, so I wanted something on my magic pocket computer to do it for me.  I’m sure something exists on android that would do this, but I couldn’t find it, so I did the logical thing and started developing my own.

Thus was born the Hunger Tracker app.  Here is my dream feature list for Hunger Tracker:

  • Alarm goes off at pre-set or random times (your choice); to shut it off you enter a number between 0 and 9, representing your fullness level.
  • Timestamp + fullness level data is dumped…somewhere, I don’t know.  A google docs spreadsheet would work for me but there’s probably other services I should integrate with.
  • UI not actively offensive

Here’s what version 0.1 does:

  • No alarm, user must manually call up app.
  • User enters number in ugly ass UI.  Actually user can enter any arbitrary string, but don’t, it will break the retrieval.
  • User can retrieve first 10 numbers entered.  Any further entries are skipped; if there are too few entries, the last one is repeated.
  • It’s a debug build rather than a release build because Android Studio won’t produce a working release build and fixing that is not at the top of my priority list.
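The retrieval behaviour described in the list above is a handy first target for a test.  Here is a minimal sketch in Python rather than the app’s real Android code; the function name `retrieve_entries`, its parameters, and the empty-history behaviour are all assumptions made up for illustration.

```python
def retrieve_entries(entries, count=10):
    """Hypothetical model of version 0.1's retrieval, not the app's real code."""
    if not entries:
        # Assumption: the post doesn't say what happens with no entries at all.
        return []
    # Only the first `count` entries are used; later ones are skipped.
    # int() raises ValueError on a non-numeric string, mirroring the
    # "arbitrary strings break the retrieval" behaviour.
    kept = [int(entry) for entry in entries[:count]]
    # Too few entries: repeat the last one until there are `count` values.
    return kept + [kept[-1]] * (count - len(kept))


def test_short_history_repeats_last_entry():
    assert retrieve_entries(["3", "7"]) == [3, 7] + [7] * 8


def test_long_history_skips_extra_entries():
    history = [str(n % 10) for n in range(15)]
    assert retrieve_entries(history) == list(range(10))
```

Two tiny tests like these pin down the quirks on purpose, so a later change to the retrieval either keeps them or breaks them loudly.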

If you are interested, you can download version 0.1 here and install it the usual way for non-market apps.  If you are spectacularly interested, you can check out the source code at github.  Comments are extremely welcome.

My next step is not actually features, but testing, which I will explain in the next entry.

Male Birth Control

I complain about hormones being the go-to mechanism for women’s birth control, but that isn’t half as dumb as hormonal male birth control.  Women at least do spend a majority of their time not able to conceive.  In fact, the number of ovulations a modern first world woman goes through is way higher than she would have in the ancestral environment.*  But men have to be seriously ill or starving to stop producing sperm (temporarily.  Obviously there’s some good options if you want to permanently lose the capacity).  Nonetheless, at least until a few years ago a lot of research into male birth control focused on hormones.

For years India has had what looks like a panacea of male birth control: cheap and easy to administer, completely reversible (although not quite as cheaply or easily as it is administered), and no side effects.  RISUG works by injecting a polymer into the vas deferens (the tubes that funnel sperm out of the testes).  Sperm can still travel through the tube (which is important because completely blocked deferens tend to cause medical problems and become spontaneously unblocked), but their cell membranes are disrupted, rendering them incompetent to bond with the egg.  You could have a sperm count of 200 million/mL and still be completely infertile.

I’ve heard many people say that male birth control is dead on arrival, no man will want to take it and no woman will trust a man to.  These people are not talking to my male friends, every one of whom would give a kidney to get this technology.  My female friends tend to think that by the time they’re not using condoms with someone, they trust them to accurately report their birth control, and they will be just fine not perpetually feeling half pregnant, thank you very much.  The exception would be the women who can tolerate hormonal IUDs, whose attitude is more “do what you want I’m still going to enjoy not bleeding every month kthnxbai.”

It does slightly concern me that they’re not sure how RISUG actually works.  How did they discover it then?  Did they just keep injecting different plastics into monkey testicles until they found something that worked?  But no one is really sure how the copper IUD works either, and that doesn’t seem to bother anyone despite it having obvious painful side effects (compared to RISUG’s apparent zero side effects).

RISUG is being developed for the US under the brand name Vasalgel.  Interestingly, it’s not a big pharma company that’s doing it, but a non-profit dedicated to getting neglected treatments into practice.  This is pretty much the best possible solution under the current circumstances (where bringing treatments to market is very expensive).  They are currently taking donations, and I am seriously tempted to give them some.

*This is also true for menstruation.  The Pill was originally designed to give women false periods, and all the pain and health implications, to appease the Catholic church.  It didn’t work, and it took 35 years for someone to notice and suggest not doing that.

Alternate Birth Control

Hormonal birth control was a world changing invention that gave women a lot of new freedom and power.  I am so, so glad it was invented.  I am less glad that we as a society somehow got stuck on hormones as the ultimate way to prevent babies.  Hormones are weird and unpredictable and involved in every part of metabolism.  It would be extremely weird if they didn’t have side effects beyond preventing ovulation.  And yet the past 40 years of contraceptive history are mostly people refining hormones, rather than inventing something more awesome.

If you want permanent birth control there are some very good options, but temporary and reversible options are pretty scarce. There’s condoms, which are amazing at some things and cause inconveniences way less than those caused by hormones, but more than zero.  There are vagina-based barrier + spermicide methods, which are more convenient but less effective, and regularly putting anti-microbials in the vagina has its own health costs.  There’s the copper IUD, which is great at preventing pregnancy but causes horrendous cramping in the vast majority of users.

This is how I got interested in fertility tracking.  There are actually only a few days each cycle when sex can lead to pregnancy.  If it’s more than 5 days before ovulation or 1 day after, you can put all the sperm you want anywhere in the body, and it’s not going to cause a pregnancy.  The old way to try and find these days was “counting”, but that doesn’t account for inter- and intra-person variation in cycle length and has a consequently high failure rate.  Luckily (?) there’s a lot of changes in the body that precede and follow ovulation, because ovulation is caused by hormones which are as previously mentioned extremely complicated and involved in a lot of different things.  These include temperature, cervical mucus, and saliva electrolyte level (seriously).  Tracking all of those still seems like a lot of work, especially since they need to be done first thing on waking up to avoid contaminating the very subtle signals by doing something like moving.  But there are some newer options on the market that track ovulation with a lot less work, and those seemed worth investigating.

The first of these is to directly measure hormone changes in the urine to detect imminent ovulation.  Conveniently, there are hormonal changes that start 5-6 days before actual ovulation.  There is no way to do this yourself.

The second option is temperature monitoring.  Your temperature dips right before ovulation, and is slightly higher for the ~two weeks after ovulation than the ~two weeks before.  For years if not decades, people have tracked this themselves using regular thermometers and paper, and it theoretically works, but you have to do it the second you wake up to dodge the wake up temperature increase, and I personally don’t want to stake contraception on my ability to do fine motor work and record three significant digits exactly when I wake up.

There’s a few other scattered options including monitoring cervical mucus, cervical position, and the shape your saliva dries in (no, seriously).  But science x capitalism have provided me with magic computers to track the first two things, and not any of these, so I’m going to ignore them.  Science has patented a toothbrush that will look at your spit for you (no, seriously), but capitalism has yet to put it into production.  There are also tools that could be combined with graph paper to turn them from fertility devices to contraception devices, but I’m going to focus on the options that require the least thought possible.

Note that none of these devices are available in the US for the purposes of contraception.  You can buy hormone tests to predict ovulation for the purposes of creating babies, but not preventing them.  Purely for educational value I’ve looked up the prices on amazon.co.uk, but you should definitely not circumvent the wise and benevolent FDA by ordering from them.

Persona tracks hormones alone.  It costs 52 pounds (~$76) for a starter kit and an additional 13 pounds (~$20) per month.  It requires peeing on a stick 16 mornings out of your first month and 8 on subsequent months, and you have a good six hour window in which to do it.  You feed the sticks into the magic computer machine and it tells you when you are close enough to ovulation that you should abstain from unprotected sex.  It reports a 94% success rate.

Lady Comp tracks temperature alone.  It costs 415 pounds and claims a 99.3% success rate.  That’s higher than a hormonal IUD, which does everything short of going back in time and murdering your partner’s mother to prevent pregnancy, and it seems unlikely that that could be matched by something that could be confused by me using my mattress heating pad.*

Cyclotest tracks temperature and optionally several other indicators.  It costs 150 pounds (~$220).  You need to monitor your temperature for approximately half your cycle.  It allows you to enter your cervical mucus viscosity and cervical position to provide additional data but declines to teach you how to do so.  Its display is nuanced enough to be used both for contraception and trying to conceive, and if you’re trying to conceive you can supplement the temperature data with hormone tests (~15 pounds/month).  It too reports a 94-99% success rate, which makes me think that is either the hard limit of predicting ovulation (maybe failures come from super long lived gametes rather than bad timing), or that Persona and Cyclotest are using the same data to support their different methods.

Of these three, Cyclotest was my favorite before I even looked at the data, because Cyclotest’s website explains both the science of how it works and the mechanics of using their system, whereas Persona organizes it very counterintuitively and Lady Comp just repeats Natural and No Side Effects until your ears bleed.  I think I went to amazon to find out what it actually did.

I managed to find the study Lady-Comp must have gotten their 99.3% success rate from.  To be fair, that is what it says (full text PDF).  And it’s the real world success rate, not the theoretical rate.  But it was a retrospective study, meaning they just reached out to people who had previously purchased the device and asked them if they had gotten pregnant.  A number of people were unreachable, and not all that were reachable returned the survey.  They don’t even report the full number of surveys they sent out, which makes the data completely unusable.  The paper also indicates Lady-Comp requires significantly more data than temperature alone, which would sure be a good thing to indicate on the website.

Persona has a publications section on their webpage, but their contraception page only has articles on awareness, not efficacy, so really this is just one more reason to hate the word awareness.  I was unable to find the source of their quoted 94% success rate, but I did find a paper criticizing it.

Cyclotest’s quoted 1-6% failure rate is as good as or better than condoms, which surprises me.  This paper doesn’t give an absolute failure rate but does say it misidentified fertile periods as non-fertile a vanishingly small percent of the time, which would be more impressive if that were based on hormones or actual conception rates rather than simple calendar checks.  This paper puts the actual failure rate of temperature only methods at up to 3% per month, which is way too high, but not necessarily worse than actual use rates of other methods.
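Per-month and per-year failure rates aren’t directly comparable, so it helps to convert.  Here is a minimal sketch of the arithmetic; it assumes every cycle carries the same independent risk and that there are 12 cycles in a year, both of which are simplifications of mine and not something the papers state.  The function names are made up for illustration.

```python
def annual_failure_rate(monthly_rate, cycles_per_year=12):
    """Chance of at least one failure in a year, assuming independent cycles."""
    # Probability of getting through every cycle without a failure...
    no_failure_all_year = (1 - monthly_rate) ** cycles_per_year
    # ...and the complement is the chance of at least one failure.
    return 1 - no_failure_all_year


def monthly_failure_rate(annual_rate, cycles_per_year=12):
    """Inverse conversion under the same independence assumption."""
    return 1 - (1 - annual_rate) ** (1 / cycles_per_year)


if __name__ == "__main__":
    print(f"3% per month is about {annual_failure_rate(0.03):.1%} per year")
```

Under those assumptions a 3% monthly rate compounds to roughly 30% over a year, which is why a small-sounding per-month number deserves suspicion.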

This paper puts the correct use pregnancy rate of an unknown computer aided system at 2%/year and the actual use rate at 12%/year.  It’s always hard to parse the real world failure rates.  Take condoms.  I don’t think someone simply forgoing condoms half the time counts as a strike against condoms.  But “correct” use of condoms is actually a lot of work, and at a certain point I feel like condom manufacturers are just trying to pass the buck.   Using lube to reduce friction to prevent breakage is a reasonable action you either do or don’t, but no one has ever defined for me exactly how much  “vigorous or prolonged thrusting” necessitates changing a condom.  And it’s just bad design to have something break exactly when people are least likely to want to check or fix it.

I was really hoping for better than this.  Another beautiful theory, killed by an ugly gang of facts.

*Which is, side note, the best thing ever and you should totally get one.

Pain, ADHD, and happiness

I jokingly referred to pain-induced ADD on Monday, but I’m becoming more and more convinced that is actually what was happening.  After prior surgeries I was too exhausted to notice anything, but this time I was energetic enough to experience the pain.  I mean, unless I tried to go outside or something.  That led to a really entertaining systems crash in the supermarket.  But if I stayed inside I was able to do things like get food and put away dishes without strain.  Contrast with when my pain meds sabotaged my cortisol production.  Intellectually I was there and able to do things like read and blog, but physically it was a struggle to make myself a smoothie.

After surgery I could not read or write or even enjoy a movie.  It was more than pain making everything 70% less fun, it was that everything was annoying and frustrating and no fun at all.  I couldn’t enter a state of flow or concentration or even relaxation for any length of time.  Except when I played video games or the piano.  Neither was fun, exactly, and I was still in pain, but they were at least distracting and rewarding.  Looking back, this explains a lot of my behavior when I was in constant pain last year; it just took being out of pain and then very sharply in a lot of pain to make the pattern obvious.

At first I thought this was a Harrison Bergeron type thing, where pain was sending out interrupts too often for me to get into a groove on anything.  But then I read this blog post (blogs were just about in my power) by Sarah Constantin on dopamine, explaining Peter Redgrave’s hypothesis that the spike (phasic increase) of dopamine is not itself a reward (which is how pop journalism usually describes it) but a timestamp that lets you know what actions should get credit for the actual reward chemicals you are about to receive.  That would explain why humans and animals with broken dopamine systems do feel pleasure when eating but will nonetheless starve to death unless you put the food directly in their mouth.

Many of the drugs used to treat ADHD inhibit dopamine reuptake, which raises your tonic (baseline) dopamine levels.  Constantin hypothesizes that if the baseline is too low then stimuli that should be ignored suddenly are interpreted as important, leading to a lot of SQUIRREL.

[ I was going to make this a gif but putting unpausable moving pictures in a post on ADHD just seemed cruel]

If this is correct, it offers an explanation for why ADHDers are so drawn to things like videogames and sex:  the time gap between doing the correct thing and getting the chemical reward is so short they can still determine causality, even against a background of SQUIRRELs.  This needn’t be purely about hedonism; if it was, something consistently pleasant would work.  I think it’s about having an internal locus of control and self-efficacy.  Humans are happiest when they feel like they have the power to change their own circumstances and have an impact on the world.  It’s hard to feel those things if your attention is constantly being torn away from what you choose and you can’t (on a neural level) determine what made you feel the emotion you are currently feeling.  This is one reason the toll of ADHD shouldn’t be measured in lost productivity alone; even people with very successful coping mechanisms are being denied that internal locus of control, and that’s miserable.

Here’s my contribution: my description of being in pain sounds a lot like other people’s description of ADHD, right down to video games being rewarding without strictly being fun.  And as it turns out the basal ganglia, the area Redgrave believes is using dopamine to timestamp causes so they can be matched with effects, also release dopamine in response to pain.  It seems entirely possible to me that high baseline levels of dopamine could diminish the effect of a spike.  Instead of everything being timestamped “good job”, nothing is, with similar results.

But let’s make it even more interesting.  Several anti-depressants are also useful in treating chronic pain, and NSAIDs (usually mild pain killers) treat depression.  I had previously put this down to “pain is depressing”, “depression appears to be connected to inflammation in ways we don’t understand” and plain old “brains are squishy and they don’t make sense”, but what if there was a causal link?  The symptoms of depression include fatigue, feelings of helplessness and loss of interest or enjoyment of previously liked activities, which sure sounds related.  Quick googling found a very tiny study showing a connection between low dopamine and suicide, and this fascinating study suggesting that inflammation reduced the basal ganglia’s production of dopamine, which would tie all of this up in a very pretty bow.  Something causes pain and/or inflammation (the two often go together), which long term causes inflammation in the basal ganglia, which causes depression and reduces your body’s natural analgesics.

Look body, if you were worried about us getting high off of pain, maybe you could have released fewer happy chemicals in response to pain, instead of making it just as fun but also cause depression some time later.

This would also explain why ADHD medicines are promising in treating depression (source, source, and a large showing among my friends), and why ADHD and depression so often go together.*

I cannot stress enough how unqualified I am to make this hypothesis.  Lots of people know lots more on all of these things than me.  But it comes together to be an extremely plausible explanation for both the literature I’ve read and my personal experiences.

*There’s a lot of evidence that depressed parents correlate with ADHD kids, but it’s probably environmental.

Simple Screening for Depression?

A new study by Reid et al claims to demonstrate a biological marker for the presence of depression.  First we have the boring criticisms, like “32 is not a real sample size,” “shotgunning 20 RNA markers and noticing which ones were increased in depressed patients and decreased after treatment is painting the target after you shot the gun” and “you’re comparing treatment group re-draws to control group baseline draws” but anyone could make those.  The authors make several of those points themselves.  And there are some statistical criticisms that pretty much invalidate the whole thing.*  What I find interesting is that even if the results are correct, they may not be useful.

If you look at the table comparing the marker rates in depressed and non-depressed patients, there are 9 markers that differ in a statistically significant way. The problem is that they’re still not very far apart.  What you would ideally like to see in a diagnostic test is the following:

[Image: two bell curves with no overlap]

because then it is easy to translate a test score into a health status.  But the markers in this study are more like

[Image: two overlapping bell curves]

Which means that if you know someone is depressed you can generate a pretty good idea of their marker score, but there’s a wide range where knowing their marker score doesn’t give you a good idea if they’re depressed.   That makes it pretty useless for a screening test.

But it’s actually worse than that.  There are many more undepressed people than depressed people, so the curves could look more like

[Image: weighted bell curves]

Under this graph, the sick mean could be four standard deviations out from the healthy mean, and yet a person with a low marker score is approximately equally likely to be depressed or not.  This is a Bayesian reasoning problem and doctors are frighteningly bad at those, but then, they’re worse than chance at frequentist statistics too.
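To make the base-rate effect concrete, here is a minimal sketch of the Bayes calculation.  Everything numeric in it is an assumption for illustration and not taken from the study: marker scores are modelled as two normal curves with equal spread, the means sit 4 standard deviations apart, and the 5% prevalence is a made-up round number.  The function names are hypothetical too.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_depressed(score, prevalence, healthy_mean=0.0, sick_mean=4.0, sd=1.0):
    """P(depressed | score) by Bayes' rule for two equal-width normal curves."""
    sick = prevalence * normal_pdf(score, sick_mean, sd)
    healthy = (1 - prevalence) * normal_pdf(score, healthy_mean, sd)
    return sick / (sick + healthy)


def coin_flip_score(prevalence, healthy_mean=0.0, sick_mean=4.0, sd=1.0):
    """Score at which depressed and not depressed are equally likely."""
    midpoint = (healthy_mean + sick_mean) / 2
    # A rare condition pushes the 50/50 point away from the midpoint,
    # toward the sick mean.
    log_odds_against = math.log((1 - prevalence) / prevalence)
    return midpoint + sd * sd * log_odds_against / (sick_mean - healthy_mean)


if __name__ == "__main__":
    prevalence = 0.05  # assumed, for illustration only
    for score in (1.0, 2.0, 3.0, 4.0):
        print(score, round(posterior_depressed(score, prevalence), 3))
    print("50/50 at score", round(coin_flip_score(prevalence), 2))
```

With these assumed numbers, a score exactly halfway between the two means tells you nothing beyond the base rate, and the score has to land well past the midpoint before the odds reach even.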

In summary, I’m not hopeful this proves to be a useful screening tool for depression.

*They don’t actually prove that the marker values of cured people converge with those of never-depressed people, they just fail to prove they’re statistically different.  Those are different things.  They also switch between two (equally valid) statistical tests (t-test and Fisher’s) without saying why, which means there is a high probability the answer is “we liked those answers more.”

Humans are complicated, children are even more complicated

[Had more dental surgery this week and am currently suffering from pain-induced ADD.  Expect less research and more wild speculation]

Consider pre-emptive testing for psychiatric or developmental issues in children.  If you’re too aggressive, you end up misdiagnosing a lot of perfectly normal deviations from the exact median as developmental issues in need of treatment.  Development is complicated, different systems come on line at different rates and in different orders in different kids, and they should be allowed to do that without being corralled into fitting a predetermined schedule.

But if you’re not aggressive enough, the kids develop coping mechanisms that hide the disability, making it harder to diagnose and treat.  Sometimes people treat this as solving the problem (especially for conditions that are often conflated with character flaws, like ADHD or some forms of depression), but they are wrong.  At best, lack of treatment holds people back from their true potential; at worst, it twists up their internal structure in ways that break at the worst possible time (usually grad school).  It’s a big problem with twice exceptional children, who have both brain-based deficiencies and a lot of raw intelligence, and I suspect for people with atypical presentations of their disabilities.  E.g. girls with ADHD or autism spectrum issues, boys with depression* or trauma from sexual abuse.**

Even perfectly accurate testing won’t fix this, because developmental asynchronies do not necessarily indicate a future problem, and treating them can prevent the issue from fixing itself.  The real issue is distinguishing natural, healthy leveling out from the development of costly compensation mechanisms, and we don’t know how to do that.

*Assuming the common adult male pattern of depression being expressed as anger holds true for boys as well.

**I think, couldn’t actually find data on this.