Monday, December 31, 2012

Goodbye 2012

I have been out of town to visit friends and family for all of December and had an opportunity to take some interesting pictures.  At least one side project I've been making progress on will be ready in a few weeks, so I look forward to showing off these results when the time comes!

2013 will be an awesome year:  I start my training at Google in Mountain View on January 7, my e-book will be released sometime later this year, I have some fun trips already planned, and my artistic and scientific side-projects will come to fruition.  For now, I hope you have a safe and wonderful new year, and please enjoy some of my favorite photos from my December travels!



From Gettysburg, PA--the most portal-dense area to play Ingress in that I have ever seen.

Simiao, at a holiday concert in Elgin, IL.
A well-populated Saguaro at Lost Gold Mine Trail near Phoenix, AZ
Longhorn cattle shared the trail with us.
A verdin nest in a teddy bear cholla.  Its resident startled me when it evacuated its home as I walked by!



Thursday, November 29, 2012

The Assassination of the Everyday Superhero

A cairn at Discovery Park, photo by Simiao.
I was really excited to play Hitman: Absolution as a birthday present to myself, and had pre-ordered it on Steam just as soon as it was available because I loved the previous Hitman games.

In fact, I loved Hitman, Thief, and Deus Ex because the protagonists all had the elements of being an everyday superhero:  a person with some talents that was able to make ethical statements through his actions.  Certainly, a hitman in the traditional sense is a generally poor choice for a protagonist.  It is decidedly unethical to kill for profit.  Agent 47 used to be different.  He killed human traffickers, high-ranking drug kingpins, pedophiles, mafia dons, and violent gangsters while (in my playthroughs, at least) leaving the innocent party guests, security guards, and maintenance personnel just doing a job to make a buck untouched, or at worst sedated and left somewhere out of the way to wake up the next morning.  To say that I find killing of innocents distasteful is an understatement:  I refuse to do it even in an imaginary video game.  My character, in my mind, is taking out the garbage that slipped through the fingers of law enforcement and will inflict only a career change on the underlings who neither profit nor even know about the actions of their superiors.  One of the most satisfying parts of Hitman:  Blood Money was the next day's newspaper describing a suspicious accident, poison or a sniper's bullet that took out the high-profile criminal target while nobody else surrounding him claims to have so much as seen the assassin.

If, like me, this is what you loved about the Hitman franchise, then you should know that Hitman: Absolution is the Hitman game for those who preferred to play Halo or perhaps Max Payne.  I've played up to Dexter Industries so far, and from what I have seen it replaces the lush, non-linear environments and pre-mission checklist of open-ended objectives of the earlier Hitman games with linear scenarios and concrete objectives like "obtain the keycard", "pacify Lenny" and "talk to the bartender."  The entire effect smacks of the degeneration of video game environments.  Many areas of the game are obviously designed to encourage the user to select a specific tool from the abilities available in Absolution such as point shooting and instinct.  The trouble is that on Purist difficulty, and as a player that wants to make an ethical statement through my actions, I do not want a bloodbath and I do want to out-wit my enemy, not cheat to blind them for a moment.  It seems like IO Interactive tried to make Hitman into a Halo-like FPS game where between checkpoints the player is to select the appropriate tool:  sneaking, bloodbath, disguise, hostage-taking or exploration.  I'll admit that there are moments where this actually works.  In the "Rosewood" level, I am open to using violence against a mob of assassins who are in the process of murdering, or have already brutally murdered, an orphanage full of innocents.  I also generally like the level where the player must save Birdie from imminent death by eliminating the three assassins in a crowded Chinatown, as well as the level where the player must kidnap Lenny.  These have the common features of feeling quite non-linear, as well as offering a huge array of opportunities for different styles of play.

Unfortunately, to get to these gems that I actually enjoyed, I had to trudge through some really painful chores.  The game made a lousy first impression on me by suggesting that I demonstrate defenestration with an innocent guard who just learned that he was free of prostate cancer, and by forcing me to massacre four security personnel guarding a teenage girl in cold blood.   There was another scene where the goal was clearly to get me to outgun a dozen police officers inexplicably searching for me in an abandoned building behind a nightclub, and one where the police were prepared to open fire on me on a crowded train platform, though apparently playing a strategically placed skill crane game was sufficient to convince these eagle-eyed officers who were able to spot me in a crowd across the platform that I was not in fact the man they were looking for.  Other levels stood out as simply bizarre:  one where the entire goal was to walk into a redneck bar and talk to the bartender which is apparently a huge challenge, and another where the game tried to be America's Army by suggesting that I partake in some OCD target practice instead of the first-person sneaker game I wanted.

AI behavior also takes me out of the game.  Aside from skill cranes serving as magical invisibility cloaks and cops opening fire on a crowded train platform, apparently every street vendor in Chinatown knows every other vendor and will call the cops if they see any newcomer.  Apparently cleaners in an act of mass murder can distinguish gunshots from one of their people from gunshots fired by somebody else.  Apparently there exist redneck bars with a reverse dress code such that entering while wearing a suit and tie will get you shot by bouncers.  Since when did redneck bars have bouncers anyway?  Was there no Wal-Mart in this town that would sell me a change of clothes?

Overall, I can appreciate some of the changes:  I actually like the classic James Bond feel of occasionally having to simply run past security.  I actually liked the time that I had to take a human shield and back onto a train just before it departed, revealing my bluff when I pushed him off the train car at the last second.  However, I can't help but feel resentful at the times when the game suggests or enforces that I kill innocents.  I can't help but wonder if the first syllable of "Hitman" was confusing to IO Interactive when they inserted ridiculous missions on rails without even an actual target.  Finally, I can't help but mourn for an everyday superhero who IO Interactive thinks would pull an innocent maintenance worker who "hadn't planned on being a father at his age" through a window to his death.  Not on my watch.

This AAA title is a failure in my book.  For those that liked early James Bond who never killed anybody, or those who got the Pacifist achievement in Deus Ex: Human Revolution, or those who felt that killing people was the mark of an amateur in Thief, or those who can empathize with Dexter Morgan, this is the planned assassination of the everyday superhero that was Agent 47 in favor of a thinly veiled shooter on rails.  Hitman: Absolution is a bland, generic title that will resonate with those who would have preferred to play Halo, Call of Duty or America's Army featuring a protagonist that absolutely nobody will identify with.

In closing, I would like to plug an independent game that is excellent.  Instead of spending $50 on Hitman: Absolution, spend $10 on Faster Than Light.  It is a real-time strategy space combat simulation game where the player must make the right choices to survive and save the Federation from a very well-armed rebellion.  It's a game design that's a far cry from the one-size-fits-all rehash of working concepts that constitutes a modern AAA title, but that will resonate strongly with players who appreciate a novel design and an adrenaline-pumping challenge.

If you like NetHack but wish it was a little bit shorter, and you liked Star Trek, you will love FTL.

Monday, November 26, 2012

Google Welcome Gifts

Most employers will send you a welcome gift when you accept their offer, and Google is no different.  I was really excited to see that my Google welcome package arrived today, and wanted to share its contents with you.  First, you should know that it came in a plain white box packed with festive Google-colored confetti, shown to the left.  Second, what was inside: in the picture below, you will note one Google hat, a super-cool Google hoodie, Google sunglasses (with UV protection), a Google notebook, and a green Google pen.  I had to fight the urge to throw the confetti up in the air and have an impromptu party because I didn't want to spend all night vacuuming up the pieces.  Thanks Google!  I love my new swag.


Sunday, November 25, 2012

30th Birthday Spectacular

This Thanksgiving was my 30th birthday, and Simiao helped me celebrate in style by making a wonderful Thanksgiving dinner for myself and some friends from Sage.  The next day we drove to Breitenbush Hot Springs to spend a weekend hot tubbing and hiking.  Driving back in the daylight, we were intrigued by the partially drained Detroit Lake and stopped at the Detroit Lake State Park Mongold Area to explore the drained lake bed.  There were a few remarkable things about the exposed wasteland.  First, the scarcity of lake life:  no water plants at all were visible, only tree stumps and bare rock.  The occasional crayfish shell could be found, probably the remains of bait from recreational fishing.  Second, evidence of long-submerged human habitation was visible in the form of ancient asphalt and concrete foundations as well as the occasional exposed pipe.  Third, we found a normally submerged hot spring where steaming hot water flowed from the exposed lakebed!  I've posted some favorite pictures from these adventures below.

I could barely contain my glee at my dangerously chocolaty birthday cake!








Monday, November 19, 2012

Cloud Backups vs. RAID

A friend had an unfortunate but predictable mishap today, and I wanted to do a bit of math on the economics of data backups.  I'll start with two stories about hard drive crashes with very different outcomes.

Some time ago, I fixed a laptop for a friend and in the process had to reformat the hard drive.  So, I backed up all of her files to my RAID array and then copied them back to the hard drive afterwards.  Months later her laptop was stolen, and with it all of her irreplaceable documents and pictures from years ago.  Some time later when I was clearing up space on my backup array, I found that I had forgotten to delete these files after the repair and was able to send them back to her intact.  A fortunate circumstance, after which I did what I felt was responsible and used srm to scrub the files from my archive:  it was not right that I should have all of someone else's personal files.  Today I got a text message asking if I still had those--her hard drive failed, and sadly it seems that this time years of photos and documents may be gone for good.

Having all of your personal files on a single drive is fine if you do not have any important or irreplaceable content on your computer so that you don't care if they disappear forever, but for most people this is not the case.  The average lifetime of a hard drive varies, but in my experience typical use means 3 to 4 years, and I only get some warning of impending failure through the "click of death" or system errors about half the time.  Recovery services are loath to give out a fixed price scheme which is understandable given the number of ways a drive can fail, but $350 is a typical cost if they can be successful.  Obviously, if your drive is stolen, even a data recovery service can do you no good.  You don't think about it every day, but the value of years of pictures, documents and code may be so high that it is impossible to put a number on it.

Now, a story with a happy ending.  For years, I had been using a RAID 1 array as a backup system.  Whenever one drive failed, I would just buy two new drives and copy everything over.  This has worked well for me and I have data from well over a decade ago including some of the earliest programs I ever wrote, 8 years of code from a past career in contract software development, and homework assignments from 6 years of grad school.  I also have hundreds of megabytes of pictures from places that no longer exist and of people that are no longer alive.  Recently one of the two drives in my RAID array failed, and I was left with a choice:  to buy two new drives, or to switch over to a cloud backup service.  I used CrashPlan at work, and was pleased that it was cross-platform (Linux, Windows or Mac) and worked seamlessly and silently in the background.  After examining the price tag, I decided to step into the 21st century and use CrashPlan instead of ordering new drives.  Three days later, the lone surviving drive from my RAID array failed.  If I had ordered new drives via standard shipping, I would not even have received my new drives yet much less had a chance to set them up.  If I hadn't switched to CrashPlan, that could well have been the end of over a decade of irreplaceable data.  Instead, my data was back in my hands within a day or two as I restored it from the cloud.  The cloud backup service paid for itself in the first week of ownership.

Now my question:  is it cheaper to run a 1TB RAID array than to pay for a cloud backup service assuming 1 terabyte of data and a 4-year lifespan of a drive, and a computer that is always on?   When considering prices, I will use the cheapest workable option available from NewEgg, the lowest prices from my electric bill, and an unlimited data use internet plan which may not reflect tractable options for all people.


Criteria       1TB Cloud                           1TB RAID 1                                              Advantage
Direct Costs   1TB HD: $80.00                      1TB HD: $80.00                                          RAID 1
               4yr CrashPlan+ Unlimited: $139.99   1TB HD: $80.00
               Total: $219.99                      Enclosure/RAID controller: $10.00
                                                   5W×2HD×4yrs×$0.05/kWh: $17.52
                                                   Total: $187.52
Bandwidth      At least 1TB over 4 years           None                                                    RAID 1
Recovery Time  Download speeds (Mbps)              Data transfer speeds (Gbps)                             RAID 1
Reliability    Secure datacenter                   Vulnerable to theft or accident such as fire or flood.  Cloud
Maintenance    Install cross-platform software.    Relatively advanced RAID setup required.                Cloud

Surprisingly to me, if you were to use the absolute cheapest options available to you and software RAID, then RAID 1 is actually about $30 cheaper than CrashPlan over a period of 4 years.  Clearly, CrashPlan is taking advantage of an economy of scale in order to provide this service: lots of cheap drives, a data center in an area with affordable power, and mass-produced servers.  RAID 1 also provides faster access to your backup in the event of a loss, and does not abuse your internet bandwidth.  I will argue, however, that for most people the cloud will still be the better option, for one reason:  ultimately, the loss of one's data entirely is a disaster whose cost may be incalculable, and as such reliability is a more important metric than price for most people.  Running RAID 1, you are still vulnerable to theft, to a disaster such as a fire or flood or just knocking the enclosure off of a shelf that simultaneously destroys both drives, or to a virus or software malfunction that leaves you in the unfortunate situation of data loss.  Further, a cloud backup service is much easier for most users to set up and maintain than a RAID array.  For myself and most home users, the superior reliability of a cloud backup service more than justifies the trivial additional cost over RAID 1.
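The totals in the table can be reproduced with a few lines of arithmetic.  This is only a sketch using the prices quoted above; the function names are made up for illustration, and it assumes an always-on machine running 8,760 hours per year (leap days ignored).

```python
HOURS_PER_YEAR = 24 * 365  # assumption: always-on computer, leap days ignored


def raid1_cost(drive_price, enclosure_price, watts_per_drive, years,
               price_per_kwh, drives=2):
    """Hardware plus electricity for a RAID 1 mirror over its lifetime."""
    energy_kwh = watts_per_drive * drives * HOURS_PER_YEAR * years / 1000
    return drives * drive_price + enclosure_price + energy_kwh * price_per_kwh


def cloud_cost(drive_price, subscription_price):
    """One local drive plus a multi-year cloud backup subscription."""
    return drive_price + subscription_price


YEARS = 4
raid = raid1_cost(drive_price=80.00, enclosure_price=10.00,
                  watts_per_drive=5, years=YEARS, price_per_kwh=0.05)
cloud = cloud_cost(drive_price=80.00, subscription_price=139.99)
difference = cloud - raid

print(f"RAID 1: ${raid:.2f}")    # RAID 1: $187.52
print(f"Cloud:  ${cloud:.2f}")   # Cloud:  $219.99
print(f"Cloud costs ${difference:.2f} more, or ${difference / YEARS:.2f} per year")
```

Changing `price_per_kwh` or `watts_per_drive` to match your own electric bill and hardware is the quickest way to see whether the conclusion holds for you.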

Overall, I would recommend each backup system under the following circumstances:

  • RAID 1:  Justifiable in situations where bandwidth is at a premium or internet is not available, or where backups must be available immediately (as opposed to hours later after a download completes) upon failure of a drive.  The price advantage of about $8 per year is so small as to be insignificant as a deciding factor.
  • Cloud:  For a typical user such as myself with about 1TB of data and an unlimited internet connection,  it is only slightly more expensive than running RAID 1 but provides much greater reliability.  This option represents the future of computing and of data in general.
Whatever you do, please be responsible and use an always-on backup system.  Remember that we live in an era where much of your life exists only in data, and that many of these seemingly generic zeroes and ones are actually quite dear to you.  I know that if I had waited even another day to subscribe to a cloud backup service, an irreplaceable record of the last 16 years of my life would be gone forever.  



Saturday, November 10, 2012

A Realignment

A beautiful red fox that Nadia and I spotted while working on a secret project in Mt. Rainier National Park.
No, I have not abandoned or forgotten about my blog!  I have been working behind the scenes on several long-term, high-quality projects that I believe will be far more compelling than anything I could have slapped together in the short term.  Suffice it to say that I myself am wholly on pins and needles[1] waiting for the results!  In this case, it is a question of having things done fast or having them done right.

The big news is that I am in the midst of a personal realignment of priorities.  As part of this realignment, my last day at Sage Bionetworks will be November 20, and I am starting as a software engineer at Google in Fremont on January 7.  The friends I've made at Sage will be sorely missed, as I've had the privilege of working alongside some of the most brilliant, thoughtful and wise people that I've ever encountered.  If you find the suddenness of this uncharacteristic of me and are concerned for whatever reason, I will discuss the rationale on my private blog.  So, if you are interested in some thoughts and details regarding my leaving Sage, let me know and I'll add you to the list of allowed readers.  I don't expect you to, though:  I am making the private post mostly for myself while the reasons are fresh in my mind and to commit some lessons to writing for future reference.

As a function of leaving Sage and starting at Google, I felt it was no longer accurate to have my blog's tagline be, "Open source software and open access research."  I am finished with academia for the foreseeable future, and the software I produce at Google would only incidentally be open source.  I am still profoundly interested in the open source and open access movement and I will continue participating on my own time, but it's no longer a full-time job.  I'm in the market for a new tagline, so if you have any ideas let me know.

Finally, I'm totally pumped about working at Google!  I've long loved their products, and this is an opportunity to help a lot of people and touch a lot of lives by making revolutionary, user friendly and high quality software!

[1] This is an awesome pun, but won't make sense until one secret project is revealed!

Saturday, September 29, 2012

Vacation: Space Needle and Mt. Rainier

These chipmunks playing at Sunrise Lodge were just about the cutest thing ever.

Simiao visited me for our 2-year dating anniversary, and to celebrate we visited the Chihuly museum and the Space Needle at Seattle Center, and then drove out to Mt. Rainier National Park the next day to explore the area around Sunrise Lodge.  Sunrise Lodge is the highest elevation in the park accessible by vehicle, and I hoped we'd get lucky and get a day where it wasn't too cloudy to see Mt. Rainier.  It turns out that it was mostly obscured, but the mists swirling around the trails made a great subject.  As you reach the ridge along the Sourdough Mountains, you can feel the strong wind resulting from the air being forced over the Cascades and can actually see new clouds forming on the mountaintops and whipping down into the valley below.  It's an area that really has to be seen in order to be appreciated, and I'm told that when the alpine meadows are in bloom the area is a sight to behold.  Unfortunately that part will have to wait until July 2013 or so, but I've included some of our favorite pictures for you to appreciate now.

A unique view of the Space Needle from the Chihuly Museum.

No trees were harmed in the making of this photograph.

Western Pasqueflower in seed.  Obviously not my shot:  Simiao has an eye for composition.
Another shot by Simiao looking South from the Sourdough Trail

Sunday, September 2, 2012

What are the Most Undervalued US Coins?

I have collected US coins since I was very young.  It was fun to search through rolls of pennies and nickels and look for rare varieties or missing years and mint-marks for my set.  As an adult, I still add the occasional interesting piece to my collection.  Now as a data scientist, I wonder what the relationship between market price and rarity is.  Is there a very strong correlation between these two things?  How are they related?  Are there sets that lie largely above or below this curve?

All of my data is collected from the subscription-access database PCGS Coinfacts.  You can look at a few entries in this database for free, but after that trial you must purchase a subscription to see more data.  I will present the results here in a format that cannot be used to reconstruct the copyrighted data.  I want to get a few additional details out of the way:
  • Please do not construe this article as an endorsement for investing in rare coins.  Rare coins are a terrible investment.  They do not pay dividends and their value is entirely subjective.  If you sell them at auction, you can expect the auction house to take about 15-20% of the buyer's price.  If you choose to collect coins as a hobby, you should do so out of historical and artistic interest and not as an investment.  Further, there is the risk of losing one's investment entirely as an enormous number of high-quality counterfeits are coming out of China.  Some are so good that even experts have a very hard time telling the real thing from frauds!
  • PCGS Coinfacts has numerous, obvious errors in their database.  I will not attempt to correct these errors in my data, and instead try to make big-picture conclusions wherever possible.  
  • I will only consider major varieties of US coins, and not rare minor variants.  
Let's start with a simple assumption, and a counterexample.  Assume that coin collectors will buy a coin for its melt value plus its numismatic value, which is a function only of its rarity.  Then, two coins of similar rarity should sell for similar prices, plus the value of the metal they're made of.



                                         1913 Lincoln Cent   1907 St. Gaudens $20 (low relief)
Survival Estimate (MS60 or better)       2,000               55,833
Market Value (MS60)                      $52                 $1,890
Melt Value                               ($0.02)             ($1,601.98)
Market Value Corrected for Melt (MS60)   $52                 $288

Data and images © Collector's Universe, reproduced for nonprofit educational purposes only.

So what does all of this mean?  It means that of the millions of 1913 Lincoln cents minted, only 2,000 are estimated to exist in mint state (MS60+) today.  On the other hand, of the 361,667 1907 St. Gaudens double eagles (low relief) minted, 55,833 of them are estimated to exist today in mint state.  This is important to know, because numismatic rarity is based on how many are still around today, not how many existed when they were made.  If I correct for the price of gold, I see that people are willing to pay about 5.5 times as much for the St. Gaudens, which is actually about 28 times more common.
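For anyone who wants to check the comparison, here is a small sketch that redoes the arithmetic from the table above.  The dictionary layout and function name are just illustrative choices; the numbers are the MS60 figures quoted in this post.

```python
# MS60 figures quoted in the table above.
lincoln_1913 = {"survival": 2_000, "market_value": 52.00, "melt_value": 0.02}
st_gaudens_1907 = {"survival": 55_833, "market_value": 1_890.00, "melt_value": 1_601.98}


def numismatic_value(coin):
    """Market value with the value of the metal subtracted out."""
    return coin["market_value"] - coin["melt_value"]


price_ratio = numismatic_value(st_gaudens_1907) / numismatic_value(lincoln_1913)
abundance_ratio = st_gaudens_1907["survival"] / lincoln_1913["survival"]

print(f"Collectors pay {price_ratio:.1f}x the premium")          # 5.5x
print(f"for a coin that is {abundance_ratio:.1f}x as plentiful")  # 27.9x
```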

Certainly, the double eagle is aesthetically more beautiful, and as such understandably more in demand than the Lincoln cent.  The antiques market is intrinsically subjective, and what each coin is worth is precisely what someone is willing to pay.  We will also see here that varieties that were only made for a brief period, such as this double eagle, tend to sell for far more than you might expect given the rarity.  Lincoln cents have been in production for over 100 years, so perhaps collectors are just as happy looking at the ones in their pocket as the ones from a century ago.  This example illustrates that there are emotional factors in play that sway collectors to pay much more for pieces that are in fact much more common.

Now let's explore the relationship between price and scarcity.  I have collected scarcity and market value for all major varieties of US coins using PCGS Coinfacts.

Figure 1:  Market price (adjusted for melt value) versus scarcity for US coins in mint state.  Copper coins are shown in brown, silver in gray, and gold in yellow.  The regression line shown is ln(price) = -0.6716 ln(survival) + 10.4208, with R² = 0.7614.  In order to reduce the influence of outliers, the regression was performed after taking the natural log of both price and survival.  Modern gold and silver Eagles do not appear on this curve as their numismatic value does not exceed their melt value.
Figure 1 shows market price versus scarcity in mint-state coins.  While I collected data for lower grades, I found the survival estimates to be more plausible for mint state coins than the population at large.  The first interesting thing about this plot is that the number of surviving specimens and the market price have a linear relationship on a log-log plot, and it's worth a bit of discussion with regard to why this is so remarkable.
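For readers who want to try this kind of fit themselves, below is a minimal sketch of an ordinary least-squares regression on log-transformed values.  It is not the exact code behind Figure 1, and since the PCGS data cannot be reproduced here, the points are made up:  they are generated from a known line (slope -0.7, intercept 10.0) purely so that the fit can be checked against a known answer.

```python
import math


def fit_log_log(survival, price):
    """Least-squares fit of ln(price) = slope * ln(survival) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log(s) for s in survival]
    ys = [math.log(p) for p in price]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up survival counts; prices are placed exactly on a known line,
# so the fit should recover slope -0.7 and intercept 10.0 with R^2 = 1.
survival = [50, 400, 3_000, 25_000, 200_000]
price = [math.exp(10.0 - 0.7 * math.log(s)) for s in survival]

slope, intercept, r_squared = fit_log_log(survival, price)
print(f"ln(price) = {slope:.4f} ln(survival) + {intercept:.4f}, R^2 = {r_squared:.4f}")
```

Real data will scatter around the line, which is what pulls R² below 1.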

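The linear relationship on the log-log plot means that price follows a power law in the number of surviving specimens:  price = e^10.4208 × survival^-0.6716.  This indicates to me that the fact that other collectors can't have a specimen is a substantial part of the appeal of having a specimen.  This exclusivity manifests itself in the following way: the fewer coins of a variety exist, the more each specimen is worth, and steeply so.  Note that because the exponent lies between -1 and 0, the combined value of all specimens of a variety (price × survival) still shrinks as the variety gets scarcer, even as each individual coin gets more expensive.  On a plot with linear axes, this data looks like an "L" where a coin is either very rare and very valuable, or too common for anyone to value it at all.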
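To get a feel for what the fitted line implies in ordinary (linear) terms, here is a small sketch that evaluates the regression from Figure 1 at a few survival counts and also multiplies out price × survival.  The survival counts are arbitrary round numbers chosen for illustration, not actual coins.

```python
import math

# Coefficients of the regression line reported with Figure 1.
SLOPE = -0.6716
INTERCEPT = 10.4208


def predicted_price(survival):
    """Melt-adjusted price the regression line predicts for a survival count."""
    return math.exp(INTERCEPT + SLOPE * math.log(survival))


# Arbitrary round survival counts, for illustration only.
for survival in (100, 1_000, 10_000, 100_000):
    price = predicted_price(survival)
    total = price * survival
    print(f"{survival:>7,} survivors: ${price:>10,.2f} each, "
          f"${total:>14,.2f} for all of them combined")

# How much the predicted price rises each time the population halves.
halving_factor = predicted_price(1_000) / predicted_price(2_000)
print(f"Halving the population multiplies the price by {halving_factor:.2f}")
```

The halving factor is the same no matter which starting population you pick, which is exactly what a straight line on a log-log plot means.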

I also notice a few trends.  The most overvalued coins tend to be gold.  Copper coins tend to be more undervalued.  Modern gold and silver Eagles could not be placed on Figure 1, since their numismatic value did not exceed the melt value and there is no zero on a log-log plot.

My goal with this study is not to chide people for collecting what they like.  I recognize that the numismatic hobby is not about investment, but rather about artistic and historical interest.  What I want to see is which series of US coins are undervalued and which are overvalued for their rarity, and to suggest to myself and other collectors what might be the most neglected series to focus on to build an interesting collection.  In order to do this, I will find the median percent difference from coins in each series to the regression line ln(price) = -0.6716 ln(survival) + 10.4208.  In this way I will minimize the effect of noise and outliers in each set.
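The calculation can be sketched roughly as follows.  This assumes that "percent difference" means the gap between a coin's melt-adjusted price and the price the regression line predicts, expressed as a percentage of the predicted price; the series names and coins are made up, since the real data is copyrighted.

```python
import math
from collections import defaultdict
from statistics import median

# Coefficients of the regression line quoted above.
SLOPE = -0.6716
INTERCEPT = 10.4208


def predicted_price(survival):
    """Melt-adjusted price the regression line predicts for a survival count."""
    return math.exp(INTERCEPT + SLOPE * math.log(survival))


def percent_overvalued(price, survival):
    """Percent by which a price sits above (+) or below (-) the line."""
    expected = predicted_price(survival)
    return 100.0 * (price - expected) / expected


def median_percent_by_series(coins):
    """coins: iterable of (series, survival, melt_adjusted_price) tuples."""
    by_series = defaultdict(list)
    for series, survival, price in coins:
        by_series[series].append(percent_overvalued(price, survival))
    return {series: median(values) for series, values in by_series.items()}


# Made-up coins, priced as multiples of the line so the answer is easy to check.
coins = [
    ("Series A", 200, 10.00 * predicted_price(200)),      # outlier, 900% over
    ("Series A", 1_000, 2.00 * predicted_price(1_000)),   # 100% over
    ("Series A", 5_000, 3.00 * predicted_price(5_000)),   # 200% over
    ("Series B", 300, 0.50 * predicted_price(300)),       # 50% under
    ("Series B", 2_000, 0.60 * predicted_price(2_000)),   # 40% under
    ("Series B", 40_000, 0.75 * predicted_price(40_000)), # 25% under
]

result = median_percent_by_series(coins)
for series, pct in sorted(result.items(), key=lambda item: -item[1]):
    print(f"{series}: {pct:+.0f}%")
# Series A: +200%
# Series B: -40%
```

Note how the 900% outlier in the made-up Series A does not drag the result upward the way a mean would, which is the reason for using the median.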

SetMedian Percent
Overvalued MS60+
Comment
4 Stella3822%Only 425 of these were made and
sold to Congressmen.  
Flowing Hair Dollar2925%Ultra-rare, with only some 100 of them
surviving.  
Draped Bust $102204%Far overvalued given its rarity.  Consider the
draped bust half cent or Liberty head Eagle.
Modern Commemorative Gold1423%Modern commemoratives made for collectors
are sold for a premium not justified by rarity.
Sacagawea Dollar1220%These are very inexpensive to begin with.
Millions sit unwanted in the treasury.
Draped Bust $51118%Consider the draped bust half cent for a more
cost effective option.
Capped Bust $51109%Consider the capped bust quarter for a more
cost effective option.
Draped Bust $2.50897%Consider the draped bust half cent for a more
cost effective option.
Flowing Hair Half Dime872%Consider the flowing hair large cent for a
more cost effective option.
Liberty Cap Half Cent723%Only made for a few years.  If you like this,
consider the undervalued classic head half cent.
50 States Quarter707%These are very inexpensive to begin with,
but are far overvalued in mint condition.
Capped Bust $2.50650%Consider the capped bust quarter for a more
cost effective option.
Commemorative Gold637%Commemorative coins start their lives in the
collectors' market and never tend to leave.
Flowing Hair Half Dollar615%Consider the flowing hair large cent for a
more cost effective option.
St. Gaudens $20506%The most beautiful of US coins comes with
a high premium.  Consider modern gold eagles.
Flowing Hair Large Cent434%The least overvalued coin of the flowing hair
set is still quite expensive given its rarity.
Draped Bust Dime424%Consider the draped bust half cent for a more
cost effective option.
Draped Bust Dollar416%Consider the draped bust half cent for a more
cost effective option.
Modern Lincoln Cents393%These are very inexpensive to begin with,
so the percent overvalued is misleading!
Draped Bust Quarter359%Consider the draped bust half cent for a more
cost effective option.
Draped Bust Half Dime345%Consider the draped bust half cent for a more
cost effective option.
Twenty Cent309%If you like this variety, take a look at the three
cent nickels or two-cent piece.
Indian $2.50282%This set is probably valued more highly for
being the only incuse set of US coins.
Liberty Head $20236%The Liberty Head $10 and $2.50 are far less
overvalued for their rarity.
Trade Dollar233%Aesthetically similar to Seated Liberty, except
far more overvalued.   
Peace Dollar224%If you like large silver coins, you should take
a look at Franklin half dollars.
Indian $10217%The Indian $5 is slightly less overvalued if you
must have coins bearing this design.
Classic Head $2.50201%The Classic Head half cent is far less
overvalued given its rarity.
Three Dollar | 192% | Take a look at the gold dollars which are similar in design but much less overvalued.
Classic Head $5 | 187% | The Classic Head half cent is far less overvalued given its rarity.
Liberty Seated Dollar | 181% | The Liberty Seated half dollar or quarter are much more cost-effective alternatives!
Indian $5 | 177% | The least overvalued of the Indian eagles in terms of its rarity.
Draped Bust Half Dollar | 176% | The draped bust half cent is far more cost effective if you like the draped bust series.
Flying Eagle Cent | 146% | Stunningly overvalued for such an unattractive coin.  Consider the undervalued Indian cent.
Walking Liberty Half Dollar | 142% | The same design appears on modern silver eagles without the high premium.
Morgan Dollar | 138% | One of the most popular US coins to collect, comes with a premium to match.
Ike Dollar | 119% | It's not immediately clear why collectors like Ike so much, but consider the Franklin half.
Classic Head Cent | 104% | The Classic Head Half Cent is far less overvalued given its rarity.
Draped Bust Cent | 56% | The draped bust half cent is far more cost effective if you like the draped bust series.
Capped Bust Half Dollar | 37% | The capped bust quarter has a more favorable price given its rarity.
Washington Quarter | 31% | Many are quite inexpensive, so the fact that it is overvalued won't set you back much.
Commemorative Silver | 25% | Coins made explicitly for collecting have a propensity to never disappear from the market.
Liberty Head $5 | 24% | Consider the Liberty Head $2.50 for a less overvalued option.
Susan B. Anthony Dollar | 24% | Quite inexpensive and very common, the fact that it is overvalued won't cost you much.
Barber Half Dollar | 14% | Slightly overvalued given its rarity.  Maybe a good stand-in for overvalued Morgan Dollars.
Gold Dollar | 14% | One of the least overvalued US gold coins that circulated.
Liberty Head $10 | 13% | One of the least overvalued US gold coins, and quite large for those who prefer larger ones.
Liberty Head $2.50 | 13% | The least overvalued set of US gold coins that circulated.  Great choice if you must have gold!
Capped Bust Dime | 12% | The capped bust quarter, on the other hand, tends to be undervalued given its rarity.
Modern Silver Eagles | 0% | If you like large, silver coins, consider the Franklin half.  Worth little beyond melt.
Modern Gold Eagles | 0% | Worth almost nothing beyond melt.  Good stand-in for St. Gaudens $20!
Liberty Seated Half Dollar | -17% | A great choice for those who prefer large silver coins.
Capped Bust Quarter | -20% | If you like the capped bust design, the half dime is even less expensive given its rarity.
Buffalo Nickel | -22% | A charming design and lots of interesting varieties, and a great value for its rarity!
Standing Liberty Quarter | -22% | The most undervalued coin of the standing liberty set.
Three Cent Silver | -23% | Cousin to the also undervalued three cent nickel.
Liberty Seated Quarter | -23% | Another great choice for those who prefer large silver coins.
Draped Bust Half Cent | -30% | The most undervalued coin in the draped bust set.
Jefferson Nickel | -33% | Probably suffers a lack of interest for being the standard nickel for such a long time.
Barber Quarter | -35% | Barber coinage is exceptionally charming but somewhat undervalued given its rarity.
Liberty Nickel | -36% | A design similar to the Morgan Dollar in a much less overvalued series.
Roosevelt Dime | -38% | The most undervalued coin that is still ubiquitous.  Also quite inexpensive!
Coronet Head Cent | -38% | An attractive, affordable and undervalued series with lots of interesting varieties.
Shield Nickel | -41% | A less attractive engraving may contribute to these being rather undervalued.
Two Cent | -45% | A really interesting and short-lived piece which doesn't get much attention from collectors.
Barber Dime | -49% | A very attractive series, which is nonetheless quite undervalued.
Liberty Seated Dime | -49% | Another Barber coin that is rather undervalued for its rarity.
Three Cent Nickel | -50% | An interesting and short-lived series that is grossly undervalued for its rarity.
Braided Hair Half Cent | -56% | Quite undervalued alongside its one cent cousin.  Still, quite rare and expensive.
Braided Hair Cent | -56% | The most undervalued coin in the Braided Hair / Coronet set.
Liberty Seated Half Dime | -57% | The most undervalued coin in the Liberty Seated set.
Capped Bust Half Dime | -58% | While still quite expensive, the most cost effective of the capped bust series.
Kennedy Half Dollar | -64% | Modern half dollars seem to be off the radar of collectors and the population at large.
Classic Head Half Cent | -66% | While these are quite expensive, rare and interesting, collectors neglect this variety.
Lincoln Cent Wheat Reverse | -71% | The modern Lincoln cent's predecessor may suffer for having the same obverse.
Indian Cent | -80% | Perhaps people are dissuaded by its tendency to turn brown with age.
Mercury Dime | -84% | This is a beautiful coin!  Perhaps people don't like it for its small size.
Franklin Half Dollar | -84% | The most undervalued set of US coins by rarity.  These can be had very inexpensively to boot!
Table 1:  Median percent overvalued by variety and suggested alternatives.  The median percent overvalued is obtained by the following procedure:  First, the coin's value in MS60 is corrected by subtracting its melt value.  Second, its expected price for its rarity is obtained from the regression line seen in Figure 1.  Third, the percent overvalued for each coin in the series is found by taking the ratio of its actual market price to the expected price and subtracting 100%.  Finally, the median of these values for the set is reported as the median percent overvalued in MS60.

As can be seen in Table 1, gold coins tend to be far more overvalued for their rarity than their silver or copper counterparts, even after correcting for melt value.  People may have an instinctive reaction to the fact that something has both numismatic and intrinsic rarity and respond by making an offer higher than they would for an equally numismatically rare coin and a lump of gold of the same weight.  The most undervalued coins given their rarity are the Indian cent, the Mercury dime, and the Franklin half dollar.

Rare coins are not a good investment unless you are a dealer buying from estates and flipping to collectors, an auctioneer who is acting as an agent to sell other people's coins, or an individual hoarding pre-1982 pennies for their melt value.  You should be sure that you are collecting for the right reasons: artistic or historic interest.  The value of antiques is so subjective that the fact that something lies below the price versus rarity curve means nothing unless you can get someone to buy the piece for a price tag closer to that curve.  If you are a collector who wishes to build an interesting and rare collection for the best value possible, the most undervalued sets are Indian cents, Mercury dimes, and Franklin half dollars.  This doesn't imply that you will make any more money collecting these, but it does mean that you can stretch your budget further.

In particular, you can determine what the expected price for any mint-state coin (MS60) should be based on rarity and melt value using this formula:

price  = exp(-0.6716 ln(survival estimate in MS60+) + 10.4208) + melt value

It would be foolish to use this formula to determine what you should actually bid or ask for a particular coin.  As I said above, individual pricing is highly subjective.  The history of a particular piece and its aesthetic value have a profound impact on its market price.  I've seen coins with beautiful toning sell at auction for 10 or more times what other coins technically in the same grade sell for.  This equation's utility is therefore limited to evaluating whether a coin is undervalued based on its rarity alone in the current market.  It's possible that in the future collectors' tastes will change and the undervalued coins will fetch a price more in line with their actual rarity.  It's also possible that they won't.  So, if you wish to use this result, you should use it as a guideline and not as gospel.

In my case, I like to use the recent auction sale prices at Heritage Auctions to determine what I am likely to actually be able to buy a coin for, then compare that price to the rarity-based price.  If my equation gives a higher price than the recent sale prices for a similar coin, I call that a good value.
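The formula and the comparison described above can be sketched in R.  This is only an illustration:  the function names expected_price and percent_overvalued are my own, and the survival estimate, melt value, and auction price in the usage lines are made-up numbers rather than data from the analysis.

```r
# Expected MS60 price from rarity and melt value, using the regression
# coefficients quoted in the formula above.
expected_price <- function(survival_estimate, melt_value = 0) {
  exp(-0.6716 * log(survival_estimate) + 10.4208) + melt_value
}

# Percent by which an observed market price exceeds the rarity-based price.
# Both prices are corrected for melt value first, as in the Table 1 procedure.
percent_overvalued <- function(market_price, survival_estimate, melt_value = 0) {
  expected <- expected_price(survival_estimate, melt_value) - melt_value
  100 * ((market_price - melt_value) / expected - 1)
}

# Hypothetical coin: 1,000 estimated survivors in MS60+, $20 melt, $250 at auction.
fair <- expected_price(1000, melt_value = 20)
over <- percent_overvalued(250, 1000, melt_value = 20)

# TRUE when the equation gives a higher price than the recent sale price.
good_value <- fair > 250
```

Both functions are vectorized, so for a whole series the median could be taken with something like median(percent_overvalued(prices, survivals, melts)), where those three vectors are placeholders for your own data.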

Friday, August 24, 2012

How to Roll Back Packages in R

Today I was in an unusual position where changes to a common dependency in one R package broke compatibility with several other packages I use for predictive modeling.  When I went to roll back the package, I was surprised to learn that there is no option to install.packages which will simply install a version of my choice.  Instead, Matt taught me a rather simple and straightforward way to install an earlier version using Hadley's devtools package.  The first step is to install the package and load it:

install.packages("devtools")
library(devtools)


Next, I can look up the package and version of my choice on the CRAN archive.

Finally, once I find the package and version I want, I can copy the link from the CRAN archive and pass it to install_url.  For example,

install_url("http://cran.r-project.org/src/contrib/Archive/survival/survival_2.36-12.tar.gz")

Just like that, I've rolled back the package to an earlier version.  Thanks to Matt for introducing me to Hadley's useful devtools package!
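If you roll back packages often, the link can be assembled rather than copied by hand.  The sketch below is my own helper (cran_archive_url is not part of devtools) and it assumes the archive keeps the directory layout seen in the link above.

```r
# Build the CRAN archive URL for a given package and version.
# Hypothetical helper; assumes the Archive/<pkg>/<pkg>_<version>.tar.gz layout.
cran_archive_url <- function(pkg, version) {
  sprintf("http://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

url <- cran_archive_url("survival", "2.36-12")

# Not run here, since it needs network access and changes your library:
# library(devtools)
# install_url(url)
# packageVersion("survival")   # confirm which version is now installed
```

Later releases of devtools also appear to offer an install_version function that takes a package name and a version; check the documentation of your installed devtools before relying on it.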

Sunday, August 19, 2012

The Yacht Race

I'd never sailed before this week, but my friend Matt invited me to a yacht race and then out sailing again the next day, and I'm sold.  There's nothing quite like the sensation of the wind pulling you through the water when you have the sails aligned just right, and the views of Seattle and Mount Rainier from the Puget Sound are really marvellous.  Let it be known:  a cooler full of beer, some friends, and a yacht are a formula for a great evening!  We were in such a hurry to get out the door that I forgot to bring my camera to the yacht race, so the pictures posted here are crude cell-phone photographs that don't quite do the day justice.




The funny thing is, I think Matt actually appreciated the opportunity to just enjoy the ride and not drive for a change.  Keeping the boat aligned to catch the wind just right takes some getting used to, but just trying is a lot of fun!


Saturday, August 4, 2012

Playing Chicken with History

I watched with some interest the news regarding Chick-Fil-A Appreciation Day.  The concept is that because Chick-Fil-A as a company publicly opposes gay marriage, then you can show your opposition to the same by supporting them.  I wondered if this move could be less a function of the company owners' core beliefs and politics than a marketing scheme designed to drum up sales based on the general opinion of those that live near Chick-Fil-A stores.

I do my best to keep politics out of my blog and focus on science, but this time I feel the need to preface a study with what shouldn't be a politically charged statement.  Majority vote is a pretty good way to deal with mundane, day to day problems.  If most people believe that the patent system should be overhauled, or most people believe that a road should be built, then that should probably happen.  No majority vote, government, religion or individual, however, has the right to disenfranchise anybody of their civil or human rights.  Marriage is a human right, and states that have constitutionally banned gay marriage based on a referendum are wrong.  People who believe that gay marriage should be illegal are also wrong.  They are also on the losing side of history.  I personally will not eat at a Chick-Fil-A restaurant until they publicly renounce their support of bigotry.

As a scientist, what I'm mostly interested in as I read about Chick-Fil-A Appreciation Day is whether the Chick-Fil-A debacle is a marketing move, and whether we will see other businesses take controversial political positions in order to boost sales and brand awareness.

First, I need some data to work with.  I obtained data on the locations of Chick-Fil-A stores using a map of the geographic distribution from their company website.  That map, correct as of June 1, 2012, is reproduced below.

Figure 1:  Geographical distribution of Chick-Fil-A stores on June 1, 2012.  This map is copyright CFA properties Inc., and is reproduced for non-profit educational purposes as permitted by section 107 of the copyright law (title 17, US Code.)

It is remarkably difficult to find data on the state-wise public opinion on gay marriage, so as a proxy I used raw data from the Pew Research Center's 2011 Political Typology Survey.  In particular, one question was "q37u:  Should homosexuality be accepted by society?", and I assume that one's answer to this question matches whether or not they support legal recognition of same-sex marriage.  Admittedly, it's possible that some people may see societal acceptance of homosexuality as something different than marriage equality, but it's the best proxy I could find.  If you have better data by state, please let me know!

The first step after downloading the SPSS file was to convert it to an open format that I could actually use.  I found GNU PSPP to be inadequate for this purpose, and instead used R to read an SPSS save file into a data frame:

library(foreign)
# Spell out TRUE, since T is an ordinary variable that can be reassigned
pew <- read.spss('~/Desktop/2011 Political Typology public.sav', to.data.frame=TRUE)


Now I am able to break down the answers to this question by state.  Respondents who gave answers 1 and 2 to question "q37u" felt that homosexuality should be accepted by society, and those who gave answers 3 and 4 said that homosexuality should be discouraged.  Those counts are then given by

accept <- table(pew[as.numeric(pew[, "q37u"]) %in% c(1,2), "state"])


and

reject <- table(pew[as.numeric(pew[, "q37u"]) %in% c(3,4), "state"])
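To see what these two lines do without the survey file, here is a minimal sketch on a toy data frame.  The column names match the ones used above, but the six rows and the answer labels are invented for illustration and are not the Pew coding.

```r
# Toy stand-in for the Pew data frame; rows and answer labels are invented.
toy <- data.frame(
  q37u  = factor(c(1, 2, 3, 4, 3, 1),
                 levels = 1:4,
                 labels = c("accept strongly", "accept",
                            "discourage", "discourage strongly")),
  state = factor(c("Ohio", "Ohio", "Ohio", "Texas", "Texas", "Texas"))
)

# as.numeric() on a factor returns the level codes 1..4, not the labels.
codes <- as.numeric(toy[, "q37u"])

# Because state is a factor, both tables list every state, so they line up
# element by element when divided.
accept <- table(toy[codes %in% c(1, 2), "state"])
reject <- table(toy[codes %in% c(3, 4), "state"])
ratio  <- accept / reject
```

One thing the toy data makes visible:  a state with no respondents in the reject group would give a division by zero, so very small states deserve a second look before plotting.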


Some states had very few data points (Wyoming had only 3), and Alaska and Hawaii were totally unrepresented.  As such, the error bars on the ratio of accepters versus discouragers can be quite large.  That said, producing a map of acceptance of homosexuality by state produced a very telling result.  With help from the Revolution Analytics Blog, then:

# Ratio of accepters to discouragers in each state (a ratio, not a proportion)
acceptanceRatio <- accept/reject

# To lower case
names(acceptanceRatio) <- tolower(names(acceptanceRatio))

# Ok, let's get the appropriate regions from the map
require(maps)

# Get the region names
regionNames <- map("state",namesonly=TRUE)

# Need to parse out the name, exactly
stateName <- unlist(lapply(as.list(regionNames), function(x) {return(strsplit(x, ":")[[1]][1])}))

# Set the ratios by state
stateRatios <- acceptanceRatio[stateName]


# Generally accepting (green), generally rejecting (red), or about even (yellow).
# A ratio of 3/2 or 2/3 corresponds to a 60% majority on one side.
color <- apply(stateRatios, 1, function(x) {
  if (x <= 2/3) {return(rgb(red=1, green=0, blue=0))}
  if (x >= 3/2) {return(rgb(red=0, green=1, blue=0))}
  return(rgb(red=1,green=1,blue=0))
})

map("state", fill=TRUE, col=color)
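The thresholds of 2/3 and 3/2 are easier to trust once the ratio is converted back to a share.  The sketch below restates the coloring rule as a standalone function; the names ratio_to_share and ratio_to_color are mine, and painting states with no data white is an assumption I added, not something the code above does.

```r
# Convert an accept/reject ratio into the share of respondents who accept.
ratio_to_share <- function(ratio) ratio / (1 + ratio)

# Same three-way rule as above, plus white for states with no data
# (the NA handling is an added assumption).
ratio_to_color <- function(ratio) {
  vapply(as.numeric(ratio), function(x) {
    if (is.na(x)) return(rgb(red=1, green=1, blue=1))
    if (x <= 2/3) return(rgb(red=1, green=0, blue=0))
    if (x >= 3/2) return(rgb(red=0, green=1, blue=0))
    rgb(red=1, green=1, blue=0)
  }, character(1))
}

shares <- ratio_to_share(c(2/3, 3/2))
colors <- ratio_to_color(c(0.5, 1, 2, NA))
```

A ratio of 3/2 works out to a 60% share and 2/3 to a 40% share, which is where the 60% majority in the comment comes from.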


Figure 2:  Opinion on societal acceptance of homosexuality by state.  In green states, over 60% of respondents to Pew Research's 2011 Political Typology Survey supported societal acceptance of homosexuality.  In red states, over 60% of respondents felt that society should discourage homosexuality.  In yellow states, there was no 60% majority.  Please note that there is substantial margin of error here, as some states had as few as three respondents.
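To get a feel for how large that margin of error is, one option is an exact binomial confidence interval from base R's binom.test.  This is a rough sketch, not part of the original analysis:  the 2-to-1 split below is hypothetical, since I am not quoting the actual counts for any state.

```r
# Hypothetical state with only three respondents: 2 accept, 1 discourage.
small <- binom.test(x = 2, n = 3)
small$conf.int

# The same 2:1 split with 300 respondents pins the share down far better.
large <- binom.test(x = 200, n = 300)
large$conf.int

# Width of the 95% interval on the share of accepters.
interval_width <- function(bt) diff(as.numeric(bt$conf.int))
small_width <- interval_width(small)
large_width <- interval_width(large)
```

With three respondents the interval covers most of the range from 0 to 1, so the color assigned to such a state says very little on its own.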

Figure 2 was, to me, an extremely telling plot.  First, I'll point out that there is obviously strong overlap between the number of Chick-Fil-A stores in a state (red in figure 1) and sentiment that the state should discourage homosexuality (red and yellow in figure 2).  What really shocked me, though, is that this plot tells me that a vocal minority is responsible for the outcry against same-sex marriage.  Even in North Carolina, where a constitutional amendment was voted into effect to ban gay marriage, the ratio of those who felt that society should accept versus discourage homosexuality is nearly 1:1.  With voter turnout below 50% in most of the country, this means that all that has to happen in order to win marriage equality in most states is for those who support equality to go out and vote.

So, was it a good idea for Chick-Fil-A to market their bigoted position?  Nationwide, no.  More people support equality than reject it.  In the regions of the country with the most Chick-Fil-A stores, it's a dicey measure as the population is mostly split.  The reality, though, is that a look at the national political scene shows that the vocal minority is much more active in promoting their historically doomed cause than the majority which supports acceptance.  In that sense, Chick-Fil-A's move is a pretty safe bet.  Gay rights supporters simply aren't as active as their opponents and may not care enough to give up fried chicken.  Meanwhile, Chick-Fil-A enjoys a surge of popularity from the vocal minority.

In a sense, Amazon quickly followed Chick-Fil-A's experiment with their own.  In Jeff Bezos' case, it makes sense to side with the majority of people when your company makes sales nationwide.  President Obama also recently took the side of the majority, a wise choice when running for office with a nationwide constituency.  It will be interesting to see whether or not we see more examples of high-ranking representatives of companies taking politically controversial positions to market to their primary customer base in the future.

In conclusion,

  • A majority of people, or at least a roughly equal number, support societal acceptance of homosexuality in most of the lower 48 states.
  • Proponents of inequality are apparently much more vocal and active in promoting their position than those who support equality.
  • Chick-Fil-A's position actually conflicts with that of about half of their potential customers even in the Southeast, making their move a dicey one at worst but a good one at best based on the activeness of the anti-gay rights crowd.

Finally,

  • "All that is necessary for the triumph of evil is that good men do nothing." -Edmund Burke

Get out and vote!  If you support equality, you're in good company.  In most states, you represent 60% or more of the constituency.  In nearly all states, you represent 40% or more of the constituency.  All that has to happen for equality to win the day is for you to tell the government what you think!





Sunday, July 22, 2012

Washington State Primary Ballot: Who Are They?

I had a new experience yesterday:  I got my primary election ballot in the mail!  Coming from Pennsylvania and having registered non-partisan, I was not allowed to vote in Pennsylvania primaries.  I always thought that was odd:  if you're a political party, wouldn't you care about what the people with no party affiliation thought about your candidates as much as or more than those in your party?  Since people in your party are not likely to change sides, the independents are the people most likely to be swayed to your side in an election!

Anyhow, I was dismayed when I saw the primary ballot for two reasons.  First, I saw that the ballot included such names as "Mike the Mover" and "Goodspaceguy."  To me, these non-names don't inspire much confidence in the candidates, and it turns out their websites didn't either.  Second, when I started searching for candidates, I noted that many didn't even have a website, or if they did their website didn't include meaningful information about their platform.  I don't want this to be a blog about anything but facts and opinion supported directly by facts, so I will keep this post simple.  What follows is simply a table of candidates on the Washington State Primary Election ballot and, where I could find a website, why you should vote for them in their own words.  I hope you will use this information to make the best informed decision in the upcoming primary and election, and may the best candidate win!  Please also be aware of the election pamphlet available at the King County website, which I found helpful but incomplete.


For United States Senator
Name | Party | Why you should vote for
Michael Baumgartner | Republican | votebaumgartner.com
Will Baker | Reform | No Website Found
Chuck Jackson | Republican | scaryreality.com
Timmy (Doc) Wilson | Democratic | timwilsonforsenate.org
Art Coday | Republican | artcoday.com
Maria Cantwell | Democratic | cantwell.senate.gov
Glen (Stocky) R. Stockwell | Republican | washingtonstateeconomicdevelopment.vpweb.com
Mike the Mover | Republican | theoriginalmikethemover.com

For United States Representative
Name | Party | Why you should vote for
Don Rivers | Democratic | donriversforcongress.com
Goodspaceguy | Employmentwealth | colonizespace.blogspot.com
Scott Sutherland | GOP | vote-wa.org
Andrew Hughes | Democratic | andrewhughesforcongress.com
Jim McDermott | Democratic | mcdermottforcongress.com
Ron Bemis | Republican | ronbemisforcongress.org
Charles Allen | Democratic | charlesallen2012.com

For Washington State Governor
Name | Party | Why you should vote for
Rob Hill | Democratic | No Website Found
Rob McKenna | Republican | robmckenna.org
Jay Inslee | Democratic | jayinslee.com
James White | Independent | whiteforgovernor2012.com
Christian Joubert | No Party Preference | holisticgovernor.com
Shahram Hadian | Republican | hadian2012.com
L. Dale Sorgen | Independent | imagineliberty.us
Max Sampson | Republican | No Website Found
Javier O. Lopez | Republican | Website Down

For Washington State Lieutenant Governor
Name | Party | Why you should vote for
Glenn Anderson | Republican | glennanderson2012.org
Brad Owen | Democrat | bradowen2012.com
James Robert Deal | No Party Preference | fluoride-class-action.com
Bill Finkbeiner | Republican | billfinkbeiner.org
Dave T. Sumner, IV | Neopopulist | No Website Found
Mark Greene | Democracy Independent | brandnewelections.us

For Washington State Secretary of State
Name | Party | Why you should vote for
Jim Kastama | Democratic | jimkastama.com
David J Anderson | No Party Preference | No Website Found
Sam Wright | Human Rights | thehumanrightsparty.org
Karen Murray | Constitution | murray4sos.org
Kathleen Drew | Democratic | kathleendrew2012.com
Kim Wyman | Republican | kimwyman.com
Greg Nickels | Democratic | gregnickels.com

For Washington State Treasurer
Name | Party | Why you should vote for
Jim McIntire | Democratic | jimmcintire.com

For Washington State Auditor
Name | Party | Why you should vote for
Troy Kelley | Democratic | troykelley.com
James Watkins | Republican | watkinsforauditor.com
Mark Miloscia | Democratic | markmiloscia.com
Craig Pridemore | Democratic | craigpridemore.com

For Washington State Attorney General
Name | Party | Why you should vote for
Bob Ferguson | Democratic | electbobferguson.com
Reagan Dunn | Republican | reagandunn.com
Stephen Pidgeon | Republican | stephenpidgeon4ag.com

For Washington State Commissioner of Public Lands
Name | Party | Why you should vote for
Stephen A Sharon | No Party Preference | No Website Found
Peter J Goldmark | Democratic | petergoldmark.com
Clint Didier | Republican | clintdidier.org

For Washington State Superintendent of Public Instruction
Name | Why you should vote for
James Bauckman | jamesbauckmanforspi.wordpress.com
Randy I Dorn | randydorn2012.com
Don Hansler | No Website Found
John Patterson Blair | johnblairportal.wordpress.com
Ronald L (Ron) Higgins | www.higgins-spi-2012.com

For Washington State Insurance Commissioner
Name | Party | Why you should vote for
John R Adams | Republican | infojohnadams.com
Mike Kreidler | Democratic | mikekreidler.com
Scott Reilly | Republican | scott-reilly.org
Brian C Berend | Independent | No Website Found

Tuesday, July 17, 2012

Sage/DREAM Breast Cancer Challenge Launch

This morning's webinar marked the public launch of the Sage/DREAM Breast Cancer Challenge.  The goal is simple:  given a large data set consisting of clinical covariates, copy number data, expression data, and survival data, can you produce the best predictions of survival time for new patients?  The unstated goal is a little bit more subtle:  so far, I have seen no compelling evidence that microscopic information such as gene expression or copy number data adds any power to predict macroscopic outcomes such as the survival time of the patient.  We hope that this competition will encourage people to build better models of disease collaboratively, and produce some of the first evidence that gene expression and copy number data can actually be useful in predicting patient prognosis.  As an example, I will produce an entry that you can feel free to cannibalize, and will write more about it as I evolve it into a better model of disease.  As a Sage employee, I am not eligible to win, so I hope you will modify and adapt this code and produce your own entry, superior to my own!

Before trying my code, you will want to check out the Breast Cancer Competition Getting Started Guide.

From an internal competition we ran that preceded this launch, we learned that the R RandomSurvivalForest package produced the best models of survival given the data, and that in fact the most critical part of the model was actually which data is chosen for inclusion and how it is pre-processed before being handed over to RandomSurvivalForest.

First, I will produce the core of my submission, the model class file.  I will take only the clinical covariates and totally ignore the copy number and expression data, though I will leave a space where they can be included later.  In fact, you will find that this transformation function is the most important part of the entire competition:  victory will hinge not on the best model, but instead on cleaning up the data in the best way possible.

View my Model Class File on github.

Next, I will train and submit the model.  This code is relatively straightforward, but if you are going to modify my model for your own purposes, you will want to make sure that you submit to the public leaderboard instead of the Sage leaderboard!

View my Submission File on github.

As you can see, my Random Survival Forest Clinical-Only model did pretty well!  On the Sage employees leaderboard, model "Sauerwine RSFModel test 3" got a respectable test score of 0.71.  In my next post, we'll see how to improve that further.  At the time of this post, it actually does better than any model on the Sage/DREAM Public Leaderboard, but since you're free to take my model and improve on it, that's not likely to be the case for long!

Good luck, and happy modeling!