Sunday, April 1, 2012

Should the Crowd Pick my Next Video Card?

I helped a friend design a new computer the other day, and part of this process involved doing some research to find the video card that provided the best value for the money.

Conveniently, this website provided benchmark scores and prices for a wide range of modern video cards, and I ultimately found and recommended the one on NewEgg that had the highest benchmark per price.

I was left wondering, though:  could I have saved time by simply selecting the most popular video card or the highest rated video card?  Should I let the crowd buy my next video card?  In this example, I will use the RCurl and foreach packages to download and parse data from NewEgg and compare it with the benchmarks from the website above to determine whether or not the crowd makes the best video card decision.

I'm going to state up-front a few assumptions relevant to this study:

  • I should compare the most popular video card of each variety.  Since this is a question of value versus popularity and I am comparing only one video card of each variety, I should choose the most popular one as my example of this type!
  • The benchmark score is the single point of comparison to determine which video card is best.  In reality, there are a lot of great reasons to choose a piece of equipment outside of just performance! 
  • I retrieved my data on March 27, 2012.  The direct analysis should not be used to pick a video card far from this date, as the best value may change! 
  • NewEgg's price is taken as the market price, and its ratings and number of reviews are taken as the confidence of an expert panel in the product.
  • For my purposes, value equals performance divided by price. 
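To make that definition concrete, here's a tiny sketch in R. `card_value()` is a hypothetical helper I'm introducing only for illustration, and it treats a missing or zero price as "not found" rather than dividing by zero. The example figures (Radeon HD 6850 after rebate, GeForce GTX 680) are the ones quoted in the comments at the end of this post.

```r
# Value as defined above: benchmark performance points per dollar.
# card_value() is a hypothetical helper, used here only for illustration.
card_value <- function(performance, price) {
  # A missing or zero price means "not found", so return NA instead of Inf
  ifelse(is.na(price) | price <= 0, NA_real_, performance / price)
}

card_value(2751, 125)  # Radeon HD 6850 after rebate: 22.008
card_value(4070, 500)  # GeForce GTX 680: 8.14
```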

I like this example, because part of this task involves drawing data from the web and parsing it.  The following source code retrieved my price data from NewEgg:

library(RCurl)
library(foreach)

# Cut and pasted from the videocardbenchmarks website:
# column 1 holds card names (underscores in place of spaces),
# column 2 holds their benchmark scores
NameVsPerformance <- read.table("~/Desktop/performance.dat")

Prices <- foreach(i = 1:nrow(NameVsPerformance)) %do% {

  # I manually did a search, and got these
  # parameters from the query string.
  result <- getForm("http://www.newegg.com/Product/ProductList.aspx",
                    .params = c(Submit = "ENE",
                                N = "100007709",
                                DEPA = "0",
                                IsNodeId = "1",
                                Description = gsub("_", " ",
                                                   NameVsPerformance[i, 1]),
                                bop = "And",
                                Order = "REVIEWS",
                                PageSize = "1"))

  # Some contingencies to make sure
  # that my search was successful
  match <- regexpr("We have found 0 active items that match", result)
  matchstr <- regmatches(result, match)

  if (length(matchstr) > 0) {
    return(0)
  }

  match <- regexpr("Reduce the number of keywords used", result)
  matchstr <- regmatches(result, match)

  if (length(matchstr) > 0) {
    return(0)
  }

  match <- regexpr("<strong>0 items</strong>", result)
  matchstr <- regmatches(result, match)

  if (length(matchstr) > 0) {
    return(0)
  }

  # Extract the price with a regular expression
  match <- regexpr("<strong>[[:digit:]]+</strong><sup>.[[:digit:]]+</sup>", result)
  matchstr <- regmatches(result, match)

  # And format it properly by stripping the tags
  if (length(matchstr) == 0) {
    return(0) # Price is 0 if none was found.
              # I'll drop these later.
  } else {
    return(gsub("[A-Za-z/<>]", "", matchstr[1]))
  }
}


It's a pretty trivial change to grab the Rating (1 to 5 eggs at NewEgg) and Popularity (the number of reviews, taken as a proxy for popularity).  I just run the same loop with a different regular expression, which gives the Ratings and Buyers vectors used below!

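  # For Rating
  match <- regexpr("eggs r[[:digit:]]", result)
  matchstr <- regmatches(result, match)

  # For Number of Reviews (proxy for Popularity)
  match <- regexpr("[(][[:digit:]]+[)]</a>", result)
  matchstr <- regmatches(result, match)

  # In both cases keep only the digits; stripping letters and
  # tags alone would leave "(" and ")" around the review count
  if (length(matchstr) == 0) {
    return(0)
  }
  return(gsub("[^[:digit:]]", "", matchstr[1]))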
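Since only the regular expression changes between the three fields, the extraction could be pulled into one helper and tested offline on a string. This is a sketch, not the code I actually ran: `extract_field()` is a hypothetical name, and the HTML snippet below is made up to match the shape my patterns expect, not copied from NewEgg. Keeping digits and the decimal point handles all three fields: the price tags, the rating, and the review count.

```r
# Hypothetical helper: first match of `pattern`, reduced to digits and '.'.
# Returns 0 when nothing matches, the same "not found" convention as the loop.
extract_field <- function(html, pattern) {
  matchstr <- regmatches(html, regexpr(pattern, html))
  if (length(matchstr) == 0) return(0)
  as.numeric(gsub("[^[:digit:].]", "", matchstr[1]))
}

# Made-up markup shaped like what the patterns expect (not real NewEgg HTML)
page <- paste0('<a class="itemRating"><span class="eggs r4"></span>(57)</a>',
               '<li class="price"><strong>129</strong><sup>.99</sup></li>')

extract_field(page, "<strong>[[:digit:]]+</strong><sup>.[[:digit:]]+</sup>")  # 129.99
extract_field(page, "eggs r[[:digit:]]")                                       # 4
extract_field(page, "[(][[:digit:]]+[)]</a>")                                  # 57
extract_field("<p>No reviews yet</p>", "eggs r[[:digit:]]")                    # 0
```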

And now I'm ready to make some plots!  First, let's take a look at price versus performance.

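# Collect the card data. Ratings and Buyers come from the same
# foreach loop as Prices, run with the rating and review-count
# regular expressions.
cardData <- data.frame(Name        = NameVsPerformance[, 1],
                       Performance = NameVsPerformance[, 2],
                       Price       = as.numeric(Prices),
                       Rating      = as.numeric(Ratings),
                       Popularity  = as.numeric(Buyers))

# Filter out the not found cards (price recorded as 0)
cardData <- cardData[!is.na(cardData[, "Price"]) & cardData[, "Price"] > 0, ]

# Value = performance points per dollar
cardData[, "Value"] <- cardData[, "Performance"] / cardData[, "Price"]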


# Look at price vs performance
plot(cardData[,c("Performance", "Price")])



Plot 1:  Price vs Performance


In plot 1, what we see is not surprising: an exponential (or at least nonlinear) increase in price as you go to higher and higher performance.  When I fit an exponential curve to the data, price roughly doubled for every 1250 performance points.  If price were simply proportional to performance, every card would offer the same value, and it really wouldn't matter which video card we picked.  There are good reasons to believe the relationship should not be linear:  at the bleeding edge of technology, the dual effect of customer enthusiasm and limited supply makes cards very expensive.  At the low end, sales depend on a limited number of customers who are looking for a component compatible with their older hardware or who care more about a really low absolute price than about value or performance.  When I buy a computer, I want to make sure I get the most for my money.  Let's take a look instead at value versus performance.
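I didn't show the fit above, so here is one way to do it, as a sketch rather than the exact fit I ran: assume price = a·exp(b·performance), take logs so that ordinary least squares (`lm()`) fits a straight line, and convert the slope into a doubling distance, log(2)/b. A least-squares fit on log(price) weights errors differently than a direct nonlinear fit with `nls()` would, so the two can disagree slightly. `doubling_distance()` is a hypothetical helper, and the check below uses synthetic prices, not my scraped data.

```r
# Sketch: fit log(price) = log(a) + b * performance by least squares;
# price then doubles every log(2) / b performance points.
# doubling_distance() is a hypothetical helper name.
doubling_distance <- function(performance, price) {
  fit <- lm(log(price) ~ performance)
  unname(log(2) / coef(fit)["performance"])
}

# On the scraped data (not run here):
# doubling_distance(cardData[, "Performance"], cardData[, "Price"])

# Sanity check on synthetic prices that double exactly every 1250 points
perf <- seq(500, 5000, by = 250)
doubling_distance(perf, 40 * 2^(perf / 1250))  # 1250
```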


Plot 2:  Value vs Performance


Plot 2 is again not surprising.  We see that older, lower-performance video cards cost more per point of performance because they may no longer be in production and customers are deciding based on compatibility or absolute price.  Meanwhile, the newest technology is also expensive because everyone wants the cool new item on the market but supply is limited.  In the middle, we see a lot of great deals to be had on the last generation of technology, where performance is great but prices have been dropped to make way for the new line.  This is a common phenomenon, visible in everything from cars to video cards: everyone knows that a good time to buy is when sellers are trying to make room for new stock!  Incidentally, the highest-value point you see is a refurbished GeForce GTX 465, which was out of stock but had been priced to move at $109.  Let it be known that these savvy buyers got a great deal!  What I really wanted to know, however, was whether I can trust other consumers to make the best possible choice in value.  Let's look at customer ratings and popularity versus value.
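One rough way to put a number on the "bargains in the middle" pattern is to split the cards into equal-sized performance tiers and compare the median value in each. This is a sketch of my own, not part of the analysis above: `value_by_tier()` is a hypothetical helper, and three tiers is an arbitrary choice. If the middle tier's median comes out highest, that matches the shape in plot 2.

```r
# Hypothetical helper: median value (performance per dollar) within
# equal-count performance tiers. Three tiers is an arbitrary default.
value_by_tier <- function(performance, value, tiers = 3) {
  breaks <- quantile(performance, probs = seq(0, 1, length.out = tiers + 1))
  tier <- cut(performance, breaks = unique(breaks),
              include.lowest = TRUE, labels = FALSE)
  tapply(value, tier, median)
}

# On the scraped data (not run here):
# value_by_tier(cardData[, "Performance"], cardData[, "Value"])
# And the single highest-value card (the refurbished GTX 465 in my data):
# cardData[which.max(cardData[, "Value"]), ]

# Toy numbers, just to show the output shape (low / middle / high tier)
value_by_tier(1:9, c(1, 2, 1, 5, 6, 5, 2, 1, 2))  # medians 1, 5, 2
```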

Plot 3:  Value vs Rating


From plot 3 it does appear that NewEgg's customer ratings correlate with value, but with considerable scatter (cards with a zero rating had no reviews at all).  There are no one- or two-egg rated products here because I chose the most popular example of each video card type, and nobody wants to buy a buggy graphics card!  Indeed, benchmark scores are not the only reason to buy a video card.  Maybe you're using an older computer that can't support the newest ones, or you don't care about video performance and just want a low price.  So, simply sorting by "best rating" may not be the best way to choose a video card because you don't know what the reviewers' expectations were in the first place.  Let's look at one more plot, value versus popularity, before making some closing remarks.
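To put a number on "correlate, but with considerable scatter", a rank correlation is one reasonable choice, since ratings come in whole eggs. A sketch, with `rating_value_cor()` as a hypothetical helper: it drops the zero-rated cards (which had no reviews) before calling `cor(..., method = "spearman")`. The toy vectors are invented for illustration.

```r
# Hypothetical helper: Spearman (rank) correlation between rating and value,
# ignoring cards with a rating of 0 because they had no reviews at all.
rating_value_cor <- function(rating, value) {
  keep <- rating > 0
  cor(rating[keep], value[keep], method = "spearman")
}

# On the scraped data (not run here):
# rating_value_cor(cardData[, "Rating"], cardData[, "Value"])

# Invented toy data: the unreviewed card (rating 0) is excluded
rating_value_cor(c(0, 3, 4, 5, 5), c(100, 1, 2, 3, 4))
```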

Plot 4:  Value vs Popularity (Number of Reviews)

Again, the results shown in plot 4 are all over the map.  Interestingly, the most popular video card (which turns out to be a Radeon HD 6850) is actually a really good value!  In fact, it's probably a better value than shown, because at the time I searched for this card a rebate brought it from $140 down to $125, which raises its value by a factor of 140/125 = 1.12!  This plot also shows another interesting phenomenon:  the fact that most video cards have few reviews suggests that a lot of people are swayed to purchase an item by the "most popular" sort.  For instance, the buyers of the second and third most popular video cards did not get a very good value for their money!  Some video cards appear to be runaway best sellers, while other excellent values are left in the dust.  So while the most popular card is a great value for the money, similar or better values do exist that a customer searching by popularity alone might overlook.
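To check the point about the second and third most popular cards directly, you can line up the popularity ranking against the value ranking. Another sketch: `popular_vs_value()` is a hypothetical helper that expects a data frame with the Name, Popularity and Value columns built above, and the toy data frame is invented for illustration.

```r
# Hypothetical helper: for the k most-reviewed cards, report each card's
# rank by value (1 = most performance per dollar).
popular_vs_value <- function(df, k = 3) {
  value_rank <- rank(-df[, "Value"], ties.method = "min")
  top <- order(df[, "Popularity"], decreasing = TRUE)[seq_len(min(k, nrow(df)))]
  data.frame(Name       = df[top, "Name"],
             Popularity = df[top, "Popularity"],
             ValueRank  = value_rank[top],
             stringsAsFactors = FALSE)
}

# Invented toy data, not NewEgg numbers
toy <- data.frame(Name = c("A", "B", "C", "D"),
                  Popularity = c(10, 50, 5, 30),
                  Value = c(3, 4, 1, 2),
                  stringsAsFactors = FALSE)
popular_vs_value(toy, k = 2)  # B is rank 1 by value, D only rank 3
```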

From this simple example, I've learned a few things about buying technology online:
  • Do your own research!  You don't know which factors mattered most to the reviewers when they bought their technology.
  • Very old technology and very new technology tend to be a worse value than last-generation bargains.
  • The "best rating" and "most popular" sorts at websites sway a lot of people into buying a product.  Don't jump off the proverbial cliff just because everyone else is, but looking at the outliers in popularity is a good place to start!
If I were an online retailer, I might add this:
  • Consider giving value-conscious consumers metrics other than what's popular to sort by: for instance, benchmark score per price.
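For what it's worth, such a sort is a one-liner once a value column exists. A sketch, with `sort_by_value()` as a hypothetical helper; using review count as a tie-breaker is my own arbitrary choice, and the toy data is invented.

```r
# Hypothetical helper: best performance per dollar first; among equal
# values, the more-reviewed card comes first.
sort_by_value <- function(df) {
  df[order(-df[, "Value"], -df[, "Popularity"]), ]
}

# Invented toy data, not NewEgg numbers
toy <- data.frame(Name = c("A", "B", "C"),
                  Value = c(10, 20, 20),
                  Popularity = c(5, 1, 9),
                  stringsAsFactors = FALSE)
sort_by_value(toy)$Name  # "C" "B" "A"
```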
And if I were an R programmer interested in some new ways to use data:
  • RCurl and regular expressions taken together provide a powerful tool to draw data from the web.
Finally, to answer my question:  Should the crowd pick my next video card?  Maybe not, but what's popular is definitely a good place to start looking!

2 comments:

  1. Curious whether your base performance table had ratings for any of the cards both alone and in CrossFire/SLI? Especially since the 6850 was the top value contender, it would be interesting to see where the CrossFire solution comes in... though you might have to factor in a more expensive PSU.

    1. True, any direct comparison would be complicated by the fact that you have to buy a motherboard that supports SLI and a more expensive PSU, but there's also the complicating factor that two cards working together pay an overhead cost for communication. Because of that extra hit to processing power, two cards together can never deliver more than twice the performance of one, and as such my value metric can only be reduced by SLI.

      I see SLI as something you turn to when you have specific needs that one card alone can't provide. Let's try an example. My friend buys a GeForce GTX 680 (4070 performance points at $500), and I'd like to outperform him on a budget. So, I'll try buying two Radeon HD 6850s (2751 performance points at $125 after rebate) and running them in CrossFire, AMD's counterpart to SLI. NVIDIA claims that it can get "up to" 90% efficiency with SLI, which is probably only realizable in tech demos; a 70% gain from the second card is more realistic. So, for $250, you could get 2751 x 1.7 ≈ 4677 performance points, cleanly beating the GeForce GTX 680.

      Before we get too excited about our savings, remember that we're also paying for a more powerful PSU, a motherboard that supports SLI, probably an improved cooling system because video cards run hot, a shorter lifetime for the cards (again because they're hot), and the additional power to run two cards instead of one over the whole lifetime of the computer. So, in the end it's probably a wash, and indeed the value metric has dropped by a factor of 1.7/2, to 85% of a single card's.
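      For anyone who wants to check these numbers, here is the arithmetic as a quick R sketch. All figures come from this reply, and the 70% gain from the second card is my own rough assumption, not a measurement.

```r
# Figures from this reply; the 70% scaling is an assumption, not a measurement
single_perf  <- 2751   # one Radeon HD 6850
single_price <- 125    # after rebate
scaling      <- 0.70   # assumed gain from adding a second card

pair_perf  <- single_perf * (1 + scaling)  # about 4677 points
pair_price <- 2 * single_price             # $250 for the cards alone

gtx680_perf  <- 4070
gtx680_price <- 500

pair_perf > gtx680_perf  # TRUE: the pair outperforms the GTX 680

# Value (points per dollar) of the pair relative to one card: 1.7 / 2
(pair_perf / pair_price) / (single_perf / single_price)  # 0.85
```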

      That said, the performance does cleanly beat that of the high-end card at half the price (for the cards alone). So, for anyone with specific performance needs, SLI can theoretically be a better value than high-end cards!
