Find the secrets to infinite income, and automate it!
30 Jun
We are looking for an exceptional PHP developer to join our talented team here at Connected Ventures. You will be working across our network of sites, which includes CollegeHumor, Bustedtees and TodaysBIGthing. Combined they do over 5 million page views a day and reach over 500,000 visitors. This is an on-site, full-time position at our office in Union Square, New York City. Email your resume and PHP code samples to techjobs@connectedventures.com. Please, no recruiters or development shops.
Requirements:
* PHP
* MySQL
* Strong understanding of Object Oriented Programming
* MVC Frameworks
* Caching / Scaling (Memcached, APC)
* Subversion
* E-commerce
* Javascript, HTML/CSS, XML
* Linux or Solaris experience a plus
What’s it like to work here?
It’s confusing when you can’t tell the difference between “work” and “play;” “friends and co-workers;” “office” and “cool place to hang out.” Confusing is about as bad as it gets here at Connected Ventures, where Nerf-gun fights happen, ‘casual Friday’ starts Monday at 10 and ends Friday at 6, and “work” means collaborating with your buddy at the dual screen computer next to yours. Now, don’t assume, however, that it’s all just fun and games here: the CollegeHumor family labors diligently all the time, they just happen to enjoy their business more than most people.
24 Apr

We launched some major changes to PriceAdvance yesterday. We improved merchant coverage from 15 websites to over 150 (full list), added support for shipping price, stock availability and rebate offers. Download it for Firefox (or IE) and leave a review!
CollegeHumor, like many websites that want to reduce database requests and speed up processing, uses memcached as a caching layer. This article will explain our software implementation and discuss a number of things we learned along the way. If you read my blog you will know I’m a PHP guy. CollegeHumor is also coded in PHP. None of the concepts covered in this post will be specific to any particular programming language. There will be a few simple PHP examples.
Let’s jump right into the class setup… Our cache class is a completely static class and can’t be instantiated on its own. When our application starts up a create_instance function is called which will create a connection to memcache in a static object stored in our class. Shortly after the instance is created, our config file is loaded and we add the servers that make up our memcache pool. Really nothing fancy happening thus far…
Besides the create_instance and add_server functions, our class has only three main public functions: get, set and delete. By main, I mean those are really the only three functions we use throughout our application; everything else is for internal use by the cache class. There are also a few logging and stats functions which aren’t important for this article.
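The setup described above might look roughly like the sketch below. Only the names create_instance and add_server come from this article; the Cache class name, the injectable $client parameter, the cache_bootstrap helper and the host names are assumptions made so the example can run without a memcache server.

```php
<?php
// Sketch only: everything except the names create_instance and add_server
// is an assumption made for illustration.
final class Cache
{
    // Shared client object, e.g. a pecl Memcache instance.
    private static $memcache = null;

    // Private constructor: the class is static-only.
    private function __construct() {}

    // Called once at start-up. A client may be injected (useful in tests);
    // otherwise a pecl Memcache object is created.
    public static function create_instance($client = null)
    {
        if (self::$memcache === null) {
            self::$memcache = ($client !== null) ? $client : new Memcache();
        }
    }

    // Adds one server from the config file to the pool.
    public static function add_server($host, $port = 11211)
    {
        return self::$memcache->addServer($host, $port);
    }
}

// Hypothetical bootstrap step; the host names passed in are placeholders.
function cache_bootstrap(array $servers, $client = null)
{
    Cache::create_instance($client);
    foreach ($servers as $server) {
        Cache::add_server($server['host'], $server['port']);
    }
}

$cache_servers = array(
    array('host' => 'cache1.example.com', 'port' => 11211),
    array('host' => 'cache2.example.com', 'port' => 11211),
);
```

In a real application, cache_bootstrap($cache_servers) would run once near the top of the request, before any get, set or delete call.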
Set Function
    public static function set($key, $value, $expire = -1) {
        // a negative expiration means this was not meant to be cached
        if ($expire < 0) return true;

        // there was no key passed, to be safe we will create one
        if (!strlen($key)) $key = md5($value);

        $value = self::falsify($value);
        self::instance_set($key, $value); // set the instance cache
        self::$memcache->set($key, self::dogpile_set($value, $expire), MEMCACHE_COMPRESSED, $expire + self::EXTENDED_LIFE); // set memcache
    }
Let me explain what’s going on and why… The set function takes 3 parameters: $key, $value and $expire.
1) Check the $expire value. If the expiration is less than zero (we default to -1) we assume this wasn’t meant to be cached and we just return true.
2) Make sure there was a $key passed. If there isn’t a key, we play it safe and set $key equal to the MD5 of the $value. Otherwise, there is a chance of key collisions. We definitely don’t want that.
3) We run the $value through a function called “falsify”. We found that storing the actual value of “false” caused a number of issues. The first being that we interface very closely with our database layer. If our get function returns false, the database layer assumes it didn’t find a value in cache and does a query (more on the solution to this in the get section). Second, the get function will return false when it fails to find data. The solution was the falsify function. It looks at the value and if it’s false the value is turned into a special string we can later identify. Something like “<-F4LS3->“. Basically a value that would never normally be stored. Later on in the get function we will use this to identify the difference between a stored false and not finding results.
4) We created a local instance cache that stores any data we get/set in a static array. This helps prevent unnecessary subsequent requests for data we may have already retrieved. We do this by passing the key/value to an instance_set function which stores the data by its key.
5) Sites with high loads are often subject to the dogpile effect. Basically, if something falls out of cache and two or more users make the same request and receive a miss from memcache, they end up performing the same operation (usually a database query) to refresh the data. For smaller sites the extra database work is fairly minor. But larger sites under heavy load often can’t afford this. Think about something falling out of cache on the homepage of a large site. This could generate hundreds of the same query, locking up tables and eventually bring the database to a crawl. I have seen it, it’s not that awesome.
The way we deal with this problem is by passing the value and expiration time to a function called dogpile_set. This function is simple and you will understand its purpose later on… All it does is create and return an array with two keys: “v”, which holds the $value, and “t”, which holds the time the data should expire (the current time plus $expire).

    private static function dogpile_set($data, $expire) {
        return array('t' => time() + $expire, 'v' => $data);
    }

6) Finally we can write our data to memcache! We set the key to the $key we were originally passed and the value to the array that was returned from dogpile_set. The expiration time is slightly more complicated. Our cache class has a constant called EXTENDED_LIFE, which we use in conjunction with the dogpile effect prevention. Ours is set to 300 seconds. This will be better explained in the get section below, bear with me for now. Set the expiration to the $expire value passed in plus EXTENDED_LIFE.
*It’s important to note that the version of data stored in the instance cache (Step 4) is the original data, not the dogpile array.
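The helpers used by set are not shown in this article, so here is a minimal sketch of how they could look, based only on the descriptions in the steps above. The CacheHelpers class name, the FALSE_IDENTIFIER constant name and the extra $now argument on dogpile_set (added so the arithmetic is repeatable) are assumptions.

```php
<?php
// Sketch of the helpers that set() relies on. Bodies are assumptions
// derived from the description in the article, not the original code.
final class CacheHelpers
{
    const FALSE_IDENTIFIER = '<-F4LS3->'; // placeholder stored instead of false
    const EXTENDED_LIFE = 300;            // extra seconds memcache keeps the entry

    private static $instance_cache = array();

    // Swap a literal false for the placeholder so it can be told apart from a miss.
    public static function falsify($value)
    {
        return ($value === false) ? self::FALSE_IDENTIFIER : $value;
    }

    // Reverse of falsify(): turn the placeholder back into false.
    public static function defalsify($value)
    {
        return ($value === self::FALSE_IDENTIFIER) ? false : $value;
    }

    // Per-request cache: remember what was already read or written.
    public static function instance_set($key, $value)
    {
        self::$instance_cache[$key] = $value;
    }

    public static function instance_get($key)
    {
        return isset(self::$instance_cache[$key]) ? self::$instance_cache[$key] : false;
    }

    // Wrap the value with the time at which the application treats it as stale.
    // $now is passed in here (the article's version calls time()).
    public static function dogpile_set($data, $expire, $now)
    {
        return array('t' => $now + $expire, 'v' => $data);
    }
}

// Worked example: caching a literal false for 600 seconds at time 1000000.
$now = 1000000;
$expire = 600;
$stored = CacheHelpers::falsify(false);              // '<-F4LS3->'
CacheHelpers::instance_set('user:1', $stored);       // instance cache keeps the raw value
$envelope = CacheHelpers::dogpile_set($stored, $expire, $now);
$memcache_ttl = $expire + CacheHelpers::EXTENDED_LIFE; // 900 seconds
```

The application considers the entry stale at 1000600, while memcache keeps it until 1000900. That 300 second gap is what the get side uses to serve stale data while one request refreshes it.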
Get Function
    public static function get($key, &$found = null) {
        if(is_array($key)) {
            // Not used for the purpose of this article
            // $value = self::get_complex($key);
        } else {
            $value = self::get_simple($key);
        }

        if($value === false) {
            $found = false;
            return false;
        } else {
            $found = true;
            if(is_array($value)) return $value; // complex response (not used in this article)
            else return self::defalsify($value); // simple response
        }
    }

The get function takes two parameters: The first is the $key and the second is an optional referenced parameter called &$found (defaults to null).
Our get function can take two types of keys. Either a single key request as a string or multiple key requests as an array. The first thing we do in the get function is determine the key type. If it’s a string we send the key off to the “get_simple” function. If it’s an array it goes to the “get_complex” function. For simplicity reasons we are only going to focus on the get_simple function.
If you do explore get_complex on your own, keep in mind that memcache can take an array of keys and retrieve those results in a single request. You will also need to keep track of what is and isn’t found from the instance cache. That way we can do lookups from memcache on the missed keys. There is an example of get_complex at the very end of this article; it should be easy to understand after you grasp get_simple.
Get Simple
    private static function get_simple($key) {
        $value = self::instance_get($key);

        // instance cache returned nothing, look it up
        if(!$value) {
            $value = self::dogpile_get($key, self::$memcache->get($key));
            if(!empty($value)) {
                // results were found in memcache, lets add it to the instance cache
                self::instance_set($key, $value);
            } else {
                // nothing was found, return
                return false;
            }
        }
        return $value;
    }

1) First the key is passed to the “instance_get” function. This function checks our static instance array to see if we have already performed a get or set for that key. If data is found, return it, otherwise return false.
We are back in the get_simple function now. If the value returned from instance_get isn’t false, we return the value back to “get”.
2) If the value returned from instance_get is false we need to ask memcache for the data, so that is the next call we make.
Whatever the response is from memcache we will pass the key and the data to the dogpile_get function.
    private static function dogpile_get($key, $data) {
        if(!empty($data)) {
            $value = $data['v'];
            if($data['t'] > 0) {
                if(time() >= $data['t']) {
                    $data['t'] = time() + self::DELAY; // Update the cache time
                    // set the stale value back to memcache for a short 'delay' so no one else tries to write the same data
                    self::$memcache->set($key, $data, MEMCACHE_COMPRESSED, self::DELAY);
                    return false;
                }
            }
            return $value;
        }
        return false;
    }

a. First we need to make sure the data isn’t empty. If it fails that check, either memcache had nothing for the key or the data is corrupt, and we return false. If everything looks good, extract the value from the “v” index and the time from the “t” index.
b. Make sure the “t” value is greater than 0. If it’s not, return the “v” value.
c. Check if the current time is greater than or equal to the “t” value. This is where our dogpile magic happens. The EXTENDED_LIFE we added earlier gives us an extra few minutes to let the application determine what’s stale and what’s valid. The “t” value is the actual time we want the data to expire even though we told memcache to store it for an extra few minutes. Since we got this far, it means the data is meant to expire. To prevent everyone and their mother from reloading the cache we are going to temporarily store the old stale data back in memcache while this user has the opportunity to refresh it.
We need to update the “t” value of the data array that was passed in to be the current time + DELAY. DELAY is a constant integer; we have ours set to 30 seconds (change as needed). This delay is our dogpile buildup prevention. You will see how in a moment… Now we write the stale data back to memcache. Set the key to the $key that was passed in, the value to our updated $data array and the expiration to DELAY. Now for the next 30 seconds all subsequent requests will receive the stale data (shouldn’t be a big deal for most sites). The user who found the stale data now has 30 seconds to complete an update. If she fails to do so, the next user to make the request will have that opportunity. Return false, because we want the request to think we found nothing and fetch fresh data.
3) Now that we have our response and have dealt with dogpiling we need to analyze the response. If the value from get_simple is false we need to set &$found to false and return false. The found variable does exactly as you probably guessed. It tells us if the get function really found something (even if the value was stored as false). It’s a referenced variable, so we can easily use it outside of the cache class.
If the response is not false we set &$found to true and pass the data off to the defalsify function, which compares the value to our constant “false identifier” (<-F4LS3->). If the value equals the identifier, return false. Now the “get” function can finally return the results.
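To see the timing play out, here is a small simulation of the steps above. It is a sketch under stated assumptions: FakeStore is a hypothetical in-memory stand-in for memcache, the clock is passed in as $now instead of calling time(), and demo_set/demo_get are simplified versions of set and get that skip the instance cache and falsify steps.

```php
<?php
// Simulation of the stale-while-refresh behaviour. FakeStore, demo_set and
// demo_get are hypothetical names; the logic mirrors dogpile_set/dogpile_get.
final class FakeStore
{
    private $items = array();

    public function set($key, $value, $ttl, $now)
    {
        $this->items[$key] = array('value' => $value, 'dies_at' => $now + $ttl);
    }

    public function get($key, $now)
    {
        if (!isset($this->items[$key]) || $now >= $this->items[$key]['dies_at']) {
            return false; // behaves like a memcache miss
        }
        return $this->items[$key]['value'];
    }
}

const EXTENDED_LIFE = 300; // seconds memcache keeps data past its real expiry
const DELAY = 30;          // seconds one request gets to refresh stale data

function demo_set(FakeStore $store, $key, $value, $expire, $now)
{
    $envelope = array('t' => $now + $expire, 'v' => $value);
    $store->set($key, $envelope, $expire + EXTENDED_LIFE, $now);
}

// Returns the value, or false when the caller should fetch fresh data.
function demo_get(FakeStore $store, $key, $now, &$found = null)
{
    $data = $store->get($key, $now);
    if (empty($data)) {
        $found = false;
        return false;
    }
    if ($data['t'] > 0 && $now >= $data['t']) {
        // Stale: park the old value for DELAY seconds so other requests
        // keep getting it, and tell this caller to refresh.
        $data['t'] = $now + DELAY;
        $store->set($key, $data, DELAY, $now);
        $found = false;
        return false;
    }
    $found = true;
    return $data['v'];
}

$store = new FakeStore();
demo_set($store, 'homepage', 'old html', 600, 0);

$a = demo_get($store, 'homepage', 599, $found_a); // still fresh
$b = demo_get($store, 'homepage', 601, $found_b); // first request after expiry
$c = demo_get($store, 'homepage', 602, $found_c); // request arriving during the delay
$d = demo_get($store, 'homepage', 640, $found_d); // nobody refreshed within DELAY
```

Here $a and $c are 'old html' while $b and $d are false. Only the request at 601 is told to refresh; the one at 602 is served the stale copy. If no refresh happens within DELAY, the entry drops out of the store and the request at 640 is a plain miss.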
Delete Function
The delete functionality is simple compared to the get/set functionality. The goal is to delete the data from the instance cache and from memcache. This is easily shown with an example.
    public static function delete($keys) {
        if(!is_array($keys)) $keys = array($keys);

        if($keys) {
            foreach($keys as $key) {
                self::$memcache->delete($key);
                self::instance_delete($key);
            }
        }
    }
Conclusion
I hope this article gives you a better insight into some more advanced caching techniques. Neither I nor CollegeHumor claims to have invented any of these techniques or methods. This was a high level overview of how our caching layer functions and I hope someone learned something from it. I know it might be difficult to follow all the operations; unfortunately I can’t post the entire class code. But if you have any questions about implementation, caching or anything PHP, feel free to contact me or post a comment.
Get Complex
    private static function get_complex($keys) {
        $results = array();
        $missing = array();
        $num_keys = count($keys);

        // lets see if we can find any of these in the instance cache first
        for($x = 0; $x < $num_keys; $x++) {
            $key = $keys[$x];
            $value = self::instance_get($key);
            if(!$value) {
                // nothing was found in the instance cache, lets create a list to look them up in memcache
                $missing[] = $key;
            } else {
                // we found what we were looking for in instance cache!
                $results[$key] = self::defalsify($value);
            }
        }

        if(empty($missing)) return $results; // we found everything we need in instance cache

        // look up anything thats missing in memcache
        $values = self::$memcache->get($missing);
        if(!empty($values)) {
            foreach($values as $key => $value) {
                $value = self::dogpile_get($key, $value);
                if($value) {
                    $results[$key] = self::defalsify($value);
                    self::instance_set($key, $value);
                }
            }
        } else {
            // we didnt find what we needed in memcache
            return false;
        }
        return $results;
    }
12 Feb
In one of my recent posts someone asked in the comments what I thought about Drupal. It felt off topic, so I decided to write a post about Content Management Systems.
I personally don’t have much experience with open source (or enterprise) CMS. I had a short stint with Joomla and eventually ditched it. I have become a bit of a development control freak. More often than not I prefer to roll my own system. This isn’t for everyone and doesn’t make sense for all projects.
My problem with pre-designed Content Management Systems is that they try to solve a problem that can’t always be solved with a single solution. A fits-everything package usually comes with overhead, constraints and a little bit of a development black hole. These constraints can lead to excessive amounts of “work arounds”, hacks and a lack of understanding. On the other hand, a pre-packaged solution comes with an entire community of developers, tutorials, patches and plugins. Having resources like that can save time and money. It really comes down to what’s best for your project.
Choosing to build your own system or use an existing one depends on the scope of the project and your skill level. These are a few key points I would pay attention to while making a decision:
For a site like CollegeHumor, we needed to understand and control every line of code. Everything from the framework to the content management application is home grown. But like I said before, this doesn’t make sense for everyone. If you are building a small intranet site and can do the bulk of the work in a CMS, there is no reason to reinvent the wheel. If your site handles several million hits a day, you may not want the extra overhead that a CMS comes with, and rolling your own may make sense. Evaluate what you are working with and make a choice that best fits your project.
One of the best Facebook features is the ability to invite friends to events. The most annoying part about this system is the inability to invite more than 100 friends at a time (Facebook: where’s the invite all!?). In the past I have clicked one by one and sent invites in blocks of 100. Today was the day that I had enough! I made a bookmarklet to automate the bulk of the process. Unfortunately it can’t do all the work for you, but it will automatically add users to the invite in blocks of 100 with a single click. Here’s how it works…
Step 1) Log in to Facebook and browse to the event “Invite people to come” page.
2) Set up a new bookmarklet with this code (or optionally paste in the address bar on that page):
javascript:var e=document.createElement('script');e.setAttribute('type','text/javascript');e.setAttribute('src','http://ajax.googleapis.com/ajax/libs/prototype/1.6.0.2/prototype.js');document.body.appendChild(e);setTimeout(function(){var friends=$$('#friends_list span input');var offset=parseInt(prompt('Please enter an offset'),10);if(isNaN(offset)||offset>=friends.length){alert('All done!');return;}for(var x=offset;x<offset+100&&x<friends.length;x++){friends[x].onclick();}},3000);void(0);
3) Click the bookmarklet and enter 0 when prompted. (0 represents your current offset, you will need to keep track of this.)
4) Click invite
5) Repeat and increase the offset by 100 until all friends have been sent invites
26 Jan
The other day on College Humor and Bustedtees we discovered a fairly serious security vulnerability. Fortunately, because of the layout of our code, nothing malicious could be exploited (more in another post). We thought our “push” script was skipping .svn folders, but it turned out not to be operating correctly.
The hack is simple, documented and easily overlooked. Once the vulnerability was found, I did my best to exploit the shit out of it. I did so very successfully. I even tried it on some other popular websites and was able to access files I should have never been able to. In one instance I gained limited access to a site’s admin. I emailed all of these sites to notify them of the security vulnerability. They were most gracious; one company even sent me a gift card!
The hack obviously starts in the .svn directory, specifically at the entries file. You can access this file by browsing to:
http://www.somedomain.com/.svn/entries?
This document contains all of the files and folders svn manages in that directory. In some instances you can locate admin directories and the same thing applies…
http://www.somedomain.com/admin/.svn/entries?
So at this point all you have are a bunch of file names. Sometimes you can get some fun information and access to files that were meant to be hidden. Security by obscurity is not a solution, protect files you don’t want the public to access!
Now this is where things get interesting… Any file that has been checked in I can now execute. Either directly or through an svn folder that holds file revisions. Pick any file in the list and browse to:
http://www.somedomain.com/.svn/text-base/filename.php.svn-base
In this example the PHP file will be put through the PHP parser and executed. The results really depend on the layout of the code. How much access and what kind of output you get depends on the way the coder uses includes/requires. If a file is included using a relative path, the includes won’t be included since your working directory is the text-base dir. If they are using absolute paths, includes will continue through the execution. In one of the sites I poked around in, I found their admin wrapped through some kind of lite template/framework. I was able to bypass the system and go directly to the file without using a password. From there I had limited additional actions, but I still gained access to where I wasn’t welcome.
To do some additional testing I set up a test site to play with other file types. I found that files without a PHP extension, for example .inc files, were NOT parsed and instead the contents were spit out to the page. In this test case the .inc file contained passwords and locations to databases. The possible additional damage I could cause from here is endless…
I’m not the first one to discover this hack, although a quick search only revealed obvious prevention methods. Protecting your site is really simple. Add this to your .htaccess file:
RewriteRule (\.svn)/(.*?) - [F,L]
Another option is blocking .svn folders through your web server config file for all sites.
Update
A number of people have mentioned a better prevention technique… They recommend doing an SVN export instead of a checkout or rsync. This was something I thought about after discovering the exploit. But I am by no means a system admin or the person who deals with that stuff at work. I’m glad these people were able to confirm that idea. Thanks!
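Whichever prevention method you pick, it can help to confirm that no .svn folders actually made it into the deployed tree. The following is a minimal sketch of such a check; find_svn_dirs is a hypothetical helper written for this post, not part of our push script, and the path in the usage note is a placeholder.

```php
<?php
// Returns the paths of all .svn directories found under $root.
// Hypothetical helper: it only reports, it does not delete anything.
function find_svn_dirs($root)
{
    $found = array();
    $iterator = new RecursiveIteratorIterator(
        // SKIP_DOTS skips "." and ".." only; dot-folders such as .svn are still visited
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST
    );
    foreach ($iterator as $path => $info) {
        if ($info->isDir() && $info->getFilename() === '.svn') {
            $found[] = $path;
        }
    }
    sort($found);
    return $found;
}
```

Calling find_svn_dirs('/path/to/docroot') after a push should return an empty array; anything else is a folder the web server may be willing to serve.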
    function array_flatten_recursive($array) {
        if($array) {
            $flat = array();
            foreach(new RecursiveIteratorIterator(new RecursiveArrayIterator($array), RecursiveIteratorIterator::SELF_FIRST) as $key=>$value) {
                if(!is_array($value)) {
                    $flat[] = $value;
                }
            }
            return $flat;
        } else {
            return false;
        }
    }

    $array = array(
        'A' => array('B' => array( 1, 2, 3, 4, 5)),
        'C' => array( 6,7,8,9)
    );

    print_r(array_flatten_recursive($array));
– Response:
Array (
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
[7] => 8
[8] => 9
)
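For comparison, the same flattening can probably be done with PHP’s built-in array_walk_recursive, which calls a callback for every non-array element. This is an alternative sketch, not the function above; array_flatten_walk is a hypothetical name, and unlike the original it returns an empty array (not false) for empty input.

```php
<?php
// Alternative flatten using array_walk_recursive.
// The callback collects each leaf value into $flat by reference.
function array_flatten_walk(array $array)
{
    $flat = array();
    array_walk_recursive($array, function ($value) use (&$flat) {
        $flat[] = $value;
    });
    return $flat;
}

$nested = array(
    'A' => array('B' => array(1, 2, 3, 4, 5)),
    'C' => array(6, 7, 8, 9)
);
$flat = array_flatten_walk($nested); // the values 1 through 9, in order
```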

The holiday shopping season has been really good to us!
14 Dec
“For the holiday season through December 12, $19.44 billion has been spent online, essentially the same level compared to the corresponding days last year. For the twelve days beginning with December 1 (Cyber Monday), the kick-off to the heaviest part of the online shopping season, sales totaled $8.26 billion, up 3 percent versus year ago. However, the most recent work week (December 8-12) saw e-commerce sales decline by a marginal 1 percent, although December 9 emerged as the highest online spending day on record.”
— comScore - regarding online sales in 2008
via (Fred Wilson)