Introduction to memcached
Matt Sergeant <matt@sergeant.org>
Memory Based Object Caching
The Problem
- Databases are slow
- Requires disk spin
- Requires SQL parsing
- Requires DBI/ODBC/JDBC calls
- Memory is fast
- But we want MORE space
- Solution: use all machines in a cluster
Memcached
- key -> blob cache
- Simple network protocol "get key\r\n"
- Runs on all machines in a cluster
- Clients select "right" server to query
- Used by:
- LiveJournal
- Facebook
- Slashdot
- Wikipedia
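To make the "simple network protocol" point concrete, here is a minimal sketch of the text protocol for a single fetch. Commands are lowercase on the wire, and a hit comes back as a "VALUE <key> <flags> <bytes>" header, the data, then "END". The helper names build_get and parse_get_reply are made up for illustration and are not part of any client library.

```perl
use strict;
use warnings;

# Build the request line for a single-key fetch.
sub build_get {
    my ($key) = @_;
    return "get $key\r\n";
}

# Parse a complete reply to "get" and return a hashref of key => data.
# A miss is simply a reply with no VALUE block before END.
sub parse_get_reply {
    my ($reply) = @_;
    my %found;
    while ($reply =~ /\GVALUE (\S+) (\d+) (\d+)\r\n/gc) {
        my ($key, $flags, $bytes) = ($1, $2, $3);
        $found{$key} = substr($reply, pos($reply), $bytes);
        # Skip over the data block and its trailing \r\n.
        pos($reply) = pos($reply) + $bytes + 2;
    }
    return \%found;
}

my $hit = parse_get_reply("VALUE user_exists:matt 0 1\r\n1\r\nEND\r\n");
print "cached value: $hit->{'user_exists:matt'}\n";
```

The length-prefixed data block is what lets values be arbitrary blobs, including ones that contain "\r\n".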
Running Memcached
memcached -d -m 128 -p 11211
Simply start it on all servers.
Using memcached
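sub does_user_exist {
    my ($name) = @_;
    # Fast path: a previous lookup already found this user
    if ($::Cache->get("user_exists:$name")) {
        return 1;
    }
    # Cache miss: fall back to the slow LDAP search
    my $search = $::LDAP->search(base   => "ou=People,dc=company,dc=net",
                                 scope  => 'sub',
                                 filter => "(uid=$name)",
                                 );
    # Negative results are not cached, so unknown users always hit LDAP
    if ($search->is_error || ($search->count == 0)) {
        return 0;
    }
    # Remember the positive result for next time
    $::Cache->set("user_exists:$name", 1);
    return 1;
}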
It's that simple?
- Yes
- You can also do $memd->get_multi(@keys)
- And $memd->add($key, $value), which only sets if unset
- And $memd->replace($key, $value), which only sets if already set
- Also has atomic incr/decr
- No support for key iteration
- Couldn't be O(1) so very bad idea to implement
How it all works (client)
- Client selects server based on key
$server = $servers[ hash($key) % num_servers ];
- It's that simple!
- As long as all clients use same algorithm
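The selection rule above can be sketched in a few lines. This is an illustration only: the server addresses are placeholders, and hash_key uses the first four bytes of an MD5 digest as a stand-in hash, which is an assumption and not the function any particular client library uses.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Placeholder server list; every client must use the same list in the same order.
my @servers = ('10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211');

# Stand-in hash: first 4 bytes of the MD5 digest as an unsigned 32-bit integer.
sub hash_key {
    my ($key) = @_;
    return unpack('N', md5($key));
}

# Pick the server for a key: hash, then modulo the number of servers.
sub server_for {
    my ($key, @list) = @_;
    return $list[ hash_key($key) % @list ];
}

print "user_exists:matt lives on ", server_for('user_exists:matt', @servers), "\n";
```

Because the choice depends only on the key and the server list, any client that shares both will go to the same server without any coordination.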
How it all works (server)
- Hash storage of key => value + expiry + LRU chain
- When key fetched:
- return NULL and delete if expired
- else move to end of LRU chain and return value
- When key set:
- If no memory left, expire from top of LRU chain
- Insert at end of LRU chain
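The get/set rules above can be sketched as a toy in-process cache. This is not how the server is written: TinyLRU is a hypothetical class, a plain array stands in for the linked LRU chain (so moves are O(n) here), capacity is counted in items instead of bytes, and expiry is an absolute timestamp passed in by the caller.

```perl
use strict;
use warnings;

package TinyLRU;

sub new {
    my ($class, $max_items) = @_;
    return bless { max => $max_items, data => {}, chain => [] }, $class;
}

# Remove a key from both the hash and the chain.
sub _drop {
    my ($self, $key) = @_;
    delete $self->{data}{$key};
    @{ $self->{chain} } = grep { $_ ne $key } @{ $self->{chain} };
}

# Fetch: expired items are deleted and reported as a miss,
# live items move to the end of the chain (most recently used).
sub get {
    my ($self, $key, $now) = @_;
    $now = time unless defined $now;
    my $item = $self->{data}{$key};
    return undef unless $item;
    if ($item->{expires} && $item->{expires} <= $now) {
        $self->_drop($key);
        return undef;
    }
    @{ $self->{chain} } = ((grep { $_ ne $key } @{ $self->{chain} }), $key);
    return $item->{value};
}

# Store: when full, evict from the top of the chain, then append.
sub set {
    my ($self, $key, $value, $expires) = @_;
    $self->_drop($key) if exists $self->{data}{$key};
    while (@{ $self->{chain} } >= $self->{max}) {
        my $oldest = shift @{ $self->{chain} };
        delete $self->{data}{$oldest};
    }
    $self->{data}{$key} = { value => $value, expires => $expires || 0 };
    push @{ $self->{chain} }, $key;
    return 1;
}

package main;

my $cache = TinyLRU->new(2);
$cache->set(a => 1);
$cache->set(b => 2);
$cache->get('a');       # "a" becomes the most recently used item
$cache->set(c => 3);    # cache is full, so "b" is evicted from the top
```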
All operations atomic
- Refcount on items
- Refcount++ on GET, Refcount-- when "delivered"
- Expire/delete is just "refcount--"
- If refcount is <= zero at end of operation, really delete
Increment and Decrement
$memd->incr("key")
$memd->decr("key")
- kinda ugly - it doesn't do "set if unset":
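my $ret = $::MemCache->incr($key, $value);
if (!defined $ret) {
    # Key not set yet: add() only succeeds if it is still unset,
    # so a concurrent client cannot have its counter overwritten.
    $value = 1 unless defined $value;
    $ret = $::MemCache->add($key, $value)
        ? $value
        : $::MemCache->incr($key, $value);
}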
Use Persistent Connections
$::Memd = Cache::Memcached->new(...);
- Servers will be marked dead if you run out of connections
- memcached -c 1024
- Use a high -c value for a big service
- epoll/kqueue will take care of the details
Unusual Usage
- set("seen_$ip" => $filename)
- LRU takes care of expiry
- Command line tool to find spams from a given IP
Unusual Usage
- incr("country_CN", 1)
- Counts countries we see spam from
How we could use it at ML
- incr("spam_$IP", 1)
- incr("ham_$IP", 1)
- Next time we get an email from $IP we check the spam/ham ratio
- set("virus_$MD5", 1) (or incr)
- Next time we see that MD5, check the cache
Complex Issues
- Security (or lack thereof)
- Server list maintenance
- Server availability
- Cache migration on server list change
Basic Hashing
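A rough way to see why basic modulo hashing makes server list changes painful is to count how many keys change slot when a server is added. The sketch below assumes a stand-in hash (first four bytes of MD5) and made-up key names; slot and moved_fraction are illustrative helpers.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Stand-in hash: first 4 bytes of the MD5 digest as an unsigned 32-bit integer.
sub hash_key { return unpack('N', md5($_[0])) }

# Index of the server a key maps to when there are $n servers.
sub slot {
    my ($key, $n) = @_;
    return hash_key($key) % $n;
}

# Fraction of keys whose slot changes when growing from $old to $new servers.
sub moved_fraction {
    my ($old, $new, @keys) = @_;
    my $moved = grep { slot($_, $old) != slot($_, $new) } @keys;
    return $moved / @keys;
}

my @keys = map { "user_exists:user$_" } 1 .. 1000;
printf "%.0f%% of keys moved going from 4 to 5 servers\n",
    100 * moved_fraction(4, 5, @keys);
```

With a uniform hash, a key keeps its slot only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about one key in five, so roughly four fifths of the cache is effectively lost on such a change.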
Consistent Hashing
- Servers map to a continuous line 0..1
- Keys map onto that line
- Closest server (binary search) is the one used: O(log N)
- (demo on whiteboard)
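In place of the whiteboard demo, here is a minimal sketch of the idea. It makes several assumptions: each server gets a single point on the line, the point comes from a stand-in hash (first four bytes of MD5 scaled to 0..1), and "closest" is taken to mean the first server at or after the key's point, wrapping around at the end, which is a common variant. The names point, build_ring and lookup are illustrative.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Map a string to a point in [0, 1).
sub point { return unpack('N', md5($_[0])) / 2**32 }

# Sorted list of [point, server] pairs.
sub build_ring {
    my @servers = @_;
    return [ sort { $a->[0] <=> $b->[0] } map { [ point($_), $_ ] } @servers ];
}

# Binary search for the first server at or after the key's point: O(log N).
sub lookup {
    my ($ring, $key) = @_;
    my $p = point($key);
    my ($lo, $hi) = (0, scalar @$ring);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($ring->[$mid][0] < $p) { $lo = $mid + 1 }
        else                       { $hi = $mid }
    }
    # Past the last server: wrap around to the first one.
    return $ring->[ $lo % @$ring ][1];
}

my $ring = build_ring(map { "10.0.0.$_:11211" } 1 .. 4);
print "user_exists:matt lives on ", lookup($ring, 'user_exists:matt'), "\n";
```

The payoff is that adding a server only takes over the keys that fall just before its point; every other key stays where it was.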
Monitoring
- "stats" command gives counts of:
STAT uptime 2398603
STAT rusage_user 1.029843
STAT rusage_system 89.927328
STAT curr_items 0
STAT bytes 0
STAT curr_connections 1
STAT connection_structures 2
STAT cmd_get 0
STAT cmd_set 0
STAT get_hits 0
STAT get_misses 0
STAT evictions 0
STAT bytes_read 7
STAT bytes_written 0
STAT limit_maxbytes 134217728
STAT threads 4
Monitoring Stats
- Point in time only
- Need to produce mrtg tracking of this
- Monitor mrtg to know when your caches are full
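As a starting point for that tracking, the "stats" output can be reduced to a few numbers worth graphing. This is a sketch under assumptions: parse_stats and summarize are hypothetical helpers, the input is the raw text of a "stats" reply, and the sample numbers are invented for the example.

```perl
use strict;
use warnings;

# Turn "STAT name value" lines into a hashref of name => value.
sub parse_stats {
    my ($text) = @_;
    my %stat;
    for my $line (split /\r?\n/, $text) {
        $stat{$1} = $2 if $line =~ /^STAT (\S+) (\S+)/;
    }
    return \%stat;
}

# Derive the ratios that show whether the cache is effective and how full it is.
sub summarize {
    my ($s) = @_;
    my $gets = $s->{get_hits} + $s->{get_misses};
    return {
        hit_ratio  => $gets ? $s->{get_hits} / $gets : 0,
        fill_ratio => $s->{limit_maxbytes} ? $s->{bytes} / $s->{limit_maxbytes} : 0,
        evictions  => $s->{evictions},
    };
}

# Invented sample reply, in the same format as the output shown earlier.
my $sample = join "\r\n",
    'STAT get_hits 75', 'STAT get_misses 25', 'STAT evictions 3',
    'STAT bytes 67108864', 'STAT limit_maxbytes 134217728', 'END', '';
my $summary = summarize(parse_stats($sample));
printf "hit ratio %.2f, fill ratio %.2f, evictions %d\n",
    @{$summary}{qw(hit_ratio fill_ratio evictions)};
```

Since the counters are point-in-time totals, a grapher would sample them periodically and plot the differences; a fill ratio near 1 together with a rising evictions count suggests the cache is full.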