Emulating enums in Python

Enums are common constructs in languages like C++ and Java. They can improve readability (and maybe performance?) when dealing with status codes and with conditional statements over a fixed set of constants, such as switch-case statements (which, as a matter of fact, Python didn't have either until the match statement arrived in Python 3.10 :P ).

For instance, a days-of-the-week enumeration in Java would look like this:

enum DaysOfWeek {
        MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY
}

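In Java, each constant of the enumeration is actually an object of type DaysOfWeek, but it also has an ordinal position starting from 0: MONDAY.ordinal() is 0, TUESDAY.ordinal() is 1, WEDNESDAY.ordinal() is 2, and so on. (In C and C++, enum constants are plain integers with exactly these values.)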

This enum could be used like this, for example:

void parseDay(DaysOfWeek day){
    // Java compares enum constants by identity; an int cannot be compared to them
    if(day == DaysOfWeek.MONDAY){
        //do something
    } else {
        //do something else
    }
}

I always wondered how to use enums in Python. Older versions of the language don't have a built-in enum type (the enum module only arrived in the standard library with Python 3.4), but I found a very simple, elegant and Pythonic way to achieve a similar effect. The solution is shown below:

(MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)

What's happening here? On the left-hand side of the statement, a tuple of variables named after the days of the week is assigned, through tuple unpacking, the values produced by the right-hand side. Instead of manually writing the list [0, 1, 2, 3, 4, 5, 6], the built-in range function is used. Quick and simple.

The parseDay function in Python would look like this:

def parse_day(day_code):
    if day_code == MONDAY:
        pass  # do something
    else:
        pass  # do something else
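Here is a small runnable version of the trick, with a parse_day that actually returns something. The labels it returns, the DAY_NAMES tuple and the Day class are illustrative additions, not part of the original tip; Day uses the standard-library enum module, which only exists from Python 3.4 on.

```python
from enum import IntEnum

# Tuple unpacking: range(7) yields 0..6, bound to the names in order.
(MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)

# Hypothetical reverse-lookup table: the range() trick keeps no names itself.
DAY_NAMES = ("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY")


def parse_day(day_code):
    """Return a label for a day code produced by the range() trick."""
    if day_code == MONDAY:
        return "start of the week"
    return DAY_NAMES[day_code].lower()


# For comparison: the standard-library equivalent (Python 3.4+).
class Day(IntEnum):
    MONDAY = 0
    TUESDAY = 1
    WEDNESDAY = 2
    THURSDAY = 3
    FRIDAY = 4
    SATURDAY = 5
    SUNDAY = 6


print(parse_day(MONDAY))                   # start of the week
print(parse_day(FRIDAY))                   # friday
print(Day(4).name, Day.FRIDAY == FRIDAY)   # FRIDAY True
```

Because IntEnum members compare equal to plain integers, code written against the range() constants keeps working if you later switch to the enum module.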

This tip comes from the book Dive Into Python (page 24), which is a pretty good guide to Python.

Posted in Uncategorized | 4 Comments

A Glimpse at Storage Area Networks

Traditional approach for storage - Information Islands

The conventional approach to storage in organizations is to have a set of application servers, each with its own storage, interconnected through a LAN. In this approach, data is shared between application servers using ad-hoc methods that vary according to each application's requirements. Additionally, the same LAN that is used to transfer data is also used to exchange application messages with clients. Some of the problems of the conventional approach are:

  • Unconnected islands of information
  • Duplication of data
  • Inconsistency between copies
  • Extra operational complexity to replicate and manage replicated data

Storage Area Network - Eliminating Islands of information

To address these problems, a novel approach to data storage was proposed: Storage Area Networks (SANs). The key idea of SANs is to decouple application servers from storage. The goal is to provide universal interconnection between computers and storage devices, enabling data sharing and better data management. This is typically achieved with a dedicated network infrastructure, usually based on Fibre Channel.

SAN is a generic term and can refer to many technologies, protocols, hardware and software. A formal definition of SAN (extracted from [1]) is:

“A storage area network (SAN) is a network of interconnected computers and data storage devices. The term SAN specifically includes both the interconnection infrastructure and the computers and storage that it links together.”

Benefits of SAN

  • Cost savings
    • No extra temporary space to stage data
    • Lower costs for tape drives and robotic media handlers, since they can now be transparently shared among many applications/services
    • Lower operational costs of copying data from place to place, etc.
  • Operational benefits
    • There's no need to devise and schedule data transfers between pairs of servers.
    • There's no need to worry about whether copies of data being used by two computers running different applications are synchronized (i.e., have exactly the same contents), because the two computers are working from the same copy of data.
    • Data can be distributed across different zones, since storage is decoupled from application servers
    • Removes I/O traffic from the LAN, increasing performance of existing applications
    • Simplifies backup, since it can be done on the SAN side instead of in the applications

Desired properties of a SAN

  • A SAN must be highly available, because it stores highly critical data.
  • A SAN must be scalable: its I/O performance must grow as the number of interconnected devices grows.

SAN – Software vs Hardware

“Hardware makes SANs possible; software makes SANs happen” [1]

SANs are typically associated with hardware and networking technologies; however, it's only possible to take full advantage of highly available, high-performance SANs through software. Some of the capabilities that can be leveraged through software are:

  • Sharing tape drives and online storage devices
  • Application failover
  • Sharing data between different clients and servers
  • Direct Data Movement between Devices

SAN vs Network-Attached Storage (NAS)

SANs are often confused with NAS, since both involve sharing data across many storage devices over a network. However, NAS normally operates at the file level, while a SAN goes further and operates at the block level. With a SAN, the file systems reside entirely on the client/application machines, which access blocks through the network. Another difference is that current NAS solutions are based on Ethernet and TCP/IP, while SANs are typically based on dedicated Fibre Channel networks. As a rule of thumb:

  • Choose NAS for simplicity of data sharing, particularly among computers and operating systems of different types.
  • Choose SAN for the highest raw I/O performance between data client and data server. Be prepared to do some additional design and operational management to make servers cooperate (or at least not interfere) with each other.
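To make the file-level vs block-level distinction concrete, here is a minimal sketch in Python. It assumes a 4KB block size and uses an ordinary temporary file as a stand-in for a raw block device (a real SAN volume would be a device node on the client); the function names are hypothetical.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for this sketch


def read_file(path):
    """File-level access (NAS-style): name a file and let the file system
    that owns it work out where its bytes live."""
    with open(path, "rb") as f:
        return f.read()


def read_block(volume_path, block_no, block_size=BLOCK_SIZE):
    """Block-level access (SAN-style): ask for block N of a volume; mapping
    files to blocks is the job of the client's own file system."""
    with open(volume_path, "rb") as volume:
        volume.seek(block_no * block_size)
        return volume.read(block_size)


# A regular temporary file stands in for a raw block device.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
try:
    print(read_block(tmp.name, 1)[:4])  # b'BBBB'
    print(len(read_file(tmp.name)))     # 8192
finally:
    os.remove(tmp.name)
```

With block-level access, the client sees only numbered blocks; everything that turns those blocks back into files happens on the client side, which is what lets a SAN serve the file systems of many different hosts.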

The future

Even though SAN is not a new concept, its potential has not been completely explored. There is still a lot of research and development going on, and companies such as EMC, Nexenta and NetApp, among others, are exploring this potential. Some of the upcoming challenges and solutions in this area involve:

  • Interoperability between existing solutions and adoption/improvement/creation of standards
  • Smarter Storage and Appliances
  • Support for Heterogeneous Computer Systems
  • Data Storage as a Service / Cloud Storage

This text is based on [1]. Learn more at: http://snia.org/education/storage_networking_primer/san

References

[1] "What Storage Networking Is and What It Can Mean to You", in Storage Area Network Essentials, by Richard Barker and Paul Massiglia. Copyright © 2002, John Wiley & Sons, Inc., New York. Used by permission.


Enabling deduplication in a distributed object storage

In this post I will describe the initial architecture of a distributed object storage system that supports deduplication at both the object level and the block level. This architecture is one of the main contributions of my master's thesis on distributed content-addressable storage over a consistent hashing ring.

I have chosen OpenStack Object Storage (Swift) as the base storage system for my prototype because its architecture is based on consistent hashing and its codebase is open source. Moreover, it's a prominent cloud infrastructure project used by many organizations, and I would like to contribute the outcome of this work back to Swift by enabling deduplication functionality.

However, this work is intended to be generic and the techniques used in it may be applied to any other object storage system based on consistent hashing (Dynamo-like, DHTs, etc).

Terminology

Object: By object I refer to any arbitrary-sized binary object (blob), such as text files, images, virtual machine images, etc.

Block: A binary chunk of data of fixed or variable size that is usually part of a larger object. In traditional file systems, block sizes are typically small (e.g. 4KB) and optimized for the underlying physical storage (such as a hard disk).

Consistent hashing: Technique to partition a keyspace among a distributed set of nodes, which enables efficient location of keys and simplifies fault tolerance (read more: http://www.martinbroadhurst.com/Consistent-Hash-Ring.html).

Deduplication: Mechanism to identify duplicate data in a storage system and store only one (or a few) copies of it, thus saving storage space.

Content-addressable storage (CAS): Technique to address stored objects by their contents rather than their location (physical or logical). This is typically done by hashing the contents of an object to produce a unique "fingerprint" that is used to identify and locate the object.

Background on Openstack Swift

Swift is a decentralized object storage system where clients can add, retrieve, update and delete objects indexed by a key. It offers functionality similar to Amazon Simple Storage Service (S3), and its architecture borrows ideas from Amazon's well-known Dynamo key-value store.

Swift uses a consistent hashing ring to distribute data among a set of nodes, so each node is responsible for a range of the storage namespace. To reach an object from its key, the key is hashed and the request is routed to the node responsible for the range that contains this hashed value. To increase fault tolerance, availability and durability, each object is replicated across multiple nodes in the ring.
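The routing described above can be sketched as a generic consistent hashing ring with virtual nodes: hash the key, walk clockwise to the first node point, and keep walking to collect distinct replica holders. This is a Dynamo-style illustration under assumed parameters (64 virtual nodes per node, 3 replicas, MD5 positions); Swift's real ring is built differently, precomputing a table of partitions, so treat the class below as hypothetical.

```python
import bisect
import hashlib


def ring_hash(key):
    """Map a string onto the ring as a 64-bit integer (first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent hashing ring with virtual nodes and replication."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several points on the ring to even out load.
        self._points = sorted((ring_hash(f"{node}#{i}"), node)
                              for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._points]

    def nodes_for(self, key, replicas=3):
        """Walk clockwise from hash(key), collecting `replicas` distinct nodes."""
        start = bisect.bisect(self._positions, ring_hash(key))
        chosen = []
        for offset in range(len(self._points)):
            _, node = self._points[(start + offset) % len(self._points)]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == replicas:
                    break
        return chosen


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.nodes_for("/account/container/xpto"))  # three distinct nodes
```

Adding or removing a node only moves the keys whose positions fall next to that node's points, which is the property that makes consistent hashing attractive for elastic storage.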

For more details on Swift's architecture, see: http://swift.openstack.org/

Background on content-addressable storage/deduplication

For a detailed motivation and background for content-addressable storage and deduplication in distributed storage systems, please read my previous posts: here, here and here.

Object-level deduplication

The main idea is to leverage commonality between stored objects by storing only one copy of duplicated objects. This type of deduplication is suitable for use cases where the same objects are stored multiple times (but with different identifiers or containers), such as in software repositories, archiving, backup, etc.

To enable object-level deduplication on Swift, a "content-addressable storage ring" (CAS-ring) is created in addition to the usual object ring. Both rings are distinct overlays on top of the same physical nodes. In the CAS-ring, objects are indexed by the hash of their contents. Objects stored in the CAS-ring are unique and immutable, since the key of an object is a cryptographic digest of its contents: if an object's contents change, its key in the CAS-ring also changes.

Deduplicating Objects

To deduplicate an object, a cryptographic hash function such as SHA-1 is used to calculate the object's "fingerprint" (today SHA-256 would be a safer choice, since practical SHA-1 collisions have been demonstrated). The system then checks whether the object is already present in the CAS-ring (duplicate objects map to the same key); if not, the object is stored there. The object ring no longer stores the object's contents, only a metadata reference to the content-addressed object in the CAS-ring.
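A minimal in-memory sketch of the deduplicating write: two dicts stand in for the object ring and the CAS-ring, and dedup_put is a hypothetical name. It uses SHA-256; swapping in hashlib.sha1, as in the text, changes nothing structurally.

```python
import hashlib

cas_ring = {}     # fingerprint -> object contents (stand-in for the CAS ring)
object_ring = {}  # object name -> metadata (stand-in for the object ring)


def fingerprint(data):
    """Content address of an object."""
    return hashlib.sha256(data).hexdigest()


def dedup_put(name, data):
    """Store `data` once in the CAS ring and keep only a reference under `name`."""
    fp = fingerprint(data)
    if fp not in cas_ring:          # duplicate objects map to the same key
        cas_ring[fp] = data
    object_ring[name] = {"cas_ref": fp, "size": len(data)}
    return fp


dedup_put("backup/2012-01-01.tar", b"same bytes")
dedup_put("backup/2012-01-02.tar", b"same bytes")
print(len(object_ring), len(cas_ring))  # 2 1
```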

Deduplication can be done inline (when the object is being inserted or updated) or offline (as background post-processing). If it's done inline, the insert operation may have higher latency, since the whole object needs to be hashed after being transferred to the node responsible for it. If it's done offline, the insert operation remains unchanged, and a background process deduplicates recently added objects. Another benefit of offline deduplication is that its resource consumption can be throttled so that it doesn't interfere with client operations in the object storage. However, with this latter approach, additional temporary storage space is needed to hold objects until they are deduplicated.

I have chosen the post-processing deduplication method since it allows better control over resources and doesn't add overhead to the insert operation.

Retrieving objects

When retrieving a deduplicated object, an additional level of indirection is added: instead of directly fetching objects from the object ring, a client will first need to get the content-address of that object from the object ring, and then fetch the object’s contents from the CAS-ring. For example:

  • Client wants to retrieve object "xpto":
    • Get metadata for object "xpto" in the object ring
      • Object ring returns fingerprint of object "xpto": "6b4b0d3255bfef95"
    • Get object contents for object "6b4b0d3255bfef95" in the CAS-ring
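The two-step lookup can be sketched with the same kind of in-memory stand-ins, pre-populated here with one hypothetical object. The integrity check at the end is an illustrative extra that content addressing makes essentially free; it is not described as part of the design above.

```python
import hashlib

# Hypothetical, already-populated stand-ins for the two rings.
contents = b"hello swift"
fingerprint = hashlib.sha256(contents).hexdigest()
cas_ring = {fingerprint: contents}
object_ring = {"xpto": {"cas_ref": fingerprint}}


def get_object(name):
    """Resolve name -> fingerprint (object ring), then fingerprint -> bytes (CAS ring)."""
    ref = object_ring[name]["cas_ref"]   # 1st lookup: metadata only
    data = cas_ring[ref]                 # 2nd lookup: the object's contents
    # Re-hashing the contents verifies them against their own address.
    if hashlib.sha256(data).hexdigest() != ref:
        raise IOError(f"object {ref} is corrupted")
    return data


print(get_object("xpto"))  # b'hello swift'
```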

However, I do not expect this additional level of indirection to significantly affect the performance of the get operation, since transferring the object itself should account for most of the operation's latency. I plan to evaluate this overhead for different object sizes.

Removing objects

In addition to an object’s contents, the CAS-ring will also store a reference count for the content-addressed object. When an object is removed from the object ring, its reference count is decremented in the CAS-ring. When the reference count for a particular object in the CAS-ring reaches zero, the object is deleted from the content-addressable storage.
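Reference counting can be sketched on top of the same in-memory model. The sketch also treats an overwrite as "drop the old reference, add the new one", which is an assumption about how updates would be handled. In a real ring the counter updates would have to be made safe against concurrent writers and replica divergence, which this single-process sketch ignores.

```python
import hashlib

cas_ring = {}     # fingerprint -> {"data": bytes, "refs": int}
object_ring = {}  # object name -> fingerprint


def delete(name):
    """Remove an object; free its contents once no object references them."""
    fp = object_ring.pop(name)
    entry = cas_ring[fp]
    entry["refs"] -= 1
    if entry["refs"] == 0:
        del cas_ring[fp]


def put(name, data):
    """Store (or overwrite) an object, incrementing the content's reference count."""
    if name in object_ring:
        delete(name)              # an update drops the reference to the old contents
    fp = hashlib.sha256(data).hexdigest()
    entry = cas_ring.setdefault(fp, {"data": data, "refs": 0})
    entry["refs"] += 1
    object_ring[name] = fp


put("a", b"dup")
put("b", b"dup")
delete("a")
print(len(cas_ring))  # 1: still referenced by "b"
delete("b")
print(len(cas_ring))  # 0
```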

Replication

Swift’s replication mechanism remains the same in both the object ring (metadata) and in the CAS-ring (object’s contents). Thus, it should be ensured that when an object is deduplicated, at least N replicas of the deduplicated object exist in the CAS-ring (where N is the configured replication level).

Block-level deduplication

To achieve even higher levels of deduplication, it is possible to divide objects into chunks of data (blocks) and apply the deduplication procedure to each block. Many studies have shown that deduplication can save up to 80% of storage space when applied to virtual machine images, which share a large number of data blocks. If object-level deduplication were applied to two different Ubuntu images, they would have different fingerprints and would thus be stored twice. However, if block-level deduplication is applied, the blocks the two images have in common are stored just once.

Objects can be divided into variable- or fixed-size blocks. For simplicity, I am assuming fixed-size blocks; however, the proposed architecture can also be applied to variable-size blocks. Object-level deduplication can be seen as a special case of block-level deduplication where block size = infinity.

Several challenges arise when deduplicating objects at the block level in a distributed object storage. Some of the challenges are:

  • The large number of blocks makes it prohibitive to store each block as a separate file. For instance, if the block size is 4KB, a 2GB object will have 524,288 blocks. As the number of stored objects grows, the number of stored blocks may exceed the maximum number of files supported by the underlying file system. Traditional CAS solutions solve this by packing multiple blocks into a single file and keeping an index of the blocks stored in each file.
  • With small block sizes, even modest-sized objects will have more blocks than there are nodes in the distributed object storage. For instance, a 20MB object with a 4KB block size has 5,120 blocks (using binary units); with 100 nodes and uniformly distributed block digests, each node will store about 51 blocks of this object on average. This means that all nodes of the storage system will need to be contacted to rebuild the original object, which may become a bottleneck when a large number of objects is accessed simultaneously.
  • Block access locality may be lost when blocks are distributed among many nodes (fragmentation). A locality strategy may be used to ensure that blocks that are adjacent in the original object are also stored close together on the storage node.
  • Access to disk may become a bottleneck in the storage nodes if many requests are made concurrently, since many distinct blocks would need to be fetched from different locations on disk. Caching and pre-fetching of blocks (exploiting locality) can mitigate this overhead.
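The fan-out estimate in the second point can be checked with a quick simulation. To avoid generating 20MB of data, it hashes a per-block label instead of real block contents, on the assumption that distinct blocks yield effectively uniform digests either way. Placement is simplified to "digest modulo node count" rather than Swift's real ring.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4 * 1024           # 4KB
OBJECT_SIZE = 20 * 1024 * 1024  # 20MB
NUM_NODES = 100

num_blocks = OBJECT_SIZE // BLOCK_SIZE  # 5120 blocks

# Stand-in digests: hash a label per block instead of the block's bytes.
digests = [hashlib.sha256(f"obj-xpto/block-{i}".encode()).digest()
           for i in range(num_blocks)]

# Simplified placement: digest modulo node count (not Swift's real ring).
blocks_per_node = Counter(int.from_bytes(d, "big") % NUM_NODES for d in digests)

print(num_blocks, num_blocks / NUM_NODES)  # 5120 51.2
print(len(blocks_per_node))                # nodes to contact to rebuild the object
```

With about 51 blocks expected per node, the chance that any single node holds none of them is vanishingly small, so rebuilding one object touches essentially every node.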

I will try to address these and other challenges throughout my master's thesis work. Moreover, I plan to evaluate the impact of each of these issues during the evaluation phase.

Deduplicating objects (at the block-level)

The idea is to reuse the same "content-addressable storage ring" (CAS-ring), but instead of storing whole objects, the ring will store content-addressable blocks in a distributed manner. Since inline deduplication of the thousands of blocks within an object would add a lot of overhead to the insert operation, the post-processing method was chosen for block-level deduplication.

When an object is inserted into the object storage, it is marked to be deduplicated. The deduplication routine then divides the object into blocks (of a specified size), computes the cryptographic digest of each block and checks whether that block is already stored in the CAS-ring; if it is not, the block is stored there. After all blocks are stored, the original object is finally replaced in the object ring by a manifest that describes how to reconstruct it from content-addressable blocks.
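A single-process sketch of that post-processing routine follows. Dicts stand in for the two rings, a deque holds the "marked" objects, and the 4-byte block size is deliberately tiny so the result is easy to read. All names are hypothetical.

```python
import hashlib
from collections import deque

BLOCK_SIZE = 4  # tiny, for readability; the post discusses sizes like 4KB

cas_ring = {}      # block fingerprint -> block bytes
object_ring = {}   # object name -> raw bytes, or a manifest once deduplicated
pending = deque()  # names marked for post-processing deduplication


def insert(name, data):
    """The insert path stays cheap: store the bytes and mark the object."""
    object_ring[name] = data
    pending.append(name)


def deduplicate_pending(block_size=BLOCK_SIZE):
    """Background pass: split, fingerprint, store new blocks, swap in a manifest."""
    while pending:
        name = pending.popleft()
        data = object_ring[name]
        if not isinstance(data, bytes):  # already deduplicated (marked twice)
            continue
        manifest = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in cas_ring:       # store only blocks not seen before
                cas_ring[fp] = block
            manifest.append(fp)
        # Replace the object only after all of its blocks are in the CAS ring.
        object_ring[name] = {"manifest": manifest, "size": len(data)}


insert("ubuntu-a.img", b"AAAABBBBCCCC")
insert("ubuntu-b.img", b"AAAABBBBDDDD")
deduplicate_pending()
print(len(cas_ring))  # 4 unique blocks stored instead of 6
```

Swapping in the manifest only at the end means a reader never sees a manifest that points at blocks which haven't been written yet.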

Retrieving objects

The object ring will store for each object a piece of metadata (called manifest or recipe) containing fingerprints of the blocks that compose the object. These block addresses will be used in the CAS-ring to retrieve the blocks and reconstruct the original object. For example:

  • Client wants to retrieve object "xpto":
    • Get manifest/recipe of object "xpto" in the object ring
      • Object ring returns block addresses for object "xpto": "6b4b0d3255bfef95", "890afd80709", "da39a3ee5e"
    • Get blocks "6b4b0d3255bfef95", "890afd80709" and "da39a3ee5e" in the CAS-ring
    • Reconstruct the original object by concatenating the retrieved blocks
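The retrieval steps above map onto a short loop over the manifest. The ring contents below are hypothetical, and blocks are fetched one after another; a BitTorrent-style variant would issue these fetches in parallel to different nodes.

```python
import hashlib

# Hypothetical, already-populated state: three blocks and one manifest.
blocks = [b"boot", b"kernel", b"rootfs"]
cas_ring = {hashlib.sha256(b).hexdigest(): b for b in blocks}
object_ring = {"xpto": {"manifest": [hashlib.sha256(b).hexdigest() for b in blocks]}}


def get_object(name):
    """Fetch the manifest, then every block it lists, and concatenate them."""
    parts = []
    for fp in object_ring[name]["manifest"]:
        block = cas_ring[fp]  # in a real ring: a request to the node owning fp
        if hashlib.sha256(block).hexdigest() != fp:
            raise IOError(f"block {fp} failed its integrity check")
        parts.append(block)
    return b"".join(parts)


print(get_object("xpto"))  # b'bootkernelrootfs'
```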

One idea here is to use techniques employed in BitTorrent to speed up retrieval of large objects from multiple storage nodes.

Removing objects

The initial approach is to keep a reference count for each stored block in the CAS-ring. These counts are updated as objects are added, removed or updated in the object ring. However, maintaining a reference count for each stored block may add a considerable amount of metadata overhead, depending on the block size. For instance, with 4KB blocks and 32-bit counters, the overhead is 32 / (4 × 1024 × 8) = 1/1024 ≈ 0.00098, i.e. about 0.098% extra metadata per block. For 1TB of stored blocks, the reference counts would therefore consume about 1GB of storage space. There is a trade-off among metadata overhead, block size and deduplication level, which I plan to analyze during this work.
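The arithmetic generalizes to other block sizes with a few lines of Python. Like the text, it counts only the 32-bit counters and ignores other per-block metadata such as the fingerprint index.

```python
REFCOUNT_BITS = 32            # one 32-bit counter per stored block, as in the text
TIB, MIB = 1024 ** 4, 1024 ** 2


def refcount_overhead(block_size_bytes):
    """Metadata fraction: counter bits divided by block bits."""
    return REFCOUNT_BITS / (block_size_bytes * 8)


for kib in (4, 16, 64, 128):
    frac = refcount_overhead(kib * 1024)
    print(f"{kib:>3} KiB blocks: {frac:.4%} overhead, "
          f"{frac * TIB / MIB:,.0f} MiB of counters per TiB stored")
```

Each doubling of the block size halves the counter overhead, at the cost of coarser deduplication.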

An alternative approach would be not to keep reference counts at all, and instead to have a distributed garbage collection mechanism that periodically goes through all objects (manifests) in the object ring and deletes unreferenced blocks from the CAS-ring.
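That garbage collector is essentially mark-and-sweep. Here is a single-process sketch with placeholder fingerprints. A real deployment would also need a grace period, so that blocks written by an in-progress deduplication pass, whose manifest isn't stored yet, are not swept; that detail is left out here.

```python
def collect_garbage(object_ring, cas_ring):
    """Mark every fingerprint referenced by a manifest, then sweep the rest."""
    live = set()
    for meta in object_ring.values():          # mark phase
        live.update(meta["manifest"])
    unreferenced = [fp for fp in cas_ring if fp not in live]
    for fp in unreferenced:                    # sweep phase
        del cas_ring[fp]
    return unreferenced


# Hypothetical state: block "c3" is no longer referenced by any manifest.
object_ring = {"a": {"manifest": ["a1", "b2"]}, "b": {"manifest": ["b2"]}}
cas_ring = {"a1": b"...", "b2": b"...", "c3": b"..."}
print(collect_garbage(object_ring, cas_ring))  # ['c3']
```

Compared with reference counting, this trades per-write metadata updates for a periodic full scan of the object ring.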

Replication

So far I haven't thought much about block replication in the CAS-ring. The initial approach is to replicate blocks in the same way objects are currently replicated in Swift. I will investigate this further.

Conclusion

In this post I presented an architecture for object-level and block-level deduplication in a distributed object store (OpenStack Swift). This is just an initial draft of the prototype I will be implementing and evaluating in the coming months. Feel free to comment and give suggestions on this design: feedback is very welcome. I will be posting the progress of this work on this blog.


Using deduplication to reduce storage demands on Cloud Providers

In the paper "The Effectiveness of Deduplication on Virtual Machine Disk Images", the authors perform an in-depth analysis of several factors that may or may not impact the level of deduplication of virtual machine images.

So, what exactly is deduplication?

The main idea is to leverage data commonality in a storage system by identifying duplicate “chunks” of data across multiple files and storing only one copy of each chunk.

How do you do that?

The idea is to compute a digest (such as SHA-1) of each data chunk in a file and check whether that chunk is already present in the chunk store. The chunk is stored only if it is not already there; either way, a pointer to the stored chunk is added to the metadata describing how to reconstruct the original file.

How do you divide a file into chunks?

There are two main techniques for chunking: fixed-size chunking and variable-size chunking. Fixed-size chunking is straightforward: you define a chunk size (such as 4KB) and divide a file into equal chunks of that size. However, if some data is inserted into or removed from the middle of the file, every chunk boundary after the modification shifts, so all the chunks after that point change and no longer match previously stored ones. Variable-size chunking is resistant to such shifts because chunk boundaries are determined by the content itself rather than by fixed offsets. A well-known technique for variable-size chunking is to compute a Rabin fingerprint over a sliding window of the file stream and place a chunk boundary wherever the fingerprint matches a chosen pattern.
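The sketch below makes the difference measurable. Instead of a true Rabin fingerprint (polynomials over GF(2)), it uses a simpler polynomial rolling hash in the Rabin-Karp style over a 48-byte window. It cuts wherever the low 8 bits are zero, for roughly 256-byte chunks, and omits the minimum chunk size real systems use. Window, mask and cap values are assumptions. The code inserts one byte at the front of 64KB of random data and reports how many chunks of the new version already existed.

```python
import hashlib
import random

WINDOW = 48                    # bytes in the sliding window (assumed)
MASK = 0xFF                    # cut when the low 8 bits are zero: ~256-byte chunks
MAX_CHUNK = 4096               # safety cap on chunk length
BASE, MOD = 257, (1 << 61) - 1


def fixed_chunks(data, size=256):
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data):
    """Cut where a rolling hash of the last WINDOW bytes matches a pattern."""
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        h = (h * BASE + byte) % MOD
        if (h & MASK) == 0 or i + 1 - start >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old, new, chunker):
    """Share of the new version's chunks that already exist in the old one."""
    old_ids = {hashlib.sha1(c).digest() for c in chunker(old)}
    new_chunks = chunker(new)
    return sum(hashlib.sha1(c).digest() in old_ids for c in new_chunks) / len(new_chunks)


rng = random.Random(7)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
edited = b"X" + original       # insert a single byte at the front

print(f"fixed-size: {shared_fraction(original, edited, fixed_chunks):.0%}")
print(f"content-defined: {shared_fraction(original, edited, content_defined_chunks):.0%}")
```

Because each cut depends only on the 48 bytes before it, the content-defined boundaries resynchronize right after the edit. The fixed-size boundaries never do.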

Why use deduplication on virtual machine images?

Virtualization technology is widely adopted in data centers and by cloud computing providers to better utilize physical resources and to provide isolation between different applications and users. A problem that arises is the amount of storage needed to store multi-gigabyte VM disk images. Several studies have found that different VM images share a considerable amount of data, which suggests that deduplication may reduce the total amount of storage needed in VM hosting facilities.

Below are some interesting findings of the aforementioned paper on deduplication in the context of VM images:

  • Deduplication can save 80% or more of storage space when the stored VM images are from the same operating system "lineage", such as Ubuntu or Fedora.
  • For mixed operating systems, the deduplication ratio is about 40%, which is still a considerable amount of space saved.
  • Fixed-size chunking outperforms variable-size chunking for VM images, which is good news, since it's typically easier to implement.
  • Compression of chunks can further increase storage savings.
  • Factors that have a major impact on deduplication effectiveness:
    • Base operating system (the more homogeneous, the higher the level of deduplication)
    • Chunk size (the smaller the chunk, the higher the deduplication level, but also the higher the overhead to reconstruct the original file)
  • Factors that have little impact on deduplication effectiveness:
    • Package installation or language localization within the same operating system
    • Surprisingly, consecutive releases of a single OS deduplicate about as well against each other as releases that are further apart (normally high)

What are the implications of this to my work?

Deduplication of VM images is a great use case for content-addressable storage, which can serve as the storage backend for the chunk store that deduplication needs. Current CAS solutions are either based on costly hardware (such as disk arrays) or centralized. However, a centralized CAS architecture will have limited capacity and will not scale as the amount of stored data grows.

Public and private cloud providers spend a massive amount of storage space on users' VMs. Using a distributed content-addressable storage to store VMs has the following advantages (among others):

  • Obvious scalability and elasticity
  • Reduce storage demands for multi-gigabyte VM hosting
  • Use the saved storage space for replicating data chunks, increasing availability and durability
  • Parallel transfer of chunks from multiple servers, possibly in a BitTorrent fashion, which may speed up the transfer of a VM image to hosts

Leveraging data commonality with content-addressable storage

There are a lot of similarities between subsequent releases of a piece of software at the binary level. In the figures below, each series represents the percentage of binary data blocks that are exactly the same between a reference release of a software package and all previous releases of the same package. The first graph is for Linux kernel 2.4 source code releases, and the second graph is for 10 nightly binary releases of Mozilla (in March 2003).

Commonality between Linux 2.4 Kernel releases

Commonality between Mozilla nightly binary releases

On average, about 60% of the blocks from different releases are redundant, and at least 30% of the blocks are common to all releases. That's quite a lot! The similarities are more significant for source code than for binary releases, since a small change in the source code may have a large impact on the compiled code. Even though these results are at the block level, a similar level of commonality may also be observed between files of different releases of large software packages. This is a typical case where content-addressable storage (CAS) may save lots of disk space, since each binary object (either a whole file or a block) is stored only once.
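Here is one plausible way to compute such a commonality percentage between two releases, using fixed 4KB blocks and SHA-1 fingerprints. The paper's exact methodology may differ, and the two "releases" below are toy data, not the measured Linux or Mozilla numbers.

```python
import hashlib


def block_fingerprints(data, block_size=4096):
    """Set of SHA-1 fingerprints of a release's fixed-size blocks."""
    return {hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)}


def commonality(reference, other, block_size=4096):
    """Fraction of the reference release's distinct blocks also found in `other`."""
    ref = block_fingerprints(reference, block_size)
    if not ref:
        return 0.0
    return len(ref & block_fingerprints(other, block_size)) / len(ref)


# Toy "releases": ten distinct 4KB blocks, of which the newer one keeps six.
old_release = b"".join(bytes([i]) * 4096 for i in range(10))
new_release = old_release[:6 * 4096] + b"\xff" * (4 * 4096)
print(commonality(old_release, new_release))  # 0.6
```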

These results are exciting because they are closely related to the use case I'm focusing on in building a distributed content-addressable storage system. The CernVM file system currently uses a CAS to distribute application software to virtual machines for the LHC experiments at CERN. In that scenario, applications are released every other day. Based on these results, the level of commonality between subsequent releases is likely to be high, which justifies the use of a CAS in CernVM-FS. I wonder what the actual levels of commonality are for the experiments' releases @ CERN. I'll see if I can find that out.

The results above were presented in the paper Opportunistic Use of Content Addressable Storage for Distributed File Systems (2003). In this paper, the authors propose a distributed file system on top of a content-addressable storage. The idea is to divide each file into blocks, hash the contents of each block and write a metadata file that describes how to rebuild the original file from the separate blocks. This metadata is called a recipe, and is shown below:

[Figure: Sample File Recipe, an XML recipe describing how to rebuild a file from its blocks]

The recipe abstraction allows a file to be split in many ways (variable or fixed block size) and different hashing algorithms (MD5, SHA-1, etc.) to be used.
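A recipe with a configurable block size and hash algorithm can be sketched with the standard library's ElementTree. The element and attribute names (recipe, blocks, block, hash, ...) are invented for this sketch; they are not CASPER's actual recipe schema.

```python
import hashlib
import xml.etree.ElementTree as ET


def make_recipe(name, data, block_size=4096, algorithm="sha1"):
    """Describe how to rebuild `data` from fixed-size blocks, as XML."""
    recipe = ET.Element("recipe", {"file": name, "size": str(len(data))})
    blocks = ET.SubElement(recipe, "blocks",
                           {"algorithm": algorithm, "block_size": str(block_size)})
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        digest = hashlib.new(algorithm, chunk).hexdigest()
        ET.SubElement(blocks, "block", {"hash": digest, "length": str(len(chunk))})
    return ET.tostring(recipe, encoding="unicode")


def rebuild(recipe_xml, store):
    """Concatenate the blocks listed in the recipe, looked up by hash in `store`."""
    root = ET.fromstring(recipe_xml)
    return b"".join(store[block.get("hash")] for block in root.iter("block"))


payload = b"hello " * 2000
recipe_xml = make_recipe("demo.bin", payload, algorithm="md5")
store = {hashlib.md5(payload[i:i + 4096]).hexdigest(): payload[i:i + 4096]
         for i in range(0, len(payload), 4096)}
print(rebuild(recipe_xml, store) == payload)  # True
```

Recording the algorithm and block size inside the recipe is what lets readers and writers use different chunking or hashing choices without ambiguity.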

In the CASPER file system, when a client wants to fetch a file, it first tries to download the whole file from the server (as in a typical client-server file system: Coda, AFS, NFS, etc.). If the connection to the server is slow, the client instead asks for the recipe of the file it wants to fetch. With the recipe available, the client tries to get individual blocks from nearby content-addressable storage providers, which the authors expect to be available on local networks with much better bandwidth and latency. If not all blocks are found on nearby CAS providers, the remaining blocks are fetched from the central server as usual.
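The recipe-based fetch path can be sketched as follows, assuming a simplified recipe (an ordered list of SHA-1 hex digests) and dict-like block stores for both the nearby providers and the server. The "is the link slow?" decision is left out, and all names are hypothetical. Verifying each block against its digest is what allows the client to trust arbitrary nearby providers.

```python
import hashlib


def sha1_hex(block):
    return hashlib.sha1(block).hexdigest()


def fetch_with_recipe(recipe, nearby_providers, server):
    """Rebuild a file from its recipe, preferring nearby CAS providers.

    recipe: ordered list of SHA-1 hex digests (a simplified recipe)
    nearby_providers: dict-like block stores on the local network
    server: dict-like block store holding every block (the fallback)
    Returns the file bytes and how many blocks had to come from the server.
    """
    parts, from_server = [], 0
    for digest in recipe:
        block = None
        for provider in nearby_providers:
            candidate = provider.get(digest)
            # Content addressing lets the client verify untrusted providers.
            if candidate is not None and sha1_hex(candidate) == digest:
                block = candidate
                break
        if block is None:
            block = server[digest]
            from_server += 1
        parts.append(block)
    return b"".join(parts), from_server


blocks = [b"alpha", b"beta", b"gamma"]
recipe = [sha1_hex(b) for b in blocks]
server = dict(zip(recipe, blocks))
nearby = [{recipe[0]: b"alpha"}, {recipe[1]: b"corrupted"}]
print(fetch_with_recipe(recipe, nearby, server))  # (b'alphabetagamma', 2)
```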

Cheers!


A Scalable Architecture for a Distributed Content-Addressable Storage System

Definition

Content-Addressable Storage (CAS) – A fancy name for a simple storage technique: instead of indexing stored objects by their location (such as file:///home/user/example or http://www.example.com/file.jpg), as done in traditional storage systems, index objects by their content.

This is typically done by hashing the contents of the object (using MD5 or SHA-1, for example) to obtain a unique identifier (UID). For instance, the object "" (empty string) has the MD5 digest d41d8cd98f00b204e9800998ecf8427e, which can be used as a UID to access this object. Thus, no knowledge of the storage backend or physical data location is needed to retrieve the data. Two benefits of CAS over location-based storage systems are:

  • Automatic data integrity checks, which make it possible to detect corrupted data;
  • Automatic data deduplication, which can optimize disk space utilization, especially in scenarios where data repetition is common.
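Here is a minimal content-addressed store showing both benefits, using MD5 as in the example above. MD5 is fine for illustration, but it is not collision-resistant, so a production system would pick a stronger hash. The put/get names and the dict-backed store are hypothetical.

```python
import hashlib


def uid(content: bytes) -> str:
    """Content address: the MD5 digest of the object's bytes."""
    return hashlib.md5(content).hexdigest()


store = {}


def put(content):
    key = uid(content)
    store.setdefault(key, content)  # identical content is stored only once
    return key


def get(key):
    content = store[key]
    if uid(content) != key:         # integrity check: re-hash and compare
        raise ValueError("stored object is corrupted")
    return content


print(uid(b""))  # d41d8cd98f00b204e9800998ecf8427e
```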

Since the UID changes when the content of an object changes, CAS systems are typically used to store fixed-content (immutable) data, such as archives or backups. According to analysts, static content accounts for 50% to 80% of the data produced in organizations nowadays [1][2]. However, content-addressable storage is also used to store mutable data, such as in the popular distributed revision control system Git. To enable this, additional metadata is needed to correlate multiple versions of a particular object.

The Problem

In the era of big data, content-addressable storage systems are gaining popularity as a means of storing large amounts of data, both in total volume and in number of objects. Besides the traditional use case of archiving documents, cloud-based backup and online file hosting are potential use cases for CAS systems. In this context, critical requirements for modern CAS systems are scalability, fault tolerance and availability. How can this be achieved? Yes, by distributing!

A distributed content-addressable storage system may meet these requirements by distributing and replicating its data across a set of storage nodes. However, if the UID computation is done at a single node (such as on the client side), insertion throughput may be compromised when a very large number of objects is inserted at once. This is because computing digests is relatively expensive and may become a bottleneck when a large number of objects is inserted through a single node.

For instance, in the CernVM project, inserting a software release composed of 250,000 files (total size: ~10GB) into a repository based on a centralized CAS takes up to 1 hour (UID computation and data compression). If a high-speed connection (such as Gigabit Ethernet) is available, this overhead may be reduced by distributing the computation among the nodes of a distributed CAS.
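Spreading UID computation over workers can be sketched locally with a thread pool, where each worker stands in for a storage node. In a real distributed CAS the objects would be shipped over the network instead. Threads help here because CPython's hashlib releases the GIL while hashing large buffers. The function names and the batch are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def content_uid(content: bytes) -> str:
    """UID of one object (SHA-1, one of the hashes mentioned in this post)."""
    return hashlib.sha1(content).hexdigest()


def uids_for_batch(objects, workers=4):
    """Hash a batch of objects with a pool of workers, preserving input order.

    Each worker stands in for a node of the distributed CAS.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(content_uid, objects))


batch = [bytes([i]) * 100_000 for i in range(32)]  # 32 distinct ~100KB objects
uids = uids_for_batch(batch)
print(len(uids), len(set(uids)))  # 32 32
```

Whether distribution pays off depends on how the time to ship an object to a remote node compares with the time to hash it locally.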

The solution (or the way to it..)

The objective of my master's thesis is to design a scalable architecture for a Distributed Content-Addressable Storage System (DCASS) of binary objects (blobs). This architecture will focus on the CernVM use case, where hundreds of thousands of objects may be inserted into the DCASS as a single batch. On top of the requirements presented previously, this system will have the following additional requirements:

  • Distribution of digest computation across the nodes participating in the DCASS, when this increases insert throughput;
  • Elasticity – grow or shrink the DCASS by just adding or removing nodes;
  • Support for additional processing over the data on participating nodes, such as data compression. This additional processing can be enabled through a plugin;
  • Support for multiple storage backend types (such as file system, database system, networked storage, etc)

In contrast with other distributed object storage systems (such as distributed key-value stores), concurrency control does not have to be as strict since the update operation is not supported in a DCASS.

Some of the questions to answer throughout this work are:

  • How is performance (in terms of response time) improved/decreased when using a DCASS instead of other systems in the CernVM repository use case?
  • Given the network conditions (bandwidth and delay) between the client and the nodes in a DCASS, what is the minimum number (and size) of objects for which distributing the hash computation "pays off", compared to calculating the digests on the client node?
  • What is the overhead introduced by elasticity in a DCASS (i.e., by re-copying files when nodes are added or removed)?

Let’s see where this will lead. Suggestions and comments are very welcome!

“Mi CAS es su CAS” ;)

[1] – http://www.internetnews.com/ent-news/article.php/1024271/EMC+Tackles+Fixed+Content+Storage.htm

[2] – http://documentmedia.com/ME2/dirmod.asp?sid=&nm=&type=news&mod=News&mid=9A02E3B96F2A415ABC72CB5F516B4C10&tier=3&nid=DF86E1F49983419F868090D9C5B7498C


Hello, World!

I’ve always wanted to have a tech blog to write about what I’ve been doing at university, eventual side-projects and some random stuff I come across every day. However I never took the time to start it, but now I guess the time has come…

Inspired by fellow EMDC* mates Lalith Suresh and Marcus Ljungblad (I guess I finally learned how to type his surname), who regularly post interesting stuff on their blogs, and in order to share ideas and thoughts about my upcoming master's thesis @ EMDC, I finally decided to start my own blog. Yey! :-)

In the coming months I will mostly be writing about my thesis on designing a distributed storage system. Most of my fellow EMDC classmates will also write about the progress of their theses on their blogs, which are listed in the right sidebar. The cool thing is that the projects are very diverse, spanning many areas of academia and industry, so lots of interesting and cutting-edge content on Distributed Systems will be posted by brilliant people. Keep an eye out! ;)

Looking forward to writing about my thesis in the next post, sometime early this week...

Cheers!

*EMDC = European Master in Distributed Computing – www.kth.se/emdc
