kyle_burton ([info]kyle_burton) wrote,
@ 2008-10-21 11:29:00
Cloud Con East Notes
Overall Computing Among The Clouds was a good conference.

The main theme I took away from it was that cloud computing is a
continuing trend, and services will continue to appear and be
developed that will make taking advantage of these resource pools
even easier.

The trend for physical data centers will continue to become more and
more outsourced to organizations that can provide those services with
greater economy of scale. Currently Amazon's offerings are slightly
more expensive than a hosted system that you own. More guidelines
came out about when the trade off is appropriate. Keep in mind that as
a trend, cloud computing is still new - it is likely that these trade
offs will shift - even as soon as over the next year or two (eg: it is
likely, in my opinion, that the raw cost of 24x7 allocation will fall
below the cost of ownership due to economies of scale - and I think
this will happen within 3 years).

Own your own:

- you have a steady load

- AWS, used 24x7x365, is more expensive than owning and hosting a
physical machine - though this does not include administrative
costs

- you have hard SLAs

- you cannot anticipate peak loads but must have spare capacity to
handle peaks

- you have sensitive data that requires more certainty about where the
data gets stored, etc

- though there are cloud computing platforms that are working
towards security and data protection certifications

Use an AWS style platform:

- your needs scale up and then can scale back down

- you can plan for these needs such that you have time to provision
the

- you have no capital budget

- you have a service where you can charge in direct proportion to
utilization rather than based on capacity.
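
One rough way to think about the own-vs-rent trade off above is a
break-even calculation. The sketch below is only illustrative: the
function names are my own and the prices are made-up placeholders, not
figures quoted at the conference.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cloud_cost(hourly_rate, utilization):
    """Cost of renting one instance for a fraction of the month (0.0 to 1.0)."""
    return hourly_rate * HOURS_PER_MONTH * utilization


def break_even_utilization(hourly_rate, owned_monthly_cost):
    """Fraction of the month above which owning is cheaper than renting."""
    return owned_monthly_cost / (hourly_rate * HOURS_PER_MONTH)


# Hypothetical numbers, for illustration only.
rate = 0.10    # dollars per instance-hour (assumed)
owned = 50.0   # dollars per month to own and host a comparable box (assumed)

threshold = break_even_utilization(rate, owned)
print(f"Renting is cheaper below {threshold:.0%} utilization")
```

A steady 24x7 load sits at 100% utilization, which is why it lands on
the "own your own" side, while a bursty load that only needs machines
a fraction of the time lands on the rental side.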

Higher Level Application Stacks, like Google App Engine:

- keep in mind these are new and the space is still being explored,
more will appear, a model will develop around these, more languages
and frameworks will be supported - though it mostly will be based
on those that can be easily hosted/sandboxed, which is why you're
seeing Python first and Java following along

- all of the provisioning is hidden from you - Google's offering
dynamically scales your application up and down based on
utilization - this is a significant reduction in your design and
administrative workload

- simplicity of application development _and_ scalability are rarely
found in existing technologies, this is one of the most compelling
aspects of these kinds of application stacks



Organizations and individuals are starting to learn how they need to
change the design of their services and applications to take better
advantage of the Cloud. It takes a bit of a change in mind-set - the
phrase "Machine Instances are the new processes" was dropped, and I
think that's an appropriate framing of one of the changes in mindset
you need in order to take advantage of the sea of resources that is
becoming available.

Change your software so that it is more easily bootstrapped. Eliminate
the assumption that you have access to local, disk based services -
pull everything remotely, use URLs and services; don't assume local
interfaces, assume remote. Design to come up / boot faster - when you
scale up / down quickly you don't get the same amortization of start
up costs over time. Design with the crash fast mentality - as robust
as these systems are, you should still design with the idea in mind
that the system could go away at a moment's notice. In addition to
helping with unexpected outages, this allows you to scale _down_
faster. Keep your persistent data in the provided data stores and
use the provided queuing systems to distribute work.
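
To make that concrete, here is a minimal sketch of a worker that keeps
no local state. The in-process queue and dict are stand-ins I chose
for a hosted queuing service and a hosted data store; they are
assumptions for illustration, not any provider's actual API.

```python
import queue


def run_worker(work_queue, data_store, handle):
    """Drain the queue, keeping no state that would be lost if this instance died."""
    processed = 0
    while True:
        try:
            job_id, payload = work_queue.get_nowait()
        except queue.Empty:
            return processed
        # Write the result to the shared store, never to local disk.
        data_store[job_id] = handle(payload)
        processed += 1


# Stand-ins for a hosted queue and a hosted data store (assumptions).
jobs = queue.Queue()
store = {}
for i, text in enumerate(["alpha", "beta", "gamma"]):
    jobs.put((i, text))

count = run_worker(jobs, store, str.upper)
print(count, store)
```

Because everything the worker needs arrives through the queue and
everything it produces goes to the store, any number of identical
instances can be started or killed without coordination.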


What these offerings need to do to gain wider acceptance and the
value-adds to come:
--------------------------------------------------------

I think that harder SLAs will develop as there is more competition.
Higher level tools will develop on top of the instance-based cloud
offerings (EC2) to allow for more automated provisioning - this will
make it easier for you, but not as easy as fully restricted stacks
like GAE.

I think we'll see tools and offerings develop that will come down
towards traditional data centers to allow a simpler mixing of a
traditional service with bleed over to the cloud as resources need to
be scaled up but also so that you can have control over the processing
or data (sensitive data?) in your own protected environment but push
generic activity up into the cloud as necessary.



Other notable happenings from the conference or that were recently
announced:
------------------------------------------------------------------

- Microsoft is creating AMIs for Windows on EC2.

- Google just announced that Java will be a supported App Engine
development language - previously only Python was supported.




Haskell in the corporate environment
----------------------------------------

Seemed out of place for the event - not really cloud-ish. Though I
personally see Functional Programming being a larger industry trend.
It follows from structured -> procedural -> object oriented ->
functional - with respect to the time line of coming out of academia at
least, not necessarily the pejorative idea of one being 'higher level'
than the other - though so far, time has implied that with the other
patterns.

The presenter (Jeff Polakow) is using it extensively at his current
employer.

Those kinds of firms (Wall St) allow a lot of latitude to the
technical staff, so it's easier to experiment (R&D) with new
technologies. It's much harder for a company like HMS to decide to
take on something like this - it's hard to find developers, and how
to develop, deploy, monitor and design with these technologies is
still undetermined for companies like HMS.

I think the FP trend is being pushed into industry by the shift to
multi-core, the past difficulties of developing concurrent,
parallel/distributed applications (it's hard, and giving developers
access to create threads directly has proven to be a difficult road
to travel), the need for infrastructural level automatic scaling, and
the easier path to robustness that languages like Erlang offer.

In languages like Java, you have to take into consideration all the
libraries you're using with respect to their referential transparency
- it's not the default. In the FP languages referential transparency
is the _default_ case, so you can, in general, make that assumption.
The underlying stack can also make that assumption about your code as
well - which is why the concurrency / distribution model is less
coupled to the implementation than it is in the more imperative
languages.
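
A small illustration of why referential transparency matters for
concurrency, written in Python rather than Haskell so it stays easy to
run. The function names are hypothetical, chosen only for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def price_with_tax(price_cents):
    """Pure: the result depends only on the argument, and nothing else changes."""
    return price_cents + price_cents * 8 // 100


running_total = 0


def add_to_total(price_cents):
    """Impure: reads and writes shared state, so call order and timing matter."""
    global running_total
    running_total += price_cents
    return running_total


prices = [1000, 2500, 400]

# Safe to spread across workers: each call is independent of the others.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(price_with_tax, prices))

sequential = [price_with_tax(p) for p in prices]
print(parallel == sequential)
```

The runtime can hand `price_with_tax` to any number of workers in any
order and get the same answers. It can make no such promise about
`add_to_total`, and in an imperative language nothing tells it which
of the two kinds of function it has been given.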


Horizontal Scaling with HiveDB
----------------------------------------

CafePress has a _large_ catalog. I was kind of shocked to hear that
they have 265 million products. They have a low margin based on the
aggregate amount of data they have to store and serve up, so solutions
like Oracle were just not an option for them simply from a cost basis.

They spent time analyzing their options and didn't find anything that
fit their needs (cost, performance, no off-line resharding), and
created a more scalable data storage architecture.

The solution they created performs better, scales better, is more
robust and has a better SLA than many of the other commercial
solutions.

It is effectively a Hibernate extension that uses MySQL and does data
partitioning (pseudo-automatically) by using a set of replicated MySQL
databases as a catalog that maps to where the data is stored for your
shard (replicated 3x). The system supports dynamic repartitioning -
migration of shards away from a shard-host to get less busy data away
from data that is more 'hot' - the busiest data sets end up on their
own shard-node with everything else having been pushed away from them.

They only need to 'lock' a single user's data while migrating it. The
system as a whole doesn't go down. The MySQL catalogs are replicated
(3 machines, master-master, writing to 1) and can be upgraded by
taking 1 of the 3 out of the cluster. The same kind of approach goes
for the other sharding servers.
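
The catalog-plus-shards idea can be sketched in a few lines. This is a
toy in-memory model of directory-based sharding as I understood it
from the talk, not HiveDB's actual API; the class and method names are
my own invention.

```python
class ShardDirectory:
    """Toy catalog mapping each key to the shard that holds its rows."""

    def __init__(self, shard_names):
        self.shards = {name: {} for name in shard_names}  # shard -> {key: rows}
        self.location = {}                                # key -> shard name
        self.locked = set()                               # keys being migrated

    def put(self, key, rows):
        if key in self.locked:
            raise RuntimeError(f"{key} is being migrated")
        # New keys go to the shard that currently holds the fewest keys.
        shard = self.location.get(key) or min(
            self.shards, key=lambda s: len(self.shards[s])
        )
        self.shards[shard][key] = rows
        self.location[key] = shard

    def get(self, key):
        return self.shards[self.location[key]][key]

    def migrate(self, key, target):
        """Move one key; only that key is locked while it moves."""
        self.locked.add(key)
        try:
            source = self.location[key]
            self.shards[target][key] = self.shards[source].pop(key)
            self.location[key] = target
        finally:
            self.locked.discard(key)


directory = ShardDirectory(["shard-a", "shard-b"])
directory.put("user-1", ["mug", "shirt"])
directory.put("user-2", ["poster"])
directory.migrate("user-1", "shard-b")
print(directory.location)
```

Readers and writers of every other key never notice the migration,
which is the property that lets the real system rebalance without
going off-line.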


Panel Discussion
----------------------------------------


Hive and Hadoop
----------------------------------------

Hive is a data storage system built on top of Hadoop with its own
query language (HiveQL), built by Facebook. The goals are a bit
different from HiveDB - HiveDB is more for OLTP, while Hive is more
for large-scale analytics. Being built on top of Hadoop, Hive is
much more batch oriented. Facebook uses it for doing analytics /
data-mining / machine-learning of their user and transactional data
sets (logs, user activities, etc.) to mine out aggregate and trending
intelligence from the large data set.

Surprising facts: 2TB of growth _per_ _day_.


Building Scalable Web Applications with Google App Engine
---------------------------------------------------------

Stacks like GAE take a more managed environment approach than the more
raw / primitive services provided by Amazon. The two fit into
different use cases though and, IMO, one will not necessarily supplant
the other.

GAE takes away from you all the concerns about deployment, production
architecture, system management or administration. It gives you a data
store with an OO API, and a web-app development environment that you
develop your application within. There are things you can't do, for
example, you can't run arbitrary software or services on GAE like you
can on the more machine-image based cloud services (AWS EC2).

What you gain from giving up those capabilities is: Google's
infrastructure for scaling is _your_ infrastructure for scaling. Your
app is designed in a pseudo-functional way - the stack encourages you
to design your app to perform all dynamism at put/post time and to
just render/display at get time. This approach helps with the scaling
of the system. Storage location transparency helps with spooling up
other instances of the app in disparate data centers, etc.
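
Here is a minimal sketch of that "work at post time, render at get
time" style. It is plain Python with a hypothetical `Guestbook` class
of my own, and it deliberately does not use the App Engine APIs.

```python
import html


class Guestbook:
    """Toy model: do the expensive work when data is posted, not when it is read."""

    def __init__(self):
        self.entries = []
        self.rendered_page = "<ul></ul>"  # precomputed view served on every GET

    def post(self, author, message):
        """Write path: store the entry and rebuild the cached page once."""
        self.entries.append((html.escape(author), html.escape(message)))
        items = "".join(f"<li>{a}: {m}</li>" for a, m in self.entries)
        self.rendered_page = f"<ul>{items}</ul>"

    def get(self):
        """Read path: no computation, just return what was prepared at write time."""
        return self.rendered_page


book = Guestbook()
book.post("kyle", "Good conference")
print(book.get())
```

Reads usually far outnumber writes in a web app, so paying the cost
once per write keeps the read path cheap and easy to replicate.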

This kind of stack really makes it easy to develop the most common
case of web applications - it is both easy to do and it scales. Those
two qualities almost _never_ come together.

I see these kinds of stacks as becoming more established and a large
part of Internet based application development - I think that more
organizations will offer these kinds of stacks across more
technologies.

You really should sign up and at least try GAE out.


Developing and Deploying Java applications on the Amazon Elastic Compute Cloud
------------------------------------------------------------------------------

Chris Richardson has created cloud-tools, a package of utilities (and
a maven plug-in) for provisioning EC2 instances, pushing your
application up and executing tasks across your cluster of instances.




Thanks for the notes
[info]rob.dimarco.myopenid.com
2008-10-21 04:48 pm UTC (link)
Sounds like a very interesting conference.

Cloud services is definitely in the early adopter phase of technology adoption. For it to jump to the mainstream, there will need to be trusted service companies that will provide security nets for enterprises interested in trying out the new technology. For example, look at the role that Sun/IBM/Oracle played in 1998 in getting the world ready for Java and the web. Or the role that IBM and Oracle played in making Linux (specifically SuSE and RedHat) acceptable in the enterprise. Similar commitments will need to happen for cloud computing to really take off.

As for functional programming being a larger trend, I am skeptical. It is still in the innovator/tinkerer phase, as it has been for the last 30 years. There has been a lot of talk, especially among the Y-Combinator set, but I haven't seen any real movement towards it yet.
