| List Comprehensions in Clojure |
[Nov. 18th, 2008|10:03 pm] |
Just a quick example of list comprehensions in Clojure.
;; generate all the positions on a chess board:
(for [file "ABCDEFGH"
      rank (range 1 9)]
  (format "%c%d" file rank))
;; ("A1" "A2" "A3" "A4" "A5" "A6" "A7" "A8"
;; "B1" "B2" "B3" "B4" "B5" "B6" "B7" "B8"
;; "C1" "C2" "C3" "C4" "C5" "C6" "C7" "C8"
;; "D1" "D2" "D3" "D4" "D5" "D6" "D7" "D8"
;; "E1" "E2" "E3" "E4" "E5" "E6" "E7" "E8"
;; "F1" "F2" "F3" "F4" "F5" "F6" "F7" "F8"
;; "G1" "G2" "G3" "G4" "G5" "G6" "G7" "G8"
;; "H1" "H2" "H3" "H4" "H5" "H6" "H7" "H8")
(count (for [file "ABCDEFGH"
             rank (range 1 9)]
         (format "%c%d" file rank)))
;; 64
;; the pythagorean triples example:
(for [aa (range 1 10)
      bb (range 1 10)
      cc (range 1 10)
      :when (= (* cc cc)
               (+ (* aa aa)
                  (* bb bb)))]
  (list aa bb cc))
;; ((3 4 5) (4 3 5))
;; all permutations?
;; assumes distinct elements: each head is removed from a set of the rest
(defn all-permutations [things]
  (if (= 1 (count things))
    (list things)
    (for [head things
          tail (all-permutations (disj (set things) head))]
      (cons head tail))))
(all-permutations '(a b c))
;; ((a c b) (a b c) (b a c) (b c a) (c a b) (c b a))
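The same three comprehensions translate almost directly into Python's list comprehensions. This is a minimal Python 3 sketch for comparison, not a line-for-line port: all_permutations here removes the head by position instead of through a set, so it keeps the input order and doesn't rely on the elements being distinct.

```python
# Every square on a chess board: files A-H crossed with ranks 1-8.
squares = ["%s%d" % (file, rank) for file in "ABCDEFGH" for rank in range(1, 9)]

# Pythagorean triples with every side below 10 (range(1, 10) stops at 9).
triples = [(a, b, c)
           for a in range(1, 10)
           for b in range(1, 10)
           for c in range(1, 10)
           if c * c == a * a + b * b]


def all_permutations(things):
    """Recursive permutations; the head is removed by index, not via a set."""
    things = list(things)
    if len(things) <= 1:
        return [things]
    return [[head] + tail
            for i, head in enumerate(things)
            for tail in all_permutations(things[:i] + things[i + 1:])]


print(len(squares))   # 64
print(triples)        # [(3, 4, 5), (4, 3, 5)]
print(all_permutations("abc"))
```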
| Cloud Con East Notes |
[Oct. 21st, 2008|11:29 am] |
Overall, Computing Among The Clouds was a good conference.
The main theme I took away from it is that cloud computing is a continuing trend: services will keep appearing, and being developed, that make taking advantage of these resource pools even easier.
Physical data centers will continue to be outsourced to organizations that can provide those services with greater economies of scale. Currently Amazon's offerings are slightly more expensive than a hosted system that you own, and more guidelines came out about when the trade-off is appropriate. Keep in mind that cloud computing is still a new trend, so these trade-offs are likely to shift, possibly within the next year or two. (For example, in my opinion the raw cost of 24x7 allocation is likely to fall below the cost of ownership due to economies of scale, and I think this will happen within 3 years.)
Own your own:
- you have a steady load
- AWS, used 24x7x365, is more expensive than owning and hosting a physical machine (though this does not include administrative costs)
- you have hard SLAs
- you cannot anticipate peak loads but must have spare capacity to handle them
- you have sensitive data that requires more certainty about where the data gets stored, etc.
- though there are cloud computing platforms that are working towards security and data protection certifications
Use an AWS style platform:
- your needs scale up and then can scale back down
- you can plan for these needs far enough ahead that you have time to provision for them
- you have no capital budget
- you have a service where you can charge in direct proportion to utilization rather than based on capacity.
Higher Level Application Stacks, like Google App Engine:
- keep in mind these are new and the space is still being explored, more will appear, a model will develop around these, more languages and frameworks will be supported - though it mostly will be based on those that can be easily hosted/sandboxed, which is why you're seeing Python first and Java following along
- all of the provisioning is hidden from you - Google's offering dynamically scales your application up and down based on utilization - this is a significant reduction in your design and administrative workload
- simplicity of application development _and_ scalability are rarely found in existing technologies, this is one of the most compelling aspects of these kinds of application stacks
Organizations and individuals are starting to learn how they need to change the design of their services and applications to take better advantage of the Cloud. It takes a bit of a change in mindset. Someone used the phrase "Machine Instances are the new processes", and I think that's an appropriate framing of one of the shifts in thinking you need to take advantage of the sea of resources that is becoming available.
Change your software so that it is more easily bootstrapped, and eliminate the assumption that you have access to local, disk-based services: pull everything remotely, use URLs and services, and don't assume local interfaces, assume remote ones. Design to come up (boot) faster; when you scale up and down quickly you don't get the same amortization of start-up costs over time. Design with the crash-fast mentality: as robust as these systems are, you should still assume that the system could go away at a moment's notice. In addition to helping with unexpected outages, this lets you scale _down_ faster. Keep your persistent data in the provided data stores and use the provided queuing systems to distribute work.
What these offerings need to do to gain wider acceptance, and the value-adds to come: --------------------------------------------------------
I think that harder SLAs will develop as there is more competition. Higher-level tools will develop on top of the instance-based cloud offerings (EC2) to allow for more automated provisioning; this will make things easier for you, but not as easy as fully restricted stacks like GAE.
I think we'll see tools and offerings that reach down into traditional data centers. They would allow a simpler mix of a traditional service that bleeds over into the cloud when resources need to be scaled up, while letting you keep control over the processing or data (sensitive data?) in your own protected environment and push generic activity up into the cloud as necessary.
Other notable happenings from the conference or that were recently announced: ------------------------------------------------------------------
- Microsoft is creating AMIs for Windows on EC2.
- Google just announced that Java will be a supported App Engine development language - previously only Python was supported.
Haskell in the corporate environment ----------------------------------------
It seemed out of place for the event, not really cloud-ish, though I personally see Functional Programming becoming a larger industry trend. It follows structured -> procedural -> object oriented -> functional, at least with respect to the timeline of each coming out of academia. That ordering doesn't necessarily mean one is 'higher level' than another, though so far time has implied that for the other paradigms.
The presenter (Jeff Polakow) is using it extensively at his current employer.
Those kinds of firms (Wall St.) allow a lot of latitude to the technical staff, so it's easier to experiment (R&D) with new technologies. It's much harder for a company like HMS to decide to take on something like this: it's hard to find developers, and how to develop, deploy, monitor and design with these technologies is still undetermined for companies like HMS.
I think the FP trend is being pushed into industry by the shift to multi-core; by the past difficulty of developing concurrent, parallel or distributed applications (it's hard, and giving developers direct access to create threads has proven to be a difficult road to travel); by the need for automatic scaling at the infrastructure level; and by the easier path to robustness that languages like Erlang offer.
In languages like Java, you have to take into consideration all the libraries you're using with respect to their referential transparency - it's not the default. In the FP languages referential transparency is the _default_ case, so you can, in general, make that assumption. The underlying stack can also make that assumption about your code as well - which is why the concurrency / distribution model is less coupled to the implementation than it is in the more imperative languages.
Horizontal Scaling with HiveDB ----------------------------------------
CafePress has a _large_ catalog. I was kind of shocked to hear that they have 265 million products. They have a low margin based on the aggregate amount of data they have to store and serve up, so solutions like Oracle were just not an option for them simply from a cost basis.
They spent time analyzing their options and didn't find anything that fit their needs (cost, performance, no off-line resharding), and created a more scalable data storage architecture.
The solution they created performs better, scales better, is more robust and has a better SLA than many of the other commercial solutions.
It is effectively a Hibernate extension that uses MySQL and does data partitioning (pseudo-automatically) by using a set of replicated MySQL databases (replicated 3x) as a catalog that maps each shard key to where its data is stored. The system supports dynamic repartitioning: shards can be migrated off a shard-host to move less busy data away from data that is more 'hot'. The busiest data sets end up on their own shard-node, with everything else having been pushed away from them.
They only need to 'lock' a single user's data when migrating it; the system as a whole doesn't go down. The MySQL catalogs are replicated (3 machines, master-master, writing to 1) and can be upgraded by taking 1 of the 3 out of the cluster. The same kind of approach goes for the other sharding servers.
Panel Discussion ----------------------------------------
Hive and Hadoop ----------------------------------------
Hive is a data warehousing system built on top of Hadoop by Facebook, with its own query language (HiveQL). Its goals are a bit different from HiveDB's: HiveDB is more for OLTP, while Hive is for large-scale analytics. Being built on top of Hadoop, Hive is much more batch oriented. Facebook uses it for analytics, data-mining and machine-learning over their user and transactional data sets (logs, user activities, etc.) to extract aggregate and trending intelligence from the large data set.
Surprising fact: 2 TB of growth _per_ _day_.
Building Scalable Web Applications with Google App Engine ---------------------------------------------------------
Stacks like GAE take a more managed environment approach than the more raw / primitive services provided by Amazon. The two fit into different use cases though and, IMO, one will not necessarily supplant the other.
GAE takes away from you all the concerns about deployment, production architecture, system management or administration. It gives you a data store with an OO API, and a web-app development environment that you develop your application within. There are things you can't do, for example, you can't run arbitrary software or services on GAE like you can on the more machine-image based cloud services (AWS EC2).
What you gain from giving up those capabilities is that Google's infrastructure for scaling becomes _your_ infrastructure for scaling. Your app is designed in a pseudo-functional way: the stack encourages you to perform all dynamic work at PUT/POST time and to just render/display at GET time. This approach helps with scaling the system. Storage location transparency helps with spooling up other instances of the app in disparate data centers, etc.
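A rough sketch of that write-time/read-time split, in plain Python rather than App Engine's APIs (the GuestBook class is hypothetical, and an in-memory list stands in for the datastore): every POST recomputes the rendered page once, so every GET is just a lookup.

```python
import html


class GuestBook:
    """Hypothetical sketch: do the work when data is written, keep reads trivial."""

    def __init__(self):
        self.entries = []
        self.rendered = "<ul></ul>"   # precomputed fragment served on every GET

    def post(self, author, message):
        # All the dynamic work happens here, once per write.
        self.entries.append((author, message))
        items = "".join("<li>%s: %s</li>" % (html.escape(a), html.escape(m))
                        for a, m in self.entries)
        self.rendered = "<ul>%s</ul>" % items

    def get(self):
        # Reads just return what the last write prepared.
        return self.rendered


book = GuestBook()
book.post("ann", "hi & bye")
print(book.get())   # <ul><li>ann: hi &amp; bye</li></ul>
```

The design trades extra work per write for trivial reads, which pays off for the read-heavy traffic most web applications see.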
This kind of stack really makes it easy to develop the most common kind of web application: it is both easy to do and it scales. That is a combination that almost _never_ comes together.
I see these kinds of stacks as becoming more established and a large part of Internet based application development - I think that more organizations will offer these kind of stacks across more technologies.
You really should sign up and at least try GAE out.
Developing and Deploying Java applications on the Amazon Elastic Compute Cloud ------------------------------------------------------------------------------
Chris Richardson has created cloud-tools, a package of utilities (and a Maven plug-in) for provisioning EC2 instances, pushing your application up and executing tasks across your cluster of instances.
| destructuring-bind |
[Sep. 18th, 2008|08:25 am] |
;; This is a typical usage, for pulling apart a list
(destructuring-bind
    (first second)
    '(1 2)
  (format t "~%~%;;; => first:~a second:~a~&" first second))
;;; => first:1 second:2
;; You can also pull apart improper lists:
(destructuring-bind
    (first . second)
    '(1 . 2)
  (format t "~%~%;;; => first:~a second:~a~&" first second))
;;; => first:1 second:2
;; The first argument to destructuring-bind is a lambda list, and you
;; can grab the remainder either by using a dotted list:
(destructuring-bind
    (first second . stuff)
    '(1 2 3 4 5)
  (format t "~%~%;;; => first:~a second:~a rest:~a~&" first second stuff))
;;; => first:1 second:2 rest:(3 4 5)
;; or by using &rest, just like you do for
;; functions that take a variable number of arguments:
(destructuring-bind
    (first second &rest stuff)
    '(1 2 3 4 5)
  (format t "~%~%;;; => first:~a second:~a rest:~a~&" first second stuff))
;;; => first:1 second:2 rest:(3 4 5)
;; It really is a lambda list, so you can use optional parameters with defaults:
(destructuring-bind
    (first second &optional (third 'default))
    '(1 2)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))
;;; => first:1 second:2 third:DEFAULT
(destructuring-bind
    (first second &optional (third 'default))
    '(1 2 3)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))
;;; => first:1 second:2 third:3
;; And you can use keyword parameters:
(destructuring-bind
    (first second &key third)
    '(1 2 :third 3)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))
;;; => first:1 second:2 third:3
;; Finally, you can use it to 'unparse' trees as well, which is a
;; really great feature, since your variable declaration matches the
;; 'shape' of the data structure you're pulling apart. This technique
;; is really handy for dealing with XML after it's been converted to
;; s-expressions.
(destructuring-bind
    (a (b (c d e (f g) h i j)) &rest remainder)
    '(1 (2 (3 4 5 (6 7) 8 9 10)) 11 12 13 14 15)
  (format t
          "~%~%;;; => a:~a b:~a c:~a d:~a e:~a f:~a g:~a h:~a i:~a j:~a remainder:~a ~&"
          a b c d e f g h i j remainder))
;;; => a:1 b:2 c:3 d:4 e:5 f:6 g:7 h:8 i:9 j:10 remainder:(11 12 13 14 15)
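For comparison, Python's iterable unpacking covers the positional, &rest and nested-tree cases above. It has no &optional or &key, so the closest analogue for those is a function's own parameter list; the helper functions below are illustrative names, not a standard API.

```python
# Positional plus remainder, like (first second &rest stuff).
first, second, *stuff = [1, 2, 3, 4, 5]

# Nested targets mirror the shape of the tree, like the last example above.
a, (b, (c, d, e, (f, g), h, i, j)), *remainder = (
    1, (2, (3, 4, 5, (6, 7), 8, 9, 10)), 11, 12, 13, 14, 15)


# Unpacking has no &optional or &key; a parameter list does.
def with_optional(first, second, third="default"):
    return first, second, third


def with_key(first, second, *, third=None):
    return first, second, third


print(first, second, stuff)      # 1 2 [3, 4, 5]
print(remainder)                 # [11, 12, 13, 14, 15]
print(with_optional(*[1, 2]))    # (1, 2, 'default')
print(with_key(1, 2, third=3))   # (1, 2, 3)
```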
| Bring me Data! Landmark Parsing in CL |
[Aug. 16th, 2008|12:55 am] |
(require :drakma)
(require :landmark-parser)
(use-package :landmark-parser)
(defvar *doc* (drakma:http-request "http://www.loc.gov/standards/iso639-2/php/code_list.php"))
(defun get-main-table (doc)
  (extract-between
   (new-parser doc)
   '((forward-past "<table ")
     (forward-past "<table ")
     (forward-past "<tr")
     (forward-to "<tr"))
   '((forward-to "</table"))))
(defun get-rows (block)
  (extract-all (new-parser block)
               '((forward-to "<tr"))
               '((forward-past "</tr>"))))
(defun row->cells (row)
  (extract-all (new-parser row)
               '((forward-past "<td")
                 (forward-past ">"))
               '((forward-to "</td>"))))
(with-open-file (out "/home/mortis/iso.tab"
                     :direction :output
                     :if-exists :supersede)
  (loop for row in (mapcar #'row->cells (get-rows (get-main-table *doc*)))
        do (format out "~{~a~^ ~}~&" row)))
I just uploaded the Landmark Parser tonight.
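This post doesn't describe landmark-parser's API beyond the calls above, so the following is only a hypothetical Python sketch of the idea as inferred from those calls: a cursor moves forward to a landmark (stopping at its start) or past it (stopping just after it), and a chunk is whatever lies between where the start moves and the end moves leave the cursor. The function names echo the Lisp ones, but the signatures and semantics here are assumptions, not the library's.

```python
def advance(text, pos, moves):
    """Apply (kind, landmark) moves starting at pos.

    kind is "forward_to" (stop at the start of the landmark) or
    "forward_past" (stop just after it). Returns None if a landmark is missing.
    """
    for kind, landmark in moves:
        found = text.find(landmark, pos)
        if found == -1:
            return None
        pos = found if kind == "forward_to" else found + len(landmark)
    return pos


def extract_between(text, start_moves, end_moves, pos=0):
    """Return (chunk, end_position) for the first match, or (None, pos)."""
    start = advance(text, pos, start_moves)
    if start is None:
        return None, pos
    end = advance(text, start, end_moves)
    if end is None:
        return None, pos
    return text[start:end], end


def extract_all(text, start_moves, end_moves):
    """Collect every chunk, resuming the search where the previous one ended."""
    chunks, pos = [], 0
    while True:
        chunk, end = extract_between(text, start_moves, end_moves, pos)
        if chunk is None:
            return chunks
        chunks.append(chunk)
        pos = end if end > pos else pos + 1   # always make progress


page = ("<table><tr><td>eng</td><td>English</td></tr>"
        "<tr><td>fre</td><td>French</td></tr></table>")
rows = extract_all(page, [("forward_to", "<tr")], [("forward_past", "</tr>")])
cells = [extract_all(row, [("forward_past", "<td"), ("forward_past", ">")],
                     [("forward_to", "</td>")])
         for row in rows]
print(cells)   # [['eng', 'English'], ['fre', 'French']]
```

Working from landmarks rather than a full HTML parse keeps the extraction tolerant of messy markup, at the cost of breaking if the page layout around the landmarks changes.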
| OSCon Day 1 |
[Jul. 21st, 2008|10:57 pm] |
Andrew and I arrived for registration and took advantage of the continental breakfast before heading up to the Intro to Python.
O'Reilly had the registration process pretty streamlined. They had a long bank of laptops at which you needed only to enter your registration code, or your email address (if you registered on the OSCON conference web site). Register, then walk up to the materials station and pick up your ID and badge card.
There were plenty of juices, coffee, fruit and pastries. There was also plenty of seating. To either O'Reilly's or the Oregon Conference Center's credit, things were very well organized.
The conference room we were in must have had seating for a few hundred people, and it was effectively full. There was limited space for each attendee and their items (it was cramped for me, at least), but they expected a laptop per attendee: power strips were laid along every other row of tables, within easy reach of every single seat. It was well planned and laid out.
The intro to Python got underway at 8:30. Although it was geared toward an audience with some programming experience, it assumed (as the title suggested) no Python experience. Steve Holden was a great speaker, filling in twice with anecdotes while technical issues were worked out with the equipment (once a misconfiguration of his laptop, the other time a power interruption).
Python is a very capable language. It is more consistent about its OO and syntax than Perl. It is also much bigger on conventions, broadly adopted by the community, mostly focused on formatting (one statement per line), in-line documentation and coding style in general.
Functions are first-class objects: you can assign a function to a variable and pass functions as arguments, so you can implement the equivalent of funcall and apply in Python. Python supports positional parameters and default values for function parameters, and you can call any function with positional arguments, named arguments, a tuple of arguments (similar to apply), or a dictionary (an indirect way of passing named arguments).
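A short Python 3 sketch of those calling styles (greet and call_with are made-up names; the same call forms work in the Python 2 taught at the session, apart from print):

```python
def greet(name, greeting="hello", punctuation="!"):
    return "%s, %s%s" % (greeting, name, punctuation)


f = greet                                 # functions are values: bind one to a variable

plain = f("world")                        # positional argument; defaults fill the rest
named = f("world", punctuation="?")       # named argument
args = ("world", "hi")
from_tuple = f(*args)                     # spread a tuple, like apply
kwargs = {"name": "world", "greeting": "hey"}
from_dict = f(**kwargs)                   # spread a dictionary into named arguments


def call_with(func, *args, **kwargs):
    """Take a function as an argument and forward any arguments to it."""
    return func(*args, **kwargs)


print(plain, named, from_tuple, from_dict, sep=" | ")
# hello, world! | hello, world? | hi, world! | hey, world!
```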
Python actually has a lot of features which were inspired by functional programming (including list comprehensions).
Python is byte-compiled, like Java. You write code in a .py file, and the first time it is loaded as a module (import), Python compiles the code for you into a .pyc file. The time stamp check of the .pyc vs. the .py file is transparent; it's handled automatically.
Strings are immutable, which is something that helps Jython be a natural fit in the JVM.
Python supports some destructuring constructs, based on what it calls tuples. It's easier to show an example:
a, (b, c) = (1, (2, 3))
print a, b, c    # => 1 2 3
Tuples, and this kind of binding syntax, are widely used in processing things like lists and maps.
An interesting feature of the language is the pair of functions locals() and globals(). locals() returns a dictionary (Python's name for a map) of all of the variable bindings (and values) in the current local scope (exclusive of global variables). globals() returns the variables in the entire module's scope (not local, lexical or class scope, and not global in the sense of a Perl global: not universally global).
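A small illustration of the scoping difference, assuming the code runs as its own module (the names are made up):

```python
counter = 10                          # a module-level binding


def show_scopes(x):
    y = x * 2
    local_names = sorted(locals())    # just this call's bindings: ['x', 'y']
    # globals() is this module's namespace, not an interpreter-wide one
    sees_counter = "counter" in globals()
    return local_names, sees_counter, y


print(show_scopes(1))   # (['x', 'y'], True, 2)
```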
Other highlights:
- the yield statement, which turns a function into a generator (a weak kind of continuation)
- for and while loops can have an else clause, which is executed when the loop terminates normally (as opposed to breaking out of it)
- the Python try form (try/except/finally) can also have an else clause, executed if no exception was raised in the try block
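The three highlights can be seen in one small, made-up Python 3 example:

```python
def countdown(n):
    # yield suspends the function and hands back a value; the next
    # iteration resumes right after the yield.
    while n > 0:
        yield n
        n -= 1


def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            break
    else:
        # Runs only if the loop finished without break.
        return None
    return n


def parse_int(text, log):
    try:
        value = int(text)
    except ValueError:
        log.append("except")
        return None
    else:
        log.append("else")       # only when the try block raised nothing
        return value
    finally:
        log.append("finally")    # always runs, even after a return


print(list(countdown(3)))                                # [3, 2, 1]
print(first_even([1, 3, 4, 6]), first_even([1, 3, 5]))   # 4 None
```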
After a break for lunch, both Andrew and I attended the Introduction to Django, presented by Jacob Kaplan-Moss.
Django is a Python MVC framework for rapid development of interactive web sites, very much in the spirit of Ruby on Rails. I've done a small site in Rails, and the parallels between the two frameworks were very close.
Django has a code generation framework, an ORM layer (very similar to Rails' ActiveRecord), an HTML template system (with a default syntax reminiscent of PHP's Smarty template system), and integrated support for testing.
Django's test support includes an interesting feature called doctests (from Python's standard doctest module). If you've worked with an interactive language with a REPL, you have probably used it to explore the behavior of code and to informally test it. Doctests are a way of (almost literally) taking a cut and paste of the interactive session and vivifying the transcript as a regression test. I like the idea of a recorded test, but as Andrew and I talked about it, he convinced me that the literal representation wasn't all that great a choice for implementing those kinds of tests. I do like the reduction of effort that comes with that kind of testing, and I recognize its inherent informality.
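A minimal sketch of a doctest using Python's standard doctest module; connection_pair is a made-up function for illustration. The lines starting with >>> are a pasted interactive session, and doctest re-runs them and compares the output against the recorded text.

```python
import doctest


def connection_pair(key, value):
    """Format one key/value pair the way a connection string expects.

    >>> connection_pair("server", "localhost")
    'server=localhost'
    >>> connection_pair("uid", 42)
    'uid=42'
    """
    return "%s=%s" % (key, value)


if __name__ == "__main__":
    # Silent when every example still produces the recorded output.
    doctest.testmod()
```

Because the comparison is textual, a harmless change in how a value prints (spacing, quote style) fails the test, which is the brittleness of the literal representation mentioned above.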
All that said, Django (like Rails) is big on doing test driven development.
I looked up the status of Django on Jython and apparently it's close to being a 1.0 release (nothing I'd recommend for use at HMS at the moment, but Sun has hired people to work on Jython and Django is one of the frameworks they are concerned with making work).
I'm looking forward to tomorrow.
| Sending a signal... |
[Jul. 20th, 2008|11:25 am] |
Something has come up recently that has forced me to think about and articulate the difference in how I perceive occasional cigar smoking and the habit of cigarette smoking.
This is within the context of having kids and being in the position of having to be the arbiter of what they're exposed to. Capable or not, I am one of the people who must lead them through experiencing the world. The end goal is for them to be able to operate independently, handle whatever the world throws at them and be able to help others.
The difference, I've recently come to realize is: it's the signal that each one sends.
Seeing a person decide to smoke a cigar at the end of the day, after most activity is over sends a certain signal. It is something which is done out of the context of normal daily activity. It is something that can wait, it can be delayed, it is optional.
Seeing a person stop all activity to step outside and take themselves away to smoke a cigarette sends a signal. A kid will see this adult decide that what they are about to do is more important than playing another game, more important than starting on dessert or watching the movie, more important than being with the kid for those five minutes.
The cigarette smoker is compelled to smoke, while the (typical) cigar smoker is not.
Kids pick up on what is important to the adults in their lives; it's a necessary part of how they learn about the world. The habitual behavioral traits that I exhibit in front of my kids seem to be the things which are most influential. I can tell them that brushing their teeth is good for them, but unless they see me do it, consistently, no amount of saying it to them causes the value to transfer. At least not at about 5 years of age.
I've become more and more aware of the signal that I send in a lot of contexts. Having children has made many of these signals much more apparent.
| Idempotency or Singleton Memoization |
[Jun. 18th, 2008|09:14 am] |
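sub makeDoOnce {
    my ($sub) = @_;
    my $alreadyDone = 0;
    my @result;              # empty list, not (undef)
    my $exception;
    return sub {
        # a failure is remembered and re-thrown on every later call
        die $exception if defined $exception;
        if ($alreadyDone) { return wantarray ? @result : $result[0]; }
        # capture the caller's context here: inside eval BLOCK,
        # wantarray reports the context of the eval, not of this sub
        my $wantList = wantarray;
        my @args = @_;
        eval {
            if ($wantList) { @result = $sub->(@args) }
            else           { $result[0] = $sub->(@args) }  # scalar or void: one call
            1;
        } or do {
            $exception = $@ || "makeDoOnce: unknown error";
            die $exception;
        };
        $alreadyDone = 1;
        return wantarray ? @result : $result[0];
    };
}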
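For comparison, the same do-once idea as a Python closure. This is a sketch, not a drop-in port: Python has no list/scalar context, so there is a single remembered result, and the lock is an addition (makeDoOnce above doesn't guard against concurrent callers). functools.cache is the stock memoizer, but it doesn't remember exceptions, which is the part makeDoOnce cares about.

```python
import threading


def make_do_once(func):
    """Wrap func so it runs at most once; later calls replay the first outcome.

    As in makeDoOnce, arguments to later calls are ignored and an exception
    from the first call is re-raised on every subsequent call.
    """
    lock = threading.Lock()
    outcome = {}    # gets "result" or "error" after the first call

    def wrapper(*args, **kwargs):
        with lock:
            if not outcome:
                try:
                    outcome["result"] = func(*args, **kwargs)
                except Exception as exc:
                    outcome["error"] = exc
            if "error" in outcome:
                raise outcome["error"]
            return outcome["result"]

    return wrapper


calls = []


def open_connection():
    calls.append("open")
    return "connection"


get_connection = make_do_once(open_connection)
print(get_connection(), get_connection(), len(calls))   # connection connection 1
```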
| The same and yet different |
[Jun. 7th, 2008|09:05 am] |
I've just opened the book Dive into Python by Mark
Pilgrim. The first example is a function that creates a
(database?) connection string. Knowing Perl, I wrote a close equivalent so I
could look at the differences.
First, the Python:
def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.
    Returns string."""
    return ";".join(["%s=%s" % (k,v) for k,v in params.items()])
if "__main__" == __name__:
    print buildConnectionString({
        "server":"localhost",
        "database":"master",
        "uid":"sa",
        "pwd":"sekret"
        })
OK, this defines a function (buildConnectionString) of one parameter,
and that parameter is called out in the signature. This leads me to
suspect that Python may actually know
that the function takes a single argument (though not its type), and
that this information may be available to the runtime (this is the kind of
thing that helps IDEs and other software tools).
Within that function we can provide a docstring. This is
something I'm familiar with from Common Lisp and it's a feature that
I like. I suspect that I'll find that this is available in the Python
runtime later on...
Then Mr. Pilgrim immediately indoctrinates us to Python's list
comprehensions. Good. We can see that literal strings have
methods (we're calling .join on the literal ";"), so they're an object
(in Perl strings are not). I'm not sure what params.items() returns
yet - a flat list? A set of pairs? I'm sure this is waiting for me
on the next few pages.
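Both hunches can be checked directly. A quick sketch in Python 3 syntax (the book uses Python 2, where print is a statement and items() returns a list of pairs rather than a view):

```python
import inspect


def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.

    Returns string."""
    return ";".join(["%s=%s" % (k, v) for k, v in params.items()])


# The parameter list is available at runtime...
print(inspect.signature(buildConnectionString))          # (params)
# ...and so is the docstring.
print(buildConnectionString.__doc__.splitlines()[0])
# items() yields (key, value) pairs, which the comprehension destructures.
print(list({"uid": "sa", "pwd": "sekret"}.items()))      # [('uid', 'sa'), ('pwd', 'sekret')]
```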
Ok, this is a simple enough task, and the code is nice and
succinct. The list comprehension makes the task clear (at least to
me). What does the equivalent Perl look like? (well, my equivalent
Perl, this is _my_ way of doing it, TIMTOWTDI
after all).
use strict;
use warnings;

=head2 buildConnectionString

Build a connection string from a dictionary of parameters.
Returns string.

=cut

sub buildConnectionString {
    my($params) = @_;
    return join(";", map { "$_=$params->{$_}" } keys %$params);
}
if (__PACKAGE__ eq "main") {
    print buildConnectionString({
        "server"   => "localhost",
        "database" => "master",
        "uid"      => "sa",
        "pwd"      => "sekret"
    }), "\n";
}
Ok, right off, I won't write Perl without the 'use strict;' and 'use warnings;' pragmas since
they are big time savers. The first way in which they save time is
that they help identify errors at compile-time that would otherwise be
caught at run-time, and they point out conditions that you likely
weren't handling (like using the contents of a variable before
assigning to it).
Then there is the external documentation. My team has adopted the
javadoc-like habit of making the api documentation in-line with the
code. The difference here though is that there is no formal
relationship between the POD and the function
itself - unlike in the Python docstring (and in Javadoc, where
adjacency is an implicit relationship). This means that the Perl
documentation can't be (easily) the same kind of runtime value-add as
the Python docstring (go Lisp).
The next thing is the function signature, or rather lack of, in
the Perl function. Again, my team has adopted the convention of using
a single line to do function argument destructuring - that way the
first line of our functions is at least representationally equivalent
to a signature. This makes our Perl applications a lot easier to read
and maintain. Of course there are some cases (at least in Perl) when
it can be useful to slightly modify the argument list, replace your
function call with some other and pretend like the first never
happened, though I'm not going into that right now. The main
difference is that our convention isn't a formal part of the language,
while in Python it is - and may additionally be a value-add at runtime
(I'm sure I'll find out later, again, this is the kind of thing that
helps IDEs and tools).
As far as the actual implementation, there is little logical or
semantic difference between Perl's join/map and Python's
join/list-comprehension. Perl's join is not a method. Neither uses
an iterator off of the collection (as Ruby would), but both apply an
operation to every (logical) item in a set (in the Python case it's a
pair, though as I said I'm not sure how the destructuring
happens yet).
There are other smallish differences too. I could have used Perl's
sprintf, which would have more closely aligned with Python's % string formatting operator:
sub buildConnectionString {
    my($params) = @_;
    return join(";", map { sprintf "%s=%s", $_, $params->{$_} } keys %$params);
}
But either language is more than capable. With the overloading of
curly braces, even with my 10+ years of Perl experience, it's still a
bit fuzzy to my eyes where the break is between the end of the
hash-table (which is a map) and the code-block for the map. It's not
that I can't read that code, or read it quickly, the dual meaning of
the curly braces means that I have to use grammatical cues rather than
just syntactic ones.
This is less than a first impression, mind you. I'm trying hard to
stay out of the Turing tar-pit.
I like the list comprehensions and I like that Mr. Pilgrim introduced
them right away in the book. I'm looking forward to the rest of
it.
| The difference between requirements and code. |
[May. 18th, 2008|10:12 pm] |
Programmers tend to know this without questioning it and often without
being able to describe why: you can't reverse engineer requirements
from code. This idea seems to be irresistibly seductive to
non-developers though.
Requirements are, at their heart, declarative. Code is imperative.
Well, that depends, and I'll come back to this point later. A
program is one possible solution to the requirement, one possible
implementation. It is not the requirement.
Requirements declare "clean the car". Code says "Get a bucket. Get
the car wash soap. Get rags. Get the hose. Turn on the hose. Fill
the bucket." and so on. It could also have said "Drive to the car
wash". Of course those instructions aren't necessary if the car isn't
dirty (but you can't tell that from the instructions). They also
could have meant "Wash the dog" or "Make lots of bubbles". If you
didn't already have it in your head that the intent was to wash a car
you'd probably be lost. Worse, you'd be lucky to figure out what it
meant.
And what are the chances that there is a bug in the code? That bug
would become an error in your reverse-engineered requirement.
instructions in the code are working around limitations in the
programming language? What about the OS? The network? The database?
What if....what if the program is working around errors in the data,
which were introduced by some other application? What if it's a
legacy version of the program itself?
You can use code as a clue though, as a guide. If you know what the
goal probably was you can use the code as a way to try to disprove
your hunch.
You can't reverse engineer requirements from code - you'll never hit
the spirit or the meaning of the original requirements. You can
certainly use the existing code to produce requirements for a system
that re-implements the code itself (including workarounds, bugs and
all), but you'll never produce the intent or core need that caused the
application to be created in the first place. Not from looking at the
code alone.
So, what about the earlier supposition? When you use non-imperative
approaches to implement requirements what you end up with is a
declarative (or functional) statement of...the requirements. They are
stated in a more formal manner certainly, but they will be a statement
of the requirements nonetheless.
This is a core reason DSLs and XML configurations are often good
solutions. SQL is successful because it allows you to say what you
want, not how to get it from some internal, vendor specific, database
API.
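To make the contrast concrete with a made-up requirement, "total the unpaid invoices for each customer": the SQL query is close to a restatement of the requirement, while the loop is one of many possible ways to carry it out. Both use only Python's standard library (sqlite3 with an in-memory database); the data and names are invented for illustration.

```python
import sqlite3

# (customer, amount, paid) rows; invented sample data
invoices = [("acme", 100, 0), ("acme", 50, 1), ("globex", 75, 0), ("globex", 25, 0)]


def unpaid_totals_imperative(rows):
    # Imperative: spells out *how* to compute the totals, step by step.
    totals = {}
    for customer, amount, paid in rows:
        if not paid:
            totals[customer] = totals.get(customer, 0) + amount
    return totals


def unpaid_totals_sql(rows):
    # Declarative: states *what* is wanted; the database decides how.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoice (customer TEXT, amount INTEGER, paid INTEGER)")
    conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", rows)
    query = ("SELECT customer, SUM(amount) FROM invoice "
             "WHERE paid = 0 GROUP BY customer")
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


print(unpaid_totals_imperative(invoices))   # {'acme': 100, 'globex': 100}
print(unpaid_totals_sql(invoices))
```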
| Compile Time Evaluation, w/o Macros |
[Apr. 28th, 2008|05:05 pm] |
We all have constants in our applications. Sometimes we have numeric values that are only used once and are clearer if they're left as an expression, like the number of seconds in a day. (Such values are often better off as named constants; I'm only using it as an example.)
(defun days->seconds (days)
  (* days (* 24 60 60)))
(format t "seconds in ~a days:~a~&" 1 (days->seconds 1))
(time (loop repeat 1000000
            do (days->seconds 10)))
=> 2.093 sec.
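In Common Lisp there is a reader macro, #., that you can use to have these kinds of expressions evaluated at read time. Because reading happens before compilation, the result is embedded in the compiled code as a constant, so the expression uses no CPU time while your application runs (it isn't even present in the compiled executable):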
(defun compile-time-days->seconds (days)
  (* days #.(* 24 60 60)))
(time (loop repeat 1000000
            do (compile-time-days->seconds 10)))
=> 1.87s
Admittedly, in this case it's not a worthwhile improvement in performance (and a compiler may well fold a constant expression like (* 24 60 60) on its own). But these are the kinds of things many technologies handle as build processes. When you're configuring a build to be specialized for a locale or a particular production environment, you either factor those parts of the application into a configuration file, or you generate code (almost certainly with some tool set which is not your programming language).
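Python has no #. reader macro, but CPython's compiler folds constant-only expressions on its own, which has the same effect for simple arithmetic. A sketch, relying on CPython behavior rather than a language guarantee; for anything more involved, the usual Python stand-in is a module-level constant, computed once at import time rather than on every call:

```python
SECONDS_PER_DAY = 24 * 60 * 60          # computed once, when the module is imported


def days_to_seconds(days):
    # (24 * 60 * 60) contains only literals, so CPython folds it to 86400
    # at compile time; no multiplication of the literals happens per call.
    return days * (24 * 60 * 60)


def days_to_seconds_const(days):
    return days * SECONDS_PER_DAY


print(days_to_seconds(10), days_to_seconds_const(10))   # 864000 864000
print(86400 in days_to_seconds.__code__.co_consts)       # True on CPython
```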
| Updated Lisp Presentation Slides |
[Apr. 28th, 2008|01:54 pm] |
I took the "make the title of your presentation confrontational" advice and changed the title from "Introduction to Lisp" to "Why Won't Lisp Just Go Away?"
It went much more smoothly the second time around. I also got some good feedback from Rob DiMarco and JP Vossen (author of the Bash Cookbook). It has been difficult to fit the talk into 60-90 minutes while still covering Lisp's strong points. |
|
|
| Data Analysis in the [Unix] shell |
[Apr. 28th, 2008|11:51 am] |
|
I often prefer the shell and Unix utilities to waiting to load data
into a relational database or MS Access (unless a lot of what I'm
doing requires complex joins). Often I don't even have to transform
the encoding of the files before analyzing them. There are a bunch of
recipes for doing SQL equivalents with the shell utilities; these are
a few that I used just this morning. Most of the time all it takes is
a bit of imagination about how to build a simple data flow by
composing a small handful of ubiquitous Unix utilities.
All these examples also work within the Cygwin environment for
Windows, or at a Terminal in OS X (especially when combined with the
additional software available via the Fink or MacPorts projects).
"SELECT COUNT(*) FROM TABLE"
Just selecting the count of records from an input file is one of the
easiest things to accomplish (if your file is already line-oriented).
The wc (word count) utility does this easily. By default it counts
lines, words and bytes; with '-l' it emits only the count of lines.
user@host:~/data$ wc -l table.tab
10 table.tab
If you want to ignore the header, start with the second line (see
the next example for a more thorough explanation):
user@host:~/data$ tail -n +2 table.tab | wc -l
9
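Here is a minimal, self-contained sketch of both counts. The file table.tab and its contents are hypothetical: one header line followed by nine records. Redirecting the file into wc (`wc -l < table.tab`) prints the bare number without the file name.

```shell
#!/bin/sh
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical table.tab: one header line followed by nine records.
printf 'id\tcolor\n' > table.tab
for i in 1 2 3 4 5 6 7 8 9; do
  printf '%s\tred\n' "$i" >> table.tab
done

# COUNT(*) including the header. Reading from stdin keeps the file name
# out of wc's output; $(( )) trims the padding some wc versions add.
total=$(( $(wc -l < table.tab) ))

# COUNT(*) without the header: tail -n +2 starts output at line 2.
records=$(( $(tail -n +2 table.tab | wc -l) ))

echo "lines: $total, records: $records"
```

With this sample file it reports 10 lines and 9 records.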
"SELECT COUNT(DISTINCT(FIELD1)) FROM TABLE"
For getting a distinct count of values in a column:
user@host:~/data$ cut -f1 table.tab | tail -n +2 | sort | uniq -c
The first part, cut, is a utility that lets you take particular
columns from a tab-delimited file, or character ranges from a
fixed-width file. cut also lets you specify the delimiter, but be
warned that the commonly encountered CSV format requires more complex
parsing than just using a comma as a delimiter. So this takes the
first column out of the input file.
The next part of that is the tail command. tail outputs the end or
'tail' of a file. Normally the '-n' option says how many lines to
output, counted from the end of the file, but a leading '+' changes it
to the line number to start at, counted from the beginning. So
'-n +2' starts at the second line, which effectively tosses out the
header line.
Next the values themselves are sorted. This is necessary for the
uniq command, which will only collapse or count duplicate
lines when they are adjacent.
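Finally we reduce duplicate lines with uniq. The '-c' tells it to
emit the count of duplicates when collapsing them. Strictly speaking,
this gives a count for each distinct value (like a GROUP BY); to get
the single COUNT(DISTINCT) number, drop the '-c' and pipe the result
through wc -l.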
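A small sketch contrasting the two results, again using a hypothetical table.tab (a header plus five records with three distinct values in the first column). The first pipeline is the one above; the second uses `sort -u`, which sorts and drops duplicates in one step, then counts what is left.

```shell
#!/bin/sh
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical table.tab: header plus five records, three distinct colors.
printf 'color\tsize\nred\t1\nblue\t2\nred\t3\ngreen\t4\nblue\t5\n' > table.tab

# One line per distinct value, prefixed by how often it occurs
# (the equivalent of SELECT FIELD1, COUNT(*) ... GROUP BY FIELD1).
cut -f1 table.tab | tail -n +2 | sort | uniq -c

# The single COUNT(DISTINCT(FIELD1)) number: collapse duplicates, count lines.
distinct=$(( $(cut -f1 table.tab | tail -n +2 | sort -u | wc -l) ))
echo "distinct: $distinct"
```

The first pipeline lists blue and red twice each and green once; the second prints a distinct count of 3.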
Dealing with various file archive types
I often work with files in zip archives and tar (Unix tape archive)
archives, sometimes with additional compression applied to them (.Z,
Unix compress; .gz, gzip; and .bz2, bzip2). It is possible to work
with these files without unarchiving or decompressing them permanently
if all you need is a simple count of lines or to process them only
once.
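A minimal sketch of counting lines inside compressed files without leaving decompressed copies on disk. The file names are hypothetical. `gzip -dc` decompresses to standard output (it also reads .Z files from compress), `bzip2 -dc` does the same for .bz2 files, and tar's `-O` option extracts an archive member to standard output instead of to disk.

```shell
#!/bin/sh
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical three-line data file and a gzip-compressed copy of it.
printf 'a\nb\nc\n' > data.tab
gzip -c data.tab > data.tab.gz

# Decompress to standard output only; nothing is written back to disk.
gz_lines=$(( $(gzip -dc data.tab.gz | wc -l) ))

# A gzip-compressed tar archive: -O sends the member's contents to stdout.
tar -czf archive.tar.gz data.tab
tar_lines=$(( $(tar -xzOf archive.tar.gz data.tab | wc -l) ))

echo "gzip: $gz_lines, tar.gz: $tar_lines"
```

Both counts come out as 3. Because the data only flows through the pipe, there is nothing to clean up afterwards.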
Pulling a file from a Zip Archive
To pull one or more files from within a zip archive, and send them
to another command (as part of a pipeline):
user@host:~/data$ unzip -l archive.zip
Archive: archive.zip
Length Date Time Name
-------- ---- ---- ----
34 04-28-08 11:01 table1.tab
56 04-28-08 11:01 table2.tab
user@host:~/data$ unzip -c archive.zip table1.tab table2.tab | wc -l
36
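That example uses the unzip command to pull 2 files out
and send them to standard output - to either the screen or the next
command in the pipeline. In this case that is wc, to get the
combined record count for the two files. We don't have to worry about
cleaning up the two files when we're done either. Be aware that '-c'
also prints the archive name and the name of each file before its
contents, so those lines get counted too; '-p' sends nothing but the
file data to standard output and gives an exact count.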
unzip -l lists the files in a zip archive. If we left off
the '-c' (and the '| wc -l') unzip would have extracted just those two
files from the archive (in case there were more and you only wanted a
small handful).
Dealing with various file encodings
The first barrier to using most of these readily available utilities
is often the file formats themselves. The utilities are line-oriented
for the records and tab-oriented for the fields. So the first step is
often figuring out how to even get your data into a tab-delimited
format.
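As a small taste of that first step, here is a sketch that assumes a hypothetical pipe-delimited export with DOS (CRLF) line endings, a common shape for files coming off Windows machines. It only works when the delimiter never appears inside a field; real CSV needs a proper parser, as noted above.

```shell
#!/bin/sh
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical pipe-delimited export with CRLF line endings.
printf 'id|name\r\n1|alice\r\n2|bob\r\n' > export.txt

# Strip the carriage returns, then turn every '|' into a tab so that
# cut, sort and uniq see ordinary line-oriented, tab-delimited records.
tr -d '\r' < export.txt | tr '|' '\t' > export.tab

# Second field of the first record, using cut's default tab delimiter.
name=$(tail -n +2 export.tab | head -n 1 | cut -f2)
echo "$name"
```

After the conversion, the recipes above apply unchanged; this one prints alice.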
There is more to that than I have time to write right now. I'll
follow up with examples in another posting. |
|
|
| Some Reflection in CLOS |
[Apr. 25th, 2008|10:22 am] |
A question about CLOS and slot (member) lookup recently came up on c.l.l. (comp.lang.lisp).
;; some reflection in clos
(defclass test-class
()
((s1 :reader :slot1
:writer :set-slot1)
(s2 :reader :slot2
:writer :set-slot2)
(s3 :reader :third-slot
:writer :update-three)))
(defvar x nil)
(setf x (make-instance 'test-class))
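(use-package :clos) ; CLISP exports the MOP functions from CLOS; SBCL uses SB-MOP, and the closer-mop library is a portable alternative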
(class-direct-slots (class-of x))
;; =>
;; (#<STANDARD-DIRECT-SLOT-DEFINITION S1 #x19F471C9>
;;  #<STANDARD-DIRECT-SLOT-DEFINITION S2 #x19F471F5>
;;  #<STANDARD-DIRECT-SLOT-DEFINITION S3 #x19F47221>)
(class-direct-superclasses (class-of x))
;; =>
;; (#<STANDARD-CLASS STANDARD-OBJECT>)
(mapcar #'slot-definition-readers (class-direct-slots (class-of x)))
;; => ((:SLOT1) (:SLOT2) (:THIRD-SLOT))
(mapcar #'slot-definition-writers (class-direct-slots (class-of x)))
;; => ((:SET-SLOT1) (:SET-SLOT2) (:UPDATE-THREE))
|
|
|
| IRC in Emacs with ERC |
[Apr. 14th, 2008|10:29 am] |
Jason Stelzer just turned me on to ERC, an IRC mode for Emacs. It works out of the box on Windows, and it comes with recent versions of Emacs (as of April 2008). It's easy to start a session and join a channel from Lisp, and wrapping those calls in an interactive command makes it even more automatic:
(defun krb-start-erc ()
(interactive)
(erc-open
"irc-server-name-or-ip"
6667
"your-username"
"your-full-name"
t ;; connect
"your-passwd")
(erc-join-channel "datapump"))
|
|
|
| Keyword symbols vs non-keyword symbols |
[Mar. 10th, 2008|08:57 pm] |
It took too long for this to really click, so while I'm thinking about it
again I'm writing it down. There are good reasons to prefer symbols of the
form :foo over 'foo. The main semantic difference is that the quoted
symbol is interned in whatever package is current when the code is read,
while the one preceded by a colon is interned in the KEYWORD package.
What this means for you and your code may vary. One immediate reason to use
keyword symbols (the ones that start with a colon) is that they are the same
everywhere: they are equal (eq, even). This means
that they're shared, so if you take symbols as arguments or return symbols as
values, keyword symbols will be easier to use. You won't have to
export them from your package for them to be accessible (except in very rare,
unhygienic cases, e.g. aif).
CL-USER> (defpackage :foo
(:use :common-lisp)
(:export :qux))
#<PACKAGE "FOO">
CL-USER> (in-package :foo)
#<PACKAGE "FOO">
FOO> (defun qux ()
'bar)
QUX
FOO> (equal 'bar (qux))
T
FOO> (in-package :cl-user)
#<PACKAGE "COMMON-LISP-USER">
CL-USER> (equal 'bar (foo:qux))
NIL
CL-USER>
That last one there was somehow confusing to me for a long time. Using
keyword symbols solves that issue neatly:
CL-USER> (defpackage :foo
(:use :common-lisp)
(:export :qux))
#<PACKAGE "FOO">
CL-USER> (in-package :foo)
#<PACKAGE "FOO">
FOO> (defun qux ()
:bar)
STYLE-WARNING: redefining QUX in DEFUN
QUX
FOO> (equal :bar (qux))
T
FOO> (in-package :cl-user)
#<PACKAGE "COMMON-LISP-USER">
CL-USER> (equal :bar (foo:qux))
T
CL-USER>
The other reason to use them is that every keyword symbol is interned once,
in the KEYWORD package, rather than a separate symbol being interned in each
package that mentions it, which saves memory in your running image.
This point was lost on me until I started using packages to organize my
code; before that I had no context for understanding it.
In general you should probably be using keyword symbols as your default.
|
|
|
| Landmark Parsing |
[Feb. 29th, 2008|12:22 am] |
Someone asked for an example, so I dug up my JSPWiki tool. Here are the guts of the parser:
sub makeParser {
my($data) = @_;
my $pos = 0;
my $setData = sub { $data = $_[0]; $pos = 0; };
my $start = sub { $pos = 0 };
my $fwd = sub { return -1 if $pos == -1; $pos += $_[0]; $pos = -1 if $pos >= length($data); $pos };
my $bck = sub { return -1 if $pos == -1; $pos -= $_[0]; $pos = -1 if $pos < 0; $pos };
my $bckTo = sub { return -1 if $pos == -1; $pos = rindex $data, $_[0], $pos; };
my $fwdTo = sub { return -1 if $pos == -1; $pos = index $data, $_[0], $pos; };
my $fwdPast = sub {
return $pos if $pos == -1;
$pos = index $data, $_[0], $pos;
return $pos if $pos == -1;
$pos += length($_[0]);
$pos >= length($data) ? $pos = -1 : $pos;
};
my $btwn = sub {
return -1 if $pos == -1;
my $s = $fwdPast->($_[0]);
return undef if $s == -1;
my $e = $fwdTo->($_[1]);
return undef if $e == -1;
my $item = substr $data, $s, $e - $s;
return $item;
};
my $all = sub {
my @all;
while (-1 != $pos) {
my $item = $btwn->(@_);
last unless $item;
push @all, $item;
}
return @all;
};
return ($setData,$start,$fwd,$bck,$fwdTo,$fwdPast,$bckTo,$btwn,$all);
}
and here is how it gets used:
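# $UserAgent (presumably an LWP::UserAgent), $BaseURL and simpleStrip() are defined elsewhere in the tool.
sub getPageInfo {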
my($topic) = @_;
my $data = $UserAgent->get("$BaseURL/PageInfo.jsp?page=$topic")->content;
print "$data\n";
my($setData,$start,$fwd,$bck,$fwdTo,$fwdPast,$bckTo,$btwn,$all) = makeParser($data);
my $table = $btwn->('Version','</table>');
($setData,$start,$fwd,$bck,$fwdTo,$fwdPast,$bckTo,$btwn,$all) = makeParser($table);
print join("\t",'Version','Date','Author','Size','Changes from Previous'),"\n"; # qw() would split "Changes from Previous" into three columns
dumpRow($_) for $all->('<tr>','</tr>');
}
sub dumpRow {
my($row) = @_;
my($setData,$start,$fwd,$bck,$fwdTo,$fwdPast,$bckTo,$btwn,$all) = makeParser($row);
print join("\t",map { simpleStrip($_) } $all->('<td>','</td>')),"\n";
}
|
|
|
| Automating Invitations to Mailman Lists |
[Feb. 28th, 2008|04:07 pm] |
This is just a code example - I had a new mailing list created at work and needed a large-ish list of people to get onto it. To save a bit of effort (admittedly a small amount) I scripted it.
; (require 'asdf-install)
; (asdf-install:install :drakma)
(require :drakma)
(use-package :drakma)
;; (http-request "http://mailman/mailman/listinfo/code-reviews")
;; <FORM Method=POST ACTION=\"http://mailman/mailman/subscribe/code-reviews\">
;; <INPUT type=\"Text\" name=\"email\" size=\"30\" value=\"\">
;; <INPUT type=\"Text\" name=\"fullname\" size=\"30\" value=\"\">
;; <INPUT type=\"Password\" name=\"pw\" size=\"15\">
;; <INPUT type=\"Password\" name=\"pw-conf\" size=\"15\">
;; <input type=radio name=\"digest\" value=\"0\" CHECKED> No
;; <INPUT type=\"Submit\" name=\"email-button\" value=\"Subscribe\">
(defun subscribe-user-to-code-reviews (email full-name pass)
(drakma:http-request "http://mailman/mailman/subscribe/code-reviews"
:method :post
:parameters
`(("email" . ,email)
("fullname" . ,full-name)
("pw" . ,pass)
("pw-conf" . ,pass)
("digest" . "0")
("email-button" . "Subscribe"))))
; (subscribe-user-to-code-reviews "kburton@some-domain.com" "Kyle Burton" "some-password")
;; I went into Outlook, opened the shared public contact list, did
;; CTRL-A, CTRL-C, went to Excel and pasted, then saved as addrs.csv, then
;; did a bit of vi magic to format the lines as expected below
(defun get-lines ()
(with-open-file (in "/home/kburton/foo.txt" :direction :input)
;; FOR x = expr re-evaluates expr on every iteration, so no THEN clause is needed
(loop for line = (read-line in nil nil)
while line
collect line)))
; (get-lines)
(require :cl-ppcre)
(use-package :cl-ppcre)
(defun split-line (line)
(multiple-value-bind
(string matches)
(cl-ppcre:scan-to-strings "\"([^\"]+)\".*E-mail: (.+)$" line)
(list (aref matches 0)
(aref matches 1))))
;(split-line "\"Smith, John\" E-mail: jsmith@some-domain.com")
(loop for line in (get-lines)
while line
do
(destructuring-bind
(name email)
(split-line line)
(format t "name:~a email:~a~&" name email)
(subscribe-user-to-code-reviews email "" "")))
|
|
|
| Developer to Engineer |
[Feb. 27th, 2008|08:05 pm] |
Kudos to Josh [Crean]. He stepped up and gave a tech talk today at work
on using unit testing and Devel::Cover. Not only did he
give the talk, but he did it as a live interactive session - creating
a module and its test, fielding questions and taking input from the
other developers.
In fact, based on comments from the audience, he switched tack during
the session and followed test-driven development.
Josh handled the "should we shoot for 100% coverage?" question well,
and even showed an example (by coding it up as part of the live
performance) where unit tests drive software to 100% coverage and it
still contains bugs (100% coverage from unit tests doesn't
prove the code is correct in any way).
If you haven't checked out Test::Unit or Devel::Cover on CPAN, you
should take a look. They can really improve the software you're
developing. |
|
|
| Yahoo as a takeover target for Microsoft |
[Feb. 4th, 2008|08:18 am] |
|
Something rare and interesting is going on: large tech firms are looking to merge. Microsoft has offered to merge with (buy) Yahoo, apparently in an attempt to strengthen its on-line business units.
Some bloggers are taking this as Microsoft effectively admitting that they can't catch up in search or on-line advertising (not necessarily technologically, but definitely with respect to market share).
A lot of people have a lot to say... regardless of all the ideas being floated, the effects will ripple out for a long time. If MS succeeds, it will cause issues for Yahoo's current employees (there are rumors that many don't want to work for MS); if Google steps in and somehow scuttles the deal, it may be seen as evidence of weakness at MS and strength at Google.
Regardless, Microsoft seems to have some serious issues to work out; it seems to be in the process of finding itself again. How are they going to brand their on-line presence after this kind of merger? What are they going to do with Yahoo from a technology perspective? Yahoo is definitely not an MS shop (Unix, PHP, Java). I wonder about this in the context of what happened with the Hotmail acquisition of old...
|
|
|
| Scheme's let loop form |
[Jan. 30th, 2008|10:03 pm] |
I particularly like Scheme's let loop form:
(let loop ((x 1) (y 2))
(display "x=") (display x)
(display ", y=") (display y)
(newline)
(cond ((= x y)
#t)
(else
(loop (+ 1 x) y))))
It allows you to define a locally scoped recursive function, with arguments and initial values, that is immediately called. It is clean and private - the name loop only exists within the let declaration - it is not available in any other scope.
I wanted the same form for programming in CL. Two things get in the way. The first is that loop is already taken: it's a macro exported from the COMMON-LISP package (and so visible in CL-USER). This is analogous to a reserved word in many programming languages; conforming code isn't allowed to rebind it as a local function, so I simply don't use the symbol loop with this macro. The second is that you can't extend let the way you can extend setf (with defsetf or define-setf-expander). Regardless, with a different name it can be done:
(defmacro llet (name bindings &body body)
(let ((args (mapcar #'first bindings))
(initial-values (mapcar #'second bindings)))
`(labels ((,name ,args
,@body))
(,name ,@initial-values))))
(macroexpand-1
'(llet lp ((x 1) (y 2))
(format t "x:~a y:~a~&" x y)
(cond ((= x y)
t)
(t
(lp (1+ x) y)))))
(llet lp ((x 1) (y 2))
(format t "x:~a y:~a~&" x y)
(cond ((= x y)
t)
(t
(lp (1+ x) y))))
(llet recur ((x 1) (y 2))
(format t "x:~a y:~a~&" x y)
(cond ((= x y)
t)
(t
(recur (1+ x) y))))
Not exactly the same (Common Lisp doesn't require implementations to eliminate tail calls the way Scheme does), but just as useful and clean.
|
|
|