Asymmetrical
kyle_burton


List Comprehensions in Clojure [Nov. 18th, 2008|10:03 pm]

Just a quick example of list comprehensions in Clojure.



;; generate all the positions on a chess board:
(for [file "ABCDEFGH"
      rank (range 1 9)]
  (format "%c%d" file rank))
;; ("A1" "A2" "A3" "A4" "A5" "A6" "A7" "A8" 
;;  "B1" "B2" "B3" "B4" "B5" "B6" "B7" "B8" 
;;  "C1" "C2" "C3" "C4" "C5" "C6" "C7" "C8" 
;;  "D1" "D2" "D3" "D4" "D5" "D6" "D7" "D8" 
;;  "E1" "E2" "E3" "E4" "E5" "E6" "E7" "E8" 
;;  "F1" "F2" "F3" "F4" "F5" "F6" "F7" "F8" 
;;  "G1" "G2" "G3" "G4" "G5" "G6" "G7" "G8" 
;;  "H1" "H2" "H3" "H4" "H5" "H6" "H7" "H8")

(count  (for [file "ABCDEFGH"
              rank (range 1 9)]
          (format "%c%d" file rank)))
;;64

;; the pythagorean triples example:
(for [aa (range 1 10)
      bb (range 1 10)
      cc (range 1 10)
      :when (= (* cc cc)
               (+ (* aa aa)
                  (* bb bb)))]
  (list aa bb cc))
;; ((3 4 5) (4 3 5))

;; all permutations?
(defn all-permutations [things]
  (if (= 1 (count things))
    (list things)
    (for [head things
          tail (all-permutations (disj (set things) head))]
      (cons head tail))))

(all-permutations '(a b c))
;; ((a c b) (a b c) (b a c) (b c a) (c a b) (c b a))
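
For comparison, here is a rough sketch of the same three comprehensions in Python (Python 3 syntax). The names squares, triples and all_permutations are mine, picked for illustration, and the permutation function assumes the items are distinct, just as the set-based Clojure version does.

```python
# Chess board positions: the nested loops read left to right,
# the same way the bindings in Clojure's `for` do.
squares = [f"{file}{rank}" for file in "ABCDEFGH" for rank in range(1, 9)]

# Pythagorean triples with sides below 10; the trailing `if`
# plays the role of Clojure's :when.
triples = [(a, b, c)
           for a in range(1, 10)
           for b in range(1, 10)
           for c in range(1, 10)
           if c * c == a * a + b * b]


def all_permutations(things):
    """Recursive permutations of a sequence of distinct items."""
    things = list(things)
    if len(things) <= 1:
        return [things]
    return [[head] + tail
            for head in things
            for tail in all_permutations([t for t in things if t != head])]
```

Python's standard library already provides itertools.permutations, so the recursive version is only there to mirror the Clojure code.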


Cloud Con East Notes [Oct. 21st, 2008|11:29 am]
Overall Computing Among The Clouds was a good conference.

The main theme I took away from it was that cloud computing is a
continuing trend: services will continue to appear and be developed
that make taking advantage of these resource pools even easier.

Physical data centers will continue to be outsourced, more and more,
to organizations that can provide those services with greater economy
of scale. Currently Amazon's offerings are slightly more expensive
than a hosted system that you own. More guidelines came out about
when the trade-off is appropriate. Keep in mind that as a trend, cloud
computing is still new - it is likely that these trade-offs will shift
- even as soon as the next year or two (e.g., it is likely, in my
opinion, that the raw cost of 24x7 allocation will fall below the cost
of ownership due to economies of scale - and I think this will happen
within 3 years).

Own your own:

- you have a steady load

- AWS, used 24x7x365, is more expensive than owning and hosting a
physical machine - though this does not include administrative
costs

- you have hard SLAs

- you cannot anticipate peak loads but must have spare capacity to
handle peaks

- you have sensitive data that requires more certainty about where the
data gets stored, etc.

- though there are cloud computing platforms that are working
towards security and data protection certifications

Use an AWS style platform:

- your needs scale up and then can scale back down

- you can plan for these needs such that you have time to provision
the

- you have no capital budget

- you have a service where you can charge in direct proportion to
utilization rather than based on capacity.

Higher Level Application Stacks, like Google App Engine:

- keep in mind these are new and the space is still being explored,
more will appear, a model will develop around these, more languages
and frameworks will be supported - though it mostly will be based
on those that can be easily hosted/sandboxed, which is why you're
seeing Python first and Java following along

- all of the provisioning is hidden from you - Google's offering
dynamically scales your application up and down based on
utilization - this is a significant reduction in your design and
administrative workload

- simplicity of application development _and_ scalability are rarely
found in existing technologies, this is one of the most compelling
aspects of these kinds of application stacks



Organizations and individuals are starting to learn how they need to
change the design of their services and applications to take better
advantage of the Cloud. It takes a bit of a change in mind-set - the
phrase was dropped "Machine Instances are the new processes" and I
think that's an appropriate framing of one of the changes in mindset
you should have to take advantage of the sea of resources that is
becoming available.

Change your software to be more easily bootstrapped. Eliminate the
assumption that you have access to local, disk based services -
everything is pulled remotely, so use URLs and services - don't assume
local interfaces, assume remote. Design to come up / boot faster -
when scaling up / down quickly you don't get the same amortization
over time for start up costs. Design with the crash fast mentality -
as robust as these systems are, you should still design with the idea
in mind that the system could go away at a moment's notice -- in
addition to helping with unexpected outages, this allows you to scale
_down_ faster. Keep your persistent data in the provided data stores
and use the provided queuing systems to distribute work.
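
As a rough sketch of that crash fast, queue-driven style (not any particular cloud API): below, a queue.Queue and a plain dict are stand-ins I am assuming for the provider's queuing system and data store, and run_worker and handle are hypothetical names. The worker keeps no local state, so an instance can disappear at any point and another can pick the work back up.

```python
import queue


def run_worker(work_queue, store, handle):
    """Drain work_queue, writing each result to the shared store.

    work_queue and store stand in for a remote queue and data store;
    nothing is kept on the worker itself between items.
    """
    while True:
        try:
            key, payload = work_queue.get_nowait()
        except queue.Empty:
            return  # no work left, so this instance can be shut down
        try:
            store[key] = handle(payload)
        except Exception:
            # Crash fast: hand the item back for another instance to
            # retry, then let the failure take this worker down.
            work_queue.put((key, payload))
            raise
```

The point of the design is that scaling down is just stopping workers: anything unfinished is still sitting in the queue.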


What these offerings need to do to gain wider acceptance and the
value-adds to come:
--------------------------------------------------------

I think that harder SLAs will develop as there is more competition.
Higher level tools will develop on top of the instance-based cloud
offerings (EC2) to allow for more automated provisioning - this will
make it easier for you, but not so easy as for full restricted stacks
like GAE.

I think we'll see tools and offerings develop that will come down
towards traditional data centers to allow a simpler mixing of a
traditional service with bleed over to the cloud as resources need to
be scaled up but also so that you can have control over the processing
or data (sensitive data?) in your own protected environment but push
generic activity up into the cloud as necessary.



Other notable happenings from the conference or that were recently
announced:
------------------------------------------------------------------

- Microsoft is creating AMIs for Windows on EC2.

- Google just announced that Java will be a supported App Engine
development language - previously only Python was supported.




Haskell in the corporate environment
----------------------------------------

Seemed out of place for the event - not really cloud-ish. Though I
personally see Functional Programming being a larger industry trend.
It follows from structured -> procedural -> object oriented ->
functional - with respect to the time line of coming out of academia at
least, not necessarily the pejorative idea of one being 'higher level'
than the other - though so far, time has implied that with the other
patterns.

The presenter (Jeff Polakow) is using it extensively at his current
employer.

Those kinds of firms (Wall St.) allow a lot of latitude to the
technical staff, so it's easier to experiment (R&D) with new
technologies. It's much harder for a company like HMS to decide to
take on something like this - it's hard to find developers, and how to
develop, deploy, monitor and design with these technologies is
undetermined for companies like HMS.

I think the FP trend is being pushed into industry by the shift to
multi-core, the past difficulties of developing concurrent,
parallel/distributed applications (it's hard, and giving developers
access to create threads directly has proven to be a difficult road
to travel), the need for infrastructural level automatic scaling, and
the easier path to robustness that languages like Erlang offer.

In languages like Java, you have to take into consideration all the
libraries you're using with respect to their referential transparency
- it's not the default. In the FP languages referential transparency
is the _default_ case, so you can, in general, make that assumption.
The underlying stack can also make that assumption about your code as
well - which is why the concurrency / distribution model is less
coupled to the implementation than it is in the more imperative
languages.


Horizontal Scaling with HiveDB
----------------------------------------

CafePress has a _large_ catalog. I was kind of shocked to hear that
they have 265 million products. They have a low margin based on the
aggregate amount of data they have to store and serve up, so solutions
like Oracle were just not an option for them simply from a cost basis.

They spent time analyzing their options and didn't find anything that
fit their needs (cost, performance, no off-line resharding), and
created a more scalable data storage architecture.

The solution they created performs better, scales better, is more
robust and has a better SLA than many of the other commercial
solutions.

It is effectively a hibernate extension that uses MySQL and does data
partitioning (pseudo-automatically) by using a set of replicated MySQL
databases as a catalog to map to where the data is stored for your
shard (replicated 3x). The system supports dynamic repartitioning -
migration of shards away from a shard-host to get less busy data away
from data that is more 'hot' - the busiest data sets end up on their
own shard-node with everything else having been pushed away from them.

They only need to 'lock' the data for a single user's data when
migrating it. The system as a whole doesn't go down. The MySQL
catalogs are replicated (3 machines, master-master, writing to 1) and
can be upgraded by taking 1 of the 3 out of the cluster. The same
kind of approach goes for the other shard servers.
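
To make the directory idea concrete, here is a small in-memory sketch. It is not HiveDB's API: ShardDirectory and all of its method names are hypothetical, the dicts stand in for MySQL databases, and replication is left out entirely. What it shows is the catalog lookup and a migration that only locks the one key being moved.

```python
class ShardDirectory:
    """Toy directory-based sharding: a catalog maps each key to a shard."""

    def __init__(self, shard_names):
        self.shards = {name: {} for name in shard_names}  # shard -> records
        self.location = {}   # the catalog: key -> shard name
        self.locked = set()  # keys currently being migrated

    def _least_loaded(self):
        return min(self.shards, key=lambda name: len(self.shards[name]))

    def put(self, key, record):
        if key in self.locked:
            raise RuntimeError("key is being migrated: %r" % (key,))
        shard = self.location.setdefault(key, self._least_loaded())
        self.shards[shard][key] = record

    def get(self, key):
        return self.shards[self.location[key]][key]

    def migrate(self, key, target):
        """Move one key to another shard; only that key is locked."""
        self.locked.add(key)
        try:
            source = self.location[key]
            self.shards[target][key] = self.shards[source].pop(key)
            self.location[key] = target
        finally:
            self.locked.discard(key)
```

In a single-threaded sketch the lock is only symbolic, but it marks where a real system would have to block writes to that one user's data while everything else stays available.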


Panel Discussion
----------------------------------------


Hive and Hadoop
----------------------------------------

Hive is a data storage system built on top of Hadoop with its own
query language (HiveQL), built by Facebook. The goals are a bit
different from HiveDB - HiveDB is more for OLTP, while Hive is more
for large-scale analytics. Being built on top of Hadoop, Hive is
much more batch oriented. Facebook uses it for doing analytics /
data-mining / machine-learning of their user and transactional data
sets (logs, user activities, etc.) to mine out aggregate and trending
intelligence from the large data set.

Surprising fact: 2 TB of growth _per_ _day_.


Building Scalable Web Applications with Google App Engine
---------------------------------------------------------

Stacks like GAE take a more managed environment approach than the more
raw / primitive services provided by Amazon. The two fit into
different use cases though and, IMO, one will not necessarily supplant
the other.

GAE takes away from you all the concerns about deployment, production
architecture, system management or administration. It gives you a data
store with an OO API, and a web-app development environment that you
develop your application within. There are things you can't do, for
example, you can't run arbitrary software or services on GAE like you
can on the more machine-image based cloud services (AWS EC2).

What you gain from giving up those capabilities is: Google's
infrastructure for scaling is _your_ infrastructure for scaling. Your
app is designed in a pseudo-functional way - the stack encourages you
to design your app to perform all dynamism on put/post time and to
just render/display at get time. This approach helps with the scaling
of the system. Storage location transparency helps with spooling up
other instances of the app in disparate data centers, etc.
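
A minimal sketch of that "compute on post, render on get" shape, with nothing App Engine specific in it: the datastore dict, entries list, handle_post and handle_get are all names I am assuming for illustration.

```python
import html

datastore = {"page": ""}  # stand-in for the persistent data store
entries = []              # stand-in for the stored guestbook entries


def handle_post(author, message):
    """Do all of the work at write time: store the entry and
    re-render the page fragment that readers will see."""
    entries.append((author, message))
    datastore["page"] = "\n".join(
        "<p><b>{}</b>: {}</p>".format(html.escape(a), html.escape(m))
        for a, m in entries)


def handle_get():
    """Reads are a plain lookup of what was rendered at write time."""
    return datastore["page"]
```

Because the get path only reads a precomputed value, it is the easy part to replicate across many instances.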

This kind of stack really makes it easy to develop the most common
case of web applications - it is both easy to do and it scales. This
is a combination that almost _never_ comes together.

I see these kinds of stacks as becoming more established and a large
part of Internet based application development - I think that more
organizations will offer these kinds of stacks across more
technologies.

You really should sign up and at least try GAE out.


Developing and Deploying Java applications on the Amazon Elastic Compute Cloud
------------------------------------------------------------------------------

Chris Richardson has created cloud-tools, a package of utilities (and
a Maven plug-in) for provisioning EC2 instances, pushing your
application up and executing tasks across your cluster of instances.

destructuring-bind [Sep. 18th, 2008|08:25 am]

;; This is a typical usage, for pulling apart a list                                                                         
(destructuring-bind
      (first second)
    '(1 2)
  (format t "~%~%;;; => first:~a second:~a~&" first second))

;;; => first:1 second:2                                                                                                      

;; You can also pull apart improper lists:                                                                                   
(destructuring-bind
      (first . second)
    '(1 . 2)
  (format t "~%~%;;; => first:~a second:~a~&" first second))

;;; => first:1 second:2                                                                                                      

;; The first argument to destructuring-bind is a lambda list, so you
;; can grab the remainder either by using a dotted list:

(destructuring-bind
      (first second . stuff)
    '(1 2 3 4 5)
  (format t "~%~%;;; => first:~a second:~a rest:~a~&" first second stuff))

;;; => first:1 second:2 rest:(3 4 5)                                                                                         

;; or you can grab the remainder with &rest, just like you do for                                                            
;; functions that take a variable number of arguments:                                                                       
(destructuring-bind
      (first second &rest stuff)
    '(1 2 3 4 5)
  (format t "~%~%;;; => first:~a second:~a rest:~a~&" first second stuff))

;;; => first:1 second:2 rest:(3 4 5)                                                                                         

;; It really is a lambda list, you can use default parameters:                                                               
(destructuring-bind
      (first second &optional (third 'default))
    '(1 2)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))

;;; => first:1 second:2 third:DEFAULT                                                                                        

(destructuring-bind
      (first second &optional (third 'default))
    '(1 2 3)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))

;;; => first:1 second:2 third:3                                                                                              


;; And you can use keyword parameters:                                                                                       
(destructuring-bind
      (first second &key third)
    '(1 2 :third 3)
  (format t "~%~%;;; => first:~a second:~a third:~a~&" first second third))

;;; => first:1 second:2 third:3                                                                                              




;; Finally, you can use it to 'unparse' trees as well, which is a                                                            
;; really great feature, since your variable declaration matches the                                                         
;; 'shape' of the data structure you're pulling apart.  This technique
;; is really handy for dealing with XML after it's been converted to                                                         
;; s-expressions.                                                                                                            
(destructuring-bind
      (a (b (c d e (f g) h i j)) &rest remainder)
    '(1 (2 (3 4 5 (6 7) 8 9 10)) 11 12 13 14 15)
  (format t
          "~%~%;;; => a:~a b:~a c:~a d:~a e:~a f:~a g:~a h:~a i:~a j:~a remainder:~a ~&"
          a b c d e f g h i j remainder))

;;; => a:1 b:2 c:3 d:4 e:5 f:6 g:7 h:8 i:9 j:10 remainder:(11 12 13 14 15)                                                   




Bring me Data! Landmark Parsing in CL [Aug. 16th, 2008|12:55 am]

(require :drakma)
(require :landmark-parser)

(use-package :landmark-parser)

(defvar *doc* (drakma:http-request "http://www.loc.gov/standards/iso639-2/php/code_list.php"))

(defun get-main-table (doc)
  (extract-between
   (new-parser doc)
   '((forward-past "<table ")
     (forward-past "<table ")
     (forward-past "<tr")
     (forward-to "<tr"))
   '((forward-to "</table"))))
                      
(defun get-rows (block)
    (extract-all (new-parser block)
                 '((forward-to "<tr"))
                 '((forward-past "</tr>"))))
                                   
(defun row->cells (row)                                                                                                   
  (extract-all (new-parser row)    
               '((forward-past "<td")
                 (forward-past ">"))
               '((forward-to "</td>"))))

(with-open-file (out "/home/mortis/iso.tab"
                     :direction :output
                     :if-exists :supersede)
  (loop for row in (mapcar #'row->cells (get-rows (get-main-table *doc*)))                                                
     do
       (format out "~{~a~^      ~}~&" row)))

I just uploaded the Landmark Parser tonight.


OSCon Day 1 [Jul. 21st, 2008|10:57 pm]
Andrew and I arrived for registration and took advantage of the
continental breakfast before heading up to the Intro to Python.

O'Reilly had the registration process pretty streamlined. They had a
long bank of laptops at which you needed only to enter your
registration code, or your email address (if you registered on the
OSCON conference web site). Register, then walk up to the materials
station and pick up your ID and badge card.

There were plenty of juices, coffee, fruit and pastries. There was
also plenty of seating. To either O'Reilly's or the Oregon Conference
Center's credit, things were very well organized.

The conference room we were in must have had seating for a few hundred
people and it was effectively full. There was limited space for each
attendee and their items (it was at least cramped for me) - though
they expected a laptop per attendee - there were plenty of power
strips, laid along every other row of tables, within easy reach of
every single seat. It was well planned and laid out.

The intro to Python got underway at 8:30 and although it was geared
toward an audience with some programming experience, it assumed (as
the title suggested) no Python experience. Steve Holden was a great
speaker, filling in twice with anecdotes while technical issues were
worked out with equipment (once was a misconfiguration of his laptop,
the other was a power interruption).

Python is a very capable language. It is more consistent about its
OO and syntax when compared to Perl. It is also a lot bigger on
conventions, broadly adopted by the community. These are mostly
focused on formatting (one expression per line), in-line documentation
and coding style in general.

Functions are first-class values: you can store a function in a
variable, and you can implement the equivalent of funcall and apply in
Python. Functions can be passed as arguments. Python supports
positional parameters, default values for function params, and calling
functions, any function, with positional arguments, named arguments, a
tuple of arguments (similar to apply), or a dictionary (an indirect
way of using named arguments).
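
A short sketch of those calling conventions, in Python 3 syntax. The greet and call_with functions are made up for illustration.

```python
def greet(greeting, name="world", punctuation="!"):
    """A function with one required and two defaulted parameters."""
    return "{}, {}{}".format(greeting, name, punctuation)


say = greet  # a function is a value; bind it to another name

positional = say("Hello", "Andrew")                      # 'Hello, Andrew!'
named = say("Hello", punctuation="?")                    # 'Hello, world?'
from_tuple = say(*("Hi", "Steve", "."))                  # 'Hi, Steve.'
from_dict = say(**{"greeting": "Hey", "name": "Kyle"})   # 'Hey, Kyle!'


def call_with(fn, *args, **kwargs):
    """Rough equivalent of Lisp's apply: forward whatever was given."""
    return fn(*args, **kwargs)
```

The * and ** forms are what make a generic call_with possible without knowing anything about the function being called.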

Python actually has a lot of features which were inspired by
functional programming (including list comprehensions).

Python is byte-compiled, like Java. You write code in a .py file, and
the first time it is loaded as a module (import), Python compiles the
code for you. The time stamp check for the .pyc vs the .py file is
transparent, it's automatically handled.

Strings are immutable, which is something that helps Jython be a
natural fit in the JVM.

Python supports some destructuring constructs, based on what it calls
tuples. It's easier to show an example:

a, (b, c) = (1, (2, 3))

print a, b, c   # => 1 2 3

Tuples, and this kind of binding syntax, are widely used in processing
things like lists and maps.

An interesting feature of the language is the pair of functions,
locals() and globals(). locals() returns a dictionary (Python's name
for a Map) of all of the variable bindings (and values) that are
visible in the current local scope (exclusive of global variables).
globals() returns the variables in the entire module's scope (not
local, lexical or class scope, and not global in the sense of a Perl
global - not universally global).
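
A small sketch of the difference, in Python 3 syntax, with counter and show_scopes as made-up names:

```python
counter = 10  # a module-level binding


def show_scopes(x):
    y = x * 2
    # locals() holds only this call's bindings; module-level names
    # such as counter are found through globals() instead.
    local_names = sorted(locals().keys())
    return local_names, "counter" in locals(), "counter" in globals()
```

Calling show_scopes(1) reports x and y as the local names, and finds counter only in the module's globals.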

Other highlights:

- the yield statement, which is like a weak kind of continuation
- for, and while loops can have an else clause, which is executed
when the form terminates normally (as opposed to breaking out
of the loop)
- the Python try/catch form (try/except/finally) can have an else
form, again, which is executed if no exception was thrown in
the try block
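
Each of those three is easier to see in a few lines of code. This is only a sketch in Python 3 syntax, and the function names are mine:

```python
def countdown(n):
    """A generator: each yield hands back a value and suspends here."""
    while n > 0:
        yield n
        n -= 1


def find_first_even(numbers):
    """for/else: the else clause runs only if the loop never hit break."""
    for n in numbers:
        if n % 2 == 0:
            break
    else:
        return None
    return n


def parse_and_double(text, log):
    """try/except/else/finally: else runs only when nothing was raised."""
    try:
        value = int(text)
    except ValueError:
        return None
    else:
        return value * 2
    finally:
        log.append(text)  # runs on both paths
```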


After a break for lunch, both Andrew and I attended the Introduction
to Django, presented by Jacob Kaplan-Moss.

Django is an MVC framework for Python for rapid development of
interactive web sites. It is an MVC framework very much in the spirit
of Ruby on Rails - I've done a small site in Rails and the parallels
were very close between the two frameworks.

Django has a code generation framework, an ORM layer (which is very
similar to Rails' ActiveRecord), an HTML template system (with a
default syntax based on PHP's Smarty template system), and integrated
support for testing.

Django supports an interesting testing feature called doctests (the
doctest module is part of Python's standard library). If you've
worked with an interactive language with a REPL, you have probably
used it to explore the behavior of code and to informally test the
code. Doctests are a way of (almost literally) taking a cut and paste
of the interactive session and vivifying the transcript as a
regression test. I like the idea of a recorded test, but as Andrew
and I talked about it, he convinced me that the literal representation
wasn't all that great a choice for implementing those kinds of tests.
I do like the reduction of effort that comes with that kind of
testing, and recognize the inherent informality of it.
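
For a feel of what such a test looks like, here is a minimal sketch using Python's standard doctest module directly rather than Django's test runner. The add function and the run_examples helper are hypothetical names of mine.

```python
import doctest


def add(a, b):
    """Return the sum of a and b.

    The lines below are pasted from an interactive session:

    >>> add(2, 3)
    5
    >>> add("ab", "cd")
    'abcd'
    """
    return a + b


def run_examples(fn):
    """Replay the session recorded in fn's docstring.

    Returns a (failed, attempted) pair.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(fn.__doc__, {fn.__name__: fn},
                              fn.__name__, None, 0)
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted
```

The comparison is on the printed text of each result, which is exactly the literalness Andrew objected to: a change in how a value prints fails the test even when the value is right.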

All that said, Django (like Rails) is big on doing test driven
development.

I looked up the status of Django on Jython and apparently it's close
to being a 1.0 release (nothing I'd recommend for use at HMS at the
moment, but Sun has hired people to work on Jython and Django is one
of the frameworks they are concerned with making work).

I'm looking forward to tomorrow.

Sending a signal... [Jul. 20th, 2008|11:25 am]

Something has come up recently that has forced me to think about and
articulate the difference in how I perceive occasional cigar smoking
and the habit of cigarette smoking.

This is within the context of having kids and being in the position
of having to be the arbiter of what they're exposed to. Capable or
not, I am one of the people who must lead them through experiencing
the world. The end goals being to get them to be able to
operate independently, handle whatever the world throws at them and be
able to help others.

The difference, I've recently come to realize is: it's the signal that
each one sends.

Seeing a person decide to smoke a cigar at the end of the day, after
most activity is over sends a certain signal. It is something which
is done out of the context of normal daily activity. It is something
that can wait, it can be delayed, it is optional.

Seeing a person stop all activity to step outside to take themselves
away to smoke a cigarette sends a signal. A kid will see this adult
decide that what they are about to do is more important than playing
another game, more important than starting on dessert, than watching
the movie, than being with the kid for those five minutes.

The cigarette smoker is compelled to smoke, while the (typical) cigar
smoker is not.

Kids pick up on what is important to the adults in their lives, it's a
necessary part of how they learn about the world. The compulsory
behavioral traits that I exhibit in front of my kids seem to be the
things which are most influential. I can tell them that brushing
their teeth is good for them, but unless they see me do it,
consistently, no amount of saying it to them causes the value to
transfer. At least not at about 5 years of age.

I've become more and more aware of the signal that I send in a lot of
contexts. Having children has made many of these signals much more
apparent.

Idempotency or Singleton Memoization [Jun. 18th, 2008|09:14 am]

sub makeDoOnce {
  my($sub) = @_;
  my $alreadyDone = 0;
  my @result;
  my $exception;
  return sub {
    die $exception if $exception;
    if ($alreadyDone) {return wantarray ? @result : $result[0];}

    # Capture the caller's context before entering eval; inside the
    # eval block wantarray reports the context of the eval itself.
    my $w = wantarray;
    eval {
      if    (not defined $w) {             $sub->(@_)}
      elsif ($w)             {@result    = $sub->(@_)}
      else                   {$result[0] = $sub->(@_)}
    };
    $exception = $@ if $@;
    die $exception if $exception;
    $alreadyDone = 1;
    return wantarray ? @result : $result[0];
  };
}

The same and yet different [Jun. 7th, 2008|09:05 am]

I've just opened the book Dive into Python by Mark Pilgrim. The first example is a function that creates a (database?) connection string. Knowing Perl, I wrote a close equivalent so I could look at the differences.

First, the Python:

def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.

    Returns string."""

    return ";".join(["%s=%s" % (k,v) for k,v in params.items()])

if "__main__" == __name__:
    print buildConnectionString({
            "server":"localhost",
            "database":"master",
            "uid":"sa",
            "pwd":"sekret"
            })

Ok, this defines a function of one parameter (buildConnectionString) - and that parameter is called out in the signature. This leads me to suspect that Python may actually know that the function takes a single argument (though not its type), and that information may be available to the runtime (this is the kind of thing that helps IDEs and other software tools).

Within that function we can provide a docstring. This is something I'm familiar with from Common Lisp and it's a feature that I like. I suspect that I'll find that this is available in the Python runtime later on...

Then Mr. Pilgrim immediately introduces us to Python's list comprehensions. Good. We can see that literal strings have methods (we're calling .join on the literal ";"), so they're objects (in Perl strings are not). I'm not sure what params.items() returns yet - a flat list? A set of pairs? I'm sure this is waiting for me on the next few pages.
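
For what it's worth, these guesses are easy to poke at from the interpreter. The sketch below is in Python 3 syntax, which differs a little from the book's examples, and the variable names pairs, doc and arg_names are mine. In Python 3 items() gives a view of (key, value) pairs, wrapped in list() here to get a concrete list.

```python
import inspect


def buildConnectionString(params):
    """Build a connection string from a dictionary of parameters.

    Returns string."""
    return ";".join(["%s=%s" % (k, v) for k, v in params.items()])


# items() yields (key, value) pairs, which is what `for k, v in ...` unpacks.
pairs = list({"uid": "sa"}.items())

# The docstring is attached to the function object at runtime.
doc = buildConnectionString.__doc__

# The parameter list is available to tools at runtime as well.
arg_names = list(inspect.signature(buildConnectionString).parameters)
```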

Ok, this is a simple enough task, and the code is nice and succinct. The list comprehension makes the task clear (at least to me). What does the equivalent Perl look like? (well, my equivalent Perl, this is _my_ way of doing it, TIMTOWTDI after all).

use strict;
use warnings;

=head2 buildConnectionString                                                                                                 
                                                                                                                             
Build a connection string from a dictionary of parameters.                                                                   
                                                                                                                             
Returns string.                                                                                                              
                                                                                                                             
=cut                                                                                                                         
                                                                                                                             
sub buildConnectionString {
    my($params) = @_;
    return join(";",map { "$_=$params->{$_}" } keys %$params);
}

if (__PACKAGE__ eq "main") {
    print buildConnectionString({
        "server"   => "localhost",
        "database" => "master",
        "uid"      => "sa",
        "pwd"      => "sekret"
        }),"\n";
}

Ok, right off, I won't write Perl without the 'use strict;' and 'use warnings;' pragmas since they are big time savers. The first way in which they save time is that they help identify errors at compile-time that would otherwise be caught at run-time, and they point out conditions that you likely weren't handling (like using the contents of a variable before assigning to it).

Then there is the external documentation. My team has adopted the javadoc-like habit of making the api documentation in-line with the code. The difference here though is that there is no formal relationship between the POD and the function itself - unlike in the Python docstring (and in Javadoc, where adjacency is an implicit relationship). This means that the Perl documentation can't be (easily) the same kind of runtime value-add as the Python docstring (go Lisp).

The next thing is the function signature, or rather the lack of one, in the Perl function. Again, my team has adopted the convention of using a single line to do function argument destructuring - that way the first line of our functions is at least representationally equivalent to a signature. This makes our Perl applications a lot easier to read and maintain. Of course there are some cases (at least in Perl) when it can be useful to slightly modify the argument list, replace your function call with some other and pretend like the first never happened, though I'm not going into that right now. The main difference is that our convention isn't a formal part of the language, while in Python it is - and may additionally be a value-add at runtime (I'm sure I'll find out later, again, this is the kind of thing that helps IDEs and tools).

As far as the actual implementation, there is little logical or semantic difference between Perl's join/map and Python's join/list-comprehension. Perl's join is not a method. Neither uses an iterator off of the collection (as Ruby would), but both apply an operation to every (logical) item in a set (in the Python case it's a pair, though as I said I'm not sure how the destructuring happens yet).

There are other smallish differences too. I could have used Perl's sprintf, which would have more closely aligned with Python's implicit formatter:

sub buildConnectionString {
    my($params) = @_;
    return join(";",map { sprintf "%s=%s", $_, $params->{$_} } keys %$params);
}

But either language is more than capable. With the overloading of curly braces, even with my 10+ years of Perl experience, it's still a bit fuzzy to my eyes where the break is between the end of the hash-table (which is a map) and the code-block for the map. It's not that I can't read that code, or read it quickly, the dual meaning of the curly braces means that I have to use grammatical cues rather than just syntactic ones.

This is less than a first impression, mind you. I'm trying hard to stay out of the Turing tar-pit. I like the list comprehensions and I like that Mr. Pilgrim introduced them right away in the book. I'm looking forward to the rest of it.


The difference between requirements and code. [May. 18th, 2008|10:12 pm]

Programmers tend to know this without questioning it and often without being able to describe why: you can't reverse engineer requirements from code. This idea seems to be irresistibly seductive to non-developers though.

Requirements are, at their heart, declarative. Code is imperative. Well I suppose that depends and I'll come back to this topic later. A program is one possible solution to the requirement, one possible implementation. It is not the requirement.

Requirements declare "clean the car". Code says "Get a bucket. Get the car wash soap. Get rags. Get the hose. Turn on the hose. Fill the bucket." and so on. It could also have said "Drive to the car wash". Of course those instructions aren't necessary if the car isn't dirty (but you can't tell that from the instructions). They also could have meant "Wash the dog" or "Make lots of bubbles". If you didn't already have it in your head that the intent was to wash a car you'd probably be lost. Worse, you'd be lucky to figure out what it meant.

And what are the chances that there is a bug in the code? That bug would leave you with an error in your requirement. What if the instructions in the code are working around limitations in the programming language? What about the OS? The network? The database? What if....what if the program is working around errors in the data, which were introduced by some other application? What if it's a legacy version of the program itself?

You can use code as a clue though, as a guide. If you know what the goal probably was you can use the code as a way to try to disprove your hunch.

You can't reverse engineer requirements from code - you'll never hit the spirit or the meaning of the original requirements. You can certainly use the existing code to produce requirements for a system that re-implements the code itself (including workarounds, bugs and all), but you'll never produce the intent or core need that caused the application to be created in the first place. Not from looking at the code alone.

So, what about the earlier supposition? When you use non-imperative approaches to implement requirements what you end up with is a declarative (or functional) statement of...the requirements. They are stated in a more formal manner certainly, but they will be a statement of the requirements nonetheless.

This is a core reason DSLs and XML configurations are often good solutions. SQL is successful because it allows you to say what you want, not how to get it from some internal, vendor specific, database API.


Compile Time Evaluation, w/o Macros [Apr. 28th, 2008|05:05 pm]

We all have constants in our applications. Sometimes we have
numeric values that are only used once and are clearer if they're left
as an expression (like the number of seconds in a day; these
things are better off as constants in many cases, though I'm only
using it as an example).



(defun days->seconds (days)
  (* days (* 24 60 60)))

(format t "seconds in ~a days:~a~&" 1 (days->seconds 1))

(time (loop repeat 1000000
	    do (days->seconds 10)))
 => 2.093 sec.


In Common Lisp there is a reader macro, #. (sharpsign dot), that you can
use to have these kinds of expressions evaluated when the source is read,
before it is even compiled - and thus use no CPU time during your running
application (or even within the compiled executable):



(defun compile-time-days->seconds (days)
  (* days #.(* 24 60 60)))

(time (loop repeat 1000000
	    do (compile-time-days->seconds 10)))
 => 1.87s


Admittedly in this case it's not a worthwhile improvement in
performance - but these are the kinds of things many technologies
represent as build processes. When you're configuring a build to be
specialized for a locale, or particular production environment, you
either factor those parts of the application into a configuration
file, or you generate code (almost certainly with some tool set which is
not your programming language).


Updated Lisp Presentation Slides [Apr. 28th, 2008|01:54 pm]

I took the "make the title of your presentation confrontational" advice and changed the title from "Introduction to Lisp" to "Why Won't Lisp Just Go Away?"

It went much more smoothly the second time around. I also got some good feedback from Rob DiMarco and JP Vossen (author of the Bash Cookbook). It has been difficult to come up with a 60-90 minute talk, keeping it short enough, while still talking up the strong points.


Data Analysis in the [Unix] shell [Apr. 28th, 2008|11:51 am]

I often prefer the shell and Unix utilities to having to wait to load data into a relational database or MS Access (unless a lot of what I'm doing requires complex joins). It's often possible to not even have to transform the encoding of the files before analyzing them. There are a bunch of recipes for doing SQL equivalents with the shell utilities; these are a few that I just used this morning. Most of the time all it takes is a bit of imagination about how to create a simple data-flow by composing a small handful of ubiquitous Unix utilities.

All these examples will also work within the Cygwin environment for Windows, or at a Terminal in OS X (especially when combined with the additional software available via the Fink or MacPorts projects).

"SELECT COUNT(*) FROM TABLE"

Just selecting the count of records from an input file is one of the easiest things to accomplish (if your file is already line-oriented). The wc, or word count utility can do this easily. By default it counts characters, words and lines. With '-l' it will emit only the count of lines.

user@host:~/data$ wc -l table.tab
10 table.tab

If you want to ignore the header, start with the second line (see the next example for a more thorough explanation):

user@host:~/data$ tail -n +2 table.tab | wc -l
9

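Both counts can be sketched end to end. This is only an illustration: the file is built on the spot, and its column names and values are made up rather than taken from the data I was working with.

```shell
# Build a hypothetical 10-line tab-delimited file: 1 header + 9 records.
workdir=$(mktemp -d)
printf 'state\tcity\n' > "$workdir/table.tab"
for i in 1 2 3 4 5 6 7 8 9; do
  printf 'PA\tcity%s\n' "$i" >> "$workdir/table.tab"
done

# Reading from standard input keeps wc from printing the file name;
# tr strips the padding some versions of wc put in front of the number.
total_lines=$(wc -l < "$workdir/table.tab" | tr -d ' ')

# Skip the header line, then count what is left.
record_count=$(tail -n +2 "$workdir/table.tab" | wc -l | tr -d ' ')

echo "lines: $total_lines, records: $record_count"
```

Redirecting the file into wc is handy when the count feeds another script, since there is no file name to strip off afterwards.
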
"SELECT COUNT(DISTINCT(FIELD1)) FROM TABLE"

To get the distinct values in a column, along with a count of each:

user@host:~/data$ cut -f1 table.tab | tail -n +2 | sort | uniq -c

The first part, cut, is a utility that allows you to take particular columns from a tab-delimited file, or character ranges from a fixed-width file. cut also allows you to specify the delimiter - but be warned that the commonly encountered CSV format requires more complexity in parsing than just using a comma as a delimiter. So this takes the first column out of the input file.

The next part of that is the tail command. tail is a command that outputs the end or 'tail' of a file. The '-n' option normally says how many lines to output, counted from the end of the file - in this case the '+' tells tail to start at the second line from the beginning instead. This effectively tosses out the header line.

Next the values themselves are sorted. This is necessary for the uniq command, which will only collapse or count duplicate lines when they are adjacent.

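Finally we reduce duplicate lines with uniq. The '-c' tells it to emit the count of duplicates when collapsing them, so the result is one line per distinct value. To get a single number, as SQL's COUNT(DISTINCT ...) would, leave off the '-c' and pipe the result through wc -l.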
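
Here is a small sketch of that pipeline in both forms, assuming a made-up table.tab with a 'state' column (the data is hypothetical, only the utilities are real):

```shell
# Hypothetical input: a header line plus five records, tab-delimited.
workdir=$(mktemp -d)
printf 'state\tcity\nPA\tPhiladelphia\nNJ\tCamden\nPA\tPittsburgh\nDE\tDover\nPA\tErie\n' \
  > "$workdir/table.tab"

# One line per distinct value, each prefixed with how often it occurred
# (closer to GROUP BY than to COUNT(DISTINCT ...)).
per_value=$(cut -f1 "$workdir/table.tab" | tail -n +2 | sort | uniq -c)

# A single number: how many distinct values there are.
# sort -u sorts and drops duplicates in one step.
distinct_count=$(cut -f1 "$workdir/table.tab" | tail -n +2 | sort -u | wc -l | tr -d ' ')

echo "$per_value"
echo "distinct: $distinct_count"
```

The exact padding in front of the counts from uniq -c varies between systems, so it is safer to match on the number and value than on the whole line.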

Dealing with various file archive types

I often work with files in zip archives and tar (unix tape archive) archives, sometimes with additional compression applied to them (.Z, unix compress; .gz, gzip; and .bz2 bzip). It is possible to work with these files without having to unarchive or decompress them permanently if all you need is a simple count of lines or to only process them once.
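
For a gzip-compressed file, a minimal sketch of counting records without leaving an uncompressed copy behind might look like this (the file and its contents are made up for the example):

```shell
# Hypothetical input: compress a small tab-delimited file.
workdir=$(mktemp -d)
printf 'id\tname\n1\talpha\n2\tbeta\n3\tgamma\n' > "$workdir/table.tab"
gzip "$workdir/table.tab"    # leaves only table.tab.gz on disk

# -d decompresses and -c writes to standard output, so the data flows
# straight into the pipeline and nothing is written back to disk.
gz_records=$(gzip -dc "$workdir/table.tab.gz" | tail -n +2 | wc -l | tr -d ' ')

echo "records: $gz_records"
```

bzip2 accepts the same -dc options for .bz2 files.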

Pulling a file from a Zip Archive

To pull one or more files from within a zip archive, and send them to another command (as part of a pipeline):

user@host:~/data$ unzip -l archive.zip
Archive: archive.zip
  Length    Date   Time   Name
 --------   ----   ----   ----
      34  04-28-08 11:01  table1.tab
      56  04-28-08 11:01  table2.tab
user@host:~/data$ unzip -c archive.zip table1.tab table2.tab | wc -l
36

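That example uses the unzip command to pull 2 files out and send them to standard output - to either the screen or the next command in the pipeline. In this case that is wc to get the combined record count for the two files. We don't have to worry about cleaning up the two files when we're done either. Be aware that '-c' also prints the archive name and each file name as it goes, which adds a few lines to the count; 'unzip -p' sends only the file data.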

unzip -l lists the files in a zip archive. If we left off the '-c' (and the '| wc -l') unzip would have extracted just those two files from the archive (in case there were more and you only wanted a small handful).

Dealing with various file encodings

The first barrier to using most of these readily available utilities is often the file formats themselves. The utilities are line-oriented for the records and tab-oriented for the fields. So the first step is often figuring out how to even get your data into a tab-delimited format.
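
As one small illustration, assuming a pipe-delimited export where the delimiter never appears inside a field (an assumption that does not hold for real CSV), tr is enough to get to tabs. The file name and contents here are invented:

```shell
# Hypothetical pipe-delimited input with a header line.
workdir=$(mktemp -d)
printf 'id|name|state\n1|alpha|PA\n2|beta|NJ\n' > "$workdir/export.txt"

# Swap every '|' for a tab. Only safe when '|' never occurs inside a field.
tr '|' '\t' < "$workdir/export.txt" > "$workdir/table.tab"

# The result now works with cut's default (tab) delimiter.
states=$(cut -f3 "$workdir/table.tab" | tail -n +2 | sort | paste -sd, -)

echo "states: $states"
```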

There is more to that than I have time to write right now. I'll work at following up with examples for another posting.


Some Reflection in CLOS [Apr. 25th, 2008|10:22 am]

CLOS and slot (member) lookup recently came up on c.l.l. (comp.lang.lisp).

;; some reflection in clos

(defclass test-class
  ()
  ((s1 :reader :slot1
       :writer :set-slot1)
   (s2 :reader :slot2
       :writer :set-slot2)
   (s3 :reader :third-slot
       :writer :update-three)))


(defvar x nil)
(setf x (make-instance 'test-class))

(use-package :clos)

(class-direct-slots (class-of x))
;; =>
;; (#<STANDARD-DIRECT-SLOT-DEFINITION S1 #x19F471C9>
;;  #<STANDARD-DIRECT-SLOT-DEFINITION S2 #x19F471F5>
;;  #<STANDARD-DIRECT-SLOT-DEFINITION S3 #x19F47221>)


(class-direct-superclasses (class-of x))

;; => 
;; (#<STANDARD-CLASS STANDARD-OBJECT>)

(mapcar #'slot-definition-readers (class-direct-slots (class-of x)))
;; => ((:SLOT1) (:SLOT2) (:THIRD-SLOT))

(mapcar #'slot-definition-writers (class-direct-slots (class-of x)))
;; => ((:SET-SLOT1) (:SET-SLOT2) (:UPDATE-THREE))


IRC in Emacs with ERC [Apr. 14th, 2008|10:29 am]

Jason Stelzer just turned me on to ERC, an IRC mode for Emacs. It even just works with Windows, and comes with recent versions of Emacs (as of April 2008).

There are even easy ways to automatically start a session and connect to a channel - wrapping that up helped make it more automatic:



(defun krb-start-erc ()
  (interactive)
  (erc-open
   "irc-server-name-or-ip"
   6667
   "your-username"
   "your-full-name"
   t ;; connect
   "your-passwd")
  (erc-join-channel "datapump"))

Keyword symbols vs non-keyword symbols [Mar. 10th, 2008|08:57 pm]

It took too long for it to really click so while I'm thinking about this again I'm writing it down. There are good reasons why you should prefer :foo over symbols of the form 'foo. The main semantic difference is that the one preceded with a single quote is tied to the package where it occurs, while the one preceded by the colon is placed within the keyword package.

What this means to you and your code may vary. One immediate reason to use keyword symbols (the ones that start with a colon) is that they are the same everywhere - they are equal (eq even). This means that they're shared - if you take symbols as arguments, or return symbols as values, then the keyword symbols will be easier to use. You won't have to export them from your package for them to be accessible (except in very rare, unhygienic, cases, eg: aif).

CL-USER> (defpackage :foo
           (:use :common-lisp)
           (:export :qux))
#<PACKAGE "FOO">
CL-USER> (in-package :foo)
#<PACKAGE "FOO">
FOO> (defun qux ()
       'bar)
QUX
FOO> (equal 'bar (qux))
T
FOO> (in-package :cl-user)
#<PACKAGE "COMMON-LISP">
CL-USER> (equal 'bar (foo:qux))
NIL
CL-USER> 

That last one there was somehow confusing to me for a long time. Using keyword symbols solves that issue neatly:

CL-USER> (defpackage :foo
           (:use :common-lisp)
           (:export :qux))
#<PACKAGE "FOO">
CL-USER> (in-package :foo)
#<PACKAGE "FOO">
FOO> (defun qux ()
       :bar)
STYLE-WARNING: redefining QUX in DEFUN
QUX
FOO> (equal :bar (qux))
T
FOO> (in-package :cl-user)
#<PACKAGE "COMMON-LISP">
CL-USER> (equal :bar (foo:qux))
T
CL-USER> 

The other reason to use them is that they're not just interned for the package they were defined in, but all keyword symbols are interned in the keyword package - saving memory in your running instance.

This point was lost on me until I started using packages seriously. I had no context for understanding it until I started using packages to organize my code.

In general you should probably be using keyword symbols as your default.


Landmark Parsing [Feb. 29th, 2008|12:22 am]

Someone asked for an example so I dug up my jspwiki tool. Here are the guts of the parser:


sub makeParser {
  my($data) = @_;
  my $pos = 0;
  my $setData = sub { $data = $_[0]; $pos = 0; };
  my $start   = sub { $pos = 0 };
  my $fwd     = sub { return -1 if $pos == -1; $pos += $_[0]; $pos = -1 if $pos >= length($data); $pos };
  my $bck     = sub { return -1 if $pos == -1; $pos -= $_[0]; $pos = -1 if $pos < 0;              $pos };
  my $bckTo   = sub { return -1 if $pos == -1; $pos = rindex $data, $_[0], $pos; };
  my $fwdTo   = sub { return -1 if $pos == -1; $pos =  index $data, $_[0], $pos; };
  my $fwdPast = sub {
    return $pos if $pos == -1;
    $pos = index $data, $_[0], $pos;
    return $pos if $pos == -1;
    $pos += length($_[0]);
    $pos >= length($data) ? $pos = -1 : $pos;
  };
  my $btwn = sub {
    return -1 if $pos == -1;
    my $s = $fwdPast->($_[0]);
    return undef if $s == -1;
    my $e = $fwdTo->($_[1]);
    return undef if $e == -1;
    my $item = substr $data, $s, $e - $s;
    return $item;
  };

  my $all = sub {
    my @all;
    while (-1 != $pos) {
      my $item = $btwn->(@_);
      # stop only when $btwn fails (undef), not on an empty or "0" match
      last unless defined $item;
      push @all, $item;
    }
    return @all;
  };

  return ($setData,$start,$fwd,$bck,$fwdTo,$fwdPast,$bckTo,$btwn,$all);
}

and here is how it gets used:

sub getPageInfo {
  my($topic) = @_;
  my $data = $UserAgent->get("$BaseURL/PageInfo.jsp?page=$topic")->content;
  print "$data\n";
  my($setData,$start,$fwd,$bck,$fwdTo,$fwdPast,$bckTo,$btwn,$all) = makeParser($data);
  my $table = $btwn->('Version','</table>');

  ($setData,$start,$fwd,$bck,$fwdTo,$fwdPast,$bckTo,$btwn,$all) = makeParser($table);
  print join("\t",qw(Version Date Author Size Changes from Previous)),"\n";
  dumpRow($_) for $all->('<tr>','</tr>');
}

sub dumpRow {
  my($row) = @_;
  my($setData,$start,$fwd,$bck,$fwdTo,$fwdPast,$bckTo,$btwn,$all) = makeParser($row);
    print join("\t",map { simpleStrip($_) } $all->('<td>','</td>')),"\n";
}


Automating Invitations to Mailman Lists [Feb. 28th, 2008|04:07 pm]


This is just a code example - I had a new mailing list created at work and needed a large-ish list of people to get onto it. To save a bit of effort (admittedly a small amount) I scripted it.




; (require 'asdf-install)
; (asdf-install:install :drakma)

(require :drakma)
(use-package :drakma)

;; (http-request "http://mailman/mailman/listinfo/code-reviews")

;; <FORM Method=POST ACTION=\"http://mailman/mailman/subscribe/code-reviews\">
;; <INPUT type=\"Text\" name=\"email\" size=\"30\" value=\"\">
;; <INPUT type=\"Text\" name=\"fullname\" size=\"30\" value=\"\">
;; <INPUT type=\"Password\" name=\"pw\" size=\"15\">
;; <INPUT type=\"Password\" name=\"pw-conf\" size=\"15\">
;; <input type=radio name=\"digest\" value=\"0\" CHECKED> No
;; <INPUT type=\"Submit\" name=\"email-button\" value=\"Subscribe\">

(defun subscribe-user-to-code-reviews (email full-name pass)
  (drakma:http-request "http://mailman/mailman/subscribe/code-reviews"
                       :method :post
                       :parameters
                       `(("email"        . ,email)
                         ("fullname"     . ,full-name)
                         ("pw"           . ,pass)
                         ("pw-conf"      . ,pass)
                         ("digest"       . "0")
                         ("email-button" . "Subscribe"))))

; (subscribe-user-to-code-reviews "kburton@some-domain.com" "Kyle Burton" "some-password")

;; I went into outlook, opened the shared public contact list, did
;; CTRL-A, CTRL-C, went to excel and pasted, then saved as addrs.csv, then
;; do a bit of vi-magic to format the lines as expected below
(defun get-lines ()
  (with-open-file (in "/home/kburton/foo.txt" :direction :input)
    (loop for line = (read-line in nil nil)  then (read-line in nil nil)
       while line
       collect line)))

; (get-lines)

(require :cl-ppcre)
(use-package :cl-ppcre)

(defun split-line (line)
  (multiple-value-bind
        (string matches)
      (cl-ppcre:scan-to-strings "\"([^\"]+)\".*E-mail: (.+)$" line)
    (list (aref matches 0)
          (aref matches 1))))

;(split-line "\"Smith, John\" E-mail: jsmith@some-domain.com")

(loop for line in (get-lines)
     while line
     do
     (destructuring-bind
           (name email)
         (split-line line)
       (format t "name:~a email:~a~&" name email)
       (subscribe-user-to-code-reviews email "" "")))

Developer to Engineer [Feb. 27th, 2008|08:05 pm]

Kudos to you Josh [Crean]. He stepped up and gave a tech-talk today at work on using unit testing and Devel::Cover. Not only did he give the talk, but he did it as a live interactive session - creating a module and its test, fielding questions and taking input from the other developers.

Actually he followed test driven development. This was based on comments by the audience -- he switched tack during the session.

Josh responded well to the "should we shoot for 100% coverage", and even showed an example (by coding it up as part of the live performance) where unit tests can drive software to 100% coverage and still contain bugs (100% coverage based on unit tests doesn't prove the code is correct in any way).

If you haven't checked out Test::Unit or Devel::Cover on CPAN, you should take a look. They can really improve the software you're developing.


Yahoo as a takeover target for Microsoft [Feb. 4th, 2008|08:18 am]

Something interesting is going on that is a rare event: large tech firms are looking to merge. Microsoft has offered to merge (buy) Yahoo. This appears to be in an attempt to strengthen its on-line business units.

Some bloggers are taking this as Microsoft effectively admitting that they can't catch up in search or on-line advertising (not necessarily technologically, but definitely with respect to market share).

A lot of people have a lot to say...regardless of all the ideas that are floated, the effects will ripple out for a long time. If MS succeeds, it will cause issues for Yahoo's current employees (there is rumor that many don't want to work for MS); if Google steps in and somehow scuttles the deal, it may be seen as some kind of evidence of weakness in MS and strength in Google.

Regardless, Microsoft seems to have some serious issues to work out - it seems to be in the process of finding itself again. How are they going to brand their on-line presence after this kind of merger? What are they going to do with Yahoo from a technology perspective? Yahoo is definitely not an MS shop (unix, PHP, Java). I wonder about this in context of what happened with the Hotmail acquisition of old...


Scheme's let loop form [Jan. 30th, 2008|10:03 pm]

I particularly like Scheme's let loop form:


(let loop ((x 1) (y 2))
     (write "x=") (write x)
     (write ", y=") (write y)
     (newline)
     (cond ((= x y)
            #t)
           (else
            (loop (+ 1 x) y)))) 

It allows you to define a locally scoped recursive function, with arguments and initial values, that is immediately called. It is clean and private - the name loop only exists within the let declaration - it is not available in any other scope.

I wanted the same form for programming in CL. There are two unfortunate things about this. The first is that loop is already defined (it is an external symbol of the COMMON-LISP package). This is analogous to a reserved word in many programming languages; I could 'steal' the meaning of loop for just this scope, but I choose not to use the symbol loop when using this macro. The other unfortunate thing is that you can't 'extend' the let form (at least I don't know if it supports the same kind of run-time extension that setf supports). Regardless, if I choose a sensible name, it can be done:

(defmacro llet (name bindings &body body)
  (let ((args (mapcar #'first bindings))
        (initial-values (mapcar #'second bindings)))
    `(labels ((,name ,args
                ,@body))
       (,name ,@initial-values))))

(macroexpand-1
 '(llet lp ((x 1) (y 2))
   (format t "x:~a y:~a~&" x y)
   (cond ((= x y)
          t)
         (t 
          (lp (1+ x) y)))))

(llet lp ((x 1) (y 2))
   (format t "x:~a y:~a~&" x y)
   (cond ((= x y)
          t)
         (t 
          (lp (1+ x) y))))  

(llet recur ((x 1) (y 2))
  (format t "x:~a y:~a~&" x y)
  (cond ((= x y)
         t)
        (t
         (recur (1+ x) y))))

Not exactly the same, but just as useful and clean.

