Mark Shead’s Home Office

For the past 13 years my primary office has been in my home. During this time, I've experimented with a lot of different setups, from simple small desks in Mexico to large glass-topped workspaces. Here is what my current desk looks like.

Mark Shead Desk

The first thing people usually notice is the weird lights. Right now I am on video conferences pretty much eight hours a day, and the lights help make the video conferencing as effective as possible. The studio lights were about $150 and give nice, even lighting for the video as well as for the work surface. Each stand has two 80-watt CFL bulbs, which are supposed to be equivalent to about 300 watts in a traditional light.

The monitor on the right is hooked to a small Windows 8 computer that runs the video conferencing software. You'll have to click on the image to see it better, but there is a small Cisco video camera on the monitor.

The computer on the left-hand side is running Linux, and the laptop hooked to the center screen is my MacBook Air. You may notice there is only a single keyboard. I use a program called Synergy to share the keyboard and mouse between the machines. My MacBook Air runs the Synergy server and lets me move the mouse and keyboard to the screens on either side. This may sound like a small thing, but being able to treat multiple computers as one from a single set of input devices is a huge improvement.
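For the curious, a Synergy 1.x configuration for a layout like this is just a short text file. Here is a minimal sketch with made-up screen names (the real names come from each machine's hostname):

    # Hypothetical synergy.conf: three screens with the MacBook in the middle.
    section: screens
        macbook:
        linuxbox:
        winbox:
    end

    section: links
        macbook:
            left  = linuxbox
            right = winbox
        linuxbox:
            right = macbook
        winbox:
            left = macbook
    end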

The desk is a Turnstone Bivi. My favorite thing about it is that it has lots of places to hide wires. The top half of the work surface is a tray where I put my power supplies and cables, which gets many of the wires out of the way. The desk also has 16 outlets built into it, and I'm using all of them. I have the Bivi double, so there are actually two work surfaces facing each other. Last summer, when I had an assistant helping me, she used the opposite desk; now I use it for my printer or when I need some additional room to spread things out.

Mark Shead in Home Office

This picture looks backward from my desk. The desk to the side is my old glass-top desk, and there is another one on the other side as well. I sometimes use these for organizing paperwork, but they often end up collecting junk, so I may replace them with shelves or storage units.

Agile Principles vs. Agile Practices

When people see a successful Agile team, the first thing they look at is all the practices the team is doing. They notice the daily stand-ups, iteration reviews, test-driven development, continuous integration, etc. Then they try to replicate the success by copying all the practices they see. The problem is that for the practices to really be effective, they need to grow out of the Agile principles.

Continuous delivery of software is worth less as an isolated practice than when it grows naturally out of trying to follow this principle:

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

Enforcing a 40-hour work week is a whole lot different than trying to follow this principle:

Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.

You can't just blindly emulate a few practices that are working for others. Without understanding the principles of Agile, you will miss the foundation necessary to adapt different practices to the particular things that make your team unique. Most of the benefit of Agile comes from its ability to help teams work effectively and efficiently in spite of those differences.

Tapestry 5.2.4 Out

I have been doing quite a bit of work with Apache Tapestry. In the past week, Tapestry 5.2.4 was released and is worth checking out. Coming from a WebObjects background, I find Tapestry a very logical approach to web development. If you aren't familiar with WebObjects, the learning curve can be a little steep.

There are an increasing number of open source projects using Tapestry, which makes it a lot easier to get started by looking at code examples. Here are a few worth checking out:

  • Wookie – Collaborative book editing.
  • Tapestry Hotel Booking – An example application that shows how to use common Tapestry features, with a live demo.
  • Tapestry Jumpstart – A basic application setup along with demonstrations of a bunch of common components. This is very useful as a reference.

Keep From Giving Away Passwords Like Gawker

Some time ago I left a comment on Lifehacker.com. In order to comment, I had to create a login. I used an email account that I mainly use for junk email and a simple password. In the past week, hackers got into Gawker's servers and downloaded, among other things, the user database.

When you store a password in a database, you are supposed to hash it. In theory, this turns the password into a random-looking string, so no one with access to the database can actually see the password. To verify your login, the server takes the password you give it and runs it through the same hash function. If the result matches what is stored in the database, you are given access.
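In code, that check is only a few lines. Here is a minimal Python sketch, using MD5 and made-up function names purely for illustration:

    # Minimal sketch of the hash-and-compare login check described above.
    # MD5 and the function names are purely for illustration.
    import hashlib

    def hash_password(password):
        return hashlib.md5(password.encode("utf-8")).hexdigest()

    def verify_login(submitted_password, stored_hash):
        # Hash what the user typed and compare it to the stored value.
        return hash_password(submitted_password) == stored_hash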

The hash function is one-way. You can easily create the hashed value from the password, but it is difficult or impossible to recreate the password from the hashed value. However, if you have a precomputed list of hashes for a bunch of common passwords, it is easy to search for a hash and find its corresponding password.
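That lookup is trivial to script. A sketch, with a toy word list standing in for a real password dictionary:

    # Sketch of the reverse lookup: precompute hash -> password for a word
    # list, then look leaked hashes up in it. The word list is a toy stand-in.
    import hashlib

    wordlist = ["password", "123456", "qwerty", "letmein"]
    lookup = {hashlib.md5(w.encode("utf-8")).hexdigest(): w for w in wordlist}

    leaked_hash = "d8578edf8458ce06fbc5bb76a58c5ca4"  # md5 of "qwerty"
    print(lookup.get(leaked_hash))                    # prints: qwerty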

To prove how great they were, the hackers ran exactly this kind of lookup against nearly 200,000 accounts and posted the results online. The accounts were listed with their usernames, passwords, and email addresses. Mine was one of them. Fortunately, the password is one I haven't used for several years and haven't used for anything important for about a decade. Still, it is a bit unnerving to see my password out there for everyone to read.

I tracked down a few places where the password was still in use, on a bookmarking site and another commenting account, and changed those. Fortunately, I have been using 1Password for a few years, so it was easy to do a search and find the accounts that were still using this password.

More recently, I have been using 1Password to create a random password for each site I visit. This is ideal because a long random password is much more difficult to search for unless attackers have an enumerated list of every possible combination of keyboard characters at every length. Further, even if someone were able to find the random password that corresponds to the hash, it would only give them access to the account on one website, which they probably already had access to in order to get the database in the first place.

Salting Passwords

One thing Gawker/Lifehacker should have done is "salt" the passwords. This simply means adding some characters to the beginning or end of each password before running it through the hash function. When you log in again, the server takes the password you give it, adds the same characters, and runs the hash again. This makes the hashes much more difficult to look up because it helps ensure they don't correspond to common words. If the hackers don't know what the salt is or how it is applied, this should make it pretty much impossible to figure out the passwords.
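A minimal sketch of what a single site-wide salt looks like (the salt value here is made up):

    # Sketch of a single site-wide salt. SITE_SALT is a made-up value; every
    # user's password gets the same characters prepended before hashing.
    import hashlib

    SITE_SALT = "k3!x9Q"

    def hash_password(password):
        return hashlib.md5((SITE_SALT + password).encode("utf-8")).hexdigest()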

However, in this case the hackers got access to the Gawker source code as well, so they would have known exactly how any salt was applied before the hash was made. They could go through and create hashes for a bunch of common words with the salt characters added, but that means they can't rely on existing databases of hash-to-password mappings; they would have to calculate everything from scratch. Still, this is possible and not unreasonable, particularly because they would only need to run through the word list once to see whether any user had used a given word as a password.

So if you have 1,000,000 users, running such a process with 1,000,000 common passwords will probably give you access to a number of the accounts, and it only requires 1,000,000 hash operations, one per dictionary word. I did a test and found that 1,000,000 MD5 hashes take about 25 minutes on my i7 MacBook Pro, just using the command line and a shell script. I'm guessing it could be done much faster using a different method, running it on a bigger machine, or dividing it up among several computers.
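For comparison, hashing in-process instead of launching a command-line tool for each password should be dramatically faster; a rough, hypothetical timing sketch:

    # Rough timing sketch (hypothetical): hash 1,000,000 candidates in-process.
    # Calling a library routine in a loop avoids launching a command-line tool
    # per password, so this typically finishes in seconds rather than minutes.
    import hashlib
    import time

    start = time.time()
    for i in range(1000000):
        hashlib.md5(("password%d" % i).encode("utf-8")).hexdigest()
    print("%.1f seconds" % (time.time() - start))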

Things can be made much more secure by setting things up so that a single password-to-hash mapping can't be used to search the whole dataset. You want the hackers to have to recalculate the hashes for the entire dictionary for each user. This can be done if each password is hashed using a different salt value for each user.
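Before getting to specific salt choices, here is a sketch of the general idea, with a random salt stored alongside each user's hash (the names and salt size are arbitrary):

    # General per-user salting sketch: each user gets a random salt stored
    # alongside the hash, so a dictionary must be re-hashed for every user.
    # Names and the 8-byte salt size are arbitrary illustration choices.
    import hashlib
    import os

    def new_password_record(password):
        salt = os.urandom(8).hex()    # random per-user salt
        digest = hashlib.md5((salt + password).encode("utf-8")).hexdigest()
        return salt, digest           # store both in the user's row

    def verify(password, salt, digest):
        return hashlib.md5((salt + password).encode("utf-8")).hexdigest() == digest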

Different Salt Values

This could be as simple as merging the username and password together and then creating the hash from that. Compromising the same number of accounts as before would now require 1,000,000 × 1,000,000 hash operations, which means significantly more time and/or more machines. It would take my computer somewhere in the range of 50 years. Still, with a cluster of 100 computers that are each 10x faster than my laptop, it could be done in around 400 hours. So it is definitely doable, but it is starting to get much more expensive. Amazon's high-end EC2 instances cost $0.50 to $2.10 per hour, so we are looking at costs in the $20,000-and-up range.

Still, this isn't prohibitive, particularly if the hackers are willing to use a smaller dictionary of passwords. With a 100,000-word dictionary of potential passwords, the cost on Amazon falls into the $2,000-and-up range.

Hashing the Username or Email Address

Another idea would be to hash the username or email address. Usernames are a bit more difficult to attack with a dictionary because they tend to be more varied. In fact, if you were to combine the username and the password into the same hash, it would be significantly more difficult to break because you couldn't attack the username or password by itself. So, something like:

hash(username + password)

Of course, this only works if you don't need the username anywhere else. For example, you couldn't use it to show the name of the person who left a comment. This might work out okay if your application uses a username to log in and a "screen name" to show to other users.

The problem is that you couldn't look up a user without having their correct username and password, which could be problematic for any type of password recovery. You might be able to work around this by using a third field that can uniquely identify the user, like an email address.

If your system doesn't need to send unsolicited email, hashing the email address might be a better solution, because email addresses tend to be much longer and are automatically unique. You could still send password-recovery emails: when the user supplies their email address, you can find the row with the matching hash and send the message to the address they just gave you.
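A sketch of how recovery could still work when only a hash of the email is stored (names here are made up):

    # Sketch of password recovery with hashed email addresses: when the user
    # types an email in, recompute the hash and find the matching row, then
    # send the recovery message to the address they just supplied.
    # The dictionary stands in for a real users table; names are made up.
    import hashlib

    def email_key(email):
        return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()

    users_by_email_hash = {email_key("alice@example.com"): {"pw_hash": "..."}}

    def find_user_for_recovery(email):
        return users_by_email_hash.get(email_key(email))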

If the average email address is 20 characters and can contain A-Z, @, and a period, then you'd be looking at 28^20 different combinations for a dictionary attack (28 possible characters in each of 20 positions). The real number would be significantly smaller if you took into account that most email addresses end in .com, contain exactly one @, and follow other common patterns. That is partially offset by the fact that some email addresses are longer than 20 characters and that many people's email usernames are more complicated than their passwords.
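For what it's worth, the back-of-the-envelope number works out like this:

    # Back-of-the-envelope keyspace from the paragraph above: 28 possible
    # characters (A-Z plus '@' and '.') in each of 20 positions.
    print(28 ** 20)  # about 8.8 x 10^28 combinations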

Something like the hash shown below would be impossible to break on any large scale, but wouldn’t allow traditional password recovery using email.

hash(email address + password)

If someone forgot their password, it would be impossible to look up their account by email using this approach. Still, in some situations that might be acceptable, particularly if an alternative form of password recovery could be done in person, by SMS, or through some other channel.

Can You Measure It?

For years there have been two elementary schools in Fort Scott: Eugene Ware Elementary School on one side of town and Winfield Scott Elementary School on the other. This year the school board voted to make a change and put all of the students in a given grade at the same school. This means that regardless of where you live, you go to the same school as everyone else in your grade. The change was presented as a way to save money.

I can see some ways it might save on the number of teachers. If your maximum class size is 20 and you have 30 students in a particular grade at each school, then you need four teachers, two at each school, and each class averages 15 students. If all of those students are at the same school, you could have classes of 20 and only need three teachers. So the combination may reduce the number of teachers required, but only at the expense of larger class sizes.

While I applaud the idea of trying something new, I was very disappointed that there was no talk of doing this as an experiment. These types of changes have a way of being much more complicated than they look on the surface, and it is impossible to tell whether the change will really save money until you actually try it and measure the results.

This isn't just a problem with school systems. All kinds of organizations make big changes with no clear plan for how they are going to measure the results. Continual improvement is the process of changing and then measuring: you keep the changes that are beneficial and jettison the ones that are not. This only works if measurement is built into the change process. With good measurements and a true commitment to roll back anything that doesn't work, you will make progress even if you just make random changes.

Obviously, well-planned changes with some intelligence behind them will move you forward much faster than random decisions. However, it is a mistake to assume that intelligent decisions can substitute for identifying and rolling back the things that don't work. The power is in getting rid of the things that don't perform as expected.