Where the Startups Roam – Top Startup Cities in the US

As a high-tech entrepreneur, I’ve always thought about relocating to one of the top startup cities in the US. But which cities are these? Besides Silicon Valley, many cities have claimed at one point or another to be the “top startup city in the US”.

I tried to take a more quantitative approach to defining this list of cities using Crunchbase as a source of information on high-tech companies. I ran a search on Crunchbase to see how many companies were listed as having been founded since January 2009 in different cities around the US.

Top 10 Startup Cities (companies founded from Jan. 2009)

As these results were quite interesting, I decided to check the number of companies founded in each city since January 2008 to see if the results were consistent with my previous findings.

Top 10 Startup Cities (companies founded from Jan. 2008)

And then I checked the overall number of Crunchbase companies in each city, without any limit on the founding date.

Top 10 Startup Cities (no time limit)

As you can see, the results are quite clear and fairly consistent across the different tests.

Silicon Valley is the undisputed ruler of startup-land, with New York in second place. Next comes a trio with relatively similar figures: Boston, Seattle, and Los Angeles. The next tier comprises Austin, Chicago, and Washington D.C. And finally there is a group of cities that just barely made the list but have some great potential: Boulder, Portland, Atlanta, and Philadelphia.

So there you have it – those are the top startup cities in the US according to the Crunchbase-Baror criteria.

Enjoy!
Yuval.

* Methodology: Crunchbase was used to search for all companies listed as founded since the given date and located within a 10-mile radius of the selected city. For Silicon Valley, the search used a 50-mile radius from San Francisco to make sure the whole valley was indeed accounted for.


Benchmarking PostgreSQL queries

In order to benchmark my SQL queries on PostgreSQL databases, I like to check the run time of the queries in two situations: with caching and without caching. If the query runs often and accesses the same pages in memory, these pages will be cached and therefore the run time of the query will be substantially lower. However if the query runs infrequently, accesses different pages in the memory each time, or requires a large set of data that doesn’t fit in the memory, then the query will run much more slowly.

It is usually difficult to know ahead of time exactly how often a query will run and how often the pages it needs will be cached, so when I try to benchmark queries for optimization purposes I always check both cases. I used to do this manually – run the query a few times to get the pages in the cache, and then run it a few more times to measure the run time. Then clear the cache and perform the query a few times, clearing the cache in between each test. This is quite tedious work so I wrote a script to do it for me.
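For a quick one-off timing inside psql (without the cache control the script automates), EXPLAIN ANALYZE executes the query and reports its actual run time. The table and column names below are just the examples used later in this post:

```sql
-- EXPLAIN ANALYZE runs the query for real and prints the plan
-- together with the measured execution time.
EXPLAIN ANALYZE
SELECT *
FROM mytable
WHERE mytimefield > now() - interval '10 minutes';
```

Note that a single EXPLAIN ANALYZE run reflects whatever cache state the server happens to be in, which is exactly why the script below measures both cases repeatedly.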

CheckQueryRunTime

Download this script, change the extension to .pl, and run it with ‘-help’ to see a detailed usage explanation message. Some usage examples:

[yuval@dev ~]# ./CheckQueryRunTime.pl mydb myuser mypass "select * from mytable where mytimefield > now() - interval '10 minutes'"
Preparing to run query with cache
Running query with cache
…………………………………………..
With cache: 2.72678 ms. (stddev 0.0763655131587091 ms.)
Running query without cache
……….
Without cache: 418.4086 ms. (stddev 14.083357931969 ms.)

[yuval@dev ~]# ./CheckQueryRunTime.pl mydb myuser mypass "select * from mytable where mytimefield between now() - interval '10 minutes' and now()"

Preparing to run query with cache
Running query with cache
…………………………………………..
With cache: 2.04068 ms. (stddev 0.0308184619992597 ms.)
Running query without cache
……….
Without cache: 366.6442 ms. (stddev 13.6194627485802 ms.)

Note: in order to clear the cache the script shuts down the postgresql service, cleans the OS caches, and then restarts the service.

Enjoy!


PostgreSQL help and resources

Whenever I want to try something new with a PostgreSQL DB the first thing I do is check if there is someone else out there who has already encountered my problem and has solved it for me. Around 90% of the time I find the perfect solution and the problem is solved in minutes.

So where can you find help on PostgreSQL?

My first stop is Google, which usually leads me to http://www.postgresql.org/. The official PostgreSQL website has detailed documentation about all of the different parts of the PostgreSQL DB.

Until a few weeks ago when I didn’t find my answer online I would try to build my own solution from scratch. But then I read a great book called “PostgreSQL 9 Admin Cookbook”. It contains recipes for many of the different problems you encounter when working with PostgreSQL. It has clear explanations that beginners will have no trouble understanding, and it has great examples so that even pros will learn some new things. I really like the “cookbook” format that allows you to read only those chapters or recipes that fit the specific problem you want to solve, or the specific topic you want to learn about.

While I went through the book I found myself learning a lot about several parts of the PostgreSQL DB that I wasn’t familiar with, understanding a bit about how things work under the hood, and learning from the examples and SQL code in the different recipes. For example there are some great recipes on identifying and removing duplicate data, creating test data, tweaking the PostgreSQL configuration parameters, finding the longest running queries in your DB, and finding locks in the DB. These are just some of the great recipes that the book contains.

So if you haven’t done so yet – go check it out!


PostgreSQL select with default value

I recently ran into a situation in which I wanted to select data from a table, and wanted a default value in case the data does not exist in the table. The straightforward solution is to use a CASE statement in the SELECT query like this:

SELECT CASE
    WHEN (SELECT count(*) FROM my_table WHERE my_field = 'my_value') > 0
    THEN my_value_field
    ELSE 'my_default_value'
END
FROM my_table WHERE my_field = 'my_value'

However this straightforward solution is not efficient. The query on my_table happens twice: once when counting rows that match the query, and once when selecting the value if a matching row is found.

I was convinced that there has to be a simple way to get the same result with only one search in the table. After fiddling with different ideas for a while I came up with the following solution:

-- If the first argument is not null, return it. Otherwise return the default value (second argument).
CREATE OR REPLACE FUNCTION add_default(INTEGER, INTEGER)
RETURNS INTEGER AS
$$
    SELECT
        CASE WHEN $1 IS NULL
            THEN $2
            ELSE $1
        END
$$
LANGUAGE SQL
IMMUTABLE;

This function receives the integer output of a query. If this integer value is not null – it is returned. But if the input is null (no rows were found in the output of the query) – the default value (second argument) is returned. It can be used like this:

SELECT add_default((SELECT my_value_field FROM my_table WHERE my_field = 'my_value'), 17)

And what if your query returns a float and not an integer? Just overload the function with float inputs and outputs:

CREATE OR REPLACE FUNCTION add_default(FLOAT, FLOAT)
RETURNS FLOAT AS
$$
    SELECT CASE WHEN $1 IS NULL THEN $2 ELSE $1 END
$$
LANGUAGE SQL
IMMUTABLE;

Note that this function only works with queries that output a single value.
If you have any other solutions – please share them in the comments section.
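One built-in alternative worth sharing: PostgreSQL’s COALESCE function returns its first non-null argument, so for single-value subqueries it plays the same role as add_default, works for any data type, and needs no user-defined function:

```sql
-- If the subquery finds no row it yields NULL,
-- and COALESCE falls back to the default value (17).
SELECT COALESCE(
    (SELECT my_value_field FROM my_table WHERE my_field = 'my_value'),
    17);
```

Like add_default, this only works when the subquery returns at most one value.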

Enjoy!
Yuval.


PostgreSQL: Creating unique values from a non-unique field

Suppose you have a database table with a field that currently contains non-unique values. Now suppose you want to update the aforementioned field to contain only unique values, but you don’t want to update any values that were already unique.

To do this you can use the following SQL query:

UPDATE sometable
SET somefield = somefield || uniquekeyfield
WHERE uniquekeyfield NOT IN
    ( SELECT max(dup.uniquekeyfield)
      FROM sometable AS dup GROUP BY dup.somefield );

This query will update each non-unique value in somefield to have the matching value from uniquekeyfield concatenated to it. All values that were already unique will stay untouched.

For example, if you had the following table:

 id  | username
---------------
 7   | mike
 8   | dave
 9   | mike
 10  | mike
 11  | john
 12  | jeff

And you decided that you want to change the username field to be unique, you could run the query and your table would become:

 id  | username
---------------
 7   | mike7
 8   | dave
 9   | mike9
 10  | mike
 11  | john
 12  | jeff

All usernames are now unique, and only minimal changes have been made.
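To confirm the update worked, a simple grouping query (using the same hypothetical table and column names as above) lists any values that are still duplicated; after the UPDATE it should return zero rows:

```sql
-- Any somefield value appearing more than once is a leftover duplicate.
SELECT somefield, count(*)
FROM sometable
GROUP BY somefield
HAVING count(*) > 1;
```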


Sluggish PostgreSQL Databases and Reindexing Indexes

The natural process of erosion affects not only the organic elements of the Earth, but also the hardware we use, the code we write, and the databases we keep. When it comes to maintaining databases, there are several things that can be done to fight the degradation in query run time, and the increase in disk usage. There is one such action that is especially useful in PostgreSQL DBs and this is the command REINDEX INDEX. This command works on a single index and rebuilds it from scratch. Along the way it frees up any unnecessary disk space used by the index, and improves the response time of all queries using the index.

How much can this help? The answer is: a lot. In one example I witnessed reindexing an index improved the run time of a query using that index from several minutes to a few seconds. In another example, reindexing all of the indexes in a DB reduced the disk space used by that DB by over 40%.

So how can it be done? Simple:
REINDEX INDEX <index_name>;
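To see how much disk space a reindex reclaims, you can check the index size before and after with PostgreSQL’s built-in size functions (the index name here is a placeholder):

```sql
-- On-disk size of a single index, in human-readable form.
SELECT pg_size_pretty(pg_relation_size('my_index'));
```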

Now assume that you want to reindex all of the indexes related to a list of tables in your database. All you need to do is create a script that extracts the names of the indexes in your DB that are related to these tables, and then runs the REINDEX command on each of them. Here is an example bash script that does just that:

#!/bin/bash

database=mydb
indexes=$(psql -q -t -A -P pager=off -c "
select
    c.relname
from
    pg_catalog.pg_class c
    join pg_catalog.pg_namespace n on n.oid        = c.relnamespace
    join pg_catalog.pg_index i     on i.indexrelid = c.oid
    join pg_catalog.pg_class t     on i.indrelid   = t.oid
where
    c.relkind = 'i'
    and n.nspname not in ('pg_catalog', 'pg_toast')
    and pg_catalog.pg_table_is_visible(c.oid)
    and t.relname in ('table1', 'table2', 'table3')" $database)

for index in $indexes
do
    echo `date "+%y-%m-%d %H:%M:%S"`" - Reindexing index $index"
    psql -c "REINDEX INDEX $index" $database
done

echo `date "+%y-%m-%d %H:%M:%S"`" - Done"


My fonts look different in Firefox

One of the most annoying parts of web development is making sure your website looks the same on all browsers. A few days ago I noticed that on one of my websites the fonts are different in IE and in Firefox. I tried everything I could find on the web to make the fonts look the same – setting the font-family, font-size, font-weight, font-style, font-face, and every other aspect of the font. But the fonts still looked different.

Then I read about ClearType – Microsoft’s subpixel rendering technology that is turned off by default in Windows XP, but turned on by default in IE7 and IE8. This means that when using Windows XP, the same font will be rendered differently when viewed in IE and in Firefox.

To enable ClearType, go to the Control Panel, select Display, select the Appearance tab, and click the Effects button. The second drop-down is labeled “Use the following method to smooth edges of screen fonts”. Change the selection from “Standard” to “ClearType”. Now any web page you load will be rendered using ClearType.

Enabling ClearType will eliminate the difference between the fonts in the different browsers. However, this will also change the way fonts are rendered throughout the whole operating system. To date there is no way to enable ClearType only in specific applications.

Here are some samples of pages before enabling ClearType and after enabling it:

Before

IE8 - Before ClearType
Chrome - Before ClearType
Firefox - Before ClearType
IE6 - Before ClearType

After

IE8 - After ClearType (No change)
Chrome - After ClearType
Firefox - After ClearType
IE6 - After ClearType


Adding a new Bugzilla version to all products

The Bugzilla GUI doesn’t support adding a new version to all products. I used to add new versions to each product one by one. This got very tedious and annoying, especially as the number of products grew. I searched online for solutions provided by others and found only a few threads on the topic, on Nabble and on the Bugzilla support forum. In these threads the only suggested solutions were to use SQL on the Bugzilla DB, or to use custom fields instead of the built-in version field.

None of these solutions is satisfactory in my opinion, so I decided to create a better one. I created it in the form of a Perl script using the Bugzilla Perl API. The script can be run on the machine on which Bugzilla is installed and is really easy to use. Just put it in the installation directory of Bugzilla and run:

AddBugzillaVersion.pl <version_name>

Here is the script itself:

AddBugzillaVersion.pl

Notes:

  1. The script must be in the Bugzilla installation directory in order to work properly. Alternatively – you can put the script anywhere on the machine and add the Bugzilla installation directory into the Perl library path.
  2. The script uses the Bugzilla API, which requires several Perl modules to be installed. These are usually installed by the install-module.pl script, but if any are missing – they can be installed from CPAN. Instructions on installing CPAN modules can be found here.

'ERROR: Garbage option.' when using ps and awk

The other day I wanted to create a simple bash script that would receive the name of a process and print out its PID. I wanted to run ps and then use awk to filter out the correct process and print out its PID. So I created the following bash script (called getPIDOf):

#!/bin/bash

command="ps -o pid,command -C $1"
for (( i=1; i<=$#; i+=1 )); do
    curval=${!i}
    command="$command | grep \"$curval\""
done
command="$command | awk '!/awk/ && !/getPIDOf/ {print \$1}'"

echo $command
$command

I was quite pleased with myself that I had even added in the option to specify multiple identifiers for the process. But when I ran the script I got the following output:

yuval@obt:$ ./getPIDOf nc
ps -o pid,command -C nc | grep "nc" | awk '!/awk/ && !/getPIDOf/ {print $1}'
ERROR: Garbage option.
********* simple selection *********  ********* selection by list *********
-A all processes                      -C by command name
-N negate selection                   -G by real group ID (supports names)
-a all w/ tty except session leaders  -U by real user ID (supports names)
-d all except session leaders         -g by session OR by effective group name
-e all processes                      -p by process ID
T  all processes on this terminal     -s processes in the sessions given
a  all w/ tty, including other users  -t by tty
g  OBSOLETE -- DO NOT USE             -u by effective user ID (supports names)
r  only running processes             U  processes for specified users
x  processes w/o controlling ttys     t  by tty
*********** output format **********  *********** long options ***********
-o,o user-defined  -f full            --Group --User --pid --cols --ppid
-j,j job control   s  signal          --group --user --sid --rows --info
-O,O preloaded -o  v  virtual memory  --cumulative --format --deselect
-l,l long          u  user-oriented   --sort --tty --forest --version
-F   extra full    X  registers       --heading --no-heading --context
********* misc options *********
-V,V  show version      L  list format codes  f  ASCII art forest
-m,m,-L,-T,H  threads   S  children in sum    -y change -l format
-M,Z  security data     c  true command name  -c scheduling class
-w,w  wide output       n  numeric WCHAN,UID  -H process hierarchy

I thought that I must have gotten something wrong with the escaping. But when I ran the command that was printed out, it worked fine:

yuval@obt:$ ps -o pid,command -C nc | grep "nc" | awk '!/awk/ && !/getPIDOf/ {print $1}'
892

I even tried the following simplified script:

#!/bin/bash

command="ps -o pid,command -C $1"
command2="awk '!/awk/ && !/getPIDOf/ {print \$1}'"

echo "$command | $command2"
$command | $command2

And got the following output:

yuval@obt:$ ./getPIDOf nc
ps -o pid,command -C nc | awk '!/awk/ && !/getPIDOf/ {print $1}'
awk: 1: unexpected character '''

By this time I was quite puzzled. My simple getPIDOf script had become full of strange behaviors. The explanation, it turns out, is that bash does not re-parse quotes or pipes when a variable is expanded: the quote characters and the | symbols stored in $command are passed to ps and awk as literal arguments, which is exactly what the two error messages are complaining about. So I decided to take the quick route out and use Perl instead of awk.

Here is my final version of the getPIDOf script:

#!/usr/bin/perl -w
use strict;

die "Usage: $0 <identifier1> [identifier2] ...\n" unless (@ARGV);

my $command = "/bin/ps -o pid,command -C \"$ARGV[0]\"";
my $output = `$command`;
foreach my $line (split(/\n/, $output))
{
    if ($line =~ /^\s*(\d+)\s+(.+?)$/)
    {
        my $pid = $1;
        my $command = $2;
        my $found_mismatch = 0;
        foreach my $filter (@ARGV)
        {
            if ($command !~ /$filter/)
            {
                $found_mismatch = 1;
                last;
            }
        }

        if ($found_mismatch)
        {
            next;
        }

        print "$pid\n";
    }
}

And the output, as desired, is:

yuval@obt:$ ./getPIDOf nc
892


Improved SWIG C# wrappers for std::vector and std::map

My previous posts on this topic (part 1 and part 2) were published about half a year ago. Since then SWIG version 1.3.40 has been released, and it includes several enhancements to the C# wrappers for std::vector and std::map based on the work that I did, with several great improvements implemented by William Fulton, the person in charge of the C# wrappers in SWIG.

Since version 1.3.40, wrapping std::vector and std::map in C# is extremely simple. I’ll give a quick overview:

Wrapping std::vector

Let’s assume you have a C++ class called MyClass with a method std::vector<int> GetIntVector(). You want to wrap MyClass from C++ to C# and have the GetIntVector method return an IList<int> in C#. In order to do this, your SWIG interface file (the .i file) should look something like this:

/* File : MyProject.i */
%module MyProject

%{
#include "MyClass.h"
%}

%include "std_vector.i"
%include "MyClass.h"

%template(Int_Vector) std::vector<int>;

The C# class Int_Vector will implement the IList<int> interface as desired.

Now let’s make things a bit trickier by introducing a new class called MyItem, and let’s add a method to MyClass with the following signature: std::vector<MyItem> GetItems(). In this case you’ll want the C# method to return an IList<MyItem>. One important issue that should be mentioned here is that any class used in the IList<> generic must overload the == operator. Otherwise, it is only possible to use the IEnumerable<> generic interface.

So if MyItem doesn’t overload the == operator, and you’re content with having the C# method return an IEnumerable<MyItem>, then your SWIG interface file will look something like this:

/* File : MyProject.i */
%module MyProject

%{
#include "MyClass.h"
#include "MyItem.h"
%}

%include "std_vector.i"
%include "MyClass.h"
%include "MyItem.h"

%template(Int_Vector) std::vector<int>;
%template(Item_Vector) std::vector<MyItem>;

The C# class Item_Vector will implement the IEnumerable<MyItem> interface.

If, however, MyItem overloads the == operator, then you can have the C# Item_Vector implement the IList<MyItem> interface. You’ll need to tell SWIG that your class can be used with the IList<> interface by using the SWIG_STD_VECTOR_ENHANCED macro. Your SWIG interface file will look something like this:

/* File : MyProject.i */
%module MyProject

%{
#include "MyClass.h"
#include "MyItem.h"
%}

%include "std_vector.i"
%include "MyClass.h"
%include "MyItem.h"

%template(Int_Vector) std::vector<int>;
SWIG_STD_VECTOR_ENHANCED(MyItem)
%template(Item_Vector) std::vector<MyItem>;

Now the C# class Item_Vector will implement the IList<MyItem> interface.

Wrapping std::map

Thanks to some nifty features added to the SWIG core, it is now possible to wrap C++ std::maps into C# IDictionary<> generic classes very easily.

Let’s assume you have a C++ class called MyClass with a method std::map<std::string, int> GetMap(). You want to wrap MyClass from C++ to C# and have the GetMap method return an IDictionary<string,int> in C#. In order to do this, your SWIG interface file should look something like this:

/* File : MyProject.i */
%module MyProject

%{
#include "MyClass.h"
%}

%include "std_string.i"
%include "std_map.i"
%include "MyClass.h"

%template(String_Int_Map) std::map<std::string, int>;

The C# class String_Int_Map will implement the IDictionary<string,int> interface as desired. It’s that simple. No need for any specialization macros or anything.
