## Heartbleed vulnerability

Just saw some friends sharing this in Wechat. Seems I will have a lot of work to do – patching my servers, replacing certificates, regenerating keys, etc.:(

It’s really unbelievable. This will be a big blow on the open source community. I already saw people saying “see? Open source is no securer”. Well, they are right.

To me, this has nothing to do with open source or not. M$may have something even worse but you will need longer time to find out. I just searched the web and found that as a library that has been so widely used, OpenSSL has only one full time developer and receives on average$2,000 donation per year. What do you expect?

But that simply won’t justify such a disaster and it will cast a bad image for open source community in general. I can only hope leaders in this industry see this differently and start to support these great open source projects. They deserve better!

## spf记录怎么用

How To use an SPF Record to Prevent Spoofing & Improve E-mail Reliability

v=spf1 spf记录的起始标志。目前只有1号版本的spf，所以只有这一种写法。

1. + 表示该地址被明确允许发送来自该域名的邮件。该修饰符可省略。
2. – 表示该地址被明确禁止发送来自该域名的邮件。
3. ? 表示对该地址目前暂时没有策略。
4. ~ 表示对该地址应该使用柔性策略，接收方应该接受，但是可以进行特殊标记。

1. all 匹配任何地址；
2. a 匹配该域中任何A记录地址；
3. ip4: 匹配紧随其后的IPv4地址或者地址段；
4. ip6: 匹配紧随其后的IPv6地址或者地址段；
5. mx 匹配该域中的任何mx记录地址；

v=spf1 ip4:123.23.4.5 -all

v=spf1 ip4:123.23.4.5 ip4:123.23.4.5 -all

v=spf1 mx -all

v=spf1 mx ip4:107.8.9.0/24 -all

v=spf1 mx ip4:107.8.9.0/24 ?123.23.4.5 -all

v=spf1 mx ip4:107.8.9.0/24 ~123.23.4.5 -all

v=spf1 mx ip4:107.8.9.0/24 -all

Just noticed today that Google finally discontinue Google Reader. This had been quite annoying to me, since Google announced the decision in March, but now I’m already using noinnion’s gReader and I’m quite happy with it. All my feeds have been migrated without a problem. The UI is very simple and intuitive. Highly recommended!

## Linear regression in Excel – continued

Doing linear regression in Excel is simple. Now I’d like to show you how exactly this is done.

Let’s use the same data set:

 x 9.42 2.78 5.87 5.49 6.48 4.91 7.82 2.7 2.65 8.42 5.85 0.93 y 52.24 4.15 13.72 7 5.28 6.98 42 7.15 3.41 31 12.59 1.09

Let’s say we want do a order-2 linear regression. That means, we want a function:

f(x)=c*x^2 + b*x+a

a, b and c should be chosen in a way such that if we put x into f(x), the difference/error between f(x) and y, will be minimized, in the least square sense.

So we have the difference R as a function of a,b,c：

R(a,b,c)=(c* 9.42^2 + b* 9.42 + a – 52.24)^2+(c*2.78^2+b*2.78+a-4.15)^2+…+(c*0.93^2+b*0.93+a)

R is smooth every where in space. So R is minimal where:

D(R)/D(a)=0; D(R)/D(b)=0; D(R)/D(c)=0

Now we’ll have to employ the compact form to avoid tedious typing:

In order for R to be minimal, we need:

Insert R into the equations and expand them, we get:

That is:

Or in matrix form:

Now this is a very simple equation, so we can simply solve it using Cramer’s rule:

It looks scary but actually not too complex. We need the sum of x, x^2, x^3, x^4, x*y and x^y. We can actually verify this in excel:

 x y x^2 x^3 x^4 x*y x^2*y 9.42 52.24 88.69578 835.323 7866.942 491.974 4633.334 2.78 4.15 7.721688 21.45697 59.62447 11.53932 32.06537 5.87 13.72 34.43254 202.0476 1185.6 80.52797 472.5321 5.49 7.00 30.18894 165.8715 911.3719 38.47401 211.3934 6.48 5.28 41.99053 272.099 1763.204 34.19838 221.6059 4.91 6.98 24.08858 118.227 580.2595 34.23907 168.0457 7.82 42.00 61.22442 479.0568 3748.43 328.6334 2571.426 2.70 7.15 7.269068 19.59829 52.83935 19.27266 51.96143 2.65 3.41 7.003793 18.53532 49.05312 9.028243 23.89296 8.42 31.00 70.90802 597.0945 5027.948 261.0414 2198.149 5.85 12.59 34.27254 200.6409 1174.607 73.71933 431.5732 0.93 1.09 0.869064 0.810173 0.755272 1.014426 0.945685 63.32 186.61 408.665 2930.761 22420.63 1383.662 11016.92

So we have:

So in the end we get:

This is exactly what Excel displayed on the chart:

I actually put all this in an Excel file so you can try it yourself. Please note that in the excel file we have a data set of 12 points. If your data set has a different size, you may have to adapt the formulas a little bit. 🙂

## Linear regression in Excel

Excel has a very handy feature that does linear regression for you very intuitively.

Let’s say you have a data set like this:

 x 9.42 2.78 5.87 5.49 6.48 4.91 7.82 2.7 2.65 8.42 5.85 0.93 y 52.24 4.15 13.72 7 5.28 6.98 42 7.15 3.41 31 12.59 1.09

First, you’ll have to insert a scatter chart:

You’ll get a chart like this (Excel is smart enough to figure out what should be the x-axis and what should be the y-axis):

Now, right click on the dots and then you can select “Add Trendline…” from the pop-up menu.

Then you’ll be presented with a dialog like this:

Note that Excel has different options for “Linear” and “Polynomial”. Here the option is about the relationship between the variables. All these different options are all Linear Regressions.

Let’s try a order 2 polynomial regression and check the option “Display Equation on chart” and “Display R-squared value on chart”. This is what you’ll get:

Good thing about doing this in Excel is, you can always change the data set and your chart and equation will be updated automatically.

Depending on the nature of your data set, you may want to try different model to get a better fit, the procedure will be the same.

## Windows mind set vs. Linux mind set

I wanted to talk about this for a long time, however, the key points were not clear until recently I ran into the pipeline concept in PowerShell.

When you do pipeline in PowerShell, you’re passing not a text stream from the left command to the right command, but rather, you’re passing a .net object.

When I read this from TechNet, I smiled to myself, yes, this is typical Microsoft!

Windows PowerShell provides a new architecture that is based on objects, rather than text.
The cmdlet that receives an object can act directly on its properties and methods without
any conversion or manipulation. Users can refer to properties and methods of the object by
name, rather than calculating the position of the data in the output.

And I said to myself, now I know what I wanted to say.

One key difference between Windows and Linux is, Windows always tries to be smarter, where GNU Linux tries to stay plain and humble. Pipeline in scripting is just one recent example.

Linux has been using text configuration files. Windows came along and said, we need something better. That’s how Windows Registry came about.

Linux has been using pipeline for IPC. Windows came along and said, we need something better. That’s how COM came about.

Linux has been using lock files to prevent processes to start another instance. Windows came along and said, we need something better. So they use Kernel Object instead.

Linux has been using permissions as a basic security measure. Windows came along and said, we need something better. So they do everything using ACLs.

Linux has been using symbolic links. Windows came along and said, we need something better and introduced short cuts.

To be fair, not all of them are bad ideas.

Despite its complexity and awkward configuration, COM gained such popularity that it became the basic foundation of modern Windows. Open registry in any Windows that has been used for a while and chances are the biggest tree is HKLM\Classes\CLSIDS. (One of the key reasons why you Windows becomes slower as you install more and more software).

Kernel Object is much more reliable than lock files. ACLs indeed provide much flexibility in terms of access control. Linux is also doing it now.

However, we all know that Windows have record of being smart in cheap ways and then fail pathetically. (SilverLight is the one that came into my mind as I write this) To me, this idea of passing objects through pipeline looks like just another one that will fail.

Having said that, I have to admit that, this difference between Windows and Linux is also not surprising.  Linux is developed by community led by technical experts. Introducing new features always involves extensive discussions between these experts.That’s why Linux has a bad reputation of not listening to its users.

Where as for Windows, most likely, new features are proposed by requirement collection team and developer team, then the list of new features have to go through rounds of prioritization processes. If there are disputes, then there will be escalations and some manager will decide. Once decided, then features still in the list will be implemented. Period.

Now, it is actually surprising that Microsoft actually made some key decisions right, right? 🙂

## Monty Hall Problem simulation in Excel

Monty Hall Problem is definitively the most confusing puzzle among educated people. After explaining this to different colleagues and friends (With a success rate below 50%), I found it’s easier simply presenting the fact. So here’s a simulation of it in excel.

Each row in the table is a test case, or an experiment.

In each such a experiment, a random number is generated to determine where do we put the car. A goat will be placed behind each of the rest doors. So in the table, we have “C” denotes a Car and “G” denotes a goat.

If the number is between 0-1/3, then the car will be placed behind Door1. If the number is between 1/3-2/3, then the car will be placed behind Door2. If the number is between 2/3-1, then the car will be placed behind Door3.

Then in the 6th column of the table, another random number is generated to determine the 1st choice.

Again, if the number is between 0-1/3, that means the guest chose Door1 in the first attempt. If the number is between 1/3-2/3, then the guest chose Door2 in his/her first attempt. If the number is between 2/3-1, then he/she chose Door3 in his/her first attempt.

Note that both of the 2 random numbers have a unify distribution between 0 and 1. And the 2 numbers are completely independent to each other.

Also in the table, the guest’s first choice is high-lighted in green.

The next column, the 7th column, shows what the guest will get if he sticks to his/her first attempt. And the 8th column shows whether he/she wins a car or not. (1 wins, 0 loss).

The 9th column shows what the guest will get if he/she switches to the other door when he/she is given another chance. And the next column shows whether he/she will win or not if he switches.

In the end of the table, after 400+ experiments, the outcome of the 2 strategy was summed up. Out of 432 experiments, you’ll get a car 142 times, if you stick to your original choice, you’ll get a car 290 times, if you switch.

## CAN dissector available in Wireshark

Just noticed that Wireshark has a CAN dissector, as well as a CAN bus captor, though the captor is only available in Linux.

Back in 2011, I was asked to come up with a parser for data captured on CAN network but never really finished it. Now I can forget about it.

## What slows down your Windows?

Windows has long been complained of performing slower and slower as being used. This essay tries to explain how and why.

Before we go into details, let’s make it clear that we’re talking about the architecture design of Windows that makes it not performing in certain situation. So this is consistently measurable performance difference. We’re not talking about bad performance because of wrong configuration. Nor are we going to talk about performance of a specific program.  A specific instance of slow down, for example, your notepad will performance slower when you have a lot of other programs running, is not what we’re going to talk about.

First of all, there’s no reason Windows should perform worse (or better) if you just spend more time on it. If the installation keeps the same, the size of your computer keeps the same, Windows should perform the same.

However, if one of your running programs or a device driver has memory leak, then it will eat up more and more memory as time pass by. That will slow down your Windows. The unique thing about this type of performance issue is, after a fresh restart, Windows should perform well. In the early days, memory leak was a common issue on Windows. That attributed much to the common belief that you should restart your Windows once in a while to keep better performance. Now, most commonly used program is mature enough to be free of memory leak, Windows should perform just the same as time goes by.

However, as time goes by, you’ll probably keep installing new software on your Windows. This could indeed slow down your computer. The reason is, by installing any non-trivial software, you’re not only copying files to the disk, but also registering COM components to Windows. These registry key/values will be keep in memory. So the more software you install, the less memory your program will be able to get.

As an example, after you install a program that is able to open a new file format, chances are:

• You’ll not be able to see a thumb nail that shows the content of the file in Explorer;
• You’ll see this program listed in the pop-up menu of this file type;

They were made possible using COM technology that depends heavily on Windows Registry.

The problem is, this registry keys/values won’t get cleaned up when you uninstall the software. So Windows registry keeps growing and growing, your programs have lesser and lesser memory to use.

The other factor that would for sure slow down your computer is disk fragmentation. Disk fragmentation affect performance not only when your program does disk IO. If your page file is fragmented, paging operation will be slow. That will cause noticeable sluggish. Also remember that all executable files and dll files become memory map files, so if they are fragmented, your program will be slow not only during start up, but also in running phase.

Bloated registry and disk fragmentation are the two reason that your Windows slows down in the long run. In some of my Windows computers, I actually create separate partitions for page file, for outlook pst file and Windows tmp file. These techniques worked quite well.

## 泊松分布在容量管理中的应用

POISSON(x,tp*ci,TRUE)