A beginner's guide to grep

27 Oct 2010

grep is a basic Linux/Unix tool for searching for keywords in files or in standard input. Besides grep there are also egrep, which behaves like grep -E, rgrep, which behaves like grep -r, and fgrep, which behaves like grep -F.

Here are a few simple examples you can use every day:

1. Search for "keyword" in all files under the current directory, recursively, and show the file name for each match
[code lang="bash"]
grep -H -r "keyword" *
[/code]

2. Search the output of another command
[code lang="bash"]
cat filename | grep 'something'
cat /proc/cpuinfo | grep -i 'Model'
[/code]

3. Count the number of matching lines
[code lang="bash"]
$ grep -c false /etc/passwd
7
[/code]
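To make the flags above concrete, here is a rough Python model of what -r, -H, -i and -c do. It is only a sketch for illustration, not how grep works internally: it matches fixed strings (like fgrep / grep -F) rather than regular expressions, and the function names are made up for this example.

[code lang="python"]
import os


def grep_recursive(keyword, root, ignore_case=False):
    """Rough model of `grep -r -H -F [-i] keyword root`.

    Yields "path:line" for every line containing keyword.
    """
    needle = keyword.lower() if ignore_case else keyword
    for dirpath, _dirnames, filenames in os.walk(root):  # -r: walk subdirectories
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    haystack = line.lower() if ignore_case else line  # -i
                    if needle in haystack:
                        # -H: prefix each matching line with its file name
                        yield "%s:%s" % (path, line.rstrip("\n"))


def count_matching_lines(keyword, path):
    """Rough model of `grep -c -F keyword path`: counts matching LINES, not occurrences."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if keyword in line)
[/code]

Like grep -c, count_matching_lines counts a line once even if the keyword appears on it several times, which is why the /etc/passwd example above counts accounts rather than occurrences of "false".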

grep is a fast and very useful tool. To learn more, type "man grep" in a terminal. Good luck.

Shareable Resource - Pay attention

29 Sep 2010

I had a chance to work on a system consisting of two separate servers, both built with Django. The servers communicate with each other over a socket. However, a socket is not a convenient way to send large amounts of data or to keep separate copies of resources in sync. Usually the two servers share a common resource instead, typically the database: maintaining a local database on each server and replicating the data between them whenever something changes is not efficient.

Everything seemed to work: servers A and B communicated over the socket, and both accessed the same database (in my case MySQL)... but then a bug emerged...

Server A creates some data, saves it to the database through the Django ORM, and sends a request telling server B to process the data it just saved. Server B receives the request, along with the information it needs to find that data. The problem is that server B cannot find the data server A just saved. What is going on???

My first suspicion was that MySQL might need a little time to store the data after server A saves it, so server B receives the request before the data is actually in the database. But that was not it: the error still happened 30 seconds later, and saving data to a database does not take more than 30 seconds.

Then I suspected the Django ORM: perhaps it keeps its own cache, returning data without actually hitting the database and refreshing only after a certain time. But it turned out that the Django ORM has no such caching mechanism. Yet when I queried MySQL directly, the data was there. So what was going on?

The problem was caused by something I had never thought of: the MySQL side. The scenario is as follows. To save time, a server usually keeps a connection (session) to the database server open and reuses it for all the queries it makes during its lifetime, since opening and closing a connection for every query is inefficient. Server A keeps such a connection to the MySQL server, and reading and writing through it works fine. In the same way, server B keeps its own long-lived connection to the MySQL server. Even after server A has saved the data, MySQL still does not show the new data through the connection held by B. This is due to the way MySQL keeps a cached view of the data per connection; more precisely, with InnoDB's default REPEATABLE READ isolation level, a transaction that stays open keeps reading from the snapshot taken at its first read, so rows committed later by other connections stay invisible to it until that transaction ends. That's why server B cannot find the data saved by A. Now that we know the reason, how can we solve the problem? There are two options:

1. Disable this behavior on the MySQL side (for example, by changing the server-wide isolation level). This is usually not a good option: it affects every application using the database, and we often don't have root access to the MySQL server anyway.

2. The easier and more feasible option: reset the database connection before processing each new request sent by server A. It is slightly less efficient, but it works correctly. The problem is how to reset the database connection in the Django ORM, which in fact provides no method to do that. Finally I found a trick:

[code]
from django.db import connection

# Close this process's database connection; the next ORM query opens a fresh one.
connection.close()
[/code]

This doesn't actually reset the database connection; it closes it. However, the next time you make a query through the Django ORM, Django sees that the connection has been closed and creates a new one, so this turns out to be a way to reset the database connection.
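A stale read like this can be reproduced without two servers or MySQL. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption: SQLite is not MySQL, but in WAL mode it also gives each read transaction a consistent snapshot, which is the kind of behavior that can hide another connection's new rows. Here server_b plays the role of server B's long-lived connection with an open transaction; the function name and the jobs table are made up for illustration.

[code lang="python"]
import os
import sqlite3
import tempfile


def demo_stale_read():
    """Return (rows seen first, rows seen in the same transaction, rows seen after it ends)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.db")

        # isolation_level=None: sqlite3 issues no implicit BEGIN/COMMIT,
        # so transactions start and end only where we say so.
        server_a = sqlite3.connect(path, isolation_level=None)
        server_a.execute("PRAGMA journal_mode=WAL")  # readers get snapshots; writers don't block them
        server_a.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT)")

        server_b = sqlite3.connect(path, isolation_level=None)
        server_b.execute("BEGIN")  # B's long-lived transaction
        before = server_b.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]  # snapshot taken here

        # Server A saves new data; in autocommit mode the INSERT is committed immediately.
        server_a.execute("INSERT INTO jobs (payload) VALUES ('process me')")

        # B is still inside its old transaction, so it keeps reading the old snapshot.
        stale = server_b.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]

        # Ending the transaction refreshes B's view of the database.
        server_b.execute("COMMIT")
        fresh = server_b.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]

        server_a.close()
        server_b.close()
    return before, stale, fresh


print(demo_stale_read())  # (0, 0, 1)
[/code]

In the Django case, closing the connection has a similar effect to the COMMIT here: the next query runs on a fresh connection, which starts from a new view of the data.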

Conclusion

Although this happened with servers built on the Django framework in Python, the same problem can happen on other platforms as well. You will often see systems where different servers access or share the same resource. Pay attention and make sure that changes to the resource are reflected effectively, so that every server always obtains the latest state, without caching or delays in storing data. Otherwise, server B may miss or misuse the data saved by the other servers.

Architecture Model of Chrome Extension

22 Sep 2010

When I first started developing Chrome extensions, it seemed quite easy, since it uses concepts familiar from web development: an extension is written in JavaScript, with its UI in HTML and CSS. The only new thing is the manifest.json file, which defines the basic properties and configuration of the extension. However, as I went deeper, I ran into many problems caused by confusion about the architecture. It looks like normal web development, but it is actually quite different. I hope this post clears things up a little, helps you avoid the same confusion, and gives you a better grasp of the concepts behind Chrome extension development.

This post assumes that you have basic knowledge of Chrome extension development; if not, check it out here: http://code.google.com/chrome/extensions/getstarted.html

The manifest.json file may define two properties: one for the background page (background_page) and one for the content scripts (content_scripts). These are among the most commonly used properties in an extension's manifest, and they are also the most important entities for understanding the architecture model of a Chrome extension. A quick reminder about the Chrome browser: we can open as many tabs as we want, but the extension always sits on top of all of them. The extension is like a small program running above the tabs: no matter how many tabs we open, only one instance of the extension is running.

So what is the background page? I wouldn't be surprised if you thought it was the page that displays the extension's background :) ... In fact, although it is called a page, and a page normally contains UI elements meant for display, the background page usually contains no UI elements at all. It is written in HTML, JavaScript and CSS, and its file structure is exactly like any other .html file, but it usually contains only code in a JavaScript block. What makes it special is that it can access a range of APIs provided by Chrome, such as tabs, browser, bookmarks, events, history, windows, etc., and all of that is, of course, done in JavaScript.

The background page is like the main entry script of the extension: it runs when the extension starts, and no matter how many tabs we open, only a single instance of the background page is created, living for the extension's whole lifetime. In contrast, a content script is executed whenever a page is loaded or a new tab is opened. A content script is just a normal JavaScript file.

Even though a content script runs every time a page is loaded or reloaded, it cannot access anything in the page script (the script that comes from the page itself), such as objects, data or variables; it can access only the DOM elements. Think of it this way: the content script runs alongside the page script, but in a special environment isolated from the page. That way, the content script cannot accidentally modify important data or break the page script. However, the content script has full access to the page's DOM elements. This lets our extension control the view and look of the page without unintentionally breaking the page script. If the extension needs to communicate with the page script, it can do so by writing data to a DOM element that both of them know about. This confused me a lot at first, when I found I could access the DOM elements but not any data in the page script.

The architecture model of Chrome extension

Now back to the background page and content scripts. The background page can access most of the browser's features, such as history and bookmarks, but it cannot access the user's page or the content script directly. However, the background page can ask the content script to execute methods defined in it, and if the content script wants data from the background page, it has to send a request to the background page with chrome.extension.sendRequest (superseded by chrome.runtime.sendMessage in later versions of Chrome). The background page, on the other hand, can make cross-site requests (to the hosts granted in the manifest's permissions), but the content script cannot, because of cross-site scripting security. The only options for a content script are to call a service with JSONP, or to ask the background page to make the request to the service on its behalf.

To summarize: if your extension communicates with a web service or API, that has to be done in the background page. If the extension wants to modify the view of the user's page, it must send a request to, or execute a method defined in, the content script, which then accesses the DOM elements and modifies the view accordingly. If the extension needs to receive data from the user's page, that has to be done in the content script, which then sends the data to the background page through a Chrome extension request. Finally, note that the background page has only one instance across all tabs and pages, whereas a new content script instance is created each time a page is opened or reloaded.
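The summary above can be modeled as a toy Python program. This is not Chrome's API: all class and method names are made up, and no real browser is involved. It only mirrors the topology described in this post: one BackgroundPage shared by every tab, a fresh ContentScript per page load that gets the page's DOM but not the page script's variables, and communication with the background page only through requests.

[code lang="python"]
class BackgroundPage:
    """One instance for the whole extension; the only place that talks to web services."""

    def __init__(self):
        self.requests_seen = []

    def on_request(self, tab_id, request):
        # Content scripts reach the background page only through requests like this one.
        self.requests_seen.append((tab_id, request["action"]))
        if request["action"] == "translate":
            # Stand-in for a cross-site call to a web service.
            return {"text": request["text"].upper()}
        return {"error": "unknown action"}


class ContentScript:
    """Created anew for every page load; gets the page's DOM, never the page script's variables."""

    def __init__(self, tab_id, dom, background):
        self.tab_id = tab_id
        self.dom = dom                 # shared with the page script
        self._background = background  # reachable only by sending requests

    def translate_title(self):
        response = self._background.on_request(
            self.tab_id, {"action": "translate", "text": self.dom["title"]})
        self.dom["title"] = response["text"]  # change the view through the DOM


class Browser:
    def __init__(self):
        self.background = BackgroundPage()  # created once, when the extension starts
        self.page_vars = {}  # tab_id -> page script variables, hidden from content scripts

    def load_page(self, tab_id, dom, page_vars):
        self.page_vars[tab_id] = page_vars
        return ContentScript(tab_id, dom, self.background)  # a new content script per load
[/code]

In a real extension, the direct on_request call would be a message sent with chrome.extension.sendRequest (or chrome.runtime.sendMessage in later Chrome versions), and the isolation would be enforced by the browser rather than by which objects a class happens to receive.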

Conclusion
The Chrome extension API is a good API provided by the Chrome browser; it helps us write more useful applications that run directly inside the browser, and it does not require much knowledge beyond JavaScript. However, things can become very difficult to debug or develop if you don't understand the architecture model behind it. I hope this post helped clear that up.

Unicode and Encoding in Python

22 Sep 2010

I used to run into many Unicode and encoding errors in Python (Python 2, in this post) because I underestimated the topic. Unicode and encodings are very basic concepts, but handling them carelessly can lead to unexpected errors. For example, converting a byte string object to a Unicode object can fail:

[code]
s = "Thank you pälä"
u = unicode(s)
[/code]

The first line assigns the byte string containing the characters 'T', 'h', etc. to the variable s. The second line converts it to a Unicode object. However, the second line raises an exception (UnicodeDecodeError), because the default encoding in Python 2 is ASCII. Python tries to decode the byte string with the ASCII codec, but the character ä (code point U+00E4) is stored in a UTF-8 source file as the two bytes C3 A4, which are outside the ASCII range (the ASCII codec can only handle the first 128 characters, bytes 0 to 127). If you run this from a file rather than the interactive shell, Python 2 also needs a "# -*- coding: utf-8 -*-" declaration at the top of the file to accept the non-ASCII literal at all.

You might notice that the second line raises no error for some other strings. That happens when the byte string contains only bytes in the ASCII range (0 to 127): the ASCII character set is a subset of Unicode, and the first 128 characters are the same in both.

To convert a byte string to a Unicode string correctly, we must know the string's encoding. In some cases we cannot know the encoding in advance; the solution then is to guess it by trying various well-known encodings such as ASCII, UTF-8, UTF-16, etc. Assuming we know the string is encoded in UTF-8, the second line becomes:

[code]
u = unicode(s, 'utf-8')
[/code]
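The guessing approach mentioned above can be sketched as a loop that tries candidate encodings in order. This sketch is written for Python 3, where bytes.decode plays the role of unicode(s, encoding); the function name and the default candidate list are illustrative assumptions.

[code lang="python"]
def guess_decode(data, encodings=("ascii", "utf-8", "utf-16")):
    """Decode bytes with the first candidate encoding that works; return (text, encoding)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("none of %r could decode the data" % (encodings,))


print(guess_decode(b"Thank you"))                      # ('Thank you', 'ascii')
print(guess_decode("Thank you pälä".encode("utf-8")))  # ('Thank you pälä', 'utf-8')
[/code]

The order matters: UTF-16 accepts almost any even-length input and Latin-1 accepts any bytes at all, so stricter encodings go first, and the result is still only a guess.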

That covers converting a byte string to Unicode; what about the opposite direction? Look at the following example:

[code]
u = "Hello pälä"
f = open(“file.txt”, “w”)
f.write(u)
[/code]

This raises an exception on the third line:

[code]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 7: ordinal not in range(128)
[/code]

The reason is similar: files, sockets and other media require a byte string, an object made up of bytes, not a Unicode string. A Unicode string is an object handled internally by Python, which other media cannot understand. So before writing to a file or sending data over a socket, Python converts the Unicode string to a byte string and, guess what, it uses the default ASCII encoding to do so. Since the character ä is not among the first 128 characters, Python raises an exception.

To overcome this problem, we must know which encoding the medium we are writing to requires. Does the file need UTF-8 encoded bytes? Many other encodings can also represent ä, but UTF-8 is the most popular one. So the correct third line is:

[code]
f.write(u.encode('utf-8'))
[/code]
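For comparison, here is a sketch of the same fix in Python 3, where str is always Unicode and a file opened in text mode takes an explicit encoding argument; the temporary file is just for the example.

[code lang="python"]
import os
import tempfile

text = "Hello pälä"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    # Explicit: encode the Unicode string to UTF-8 bytes and write them in binary mode.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

    # Equivalent: let a text-mode file do the encoding (this overwrites the same content).
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path, "rb") as f:
        raw = f.read()

print(len(text), len(raw))  # 10 12: each ä takes two bytes in UTF-8

# Encoding with ASCII fails just like the Python 2 example.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)
[/code]

Encoding once, at the boundary where text leaves the program, keeps the rest of the code working with Unicode strings only.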

Conclusion
Unicode is an industry-standard character set used by applications worldwide. An encoding, on the other hand, is how that text is represented as bytes, for example in a file on disk. When data moves between different environments or media, such as the client browser, the web server, sockets and the database, it is encoded to bytes and decoded back to Unicode strings many times, and each medium may require a different representation. So pay attention to the encoding of your data and handle it properly. Mishandled encodings cost much more: lost data, new bugs, crashes, etc. The choice is yours :D

How to stop the browser from caching CSS, JavaScript and image files

14 Aug 2010

Suppose you have to show a client a new design and have updated the CSS and JavaScript files... Naturally, you don't want the client's browser to keep using the old copies it already has in its cache. Here is a trick that helps:

[sourcecode lang="php" htmlscript="true"]
<link href="/css/stylesheet.css?<?php echo time(); ?>" rel="stylesheet" type="text/css" />
[/sourcecode]

The HTML the browser receives then looks like this:

[sourcecode lang="php" htmlscript="true"]
<link href="/css/stylesheet.css?1281732818" rel="stylesheet" type="text/css" />
[/sourcecode]

In the example above, time() appends a timestamp to the file's URL as a query string. Because the URL changes on every page load, the browser treats it as a new resource and downloads the file again each time the user refreshes the page. The same trick works for any kind of file: CSS, JS, images...

However, be careful with it: caching is a good thing, since it reduces server load and bandwidth, so use this trick only while you are developing or testing a site. In production, CSS and JS files should be cached, and even compressed, for those same two reasons.
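A common refinement, not from the original post, is to append the file's last-modified time instead of the current time: the URL then changes only when the file changes, so browsers can keep caching it in between. It is sketched in Python here; the helper name and the v parameter are assumptions, and the same idea works in PHP with filemtime().

[code lang="python"]
import os


def versioned_url(url, file_path):
    """Append v=<last-modified time> so the URL changes only when the file changes."""
    version = int(os.path.getmtime(file_path))
    separator = "&" if "?" in url else "?"  # keep any existing query string
    return "%s%sv=%d" % (url, separator, version)
[/code]

For example, versioned_url("/css/stylesheet.css", path_to_the_css_file) returns something like "/css/stylesheet.css?v=1281732818", and the number stays the same until the file is modified again.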

Good luck

Forcing a file download in Symfony 1.4

09 Aug 2010

In Symfony 1.4, every action normally requires a template to render its output, but when you want to send a file for the user to download, you can use the code below:

[code language="php"]
public function executeDownload(sfWebRequest $request)
{
    // Only authenticated users may download
    $this->forward404Unless($this->getUser()->isAuthenticated());

    // Model object bound to the route; it provides the file's path and size
    $file = $this->getRoute()->getObject();

    $this->forward404Unless(file_exists($file->getPath()), 'File not found');

    // Replace the default headers with download headers
    $this->getResponse()->clearHttpHeaders();
    $this->getResponse()->setHttpHeader('Content-Type', 'application/octet-stream');
    $this->getResponse()->setHttpHeader('Content-Disposition', 'attachment; filename="' . basename($file->getPath()).'"');
    $this->getResponse()->setHttpHeader('Content-Transfer-Encoding', 'binary');
    $this->getResponse()->setHttpHeader('Content-Length', $file->getSize());
    $this->getResponse()->setHttpHeader('Connection', 'close');

    $this->getResponse()->sendHttpHeaders();

    // Send the file's contents straight to the output
    @readfile($file->getPath());

    // No template to render
    return sfView::NONE;
}
[/code]

In the code above, $file is an object retrieved from the model (through the route); adapt it to your needs to obtain values such as the file's size and path.

Once we have the file, we tell the browser to prepare for a download by setting HTTP headers with setHttpHeader(), which is the equivalent of PHP's header() function.

Then we send the file's contents to the output with readfile(), and don't forget to end the action with

[code language="php"]
return sfView::NONE;
[/code]

so that Symfony doesn't try to render a template.
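The same idea outside Symfony: a minimal sketch of a Python WSGI application that sends the same kind of download headers. It only illustrates the HTTP side; the factory name and file path are hypothetical, and it reads the whole file into memory, so a real application would stream large files instead.

[code lang="python"]
import os


def make_download_app(path):
    """Return a WSGI app that offers the file at `path` as a download."""

    def app(environ, start_response):
        if not os.path.isfile(path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"File not found"]

        headers = [
            ("Content-Type", "application/octet-stream"),  # generic binary data
            # "attachment" makes the browser save the file instead of displaying it
            ("Content-Disposition", 'attachment; filename="%s"' % os.path.basename(path)),
            ("Content-Length", str(os.path.getsize(path))),
        ]
        start_response("200 OK", headers)
        with open(path, "rb") as f:
            return [f.read()]

    return app
[/code]

To try it locally, wsgiref.simple_server.make_server("", 8000, make_download_app("report.pdf")).serve_forever() from the standard library serves it on port 8000.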

Good luck

Installing a LAMP server with a single command on Debian/Ubuntu

26 Jul 2010

On Debian/Ubuntu you can install Apache, MySQL and PHP with a single apt-get command, but the problem is that nobody can remember all the package names. Even if you do, this is the command you might have to type:

[code]
apt-get install mysql-client mysql-common mysql-server php5-mysql php-apc php-db php-pear php5 php5-cli php5-common php5-curl php5-gd php5-imagick php5-mcrypt php5-memcache php5-sqlite php5-suhosin apache2 apache2-doc apache2-mpm-prefork apache2-suexec apache2-utils apache2.2-common libapache2-mod-php5
[/code]

But there is actually a much shorter command that makes installing a LAMP server very easy:

[code]
apt-get install phpmyadmin
[/code]

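The phpmyadmin package depends on basic packages such as Apache, PHP and MySQL support, so installing it pulls in most of a LAMP stack. Depending on your release, the MySQL server itself may only be recommended or suggested rather than required, so check the list of packages apt-get proposes and add mysql-server if it is missing.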

Good luck with the installation :D

Creating UTF-8 tables in symfony 1.4 / Doctrine 1.2

09 Jun 2010

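Add this method to your ProjectConfiguration class (config/ProjectConfiguration.class.php) so that Doctrine creates tables with the utf8 character set and the utf8_unicode_ci collation: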

[sourcecode language="php"]
public function configureDoctrine(Doctrine_Manager $manager)
{
    $manager->setCollate('utf8_unicode_ci');
    $manager->setCharset('utf8');
}
[/sourcecode]

Test Your Might - Symfony vs Rails Framework Combat

09 May 2010

Today I stumbled upon this slide deck from BarCampNashville. It is an interesting face-off that gives beginners an overall comparison between Symfony and Rails, and between PHP and Ruby. Enjoy!

[slideshare id=2277608&doc=testyourmight-091019093300-phpapp02]

Introduction to Doctrine 2

24 Apr 2010

Today I found an interesting talk about Doctrine 2 by Jonathan Wage. Thanks to SymfonyLab for the original post. Enjoy.

[vimeo http://vimeo.com/11146571 w=500&h=400]


Zocoi Vietnamese /ʒoʊ kɔɪ/: (informal) Let's watch/read together