
Blog


2008/1/21

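Building a Chinese search application with Solr

For work I needed to build a small search application. The data already lives in MySQL, and the previous version ran on MySQL LIKE queries. This upgrade is meant to improve scalability and performance and to add some features.

There were several options:

1, Extend the existing MySQL LIKE approach using MySQL's full-text search. I have no experience with it, and given performance and other considerations I ruled it out first.

2, Plug into the company's own search framework. The problem is that its search page and retrieval logic are currently too tightly coupled; customizing ranking and presentation inside that framework would be painful, and the code looks a bit scary.

3, In the end I chose Lucene + Solr. Lucene provides a powerful indexing and retrieval API, and Solr wraps it simply, which makes it easy to use from any language: there is no need to write your own search server on top of the Lucene API. Documents are submitted for indexing and queries are issued over HTTP, and results can come back as XML or JSON, which is very convenient.

Setting up the service:

1, Download Lucene, Solr and Tomcat. Unpack them, copy the Solr web application archive (.war) from Solr's dist directory into tomcat/webapps and name it solr.war. Copy example/solr from the Solr directory into the current directory, or configure Tomcat to tell it where the Solr home directory is. Start Tomcat and open http://localhost:port/solr/admin; you should see that Solr is running.

2, Put the Chinese word-segmentation package into solr/lib (I chose je-analysis.jar from jesoft), then add the following to solr/conf/schema.xml: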

<fieldType name="text_chinese" class="solr.TextField">
          <analyzer class="jeasy.analysis.MMAnalyzer" />
</fieldType>

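3, Edit the fields section of solr/conf/schema.xml to define the fields you want to search.

4, The code in an IBM developerWorks article on using Solr from PHP wraps the HTTP requests and the construction of the document XML so that they can be called from PHP; you can use it as a reference when writing interfaces for other languages. (Apache Lucene quick-start guide)

5, That gives you the most basic search framework; what you build on top of it is up to your imagination :)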
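To make the HTTP interface from the steps above a bit more concrete, here is a minimal sketch of the two messages a client has to build: the XML for adding a document and the URL for a query. The helper names (escapeXml, buildAddXml, buildSelectUrl), the base URL and the field names are my own assumptions for illustration; they are not taken from the developerWorks code, and the fields must match whatever you defined in schema.xml.

```javascript
// Hypothetical helpers: the names, the base URL and the field names are assumptions.

// Escape characters that are special in XML text and attribute values.
function escapeXml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Build the <add><doc>...</doc></add> message that is POSTed to Solr's
// update handler (for example http://localhost:port/solr/update).
function buildAddXml(doc) {
    var fields = [];
    for (var name in doc) {
        if (Object.prototype.hasOwnProperty.call(doc, name)) {
            fields.push('<field name="' + escapeXml(name) + '">' +
                escapeXml(doc[name]) + '</field>');
        }
    }
    return '<add><doc>' + fields.join('') + '</doc></add>';
}

// Build a query URL for the select handler; wt=json asks for a JSON response.
function buildSelectUrl(baseUrl, query, rows) {
    return baseUrl + '/select?q=' + encodeURIComponent(query) +
        '&rows=' + rows + '&wt=json';
}
```

After posting one or more add messages, a separate `<commit/>` message has to be sent to the same update URL before the new documents show up in query results.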

References:

1, Search smarter with Apache Solr, Part 1- Essential features and the Solr schema

2, Search smarter with Apache Solr, Part 2- Solr for the enterprise

3, Apache Lucene quick-start guide

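4, "Build your full-text search with Solr" (使用solr搭建你的全文检索), 我的知识库

5, http://jesoft.cn, Chinese word segmentation with the je-analysis MMAnalyzer.

6, Lucene Chinese word segmentation: Paoding Analysis (庖丁解牛), another excellent analyzer, and open source too.

7, The Solr home page, http://lucene.apache.org/solr/, has a good wiki covering Tomcat configuration and deployment as well as advanced features such as faceted search and caching.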

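Repost: Repost: Repost: A ticket scalper reveals where the train tickets went (zz)

Every rational person, or every programmer grunt who likes to think, is curious about how a thing or a process actually works; the more ambitious ones think about how to change it to make the world a better place.

Fang Xu's note: the one thing I took away from Zhou Qiren's class last week is that people make the most rational choice available under a given system. So scalpers hoarding tickets, people shoving their way on and off trains, and other so-called "uncivilized" behavior are precisely the most "civilized" behavior: judgments made by thinking brains. Unless the root cause is removed, the mainland will remain a tumor in bloom.

I don't know whether this article is true, but I'm inclined to believe it. So if you unfortunately have to travel home during the Spring Festival rush and haven't got a ticket yet, and you'd rather not spend time and effort to save money, the best approach is to go straight to a scalper. If you really do want to save money, try the method at the end of the article.

Repost: A ticket scalper reveals where the train tickets went

Wang Pei @ 2008-1-20 1:37:08 Category: Uncategorized

Reposted from the Tianya forum: a scalper's own account of where the train tickets went and how to buy one

Editor's note: as always, Baibanbao (白板报) reposts this only to pass on information and does not vouch for its accuracy. Still, are the train tickets really all gone? Unless the railway bureau chief burned them as paper money for the dead, they must have gone somewhere.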

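  First of all, the tickets are certainly not at the station ticket windows, and of course not at the ticket agencies either.

  The tickets are all in the hands of people like us. As for how we get them, are you interested in hearing?

  Also, whether Spring Festival fares go up or not makes no practical difference to the many people who need a ticket home: if you can't buy a ticket, the price is irrelevant. Who benefits most when fares don't rise? The ticket resellers, because their costs are lower while their profit stays the same, and their risk goes down accordingly.

  To begin with, by national regulation train tickets can be bought in only two places: the ticket windows at the station, and authorized agencies that hold a license to sell train tickets. Station windows may not add any markup and can charge only the face value. Agencies may charge a handling fee of only 5 yuan per ticket; charging more is illegal, and the penalty ranges from losing the license to administrative punishment.

  Now for how the agencies operate. An agency's costs are actually low: through normal channels a license can be obtained for 20,000 yuan, and monthly expenses are under 2,000 yuan, so the 5-yuan fee is quite enough to keep the business going. But because the railway is a monopoly, applying for an agency is a very complicated process, and the shady dealings along the way are staggering. As we say in the trade, if you get a license for 500,000 yuan, it means you found the right connections. Often you cannot get one even with money in hand.

  Since every agency paid a high price for its license, recovering that cost and making a big profit is the only thing on their minds. In ordinary times, when tickets are not in short supply, that is hard to do. Spring Festival is different: it is a classic seller's market and they hold the scarcest commodity of the moment, so most agencies set their sights on this golden period. But since they may not add a markup, they have to find another way, such as finding a familiar, reliable person (a first-level scalper; there are of course bigger scalpers than that, the ones who get tickets directly from the station, but unfortunately I have not met any). Once they have found someone, they print tickets for him in bulk, adding 10 to 20 yuan per ticket. Note that even an agency cannot print tickets at will: it can print at most nine per batch (no more, because printing 10 tickets in one go has to be reported to the railway bureau). So for a given train, however fast their hands, they manage at most two batches, 18 tickets. But if every agency does this, the tickets for a train are gone within one minute. That is why most agencies have no tickets to sell at Spring Festival.

  Now for the question everyone cares about most: when tickets are issued. There is a common misunderstanding: tickets go on sale 10 days ahead, so people queue at the station at 8 in the morning. There is a time gap here. The station starts work and begins selling at 8 a.m., but the tickets are actually released at midnight. As I said above, the agencies print all the tickets within a minute, so even the first person in the station queue gets nothing. You can see why. The station is not lying: it really has no tickets left to sell.

  When a big scalper has stockpiled a large number of tickets, we mid-level scalpers swarm to him like flies to dog dirt. People chase profit; I'm sure you understand. We pay the big scalper an extra 20 to 30 yuan per ticket, then add another 30 to 50 and pass the tickets to the small scalpers who hawk them at stations, on the roadside and in hostels. Only then do they reach the people who actually need them, by which point the price may be 100 yuan or more above face value. In short, the more hands a ticket passes through, the more you pay.

  That is the basic ticket-reselling process as I have experienced it. I am still studying how things are done on the station side.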

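  As for how to actually get a ticket: to put it professionally, if you want the ticket you need at the time you need it, the simplest and most direct way is to pay a scalper's high price. (Can't be helped; this is how I earn my living and I can't break the rules of the trade. I hope you understand!)

  But if you can afford the time, I can tell you a clumsy but very workable method. (Queuing at the station is the least workable!)

  During the day, find a legitimate agency that can print tickets on the spot. Of course there will be no tickets when you ask at that hour. Note down its location, phone number, business license and other details. Be careful: agencies are extremely wary at Spring Festival, so be discreet while taking notes :)

  Then go about your business and wait. Around 11:30 at night, go back. You will find the door shut tight but the lights on. Go up and knock, and there is a knack to this knocking: not too urgently, or you will frighten them. The lights will then go out and everything will fall silent. Don't leave; keep knocking gently, and if that fails, phone and tell the people inside that you mean no harm and only want a ticket for such-and-such train on such-and-such day. Usually they will respond, and you will get the ticket you want. With you standing outside they feel the pressure; they are scared! As long as you stay, they dare not print tickets openly and so make no money, whereas your one ticket is just a small favor to them. So it usually works. Mind you, I said usually, not always. (Last year we ran into just such a character: he stood at our door for two hours so that we didn't dare go out, and in the end we couldn't outlast him and sold him three tickets at face value. It hurt!)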

2008/1/20

The seven rules of unobtrusive JavaScript(zz)

from http://dev.opera.com/articles/view/the-seven-rules-of-unobtrusive-javascrip/

By Christian Heilmann · 19 Dec, 2007

Published in: unobtrusive, javascript, DOM

Introduction

I've found the following rules over the years developing, teaching and implementing JavaScript in an unobtrusive manner. They have specifically been the outline of a workshop on unobtrusive JavaScript for the Paris Web conference 2007 in Paris, France.

I hope that they help you understand a bit why it is a good idea to plan and execute your JavaScript in this way. It has helped me deliver products faster, with much higher quality and a lot easier maintenance.

1. Do not make any assumptions (JavaScript, the unreliable helper)

Probably the most important feature of unobtrusive JavaScript is that you stop making assumptions:

  • You don't expect JavaScript to be available but make it a nice-to-have rather than a dependency
  • You don't expect browsers to support certain methods and have the correct properties but you test for them before you access them
  • You don't expect the correct HTML to be at your disposal, but check for it and do nothing when it is not available
  • You keep your functionality independent of input device
  • You expect other scripts to try to interfere with your functionality and keep the scope of your scripts as secure as possible.

The first thing to consider before you even start planning your script is to look at the HTML you are enhancing with scripting and see what you can use for your own purposes.
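The checks listed above can be written down quite literally. The following is only a sketch under my own assumptions: the function name highlightNav, the ID 'nav' and the class 'enhanced' are made up for illustration, and the document is passed in as a parameter so the function can be exercised outside a browser (in a page you would call highlightNav(document)).

```javascript
// Hypothetical example: 'nav' and 'enhanced' are assumed names.
function highlightNav(doc) {
    // Test for the DOM method before using it
    if (!doc || typeof doc.getElementById !== 'function') {
        return false;
    }
    var nav = doc.getElementById('nav');
    // Do nothing when the expected HTML is not available
    if (!nav || typeof nav.getElementsByTagName !== 'function') {
        return false;
    }
    var links = nav.getElementsByTagName('a');
    if (links.length === 0) {
        return false;
    }
    // Only now is it safe to change the document
    nav.className = nav.className ? nav.className + ' enhanced' : 'enhanced';
    return true;
}
```

Every early return leaves the page exactly as it was, so a missing element or an unsupported method degrades to the plain HTML instead of throwing an error.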

2. Find your hooks and relationships (HTML, the base to build on)

Before you start your script look at the base that you build upon. If the HTML is unstructured or unknown there is hardly any way to create a clever scripting solution - you will most likely create either far too much markup with JavaScript or your application will depend on JavaScript.

There are several things to consider in your HTML - hooks and relationships

HTML Hooks

HTML hooks are first and foremost IDs, as these can be accessed with the fastest DOM method - getElementById. These are safe as IDs are unique in a valid HTML document (IE has a bug with name and ID, but good libraries work around that) and easy to test for.

Other hooks are HTML elements which can be read out with getElementsByTagName and CSS classes, which can not be read out with a native DOM method in most browsers (Mozilla will soon have one and Opera 9.5 already does though). However, there are a lot of helper methods that allow for a getElementsByClassName.
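One possible shape for such a helper is sketched below. It is not the code of any particular library; the name getByClass is an assumption. It uses the native method where the browser provides one and otherwise falls back to checking every element below the given root.

```javascript
// Hypothetical helper; the name getByClass is an assumption.
function getByClass(root, className) {
    // Prefer the native method where it exists
    if (typeof root.getElementsByClassName === 'function') {
        return root.getElementsByClassName(className);
    }
    // Fallback: inspect every element below root
    var all = root.getElementsByTagName('*');
    var matches = [];
    for (var i = 0; i < all.length; i++) {
        // Normalise whitespace and pad with spaces so that
        // 'nav' does not match 'navigation'
        var padded = ' ' + String(all[i].className || '').replace(/\s+/g, ' ') + ' ';
        if (padded.indexOf(' ' + className + ' ') !== -1) {
            matches.push(all[i]);
        }
    }
    return matches;
}
```

Note that the fallback visits every element, which is exactly the kind of expensive traversal to keep to a minimum; starting from an element found by ID rather than from the whole document keeps the cost down.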

HTML relationships

The other interesting thing about HTML is the relationships of your markup. Questions to ask yourself are:

  • How can I reach this element the easiest and with the least steps traversing the DOM?
  • What element do I need to alter to reach as many child elements that I need to change?
  • What attributes or information does a certain element have that I can use to link to another?

Traversing the DOM is expensive and can be slow, which is why it is a good idea to leave it to a technology that is already in use in browsers.

3. Leave traversing to the experts (CSS, the faster DOM traveller)

It is pretty interesting that DOM scripting and traversing the DOM with its methods and properties (getElementsByTagName, nextSibling, previousSibling, parentNode and so on) appears as a confusing matter to a lot of people. It is interesting as we already do it with a different technology: CSS.

CSS is a technology that takes a CSS selector and traverses the DOM to access the desired elements and change their visual attributes. A rather complex JavaScript using DOM can be replaced with a single CSS selector:


var n = document.getElementById('nav');
if(n){
    var as = n.getElementsByTagName('a');
    if(as.length > 0){
        for(var i=0;as[i];i++){
            as[i].style.color = '#369';
            as[i].style.textDecoration = 'none';
        }
    }
}

/* is the same as */

#nav a{
    color:#369;
    text-decoration:none
}

This is a very powerful companion to have and you can piggyback on it. You do that by dynamically assigning classes to elements higher up in the DOM hierarchy or altering IDs. If you simply add a class to the body of the document using DOM you can easily offer a chance for a designer to define both the static and dynamic version of the document:


JavaScript:

var dynamicClass = 'js';
var b = document.body;
// append the class, keeping any classes already on the body
b.className = b.className ? b.className + ' ' + dynamicClass : dynamicClass;

CSS:
/* static version */

#nav {
  ....
}

/* dynamic version */

body.js #nav {
  ....
}

4. Understand browsers and users (build on existing working usage patterns and create what you need)

A really important part of unobtrusive JavaScript is to understand how browsers work (and especially how browsers fail) and what users expect to happen. It is easy to go overboard with JavaScript and create a completely different interface with it. Drag and Drop interfaces, collapsible sections, scrollbars and sliders can all be created with JavaScript, but there is much more to those than just the technical implementation. You have to ask yourself:

  • Will my new interface work independent of input device, and if not, what should be the fallback?
  • Is the new interface that I am building following rules of the browser or the richer interfaces it came from (can you navigate a multi level menu with your cursors or do you need to tab through it?)
  • What is functionality that I need to offer but that is dependent on JavaScript?

The latter is really no issue, as you can use the DOM to create HTML on the fly in case you need it. An example of this are "print this" links - browsers don't offer a non-JavaScript way of printing a document, which is why you should create links like these with the DOM. The same applies to clickable headings that collapse and expand content. Headings can not be activated with a keyboard, but links can. In order to create clickable headings you should use JavaScript to inject links inside them and all is well - even keyboard users can then collapse and expand the content sections.

Great resources for solutions of this kind of problem are design pattern libraries. As for knowing what works in browsers independent of input device, this is a matter of experience. First of all you need to understand the concept of event handling.

5. Understand Events (Event handling to initiate change)

Event handling is the next step to truly unobtrusive JavaScript. The point is not to make everything draggable and clickable or add inline handling. The point is to understand that Event Handling is true separation. We separate HTML, CSS and JavaScript but with Event Handling we go much further.

Elements in the document are there to wait for handlers to listen to a change happening to them. If that happens, the handlers retrieve a magical object (normally as a parameter called e) that tells them what happened to what and what can be done with it.

The really cool thing about most event handling is though that it does not only happen to the element you want to reach but also to all the elements above it in the DOM hierarchy (this does not apply to all events though - focus and blur don't do that). This allows you to assign one single event handler to for example a navigation list and use event handling's methods to reach what element was really involved. This technique is called event delegation and it has several benefits:

  • You only need to test if a single element exists, not each of them
  • You can dynamically add or remove new child elements without having to remove or add new handlers
  • You can react to the same event on different elements

The other thing to remember is that you can stop events from being reported to parent elements and you can override the default action HTML elements like links have. However, sometimes this is not a good idea, as browsers apply them for a reason. An example would be links pointing to in-page targets. Allowing for them to be followed makes sure that users can bookmark the state of your script.
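As a sketch of the event delegation described in this section: a single handler sits on the container and works out which link was involved by walking up from the element the event happened on. The function name findLink and the ID 'nav' in the usage comment are assumptions for illustration.

```javascript
// Hypothetical helper; the name findLink is an assumption.
// Walk up from the element the event happened on until a link is found
// or the container the handler is attached to is reached.
function findLink(e, container) {
    var node = e.target || e.srcElement; // srcElement: older Internet Explorer
    while (node && node !== container) {
        if (node.nodeName && node.nodeName.toLowerCase() === 'a') {
            return node;
        }
        node = node.parentNode;
    }
    return null;
}

// Usage in a page (assuming an element with the ID 'nav' exists):
// var nav = document.getElementById('nav');
// if (nav) {
//     nav.onclick = function (e) {
//         e = e || window.event;
//         var link = findLink(e, nav);
//         if (link) {
//             // react to the link that was clicked
//         }
//     };
// }
```

Because only the container carries a handler, links added to or removed from the list later need no extra wiring.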

6. Play well with others (Namespacing, scope and patterns)

Your code will hardly ever be the only script used in the document. It is therefore of utmost importance that you make sure your code does not have global function or variable names that other scripts can override. There are several patterns available to avoid this issue. The most basic is that you instantiate every variable using the var keyword. Let's say we have the following script:


var nav = document.getElementById('nav');
function init(){
    // do stuff 
}
function show(){
    // do stuff 
}
function reset(){
    // do stuff 
}

This has a global variable called nav and functions called init, show and reset. The functions can access the variable and each other by name:


var nav = document.getElementById('nav');
function init(){
    show();
    if(nav.className === 'show'){
        reset();
    }
    // do stuff 
}
function show(){
    var c = nav.className;
    // do stuff 
}
function reset(){
    // do stuff 
}

You can avoid all this global code by wrapping it in an object using the object literal, thus turning the functions into methods and the variables into properties. You need to define each method and variable with a name followed by a colon, and you need to separate each of them from the others with a comma.


var myScript = {
    nav:document.getElementById('nav'),
    init:function(){
        // do stuff 
    },
    show:function(){
        // do stuff 
    },
    reset:function(){
        // do stuff 
    }
}

Each of these can be accessed from outside and inside the object by prepending the object name followed by a full stop.


var myScript = {
    nav:document.getElementById('nav'),
    init:function(){
        myScript.show();
        if(myScript.nav.className === 'show'){
            myScript.reset();
        }
        // do stuff 
    },
    show:function(){
        var c = myScript.nav.className;
        // do stuff 
    },
    reset:function(){
        // do stuff 
    }
}

The drawbacks of this pattern are that you have to repeat the name of the object every time you access it from another method and that everything you have in your object is publicly accessible. What if you want to make only parts of the script accessible to other scripts in the document? For this you can use the module pattern:


var myScript = function(){
    // these are all private methods and properties
    var nav = document.getElementById('nav');
    function init(){
        // do stuff 
    }
    function show(){
        // do stuff 
    }
    function reset(){
        // do stuff 
    }
    // public methods and properties wrapped in a return 
    // statement and using the object literal
    return {
        public:function(){
        
        },
        foo:'bar'
    }
}();

You can access the public properties and methods that are returned the same way you can in the object literal, in this case myScript.public() and myScript.foo. There is another annoyance though: if you want to access one public method from another or from a private method you need to go through the verbose long name again (the main object name can get rather long). To avoid this, you define them as private methods and only return an object with synonyms:


var myScript = function(){
    // these are all private methods and properties
    var nav = document.getElementById('nav');
    function init(){
        // do stuff 
    }
    function show(){
        // do stuff 
        // do stuff 
    }
    function reset(){
        // do stuff 
    }
    var foo = 'bar';
    function public(){
    
    }
    // return public pointers to the private methods and 
    // properties you want to reveal
    return {
        public: public,
        foo:foo
    }
}();

This allows for consistency in coding style and also allows you to write shorter synonyms when you reveal them.

If you don't want to reveal any of your methods or properties to the outside world, you can wrap the whole code block in an anonymous function and call it immediately after it is defined:


(function(){
    // these are all private methods and properties
    var nav = document.getElementById('nav');
    function init(){
        // do stuff
        show(); // no need for prepended object name
    }
    function show(){
        // do stuff 
    }
    function reset(){
        // do stuff 
    }
})();

This is a great pattern for functionality that just needs to be executed once and has no dependency on other functions.

Following all of this will make your code work well for the user and the machine it is running on as well as other developers. However, there is one more group you have to think about.

7. Work for the next developer (Making maintenance easier)

The last step to make your script truly unobtrusive is to give it another go-over when you finished and think about the next developer who has to take over from you once this went into production. Consider the following:

  • Are all the variable and function names logical and easy to understand?
  • Is the code logically structured? Can you "read" it from top to bottom?
  • Are the dependencies obvious?
  • Have you commented areas that might be confusing?

The most important bit is to understand that the HTML and CSS of a document is much more likely to change than the JavaScript (as these make up visual output). Therefore it is a great idea not to have any class and ID names or strings that will be shown to the end user buried somewhere in the code but separate it out into a configuration object instead.


var myscript = function(){
    var config = {
        navigationID:'nav',
        visibleClass:'show'
    };
    var nav = document.getElementById(config.navigationID);
    function init(){
        show();
        if(nav.className === config.visibleClass){
            reset();
        };
        // do stuff 
    };
    function show(){
        var c = nav.className;
        // do stuff 
    };
    function reset(){
        // do stuff 
    };
}();

That way maintainers know exactly where to change these without having to alter the rest of your code.

More information

These are the seven rules I found.

(The original posting of this article is available at http://icant.co.uk/articles/seven-rules-of-unobtrusive-javascript/ - this version reproduced under a Creative Commons license, and with agreement from the delectable Mr. Heilmann.)

2008/1/16

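An idea for video advertising media

An idea that came to me in the elevator before a meal the other day. I don't know whether existing media companies are already working on it, or how feasible it is technically and legally, but since this pig brain of mine only occasionally comes up with something, I'll write it down. I mentioned it to some colleagues and they thought it was decent; Sister Zheng in particular seemed quite positive about it.

A caveat: my understanding of video advertising media is almost entirely guesswork, so experts are welcome to correct anything I got wrong:

1, Goal: more accurate and fine-grained analysis of the audience an ad reaches, leading to more precise ad placement, better results and more money.

2, Analysis method: fit cameras to today's video ad displays (such as Focus Media's office-building TV screens) and analyze the footage for:
number of viewers and viewing time,
audience composition: distribution by age, gender and occupation (the last is very hard),
attention to a specific ad: how many people are watching it, and for how long.

3, Use of the data:
Offline analysis can produce reports for advertisers that demonstrate the value of a placement, support pilot placements and give better feedback on results.
A more advanced use is real-time targeting by audience: when more women are present, show more cosmetics ads; when more men, car ads; when more young people, fashion and consumer goods.

4, Market outlook:
Advertising in China is growing rapidly. I think web advertising has made progress in recent years, but for now the outdoor media market looks bigger and more dependable; the momentum of Focus Media and a few others speaks for itself. This thing would not compete with them; it could serve these big players.

5, Technical groundwork:
I have none, but I look forward to finding time to study video codecs and image analysis. At least I know how to go about learning those, unlike women, whom I will never understand!!!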

2008/1/10

Mottos posted on the wall of a Harvard study room (zz)

2007-10-07 17:52

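1. If you doze off now, you will dream; if you study now, you will make your dream come true.
2. The today I waste is the tomorrow that those who died yesterday begged for.
3. The moment you think it is too late is precisely the earliest moment.
4. Do not put off today's work until tomorrow.
5. The pain of studying is temporary; the pain of not having learned lasts a lifetime.
6. With study, what is lacking is not time but effort.
7. Happiness may not be ranked, but success certainly is.
8. Study is not all of life. But if you cannot even conquer study, which is only one part of life, what can you do?
9. Enjoy the pain you cannot avoid.
10. Only by working earlier and harder than others can you taste success.
11. No one succeeds casually; success comes from thorough self-discipline and perseverance.
12. Time is passing.
13. The drool you spill now will become tomorrow's tears.
14. Study like a dog, play like a gentleman.
15. If you don't walk today, you will have to run tomorrow.
16. Those who invest in the future are those who are faithful to reality.
17. Education level represents income.
18. Once a day is over, it will not come back.
19. Even now, your rivals are turning the pages.
20. No pain, no gain.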