?????????????????
I. HDFS
????1??HDFS????????????????
????NameNode??????????????????????????????
????DataNode?????????????????????檔???????????????NameNode??????????????????洢??????
????2??HDFS?????????
????3??HDFS??????
????4??NameNode?????????????
?????????????????????????д?????????????У???????Secondary NameNode??checkpoint?????fsImage????????к????
??????????checkpoint????????
????5???????????????????????????????
Configure dfs.namenode.name.dir with multiple paths, including an NFS path.
????6??hdfs????У??????????????NameNode????Datanode?
??????NameNode?????DataNode????????????????????????????NameNode????????????洢??????????????????棬?????????????
On the NameNode side: each block's metadata takes about 150 B of memory and the block size is 128 MB, so a NameNode with 15 GB of memory can track roughly 12 PB of data.
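The figures quoted above (about 150 B of NameNode memory per block, 128 MB blocks, 15 GB of memory, roughly 12 PB of data) can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: the class and method names are made up for illustration, every block is assumed to be full, and the real per-block overhead varies by Hadoop version and file layout.

```java
public class NameNodeCapacity {
    // Rule-of-thumb figures taken from the notes; real overhead varies.
    static final long BYTES_PER_BLOCK_META = 150L;
    static final long BLOCK_SIZE_BYTES = 128L * 1024 * 1024;

    /** Number of blocks whose metadata fits into the given amount of memory. */
    static long maxBlocks(long memoryBytes) {
        return memoryBytes / BYTES_PER_BLOCK_META;
    }

    /** Data volume in PiB that those blocks can hold, assuming every block is full. */
    static double addressablePiB(long memoryBytes) {
        double dataBytes = (double) maxBlocks(memoryBytes) * BLOCK_SIZE_BYTES;
        return dataBytes / Math.pow(1024, 5);
    }

    public static void main(String[] args) {
        long memory = 15L * 1024 * 1024 * 1024; // 15 GiB of NameNode memory
        System.out.printf("blocks: %d, capacity: %.1f PiB%n",
                maxBlocks(memory), addressablePiB(memory));
    }
}
```

With binary units the estimate comes out slightly under 13 PiB, which is in line with the "about 12 PB" quoted in the notes.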
????7??datanode??????????????????е????datanode?б???????У??????
?????????????????????Data???£??????????NameNode????????NameNode?????
????8??????????window?У?????????
?????????ò????????????д????????д???????????winutil?????????????????в??????????????????????????????Java????д???д??
????9??hadoop??HA????????
II. MapReduce
1. In MapReduce, the fileinputformat -> map -> shuffle -> reduce process
2. In MapReduce, the job submission process
3. Custom JavaBean: it must implement the WritableComparable interface
????4???????outputformat?????в???????????
5. MapReduce use cases
????1?????????? TOPOne ??TOPN
????2????????????????μ?????????????????????????????????
3. Reduce-side join
4. Map-side join
????5?????????????
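For the TopOne / TopN use case in the list above, the selection step itself is small. The sketch below is plain Java rather than Hadoop code: it assumes the values for one key have already been grouped (as a reducer would receive them) and keeps only the N largest in a bounded min-heap. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    /** Returns the n largest values in descending order. */
    static List<Integer> topN(Iterable<Integer> values, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: the smallest of the current candidates sits at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            heap.offer(v);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so at most n values remain
            }
        }
        result.addAll(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(Arrays.asList(5, 1, 9, 3, 7), 3));
    }
}
```

TopOne is the same call with n = 1. Bounding the heap at n keeps memory use independent of how many values arrive for a key.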
III. Hive
1. What is Hive?
?????????sql????MapReduce????????????????????????????????????mysql????????????????????????????????HDFS?С?
Hive stores its data in HDFS and uses MapReduce to query it.
From version 2.0 on, Hive recommends Spark as the execution engine.
??????????????????jline??汾?????
????2??????????
3. Ways to execute SQL
Hive shell; hive -e "<SQL statement>"; hive -f "<file containing SQL statements>"
????4??hive??????????
??????????????   ????????????external ?? location??
????????????????
????5??hive?????
????join
???????????
??????????
???????????????????????
6. Hive user-defined functions (UDF)
IV. Sqoop
????????hadoop??map????????????е???????
?????????HDFS???????HDFS??·????Hive·?????ɡ?
V. Flume
1. agent: sources, channel, sinks
2. sources: exec, spooldir, avro
3. channel: memory, disk
4. sinks: avro, HDFS, kafka
????5??flume????????????????
6. Custom interceptor: class myiterceptor implements Interceptor
????//??????????????????????
public static class mybuilder implements Interceptor.Builder
????7????????flume??????????????????????
????????????storm????
????storm
????1:storm?????????????????????????????洢???????spout??open??nextTuple????????洢????kafka??????????????????????bolt?????
????bolt????prepare??execute????????????????????????bolt????????????????????????д??????????洢???С?
????2??storm?????????????
??????????????????????????м??????0??????
????3??????spout_max_pending (????????)
????4??jstorm???????????????worker?????????????????????
????5??storm????????
nimbus, zookeeper, supervisor, worker
????nimbus????????????????????????????????д??zookeeper?С?
????supervisor??????nimbus??????????????????????????????worker?????supervisor?????????????????????????С?
???????????д???????????supervisor??????????????????????????????????
????worker?????spoutTask??boltTask???????????????????
????6??storm???????
????topology????spout??bolt?????????????????????????????????
spout:
open
nextTuple
declareOutputFields
bolt:
prepare
execute
declareOutputFields
6. A Storm tuple's data structure: a list plus a map
The list stores the values.
The map stores the mapping from field name to index: look up the index by field name, then fetch the value from the list by that index.
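The list-plus-map layout described above can be sketched in a few lines. This is a simplified stand-in rather than Storm's own tuple class: `SimpleTuple` and its constructor are hypothetical, and only the two lookups from the notes (by index and by field name) are shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimpleTuple {
    // The list holds the values in declaration order.
    private final List<Object> values;
    // The map holds field name -> position in the list.
    private final Map<String, Integer> fieldIndex = new HashMap<>();

    SimpleTuple(List<String> fields, List<?> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values differ in length");
        }
        for (int i = 0; i < fields.size(); i++) {
            fieldIndex.put(fields.get(i), i);
        }
        this.values = new ArrayList<Object>(values);
    }

    /** Direct access by position. */
    Object getValue(int index) {
        return values.get(index);
    }

    /** Access by name: resolve the index in the map, then read the list. */
    Object getValueByField(String field) {
        Integer index = fieldIndex.get(field);
        if (index == null) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        SimpleTuple t = new SimpleTuple(
                Arrays.asList("word", "count"), Arrays.asList("storm", 3));
        System.out.println(t.getValueByField("word") + " -> " + t.getValueByField("count"));
    }
}
```

Keeping the values in a list makes positional access cheap, while the map adds name-based access without storing the field names again in every value slot.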
????7??storm???????????