Title Author(s) Citation Issue Date BDD/ZDDを基盤とする種々の離散構造の演算処理と代数 系(algebra)について 湊, 真一 2010年度科学技術振興機構ERATO湊離散構造処理系プロ ジェクト講究録. p.148-157. 2011-06 DOI Doc URL http://hdl.handle.net/2115/48463 Right Type conference presentation Additional Information File Information 22_all.pdf Instructions for use Hokkaido University Collection of Scholarly and Academic Papers : HUSCAP ERATO セミナ 2010 - No. 22 BDD/ZDD を基盤とする種々の離散構造の演算処理 と代数系 (algebra) について 湊 真一 北大情報科学研究科/JST ERATO 湊離散構造プロジェクト 2010/10/22 概要 本 ERATO プ ロ ジ ェ ク ト で は 、離 散 構 造 を 統 合 的 に 扱 う 基 本 処 理 系 と し て BDD/ZDD を位置づけ，分野横断的な応用を持つ技術体系として再構築することを 目指して研究活動を開始している．本講演では，BDD/ZDD を基盤とする離散構造 とその演算処理の代数系 (algebra) の例として，BDD による論理関数処理系, ZDD による組合せ集合の処理系，ZDD ベクトル表現による組合せ頻度表の演算処理系， および Sequence BDD による系列集合の演算処理系を取り上げ、それらの代数系を 比較するとともに、今後の展望について述べる。 148 149 2010年度 ERATO湊離散構造処理系プロジェクト講究録 Background Recent Topics on Decision Diagrams and Discrete Structure Manipulation BDD-based algorithms have been developed mainly in VLSI logic design area. (since early 1990’s.) Shin-ichi Minato Graduate School of Information Science and Technology Equivalence checking for combinational circuits. Symbolic model checking for logic / behavioral designs. Logic synthesis / optimization. Test pattern generation. Recently, BDDs are applied for not only VLSI design but also for more general purposes. Hokkaido University, Japan. Data mining (Fast frequent itemset mining) [Minato2005,2008][Loekito,Bailey2006] Computation of Bayesian networks for probabilistic system analysis.[Minato2007] Sep. 17, 2010 Contents of this talk Frequent itemset mining ZDD vector and “Itemset-histogram algebra” Set of sequences and ZDD Sequence BDD and “sequence family algebra” Sep. 17, 2010 3 Set of sequences and ZDD Sequence BDD and “sequence family algebra” Our project for discrete structure manipulation system Shin-ichi Minato Frequent itemset mining ZDD vector and “Itemset-histogram algebra” Sequence BDD for set of sequences JST “ERATO” project and current status BDD and Boolean function algebra ZDD and “Family algebra” Database analysis based on ZDD manipulation Our project for discrete structure manipulation system BDD and ZDD for discrete structure manipulation Sequence BDD for set of sequences BDD and Boolean function algebra ZDD and “Family algebra” Database analysis based on ZDD manipulation Contents of this talk BDD and ZDD for discrete structure manipulation JST “ERATO” project and current status Sep. 17, 2010 BDD reduction rules Graph representation of Boolean function data. Canonical form obtained by applying reduction rules to a binary tree with a fixed variable ordering. a a c c 0 1 0 c c c 0 1 0 1 1 0 Binary decision tree equivalent to truth table Sep. 17, 2010 f 1 reduction 1 1 x 0 b b x x x (share) f0 f1 f0 f1 (jump) f Eliminate all redundant nodes. b Share all equivalent nodes. 1 0 Gives Gives aa unique unique and and compressed compressed representation representation for for aa given given Boolean Boolean function function under under aa fixed fixed variable variable ordering. ordering. 1 Reduced Ordered BDD Shin-ichi Minato 4 Shin-ichi Minato BDD (Binary Decision Diagram) [Bryant86] 2 Shin-ichi Minato 5 Sep. 17, 2010 150 Shin-ichi Minato 6 Effect of BDD reduction rules BDD-based logic operation algorithm Exponential advantage can be seen in extreme cases. Depends on instances, but effective for many practical ones. If we generate BDDs from the binary tree: always requires exponential time & space. (Æ impracticable for large number of variables) Innovative BDD synthesis algorithm Proposed by R. Bryant in 1986. Best cited paper for many years in EE&CS areas. F F and G AND (Reduced) BDD BDD BDD BDD (Reduced) G (Reduced) BDD BDD A BDD can be constructed from the two operands of BDDs. (Computation time is linear to BDD size.) O(n) O(2n) Sep. 17, 2010 Shin-ichi Minato 7 Sep. 17, 2010 Boolean function and combinatorial itemset a b c F 0 1 0 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 0 0 1 1 1 1 Sep. 17, 2010 Zero-suppressed BDD (ZDD) [Minato93] Boolean function: F = (a b ~c) V (~b c) Eliminate all nodes whose “1-edge” directly points to 0-terminal. Share equivalent nodes as well as ordinary BDDs. If an item x does not appear in any itemset, the ZDD node of x is automatically eliminated. When average appearance ratio of each item is 1%, ZDDs are more compact than ordinary BDDs, up to 100 times. x f Complement set Æ logical NOT Shin-ichi Minato A variant of BDDs for combinatorial itemets. Uses a new reduction rule different from ordinary BDDs. 0 Combinatorial itemset: 0 F = {ab, ac, c} 0 (customer’s choice) 1 Æ ab 1 Æ c Operations of combinatorial itemsets can be done by BDD-based logic 1 Æ ac operations. 0 Union of sets Æ logical OR Intersection of sets Æ logical AND 0 9 Sep. 17, 2010 Shin-ichi Minato 0 f Zero-suppressed reduction 10 Shin-ichi Minato Knuth evaluated not only the data structure of ZDDs, but more interested in the algebra on ZDDs. 㱢, {1} P.top P.onset(v) P.offset(v) P.change(v) 㺥, 㺤, 䌜 I honored to serve proofreading of the draft version of his article. Knuth recommended to use “ZDD” instead of “ZBDD.” He reorganized ZDD operations and named “Family Algebra.” 2010/05, I visited Knuth’s home and discussed the direction of future work. 13 Sep. 17, 2010 f f (jump) Algebraic operations for ZDDs The latest Knuth’s book fascicle (Vol. 4-1) includes a BDD section with 140 pages and 236 exercises. In this section, Knuth used 30 pages for ZDDs, including more than 70 exercises. x (jump) Ordinary BDD reduction BDDs/ZDDs in the Knuth’s book 8 Shin-ichi Minato P.count P*Q P/Q P%Q Empty and singleton set. (0/1-terminal) Returns the item-ID at the top node of P. Selects the subset of itemsets including or excluding v. Switching v (add / delete) on each itemset. Returns union, intersection, and difference set. Counts number of combinations in P. Cartesian product set of P and Q. Quotient set of P divided by Q. Reminder set of P divided by Q. Formerly I called this “unate cube set algebra,” but Knuth reorganized as “Family algebra.” 11 Sep. 17, 2010 151 Shin-ichi Minato Basic operations (Corresponds to Boolean algebra) New operations introduced by Minato. Useful Usefulfor formany many practical practicalapplications. applications. 12 Comparison of BDDs and ZDDs Principles for performance improvement General principles: Data compression & Pruning search space. Two basic techniques for data compression: s s Direct processing (naïve data) Sep. 17, 2010 compress compressed Shin-ichi Minato(output data) x = Efficient BDD/ZDD algebraic operations. 13 BDD and Boolean function algebra ZDD and “Family algebra” Frequent itemset mining ZDD vector and “Itemset-histogram algebra” Set of sequences and ZDD Sequence BDD and “sequence family algebra” Our project for discrete structure manipulation system JST “ERATO” project and current status Sep. 17, 2010 Shin-ichi Minato 15 Sep. 17, 2010 Frequent itemset mining is one of the fundamental data mining problems. Previous work: Apriori [Agrawal1993] First efficient method of enumerating all frequent patterns. Breadth-first search with dynamic programming. Eclat [Zaki1997] Depth-first search algorithm. Less memory consuming. In some cases, faster than Apriori. FP-growth [Han2000] Depth-first search using “FP-tree,” graph-based data structure. Shin-ichi Minato 0 f x x x f0 f1 f0 f (share) f1 14 Tuple abc ab abc bc ab abc c abc abc ab bc Frequency threshold = 10 { b } Frequency threshold = 8 { ab, a, b, c } Frequency threshold = 7 { ab, bc, a, b, c } Frequency threshold = 5 {abc, ab, bc, ac, a, b, c } Frequency threshold = 1 {abc, ab, bc, ac, a, b, c } Shin-ichi Minato 16 LCM: [Uno2003] Output-linear time algorithm of frequent itemset mining. ZDD: [Minato93] A compact graph-based representation for large-scale sets of combinations. Generates Generates large-scale large-scale frequent frequent itemsets itemsets on on the the main main memory, with a very small overhead from the memory, with a very small overhead from the original original LCM. LCM. with a theoretical bound as output linear time. known as one of the fastest implementation. Sep. 17, 2010 Quasi-reduced Quasi-reducedBDD BDD Sub-graph sharing (Dictionary-based) Naïve representation Shin-ichi Minato Naïve representation (jump) Combination of the two techniques LCM (Linear time Closed itemset Miner) [Uno2003] x Asymmetric reduction “LCM over ZDDs” [Minato et al. 2008] Existing itemset mining algorithms ZDD ZDD Basic and well-known problem in database analysis. Record ID 1 2 3 4 5 6 7 8 9 10 11 Sequence BDD for set of sequences etc. Frequent itemset mining Database analysis based on ZDD manipulation f Sep. 17, 2010 BDD and ZDD for discrete structure manipulation (jump) f Contents of this talk - Machine learning & clustering BDD BDD Symmetric reduction (input data) decompress compressed - Risk analysis - Calculation with Bayesian networks - VLSI Logic Design - Formal Verification - etc. Run-length coding (= BDD/ZDD redundant node deletion) Computability without decompression. (naïve data) - Data mining & Knowledge discovery - Market data analysis - Web data analysis (Symmetric world) 0[15] Dictionary-based coding (= BDD/ZDD sub-graph sharing) (Asymmetric world) Many of real-life problems are likely asymmetric. - Formal concept analysis 000000000000000 10111 10111 (Æ Sub-linear time and space to the number of solutions when ZDD compression works well.) 17 Sep. 17, 2010 152 Shin-ichi Minato 18 # solutions LCM over ZDDs: An example LCM over ZDDs Original LCM The results of frequent itemsets are obtained as ZDDs on the main memory. (not generating a file.) Record ID 1 2 3 4 5 6 7 8 9 10 11 Tuple F abc ab abc bc ab abc c abc abc ab bc a 0 LCM over ZDDs Freq. thres. 㱍 = 7 0 { ab, bc, a, b, c } c 0 b 1 b 1 c 1 1 0 0 Sep. 17, 2010 1 0 1 19 Shin-ichi Minato Sep. 17, 2010 Performance of LCM over ZDDs Post Processing after LCM over ZDDs previous method (LCM-dump) new method (LCM over ZDDs) 㪋㪇㪇 3843.06 Dataset Dataset11 㪊㪌㪇 LCM over ZDDs CPU time (sec) 㪊㪇㪇 Dataset Dataset22 㪉㪇㪇 ZDD ZDD ? ZDD ZDD LCM over ZDDs 㪉㪌㪇 ZDD algebraic operation ZDD ZDD Distinctive Frequent Itemsets Freq. AllAll Frequent Itemsets 㪈㪌㪇 㪈㪇㪇 㪌㪇 㪇 m 20 Shin-ichi Minato m oo hr us 4D 0I T1 0K 10 measured by a Linux PC, Core2Duo E6600, 2.4GHz, 2GB memory. Sep. 17, 2010 BM We can extract distinctive itemsets by comparing frequent itemsets for multiple sets of databases. -1 ew Vi eb W S- s t e s e c sb - 2 ch nn um iew co p ebV Shin-ichi Minato BM W S- 21 Various ZDD algebraic operations can be used for the comparison of the huge number of frequent itemsets. Sep. 17, 2010 Itemset-histograms for DB analysis ZDD Vectors and Multi-Terminal BDD We sometimes need to represent not only yes/no but also the frequency (number of Itemset-histogram occurrence) for each itemset. Itemset Freq. Itemset DB Record ID 1 2 3 4 5 6 7 8 9 10 11 Sep. 17, 2010 Itemset abc ab abc bc ab abc c abc abc ab bc abc 5 ab 3 bc 2 c 1 Shin-ichi Minato 22 Shin-ichi Minato The two representation can be considered. Just a variable ordering problem of a same BDD. This is a histogram of itemsets if frequency of all itemsets are generated. Integer-valued Sum-OfProduct form. (VSOP) Function from Combination to Integer. (CtoI) Multi-set of combinations 23 a a 0 F0 F1 F2 0 1 a 0 1 b 1 c 1 b 1 0 0 0 0 ZDD Vector Sep. 17, 2010 153 0 a 1 1 b b b 0 1 0 c c c c 1 0 1 1 2 3 5 Multi-Terminal BDD (MTBDD) Shin-ichi Minato 24 Algebra for itemset-histograms ZDD vector for itemset-histogram H A ZDD distinguishes only existence of each tuple in the transaction data. (cannot count frequency.) We use a binary encoded method with ZDD vectors: Tuple Freq. Encode frequency numbers into m-bit binary code, and represent each bit of combination set using a ZDD. () itemset frequency F2 F1 F0 abc 5 (101) 1 0 1 ab 3 (011) 0 1 1 bc 2 (010) 0 1 0 c 1 (001) 0 0 1 a 0 0 1 Sep. 17, 2010 0 1 b 1 1 0 0 0 F0 = {abc, ab, c} F1 = {ab, bc}, F2 = {abc} a b c 5 ab 3 F0 F1 F2 abc bc 2 c 1 1 b 0 1 c 1 0 0 b bcd 2 cd 1 27 Itemset-histogram operation LCM over ZDDs ZDDV ZDDV All Freq. Frequent Itemsets Itemset-histograms Distinctive Itemsets Extracting long/short frequent patterns. Comparison of two sets of frequent patterns. Calculating statistical data (e.g. confidence, support) Finding disjoint sub-factors in frequent patterns. Shin-ichi Minato 28 Sets of combinations: Frequent itemset mining ZDD vector and “Itemset-histogram algebra” 29 Distinguishes all finite sequences. {㱗}, { ab, aba, bbc }, { a, aa, aaa, aaaa }, etc. Here we exclude infinite sequences such as { a* }. So many real-life applications. JST “ERATO” project and current status Don’t consider order and duplication of items “abcc” and “bca” are the same. Sets of sequences: Set of sequence and ZDD Sequence BDD and “sequence family algebra” Shin-ichi Minato ZDD ZDD Sets of sequences (sets of strings) BDD and Boolean function algebra ZDD and “Family algebra” Sep. 17, 2010 ? ZDDV ZDDV Sep. 17, 2010 Our project for discrete structure manipulation system LCM over ZDDs The result can be analyzed flexibly using itemsethistogram operations. Addition of number of occurrences. Sequence BDD for set of sequences 26 Shin-ichi Minato Post Processing after LCM over ZDD(V)s Database analysis based on ZDD manipulation Factoring into two parts by an item. Attaching an item. Sum of two histograms. Counting lines in the table. - Each table is compactly represented by ZDDs. - ZDDs are shared each other in the memory. Sep. 17, 2010 BDD and ZDD for discrete structure manipulation 1 3 Contents of this talk c d 2 1 Primitive operations: ab bc 5 3 Freq. 3 H0=H.factor0(a) Dataset Dataset22 Shin-ichi Minato 7 b c Dataset Dataset11 Sep. 17, 2010 bc Tuple Freq. Basic operations of Itemset-histogram algebra P+Q 5 Tuple 1 25 Tuple Freq. abcd 1 Shin-ichi Minato H1 + H0 Tuple Freq. bc H*d a H1=H.factor1(a) Text search and indexing Web (html/xml) data mining Bio informatics Sep. 17, 2010 154 Shin-ichi Minato 30 Sequence BDD (SeqBDD) Encoded ZDDs for Sets of sequences Pair of (Item - position) is considered different symbol. “aaa” Æ “a1 a2 a3” “aba” Æ “a1 b2 a3” Alphabet size: |㰿| Maximum length of sequences: n Total encoded symbols: |㰿|㬍n Not very efficient. Loekito, Bailey, and Pei (2009) Same as ZDD reduction rule. Only 0-edges keep variable ordering. 1-edges has no restriction. Still unique representation for a given set of sequences. Each path from root to 1-terminal corresponds to a sequence. Many symbols needed. We need to put a fixed maximum length of sequences. Sep. 17, 2010 Shin-ichi Minato 31 Sep. 17, 2010 32 Shin-ichi Minato Basic operations of sequence family algebra ZDD-like algebraic operations. onset, offset, and push operations are different. Other operations are almost same. Sep. 17, 2010 Shin-ichi Minato Suffix-DD Sep. 17, 2010 33 Frequent itemset mining ZDD vector and “Itemset-histogram algebra” Sequence BDD for set of sequences z Executed by JST (Japan Science and Technology Agency). z 5 projects / Year are accepted from all scientific subjects. (Computer Science: 0 or 1 project / Year.) z 100 projects have been accepted in 30 years. z Each project has 5 years long, total 1 billion Yen. about 10 PD researchers and 3 admin staffs. BDD and Boolean function algebra ZDD and “Family algebra” Database analysis based on ZDD manipulation Top projects for scientific research in Japan. BDD and ZDD for discrete structure manipulation Set of sequences and ZDD Sequence BDD and “sequence family algebra” This project is accepted on Oct. 2009. z Research activities started from April 2010, until March 2015. Our project for discrete structure manipulation system JST “ERATO” project and current status Sep. 17, 2010 Shin-ichi Minato 34 ERATO Projects Contents of this talk Shin-ichi Minato 35 36 155 Discrete structures and applications Technical stance of our project Many problems solved by computers can be decomposed as a type of discrete structures using simple primitive operations. design designautomation automation data datamining mining//knowledge knowledgediscovery discovery fault machinelearning learning//classification classification bioinformatics informatics machine faultanalysis analysis bio constraint constraintsatisfaction satisfactionproblem problem web webdata dataanalysis analysis Our Ourmain main subject subject Discrete structure manipulation system set settheory theory symbolic inductiveproof proof symboliclogic logic inductive combinatorics probabilistictheory theory graphtheory theory probabilistic combinatorics graph Knowledge discovery & data mining Digital system optimization & verification Æ Often needs a huge amount of enumerative operations. Application-specific technical areas. So many applications Æ Great effects for the society. Application (Engineering) Our objective layer: - Not only concept / theory, but also efficiency of implementation. - Beauty of simplicity / universality. Performance Improvement (10–100x) Application (Engineering) Discrete structure manipulation system (“Art” / algebraic architecture) Basis of CS & math. (conceptual, theoretical) Foundational materials for C.S. and math. Application (Engineering) Statistical Analysis & modeling Computation theory (Science / Mathematics) 37 38 Direction of our research Primary output Applications in asymmetric world Current Data mining, Machine learning ZDDs Advanced searching Location of laboratories Main lab is located in Hokakido Univ. z Center of attractive city: Sapporo. z 300m2 space devoted for the project. z Convenient access from the airport and the station. Æ Good environment for the members to concentrate into research. Still many applications remains where ZDDs would be effective. etc. (Combinatorial) (Higher model) Further outputs - Multisets Advanced Advanced - Sequences ZDD-like Advanced ZDD-like - Permutations structure ZDD-like structure - Partitions structure - Trees, DAGs - Networks Develop special new - etc. algebraic operations. Applications with higher data model Sapporo Satellite labs located in Tokyo and Osaka. Sequence data analysis z Collaboration with many univ and companies. z Connecting by highquality tele-conf. system. Numerical data processing Processing of trees or semi-structured data Tokyo Osaka (Kyushu) 25 40 39 Visit to Knuth’s home Outside of Knuth’s house In the faculty’s house area of Stanford campus on an 80 years’ lease (1968䌾2048) May, 2010, during my trip to US for attending SIAM Data Mining Conference Visited Knuth’s home in Stanford Univ. campus. He Hehas hasaaroom roomininUniv., Univ.,but but he hemainly mainlyworks worksininhis hishome. home. z Greetings and thanks for personal check giving for me and our students who found error of the Book. z Discussion on future work on BDD/ZDD and other discrete structures. No Noeducational educationalwork. work.Only Only attends attendsannual annualspecial speciallecture lecture And Andweekly weeklyfaculty’s faculty’slunch. lunch. 43 44 156 Knuth’s pipe-organ and his work room Bookshelves 45 46 Material for his book Kyoto-award medal and display frame 47 48 Discussion with Prof. Knuth Summary Focus on “discrete structure manipulation system.” z Fundamentals for various practical applications. Sequence BDD and applications. Based on BDDs/ZDDs. z He is very interested in the new data structure. z Representing “logic” and “set,” primitive models of discrete structures. z We will consider higher-level algebraic structures. I told him that I got too busy to manage the project since ZDD is written in Knuth book. Technical stance of the project. z He recognizes his writing may have significant effect, and he said: “I’m partly responsible to make your life change. So, let’s discuss future work of your project.” z Knuth’s proposals have same direction as our ERATO project: “Finding and organizing higher-level algebraic structures based on BDD/ZDD.” z Producing “Art” to connect Science and Engineering. Two objectives of the project. z Organizing new algebraic structures / operations. z Providing implementation with deep knowhow. Best use of ERATO framework z Organize an active research group with synergistic effect. z Contribution to the society to provide good research results as well as good researchers. 49 50 157

© Copyright 2020 ExploreDoc