Haskell Inverted Index, written by Kimberley Burchett, May 2003. You may use this code in any way you see fit -- it is public domain. http://www.kimbly.com/code/invidx/haskell/ > module Index( > Index, > listToIdx, idxToList, > idxToStr, idxParser, > idxInvert, > idxLookup, idxLookupAll, idxLookupAny, > ) where > > import FiniteMap > import Set > import Maybe > import QuickCheck(quickCheck, verboseCheck) > import Parsec An Index is basically an association from a key to a set of values. For our purposes, it's important that it associates to a set of values instead of just one value, because otherwise the Index couldn't be inverted. We use a FiniteMap to represent the association, and a Set to store the values. > data Index a b = Index (FiniteMap a (Set b)) "listToIdx" constructs an Index from an association list (i.e. a list of pairs, where the first element of the pair is the key, and the second is the value). And "idxToList" converts it back. Note that we remove pairs that associate to the empty list. We do this because keys that map to the empty list are semantically equivalent to keys that aren't in the map, but if we leave them then equality testing (==) doesn't work -- it will think two maps are different if one contains a key mapping to [] but the other doesn't. > listToIdx :: (Ord a, Ord b) => [(a,[b])] -> Index a b > listToIdx listPairs = Index (listToFM setPairs) > where setPairs = map listPairToSetPair nonEmptyPairs > listPairToSetPair (key, vals) = (key, mkSet vals) > nonEmptyPairs = [(k, v) | (k, v) <- listPairs, not (null v)] > > idxToList :: Index a b -> [(a,[b])] > idxToList (Index fm) = [(key, setToList vals) | (key, vals) <- fmToList fm] "idxInvert" inverts an Index, so that the values become the keys, and vice versa. > idxInvert :: (Ord a, Ord b) => (Index a b) -> (Index b a) > idxInvert (Index fm) = Index (listToFM [(val, valToKeys val) | val <- vals]) > where vals = setToList (unionManySets (eltsFM fm)) > valToKeys val = mkSet [x | (x, y) <- fmToList fm, > val `elementOf` y] "idxLookup" queries the index for a particular key. It returns emptySet if the key is not listed in the Index. > idxLookup :: (Ord a) => Index a b -> a -> Set b > idxLookup (Index fm) key = lookupWithDefaultFM fm emptySet key "idxLookupAll" queries the index for multiple keys, and intersects their value lists. > idxLookupAll :: (Ord a, Ord b) => Index a b -> [a] -> Set b > idxLookupAll idx [] = emptySet > idxLookupAll idx (key:[]) = idxLookup idx key > idxLookupAll idx (key:keys) = intersect (idxLookup idx key) rest > where rest = idxLookupAll idx keys "idxLookupAny" queries the index for multiple keys, and unions their value lists. > idxLookupAny :: (Ord a, Ord b) => Index a b -> [a] -> Set b > idxLookupAny idx [] = emptySet > idxLookupAny idx (key:[]) = idxLookup idx key > idxLookupAny idx (key:keys) = union (idxLookup idx key) rest > where rest = idxLookupAny idx keys Indexes can be converted to strings and back. The format looks like this: key1 -> val1 val2 val3 ... key2 -> val4 val5 val6 ... That is, keys are separated from their values by " -> ", values are separated from each other by spaces, and each key/value pair ends with a newline. > idxToStr :: (a -> String) -> (b -> String) -> (Index a b) -> String > idxToStr keyToStr valToStr (Index fm) = pairsToStr (fmToList fm) > where pairsToStr pairs = unlines (map pairToStr pairs) > pairToStr (k, v) = (keyToStr k) ++ " -> " ++ (valsToStr v) > valsToStr vals = unwords (map valToStr (setToList vals)) "idxParser" parses an Index from a String. We use the Parsec parsing library, described at http://www.cs.uu.nl/~daan/papers/parsec.html > idxParser :: (Ord a, Ord b) => Parser a -> Parser b -> Parser (Index a b) > idxParser keyParser valParser = > do { pairs <- pair `sepBy` (string "\n"); > return (Index (listToFM pairs)) > } > where pair = > do { key <- keyParser; > string " -> "; > vals <- valParser `sepBy` (char ' '); > return (key, mkSet vals) > } In order to make debugging easier, we allow Indexes to be printed. > instance (Show a, Show b) => Show (Index a b) where > show idx = idxToStr show show idx Indexes can be compared by comparing the underlying FiniteMap. Note that this depends on the fact that we remove key/value pairs where the values are empty -- this is done in listToIdx. > instance (Eq a, Eq b) => Eq (Index a b) where > (Index fm1) == (Index fm2) = (fm1 == fm2) Testing functions ----------------- idxInvert should be reversible: inverting an Index twice should the same Index. > prop_invertReversible idx = cidx == idxInvert (idxInvert cidx) > where cidx = listToIdx idx > types = idx::[(Int, [Int])] idxLookupAll should always be a subset of idxLookup. > prop_allIsSubset idx keys = all isSubset (map (idxLookup cidx) keys) > where cidx = listToIdx idx > allVal = idxLookupAll cidx keys > isSubset oneVal = isEmptySet (allVal `minusSet` oneVal) > types = idx::[(Int, [Int])] idxLookupAny should always be a superset of idxLookup. > prop_anyIsSuperset idx keys = all isSuperset (map (idxLookup cidx) keys) > where cidx = listToIdx idx > anyVal = idxLookupAny cidx keys > isSuperset oneVal = isEmptySet (oneVal `minusSet` anyVal) > types = idx::[(Int, [Int])] idxParser can parse the output of idxToStr. We make parseIntIdx a toplevel definition in order to make it easier to debug this test if it fails. > prop_parseUnparse idx = parseIntIdx (show cidx) == cidx > where cidx = listToIdx idx > types = idx::[(Int, [Int])] > > parseIntIdx str = case parse p "" str of > Left errMsg -> error (show errMsg) > Right parsedIdx -> parsedIdx > where p = idxParser parseInt parseInt > parseInt = do { digits <- many1 (oneOf "+-0123456789"); > return (read digits) } > types = parseIntIdx::String -> Index Int Int This checks all the invariants described above. > checkAll = do > putStr "idxInvert is reversible? " > quickCheck prop_invertReversible > putStr "idxLookupAll is subset of idxLookup? " > quickCheck prop_allIsSubset > putStr "idxLookupAny is superset of idxLookup? " > quickCheck prop_anyIsSuperset > putStr "index to string is reversible? " > quickCheck prop_parseUnparse