Module 5: Disk-based Index Structures
  Lecture 24: Analysis of High Dimensional Data
 
Analysis of high dimensional data
  • Assumptions
 
  • Uniformly distributed data
 
  • In a $d$-dimensional hypercube
 
  • Distance is Euclidean
 
  • Dimensions are independent
  • Most data lies near the boundary
 
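  • Call a point "near the boundary" when it lies within $\epsilon$ of the outer boundary. The inside hypercube (points farther than $\epsilon$ from every face of a unit hypercube) has side $1-2\epsilon$, so its volume is $(1-2\epsilon)^d$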
 
  • Example: For , inside volume is
  • Even for a small answer set, the query range along each dimension must be large
 
  • For a selectivity of $s$ (fraction of points returned), the query range on each dimension should be $s^{1/d}$
 
  • Example: For , query range is
  • "Curse of dimensionality"
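The two formulas above can be checked numerically. The following is a minimal sketch, assuming data uniform in the unit hypercube $[0,1]^d$ with independent dimensions, as in the listed assumptions. The function names, the Monte Carlo helper, and the values of $d$, $\epsilon$ and $s$ in the usage lines are illustrative choices, not figures from the lecture.

```python
import random


def inner_volume_fraction(d: int, eps: float) -> float:
    """Fraction of [0,1]^d lying farther than eps from every face.

    The inner region is a hypercube of side (1 - 2*eps), so its volume
    is (1 - 2*eps)**d; everything else is "near the boundary".
    """
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must be in [0, 0.5]")
    return (1.0 - 2.0 * eps) ** d


def query_range_per_dimension(s: float, d: int) -> float:
    """Side length of a hypercube query returning fraction s of uniform data.

    With independent uniform dimensions, selectivity = side**d,
    hence side = s**(1/d).
    """
    if not 0.0 < s <= 1.0:
        raise ValueError("s must be in (0, 1]")
    return s ** (1.0 / d)


def near_boundary_fraction_mc(d: int, eps: float, n: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of uniform points within eps of some face.

    A point is near the boundary if any coordinate x has min(x, 1 - x) < eps.
    """
    rng = random.Random(seed)
    near = 0
    for _ in range(n):
        if any(min(x, 1.0 - x) < eps for x in (rng.random() for _ in range(d))):
            near += 1
    return near / n


if __name__ == "__main__":
    # Illustrative parameters (hypothetical, not from the slides).
    for d in (2, 10, 100):
        print(
            f"d={d:3d}  inner volume (eps=0.05): {inner_volume_fraction(d, 0.05):.3e}  "
            f"range per dim (s=0.1): {query_range_per_dimension(0.1, d):.3f}"
        )
```

As $d$ grows, the inner volume shrinks toward zero (most points are near the boundary) while the per-dimension query range for a fixed selectivity approaches 1, which is the behavior the slide summarizes as the "curse of dimensionality". The Monte Carlo helper should agree with $1-(1-2\epsilon)^d$ up to sampling error.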