In a society driven by visual information, and with the rapid spread of low-priced cameras, text recognition is a fast-changing field. This text is mainly concerned with natural scene text understanding, which aims at extracting text from everyday images. From text extraction to the correction of recognition errors, each sub-step is studied in depth so that the approach stays versatile enough to handle most images, even the most complex ones. Whether in color camera-based images or in low-resolution thumbnails, inherent degradations such as complex backgrounds, artistic fonts, uneven lighting or insufficient resolution must be taken into account. To circumvent or correct them, image formation and degradation sources are studied, which leads to overcoming overly constrained definitions of color spaces. Hence the selective metric text extraction attempts to combine magnitude and directional processing of colors in an unsupervised framework. Text extraction from the background is also tied to the subsequent steps of character segmentation and recognition. This intertwined chain mainly aims at combining the color, intensity and spatial information of pixels for robustness and accuracy. Each of these features addresses a different issue: the first serves text extraction, and the latter two recover the initial separation between characters through log-Gabor filtering. To reach higher-quality results, pre- and postprocessing of natural scene text understanding are necessary. The former relies on Teager-based super-resolution, assuming a simple affine motion between frames, with the SURETEXT proposition; the latter associates recognition outputs with linguistic information through lightweight finite state machines. At the end of each step, results are reported to highlight the effectiveness of the methods. 
Moreover, several databases, used so as not to depend on any particular one, together with a public and renowned data set, serve to assess the results and compare them with recent competing algorithms. Finally, a broad discussion is opened on the achievements presented in this text and on the future extensions required in natural scene text understanding to complete exciting applications, such as a reading tool for the visually impaired or innovative web image search engines in a life-log context.