Data Transformation
Dan Suchy (aka celerysword) on Twitter:
“Data Transformation” sounds fancier than “Find and Replace”
I’m in the middle of some very complex and intellectually challenging Data Transformation.
LOL.
ALA-accredited programs on Google Map
It’s nice to know where you can find the nearest library school. The American Library Association (ALA) Office for Accreditation created a Google map showing the locations of ALA-accredited programs.
Most links go to a description page on the ALA website. Some go directly to the school’s website.
power in simplicity
The title of this post is one of the comments for this comic, which I find very troo…
Horizon Reports – emerging technologies for higher education
The Horizon Project is a collaboration between the New Media Consortium and the EDUCAUSE Learning Initiative. Since 2004, they have produced reports on emerging technologies that “will impact higher education within three adoption horizons over the next one to five years.”
Last year's report (the 2007 Horizon Report) covered user-created content and social networking, and projected their adoption within one year or less. The adoption of mobile phones for education and learning, and of virtual worlds as learning spaces, was projected in two to three years. The new scholarship and emerging forms of publication (new models of publication and nontraditional scholarly products), as well as multiplayer educational gaming, were projected in four to five years.
This report can be found at http://www.nmc.org/horizon/2007/report
They recently released a new report (the 2008 edition) covering several key emerging technologies to be applied to teaching and learning:
- Grassroots Video: better and cheaper (if not free) tools make it possible to create educational videos and disseminate them quickly, with no need to rely on an exclusive group of professionals or on expensive equipment or infrastructure.
- Collaboration Webs: using web-based collaboration tools for teaching/learning and research activities.
- Mobile Broadband: personal devices are becoming more powerful and can be used to access educational content.
- Data Mashups: combining data from different sources and producing new datasets.
- Collective Intelligence: knowledge and understanding that emerge from large groups of people. This is facilitated by collaboration webs and by data mashups.
- Social Operating Systems: connecting people through networks. The network would be organized around people rather than around content.
This year’s report can be found at http://www.nmc.org/publications/2008-horizon-report
Most of the items mentioned above are probably already in use, at least at the personal or group level (think YouTube, Google Docs, the iPhone, or Wikipedia). Using similar technology for teaching and learning does present some challenges, especially in the areas of assessment, policy, growing expectations, and changes in infrastructure.
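To make the Data Mashups item a bit more concrete, here is a minimal Python sketch of combining two sources into a new dataset. The school names, fields, and the `mashup` helper are all made up for illustration; a real mashup would pull from live services and would need to reconcile mismatched keys.

```python
# Hypothetical data: two independent sources describing some of the same schools.
locations = {
    "School A": {"city": "Springfield"},
    "School B": {"city": "Riverton"},
}
enrollment = {
    "School A": {"students": 120},
    "School C": {"students": 85},
}


def mashup(*sources):
    """Combine records that share a key into one new dataset."""
    combined = {}
    for source in sources:
        for key, fields in source.items():
            # Create the record on first sight, then add this source's fields.
            combined.setdefault(key, {}).update(fields)
    return combined


merged = mashup(locations, enrollment)
for name, record in sorted(merged.items()):
    print(name, record)
```

Schools that appear in only one source still show up in the result, just with fewer fields, which is usually where the interesting cleanup work begins.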
WorldWideScience.org
WorldWideScience.org is a collaborative effort that allows scientists to search national and international databases. It was developed by the U.S. Department of Energy’s Office of Scientific and Technical Information in partnership with the British Library and other sources.
Pretty cool effort, considering they have to deal with various databases with different metadata, database structure, and search syntax.
I tried it out with my usual dorky search word, “java”, because I am always curious how a system would distinguish between the Java programming language, the island of Java, the Javanese people, the Javanese language (yes, the language of the Javanese), and, of course, java coffee.
Well, not much happens in the search results. You get a list of results that is supposedly ranked by relevance, with no clear categorization. However, I suppose scientists will probably use more specific search terms and can expect more precise results.
Wildcard (*) searches work, as do Boolean searches.
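For anyone unfamiliar with those two features, here is a rough Python sketch of what wildcard and Boolean matching mean. This is not how WorldWideScience.org is implemented (I have no idea what it does internally); the titles and the function names `matches`, `search_all`, and `search_any` are invented for the example.

```python
from fnmatch import fnmatch

# Hypothetical titles, not real WorldWideScience.org results.
TITLES = [
    "Java programming language performance",
    "Volcanic activity on Java island",
    "Javanese language morphology",
    "Coffee cultivation in Sumatra",
]


def matches(term, title):
    """True if any word in the title matches the term; * acts as a wildcard."""
    pattern = term.lower()
    return any(fnmatch(word, pattern) for word in title.lower().split())


def search_all(terms, titles):
    """Boolean AND: every term must match the title."""
    return [t for t in titles if all(matches(term, t) for term in terms)]


def search_any(terms, titles):
    """Boolean OR: at least one term must match the title."""
    return [t for t in titles if any(matches(term, t) for term in terms)]


print(search_all(["java"], TITLES))               # exact word only
print(search_all(["java*"], TITLES))              # also picks up "Javanese"
print(search_all(["java*", "language"], TITLES))  # AND narrows the list
```

Note that neither feature solves the "which java?" problem: the matching is purely on strings, so the programming language and the island come back side by side.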
Ten Root Conditions of Data Quality Problems
Ten Root Conditions of Data Quality Problems:
- Multiple data sources. Multiple data sources of the same information produce different values for this information. This can include values that were accurate at a given point in time.
- Subjective judgment in data production. Information production using subjective judgment can result in the production of biased information.
- Limited computing resources. Lack of sufficient computing resources limits accessibility to relevant information.
- Security/accessibility trade-off. Easy access to information may conflict with requirements for security, privacy, and confidentiality.
- Coded data across disciplines. Coded data from different functions and disciplines is difficult to decipher and understand. Also, codes may conflict.
- Complex data representations. Algorithms are not available for automated content analysis across instances of text and image information. Non-numeric information can be difficult to index in a way that permits location of relevant information.
- Volume of data. Large volumes of stored information make it difficult to access needed information in a reasonable time.
- Input rules too restrictive or bypassed. Input rules that are too restrictive may impose unnecessary controls on data input and lose data that has important meaning. Data entry clerks may skip entering data into a field (missing information) or arbitrarily change a value to conform to rules and pass an edit check (erroneous information).
- Changing data needs. As information consumers’ tasks and the organizational environment (such as new markets, new legal requirements, new trends) change, the information that is relevant and useful changes.
- Distributed heterogeneous systems. Distributed heterogeneous systems without proper integration mechanisms lead to inconsistent definitions, formats, rules, and values. The original meaning of data may be lost or distorted as data flows and is retrieved from a different system, time, place, or data consumer, for the same or different purposes.
Lee, Yang, et al. Journey to Data Quality. Cambridge: The MIT Press, 2006. 80-81.
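The first condition on the list (multiple data sources producing different values for the same information) is easy to illustrate. The Python sketch below is my own, not something from the book; the record layout, the source names, and `find_conflicts` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical records: (source, entity, field, value)
RECORDS = [
    ("registrar", "student-1", "address", "12 Oak St"),
    ("library", "student-1", "address", "98 Elm Ave"),
    ("registrar", "student-2", "address", "5 Pine Rd"),
    ("library", "student-2", "address", "5 Pine Rd"),
]


def find_conflicts(records):
    """Return the (entity, field) pairs whose sources disagree on the value."""
    values = defaultdict(dict)
    for source, entity, field, value in records:
        values[(entity, field)][source] = value
    return {
        key: by_source
        for key, by_source in values.items()
        if len(set(by_source.values())) > 1
    }


conflicts = find_conflicts(RECORDS)
for (entity, field), by_source in conflicts.items():
    print(entity, field, by_source)
```

A check like this only tells you that the sources disagree. Deciding which value is right (or whether both were right at different points in time, as the list notes) still takes a human or a business rule.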









