WebProNews

Tag: ML

  • Linux Foundation Tackles Data Collaboration With Permissive License

    The Linux Foundation has announced the CDLA-Permissive-2.0 license agreement to make it easier to share AI and ML data.

    The rise of artificial intelligence and machine learning has created a need for a new type of license, one that allows data sets and learning models to be shared, as well as incorporated into AI and ML applications.

    The Linux Foundation described the challenges in a blog post:

    Open data is different. Various laws and regulations treat data differently from software or other creative content. Depending on what the data is and which country’s laws you’re looking at, the data often may not be subject to copyright protection, or it might be subject to different laws specific to databases, i.e., sui generis database rights in the European Union. 

    Additionally, data may be consumed, transformed, and incorporated into Artificial Intelligence (AI) and Machine Learning (ML) models in ways that are different from how software and other creative content are used. Because of all of this, assumptions made in commonly-used licenses for software and creative content might not apply in expected ways to open data.

    The Linux Foundation previously offered the CDLA-Permissive-1.0 license, but it was often criticized as too long and complex. In contrast, version 2.0 is less than a page long and greatly simplified compared to its predecessor.

    In response to perceptions of CDLA-Permissive-1.0 as overly complex, CDLA-Permissive-2.0 is short and uses plain language to express the grant of permissions and requirements. Like version 1.0, the version 2.0 agreement maintains the clear rights to use, share and modify the data, as well as to use without restriction any “Results” generated through computational analysis of the data.

    A key element of the new license is its ability to support collaboration and maintain compatibility with other licenses, such as Creative Commons licenses. The addition of CDLA-Permissive-2.0 is already being met with acclaim from the industry, with both IBM and Microsoft making data sets available under the new license.

    “IBM has been at the forefront of innovation in open data sets for some time and as a founding member of the Community Data License Agreement. We have created a rich collection of open data sets on our Data Asset eXchange that will now utilize the new CDLAv2, including the recent addition of CodeNet – a 14-million-sample dataset to develop machine learning models that can help in programming tasks,” said Ruchir Puri, IBM Fellow and Chief Scientist at IBM Research.
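
    In practice, a data set can declare the license in machine-readable metadata using the SPDX identifier “CDLA-Permissive-2.0,” so downstream tooling can check terms before ingesting the data. Below is a minimal, hypothetical sketch of such a dataset card; the field names and allow-list are illustrative assumptions, not a format mandated by the Linux Foundation.

    ```python
    # Hypothetical dataset card declaring CDLA-Permissive-2.0 in metadata.
    # Field names and the allow-list are illustrative, not a mandated format.
    import json

    dataset_card = {
        "name": "example-open-dataset",          # placeholder dataset name
        "license": "CDLA-Permissive-2.0",        # SPDX identifier for the license
        "homepage": "https://example.org/data",  # placeholder URL
    }

    def is_permissively_licensed(card: dict) -> bool:
        """Return True if the declared license is on our (illustrative) allow-list."""
        allowed = {"CDLA-Permissive-1.0", "CDLA-Permissive-2.0", "CC-BY-4.0"}
        return card.get("license") in allowed

    print(json.dumps(dataset_card, indent=2))
    print("OK to ingest:", is_permissively_licensed(dataset_card))  # True
    ```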

  • MIT Removes AI Training Dataset Over Racist Concerns

    MIT has removed a massive dataset after finding it contained racist, misogynistic terms and offensive images.

    Artificial intelligence (AI) and machine learning (ML) systems use datasets as training data. MIT created the Tiny Images dataset, which contained some 80 million images.

    In an open letter, Bill Freeman and Antonio Torralba, both professors at MIT, as well as NYU professor Rob Fergus, outlined issues they became aware of, and the steps they took to resolve them.

    “It has been brought to our attention that the Tiny Images dataset contains some derogatory terms as categories and offensive images,” write the professors. “This was a consequence of the automated data collection procedure that relied on nouns from WordNet. We are greatly concerned by this and apologize to those who may have been affected.

    “The dataset is too large (80 million images) and the images are so small (32 x 32 pixels) that it can be difficult for people to visually recognize its content. Therefore, manual inspection, even if feasible, will not guarantee that offensive images can be completely removed.

    “We therefore have decided to formally withdraw the dataset. It has been taken offline and it will not be put back online. We ask the community to refrain from using it in future and also delete any existing copies of the dataset that may have been downloaded.”
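
    The failure mode the professors describe is easy to picture: the collection pipeline took its category labels wholesale from WordNet’s noun inventory, which includes slurs and other derogatory terms. A rough sketch of that kind of label harvest, using NLTK’s WordNet interface, is below; the screening step at the end is an illustrative addition, not part of the original pipeline.

    ```python
    # Sketch of a WordNet-driven label harvest like the one behind Tiny Images.
    # Requires: pip install nltk, then a one-time nltk.download("wordnet").
    from nltk.corpus import wordnet as wn

    # Take every noun lemma as a candidate image-search query / category label.
    noun_labels = sorted({
        lemma.name().replace("_", " ")
        for synset in wn.all_synsets(pos=wn.NOUN)
        for lemma in synset.lemmas()
    })
    print(f"{len(noun_labels)} candidate categories")  # over 100,000 noun strings

    # WordNet's noun inventory includes derogatory terms, so an unfiltered
    # harvest inherits them. One mitigation is screening labels against a
    # curated blocklist (the entry below is a placeholder, not a real list):
    blocklist = {"offensive_term_placeholder"}
    safe_labels = [label for label in noun_labels if label not in blocklist]
    ```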

    This has been an ongoing issue with AI and ML training data, with some experts warning that it is far too easy for these systems to inadvertently develop biases based on the data they are trained on. With this announcement, MIT is doing its part to address the issue.

  • Coronavirus: YouTube Turns to AI to Address Shortage of Human Moderators

    YouTube is warning that some users’ videos may be improperly flagged due to the company relying on artificial intelligence (AI) to moderate videos.

    With more and more employees working from home during the coronavirus pandemic, YouTube is turning to AI and machine learning (ML) to make up for the shortage of human moderators. Unfortunately, AI and ML don’t always get it right, and YouTube is warning that, in its effort to keep violative content in check, some videos may be removed without actually violating policies.

    “Our Community Guidelines enforcement today is based on a combination of people and technology: Machine learning helps detect potentially harmful content and then sends it to human reviewers for assessment,” YouTube wrote in a blog post. “As a result of the new measures we’re taking, we will temporarily start relying more on technology to help with some of the work normally done by reviewers. This means automated systems will start removing some content without human review, so we can continue to act quickly to remove violative content and protect our ecosystem, while we have workplace protections in place.”

    Recognizing the potential inconvenience the situation will cause, YouTube will not be quick to issue “strikes” for removed content, and recommends users appeal any decision they believe was made in error.

    “As we do this, users and creators may see increased video removals, including some videos that may not violate policies. We won’t issue strikes on this content except in cases where we have high confidence that it’s violative. If creators think that their content was removed in error, they can appeal the decision and our teams will take a look. However, note that our workforce precautions will also result in delayed appeal reviews. We’ll also be more cautious about what content gets promoted, including livestreams. In some cases, unreviewed content may not be available via search, on the homepage, or in recommendations.”
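
    YouTube has not published the system’s internals, but the policy it describes, automated removal plus a higher confidence bar for strikes and an appeal path, maps onto a simple thresholding pattern. Here is a hypothetical sketch; the thresholds and field names are invented for illustration and are not YouTube’s actual system.

    ```python
    # Hypothetical confidence-threshold moderation flow, loosely modeling the
    # policy YouTube describes: auto-remove likely violations, but only issue
    # a strike when the model is highly confident. Not YouTube's real system.
    from dataclasses import dataclass

    REMOVE_THRESHOLD = 0.80  # invented value: remove without human review
    STRIKE_THRESHOLD = 0.97  # invented value: "high confidence" bar for strikes

    @dataclass
    class Decision:
        remove: bool
        strike: bool
        appealable: bool

    def moderate(violation_score: float) -> Decision:
        """Map a classifier's violation probability to an enforcement action."""
        if violation_score >= STRIKE_THRESHOLD:
            return Decision(remove=True, strike=True, appealable=True)
        if violation_score >= REMOVE_THRESHOLD:
            # Removed by automation alone, so no strike; creator may appeal.
            return Decision(remove=True, strike=False, appealable=True)
        return Decision(remove=False, strike=False, appealable=False)

    print(moderate(0.85))  # Decision(remove=True, strike=False, appealable=True)
    ```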

    This is just another example of the pandemic’s far-reaching effects, as well as the increasing role AI and ML can play in a variety of situations.

  • GitHub Using AI To Recommend Bug Fixes

    GitHub is using artificial intelligence (AI) and machine learning (ML) to recommend which open issues new contributors should tackle first, according to a blog post.

    GitHub offers a version control hosting platform for software projects. The company was looking for a way to make it easier for new users and programmers to contribute to projects. In May 2019, it rolled out its “good first issues” feature, which recommended easy, low-hanging-fruit issues.

    The first iteration of the feature relied on project maintainers to label issues. This “led to a list of about 300 label names used by popular open source repositories—all synonyms for either ‘good first issue’ or ‘documentation.’” Ultimately, this could lead to more work, leaving “maintainers with the burden of triaging and labeling issues. Instead of relying on maintainers to manually label their issues, we wanted to use machine learning to broaden the set of issues we could surface.”

    As a result, GitHub has introduced a second iteration of the feature, which adds ML-based issue recommendations alongside the original label-based ones. The system now surfaces “good first issues” in approximately 70% of repositories, up from 40% with the first iteration.
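
    GitHub’s post doesn’t include its model, but the two-stage design it describes, exact matching against known label synonyms backed by an ML classifier over issue text, can be sketched roughly as follows. The synonym list, training data, and threshold here are placeholders, not GitHub’s actual pipeline.

    ```python
    # Rough sketch of a two-stage "good first issue" recommender: first match
    # known label synonyms, then fall back to a text classifier on issue titles.
    # Illustrative only; not GitHub's actual pipeline or training data.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Tiny sample standing in for the ~300 synonym labels GitHub identified.
    LABEL_SYNONYMS = {"good first issue", "beginner friendly", "easy", "starter"}

    def label_match(issue_labels: list[str]) -> bool:
        return any(label.lower() in LABEL_SYNONYMS for label in issue_labels)

    # Placeholder training data; a real system would train on labeled issues.
    titles = ["fix typo in README", "redesign the storage engine",
              "add missing docstring", "rewrite scheduler for NUMA"]
    is_easy = [1, 0, 1, 0]

    classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
    classifier.fit(titles, is_easy)

    def recommend(issue_labels: list[str], title: str) -> bool:
        """Stage 1: trust maintainer labels. Stage 2: ML fallback on the title."""
        if label_match(issue_labels):
            return True
        return classifier.predict_proba([title])[0][1] > 0.5

    print(recommend([], "fix typo in CONTRIBUTING guide"))
    ```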

    GitHub plans on expanding this feature to add “better signals to our repository recommendations to help users find and get involved with the best projects related to their interests. We also plan to add a mechanism for maintainers and triagers to approve or remove ML-based recommendations in their repositories. Finally, we plan on extending issue recommendations to offer personalized suggestions on next issues to tackle for anyone who has already made contributions to a project.”

    The entire blog post is a fascinating read about how AI and ML can be used to transform even mundane tasks.

  • NASA & AWS Partner To Use AI To Protect Life On Earth

    NASA & AWS Partner To Use AI To Protect Life On Earth

    NASA and AWS are working together to use artificial intelligence to protect Earth from solar superstorms, according to an Amazon blog post.

    As the world becomes ever more wired, solar coronal mass ejections (CMEs) represent a significant threat to countries around the globe. One such event occurred in March 1989, affecting the U.S. and Canada.

    According to Amazon, “the Hydro-Quebec electric grid collapsed within 90 seconds. A strong electric current surged through the surface bedrock making all intervention impossible. Over 6 million people were left without power for nine hours. At the same time, over in the United States, 200 instances of power grid malfunctions were reported. More worryingly, the step-up transformer at the New Jersey Salem Nuclear Power Plant failed and was put out of commission.”

    Given how much more digital the world is now, a CME like the 1989 event could wreak havoc on power grids, satellites, wireless communication and much more. As a result, NASA is continually looking for ways to detect and warn of CMEs as early as possible, giving grid and satellite operators time to take protective measures. This is where AWS and Amazon’s experience with machine learning comes into play.

    “NASA is working with AWS Professional Services and the Amazon Machine Learning (ML) Solutions Lab to use unsupervised learning and anomaly detection to explore the extreme conditions associated with superstorms,” writes Arun Krishnan, editor of the Amazon Science website. “The Amazon ML Solutions Lab is a program that enables AWS customers to connect with machine learning experts within Amazon.

    “With the power and speed of AWS, analyses to predict superstorms can be carried out by sifting through as many as 1,000 data sets at a time. NASA’s approach relies on classifying superstorms based on anomalies, rather than relying on an arbitrary range of magnetic indices. More specifically, NASA’s anomaly detection relies on simultaneous observations of solar wind drivers and responses in the magnetic fields around earth.”

    Analyzing these anomalies gives NASA the ability to better understand what causes a solar superstorm and to predict when one will occur.

    “To improve forecasting models, scientists can examine the anomalies and create simulations of what it would take to reproduce the superstorms we see today,” the blog continues. “They can amplify these simulations to replicate the most extreme cases in historical records, enabling model development to highlight subtle precursors to major space weather events.”
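
    Amazon’s post stops short of code, but unsupervised anomaly detection over solar wind measurements can be illustrated with a standard algorithm such as Isolation Forest. In the sketch below, the synthetic data, feature choices, and model are stand-ins for the real NASA observations and methods.

    ```python
    # Sketch of unsupervised anomaly detection on solar-wind-style features,
    # in the spirit of the NASA/AWS work. Isolation Forest is a stand-in
    # algorithm; the synthetic data and feature names are illustrative only.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)

    # Synthetic "quiet" solar wind: speed (km/s), density (1/cm^3), Bz (nT).
    quiet = rng.normal(loc=[400.0, 5.0, 0.0], scale=[50.0, 2.0, 3.0],
                       size=(5000, 3))

    # A handful of synthetic storm-like points: fast wind, strong southward Bz.
    storms = rng.normal(loc=[900.0, 20.0, -25.0], scale=[80.0, 5.0, 5.0],
                        size=(10, 3))

    # Fit only on quiet conditions so storms register as anomalies.
    model = IsolationForest(contamination=0.01, random_state=0).fit(quiet)

    # score_samples: lower scores mean more anomalous vs. the quiet baseline.
    print("quiet mean score:", model.score_samples(quiet).mean())
    print("storm mean score:", model.score_samples(storms).mean())
    ```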

    NASA and Amazon are providing another excellent example of the transformative effect artificial intelligence and machine learning will continue to have on day-to-day life.