Data Mining and Screen Scraping – Right or Wrong?

  • Data Mining and Screen Scraping – Right or Wrong?

    I have requested my BAM file and "was" planning to upload the results to yfull.com

    but......

    Some friend on this site alerted me to the fact that yfull.com was Russian based.

    This was after an earlier discussion of NCBI, a U.S. government agency, requesting the use of my Big Y results.

    As I was searching the internet, I came across this article, which points out some disturbing facts about yfull.com

    http://dna-explained.com/category/yfull-company/

    I do not mind providing my DNA results for a good cause, but only if it's my decision, with the right to rescind/withdraw.

    We paid a lot of good money for our DNA tests, and it seems only fair that someone at least gives you a "mother may I" instead of being a thief in the night.

    So, I am hoping folks will opine as to whether my thinking is way off the beaten path, and what the community thinks about data mining without the express consent of the owner.

    As of right now, not sure if I'll forward my data to yfull.com or NCBI, although NCBI would have to abide by U.S. law, right?

  • #2
    Originally posted by K. L. Adams
    I have requested my BAM file and "was" planning to upload the results to yfull.com

    but......

    Some friend on this site alerted me to the fact that yfull.com was Russian based.

    This was after an earlier discussion of NCBI, a U.S. government agency, requesting the use of my Big Y results.

    As I was searching the internet, I came across this article, which points out some disturbing facts about yfull.com

    http://dna-explained.com/category/yfull-company/

    I do not mind providing my DNA results for a good cause, but only if it's my decision, with the right to rescind/withdraw.
    Maybe you should also read the comment section of this blog entry, which had some critiques. Oops, they are not there anymore. Sadly, I did not screen-scrape them.

    The guys from YFull do a perfect job and deliver the most up-to-date Y-haplotree that I know of. They definitely do not deserve such a Russophobic rant.


    • #3
      Originally posted by Mpawe
      Maybe you should also read the comment section of this blog entry, which had some critiques. Oops, they are not there anymore. Sadly, I did not screen-scrape them.

      The guys from YFull do a perfect job and deliver the most up-to-date Y-haplotree that I know of. They definitely do not deserve such a Russophobic rant.
      That is not what the original post was about.

      If you are, as the original poster, in the US, do you see any difference between submitting to NCBI and YFull.com? (One is abroad, one belongs to the US government.)

      Sometimes it is enough to be in Canada, the state north of the 49th parallel, to be dumped into a black hole here. When was the last time people here discussed the merits of testing with Genebase???

      W. (Mr.)


      • #4
        Originally posted by Mpawe
        Maybe you should also read the comment section of this blog entry, which had some critiques. Oops, they are not there anymore. Sadly, I did not screen-scrape them.

        The guys from YFull do a perfect job and deliver the most up-to-date Y-haplotree that I know of. They definitely do not deserve such a Russophobic rant.
        So... the only data collected is from folks volunteering their results to yfull, right?

        The company performs no data mining of other sites without the contributors' consent?

        If so, then my apologies for bringing up the subject, and they deserve really big kudos for doing such an amazing job with the Y-haplotree using just volunteer participation. Really amazing.

        As for Russophobic, as an American I am far more worried about my own government overreach than what is going on in Russia. However, as a U.S. citizen I do have some avenues to use in our judicial system that I may or may not have in another country.


        • #5
          Originally posted by K. L. Adams
          So, I am hoping folks will opine as to whether my thinking is way off the beaten path, and what the community thinks about data mining without the express consent of the owner.
          First, the blog complaint was not about YFull, but about another web site, Semargl. The blogger simply noticed that some of the same people behind one site were also behind the other.

          Second, the blogger was gravely mistaken: Semargl never violated American copyright law at all. American copyright law is ultimately based on a clause in the Constitution, and the Supreme Court has made its interpretation clear: One cannot own or copyright facts. So for example, FTDNA can copyright its formatting of project Y-STR results, but the results themselves are facts which, once published, are open to public use and re-aggregation. The Founding Fathers meant copyrights and patents to be a special reward for genuine creativity and innovation, not a government subsidy for mere data collection.

          The blogger was gravely mistaken in another regard: Russia has ratified the same international copyright treaties that Western countries have ratified. It is true that an American legal judgment has immediate effect only over the defendant's American assets, but Yfull is registered as a dot-com (i.e., American) domain and Semargl is registered in Montenegro, a Western-aligned country which may join NATO soon.

          In any case, both Semargl and FTDNA have taken steps to reassure any FTDNA customers who may be "queasy" about public duplication of data. Semargl is more careful about what queries it accepts and what information it presents, and makes clear the source of that information. FTDNA now provides an option to make one's data project-only (i.e., not visible to anyone outside the project). In fact, this new option is now the default behavior for new accounts--to the great frustration of customers who complain to hapless administrators that their DNA results are not showing up on the project's web site.
          Last edited by lgmayka; 11 July 2015, 06:47 PM.


          • #6
            Also, the blogger failed to explain why she objected to Semargl and not to Google, which does exactly the same thing, only more clumsily (because Google doesn't have specific knowledge of how to present DNA results meaningfully). We have known for years that one of the easiest ways to find Y-DNA results is simply to google-search for the kit number (or surname or ancestral name) and the word ftdna.

            Interestingly, the blogger did not expect the strong outpouring of support that Semargl got from FTDNA customers who at least occasionally need to search for numerical neighbors across all projects. Semargl actually did close down, briefly; it was the outcry from FTDNA customers that encouraged Semargl's administrators to reopen it.


            • #7
              Oh no, the Russians are here!

              Originally posted by K. L. Adams
              I have requested my BAM file and "was" planning to upload the results to yfull.com

              but......

              Some friend on this site alerted me to the fact that yfull.com was Russian based.
              I sincerely wish that Roberta had taken that particular blog post down, or at least done some judicious editing of it. Most of her posts are well written and well thought out, but that was not the case for that entry.

              What does it matter that YFull is Russian based, so long as they continue to provide a valuable service? Roberta really stepped off a pedestal with her attack on the Semargl database, also because it was Russian {eeekk}. She used her husband's outrage about discovering his already public results in the Russian database to justify her attack . . . what she should have done was first educate herself, and then her husband and readers, about what the site was and its usefulness to the genetic genealogy community. The overwhelmingly negative response she got for that blog post, both for not doing better research and for the silly anti-Russian phobia she displayed, hopefully educated some members of this community.

              As far as my opinion about data mining as it applies here, I have no problem with it. Google and the other search engines make it a simple process, and as has been pointed out many times, if you have joined any public project you have consented to your data's exposure.

              That exposure is a good thing; without it, the advances made in building out the Y tree would not have been as rapid as they have been.


              • #8
                Originally posted by Mpawe
                Maybe you should also read the comment section of this blog entry, which had some critiques. Oops, they are not there anymore. Sadly, I did not screen-scrape them.
                The comments are still there at http://dna-explained.com/2014/04/06/...ight-or-wrong/

                There is also a place to vote and these are the results -

                It's fine. I have no problem with this. 59.29% (335 votes)
                It's wrong and unethical. 32.39% (183 votes)
                I'm undecided. 5.66% (32 votes)
                Other: 2.65% (15 votes)
                Total Votes: 565


                Originally posted by Mpawe
                The guys from YFull do a perfect job and deliver the most up-to-date Y-haplotree that I know of. They definitely do not deserve such a Russophobic rant.
                I agree. An admin of a FTDNA haplogroup project has gone as far as saying that they should be hired by FTDNA.


                • #9
                  Originally posted by lgmayka
                  First, the blog complaint was not about YFull, but about another web site, Semargl. The blogger simply noticed that some of the same people behind one site were also behind the other.

                  Second, the blogger was gravely mistaken: Semargl never violated American copyright law at all. American copyright law is ultimately based on a clause in the Constitution, and the Supreme Court has made its interpretation clear: One cannot own or copyright facts. So for example, FTDNA can copyright its formatting of project Y-STR results, but the results themselves are facts which, once published, are open to public use and re-aggregation. The Founding Fathers meant copyrights and patents to be a special reward for genuine creativity and innovation, not a government subsidy for mere data collection.

                  The blogger was gravely mistaken in another regard: Russia has ratified the same international copyright treaties that Western countries have ratified. It is true that an American legal judgment has immediate effect only over the defendant's American assets, but Yfull is registered as a dot-com (i.e., American) domain and Semargl is registered in Montenegro, a Western-aligned country which may join NATO soon.

                  In any case, both Semargl and FTDNA have taken steps to reassure any FTDNA customers who may be "queasy" about public duplication of data. Semargl is more careful about what queries it accepts and what information it presents, and makes clear the source of that information. FTDNA now provides an option to make one's data project-only (i.e., not visible to anyone outside the project). In fact, this new option is now the default behavior for new accounts--to the great frustration of customers who complain to hapless administrators that their DNA results are not showing up on the project's web site.
                  Great information you provided in your post and much appreciated.

                  Is yFull.com a private entity, corporation, quasi government agency, associated with a college?

                  I pulled up yfull.com web site "about" page but did not see what type of legal entity it is for conducting business.

                  It would seem that if yfull.com were some type of properly created non-profit corporation, the work would continue long after all of us move on to greener pastures.

                  We only have to look at what happened with Sorenson Molecular Genealogy Foundation (SMGF) http://www.smgf.org/ to know what happens with a family foundation.

                  Anyway, I am not sure whether you know if it has a non-profit corporate structure or not; perhaps you or someone else within the community knows.

                  Once again, thank you for the information you provided. Kevin


                  • #10
                    Originally posted by Armando
                    The comments are still there at http://dna-explained.com/2014/04/06/...ight-or-wrong/

                    There is also a place to vote and these are the results -

                    It's fine. I have no problem with this. 59.29% (335 votes)
                    It's wrong and unethical. 32.39% (183 votes)
                    I'm undecided. 5.66% (32 votes)
                    Other: 2.65% (15 votes)
                    Total Votes: 565



                    I agree. An admin of a FTDNA haplogroup project has gone as far as saying that they should be hired by FTDNA.
                    Yes, the post is still on the blog, which allows folks like me to find it with a simple Google search.

                    ftDNA hiring the administrators of yfull.com would have been an excellent option and a great fit between both organizations. Perhaps with the data/back up data located on a ftDNA server, so it would be physically located in two geographical locations.


                    • #11
                      As a side note, I noticed Full Genomes offers Interpretation of BAM files for $50.00 USD.

                      Full Genomes Corporation, Inc. is located at the following address:

                      2275 Research Blvd, Suite 500
                      Rockville, MD 20850

                      Curious if anyone has used this U.S.-based corporation for their BAM file analysis and whether they had a positive experience. Thanks, Kevin


                      • #12
                        Originally posted by K. L. Adams
                        As a side note, I noticed Full Genomes offers Interpretation of BAM files for $50.00 USD.

                        Full Genomes Corporation, Inc. is located at the following address:

                        2275 Research Blvd, Suite 500
                        Rockville, MD 20850

                        Curious if anyone has used this U.S.-based corporation for their BAM file analysis and whether they had a positive experience. Thanks, Kevin
                        FGC is the preferred Big Y BAM analysis company of the administrators of the R1b-U106 Project. Partly this is related to the fact that two of the people associated with this company are members of the R1b-U106 Project. Justin Loe is one of the founders of the company. Greg Magoon is one of their analysts of submitted BAM files. Greg also is a co-author of a "citizen scientist" paper on the discovery of new R1b subclades through analysis of the 1,000 Genomes Project public sequences. The paper was published in 2012 at http://journals.plos.org/plosone/art...l.pone.0041634.

                        Besides offering analysis of Big Y BAM files from FTDNA customers, FGC also offers its own yDNA sequencing test. They offer a couple of options, both of which test significantly more of the y chromosome than Big Y does - for more money of course.

                        I used FGC for analysis of my Big Y BAM file. I was given a list, based on their analysis, of my best quality novel variants, which they named - FGC13480-FGC13492. I then had 12 of the 13 made testable at YSEQ, which one of my semi-close matches at FTDNA tested. (We're an 83/111 match. My estimate of when our common ancestor lived is 1,200-1,500 years ago.) He was found to be FGC13492+, forming a new subclade of R-CTS2509.

                        Just the other day I was looking on FTDNA's Advanced Testing SNPs menu and noticed that FTDNA tests 9 pages of FGC-prefix SNPs, with about 50 SNPs/page. (This is not surprising, since I think most of the SNPs named by FGC come from their analysis of BAM files from Big Y testers at FTDNA.) Among those offered is FGC13492, the SNP found in my Big Y test.

                        My experience with them was very positive and I highly recommend them for BAM analysis.
                        Last edited by MMaddi; 12 July 2015, 01:46 PM.


                        • #13
                          Full Genomes does a good analysis

                          I used the FGC Y Elite test and was very pleased with their analysis of the test results.

                          Eldon


                          • #14
                            Originally posted by MMaddi
                            FGC is the preferred Big Y BAM analysis company of the administrators of the R1b-U106 Project. Partly this is related to the fact that two of the people associated with this company are members of the R1b-U106 Project. Justin Loe is one of the founders of the company. Greg Magoon is one of their analysts of submitted BAM files. Greg also is a co-author of a "citizen scientist" paper on the discovery of new R1b subclades through analysis of the 1,000 Genomes Project public sequences. The paper was published in 2012 at http://journals.plos.org/plosone/art...l.pone.0041634.

                            Besides offering analysis of Big Y BAM files from FTDNA customers, FGC also offers its own yDNA sequencing test. They offer a couple of options, both of which test significantly more of the y chromosome than Big Y does - for more money of course.

                            I used FGC for analysis of my Big Y BAM file. I was given a list, based on their analysis, of my best quality novel variants, which they named - FGC13480-FGC13492. I then had 12 of the 13 made testable at YSEQ, which one of my semi-close matches at FTDNA tested. (We're an 83/111 match. My estimate of when our common ancestor lived is 1,200-1,500 years ago.) He was found to be FGC13492+, forming a new subclade of R-CTS2509.

                            Just the other day I was looking on FTDNA's Advanced Testing SNPs menu and noticed that FTDNA tests 9 pages of FGC-prefix SNPs, with about 50 SNPs/page. (This is not surprising, since I think most of the SNPs named by FGC come from their analysis of BAM files from Big Y testers at FTDNA.) Among those offered is FGC13492, the SNP found in my Big Y test.

                            My experience with them was very positive and I highly recommend them for BAM analysis.
                            Thank you so much for the excellent information provided which will help me and perhaps others understand the best options for our limited dollars.


                            • #15
                              Originally posted by MMaddi
                              My experience with them was very positive and I highly recommend them for BAM analysis.
                              We have some project members who have BigY results and YFull analysis. Does FGC add anything to those findings? I'm willing to make the investment if it seems worthwhile.

                              Jim
