Additional good/reference materials for EDA and e2e MLP

     CaffeinePowered updated 1 month ago 4 Members · 13 Posts
  • CaffeinePowered

    Member
    June 3, 2022 at 6:13 am

    Hi All

    I wish to check if there are additional materials for EDA and e2e MLP. There is some practice material in FG3, Chapter 6, and I found some through Google, but what counts as "good" is subject to individual interpretation. Therefore, I wish to align with the group on what is ideal to look at as I progress in my learning journey.

    Likewise, it would be helpful if we could be pointed in the right direction for e2e MLP as well. Maybe a git page sample?

    I might have asked this in a separate post, but I wish to bring it up again because my concern is my lack of conviction about what is "definitively good".

    Many thanks for the help!

    By the way, in FG9, the link to DataKind is outdated. Is there a replacement for it?

  • laurenceliew

    Member
    June 3, 2022 at 9:08 am

    Thanks @CaffeinePowered for all your questions. I have asked my team to share the latest info they have. Give them a few days.

    Cheers!

  • Syak

    Member
    June 6, 2022 at 1:39 pm

    Hi @CaffeinePowered ,

    My teammate @Ryzal should have reached out to you with regard to additional resources on E2E ML pipelining. For the benefit of the other members here and for future reference, here are some links for learning more about that, MLOps in general, and EDA:

    MLOps:
    https://ml-ops.org
    https://madewithml.com

    EDA:
    https://jakevdp.github.io/PythonDataScienceHandbook/

    Cheers,
    Syakyr

  • CaffeinePowered

    Member
    August 24, 2022 at 4:14 am

    Hi friends,

    Hope all has been well.

    I am here again, for some clarification.

    I refer to this link: https://aisingapore.org/2021/01/aiap-technical-assessment/

    [quote]Assessment 2 requires applicants to perform exploratory data analysis (part 1) and build an end-to-end machine learning pipeline (part 2) on an unseen dataset. In part 1, applicants are expected to extract the dataset from a database before performing an exploratory data analysis of the dataset (EDA). In the EDA, applicants should develop a good understanding of the dataset by analyzing each feature individually as well as their interactions. Applicants should also form hypotheses about the dataset and verify them during the EDA. A good submission involves presenting these findings in a logical and well-thought-out process with insights.

    In part 2, multiple machine learning models have to be developed and evaluated based on the given task. In the development of these models, applicants are required to document their design decisions in data pre-processing and reasoning to support their choice of models. In addition, models should be evaluated in a meaningful manner with an evaluation metric that is appropriate for the task at hand. When tackling a problem using different approaches, applicants may have to be creative to find a common evaluation metric to ensure consistency across comparisons.

    As part of the submission process, applicants will be required to package their work for review. Submissions should include an iPython notebook (.ipynb file format), Python (.py file format) and bash (.sh file format) scripts. Other files required to recreate the development environment should also be included. Reproducibility is an extremely important tenet in machine learning and the ability to reproduce an applicant’s results is a key assessment point. Submissions should also demonstrate an understanding of software engineering fundamentals and all code submitted should incorporate proper coding conventions while being readable and easily understood.[/quote]

    I wish to confirm that the following is true:

    Part 1:

    EDA: digest data, analyze and visualize

    Part 2:

    MLP: data preprocessing, model training, evaluation, logging (if applicable)

    The reason I am asking is that in a few sources, as well as Made With ML (example: https://github.com/GokuMohandas/Made-With-ML/blob/main/notebooks/06_Linear_Regression.ipynb ), both parts seem to be in the same notebook. Therefore, to avoid further speculation and confusion on my end, I decided to clarify it at a lower level.

    Please let me know.

    Thank you!

    • CaffeinePowered

      Member
      August 24, 2022 at 4:22 am

      Sorry, I can't seem to edit the post…

      The sections listed under Parts 1 and 2 are what I have come across; please add to them if I have missed any.

      Thank you!

      • Ryzal

        Member
        August 24, 2022 at 5:57 pm

        Hi @CaffeinePowered . I am afraid I don’t quite understand your question. Can you please rephrase?

        • CaffeinePowered

          Member
          August 24, 2022 at 8:50 pm

          Hello Ryzal !

          Good to hear from you again.

          I just wanted to get clarification that part 1 and 2 of the assessment are as follows:

          Part 1:

          EDA, digest data, analyze and visualize, etc

          Part 2:

          MLP, data preprocessing, model training, evaluation, logging (if applicable), etc.?

          Is this understanding correct?

          Many thanks.

          • Ryzal

            Member
            August 24, 2022 at 9:46 pm

            Historically, for the recent assessments, that has been the format. That being said, assessment formats are subject to change.
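
For future readers, the two-part split discussed above (extract the data and explore it, then train several models and compare them on one shared metric) could be sketched roughly as below. This is only a minimal illustration under stated assumptions, not the official assessment template: the table name `samples`, the toy data, the two models, the fixed train/test split, and the choice of mean absolute error as the common metric are all hypothetical.

```python
import sqlite3
import statistics

# Hypothetical toy data standing in for the unseen assessment dataset.
ROWS = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.2)]


def load_rows(conn):
    """Part 1: extract the dataset from a database."""
    return conn.execute("SELECT x, y FROM samples ORDER BY x").fetchall()


def summarize(values):
    """Part 1: per-feature summary statistics used during EDA."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_mean(train):
    """Part 2, model A: always predict the mean of the training targets."""
    mean_y = statistics.mean([y for _, y in train])
    return lambda x: mean_y


def fit_line(train):
    """Part 2, model B: least-squares line y = a*x + b on the training rows."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


def mean_absolute_error(model, rows):
    """Common evaluation metric so both models are compared consistently."""
    return statistics.mean([abs(model(x) - y) for x, y in rows])


def run_pipeline():
    # An in-memory SQLite database stands in for the real data source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (x REAL, y REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", ROWS)
    rows = load_rows(conn)
    conn.close()

    summary = {
        "x": summarize([x for x, _ in rows]),
        "y": summarize([y for _, y in rows]),
    }
    train, test = rows[:4], rows[4:]  # fixed split keeps the run reproducible
    scores = {
        "mean_baseline": mean_absolute_error(fit_mean(train), test),
        "linear": mean_absolute_error(fit_line(train), test),
    }
    return summary, scores


if __name__ == "__main__":
    summary, scores = run_pipeline()
    print(summary)
    print(scores)
```

In an actual submission, one plausible arrangement (an assumption, since the quoted description only lists the required file types) would be to keep the exploration in the notebook and move the pipeline functions into `.py` files that a `.sh` script runs end to end.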
